Feature segregation by signed weights in artificial vision systems and biological models
Abstract
Signed connectivity is fundamental to neural computation in both brains (excitatory/inhibitory) and machines (positive/negative), yet how signed weights shape visual representations for object recognition remains unclear. Dale's Law, the biological principle that a neuron's outputs are exclusively excitatory or exclusively inhibitory, is typically not enforced in artificial neural networks (ANNs). Here, we find that accuracy in ImageNet-trained ANNs correlates with the spontaneous emergence of sign-specific, "Dale-like" segregation in their output layers. Ablation and feature visualization reveal a functional segregation in ANNs: removing positive inputs primarily disrupts localized, object-related structure, while removing negative inputs mainly alters dispersed background textures. This segregation is more pronounced in adversarially robust models, persists under unsupervised learning, and vanishes with non-rectified activation functions. We validated these observations in the macaque ventral visual cortex (V1, V4, and IT) using encoding models and in vivo feature visualization. The features recovered by encoding models qualitatively matched those identified in vivo, and model representations changed more upon ablation of positive than of negative inputs. Among the most Dale-like units across neuron models, positive units encoded localized features, while negative units encoded larger, more dispersed ones. Consistent with this, experimentally clearing the background around a neuron's preferred feature enhanced its response, likely by reducing inhibitory drive. Our results suggest that both artificial and biological vision systems segregate features by weight sign: positive weights emphasize object-related features, while negative weights refine context. This highlights a convergent representational strategy in brains and machines, yielding testable predictions for visual neuroscience.
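The two analyses named above (a per-unit "Dale-like" segregation score over output-layer weights, and sign-specific ablation) can be made concrete with a minimal sketch. The code below is a hypothetical illustration, not the paper's implementation: it assumes a torchvision ResNet-50 whose final linear layer is model.fc, defines the segregation index as the majority-sign fraction of each unit's outgoing weights (our assumed formulation), and uses a random batch as a stand-in for preprocessed images.

```python
# Hypothetical sketch, not the paper's code. Assumes a torchvision
# ResNet-50 classifier whose final linear layer is `model.fc`.
import torch
import torchvision.models as models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
W = model.fc.weight.detach()  # shape: (num_classes, num_features)

# Assumed Dale-like segregation index: for each input unit (column of W),
# the fraction of its outgoing weights sharing the majority sign.
# 1.0 means the unit's outputs are purely positive or purely negative.
pos_frac = (W > 0).float().mean(dim=0)
dale_index = torch.maximum(pos_frac, 1.0 - pos_frac)
print(f"mean Dale-like index: {dale_index.mean():.3f}")

# Sign-specific ablation: zero the positive (or negative) weights of the
# output layer and measure how much the logits change on the same batch.
def ablate_sign(weight: torch.Tensor, sign: str) -> torch.Tensor:
    mask = weight > 0 if sign == "positive" else weight < 0
    return weight.masked_fill(mask, 0.0)

x = torch.randn(8, 3, 224, 224)  # stand-in for preprocessed images
with torch.no_grad():
    base = model(x)
    for sign in ("positive", "negative"):
        original = model.fc.weight.data.clone()
        model.fc.weight.data = ablate_sign(original, sign)
        change = (base - model(x)).norm(dim=1).mean()
        model.fc.weight.data = original  # restore the trained weights
        print(f"{sign} ablation: mean logit change = {change:.3f}")
```

Under the paper's account, positive-weight ablation should perturb representations of object-related structure more than negative-weight ablation, which should mainly affect background context.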