Poster
Divisive Feature Normalization Improves Image Recognition Performance in AlexNet
Michelle Miller · SueYeon Chung · Ken Miller
Keywords: [ imagenet ] [ batch normalization ] [ receptive fields ] [ sparsity ] [ layer normalization ]
Local divisive normalization provides a phenomenological description of many nonlinear response properties of neurons across visual cortical areas. To gain insight into the utility of this operation, we studied the effects on AlexNet of a local divisive normalization between features, with learned parameters. Developing features were arranged in a line topology, with the influence between features determined by an exponential function of the distance between them. We compared an AlexNet model with no normalization or with canonical normalizations (Batch, Group, Layer) to the same models with divisive normalization added (before the canonical normalization, when one was used). The normalization was performed after the ReLU in all five convolutional layers. Divisive normalization always improved performance for models with batch, group, or no normalization, generally by 1-2 percentage points, on both the CIFAR-100 and ImageNet datasets. Divisive followed by batch normalization showed the best performance. To gain insight into the mechanisms underlying the improved performance, we examined several aspects of the network representations. In the early layers, both canonical and divisive normalizations reduced manifold capacities and increased the average dimension of the individual categorical manifolds. In later layers, the capacity was higher and the manifold dimension lower for models roughly in order of their performance improvement. We also used the Gini index, a measure of the inequality of a distribution, as a metric for the sparsity of the distribution of activities within a given layer. Divisive normalization layers increased the Gini index (i.e., increased sparsity), whereas the other normalizations decreased the Gini index in their respective layers. Nonetheless, in the final layer, the sparseness of activity increased in the order of no normalization, divisive, combined, and canonical. We also investigated how the receptive fields (RFs) in the first convolutional layer (where RFs are most interpretable) change with normalization. Divisive normalization enhanced RF Fourier power at low wavelengths, and divisive+canonical enhanced power at mid (batch, group) or low (layer) wavelengths, compared to canonical alone or no normalization. In conclusion, divisive normalization enhances image recognition performance, most strongly when combined with canonical normalization; in doing so it reduces manifold capacity and sparsity in early layers while increasing them in final layers, and increases low- or mid-wavelength power in the first-layer receptive fields.
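A minimal sketch of the divisive normalization described above, written as a PyTorch module: each post-ReLU feature is divided by a pooled sum of the other features' activities, weighted by an exponential function of the distance between features in the line topology. The specific parameterization (which constants are learned, and how positivity of the denominator is enforced) is our assumption, not the authors' released code.

```python
import torch
import torch.nn as nn

class DivisiveNormalization(nn.Module):
    """Divisive normalization across features arranged in a line topology.

    Each (post-ReLU) feature's activity is divided by a weighted sum of
    the activities of other features, with weights that fall off
    exponentially with distance along the feature axis, as described in
    the abstract. The parameterization below (a learned pool gain,
    length scale, and semi-saturation constant) is an assumption.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        # Learned parameters (assumed): gain of the suppressive pool,
        # length scale of the exponential falloff, and a semi-saturation
        # constant keeping the denominator away from zero.
        self.gain = nn.Parameter(torch.tensor(1.0))
        self.length_scale = nn.Parameter(torch.tensor(2.0))
        self.sigma = nn.Parameter(torch.tensor(1.0))
        # Pairwise distances |i - j| between features in the line topology.
        idx = torch.arange(num_channels, dtype=torch.float32)
        self.register_buffer("dist", (idx[:, None] - idx[None, :]).abs())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width), non-negative after the ReLU.
        weights = torch.exp(-self.dist / self.length_scale.clamp(min=1e-3))
        # Suppressive pool: for channel i, sum_j w_ij * x_j at each pixel.
        pool = torch.einsum("ij,bjhw->bihw", weights, x)
        return x / (self.sigma.abs() + self.gain.abs() * pool)
```

In the configuration described above, one such module would follow the ReLU of each of AlexNet's five convolutional layers, preceding the batch, group, or layer normalization when one is used.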
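The Gini index used as the sparsity metric can be computed with the standard sorted-value estimator; the abstract does not specify the authors' exact estimator, so the following is one common choice.

```python
import numpy as np

def gini_index(activations: np.ndarray) -> float:
    """Gini index of a set of non-negative activations.

    0 means activity is spread equally across units; values near 1 mean
    activity is concentrated in a few units (a sparse code).
    """
    x = np.sort(activations.ravel())
    n = x.size
    total = x.sum()
    if total == 0:
        return 0.0
    # Sorted-value formula: G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n
    i = np.arange(1, n + 1)
    return float(2.0 * np.dot(i, x) / (n * total) - (n + 1.0) / n)
```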
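Likewise, the comparison of RF Fourier power across wavelengths can be approximated by radially averaging the 2-D power spectrum of each first-layer kernel. This is a plausible analysis sketch under that assumption, not the authors' exact procedure.

```python
import numpy as np

def rf_fourier_power(kernel: np.ndarray) -> np.ndarray:
    """Radially averaged Fourier power spectrum of a 2-D receptive field.

    kernel: (H, W) spatial filter, e.g. one first-layer conv kernel
    averaged over its input channels. Returns mean power per integer
    radial spatial frequency; power at a given wavelength can be read
    off as wavelength = 1 / frequency.
    """
    power = np.abs(np.fft.fftshift(np.fft.fft2(kernel))) ** 2
    h, w = power.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    # Average power within each integer frequency radius.
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)
```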