

Poster in Workshop: New Frontiers in Associative Memories

Hebbian Sparse Autoencoder

Nikita Kurdiukov · Anton Razzhigaev


Abstract:

We establish a connection between a fully connected layer trained with Hebbian learning, augmented with Anti-Hebbian plasticity, and Sparse Autoencoders (SAEs) with tied weights. Specifically, we apply Hebbian learning to token embeddings in a small language model and find that the layer trained via biologically inspired rules becomes highly selective to certain features of these representations. We hypothesize that this arises from two factors: (1) Anti-Hebbian updates act as an effective L1/L2 regularization term, resembling the sparsity mechanism in SAEs, and (2) Hebbian updates approximately minimize the MSE reconstruction objective under the tied-weights constraint. Although the resulting model produces interpretable, sparse representations, its performance remains below that of a standard SAE trained via backpropagation, likely due to the approximate and hand-tuned nature of the Hebbian “regularization.” Nevertheless, our findings highlight the potential of Hebbian and Anti-Hebbian mechanisms as a biologically plausible means to capture key properties of sparse coding in language model embeddings.
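
The following is a minimal sketch of the mechanism the abstract describes, not the authors' implementation: a tied-weight layer whose Hebbian update follows the reconstruction residual (approximating the MSE gradient under tied weights) and whose Anti-Hebbian update weakens weights of co-active units, acting like an L1/L2 sparsity penalty. The dimensions, learning rates, ReLU nonlinearity, and norm clipping are assumptions made for illustration.

```python
# Illustrative sketch only: assumed update rules, shapes, and hyperparameters.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_features = 64, 256                          # assumed embedding / dictionary sizes
W = rng.normal(scale=0.1, size=(n_features, d_model))  # tied weights: encode with W, decode with W.T

eta_hebb = 1e-2   # Hebbian learning rate (hand-tuned, as the abstract notes)
eta_anti = 1e-3   # Anti-Hebbian strength, playing the role of a sparsity coefficient

def hebbian_step(x):
    """One plasticity update on a single embedding vector x of shape (d_model,)."""
    global W
    y = np.maximum(W @ x, 0.0)   # sparse code from the tied encoder + ReLU
    x_hat = W.T @ y              # reconstruction through the tied decoder

    # Hebbian term: post-synaptic activity times the reconstruction residual.
    # This is the (decoder-path) negative MSE gradient under tied weights, so
    # repeated updates approximately minimize ||x - W.T y||^2.
    dW = eta_hebb * np.outer(y, x - x_hat)

    # Anti-Hebbian term: weaken weights between co-active inputs and units.
    # With y this is the gradient of an L2 penalty on the code; replacing y by
    # (y > 0) would give an L1 penalty instead -- either way it promotes sparsity.
    dW -= eta_anti * np.outer(y, x)

    W += dW
    # Keep feature norms bounded (an assumed, Oja-style normalization detail).
    W /= np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1.0)
    return y

# Toy usage on random vectors; in the paper the inputs are token embeddings
# taken from a small language model.
embeddings = rng.normal(size=(10_000, d_model))
for x in embeddings:
    code = hebbian_step(x)

print("mean fraction of active features:",
      np.mean(np.maximum(W @ embeddings.T, 0.0) > 0))
```

Because the sketch follows only an approximate gradient and relies on a hand-set Anti-Hebbian coefficient, it is consistent with the abstract's observation that such rules yield sparse, selective features while trailing a backprop-trained SAE in reconstruction quality.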
