Invited Talk in NFAM Workshop: New Frontiers in Associative Memories
Sun, Apr 26, 2026 • 5:40 AM – 6:10 AM PDT

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Xueyan Niu

Abstract

Empirical scaling laws lack theoretical justification and cannot explain why smaller models sometimes outperform larger ones. We present a theoretical framework that connects Transformer attention to associative memory through three components: a distance-based energy function for nearest-neighbor search, layer stacking via majorization-minimization, and an analysis of the cross-entropy loss. Our main result shows that, for memorizing well-separated patterns, the optimal scaling satisfies N = O(D²), where N is the number of parameters and D is the dataset size. To demonstrate practical applications, we present NeuralDB, which scales knowledge editing to 10,000 facts, an order-of-magnitude improvement over existing methods, while maintaining robust generalization. This work positions associative memory as a unified lens for understanding and improving large language models, bridging theory and practice.
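
As a rough illustration of the first component, the sketch below shows how a distance-based energy can act as a soft nearest-neighbor search over stored patterns. It is not the speaker's formulation: the energy E(x) = -(1/beta) log sum_i exp(-(beta/2) ||x - p_i||^2), the update rule, and the names `retrieve`, `energy`, `beta`, and `steps` are all assumptions chosen for this example. Each update replaces the query with a softmax-weighted average of the stored patterns, the same form as a single attention read with the patterns acting as both keys and values.

```python
import math


def _sq_dists(x, patterns):
    # Squared Euclidean distance from x to every stored pattern.
    return [sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in patterns]


def energy(x, patterns, beta=8.0):
    # Assumed energy: -(1/beta) * log(sum_i exp(-(beta/2) * ||x - p_i||^2)),
    # computed with the minimum distance factored out for numerical stability.
    d2 = _sq_dists(x, patterns)
    m = min(d2)
    s = sum(math.exp(-0.5 * beta * (d - m)) for d in d2)
    return 0.5 * m - math.log(s) / beta


def retrieve(query, patterns, beta=8.0, steps=3):
    # Repeatedly move the state to a softmax-weighted average of the patterns.
    x = list(query)
    for _ in range(steps):
        d2 = _sq_dists(x, patterns)
        m = min(d2)
        # Unnormalized softmax weights over negative scaled distances.
        w = [math.exp(-0.5 * beta * (d - m)) for d in d2]
        z = sum(w)
        # Convex combination of stored patterns, one coordinate at a time.
        x = [sum(wi * p[k] for wi, p in zip(w, patterns)) / z
             for k in range(len(x))]
    return x


# Hypothetical, well-separated toy patterns and a noisy query near the first one.
patterns = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
query = [0.8, 0.1]
recalled = retrieve(query, patterns)
```

In this toy setting the recalled state lands close to the first stored pattern and has lower energy than the query. With a larger `beta` the softmax sharpens toward a hard nearest-neighbor lookup; with patterns that are not well separated, the update can instead settle on a blend of several patterns.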
