Invited Talk, NFAM Workshop: New Frontiers in Associative Memories
Sun, Apr 26, 2026 • 9:45 AM – 10:15 AM PDT

From Theory to Throughput: Unifying Architectures and Scaling Deep Memory

Meisam Razaviyayn

Abstract

While Transformers and modern linear RNNs have driven major advances in sequence modeling, their underlying mechanisms are often treated as entirely distinct paradigms. In this talk, we first introduce MIRAS, a unifying framework that reconceptualizes these diverse architectures as associative-memory modules governed by (inverse) online optimization. We then show how MIRAS opens up a rich, potentially non-Euclidean design space built on robust statistics, leading to novel, highly stable architectures. Realizing the full potential of these deep memory modules, however, requires overcoming several significant practical bottlenecks: token-level myopia, fixed memory capacity, and severe inefficiencies in chunk-wise training. Next, we discuss recent advances that address these limitations: ATLAS, which uses the Omega rule and the Muon optimizer to optimally memorize context; Memory Caching, which allows RNN memory capacity to grow dynamically; and TNT, a novel parallel training paradigm that completely decouples training hardware throughput from inference resolution. Finally, we situate these innovations within the broader Nested Learning paradigm, proposing a future in which both architectures and their optimizers function as an interconnected hierarchy of multi-timescale continual learning systems.
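The idea of an associative memory "governed by online optimization" can be illustrated with a minimal sketch. This is not the speaker's implementation: it assumes a linear (matrix-valued) memory M that is updated by one step of plain gradient descent on a loss between the memory's prediction M k and the target value v. With the squared (L2) loss, this step reduces to the familiar delta rule; swapping in an element-wise Huber loss is one simple example of the robust-statistics objectives mentioned above, since it clips the influence of large residuals. Passing several (key, value) pairs as a window is a simplified, uniformly weighted stand-in for updating on recent context rather than on a single token. It does not reproduce the actual Omega rule, the Muon optimizer, or any retention/decay gates. All names (`memory_step`, `l2_grad`, `huber_grad`, `read`) are hypothetical.

```python
# Hypothetical sketch: a linear associative memory M (d_v x d_k) written to by
# online gradient descent on a per-pair loss l(M k - v). Not the MIRAS/ATLAS code.
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, k: Vector) -> Vector:
    """Compute M @ k for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, k)) for row in M]


def l2_grad(residual: Vector) -> Vector:
    """Gradient of 0.5 * ||r||^2 w.r.t. r is r itself (gives the delta rule)."""
    return list(residual)


def huber_grad(residual: Vector, delta: float = 1.0) -> Vector:
    """Element-wise Huber derivative: equals r for |r| <= delta, else +/- delta."""
    return [max(-delta, min(delta, r)) for r in residual]


def memory_step(
    M: Matrix,
    window: Sequence[Tuple[Vector, Vector]],
    lr: float,
    grad_fn: Callable[[Vector], Vector],
) -> Matrix:
    """One gradient step on sum_{(k, v) in window} l(M k - v).

    A window of length 1 is a token-level update; a longer window is an
    assumed, simplified context-level update (uniform weights, no gates).
    """
    G = [[0.0] * len(M[0]) for _ in M]
    for k, v in window:
        residual = [p - t for p, t in zip(matvec(M, k), v)]
        g = grad_fn(residual)
        # Accumulate the outer product g k^T, i.e. dl/dM for a linear memory.
        for i, gi in enumerate(g):
            for j, kj in enumerate(k):
                G[i][j] += gi * kj
    return [[m - lr * gij for m, gij in zip(row, grow)] for row, grow in zip(M, G)]


def read(M: Matrix, q: Vector) -> Vector:
    """Retrieve from memory by applying M to a query vector."""
    return matvec(M, q)


if __name__ == "__main__":
    M0 = [[0.0, 0.0], [0.0, 0.0]]
    e1, e2 = [1.0, 0.0], [0.0, 1.0]
    # With a unit-norm key and lr = 1, the L2 (delta-rule) step stores v exactly.
    print(read(memory_step(M0, [(e1, [3.0, 0.0])], 1.0, l2_grad), e1))
    # The Huber step moves only delta = 1 toward the same outlying target.
    print(read(memory_step(M0, [(e1, [3.0, 0.0])], 1.0, huber_grad), e1))
```

The contrast between the two printed reads is the point of the example: under the L2 objective a single large residual is written into memory in full, while the Huber objective bounds how far any one pair can move the memory in one step, which is one intuition for why robust losses can yield more stable updates.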
