Understanding and Fixing Bottlenecks in State Space Models: What Recency and Over-Smoothing Tell Us
Adrita Das · Dantong Zhu
Abstract
State Space Models (SSMs), including Mamba, commonly suffer from two failure modes: recency bias, where the model is biased strongly toward recent inputs, and over-smoothing, where hidden states become indistinguishable with depth. The paper argues that these issues originate from the learned state-transition matrix $A_t$, whose memory-decay values collapse into a narrow range, limiting the diversity of timescales the model can represent. To mitigate this, the authors introduce polarization, in which one dimension of $A_t$ is fixed to serve as a perfect long-term memory channel and another as a pure short-term memory channel, while all other dimensions remain learnable. This enforces both a stable non-decaying pathway and a rapidly resetting pathway, preventing the system from collapsing into uniformly slow or uniformly fast decay. In associative-recall experiments, the polarized Mamba variants demonstrate substantially improved long-context retrieval. Overall, the findings indicate that standard parameterizations of $A_t$ fail to preserve sufficient memory diversity, whereas polarization offers a simple and effective mechanism for stabilizing long-range information flow in SSMs.
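The polarization idea can be illustrated with a minimal sketch of a diagonal SSM recurrence. This is not the paper's exact Mamba parameterization (which uses input-dependent transitions); the sigmoid parameterization, function names, and channel indices below are illustrative assumptions. The key step is pinning one decay channel at 1 (a perfect long-term memory pathway) and one at 0 (a pure short-term, instantly resetting pathway), while the rest stay learnable:

```python
import numpy as np

def polarized_decay(raw_params):
    """Map raw learnable parameters to per-channel decay values in (0, 1),
    then polarize two channels (sigmoid squashing is an assumption here)."""
    a = 1.0 / (1.0 + np.exp(-raw_params))
    a[0] = 1.0  # polarized channel: no decay, perfect long-term memory
    a[1] = 0.0  # polarized channel: full decay, pure short-term memory
    return a

def ssm_scan(a, x):
    """Run the diagonal linear recurrence h_t = a * h_{t-1} + x_t
    over a sequence x of shape (T, d); returns all hidden states."""
    h = np.zeros_like(a)
    states = []
    for x_t in x:
        h = a * h + x_t
        states.append(h.copy())
    return np.array(states)

a = polarized_decay(np.zeros(4))   # unpolarized channels get sigmoid(0) = 0.5
x = np.ones((5, 4))                # constant input stream, 5 steps
states = ssm_scan(a, x)
```

After five steps of constant input, the pinned channel with $a = 1$ accumulates the full history (state 5.0), the pinned channel with $a = 0$ retains only the latest input (state 1.0), and the learnable channels sit in between, which is the diversity of timescales the polarization is meant to guarantee.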