

Poster

Lambda-Skip Connections: the architectural component that prevents Rank Collapse

Federico Arangath Joseph · Jerome Sieber · Melanie Zeilinger · Carmen Amo Alonso

Hall 3 + Hall 2B #338
Thu 24 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Rank collapse, a phenomenon where embedding vectors in sequence models rapidly converge to a uniform token or equilibrium state, has recently gained attention in the deep learning literature. This phenomenon leads to reduced expressivity and potential training instabilities due to vanishing gradients. Empirical evidence suggests that architectural components like skip connections, LayerNorm, and MultiLayer Perceptrons (MLPs) play critical roles in mitigating rank collapse. While this issue is well-documented for transformers, alternative sequence models, such as State Space Models (SSMs), which have recently gained prominence, have not been thoroughly examined for similar vulnerabilities. This paper extends the theory of rank collapse from transformers to SSMs using a unifying framework that captures both architectures. We introduce a modification in the skip connection component, termed lambda-skip connections, that provides guarantees for rank collapse prevention. We present, via analytical results, a sufficient condition to achieve the guarantee for all of the aforementioned architectures. We also study the necessity of this condition via ablation studies and analytical examples. To our knowledge, this is the first study that provides a general guarantee to prevent rank collapse, and that investigates rank collapse in the context of SSMs, offering valuable understanding for both theoreticians and practitioners. Finally, we validate our findings with experiments demonstrating the crucial role of architectural components in preventing rank collapse.
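The abstract describes the modification only at a high level; a plausible reading of a lambda-skip connection is a standard residual block whose skip branch is scaled by a parameter lambda, i.e. lambda * x + f(x) instead of x + f(x). The PyTorch sketch below illustrates that reading. The class name LambdaSkipBlock, the use of a learnable scalar lambda, and the placement of LayerNorm are assumptions for illustration and are not taken from the paper; the actual condition on lambda that guarantees rank collapse prevention is given in the paper itself.

```python
import torch
import torch.nn as nn

class LambdaSkipBlock(nn.Module):
    """Residual block whose skip branch is scaled by lambda.

    Sketch of the lambda-skip idea: compute lambda * x + f(x) rather than
    x + f(x), where lambda can be chosen (or learned) to satisfy the
    paper's sufficient condition for preventing rank collapse. The
    initial value and learnability of lambda here are illustrative.
    """

    def __init__(self, mixer: nn.Module, d_model: int, init_lambda: float = 1.0):
        super().__init__()
        self.mixer = mixer                     # e.g. attention or an SSM layer
        self.norm = nn.LayerNorm(d_model)
        self.lam = nn.Parameter(torch.tensor(init_lambda))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # lambda-scaled skip connection: lam * x instead of x
        return self.lam * x + self.mixer(self.norm(x))


# Usage sketch: wrap any sequence-mixing layer (here multi-head attention).
if __name__ == "__main__":
    d_model = 64

    class SelfAttention(nn.Module):
        def __init__(self, d_model: int, n_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        def forward(self, x):
            out, _ = self.attn(x, x, x)
            return out

    block = LambdaSkipBlock(SelfAttention(d_model), d_model)
    x = torch.randn(2, 10, d_model)            # (batch, sequence, features)
    print(block(x).shape)                      # torch.Size([2, 10, 64])
```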
