

Poster in Workshop: 2nd Workshop on Mathematical and Empirical Understanding of Foundation Models

On provable length and compositional generalization

Kartik Ahuja · Amin Mansouri


Abstract:

Length generalization (the ability to generalize to sequences longer than those seen during training) and compositional generalization (the ability to generalize to token combinations not seen during training) are crucial forms of out-of-distribution generalization in sequence-to-sequence models. In this work, we take first steps towards provable length and compositional generalization for a range of architectures, including deep sets, transformers, state space models, and simple recurrent neural nets. Depending on the architecture, we prove that different degrees of representation identification, e.g., a linear or a permutation relation with the ground-truth representation, are necessary for length and compositional generalization.
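As a concrete illustration of the setting (not the paper's construction), the sketch below shows how length generalization can be probed empirically for one of the listed architectures, a deep-sets-style model: train only on short sequences of a toy sum task and evaluate on much longer ones. The model, task, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: probe length generalization of a deep-sets-style model.
# Toy task, dimensions, and training setup are illustrative assumptions.
import random
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    def __init__(self, vocab_size=16, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # per-token encoder
        self.readout = nn.Linear(dim, 1)             # decoder on pooled representation

    def forward(self, tokens):                       # tokens: (batch, length)
        pooled = self.embed(tokens).sum(dim=1)       # permutation-invariant sum pooling
        return self.readout(pooled).squeeze(-1)

def toy_batch(batch, length, vocab_size=16):
    tokens = torch.randint(0, vocab_size, (batch, length))
    targets = tokens.float().sum(dim=1)              # target: sum of token ids
    return tokens, targets

model = DeepSets()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Train only on short sequences (lengths 2..5).
for step in range(2000):
    tokens, targets = toy_batch(batch=64, length=random.randint(2, 5))
    loss = nn.functional.mse_loss(model(tokens), targets)
    opt.zero_grad(); loss.backward(); opt.step()

# Evaluate on much longer sequences to probe length generalization.
with torch.no_grad():
    tokens, targets = toy_batch(batch=256, length=20)
    test_mse = nn.functional.mse_loss(model(tokens), targets).item()
    print("test MSE on length-20 sequences:", test_mse)
```

The gap between training loss on short sequences and test error on long sequences is one simple empirical counterpart of the length-generalization notion the abstract studies theoretically.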
