Poster
Generalizing Reasoning Problems to Longer Lengths
Changnan Xiao · Bing Liu
Hall 3 + Hall 2B #305
Fri 25 Apr 7:00 p.m. PDT - 9:30 p.m. PDT
Abstract:
Length generalization (LG) is a challenging problem in learning to reason. It refers to the phenomenon that a model trained on reasoning problems of smaller lengths or sizes struggles with problems of larger lengths or sizes. Although it has been proven that reasoning can be learned if the intermediate reasoning steps (also known as chain-of-thought, CoT) are provided in the training data, existing results apply only within a given length range (interpolation), whereas LG concerns extrapolation beyond the lengths seen in training. This paper first presents a theorem that identifies the root cause of the LG problem. It then defines a class of reasoning problems for which LG with Transformers can be theoretically guaranteed, provided the CoT schemes are constructed to satisfy a proposed condition called (n, r)-consistency.
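To make the train-short, test-long setup concrete, below is a minimal, hypothetical sketch of a length-split evaluation for a toy reasoning task (iterated addition with chain-of-thought steps). The task, function names, and length ranges are illustrative assumptions; this is not the paper's construction and does not implement the (n, r)-consistency condition.

```python
# Hypothetical sketch: build a length-generalization (LG) split for a toy
# CoT reasoning task. Train lengths are short (interpolation regime);
# test lengths are strictly longer (extrapolation regime).
import random

def make_example(length: int) -> dict:
    """Build one CoT example: sum `length` digits, emitting each partial sum."""
    digits = [random.randint(0, 9) for _ in range(length)]
    steps, running = [], 0
    for d in digits:
        running += d
        steps.append(running)  # intermediate reasoning steps (the CoT)
    return {"input": digits, "cot": steps, "answer": running}

def length_split(train_lengths, test_lengths, per_length: int = 100):
    """Train on shorter problems; evaluate extrapolation on longer ones."""
    train = [make_example(n) for n in train_lengths for _ in range(per_length)]
    test = [make_example(n) for n in test_lengths for _ in range(per_length)]
    return train, test

# Train on lengths 2..8; test on 16..32 to probe extrapolation beyond training lengths.
train_set, test_set = length_split(range(2, 9), range(16, 33))
```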