Poster

Why RoPE Struggles to Maintain Long-Term Decay in Long Sequences?

Wei Shen · Chao Yin · Yuliang Liu · Zikai Xiao · Xiaonan He · WangYan

Hall 3 + Hall 2B #128
Thu 24 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

Rotary Position Embedding (RoPE) improves upon traditional positional encodings but struggles with long-term decay in contexts exceeding its training length, limiting the model's generalization to longer sequences. Our experiments suggest that this issue may stem from a high proportion of obtuse angles on the complex plane between the linear transformations of query and key embeddings.
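
The quantity the abstract refers to can be illustrated with a minimal sketch. It is not the authors' code. It assumes the standard complex-number form of RoPE, in which each pair of embedding dimensions is treated as one complex number and rotated by an angle proportional to the token position, with the commonly used frequency base of 10000. The function names `rope_rotate`, `obtuse_fraction`, and `attention_score` are hypothetical, and the random vectors stand in for the linearly transformed query and key embeddings; nothing here reproduces the paper's experiments.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Rotate complex-paired features x (shape [d/2]) to position pos.

    Assumes the usual RoPE frequencies theta_i = base**(-2i/d); with
    d/2 complex pairs this is base**(-i/(d/2)) for i = 0..d/2-1.
    """
    d_half = x.shape[-1]
    freqs = base ** (-np.arange(d_half) / d_half)
    return x * np.exp(1j * pos * freqs)

def obtuse_fraction(q, k):
    """Fraction of complex pairs where the angle between q_i and k_i is obtuse.

    Re(q_i * conj(k_i)) = |q_i||k_i|cos(angle), so a negative real part
    means the angle exceeds 90 degrees.
    """
    return float(np.mean((q * np.conj(k)).real < 0))

def attention_score(q, k, m, n, base=10000.0):
    """Unnormalized RoPE attention logit between query at m and key at n."""
    qm = rope_rotate(q, m, base)
    kn = rope_rotate(k, n, base)
    return float(np.sum(qm * np.conj(kn)).real)

if __name__ == "__main__":
    rng = np.random.default_rng(0)  # illustrative random data only
    d_half = 32
    q = rng.normal(size=d_half) + 1j * rng.normal(size=d_half)
    k = rng.normal(size=d_half) + 1j * rng.normal(size=d_half)
    print("obtuse fraction:", obtuse_fraction(q, k))
    for dist in (0, 16, 256, 4096):
        print(dist, attention_score(q, k, dist, 0))
```

Because the rotation is applied to both vectors, the score depends only on the relative distance `m - n`. The initial angle between each `q_i` and `k_i` determines the sign of that pair's contribution at distance zero, which is why the proportion of obtuse angles is a natural quantity to inspect.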
