Reinforced Fast Weights with Next-Sequence Prediction
Hee Seung Hwang ⋅ Xindi Wu ⋅ Sanghyuk Chun ⋅ Olga Russakovsky
Abstract
Fast weight architectures offer a promising alternative to attention-based transformers in long-context settings, but their potential is limited by the next-token prediction (NTP) training paradigm. Because it optimizes single-token predictions, the NTP objective ignores the semantic relations among the multiple tokens that follow a prefix. As a result, fast weight models, which dynamically update their parameters to store contextual information, learn suboptimal mappings of token representations. We introduce REFINE ($\textbf{Re}$inforced $\textbf{F}$ast we$\textbf{I}$ghts with $\textbf{N}$ext s$\textbf{E}$quence prediction), a method that leverages reinforcement learning to train fast weight models under the next-sequence prediction (NSP) objective. REFINE selects informative token positions with high entropy, generates multi-token rollouts, assigns self-supervised sequence-level rewards, and optimizes the model with GRPO. REFINE is applicable throughout the language model training lifecycle, including mid-, post-, and test-time training phases. Our experiments on the LaCT-760M and DeltaNet-1.3B models show that REFINE consistently outperforms supervised fine-tuning under NTP on needle-in-a-haystack tasks, long-context QA, and various LongBench subtasks. REFINE thus provides an effective and versatile solution for improving long-context modeling of fast weight architectures.
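The pipeline described in the abstract (entropy-based position selection, then group-relative reward normalization as used in GRPO) can be illustrated with a small sketch. This is not the authors' implementation: the top-k selection rule, the toy distributions, the reward values, and the function names token_entropy, select_high_entropy_positions, and group_relative_advantages are all assumptions made for illustration. Rollout generation and the self-supervised reward itself are not specified in the abstract and are left out.

```python
import math
from statistics import mean, pstdev


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_high_entropy_positions(dists, k):
    """Return the indices of the k highest-entropy positions, in sequence order.

    Assumption: a plain top-k rule; the abstract only says that
    high-entropy positions are selected.
    """
    entropies = [token_entropy(d) for d in dists]
    ranked = sorted(range(len(dists)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy next-token distributions over a 4-token vocabulary at 4 positions.
dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform, maximum entropy
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],
]
positions = select_high_entropy_positions(dists, k=2)

# Hypothetical sequence-level rewards for 4 rollouts from one selected position.
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

In this toy setup, rollouts would be generated only from the selected positions, and each rollout's advantage is measured relative to the other rollouts sampled from the same position, so no separate value model is needed.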