Reinforced Fast Weights with Next-Sequence Prediction
Abstract
The paper introduces REFINE (Reinforced Fast weIghts with Next sEquence prediction), a framework for improving long-context modeling in fast weight architectures. Fast weight architectures maintain constant memory overhead regardless of context length, a significant advantage over attention-based transformers. However, the authors identify a critical limitation of the prevailing next-token prediction (NTP) training paradigm: it optimizes single-token predictions and ignores the semantic coherence spanning the multiple tokens that follow a prefix, yielding representations that fail to capture long-range dependencies. REFINE addresses this with a reinforcement learning strategy that shifts the training objective from NTP to next-sequence prediction (NSP). The model selects informative token positions by prediction entropy, generates multi-token rollouts from those positions that better reflect the contextual relationships in the data, assigns self-supervised sequence-level rewards to guide learning, and is optimized with group relative policy optimization (GRPO). The framework applies throughout the training lifecycle of pre-trained language models, including mid-training, post-training, and test-time training. Experiments on LaCT-760M and DeltaNet-1.3B show that REFINE consistently outperforms NTP-based supervised fine-tuning on needle-in-a-haystack retrieval, long-context question answering, and the diverse benchmarks of LongBench. These findings establish REFINE as an effective framework for enhancing long-context modeling in fast weight architectures.
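The entropy-based position selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`token_entropies`, `select_rollout_positions`) and the top-k selection rule are assumptions for exposition; the abstract specifies only that positions are chosen by prediction entropy.

```python
import numpy as np

def token_entropies(logits):
    """Per-position prediction entropy from a (seq_len, vocab_size) logit matrix."""
    # Softmax with max-subtraction for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    # Shannon entropy of each position's next-token distribution.
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def select_rollout_positions(logits, k):
    """Pick the k positions where the model is most uncertain (hypothetical
    top-k rule); these prefixes would seed the multi-token rollouts."""
    ent = token_entropies(logits)
    return np.argsort(ent)[-k:][::-1]  # highest-entropy positions first
```

Each selected position would then serve as a prefix for sampling a group of multi-token rollouts, which receive self-supervised sequence-level rewards before the GRPO update.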