Instruction Following by Principled Attention Boosting of Large Language Models
Abstract
Large language models' behavior is often shaped by instructions such as system prompts, refusal boundaries, privacy constraints, and tool-use rules that must hold at inference time. One training-free intervention for enforcing such instructions is attention steering, which biases attention toward instruction tokens. In this work, we present a theoretical formalization of instruction following as a competition between instruction rules and context-derived rules, in which attention mediates which rules dominate; this formalization unifies existing attention-steering methods. We prove that boosting attention to instruction tokens tilts this competition toward the instruction, making it harder for context to override instruction following. However, excessive boosting can suppress task-relevant context that should be incorporated alongside the instruction. Guided by this theory, we propose Instruction Attention Boosting (\ourmethod), a simple intervention that uniformly applies a constant additive bias to the attention logits of instruction keys.
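As an illustration of the kind of intervention described above, the following minimal sketch adds a constant bias to the pre-softmax attention logits at instruction-key positions and then renormalizes. It is a toy, single-head NumPy example under stated assumptions, not the paper's implementation: the function names `softmax` and `boost_instruction_attention`, the boolean `instruction_mask`, and the bias value are all hypothetical, and causal masking and the choice of layers and heads are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def boost_instruction_attention(logits, instruction_mask, bias):
    """Add a constant bias to instruction-key logits, then apply softmax.

    logits: array of shape (..., n_queries, n_keys), pre-softmax scores.
    instruction_mask: boolean array of shape (n_keys,), True at instruction keys.
    bias: scalar added to every instruction-key logit (assumed hyperparameter).
    """
    boosted = logits + bias * instruction_mask.astype(logits.dtype)
    return softmax(boosted, axis=-1)

# Toy example: one query, four keys, the first key is an instruction token.
logits = np.zeros((1, 4))
instruction_mask = np.array([True, False, False, False])

base = softmax(logits)  # uniform attention, 0.25 per key
boosted = boost_instruction_attention(logits, instruction_mask, bias=np.log(3.0))
# Instruction key now receives 3 / (3 + 3) = 0.5; each context key receives 1/6.
```

In this toy case a bias of log 3 multiplies the unnormalized weight of the instruction key by 3, so its share rises from 0.25 to 0.5 while the context keys shrink proportionally. Increasing the bias further drives the context weights toward zero, which mirrors the trade-off the abstract notes about excessive boosting.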