Poster in Workshop: Scientific Methods for Understanding Deep Learning (Sci4DL)

Instruction Following by Principled Attention Boosting of Large Language Models

Vitoria Guardieiro ⋅ Avishree Khare ⋅ Adam Stein ⋅ Eric Wong

[Project Page] [OpenReview]

Abstract

The behavior of large language models is often shaped by instructions, such as system prompts, refusal boundaries, privacy constraints, and tool-use rules, that must hold at inference time. One training-free intervention for enforcing them is attention steering, which biases attention toward instruction tokens. In this work, we present a theoretical formalization of instruction following as a rule-based competition between instruction rules and context-derived rules, with attention mediating which rules dominate; this formalization unifies existing attention-steering methods. We prove that boosting attention to instruction tokens tilts this competition, making it harder for context to override the instruction. However, excessive boosting can suppress task-relevant context that should be incorporated alongside the instruction. Guided by this theory, we propose Instruction Attention Boosting, a simple intervention that uniformly applies a constant additive bias to the attention logits of instruction keys.
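To make the intervention concrete, the following is a minimal sketch of adding a constant bias to instruction-key attention logits before the softmax. It is an illustration under stated assumptions, not the authors' implementation: it operates on a single query's row of logits, and the names `boosted_attention_row`, `instruction_mask`, and `bias`, as well as the toy logit values, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def boosted_attention_row(logits, instruction_mask, bias):
    """Add the same constant `bias` to every instruction-key logit, then normalize.

    logits: pre-softmax attention scores of one query over all keys.
    instruction_mask: True where the key position belongs to the instruction.
    bias: additive boost (assumed hyperparameter; 0.0 recovers plain attention).
    """
    boosted = [
        x + bias if is_instr else x
        for x, is_instr in zip(logits, instruction_mask)
    ]
    return softmax(boosted)


def instruction_mass(weights, instruction_mask):
    """Total attention weight that lands on instruction keys."""
    return sum(w for w, is_instr in zip(weights, instruction_mask) if is_instr)


# Toy example: the first two keys are instruction tokens, the rest are context.
logits = [0.5, -0.2, 1.3, 0.1, 2.0]
instruction_mask = [True, True, False, False, False]

plain = boosted_attention_row(logits, instruction_mask, bias=0.0)
mild = boosted_attention_row(logits, instruction_mask, bias=2.0)
heavy = boosted_attention_row(logits, instruction_mask, bias=10.0)

for name, weights in [("plain", plain), ("mild", mild), ("heavy", heavy)]:
    print(name, round(instruction_mass(weights, instruction_mask), 4))
```

In this toy setting, a moderate bias shifts attention toward the instruction while leaving some weight on context keys, whereas a very large bias leaves almost no weight on context, which mirrors the trade-off the abstract describes. In a real model the same bias would presumably be applied at every layer, head, and query position; that reading of "uniformly" is an assumption here.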
