Poster in Workshop: Representational Alignment

Verbalizing LLMs' Assumptions to Reduce Sycophancy

Myra Cheng ⋅ Sunny Yu ⋅ Lujain Ibrahim ⋅ Diyi Yang ⋅ Dan Jurafsky


Abstract

Conversation often requires inferring the speaker’s underlying goal rather than interpreting statements literally. For instance, asking “does my outfit look OK?” may seek reassurance, not objective assessment. LLMs similarly infer users’ intentions, but these internal assumptions are typically implicit and inaccessible. We demonstrate that verbalized assumptions – prompting LLMs for their implicit assumptions – can help understand and control downstream behavior. We present three case studies where verbalized assumptions help understand and address LLM sycophancy, i.e., perceptions of LLMs excessively affirming and validating users. First, we show a systematic mismatch of expectations: for queries that are typically validation-seeking in human conversation, users expect LLMs to respond objectively, while LLMs internally assume validation-seeking intent. Second, we link sycophancy to LLMs overwhelmingly assuming that users are validation-seeking. Finally, we show that these representations can be causally intervened on: by probing and steering assumption-level representations, we reduce sycophantic behavior without degrading task performance. These results show that verbalized assumptions are a useful primitive for controlling LLM behavior to align with user expectations.
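
As a rough illustration of what "probing and steering assumption-level representations" could look like, the sketch below fits a difference-of-means direction on synthetic activations and shifts one activation away from it. This is not the authors' method: the abstract does not specify the probe or the steering procedure, so the difference-of-means choice, the synthetic data, and every name here (`probe_direction`, `steer`, `alpha`, `DIM`) are assumptions made for illustration only.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def probe_direction(validation_acts, objective_acts):
    """Unit difference-of-means direction pointing toward the
    'validation-seeking' class (a hypothetical stand-in for a probe)."""
    mu_v = mean_vector(validation_acts)
    mu_o = mean_vector(objective_acts)
    diff = [v - o for v, o in zip(mu_v, mu_o)]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]


def steer(activation, direction, alpha):
    """Shift an activation by alpha along the direction.
    A negative alpha moves it away from 'validation-seeking'."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Synthetic activations (assumption): the two classes differ only in
# their mean along the first coordinate.
rng = random.Random(0)
DIM = 8


def synth(offset, n=50):
    return [
        [rng.gauss(0, 1) + (offset if i == 0 else 0.0) for i in range(DIM)]
        for _ in range(n)
    ]


validation_acts = synth(+2.0)
objective_acts = synth(-2.0)

direction = probe_direction(validation_acts, objective_acts)
example = validation_acts[0]
steered = steer(example, direction, alpha=-4.0)

# Projection onto the direction before and after steering.
score_before = dot(example, direction)
score_after = dot(steered, direction)
print(f"projection before: {score_before:.2f}, after: {score_after:.2f}")
```

Because the direction has unit norm, steering with `alpha=-4.0` lowers the projection by exactly 4.0. In a real model the activations would come from a chosen layer, and whether such a shift reduces sycophantic outputs is an empirical question that this toy example does not address.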
