

Poster in Workshop: ICLR 2025 Workshop on Bidirectional Human-AI Alignment

Sycophancy Claims about Language Models: The Missing Human-in-the-Loop

Jan Batzner · Volker Stocker · Stefan Schmid · Gjergji Kasneci


Abstract:

In this tiny paper, we discuss automated methods for detecting sycophantic response patterns in Large Language Models (LLMs). Focusing on the methodological challenges of measuring and disambiguating sycophancy without human evaluation, we review existing research designs and discuss their operationalizations. Our analysis highlights the difficulties in distinguishing 'sycophantic' responses from related concepts in AI alignment and offers actionable recommendations for future research.
