Oral in Workshop: ICLR 2025 Workshop on Bidirectional Human-AI Alignment

Sycophancy Claims about Language Models: The Missing Human-in-the-Loop


Abstract:

In this tiny paper, we discuss automated methods for detecting sycophantic response patterns in Large Language Models (LLMs). Focusing on the methodological challenges of measuring and disambiguating sycophancy without human evaluation, we review existing research designs and examine their operationalizations. Our analysis highlights the difficulties in distinguishing 'sycophantic' responses from related concepts in AI alignment and offers actionable recommendations for future research.
