Poster
in
Workshop: I Can't Believe It's Not Better: Where Large Language Models need to improve

Query Timing Produces Opposite Positional Biases Between LLMs and Humans

Jasin Cekinmez ⋅ Addison J. Wu ⋅ Thomas L. Griffiths

Project Page [ Poster] [ OpenReview]

Abstract

Positional biases such as recency and primacy effects have been documented in large language models (LLMs), yet the underlying mechanism by which these models make their evaluations remains poorly understood. Both $\textit{primacy}$ and $\textit{recency}$ biases have been observed in human judgments in response to evidence, but recent work suggest that $\textit{when}$ the listener updates their beliefs -- during the presentation of evidence or only at the end -- influences the presence of such effects. We investigate whether a similar phenomenon holds for LLMs, finding divergence from human behavior across many models. Furthermore, we find that such positional biases are more exacerbated in newer models compared to their predecessors, raising concerns about the reliability and robustness of LLM-based evaluations in settings where evidence order should be irrelevant.

Chat is not available.