Poster
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta · Vaishnavi Shrivastava · Ameet Deshpande · Ashwin Kalyan · Peter Clark · Ashish Sabharwal · Tushar Khot
Halle B #232
Recent works have showcased the ability of large-scale language models (LLMs) to embody diverse personas in their responses, exemplified by prompts like ‘You are Yoda. Explain the Theory of Relativity.’ While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs’ capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks. Our study covers 24 reasoning datasets (spanning mathematics, law, medicine, morals, and more), 4 LLMs (2 versions of ChatGPT-3.5, GPT-4-Turbo, and Llama-2-70b-chat), and 19 diverse personas (e.g., ‘an Asian person’) spanning 5 socio-demographic groups: race, gender, religion, disability, and political affiliation. Our experiments reveal that LLMs harbor deep-rooted bias against various socio-demographic groups beneath a veneer of fairness. While they overtly reject stereotypes when explicitly asked (‘Are Black people less skilled at mathematics?’), they manifest stereotypical and often erroneous presumptions when prompted to answer questions while adopting a persona. These presumptions can surface as abstentions in the model’s response, e.g., ‘As a Black person, I am unable to answer this question as it requires math knowledge’, and generally result in a substantial drop in performance on reasoning tasks. Our experiments with ChatGPT-3.5 show that this bias is ubiquitous—80% of our personas demonstrate bias; it is significant—some datasets show performance drops of 70%+; and it can be especially harmful for certain groups—some personas suffer statistically significant drops on 80%+ of the datasets. Overall, all four LLMs exhibit persona-induced bias to varying extents, with GPT-4-Turbo showing the least but still a problematic amount of bias (evident in 42% of the personas). Further analysis shows that these persona-induced errors can be hard to discern, as they do not always manifest as explicit abstentions, and can also be hard to avoid: we find that de-biasing prompts have minimal to no effect. Our findings serve as a cautionary tale that the practice of assigning personas to LLMs—a trend on the rise—can surface their deep-rooted biases and have unforeseeable and detrimental side-effects.
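To make the setup concrete, below is a minimal sketch of persona-assigned prompting via a system message, assuming an OpenAI-style chat API. The model name, persona instruction wording, and sample question are illustrative assumptions, not the paper's exact prompts or evaluation harness.

```python
# Minimal sketch of persona-assigned prompting (illustrative, not the paper's exact setup).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

persona = "an Asian person"        # one of the paper's 19 socio-demographic personas
question = "What is 17% of 240?"   # hypothetical stand-in for a reasoning-benchmark question

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model name; the study also covers GPT-4-Turbo and Llama-2-70b-chat
    messages=[
        # Persona assignment: the system message instructs the model to adopt the persona
        # (wording here is an assumption for illustration).
        {
            "role": "system",
            "content": f"You are {persona}. Answer the following question "
                       f"while staying true to this identity.",
        },
        {"role": "user", "content": question},
    ],
    temperature=0,  # deterministic decoding for evaluation
)

print(response.choices[0].message.content)
```

Comparing the answer (or abstention) produced under such a persona prompt against a no-persona baseline on the same question is the kind of contrast the study's accuracy-drop measurements are built on.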