Blog Post Poster

Is the evidence in 'Language Models Learn to Mislead Humans via RLHF' valid?

Aaryan Chandna · Lukas Fluri · Micah Carroll

Abstract
