Blog Track Poster

Is the evidence in 'Language Models Learn to Mislead Humans via RLHF' valid?

Aaryan Chandna ⋅ Lukas Fluri ⋅ Micah Carroll

Abstract
