Poster

Reward Models Inherit Value Biases from Pretraining

Brian Christian · Jessica Thompson · Elle Michelle Yang · Vincent Adam · Hannah Kirk · Christopher Summerfield · Tsvetomira Dumbalska

Abstract
