Poster in Workshop: Representational Alignment

Depth-Wise Activation Steering for Honest Language Models

Marysia Winkels ⋅ Gracjan Góral ⋅ Steven Basart

Abstract
