Poster in Workshop: Representational Alignment

Probing and Steering Chain-of-Thought Unfaithfulness in Language Models

Giovanni Occhipinti ⋅ Alessandro Abate ⋅ Nandi Schoots

Abstract