

Interpretable Steering of Large Language Models with Feature Guided Activation Additions

Samuel Soo ⋅ Wesley Teng ⋅ Balaganesh Chandrasekaran ⋅ Guoxian TAN ⋅ Ming YAN

Abstract
