Skip to yearly menu bar Skip to main content

Workshop: PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data

Post-hoc Concept Bottleneck Models

Mert Yuksekgonul · Maggie Wang · James Y Zou


Concept Bottleneck Models (CBMs) map the inputs onto a concept bottleneck and use the bottleneck to make a prediction. A concept bottleneck enhances interpretability since it can be investigated to understand what the model sees in an input, and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require concept labels during training to learn the bottleneck. Additionally, it is questionable if CBMs can match the accuracy of an unrestricted neural network trained on a given domain, potentially reducing the incentive to deploy them in practice. In this work, we address these two key limitations by introducing Post-hoc Concept Bottleneck models (P-CBMs). We show that we can turn any neural network into a P-CBM, without sacrificing model performance and retaining interpretability benefits. Finally, we show that P-CBMs can provide significant performance gains with model editing without any fine-tuning and needing data from the target domain.

Chat is not available.