Simulating Concept Bottlenecks with Vision-Language Models
Abstract
Concept Bottleneck Models (CBMs) enhance transparency by predicting human-interpretable concepts before producing the final decision, allowing experts to inspect and correct the intermediate reasoning. We demonstrate that large vision–language models (VLMs) can naturally support this paradigm, acting as a concept bottleneck by leveraging their parametric knowledge and generative capabilities. We introduce LangCBM, which uses VLMs to generate textual descriptions of visual concepts, followed by a lightweight extraction and classification pipeline. Training via supervised fine-tuning (SFT), optionally followed by reinforcement learning (RL), yields accurate concept predictions. Compared to alternative CBM formulations, LangCBM achieves competitive concept and label accuracy and high post-intervention accuracy across synthetic and real-world benchmarks, establishing VLM-generated text as a viable, interpretable bottleneck representation.