

Poster in Workshop: ICLR 2025 Workshop on Bidirectional Human-AI Alignment

PREFERENCE OPTIMIZATION FOR CONCEPT BOTTLENECK MODELS

Emiliano Penaloza · Tianyue Zhang · Laurent Charlin · Mateo Espinosa Zarlenga


Abstract:

Concept Bottleneck Models (CBMs) propose to enhance the trustworthiness of AI systems by constraining their decisions to a set of human-understandable concepts. However, CBMs typically assume that datasets contain accurate concept labels, an assumption that is often violated in practice and which we show can significantly degrade performance (by 25% in some cases). To address this, we introduce the Concept Preference Optimization (CPO) objective, a new loss function based on Direct Preference Optimization, which effectively mitigates the negative impact of concept mislabeling on CBM performance. We analyze key properties of the CPO objective, showing that it directly optimizes for the concept's posterior distribution, and contrast it with Binary Cross Entropy (BCE), showing that CPO is inherently less sensitive to concept noise. We empirically confirm our analysis, finding that CPO consistently outperforms BCE on three real-world datasets, both with and without added concept label noise.
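To make the BCE-versus-preference contrast concrete, below is a minimal PyTorch sketch of a DPO-style preference objective applied to a CBM's concept predictions, next to the standard BCE concept loss. The function names, the use of a frozen reference model's concept logits, and the exact form of the margin are illustrative assumptions for exposition; they are not the paper's definition of CPO.

```python
import torch
import torch.nn.functional as F


def bce_concept_loss(concept_logits, concept_labels):
    """Standard BCE over concept logits (the usual CBM concept loss)."""
    return F.binary_cross_entropy_with_logits(concept_logits, concept_labels)


def preference_concept_loss(concept_logits, ref_logits, concept_labels, beta=1.0):
    """Hypothetical DPO-style preference loss over binary concepts.

    Treats the annotated concept value as "preferred" and its complement as
    "dispreferred", scoring both against a frozen reference model's logits.
    This is a sketch of the general DPO recipe, not the paper's exact objective.
    """
    # Log-probability of the labeled concept value under the policy and reference
    # (BCE-with-logits returns the negative log-likelihood elementwise).
    logp_pref = -F.binary_cross_entropy_with_logits(
        concept_logits, concept_labels, reduction="none")
    logp_pref_ref = -F.binary_cross_entropy_with_logits(
        ref_logits, concept_labels, reduction="none")
    # Log-probability of the opposite (dispreferred) concept value.
    logp_disp = -F.binary_cross_entropy_with_logits(
        concept_logits, 1.0 - concept_labels, reduction="none")
    logp_disp_ref = -F.binary_cross_entropy_with_logits(
        ref_logits, 1.0 - concept_labels, reduction="none")
    # DPO-style margin between preferred and dispreferred, relative to the reference.
    margin = (logp_pref - logp_pref_ref) - (logp_disp - logp_disp_ref)
    return -F.logsigmoid(beta * margin).mean()


# Usage sketch on random data: 8 examples, 5 binary concepts.
logits = torch.randn(8, 5)                       # concept logits from the CBM's concept head
ref = torch.randn(8, 5)                          # logits from a frozen reference concept model
labels = torch.randint(0, 2, (8, 5)).float()     # (possibly noisy) concept annotations
print(bce_concept_loss(logits, labels).item(),
      preference_concept_loss(logits, ref, labels).item())
```

Under this sketch, the preference loss depends on the log-likelihood gap between the labeled value and its complement rather than on the absolute likelihood of the label, which is one intuition for why a preference-style objective can be less sensitive to individual mislabeled concepts than BCE.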
