Workshop: PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data

ConceptDistil: Model-Agnostic Distillation of Concept Explanations

João Pedro Sousa · Ricardo Moreira · Vladimir Balayan · Pedro Saleiro · Pedro Bizarro


Concept-based explainability aims to fill the model interpretability gap for non-technical decision-makers. Previous work has focused on providing concepts for specific models (e.g., neural networks) or data types (e.g., images), either by extracting concepts from an already trained network or by training self-explainable models through multi-task learning. In this work, we propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation. Our method uses a surrogate neural network that approximates the predictions of a black-box classifier while producing concept explanations. We validate our proposed concept-based knowledge distillation explainer in a real-world use case, showing that it achieves alignment with the black-box classifier while attaining high performance on the explainability task, providing high-level domain explanations.
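The two-task setup described in the abstract (predict concepts, and match the black-box score) can be illustrated with a joint objective. The following is a minimal sketch only, not the authors' implementation: the abstract does not specify the loss, so the use of binary cross-entropy for both tasks, the weighting parameter `lam`, and all function and argument names are assumptions made for illustration.

```python
import math


def binary_cross_entropy(target, prob, eps=1e-12):
    """Cross-entropy between a target in [0, 1] and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def joint_loss(concept_probs, concept_labels, surrogate_score,
               black_box_score, lam=0.5):
    """Hypothetical joint objective for a concept-producing surrogate.

    concept_probs:   surrogate's predicted probability for each concept
    concept_labels:  0/1 annotation for each concept (explainability task)
    surrogate_score: surrogate's estimate of the black-box output
    black_box_score: the black-box classifier's actual output in [0, 1]
    lam:             assumed weight trading off the two tasks
    """
    if len(concept_probs) != len(concept_labels):
        raise ValueError("one label is required per concept prediction")
    # Explainability task: mean multi-label loss over the concepts.
    explain = sum(
        binary_cross_entropy(y, p)
        for y, p in zip(concept_labels, concept_probs)
    ) / len(concept_probs)
    # Distillation task: pull the surrogate score toward the black-box score.
    distill = binary_cross_entropy(black_box_score, surrogate_score)
    return lam * explain + (1.0 - lam) * distill


# Same concept predictions; only the agreement with the black box differs.
aligned = joint_loss([0.9, 0.1], [1, 0], surrogate_score=0.8, black_box_score=0.8)
misaligned = joint_loss([0.9, 0.1], [1, 0], surrogate_score=0.2, black_box_score=0.8)
print(aligned < misaligned)
```

Under these assumptions, a surrogate is penalized both for wrong concept predictions and for disagreeing with the black-box classifier, which is one way to read "alignment" alongside performance on the explainability task.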
