Debiasing Concept-based Explanations with Causal Analysis

Mohammad Taha Bahadori · David Heckerman

Keywords: [ Interpretability ] [ Concept-based Explanation ]


Studying concept-based explanation techniques, we provided evidence for the potential existence of spurious associations between the features and concepts due to unobserved latent variables or noise. We proposed a new causal prior graph that models the impact of the noise and latent confounding from the estimated concepts. We showed that, using the labels as instruments, we can remove the impact of the context from the explanations. Our experiments showed that our debiasing technique not only improves the quality of the explanations but also improves the accuracy of predicting labels through the concepts. As future work, we will investigate other two-stage regression techniques to find the most accurate debiasing method.
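The idea of using labels as instruments in a two-stage regression can be illustrated with a minimal sketch on synthetic data. This is not the authors' implementation: the linear models, the data-generating process, and all names (`d`, `u`, `c`, `x`, `fit_predict`, `abs_corr`) are assumptions made for illustration. Stage 1 regresses the noisy concept annotations on the labels; stage 2 regresses the stage-1 fitted values on the features. The sketch then compares how strongly the naive and the debiased concept predictions correlate with the unobserved context.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (an assumption for illustration only).
labels = rng.integers(0, 3, size=n)           # class labels, used as instruments
Y = np.eye(3)[labels]                         # one-hot encoding of the labels
d = Y @ np.array([0.0, 1.0, 2.0])             # "true" concept, determined by the label
u = rng.normal(size=n)                        # unobserved context (confounder)
c = d + u + 0.3 * rng.normal(size=n)          # observed concept, contaminated by context

# Features mix the true concept and the context along orthogonal directions.
A = np.array([1.0, 1.0, 1.0, 1.0])
B = np.array([1.0, -1.0, 1.0, -1.0])
x = np.outer(d, A) + np.outer(u, B) + 0.5 * rng.normal(size=(n, 4))


def fit_predict(inputs, target):
    """Ordinary least squares with an intercept; returns in-sample fitted values."""
    design = np.column_stack([np.ones(len(inputs)), inputs])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])


# Naive explanation model: predict the observed concept directly from features.
naive_pred = fit_predict(x, c)

# Stage 1: regress the observed concept on the labels (per-class means of c).
stage1_coef, *_ = np.linalg.lstsq(Y, c, rcond=None)
c_hat = Y @ stage1_coef

# Stage 2: predict the stage-1 fitted concept from the features.
debiased_pred = fit_predict(x, c_hat)

naive_leak = abs_corr(naive_pred, u)
debiased_leak = abs_corr(debiased_pred, u)
print(f"|corr(naive prediction, context)|    = {naive_leak:.3f}")
print(f"|corr(debiased prediction, context)| = {debiased_leak:.3f}")
```

In this toy setup the labels are independent of the context by construction, so the stage-1 fitted values carry almost no context signal, and the debiased prediction correlates far less with `u` than the naive one. Real concept data need not satisfy these assumptions.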
