Poster in Workshop: Pitfalls of limited data and computation for Trustworthy ML

DORA: Exploring outlier representations in Deep Neural Networks

Kirill Bykov · Mayukh Deb · Dennis Grinwald · Klaus-Robert Müller · Marina Höhne


Abstract:

Deep Neural Networks (DNNs) draw their power from the representations they learn. While they are highly effective at learning complex abstractions, they are also susceptible to learning malicious artifacts due to spurious correlations in the training data. In this paper, we introduce DORA (Data-agnOstic Representation Analysis): the first data-agnostic framework for analyzing the representation space of DNNs. We propose a novel distance measure between representations that exploits the self-explaining capabilities of the network itself, and we quantitatively validate its alignment with human-defined semantic distance. We further show that this metric can be used to detect anomalous representations, which carry the risk of encoding unintended spurious concepts that deviate from the desired decision-making policy. Finally, we demonstrate the practical utility of DORA by identifying artifactual representations in widely used Computer Vision networks.
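The sketch below illustrates the outlier-detection idea from the abstract in Python; it is a hypothetical illustration, not the authors' released implementation. It assumes a matrix `activations[i, j]` holding the response of neuron j to a probe signal generated for neuron i (the self-explaining/activation-maximization step is stubbed out with synthetic data), and the helpers `representation_distances` and `find_outlier_neurons`, as well as the choice of IsolationForest as the anomaly detector, are illustrative assumptions.

```python
# Hypothetical sketch of DORA-style outlier detection on neuron representations.
# Assumption: `activations[i, j]` is the response of neuron j to a probe signal
# generated for neuron i (e.g., via activation maximization); that generation
# step is replaced by synthetic data here.

import numpy as np
from sklearn.ensemble import IsolationForest


def representation_distances(activations: np.ndarray) -> np.ndarray:
    """Pairwise distances between neurons from their activation profiles.

    Neurons that respond similarly across all probe signals are considered
    close; the distance is a simple monotone transform of Pearson correlation.
    """
    corr = np.corrcoef(activations.T)          # (n_neurons, n_neurons)
    return np.sqrt(np.clip(1.0 - corr, 0.0, 2.0))


def find_outlier_neurons(activations: np.ndarray, contamination: float = 0.05):
    """Flag neurons whose distance profile to all other neurons is atypical."""
    dist = representation_distances(activations)
    detector = IsolationForest(contamination=contamination, random_state=0)
    labels = detector.fit_predict(dist)        # one row of distances per neuron
    return np.where(labels == -1)[0]           # -1 marks anomalies


# Toy usage: 64 neurons probed with 64 per-neuron signals. Most neurons share
# a common signal component; neuron 3 is planted as an independent outlier.
rng = np.random.default_rng(0)
shared = rng.normal(size=(64, 1))
acts = shared + 0.3 * rng.normal(size=(64, 64))
acts[:, 3] = rng.normal(size=64)
print(find_outlier_neurons(acts))              # expected to include index 3
```

Because the distance is computed from the network's responses to its own probe signals rather than to a held-out dataset, the procedure stays data-agnostic, which is the framing the abstract emphasizes.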
