Workshop: PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data

Saliency Maps Contain Network "Fingerprints"

Amy Widdicombe · Been Kim · Simon Julier


Explaining deep learning models and their predictions remains an open problem: many solutions have been proposed, but they are difficult to validate. This difficulty in assessing explanation methods has raised questions about their validity: what are they showing, and what factors influence the explanations? Furthermore, how should one choose which method to use? Here, we explore saliency-type methods, finding that saliency maps contain network “fingerprints” by which the network that generated the map can be uniquely identified. We test this by creating datasets made up of saliency maps from different “primary” networks, then training “secondary” networks on these saliency-map datasets. We find that secondary networks can learn to identify which primary network a saliency map comes from. Our findings hold across several saliency methods and for both CNN and ResNet “primary” architectures. Our analysis also reveals complex relationships between methods: some methods share fingerprints, while others contain unique fingerprints. We discuss potentially related prior work that may explain some of these relationships: some methods are made up of “higher order derivatives”. Our simple analytical framework is a first step towards understanding the ingredients of, and relationships between, many saliency methods.
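
The primary/secondary setup described in the abstract can be illustrated with a toy sketch. This is not the authors' code or experimental setup: the primaries here are tiny, untrained, randomly initialised MLPs rather than CNNs or ResNets, the saliency method is a plain absolute input gradient, and the secondary model is a softmax regression. All function names (make_primary, gradient_saliency, build_saliency_dataset, train_secondary, run_experiment) and all sizes and hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def make_primary(seed, d_in=16, d_hidden=32, n_classes=3):
    """Hypothetical 'primary' network: a random one-hidden-layer tanh MLP."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
    W2 = rng.normal(size=(n_classes, d_hidden)) / np.sqrt(d_hidden)
    return W1, W2

def gradient_saliency(primary, X):
    """Absolute gradient of the top logit with respect to each input."""
    W1, W2 = primary
    H = np.tanh(X @ W1.T)                  # hidden activations, (n, d_hidden)
    top = (H @ W2.T).argmax(axis=1)        # predicted class per input
    # Chain rule: d logit_top / d x = (W2[top] * tanh'(pre)) @ W1
    G = (W2[top] * (1.0 - H ** 2)) @ W1    # (n, d_in)
    return np.abs(G)

def build_saliency_dataset(primaries, X):
    """Stack saliency maps from every primary; label = index of the primary."""
    S = np.concatenate([gradient_saliency(p, X) for p in primaries])
    y = np.concatenate([np.full(len(X), i) for i in range(len(primaries))])
    return S, y

def train_secondary(S, y, n_labels, epochs=500, lr=0.1):
    """'Secondary' model: softmax regression trained by gradient descent."""
    mu, sd = S.mean(axis=0), S.std(axis=0) + 1e-8
    Z = (S - mu) / sd                      # standardise features
    W = np.zeros((Z.shape[1], n_labels))
    b = np.zeros(n_labels)
    Y = np.eye(n_labels)[y]                # one-hot targets
    for _ in range(epochs):
        logits = Z @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / len(Z)            # cross-entropy gradient w.r.t. logits
        W -= lr * (Z.T @ grad)
        b -= lr * grad.sum(axis=0)

    def predict(S_new):
        return (((S_new - mu) / sd) @ W + b).argmax(axis=1)

    return predict

def run_experiment(seed=0):
    """Held-out accuracy of the secondary at naming the source primary."""
    rng = np.random.default_rng(seed)
    primaries = [make_primary(s) for s in (1, 2)]
    X_train = rng.normal(size=(400, 16))
    X_test = rng.normal(size=(200, 16))
    S_tr, y_tr = build_saliency_dataset(primaries, X_train)
    S_te, y_te = build_saliency_dataset(primaries, X_test)
    predict = train_secondary(S_tr, y_tr, n_labels=len(primaries))
    return float((predict(S_te) == y_te).mean())

if __name__ == "__main__":
    print("secondary test accuracy:", run_experiment())
```

With two primaries, chance accuracy is 0.5, so a held-out accuracy well above that would indicate that the maps carry information about which network produced them. The exact value depends on the seeds and sizes chosen here and says nothing about the results reported in the paper.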
