

Poster

Metric-Driven Attributions for Vision Transformers

Chase Walker · Sumit Jha · Rickard Ewetz

Hall 3 + Hall 2B #535
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract: Attribution algorithms explain computer vision models by attributing the model response to pixels within the input. Existing attribution methods generate explanations by combining transformations of internal model representations such as class activation maps, gradients, attention, or relevance scores. The effectiveness of an attribution map is measured using attribution quality metrics. This leads us to pose the following question: if attribution methods are assessed using attribution quality metrics, why are the metrics not used to generate the attributions? In response to this question, we propose Metric-Driven Attribution (MDA), a method for explaining Vision Transformers (ViTs). Guided by attribution quality metrics, the method creates attribution maps by performing patch order and patch magnitude optimization across all patch tokens. The first step orders the patches by importance, and the second step assigns a magnitude to each patch while preserving that order. Moreover, MDA can provide a smooth trade-off between sparse and dense attributions by modifying the optimization objective. Experimental evaluation demonstrates that the proposed MDA method outperforms 7 existing ViT attribution methods by an average of 12% across 12 attribution metrics on the ImageNet dataset for the ViT-base 16×16, ViT-tiny 16×16, and ViT-base 32×32 models. Code is publicly available at https://github.com/chasewalker26/MDA-Metric-Driven-Attributions-for-ViT.
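
To make the two-step idea concrete, below is a minimal, illustrative sketch (not the authors' implementation, which is in the linked repository). It assumes a 224×224 input with a 16×16 patch grid, uses single-patch deletion scores as a stand-in for the metric-driven patch ordering, and uses insertion-sweep score gains, forced to be non-increasing, as a stand-in for the order-preserving magnitude assignment. All helper names (`score_fn`, `patch_mask`, `mda_sketch`) are invented for this sketch.

```python
# Sketch of a metric-driven attribution loop for a ViT.
# Assumptions: 224x224 input, 16x16 patches (14x14 grid), zero baseline.
import torch
import torchvision.models as tvm

PATCH, GRID = 16, 14  # 224 / 16 = 14 patches per side

def score_fn(model, image, target):
    """Softmax probability of the target class for a single (3, 224, 224) image."""
    with torch.no_grad():
        return torch.softmax(model(image.unsqueeze(0)), dim=-1)[0, target].item()

def patch_mask(indices):
    """Binary (1, 224, 224) mask that keeps only the listed patch indices."""
    mask = torch.zeros(1, GRID * PATCH, GRID * PATCH)
    for idx in indices:
        r, c = divmod(idx, GRID)
        mask[:, r * PATCH:(r + 1) * PATCH, c * PATCH:(c + 1) * PATCH] = 1.0
    return mask

def mda_sketch(model, image, target, baseline=None):
    baseline = torch.zeros_like(image) if baseline is None else baseline
    n = GRID * GRID

    # Step 1 (ordering proxy): rank patches by how much the target score drops
    # when each patch alone is replaced by the baseline (a deletion-style score).
    full = score_fn(model, image, target)
    drops = []
    for idx in range(n):
        m = patch_mask([idx])
        drops.append(full - score_fn(model, image * (1 - m) + baseline * m, target))
    order = sorted(range(n), key=lambda i: drops[i], reverse=True)

    # Step 2 (magnitude proxy): insert patches onto the baseline in the chosen
    # order and use the marginal score gains as magnitudes, clamped to be
    # non-increasing so the patch order from Step 1 is preserved.
    attribution = torch.zeros(n)
    kept, prev_score, floor = [], score_fn(model, baseline, target), float("inf")
    for idx in order:
        kept.append(idx)
        m = patch_mask(kept)
        score = score_fn(model, baseline * (1 - m) + image * m, target)
        floor = min(floor, max(score - prev_score, 0.0))
        attribution[idx] = floor
        prev_score = score
    return attribution.reshape(GRID, GRID)

if __name__ == "__main__":
    model = tvm.vit_b_16(weights=None).eval()  # random weights: structural demo only
    image = torch.rand(3, 224, 224)
    print(mda_sketch(model, image, target=0).shape)  # torch.Size([14, 14])
```

Note that this sketch costs roughly two forward passes per patch; the actual MDA method optimizes against the attribution quality metrics directly, as described in the abstract.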
