

Poster in Workshop: XAI4Science: From Understanding Model Behavior to Discovering New Scientific Knowledge

ULTra: Unveiling Latent Token Interpretability in Transformer-Based Understanding

Hesam Hosseini · Ghazal Hosseini Mighan · Amirabbas Afzali · Sajjad Amini · Amir Houmansadr


Abstract:

Transformers have revolutionized Computer Vision (CV) through self-attention mechanisms. However, because of their complexity, their latent token representations are often difficult to interpret. We propose a framework for interpreting Transformer embeddings, revealing the semantic patterns they encode. Building on this framework, we demonstrate that zero-shot unsupervised semantic segmentation can be performed effectively, without any fine-tuning, using a model pre-trained for tasks other than segmentation. Our method showcases Transformers' innate semantic understanding, surpassing traditional models. It attains 67.2% accuracy and 32.9% mIoU on COCO-Stuff and 51.9% mIoU on PASCAL VOC.
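To make the core idea concrete, the sketch below illustrates one generic way latent tokens of a pre-trained Vision Transformer can yield a coarse segmentation map: extract per-patch token embeddings and cluster them into pseudo-semantic regions. This is a minimal illustration of the general principle, not the authors' ULTra framework; the choice of backbone (timm's `vit_base_patch16_224`), the clustering method (k-means), and the number of clusters are all assumptions made here for demonstration.

```python
# Minimal sketch (not the paper's method): cluster the patch-token
# embeddings of a pre-trained ViT into a coarse segmentation map.
import torch
import timm
from sklearn.cluster import KMeans

# Assumption: recent timm ViTs return all tokens (B, 1 + N, C) from
# forward_features, with the class token first.
model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()

image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    tokens = model.forward_features(image)   # (1, 197, 768)
patch_tokens = tokens[0, 1:]                 # drop the class token -> (196, 768)

# Group the latent tokens into k pseudo-semantic regions.
k = 5  # hypothetical cluster count; the paper does not prescribe this
labels = KMeans(n_clusters=k, n_init=10).fit_predict(patch_tokens.numpy())
seg_map = labels.reshape(14, 14)             # 224 / 16 = 14 patches per side
print(seg_map)
```

Each cell of `seg_map` assigns one 16x16 image patch to a cluster, so upsampling it to the input resolution gives a rough, label-free segmentation; the abstract's claim is that such latent-token structure already carries the semantics needed for this to work zero-shot.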
