Poster in Workshop: First Workshop on Representational Alignment (Re-Align)
Measuring Human-CLIP Alignment at Different Abstraction Levels
Pablo Hernández-Cámara · Jorge Vila Tomás · Jesus Malo · Valero Laparra
Keywords: [ CLIP ] [ different complexities ] [ abstraction levels ] [ human-alignment ]
Measuring the human alignment of trained models is gaining traction as practitioners recognize its importance. Using the CLIP model and several of its variants as a case study, we show the value of probing alignment at different abstraction levels: when measuring image distances, the differences between two images can lie at lower or higher levels of abstraction. This lets us draw richer conclusions about the models and reveals interesting phenomena when the models are analyzed layer by layer. Our analysis identifies the size of the patches into which the image is divided as the most important factor for achieving high human alignment at all abstraction levels. Moreover, replacing the usual softmax activation with a sigmoid also increases human alignment at every abstraction level, especially in the last model layers. Surprisingly, training the model on Chinese captions or medical data yields more human-aligned models, but only at low abstraction levels.
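As a rough illustration of the kind of depth-wise analysis described above (this is a sketch, not the authors' code), one can embed image pairs with a CLIP vision tower, compute a distance at every transformer layer, and correlate each layer's distances with human dissimilarity ratings. The model checkpoint, cosine distance, token pooling, and Spearman correlation below are illustrative assumptions.

```python
# Minimal sketch: layer-wise CLIP image distances vs. human similarity judgments.
# Assumes a list of (PIL image, PIL image) pairs and a matching list of human
# dissimilarity ratings; all specific choices here are hypothetical.
import torch
from scipy.stats import spearmanr
from transformers import CLIPVisionModel, CLIPImageProcessor

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

def layerwise_distances(img_a, img_b):
    """Cosine distance between two images at every vision-transformer layer."""
    inputs = processor(images=[img_a, img_b], return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    dists = []
    for h in out.hidden_states:          # one tensor per layer: (2, tokens, dim)
        feats = h.mean(dim=1)            # average-pool tokens -> (2, dim)
        cos = torch.nn.functional.cosine_similarity(feats[0], feats[1], dim=0)
        dists.append(1.0 - cos.item())
    return dists                         # one distance per layer

def alignment_per_layer(image_pairs, human_ratings):
    """Spearman correlation between model distances and human ratings,
    computed separately for each layer (higher = better alignment)."""
    per_pair = [layerwise_distances(a, b) for a, b in image_pairs]
    per_layer = list(zip(*per_pair))     # transpose: layers x pairs
    return [spearmanr(d, human_ratings).correlation for d in per_layer]
```

Early layers in such a curve would correspond to lower abstraction levels and later layers to higher ones, so comparing the per-layer correlations across CLIP variants (different patch sizes, sigmoid vs. softmax, different training data) mirrors the comparisons summarized in the abstract.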