

Poster in Workshop: First Workshop on Representational Alignment (Re-Align)

How aligned are different alignment metrics?

Jannis Ahlert · Thomas Klein · Felix Wichmann · Robert Geirhos

Keywords: [ brain-score ] [ integrative benchmarking ] [ alignment ]


Abstract: In recent years, various methods and benchmarks have been proposed to empirically evaluate the alignment of artificial neural networks to human neural and behavioral data. But how aligned are different alignment metrics? To answer this question, we analyze visual data from Brain-Score (Schrimpf et al., 2018), including metrics from the model-vs-human toolbox (Geirhos et al., 2021), together with human feature alignment (Linsley et al., 2018; Fel et al., 2022) and human similarity judgements (Muttenthaler et al., 2022). We find that pairwise correlations between neural scores and behavioral scores are quite low and sometimes even negative. For instance, the average correlation, computed over the $95$ models on Brain-Score that were fully evaluated on all $51$ alignment metrics, is only $0.161$. Assuming that all of the employed metrics are sound, this implies that alignment with human perception may best be thought of as a multidimensional concept, with different methods measuring fundamentally different aspects. Our results underline the importance of integrative benchmarking, but also raise questions about how to correctly combine and aggregate individual metrics. Aggregating by taking the arithmetic average, as done in Brain-Score, leads to the overall performance currently being dominated by behavior (81.24% explained variance), while neural predictivity plays a less important role (only 67.31% explained variance). As a first step towards making sure that different alignment metrics all contribute towards aggregated scores, we therefore conclude by comparing three different aggregation options.
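
The two quantities mentioned in the abstract, the average pairwise correlation between metrics and the share of an aggregate score explained by one component, can be illustrated with a minimal sketch. It rests on stated assumptions: the data are synthetic, the function names are hypothetical, "explained variance" is taken here to mean the squared Pearson correlation between the aggregate and a component (the abstract does not define it), and the z-scored average is only one plausible alternative, not necessarily one of the three options the authors compare.

```python
import numpy as np


def mean_pairwise_correlation(scores):
    """Mean off-diagonal Pearson correlation between metric columns.

    scores: array of shape (n_models, n_metrics).
    """
    corr = np.corrcoef(scores, rowvar=False)
    n_metrics = corr.shape[0]
    off_diagonal = corr[~np.eye(n_metrics, dtype=bool)]
    return float(off_diagonal.mean())


def explained_variance(aggregate, component):
    """Squared Pearson correlation between aggregate and component.

    Assumption: this is a stand-in for the abstract's "explained variance".
    """
    r = np.corrcoef(aggregate, component)[0, 1]
    return float(r ** 2)


def zscore(x):
    """Standardize a 1-D array to zero mean and unit variance."""
    return (x - x.mean()) / x.std()


# Synthetic scores (assumption): neural metrics vary little across models,
# behavioral metrics vary a lot.
rng = np.random.default_rng(0)
n_models = 95
neural = rng.normal(0.30, 0.02, size=(n_models, 4))
behavior = rng.normal(0.40, 0.15, size=(n_models, 2))
scores = np.hstack([neural, behavior])

neural_mean = neural.mean(axis=1)
behavior_mean = behavior.mean(axis=1)

# Arithmetic average of the two components.
arithmetic = (neural_mean + behavior_mean) / 2

# Illustrative alternative: standardize each component before averaging.
balanced = (zscore(neural_mean) + zscore(behavior_mean)) / 2

print("mean pairwise correlation:", mean_pairwise_correlation(scores))
print("arithmetic, behavior r^2:", explained_variance(arithmetic, behavior_mean))
print("arithmetic, neural r^2:  ", explained_variance(arithmetic, neural_mean))
print("balanced, behavior r^2:  ", explained_variance(balanced, behavior_mean))
print("balanced, neural r^2:    ", explained_variance(balanced, neural_mean))
```

In this toy setting the arithmetic average tracks whichever component varies more across models, while standardizing first gives both components the same squared correlation with the aggregate. Whether equal contribution is the right goal is a separate question that the sketch does not settle.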
