Poster in Workshop: First Workshop on Representational Alignment (Re-Align)
Self-supervised learning facilitates neural representation structures that can be unsupervisedly aligned to human behaviors
Soh Takahashi · Masaru Sasaki · Ken Takeda · Masafumi Oizumi
Keywords: [ Unsupervised alignment ] [ Similarity structure ] [ self-supervised learning ] [ optimal transport ]
The structure of perceived similarity between objects is crucial for understanding human object recognition. How such a similarity structure is acquired during development is a pivotal question in cognitive science and neuroscience. While previous studies have focused on supervised learning guided by external teacher signals of object categories, the absence of such signals in early development prompts an exploration of the role of self-supervised learning. Self-supervised learning is thought to be the dominant mechanism for pre-linguistic learning, with supervised learning taking place after language is acquired. Here, we compare the similarity structure of human object representations with the internal representations of a deep neural network model trained through self-supervised contrastive learning followed by supervised learning. To compare the two similarity structures at the fine, item-by-item level, we employed an unsupervised alignment approach using Gromov-Wasserstein Optimal Transport rather than the conventional supervised alignment approach known as Representational Similarity Analysis. We found that the model trained via self-supervised contrastive learning followed by supervised learning was more aligned with human behavior than models trained solely by supervised learning or solely by self-supervised contrastive learning. We also found that, at the level of coarse categories, the internal representation structure acquired through self-supervised learning alone was somewhat alignable to human behavior. These results suggest that self-supervised learning, alone and in combination with supervised learning, is effective in acquiring a similarity structure that can be aligned to human behavior in an unsupervised manner, offering potential mechanisms for the development of human object representations.
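To make the idea of unsupervised alignment concrete, the following is a minimal sketch rather than the authors' implementation. Gromov-Wasserstein Optimal Transport optimizes a soft transport plan between two dissimilarity matrices, usually with a dedicated solver. As a simplifying assumption, this sketch restricts the search to hard one-to-one matchings and finds the best one by brute force, which is only feasible for a handful of items. The function names, the toy dissimilarity values, and the shuffle are all hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def gw_distortion(D_x, D_y, perm):
    """Squared mismatch between D_x[i][j] and D_y[perm[i]][perm[j]].

    This is the Gromov-Wasserstein objective (squared loss) evaluated on a
    hard one-to-one matching instead of a soft transport plan.
    """
    n = len(D_x)
    return sum(
        (D_x[i][j] - D_y[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(n)
    )


def align_unsupervised(D_x, D_y):
    """Brute-force search for the matching with the lowest distortion.

    Only the two dissimilarity matrices are used; no item labels are given.
    """
    n = len(D_x)
    best = min(permutations(range(n)), key=lambda p: gw_distortion(D_x, D_y, p))
    return best, gw_distortion(D_x, D_y, best)


# Toy "human" dissimilarity matrix over 4 items (made-up values).
D_human = [
    [0.0, 1.0, 4.0, 6.0],
    [1.0, 0.0, 2.0, 5.0],
    [4.0, 2.0, 0.0, 3.0],
    [6.0, 5.0, 3.0, 0.0],
]

# Toy "model" matrix: the same structure with the item order shuffled.
# Model item k corresponds to human item shuffle[k]; the aligner never sees this.
shuffle = [2, 0, 3, 1]
D_model = [[D_human[shuffle[a]][shuffle[b]] for b in range(4)] for a in range(4)]

mapping, cost = align_unsupervised(D_human, D_model)

# Matching rate: fraction of human items mapped to their true model counterpart.
matching_rate = sum(shuffle[mapping[i]] == i for i in range(4)) / 4

print(mapping, cost, matching_rate)
```

In this toy case the two structures are identical up to a reordering, so the search recovers the hidden correspondence from the dissimilarities alone. Representational Similarity Analysis, by contrast, assumes the item correspondence is known in advance and only measures how well the two matrices correlate under that fixed correspondence.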