Poster in Workshop: Scientific Methods for Understanding Deep Learning (Sci4DL)

Multi-Task Pretraining Drives Representational Convergence

Core Francisco Park


Abstract

What determines the geometry of a neural network’s internal representations, and when do different training objectives lead to the same representational solution? We study these questions using a controlled framework in which small transformers are trained on geometric tasks defined over real-world city coordinates. We find that single-task training produces diverse, task-specific representational geometries, ranging from thread-like structures to 2D manifolds to fragmented clusters. In contrast, multi-task training drives rapid representational convergence: models trained on different task combinations develop increasingly similar internal representations, as measured by centered kernel alignment (CKA). A 7-task model spontaneously recovers world-map-like structure in a raw PCA projection. Linear world representations exist in all models, but multi-task training amplifies their magnitude until they dominate the principal components. These findings provide controlled evidence for the Multitask Scaling Hypothesis, one proposed mechanism underlying the Platonic Representation Hypothesis.
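As a rough illustration of the similarity measure named in the abstract, the sketch below computes linear CKA between two activation matrices. The abstract does not say which CKA variant or which layers were used, so the linear form, the function name `linear_cka`, and the synthetic activations are all assumptions made for illustration, not the author's pipeline.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, n_features) activation matrices.

    Assumption: the linear variant is used here for illustration; the
    abstract only says "CKA" without specifying the kernel.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for hidden states of two models on the same inputs.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 16))

# An orthogonal change of basis of the same representation.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
acts_b = acts_a @ rotation

# An unrelated random representation.
acts_c = rng.normal(size=(200, 16))

same = linear_cka(acts_a, acts_b)
unrelated = linear_cka(acts_a, acts_c)
print(f"rotated copy: {same:.3f}, unrelated: {unrelated:.3f}")
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so the rotated copy scores essentially 1 while the unrelated matrix scores much lower. That invariance is why a CKA-style measure can compare models whose hidden units are not aligned with one another.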
