Dogs aren’t Cars: The Dangers of Poor Objectives for Representational Alignment
Abstract
Centered Kernel Alignment (CKA) is widely used to ask whether two neural networks learn the same representations, but it conflates two quantities: agreement on categorical structure and agreement on individual stimuli. Whenever the stimulus set spans many categories, the categorical part dominates, and CKA ends up measuring shared task competence more than shared representation. We show this from both sides in the vision setting. First, the models whose representations align most tightly under CKA are not the architecture family that recent work has highlighted as the leading candidate for representational universality. They are the lowest-capacity models in the pool. We read this as capacity-forced convergence: without room to develop idiosyncratic representations, models fall back on the task-level signal that any competent classifier produces. That cuts against architecture-first accounts of representational similarity. Second, restricting stimuli to a single fine-grained superclass (dog breeds) removes the categorical signal entirely and exposes the within-category disagreement where architectures diverge. This is the same disagreement seen in the texture-vs-shape literature and in recent work on what ViTs and CLIP attend to relative to CNNs. Our results were submitted as winning entries to both tracks of the ICLR 2026 Re-Align Challenge.
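The claim that categorical structure can dominate CKA can be illustrated with a toy sketch. It uses the standard linear form of CKA and entirely synthetic features, not the models, data, or pipeline from this work. The setup is an assumption chosen to make the effect visible: two hypothetical "models" share the same class means (each in its own rotated basis) but have fully independent within-class variation. The names `toy_model`, `class_means`, `cka_all`, and `cka_single` are illustrative only.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two feature matrices with one row per stimulus."""
    # Center each feature dimension across stimuli.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)


rng = np.random.default_rng(0)
n_classes, per_class, dim = 20, 200, 32
labels = np.repeat(np.arange(n_classes), per_class)

# Assumption: both toy models encode the same class means (the task signal).
class_means = 3.0 * rng.normal(size=(n_classes, dim))


def toy_model():
    """Shared class means in a random orthogonal basis plus private noise."""
    rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    noise = rng.normal(size=(labels.size, dim))  # independent per model
    return class_means[labels] @ rotation + noise


feats_a, feats_b = toy_model(), toy_model()

# Stimuli spanning all categories: the shared class structure dominates.
cka_all = linear_cka(feats_a, feats_b)

# Stimuli from a single category: only the private within-class part remains.
one_class = labels == 0
cka_single = linear_cka(feats_a[one_class], feats_b[one_class])

print(f"CKA over all classes:  {cka_all:.3f}")
print(f"CKA within one class:  {cka_single:.3f}")
```

In this toy setting the two feature sets agree only on class identity, yet CKA over the full stimulus set comes out much higher than CKA within a single class. The within-class value is also not exactly zero, because linear CKA is biased upward when the number of stimuli is small relative to the feature dimension.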