The Geometrical and Topological Signature of Transformers
Asif Khan
Abstract
We propose a topological framework for analyzing the layerwise evolution of transformer representations by modeling attention heads as Markov kernels on a token metric space. This formulation admits a Wasserstein-1 ($W_1$) lifting in which coarse Ollivier-Ricci curvature gives quantitative bounds on the action of the induced operator. Positive curvature implies layerwise Wasserstein contraction, while negative curvature implies expansion. To connect these statements to practice, we introduce a reproducible probe that estimates robust lower quantiles of the curvature, directly tests contraction on random measures in $W_1$, and tracks layerwise topological simplification using persistent homology on diffusion-induced distances. In pretrained GPT-2 and GPT-2-medium models, we observe a depthwise transition toward more contractive support, with shrinking $H_1$ lifetimes and a persistent coarse $H_0$ skeleton.
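As an illustration of the curvature statement, the following minimal sketch computes the coarse Ollivier-Ricci curvature $\kappa(i,j) = 1 - W_1(P_i, P_j)/d(i,j)$ of a row-stochastic kernel and checks the standard contraction bound $W_1(\mu P, \nu P) \le (1 - \kappa_{\min})\, W_1(\mu, \nu)$. It is not the probe described above. The line metric $d(i,j) = |i-j|$, the toy distance-decay kernel, the sharpness `beta`, and all function names are assumptions made only for this example; a real attention head would supply its own kernel and token metric.

```python
import math

def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def w1_on_line(p, q):
    """W1 between distributions on points 0..n-1 with d(i, j) = |i - j|.

    On a line with unit spacing, W1 is the sum of absolute differences
    between the two cumulative distribution functions.
    """
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total

def coarse_curvature(kernel, i, j):
    """Coarse Ollivier-Ricci curvature between states i != j."""
    return 1.0 - w1_on_line(kernel[i], kernel[j]) / abs(i - j)

def push_forward(mu, kernel):
    """Apply the Markov kernel to a measure: (mu P)[k] = sum_i mu[i] P[i][k]."""
    n = len(kernel)
    return [sum(mu[i] * kernel[i][k] for i in range(n)) for k in range(n)]

# Toy "attention" kernel (assumption): scores decay with token distance.
n = 6
beta = 0.8  # hypothetical sharpness parameter
kernel = [softmax([-beta * abs(i - j) for j in range(n)]) for i in range(n)]

# Worst-case curvature over all pairs of distinct tokens.
kappa_min = min(
    coarse_curvature(kernel, i, j)
    for i in range(n) for j in range(n) if i != j
)

# Contraction check on two fixed test measures.
mu = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
nu = [0.0, 0.0, 0.0, 0.2, 0.3, 0.5]
lhs = w1_on_line(push_forward(mu, kernel), push_forward(nu, kernel))
rhs = (1.0 - kappa_min) * w1_on_line(mu, nu)
print(f"kappa_min={kappa_min:.4f}  W1 after={lhs:.4f}  bound={rhs:.4f}")
```

For a general token metric, the closed-form `w1_on_line` would have to be replaced by an optimal-transport solver. The curvature definition and the contraction check stay the same.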