Geometric Stability of Representation Manifolds as a Training-Free Diagnostic for Studying Data Augmentations
Abstract
Data augmentation is the primary mechanism for defining representation invariances in self-supervised learning (SSL). Yet the selection of augmentations remains largely empirical and computationally costly, because validating a choice typically requires repeated full training runs. We introduce a training-free diagnostic that evaluates augmentations by the geometric stability of the learned embedding manifold. Our method uses Procrustes analysis to measure the non-rigid distortions that augmentation operators induce in the feature space of a strong pre-trained encoder, that is, the distortion that remains after the best rigid alignment of clean and augmented embeddings. We observe a statistically significant relationship between geometric preservation and the semantic consistency of the high-dimensional representations. These findings support global geometric stability as a computationally efficient, training-free diagnostic for studying the semantic effects of data augmentations. We also examine the limits of the diagnostic by analyzing cases in which geometric proximity decouples from instance-level discriminability. Our framework provides a principled, mathematically grounded approach to evaluating augmentations for medical and general-purpose foundation models.
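To make the measured quantity concrete, the following is a minimal sketch of a Procrustes disparity between the embeddings of clean and augmented inputs. It assumes that both embedding sets are row-aligned matrices of shape (samples, features) produced by the same frozen encoder, and it uses the standard orthogonal Procrustes formulation with translation and scale removed. The function name `procrustes_disparity` and the synthetic data are hypothetical illustrations; the exact formulation used in the paper may differ.

```python
import numpy as np

def procrustes_disparity(clean, augmented):
    """Residual in [0, 1] after the best translation, scaling, and
    orthogonal alignment of `augmented` onto `clean` (hypothetical helper)."""
    X = np.asarray(clean, dtype=float)
    Y = np.asarray(augmented, dtype=float)
    # Remove translation by centering, and scale by unit Frobenius norm.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # The singular values of X^T Y give the optimum of the orthogonal
    # Procrustes problem; their sum is also the optimal scale factor.
    s = np.linalg.svd(X.T @ Y, compute_uv=False).sum()
    # Squared residual after optimal alignment; clipped against round-off.
    return float(max(0.0, 1.0 - s ** 2))

# Synthetic stand-ins for encoder outputs (assumption: 50 samples, 8 dims).
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 8))

# A rigid change (rotation, scaling, shift) leaves the geometry intact.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
rigid = 2.5 * Z @ Q + 1.0

# Independent per-sample noise distorts the geometry non-rigidly.
nonrigid = Z + rng.normal(size=Z.shape)

d_rigid = procrustes_disparity(Z, rigid)
d_nonrigid = procrustes_disparity(Z, nonrigid)
```

Under these assumptions, a disparity near zero indicates that an augmentation moves the embedding cloud only rigidly, while larger values indicate non-rigid distortion. The score is a global summary, so it does not by itself reveal whether individual instances remain distinguishable.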