Understanding the Learning Phases in Self-Supervised Learning via Critical Periods
Abstract
Self-supervised learning (SSL) has emerged as a powerful pretraining strategy for learning transferable representations from unlabeled data. Yet it remains unclear how long SSL models should be pretrained to yield such representations. Contrary to the prevailing heuristic that longer pretraining translates into better downstream performance, we observe a transferability trade-off: across diverse SSL settings, intermediate checkpoints can yield stronger out-of-domain (OOD) generalization, whereas additional pretraining primarily benefits in-domain (ID) performance. From this observation, we hypothesize that SSL progresses through learning phases that can be characterized through the lens of critical periods (CP). Prior work on CP has shown that supervised models exhibit an early phase of high plasticity, followed by a consolidation phase in which adaptability declines while task-specific performance continues to improve. Since traditional CP analysis was developed for supervised settings, we rethink it for SSL in two ways. First, we inject deficits that perturb the pretraining data and assess their lasting impact on representation quality via downstream tasks. Second, we compute the Fisher Information of the pretext objective to track plasticity, quantifying how sensitive model parameters are to the pretext task. Our experiments suggest that SSL models may exhibit their own CP, with CP closure coinciding with a sweet spot for broad downstream transferability. Leveraging these insights, we introduce CP-guided checkpoint selection, a strategy that identifies checkpoints offering stronger OOD transferability. Finally, to balance the transferability trade-off, we present CP-guided self-distillation, which selectively distills layer representations from an intermediate checkpoint into their overspecialized counterparts in the final checkpoint.