Poster

Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels

Hyeonsu Jeong · Hye Won Chung

Hall 3 + Hall 2B #442
Fri 25 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

We investigate the mechanisms of self-distillation in multi-class classification, particularly in the context of linear probing with fixed feature extractors where traditional feature learning explanations do not apply. Our theoretical analysis reveals that multi-round self-distillation effectively performs label averaging among instances with high feature correlations, governed by the eigenvectors of the Gram matrix derived from input features. This process leads to clustered predictions and improved generalization, mitigating the impact of label noise by reducing the model's reliance on potentially corrupted labels. We establish conditions under which multi-round self-distillation achieves 100% population accuracy despite label noise. Furthermore, we introduce a novel, efficient single-round self-distillation method using refined partial labels from the teacher's top two softmax outputs, referred to as the PLL student model. This approach replicates the benefits of multi-round distillation in a single round, achieving comparable or superior performance, especially in high-noise scenarios, while significantly reducing computational cost.
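As a rough illustration of the label-averaging view, the sketch below simulates multi-round self-distillation as repeated ridge-regression refitting of a linear probe on fixed features, together with a hypothetical helper that builds top-two partial labels in the spirit of the PLL student. The closed-form update operator, the regularizer `lam`, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def self_distill(X, Y, lam=1.0, rounds=3):
    """Illustrative sketch (assumed setup, not the paper's exact method):
    each round refits a ridge-regression linear probe on the previous
    round's soft labels. On the training set this applies
    A = G (G + lam I)^{-1} repeatedly, where G = X X^T is the Gram matrix
    of the fixed features, so predictions shrink along small-eigenvalue
    directions of G: the label-averaging effect described in the abstract.
    X: (n, d) fixed features; Y: (n, K) one-hot, possibly noisy labels."""
    n = X.shape[0]
    G = X @ X.T                                 # Gram matrix of fixed features
    A = G @ np.linalg.inv(G + lam * np.eye(n))  # one distillation round
    P = Y.astype(float)
    for _ in range(rounds):
        P = A @ P                               # soft labels after each round
    return P

def top2_partial_labels(teacher_probs):
    """Hypothetical helper: keep only the teacher's top-two softmax outputs,
    renormalized, as the refined partial label for single-round training."""
    idx = np.argsort(teacher_probs, axis=1)[:, -2:]   # top-2 class indices
    partial = np.zeros_like(teacher_probs)
    rows = np.arange(teacher_probs.shape[0])[:, None]
    partial[rows, idx] = teacher_probs[rows, idx]
    return partial / partial.sum(axis=1, keepdims=True)
```

In this assumed setup, diagonalizing G = V diag(d_i) V^T gives P_t = V diag((d_i / (d_i + lam))^t) V^T Y after t rounds, so label mass concentrates on the top eigenvector directions, i.e., it is averaged among instances with highly correlated features, which is how repeated rounds can wash out instance-level label noise.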
