ICLR Poster Annealing Self-Distillation Rectification Improves Adversarial Training

Yu-Yu Wu · Hung-Jui Wang · Shang-Tse Chen

Halle B #126

Abstract:

In standard adversarial training, models are optimized to fit invariant one-hot labels for adversarial examples whenever the perturbation stays within the allowed budget. However, such overconfident targets harm generalization and lead to robust overfitting. To address this issue and enhance adversarial robustness, we analyze the characteristics of robust models and find that they tend to produce smoother, better-calibrated outputs. Based on this observation, we propose a simple yet effective method, Annealing Self-Distillation Rectification (ADR), which generates soft labels that serve as a better guidance mechanism by reflecting the underlying data distribution. With ADR, we obtain rectified labels that improve model robustness without requiring pre-trained models or extensive extra computation. Moreover, our method integrates in a plug-and-play manner with other adversarial training techniques by replacing the hard labels in their objectives. We demonstrate the efficacy of ADR through extensive experiments, with strong performance across datasets.
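The abstract does not spell out how the soft labels are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the soft label comes from a teacher copy of the model itself (for example a moving average of its weights), that the teacher's output is softened with a temperature and blended with the one-hot label, and that the blend is capped so the true class never drops below any other class. The function names, the linear schedule, and the capping rule are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def linear_anneal(start, end, progress):
    """Hypothetical schedule: move linearly from start to end as progress goes 0 -> 1."""
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress


def rectified_soft_label(teacher_logits, true_class, temperature, mix):
    """Blend the teacher distribution with the one-hot label (assumes >= 2 classes).

    mix = 0 returns the plain one-hot label; larger mix trusts the teacher more.
    Assumed rectification rule: shrink mix when the teacher prefers a wrong
    class, so the true class stays at least tied for the largest probability.
    """
    probs = softmax(teacher_logits, temperature)
    p_true = probs[true_class]
    p_other = max(p for k, p in enumerate(probs) if k != true_class)

    gap = p_other - p_true
    if gap > 0:
        # Solve mix * p_true + (1 - mix) >= mix * p_other for mix.
        mix = min(mix, 1.0 / (1.0 + gap))

    return [
        mix * p + (1.0 - mix) * (1.0 if k == true_class else 0.0)
        for k, p in enumerate(probs)
    ]


# Example: the teacher wrongly favors class 0, the true class is 1.
progress = 0.5  # halfway through training (illustrative value)
temperature = linear_anneal(2.0, 1.0, progress)  # soften less over time
mix = linear_anneal(0.0, 0.9, progress)  # trust the teacher more over time
label = rectified_soft_label([2.0, 0.5, 0.1], 1, temperature, mix)
```

Under these assumptions, `label` is a valid probability distribution in which the true class is never ranked below another class, and it can stand in for the one-hot target in an adversarial training loss. Annealing both quantities reflects the idea that an early, poorly trained teacher should contribute little, while a later teacher can be trusted more.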
