Poster in Workshop: PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data
Sparse Logits Suffice to Fail Knowledge Distillation
Haoyu Ma · Yifan Huang · Hao Tang · Chenyu You · Deying Kong · Xiaohui Xie
Knowledge distillation (KD) aims to transfer the power of pre-trained teacher models to (more lightweight) student models. However, KD also poses a risk of intellectual property (IP) leakage from teacher models: even if a teacher model is released only as a black box, it can still be cloned through KD by imitating its input-output behavior. To counter this unwanted effect of KD, the concept of the Nasty Teacher was proposed recently. A nasty teacher is a special network that achieves nearly the same accuracy as a normal one but significantly degrades the accuracy of student models that try to imitate it. Previous work builds the nasty teacher by retraining a new model and distorting its output distribution away from the normal one via an adversarial loss. With this design, the "nasty" teacher tends to produce sparse and noisy logits. However, it remains unclear why the distorted distribution is catastrophic to the student model, since the nasty logits still preserve the correct labels. In this paper, we provide a theoretical analysis of why the sparsity of logits is key to the Nasty Teacher. Furthermore, we propose an idealized version of the nasty teacher, named Stingy Teacher, to prevent imitation through KD. The Stingy Teacher directly manipulates the logits of a standard pre-trained network, maintaining the values for a small subset of classes while zeroing out the rest. Extensive experiments on several datasets demonstrate that the Stingy Teacher is more catastrophic to student models under both standard KD and data-free KD. Code and pretrained models will be released upon acceptance.
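The following is a minimal sketch (not the authors' released code) of the logit manipulation described in the abstract: for each sample, keep the teacher's logits for a small subset of classes, including the ground-truth class, and zero out the rest. The function name `sparsify_logits` and the `sparse_ratio` parameter are illustrative assumptions; the paper's actual choice of subset size and post-processing may differ.

```python
import torch


def sparsify_logits(teacher_logits: torch.Tensor,
                    labels: torch.Tensor,
                    sparse_ratio: float = 0.1) -> torch.Tensor:
    """Keep the top-k teacher logits per sample (k = sparse_ratio * num_classes),
    always retaining the ground-truth class, and zero out all other entries.
    This is an illustrative sketch of the 'stingy' logit sparsification idea."""
    num_classes = teacher_logits.size(1)
    k = max(1, int(sparse_ratio * num_classes))

    # Indices of the k largest logits for each sample.
    topk_idx = teacher_logits.topk(k, dim=1).indices

    # Build a mask that keeps the top-k classes plus the true label.
    mask = torch.zeros_like(teacher_logits, dtype=torch.bool)
    mask.scatter_(1, topk_idx, True)
    mask.scatter_(1, labels.unsqueeze(1), True)

    # Zero out every logit outside the kept subset.
    return teacher_logits * mask


if __name__ == "__main__":
    logits = torch.randn(4, 100)          # batch of 4 samples, 100 classes
    labels = torch.randint(0, 100, (4,))  # ground-truth labels
    sparse = sparsify_logits(logits, labels, sparse_ratio=0.1)
    print((sparse != 0).sum(dim=1))       # at most k + 1 non-zero entries per row
```

Because the argmax of each sparsified row still matches the teacher's top-1 prediction, the teacher's own accuracy is unaffected, while a student distilled against these sparse soft labels receives a heavily distorted distribution.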