Poster
in
Workshop: 5th Workshop on practical ML for limited/low resource settings (PML4LRS) @ ICLR 2024
NEURAL NETWORK COMPRESSION: THE FUNCTIONAL PERSPECTIVE
Israel Mason-Williams
Compression techniques, such as Knowledge distillation, Pruning, and Quantization reduce the computational costs of model inference and enable on-edge machine learning. The efficacy of compression methods is often evaluated through the proxy of accuracy and loss to understand similarity of the compressed model. This study aims to explore the functional divergence between compressed and uncompressed models. The results indicate that Quantization and Pruning create models that are functionally similar to the original model. In contrast, Knowledge distillation creates models that do not functionally approximate their teacher models. The compressed model resembles the dissimilarity of function observed in independently trained models. Therefore, it is verified, via a functional understanding, that Knowledge distillation is not a compression method. Thus, leading to the definition of Knowledge distillation as a training regulariser given that no knowledge is distilled from a teacher to a student.