Poster in Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
TASP: Preserving Training Dynamics in Transformers via NTK-Aware Structured Pruning
Mengting Ai · Tianxin Wei · Jingrui He
Structured pruning of large-scale Transformer models promises substantial efficiency gains by removing entire hidden units. However, such pruning often degrades accuracy more than unstructured pruning, necessitating compensation strategies such as supervised fine-tuning (SFT) or adapter modules (e.g., LoRA). In this paper, we introduce TASP (Neural Tangent Kernel-Aware Structured Pruning), a novel method that identifies and prunes low-saliency hidden units in Transformer models. Our approach computes a saliency score for each weight, defined as the product of the weight and the partial derivative of the network output with respect to that weight, and aggregates these scores to measure the contribution of each hidden unit. We prove, via a piecewise-linear bounding argument, that pruning the units with minimal saliency preserves the network's Neural Tangent Kernel (NTK) and, consequently, its training dynamics under Adam-based optimization. Empirical results on standard benchmarks confirm that TASP achieves significant model compression while maintaining training performance, offering a theoretically grounded and efficient pathway for Transformer model compression.
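To make the scoring concrete, below is a minimal PyTorch sketch of saliency-guided structured pruning for a single feed-forward block. This is not the authors' implementation: the `objective` callable (a scalar proxy for the network output), the per-unit aggregation (summing row j of the input projection and column j of the output projection), and the in-place weight slicing are all illustrative assumptions.

```python
import torch
import torch.nn as nn


def hidden_unit_saliency(ffn_in: nn.Linear, ffn_out: nn.Linear,
                         model: nn.Module, batch, objective) -> torch.Tensor:
    """Aggregate per-weight saliency |w * d(objective)/dw| into one score per hidden unit.

    Assumptions: `ffn_in` has shape [d_ff, d_model], `ffn_out` has shape
    [d_model, d_ff], and `objective(model, batch)` returns a scalar used as a
    proxy for the network output when taking gradients.
    """
    model.zero_grad()
    objective(model, batch).backward()  # populates .grad on the FFN weights
    with torch.no_grad():
        # Per-weight saliency: |weight * gradient| (first-order contribution).
        s_in = (ffn_in.weight * ffn_in.weight.grad).abs()    # [d_ff, d_model]
        s_out = (ffn_out.weight * ffn_out.weight.grad).abs() # [d_model, d_ff]
        # Hidden unit j owns row j of the input projection and column j of the output projection.
        return s_in.sum(dim=1) + s_out.sum(dim=0)            # [d_ff]


def prune_lowest(ffn_in: nn.Linear, ffn_out: nn.Linear,
                 scores: torch.Tensor, num_prune: int) -> torch.Tensor:
    """Remove the `num_prune` lowest-saliency hidden units; return kept indices.

    Bookkeeping such as updating in_features/out_features and optimizer state
    is omitted in this sketch.
    """
    keep = torch.argsort(scores, descending=True)[: scores.numel() - num_prune]
    ffn_in.weight.data = ffn_in.weight.data[keep]
    if ffn_in.bias is not None:
        ffn_in.bias.data = ffn_in.bias.data[keep]
    ffn_out.weight.data = ffn_out.weight.data[:, keep]
    return keep
```

A typical usage would score every feed-forward block on a small calibration batch and then prune a fixed fraction of the lowest-scoring hidden units per block before any optional fine-tuning.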