

Poster

Beware of Calibration Data for Pruning Large Language Models

Yixin Ji · Yang Xiang · Juntao Li · Qingrong Xia · Ping Li · Xinyu Duan · Zhefeng Wang · Min Zhang

Hall 3 + Hall 2B #316
Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency. Post-training pruning is a promising method that does not require resource-intensive iterative training and needs only a small amount of calibration data to assess the importance of parameters. Recent research has enhanced post-training pruning from different aspects, but few works systematically explore the effects of calibration data, and it is unclear whether better calibration data construction strategies exist. We fill this gap and surprisingly observe that calibration data is also crucial to post-training pruning, especially at high sparsity. Through controlled experiments on important influencing factors of calibration data, including the pruning settings, the amount of data, and its similarity with pre-training data, we observe that a small amount of data is adequate, and calibration data more similar to the pre-training data yields better performance. As pre-training data is usually inaccessible for advanced LLMs, we further provide a self-generating calibration data synthesis strategy to construct feasible calibration data. Experimental results on recent strong open-source LLMs (e.g., DCLM and LLaMA-3) show that the proposed strategy can enhance the performance of strong pruning methods (e.g., Wanda, DSnoT, OWL) by a large margin (up to 2.68%).
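To make the setting concrete, below is a minimal sketch (not the authors' released code) of how a small calibration set drives post-training pruning. It self-generates calibration text by sampling from the model itself, as a stand-in for the paper's synthesis strategy whose details are not given in the abstract, and then applies the Wanda importance score |W_ij| * ||X_j||_2 with per-output-row comparison. The checkpoint name facebook/opt-125m is a placeholder for any Hugging Face causal LM (e.g., a LLaMA-3 model as used in the paper).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint -- substitute any causal LM (e.g., a LLaMA-3 model).
model_name = "facebook/opt-125m"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# 1) Self-generate a small calibration set by sampling from the model itself
#    (a stand-in for the paper's synthesis strategy; the real procedure may differ).
seed = tok("The", return_tensors="pt").input_ids
calib_texts = []
for _ in range(8):  # the paper finds a small amount of calibration data is adequate
    out = model.generate(seed, max_new_tokens=128, do_sample=True, top_p=0.95)
    calib_texts.append(tok.decode(out[0], skip_special_tokens=True))

# 2) Run the calibration data through the model and accumulate per-input-channel
#    squared activation norms for every linear layer via forward hooks.
sq_norms = {}

def make_hook(name):
    def hook(module, inputs, output):
        x = inputs[0].detach().float().reshape(-1, inputs[0].shape[-1])
        sq_norms[name] = sq_norms.get(name, 0.0) + (x ** 2).sum(dim=0)
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules()
           if isinstance(m, torch.nn.Linear) and "lm_head" not in n]
with torch.no_grad():
    for text in calib_texts:
        ids = tok(text, return_tensors="pt", truncation=True, max_length=128).input_ids
        model(ids)
for h in handles:
    h.remove()

# 3) Prune each linear layer to 50% unstructured sparsity using the Wanda-style
#    score |W_ij| * ||X_j||_2, comparing weights within each output row.
sparsity = 0.5
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear) and name in sq_norms:
        score = module.weight.detach().abs().float() * sq_norms[name].sqrt()
        k = max(1, int(score.shape[1] * sparsity))
        thresh = score.kthvalue(k, dim=1, keepdim=True).values
        module.weight.data[score <= thresh] = 0.0
```

The number of self-generated samples, their length, and the seed text are exactly the kinds of calibration-data choices the paper studies; the values above are illustrative only.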
