

Oral in Affinity Workshop: Tiny Papers Showcase Day (a DEI initiative)

Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models

Aaquib Syed · Phillip Guo


Abstract:

Massive language models with billions of parameters incur significant compute costs and thus benefit from pruning. However, pruning techniques for such massive models are typically iterative and require extensive weight retraining after pruning. SparseGPT, a recently introduced one-shot technique for pruning such models, enables pruning without retraining. We improve upon SparseGPT by fine-tuning during pruning with a minimal number of training steps. We compare against magnitude pruning and find that our iteratively fine-tuned SparseGPT models significantly outperform their magnitude-pruned counterparts at high sparsity.
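The general prune-and-tune loop can be illustrated with a minimal sketch, assuming a Hugging Face-style PyTorch causal language model whose forward pass returns a loss when labels are supplied. Magnitude pruning via torch.nn.utils.prune stands in for SparseGPT's one-shot, layer-wise pruning here, and the function name prune_and_tune, the train_batches iterable, and all hyperparameter values are illustrative choices, not details from the paper.

import torch
import torch.nn.utils.prune as prune

def prune_and_tune(model, train_batches, prune_fraction=0.3, rounds=3,
                   tune_steps=100, lr=1e-5):
    """Alternate global magnitude pruning with brief fine-tuning (illustrative sketch)."""
    layers = [(m, "weight") for m in model.modules()
              if isinstance(m, torch.nn.Linear)]
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(rounds):
        # Each call prunes `prune_fraction` of the remaining (unpruned) weights
        # globally by magnitude, so overall sparsity compounds across rounds.
        prune.global_unstructured(layers,
                                  pruning_method=prune.L1Unstructured,
                                  amount=prune_fraction)
        # A small number of recovery steps rather than full retraining.
        model.train()
        for _, (input_ids, labels) in zip(range(tune_steps), train_batches):
            loss = model(input_ids=input_ids, labels=labels).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    # Fold the binary masks back into the weight tensors.
    for module, name in layers:
        prune.remove(module, name)
    return model

In the paper's setting, the magnitude criterion inside this loop would be replaced by SparseGPT's pruning step, with the brief fine-tuning phases interleaved between pruning rounds.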
