Skip to yearly menu bar Skip to main content


Poster

CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning

Fanxu Meng · Muhan Zhang

Hall 3 + Hall 2B #129
[ ]
Thu 24 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

The absorb operation utilized in DeepSeek, which merges Query-Key and Value-Output weight matrices during inference, significantly increases parameter count and computational overhead. We observe that these absorbed matrices inherently exhibit low-rank structures. Motivated by this insight, we introduce CLOVER (Cross-Layer Orthogonal Vectors), a method that factorizes these matrices into four head-wise orthogonal matrices and two sets of singular values without any loss of information. By eliminating redundant vectors, CLOVER reduces the encoder parameters in Whisper-large-v3 by 46.42% without requiring additional training. Moreover, by freezing singular vectors and fine-tuning only singular values, CLOVER enables efficient full-rank fine-tuning. When evaluated on eight commonsense reasoning tasks with LLaMA-2 7B, CLOVER surpasses existing SoTA methods—LoRA, DoRA, HiRA, and PiSSA—by 7.6%, 5.5%, 3.8%, and 0.7%, respectively.

Live content is unavailable. Log in and register to view live content