

Poster

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Jerry Yao-Chieh Hu · Maojiang Su · En-Jui Kuo · Zhao Song · Han Liu

Hall 3 + Hall 2B #583
Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract: We study the computational limits of Low-Rank Adaptation (LoRA) for fine-tuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation of LoRA adaptation leads to possible algorithmic speedup. This allows us to (i) identify a phase transition behavior of efficiency assuming the Strong Exponential Time Hypothesis (SETH), and (ii) prove the existence of almost linear algorithms by controlling the LoRA update computation term by term. For the former, we identify a sharp transition in the efficiency of all possible rank-$r$ LoRA update algorithms for transformers, based on specific norms resulting from the multiplications of the input sequence $X$, pretrained weights $W^\star$, and adapter matrices $\alpha B A / r$. Specifically, we derive a shared upper bound threshold for such norms and show that efficient (sub-quadratic) approximation algorithms for LoRA exist only below this threshold. For the latter, we prove the existence of almost linear approximation algorithms for LoRA adaptation by utilizing the hierarchical low-rank structures of LoRA gradients and approximating the gradients with a series of chained low-rank approximations. To showcase our theory, we consider two practical scenarios: partial adaptation (e.g., only $W_V$ and $W_Q$) and full adaptation (e.g., $W_Q$, $W_V$, and $W_K$) of the weights in attention heads.
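
To make the quantities in the abstract concrete, the following minimal NumPy sketch (not from the paper; all shapes, names, and initializations are illustrative assumptions) shows the LoRA update form $W^\star + \alpha B A / r$ applied to frozen attention weights, in both the partial ($W_Q$, $W_V$) and full ($W_Q$, $W_K$, $W_V$) adaptation scenarios mentioned above.

    # Minimal illustrative sketch (not the paper's algorithm):
    # LoRA-adapted attention weights W = W_star + (alpha / r) * B @ A.
    import numpy as np

    def lora_weight(W_star, A, B, alpha, r):
        """Frozen pretrained weight plus a rank-r adapter, scaled by alpha / r."""
        return W_star + (alpha / r) * (B @ A)

    def softmax(Z):
        Z = Z - Z.max(axis=-1, keepdims=True)
        E = np.exp(Z)
        return E / E.sum(axis=-1, keepdims=True)

    def attention_head(X, W_Q, W_K, W_V):
        """Single-head self-attention on an (n, d) input sequence X."""
        Q, K, V = X @ W_Q, X @ W_K, X @ W_V
        return softmax(Q @ K.T / np.sqrt(X.shape[1])) @ V

    rng = np.random.default_rng(0)
    n, d, r, alpha = 8, 16, 2, 4.0    # sequence length, width, LoRA rank, scaling
    X = rng.standard_normal((n, d))
    W = {k: rng.standard_normal((d, d)) / np.sqrt(d) for k in ("Q", "K", "V")}

    # Rank-r adapter factors: A (r x d) random, B (d x r) zero, so the
    # initial update B @ A vanishes, as in common LoRA practice.
    adapters = {k: (rng.standard_normal((r, d)), np.zeros((d, r)))
                for k in ("Q", "K", "V")}
    A_Q, B_Q = adapters["Q"]
    A_K, B_K = adapters["K"]
    A_V, B_V = adapters["V"]

    # Partial adaptation: only W_Q and W_V receive LoRA updates; W_K stays frozen.
    out_partial = attention_head(
        X,
        lora_weight(W["Q"], A_Q, B_Q, alpha, r),
        W["K"],
        lora_weight(W["V"], A_V, B_V, alpha, r),
    )

    # Full adaptation: W_Q, W_K, and W_V are all adapted.
    out_full = attention_head(
        X,
        lora_weight(W["Q"], A_Q, B_Q, alpha, r),
        lora_weight(W["K"], A_K, B_K, alpha, r),
        lora_weight(W["V"], A_V, B_V, alpha, r),
    )
    print(out_partial.shape, out_full.shape)  # both (n, d)

The paper's results concern the cost of computing and approximating the gradients of such LoRA-adapted attention layers, not this toy forward pass itself.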
