Poster
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao · Sizhe Dang · Haishan Ye · Guang Dai · Yi Qian · Ivor Tsang
Hall 3 + Hall 2B #601
Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT
Abstract:
Fine-tuning large language models (LLMs) is necessary for specific downstream tasks, but classic first-order optimizers entail prohibitive GPU memory costs because of backpropagation. Recent works such as MeZO have turned to zeroth-order optimizers for fine-tuning, which substantially reduce memory by using only two forward passes. However, heterogeneous curvatures across different parameter dimensions in LLMs often cause convergence instability or even failure. In this work, we propose HiZOO, a diagonal Hessian informed Zeroth-Order Optimizer, which is the first work to leverage the diagonal Hessian to enhance ZOO for fine-tuning LLMs. We provide theoretical proofs for HiZOO and visualize optimization trajectories on test functions to illustrate how it improves convergence under heterogeneous curvatures. Extensive experiments on various models (RoBERTa, OPT, Phi-2 and Llama3, with 350M∼66B parameters) indicate that HiZOO significantly reduces training steps and enhances model accuracy, while keeping the memory advantage of ZOO. For example, on the SST2 task HiZOO achieves an 8× speedup and better accuracy than MeZO across different models. We also propose HiZOO-L, which reduces the Hessian memory cost to 10% of MeZO's, while maintaining almost the same performance. Compared with ZO-Adam, HiZOO-L achieves a 4.3% improvement while using only 50% of the GPU memory. Code is available at https://anonymous.4open.science/r/HiZOO-27F8.
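To make the abstract's idea concrete, the sketch below shows a generic curvature-scaled zeroth-order (SPSA-style) parameter update in NumPy. It is a minimal illustration only: the function name `hizoo_step`, the hyperparameters `eps`, `lr`, and `beta`, and the particular finite-difference diagonal-curvature estimator are assumptions made here for exposition, not the paper's exact HiZOO update rule; the sketch also spends a third forward pass on the curvature probe, whereas MeZO's gradient estimate uses two.

```python
# Minimal sketch (assumed, not the paper's exact algorithm): a zeroth-order
# step that estimates the gradient from forward passes only and scales the
# update by a running diagonal-curvature estimate.
import numpy as np

def hizoo_step(theta, loss_fn, diag_h, eps=1e-3, lr=1e-6, beta=0.99, rng=None):
    """One curvature-scaled zeroth-order step.

    theta   : 1-D parameter vector
    loss_fn : callable mapping parameters to a scalar loss (forward pass only)
    diag_h  : running estimate of the diagonal Hessian (same shape as theta)
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(theta.shape)          # shared random probe direction

    # Two forward passes around theta (MeZO-style), plus one at theta
    # so a second-order finite difference can probe the curvature.
    l_plus = loss_fn(theta + eps * u)
    l_minus = loss_fn(theta - eps * u)
    l_zero = loss_fn(theta)

    # SPSA-style zeroth-order gradient estimate along u (central difference).
    g_hat = (l_plus - l_minus) / (2.0 * eps) * u

    # Crude diagonal-curvature probe from the same three losses,
    # smoothed with an exponential moving average (estimator assumed here).
    h_hat = np.abs(l_plus + l_minus - 2.0 * l_zero) / (eps ** 2) * (u * u)
    diag_h = beta * diag_h + (1.0 - beta) * h_hat

    # Curvature-aware update: dimensions with larger estimated curvature
    # take proportionally smaller steps.
    theta = theta - lr * g_hat / np.sqrt(diag_h + 1e-8)
    return theta, diag_h
```

The point of the scaling in the last step is the one the abstract makes: when curvature differs sharply across parameter dimensions, a single shared step size either diverges in the steep directions or crawls in the flat ones, and dividing the zeroth-order estimate by a (smoothed) diagonal-curvature term is one way to even this out while still avoiding backpropagation.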