ICLR Poster Param$\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost

Poster

Param $\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost

Sheng Cao · Mingrui Wu · Karthik Prasad · Yuandong Tian · Zechun Liu

Hall 3 + Hall 2B #638

[ Abstract ]

Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract: The post-training phase of large language models (LLMs) is crucial for enhancing model's ability to follow instructions, improve reasoning, and align with human preferences. It also poses significant challenges, including the need for extensive high-quality training data and the risk of overfitting. Moreover, the model development process requires post-training and evaluation after every base model adjustment, which incurs significant computational costs. This paper introduces Param

$\Delta$ , an innovative approach that streamlines the post-training process by transferring knowledge and capability from an existing post-trained model to a newly upgraded base model without additional training. By capturing the latent knowledge embedded in model parameters, our approach effectively replicates the traditional post-training and achieves comparable performance with below 2\% performance gap (98\% of transfer efficiency). As a result, Param

$\Delta$ facilitates an efficient iterative process in base model development and enables painless plug-and-play testing of the updated model's performance, offering a sustainable solution for continuous retraining cycles. Furthermore, Param

$\Delta$ brings a new perspective on how to fully leverage models in the open-weight community, where checkpoints for base models and instruction-tuned models are readily available and frequently updated, by providing a cost-effective framework that has the potential to accelerate the iterative cycle of model development.

Live content is unavailable. Log in and register to view live content

Poster

ParamΔ\Delta for Direct Mixing: Post-Train Large Language Model At Zero Cost

Sheng Cao · Mingrui Wu · Karthik Prasad · Yuandong Tian · Zechun Liu

Hall 3 + Hall 2B #638

Param $\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost