Invited Talk in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning

Invited Talk 4: Emergence of unexpected complex skills in LLMs: Some theory and experiments
Abstract:
It has been discovered that as LLMs are scaled up (both in number of parameters and in size of training data), they spontaneously acquire new and complex skills. Our paper (Arora and Goyal '23) gave a mathematical analysis of this phenomenon. Under a plausible framework for the structure of the training dataset, it was shown rigorously that the LLM will be able to combine $k$-tuples of elementary skills when solving new tasks, where $k$ roughly doubles with each order of magnitude of scaling.
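As a hedged illustration (not from the abstract): write $N$ for a generic scale measure and $k_0$ for the tuple size at a reference scale $N_0$; these symbols are introduced here for exposition only. The claim that $k$ roughly doubles with each order of magnitude of scaling can then be read as

$$ k(N) \;\approx\; k_0 \cdot 2^{\log_{10}(N/N_0)} \;=\; k_0 \left(\frac{N}{N_0}\right)^{\log_{10} 2}, $$

i.e., a tenfold increase in scale roughly doubles the size of the skill tuples the model can combine.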
This talk will report on subsequent experiments, based upon the SKILLMIX eval, that verify this prediction, including the prediction that LLMs can combine skills at test time despite never having seen the same combination during training. Another recent experiment of special interest involved training on data that was generated by querying GPT-4 and that exhibited random subsets of up to $k$ skills. The resulting trained model displayed new capabilities at combining skills that were not seen **at all (in any combination)** during training. This is of interest in discussions of alignment and safety, where it has been implicitly assumed that filtering all "objectionable behaviors" out of the training data would keep the model free of such behaviors.
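As a rough illustration of how a SKILLMIX-style evaluation can elicit combinations never seen together in training, here is a minimal sketch in Python. The skill list, topic list, prompt wording, and function names below are assumptions made for illustration; they are not the released SKILLMIX code or its exact protocol.

```python
import random

# Hypothetical skill and topic pools (the real eval defines its own, much larger lists).
SKILLS = ["metaphor", "red herring", "modus ponens", "self-serving bias", "spatial reasoning"]
TOPICS = ["gardening", "dueling", "sewing"]

def skillmix_prompt(skills, topic):
    """Build a SKILLMIX-style prompt: ask for a short text that exhibits
    every skill in `skills` while staying on `topic`."""
    skill_list = ", ".join(skills)
    return (
        f"Write a short (at most 3 sentences) piece about {topic} that "
        f"naturally illustrates all of the following skills: {skill_list}. "
        f"Do not name or explain the skills explicitly."
    )

def sample_eval_set(k, n_items, rng=random):
    """Sample n_items random (skills, topic) pairs with |skills| = k.
    Because the k-subsets are drawn at random from a large skill pool,
    most specific combinations will not have co-occurred in any training corpus."""
    items = []
    for _ in range(n_items):
        skills = rng.sample(SKILLS, k)
        topic = rng.choice(TOPICS)
        items.append((skills, topic, skillmix_prompt(skills, topic)))
    return items

if __name__ == "__main__":
    for skills, topic, prompt in sample_eval_set(k=2, n_items=3):
        print(prompt)
        # In a full pipeline, each prompt would be sent to the model under test
        # and the completion graded (e.g., by a strong LLM) for whether every
        # requested skill is present and the text stays on topic.
```

The design point this sketch is meant to convey is combinatorial: even a modest skill pool yields far more $k$-subsets than any training set plausibly covers, so success on randomly sampled tuples is evidence of skill composition rather than memorization.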
(Based upon "A Theory for Emergence of Complex Skills in Language Models" and "SKILLMIX: A Flexible and Expandable Family of Evaluations for AI models", and a paper in progress.)