Invited Talk

Training Language Models in Academia: Challenge or Calling?

Danqi Chen

Hall 1 Apex
Fri 25 Apr 6 p.m. PDT — 7 p.m. PDT

Abstract:

Training large language models has become a defining pursuit in modern machine learning—one that is almost entirely led by industry, fueled by massive computational resources and guided by scaling laws that reward ever-larger models and datasets. For academic researchers, participating in this space can feel out of reach. The barriers—limited compute, infrastructure, and access to proprietary data—are real and growing. Still, I believe academia has an essential role to play. Even with constraints, there are important scientific questions and meaningful opportunities that academic research is uniquely positioned to tackle. By engaging with the training process itself, we can deepen our understanding of language models and develop novel and efficient approaches that complement large-scale efforts. In this talk, I’ll share my lab’s research efforts over the past two years in both pre-training and post-training of language models under an academic budget. Our work has aimed to better understand training dynamics, innovate within limitations, and release artifacts that benefit the broader research community. I’ll also highlight three areas where academic researchers can make significant contributions: (1) developing small but capable models, (2) understanding and improving training data, and (3) advancing post-training methods on top of open-weight models. My hope is to encourage broader engagement with LM training in academia, and to foster new forms of collaboration between academic and industry research.
