Invited Talk
Workshop: Deep Learning for Code (DL4C)

BigCode: open and responsible development of LLMs for code

Harm de Vries · Leandro von Werra


In this presentation, we will share several accomplishments of the BigCode project, a community effort working on the responsible development of LLMs for code generation through open science and open governance. These include:

  • A new 15B-parameter LLM for code
  • The Stack, 6.4 TB of permissively licensed source code with an opt-out mechanism
  • Novel insights into LLM scaling laws, suggesting that we have not yet reached the limit of training smaller LLMs for longer

