Workshop
Fri May 07 07:45 AM -- 05:00 PM (PDT)
Workshop on Enormous Language Models: Perspectives and Benchmarks
Colin Raffel · Adam Roberts · Amanda Askell · Daphne Ippolito · Ethan Dyer · Guy Gur-Ari · Jared Kaplan · Jascha Sohl-Dickstein · Katherine Lee · Melanie Subbiah · Sam McCandlish · Tom Brown · William Fedus · Vedant Misra · Ambrose Slone · Daniel Freeman

Workshop Home Page

Language models that have been trained on unlabeled text data are a cornerstone of modern natural language processing (NLP) research, and many recent state-of-the-art results in NLP were achieved by leveraging these self-supervised models. The success of this recipe is largely thanks to scalability: Better results can often be obtained by training larger models on larger amounts of unlabeled text data. This places our field at a crossroads. Will scaling lead to models that outperform humans on all text-based tasks, or are there limits to the scalability of these models? Should we focus on simply scaling these models, or should we design more sophisticated architectures and training schemes? Do our current benchmarks effectively test capabilities that humans can master but large language models lack? How can we address the legal and ethical issues that arise from using unstructured web crawls for training language models? What can we learn from the fields of cognition, linguistics, and philosophy as we attempt to measure the “intelligence” of machines? The goal of this workshop is to find answers to these questions by inviting a diverse group of researchers to critically examine the state of giant language models.

This workshop will have a non-standard submission format: Rather than submitting research papers, participants will be invited to contribute diverse tasks that they believe measure uniquely human or particularly challenging capabilities for large language models. Teams at Google and OpenAI have committed to evaluating this task set on their best-performing model architectures, across models ranging from tens of thousands to hundreds of billions of parameters or more. Researchers will also be invited to contribute and evaluate their own models on these tasks. We will analyze these experiments and report the results at the workshop, with a particular focus on how model performance on different task types scales with model size. By inviting contributions of tasks or models, we provide a means for researchers to participate whether or not they have the (cost-prohibitive) computational resources to train giant language models. The end result will be the Beyond the Imitation Game Benchmark (BIG-bench): A novel, participant-driven test of the limits of giant language models. Find out more about BIG-bench and participate here.
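
For intuition, here is a minimal sketch (in Python) of what a contributed task could look like as a simple JSON definition: a handful of examples, each pairing a text prompt with an expected answer. The field names and file layout are illustrative assumptions rather than the official BIG-bench task schema; see the BIG-bench repository for the authoritative submission format.

    import json

    # Illustrative sketch of a task definition. Field names
    # ("description", "keywords", "metrics", "examples", "input",
    # "target") are assumptions for this sketch, not the official schema.
    task = {
        "name": "three_digit_addition",  # hypothetical task name
        "description": "Add two three-digit numbers.",
        "keywords": ["arithmetic", "zero-shot"],
        "metrics": ["exact_str_match"],
        "examples": [
            {"input": "Q: What is 123 + 456?\nA:", "target": "579"},
            {"input": "Q: What is 702 + 199?\nA:", "target": "901"},
        ],
    }

    # Serialize the task so it can be shared and evaluated, unchanged,
    # against language models of different sizes.
    with open("three_digit_addition.json", "w") as f:
        json.dump(task, f, indent=2)

Keeping tasks in a simple, declarative form like this is what makes it practical to run the same task set across models spanning many orders of magnitude in parameter count.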

Opening remarks
Invited talk by Thomas Margoni (Invited talk)
Invited talk by Jesse Dodge (Invited talk)
Invited talk by Emily M. Bender and Angelina McMillan-Major (Invited talk)
Break to discuss talks and questions for panel #1 (Break)
Invited talk by Thomas Wolf (Invited talk)
Invited talk by Emily Dinan (Invited talk)
Break to discuss talks and questions for panel #1 (Break)
Panel #1: “Bias, safety, copyright, and efficiency” with Thomas Wolf, Thomas Margoni, Emily Dinan, Natalie Schluter, and Jesse Dodge (Panel)
Overview of BIG-bench results (Presentation)
Spotlight presentations by BIG-bench participants (Contributed talks)
Contributed presentations by BIG-bench participants (Contributed talks)
Invited talk by Noam Shazeer (Invited talk)
Invited talk by Mike Lewis (Invited talk)
Invited talk by Nicholas Carlini (Invited talk)
Break to discuss talks and questions for panel #2 (Break)
Invited talk by Alison Gopnik (Invited talk)
Invited talk by Yejin Choi (Invited talk)
Break to discuss talks and questions for panel #2 (Break)
Panel #2: “Extrapolating the abilities of language models” with Alison Gopnik, Yejin Choi, Mike Lewis, and Emily M. Bender (Panel)
Closing remarks