Invited Talk in Workshop: Navigating and Addressing Data Problems for Foundation Models (DPFM)
Invited Talk #3 - Making “GPT-Next” Trustworthy Through Data [Speaker: Eric Wallace (OpenAI)]
Eric Wallace
Title: Making “GPT-Next” Trustworthy Through Data
Abstract: I’ll talk about three recent directions from OpenAI to make our next generation of models more responsible, trustworthy, and secure. First, I will briefly outline the “Media Manager”—a tool to enable content owners to specify how they want their works to be included in or excluded from AI training. Next, I will do a deep dive on prompt injections and how we can mitigate them by teaching LLMs to follow instructions in a hierarchical manner. Finally, I will discuss the tensions that exist between developer access and security, whereby providing access to LM output probabilities can allow adversaries to reveal the hidden size of black-box models.
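The last point can be illustrated with a small sketch (my own toy simulation, not the speaker's method): because a model's final logits are the product of an unembedding matrix with a hidden state, every observed logit vector lies in a subspace whose dimension equals the hidden size, so the numerical rank of a stack of logit vectors exposes that size. All sizes and names below are hypothetical.

```python
import numpy as np

# Toy setup (assumed, for illustration): logits = hidden_state @ W.T, where
# W is the (vocab_size x hidden_dim) unembedding matrix. Every logit vector
# therefore lies in a hidden_dim-dimensional subspace of R^vocab_size.
vocab_size, hidden_dim, n_queries = 1000, 64, 256
rng = np.random.default_rng(0)

W = rng.normal(size=(vocab_size, hidden_dim))              # unembedding matrix
hidden_states = rng.normal(size=(n_queries, hidden_dim))   # one per queried prompt
logits = hidden_states @ W.T                                # observed API outputs

# The numerical rank of the stacked logit matrix recovers the hidden size.
singular_values = np.linalg.svd(logits, compute_uv=False)
estimated_hidden_dim = int((singular_values > 1e-6 * singular_values[0]).sum())
print(estimated_hidden_dim)  # prints 64, matching hidden_dim
```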
Bio: Eric Wallace is a research scientist at OpenAI, where he studies the theory and practice of building trustworthy, secure, and private machine learning models. He did his PhD work at UC Berkeley, where he was supported by the Apple Scholars in AI Fellowship and his research was recognized with awards at EMNLP and PETS. Prior to OpenAI, Eric interned at Google Brain, AI2, and FAIR.