Invited talk in Workshop: Representational Alignment
Mon, Apr 27, 2026 • 5:30 AM – 6:00 AM PDT

Toward Verifiably Steerable Language Models

Valentina Pyatkin

Abstract

Post-training is the key to making large language models useful: it shapes how models respond to instructions, align with human intent, and generalize across diverse tasks. This talk addresses the challenge of developing steerable AI through post-training. I will discuss how we can train models to be better instruction followers, and I will show that most models severely overfit to a small set of instruction-following constraints and generalize poorly to unseen output constraints. I propose training models with reinforcement learning from verifiable rewards for verifiable instruction following, and show how this improves generalization on constraint following. Throughout the presentation, I will outline how I have applied these insights to developing open generative models, such as Tülu and OLMo, and I will conclude with an outlook on how we can make AI more steerable in the future.
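
To make the idea of a verifiable reward concrete, here is a minimal sketch of how an output constraint might be checked programmatically and turned into a reward signal. It is an illustration under stated assumptions, not the implementation presented in the talk: the checker names (`max_words`, `contains_keyword`, `ends_with`), the binary reward, and the example constraints are all hypothetical.

```python
from typing import Callable, List

# Hypothetical constraint checkers (not from the talk). Each factory returns
# a function that reports whether a response satisfies one output constraint.
Checker = Callable[[str], bool]


def max_words(limit: int) -> Checker:
    """Constraint: the response has at most `limit` whitespace-separated words."""
    return lambda response: len(response.split()) <= limit


def contains_keyword(keyword: str) -> Checker:
    """Constraint: the response mentions `keyword` (case-insensitive)."""
    return lambda response: keyword.lower() in response.lower()


def ends_with(suffix: str) -> Checker:
    """Constraint: the response, ignoring trailing whitespace, ends with `suffix`."""
    return lambda response: response.rstrip().endswith(suffix)


def verifiable_reward(response: str, constraints: List[Checker]) -> float:
    """Assumed binary reward: 1.0 only if every constraint is verified, else 0.0."""
    return 1.0 if all(check(response) for check in constraints) else 0.0


# Example instruction: "Answer in at most 10 words, mention 'steerable',
# and end with a period."
constraints = [max_words(10), contains_keyword("steerable"), ends_with(".")]

good = "Steerable models follow explicit output constraints."
bad = "Models should follow instructions"

print(verifiable_reward(good, constraints))
print(verifiable_reward(bad, constraints))
```

Because each check is deterministic, a reward of this kind can be computed without a learned reward model. A real training setup would likely need a much broader and more varied set of constraints than the three shown here, given the overfitting concern the abstract raises.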
