Workshop
Self-Improving Foundation Models Without Human Supervision
Amrith Setlur · Katie Kang · Aviral Kumar · Feryal Behbahani · Roberta Raileanu · Rishabh Agarwal
As foundation models (FMs) scale, they face a data bottleneck: the growth of high-quality internet data cannot keep pace with their training needs. This bottleneck is already most apparent for text data, has been a persistent problem in domains such as embodied intelligence, and is expected to soon afflict other modalities as well. Self-improvement, a paradigm in which models train on synthetic data generated by themselves or by other models, offers a promising solution. This paradigm differs from both supervised learning, which relies on curated human data, and reinforcement learning (RL), which depends on external rewards. Self-improvement frameworks instead require models to curate their own training data, often with imperfect learned verifiers, which raises unique challenges. This workshop will explore algorithms for self-improvement, covering topics such as synthetic data, multi-agent and multi-modal systems, weak-to-strong generalization, inference-time self-supervision, and theoretical limits.
Schedule
Sat 6:00 p.m. - 6:15 p.m. | Opening remarks
Sat 6:15 p.m. - 6:55 p.m. | Contributed Talks (4x)
Sat 7:00 p.m. - 7:30 p.m. | Break & Posters
Sat 7:30 p.m. - 8:05 p.m. | Invited Talk: Ida Momennejad (Microsoft Research)
Sat 8:05 p.m. - 8:40 p.m. | Invited Talk: Noah Goodman (Stanford University)
Sat 8:45 p.m. - 9:20 p.m. | Invited Talk: Shunyu Yao (OpenAI)
Sat 9:00 p.m. - 11:00 p.m. | Lunch break
Sat 11:00 p.m. - 11:35 p.m. | Invited Talk: Minjoon Seo (KAIST)
Sat 11:35 p.m. - 11:55 p.m. | Contributed Talks (2x)
Sun 12:00 a.m. - 12:45 a.m. | Poster Session
Sun 12:45 a.m. - 1:20 a.m. | Invited Talk: Ulyana Piterbarg (NYU)
Sun 1:20 a.m. - 1:55 a.m. | Invited Talk: Jaehun Jung (University of Washington)
Sun 2:00 a.m. - 3:00 a.m. | Panel Discussion

- Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation (Poster) | Tianyu Zheng · Shuyue Guo · Xingwei Qu · Jiawei Guo · Xeron Du · Chenghua Lin · Wenhao Huang · Jie Fu · Ge Zhang
- MPAW: Multi-Preference Alignment through Weak Model Collaboration for Efficient and Flexible LLM Decoding (Poster) | Nuo Chen · Guojun Xiong · Bingsheng He
- Natural Language Reinforcement Learning (Poster) | Xidong Feng · Bo Liu · Ziyu Wan · Haotian Fu · Girish Arun Koushik · Zhiyuan Hu · Mengyue Yang · Ying Wen · Jun Wang
- Optimizing Test-Time Compute via Meta Reinforcement Finetuning (Poster) | Yuxiao Qu · Matthew Yang · Lewis Tunstall · Edward Beeching · Russ Salakhutdinov
- InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context (Poster) | Bryan Lincoln Marques de Oliveira · Luana Martins · Bruno Brandão · Luckeciano Melo
- Self-Taught Self-Correction for Small Language Models (Poster) | Viktor Moskvoretskii · Chris Biemann · Irina Nikishina
- AMPO: Active Multi Preference Optimization for Self-play Preference Selection (Poster) | Taneesh Gupta · Rahul Madhavan · Xuchao Zhang · Chetan Bansal · Saravanakumar Rajmohan
- An Adversarial Collaborative Framework for Comprehensive Image Captioning (Poster) | Dinesh Chowdary Attota · Ying Xie · Linh Le
- How to Mitigate Overfitting in Weak-to-strong Generalization? (Poster) | Junhao Shi · Qingyuan Chen · Zhaoye Fei · Yining Zheng · Qipeng Guo · Xuanjing Huang · Xipeng Qiu
- Evaluating LLMs Without Oracle Feedback: Agentic Annotation Evaluation Through Unsupervised Consistency Signals (Poster) | Cheng Chen · Haiyan Yin · Ivor Tsang
- Great Models Think Alike and this Undermines AI Oversight (Poster) | Shashwat Goel · Joschka Strüber · Ilze Auzina · Karuna Chandra · Ponnurangam Kumaraguru · Douwe Kiela · Ameya Prabhu · Matthias Bethge · Jonas Geiping
- Multi-Turn Code Generation Through Single-Step Rewards (Poster) | Arnav Kumar Jain · Gonzalo Gonzalez-Pumariega · Wayne Chen · Alexander Rush · Wenting Zhao · Sanjiban Choudhury
- A Self-Improving Coding Agent (Oral) | Maxime Robeyns · Martin Szummer · Laurence Aitchison
- RMBoost: Reward Model Training With Preference-Conditional Multi-Aspect Synthetic Data Generation (Poster) | Jiaming Shen · Ran Xu · Yennie Jun · Zhen Qin · Tianqi Liu · Carl Yang · Yi Liang · Simon Baumgartner · Michael Bendersky
- Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources (Poster) | Alisia Lupidi · Carlos Gemmell · Nicola Cancedda · Jane Dwivedi-Yu · Jason E Weston · Jakob Foerster · Roberta Raileanu · Maria Lomeli
- Safety is Essential for Responsible Open-Ended Systems (Poster) | Ivaxi Sheth · Jan Wehner · Sahar Abdelnabi · Ruta Binkyte · Mario Fritz
- Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens (Poster) | Zhepeng Cen · Yao Liu · Siliang Zeng · Pratik A Chaudhari · Huzefa Rangwala · George Karypis · Rasool Fakoor
- MALT: Improving Reasoning with Multi-Agent LLM Training (Poster) | Sumeet Motwani · Chandler Smith · Rocktim Das · Rafael Rafailov · Ivan Laptev · Philip Torr · Fabio Pizzati · Ronald Clark · Christian Schroeder de Witt
- KernelBench: Can LLMs Write Efficient GPU Kernels? (Poster) | Anne Ouyang · Simon Guo · Simran Arora · Alex Zhang · William Hu · Christopher Re · Azalia Mirhoseini
- Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension (Poster) | Xiyao Wang · Zhengyuan Yang · Linjie Li · Hongjin Lu · Yuancheng Xu · Chung-Ching Lin · Kevin Lin · Furong Huang · Lijuan Wang
- Scalable Thompson Sampling via Ensemble++ (Poster) | Yingru Li · Jiawei Xu · Baoxiang Wang · Zhi-Quan Luo
- AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement (Oral) | J Rosser · Jakob Foerster
- Preference Tree Optimization: Enhancing Goal-Oriented Dialogue with Look-Ahead Simulations (Poster) | Lior Baruch · Moshe Butman · Kfir Bar · Doron Friedman
- Don't Throw Away Data: Improving Sequence Knowledge Distillation with Minimum Bayes Risk Decoding (Poster) | Jun Wang · Eleftheria Briakou · Hamid Dadkhahi · Rishabh Agarwal · Colin Cherry · Trevor Cohn
- MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge (Poster) | Yuntao Du · Kailin Jiang · Zhi Gao · Chenrui Shi · Zilong Zheng · Siyuan Qi · Qing Li
- Training a Generally Curious Agent (Poster) | Fahim Tajwar · Yiding Jiang · Abitha Thankaraj · Sumaita Rahman · Zico Kolter · Jeff Schneider · Russ Salakhutdinov
- Towards Internet-Scale Training For Agents (Poster) | Brandon Trabucco · Gunnar Sigurdsson · Robinson Piramuthu · Russ Salakhutdinov
- Moral Intrinsic Rewards for Automated Alignment of LLM Agents (Poster) | Elizaveta Tennant · Stephen Hailes · Mirco Musolesi
- Exploring the Pre-conditions for Memory-Learning Agents (Poster) | Vishwa Shah · Vishruth Veerendranath · Graham Neubig · Daniel Fried · Zora Zhiruo Wang
- OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning (Poster) | Jiawei Zhou · Lei Chen
- Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models (Poster) | Sid Bharthulwar · John Rho · Katrina Brown
- Automated Capability Discovery via Model Self-Exploration (Poster) | Cong Lu · Shengran Hu · Jeff Clune
- Understanding the Capabilities and Limitations of Weak-to-Strong Generalization (Poster) | Wei Yao · Wenkai Yang · Ziqiao Wang · Yankai Lin · Yong Liu
- Adaptively-Labeled Vision Datasets Via Instance-Level Retrieval (Poster) | Brandon Trabucco · Rishav Mukherji · Yutong Bai · Russ Salakhutdinov
- I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm (Poster) | Yiming Liang · Xingwei Qu · Tianyu Zheng · Jiawei Guo · Xeron Du · Zhenzhu Yang · Jiaheng Liu · Chenghua Lin · Ge Zhang · Lei Ma · Wenhao Huang · Jiajun Zhang
- Can Language Models Falsify? The Need for Inverse Benchmarking (Oral) | Shiven Sinha · Shashwat Goel · Ponnurangam Kumaraguru · Jonas Geiping · Matthias Bethge · Ameya Prabhu
- Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models (Poster) | Caia Costello
- Solving Robotic Tasks via Self-Adapting Improvement Loops with Internet Video Knowledge (Poster) | Calvin Luo · Zilai Zeng · Yilun Du · Chen Sun
- Self-Correcting Self-Consuming Loops For Generative Model Training (Poster) | Nate Gillman · Michael Freeman · Daksh Aggarwal · Chia-Hong Hsu · Calvin Luo · Yonglong Tian · Chen Sun
- Policy-Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone (Poster) | Max Sobol Mark · Tian Gao · Georgia Gabriela Sampaio · Mohan Kumar Srirama · Archit Sharma · Chelsea Finn · Aviral Kumar
- Improving Test-Time Search for LLMs with Backtracking Against In-Context Value Verifiers (Poster) | Anikait Singh · Kushal Arora · Sedrick Keh · Jean Mercat · Tatsunori Hashimoto · Chelsea Finn · Aviral Kumar
- Scaling Flaws of Verifier-guided Search in Mathematical Reasoning (Poster) | Fei Yu · Yingru Li · Wang Benyou
- NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild (Poster) | Shikhar Murty · Hao Zhu · Dzmitry Bahdanau · Christopher Manning
- D3: A Large Dataset for Training Code Language Models to Act Diff-by-Diff (Poster) | Ulyana Piterbarg · Kanishk Gandhi · Lerrel Pinto · Noah Goodman · Rob Fergus
- Mitigating Short Board Effect via Dynamic Reward Balancing in Multi-reward LLM Optimization (Poster) | Nuo Chen · Yufei Gao · Yongnan Jin · Yan Hu · Anningzhe Gao · Lingyong Yan · Wang Benyou
- Boss LLM: Adaptation via No-Regret Learning (Poster) | Yu Feng · Avishree Khare · Nghia Nguyen · Sikata Sengupta
- AIDE: Agentically Improve Visual Language Model with Domain Experts (Poster) | Ming-Chang Chiu · Fuxiao Liu · Karan Sapra · Andrew Tao · Yaser Yacoob · Xuezhe Ma · Zhiding Yu · Guilin Liu
- Self-Improving Diffusion Models With Synthetic Data (Poster) | Sina Alemohammad · Ahmed Imtiaz Humayun · Shruti Agarwal · John Collomosse · Richard Baraniuk
- DISC: Dynamic Decomposition Improves LLM Inference Scaling (Poster) | Jonathan Light · Wei Cheng · Yue Wu · Masafumi Oyamada · Mengdi Wang · Santiago Paternain · Haifeng Chen
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges (Oral) | Nayoung Lee · Ziyang Cai · Avi Schwarzschild · Kangwook Lee · Dimitris Papailiopoulos
- An Architecture Search Framework for Inference-Time Techniques (Oral) | Jon Saad-Falcon · Adrian Gamarra Lafuente · Shlok Natarajan · Nahum Maru · Hristo Todorov · Etash Guha · E. Kelly Buchanan · Mayee Chen · Neel Guha · Christopher Re · Azalia Mirhoseini
- Game-Theoretic Regularized Self-Play Alignment of Large Language Models (Poster) | Xiaohang Tang · Sangwoong Yoon · Seongho Son · Rina Hughes · Quanquan Gu · Ilija Bogunovic
- Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment (Poster) | Haoyu Wang · Zeyu Qin · Li Shen · Xueqian Wang · Minhao Cheng · Dacheng Tao
- MetaSC: Test-Time Safety Specification Optimization for Language Models (Poster) | Victor Gallego
- VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making (Poster) | Jake Grigsby · Yuke Zhu · Michael Ryoo · Juan Carlos Niebles
- Yes, Q-learning Helps Offline In-Context RL (Poster) | Denis Tarasov · Alexander Nikulin · Ilya Zisman · Albina Klepach · Andrei Polubarov · Lyubaykin Nikita · Alexander Derevyagin · Igor Kiselev · Vladislav Kurenkov
- AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement (Poster) | Pranjal Aggarwal · Bryan Parno · Sean Welleck
- Assessing Diversity Collapse in Reasoning (Poster) | Xingyu Dang · Christina Baek · Zico Kolter · Aditi Raghunathan
- Value-Based Deep RL Scales Predictably (Poster) | Oleh Rybkin · Michal Nauman · Preston Fu · Charlie Snell · Pieter Abbeel · Sergey Levine · Aviral Kumar
- Escaping Collapse: The Strength of Weak Data for Large Language Model Training (Poster) | Kareem Amin · Sara Babakniya · Alex Bie · Weiwei Kong · Umar Syed · Sergei Vassilvitskii
- HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning (Poster) | Manish Bhattarai · Ryan Barron · Maksim Eren · Minh Vu · Vesselin Grantcharov · Ismael · Valentin Stanev · Cynthia Matuszek · Vladimir Valtchinov · Kim Rasmussen · Boian Alexandrov
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers (Abridged) (Poster) | Shalev Lifshitz · Sheila McIlraith · Yilun Du
- DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks (Poster) | Amin Karimi Monsefi · Kishore Sailaja · Ali Alilooee · Ser-Nam Lim · Rajiv Ramnath
- LaMsS: When Large Language Models Meet Self-Skepticism (Poster) | Yetao Wu · Yihong Wang · Teng Chen · Ningyuan Xi · Qingqing Gu · Hongyang Lei · Luo Ji
- Demystifying Long Chain-of-Thought Reasoning in LLMs (Oral) | Edward Yeo · Yuxuan Tong · Xinyao Niu · Graham Neubig · Xiang Yue
- ReSL: Enhancing Deep Clustering Through Reset-based Self-Labeling (Poster) | Andrii Shkabrii · Timo Klein · Lukas Miklautz · Sebastian Tschiatschek · Claudia Plant
- Vision-Language Model Dialog Games for Self-Improvement (Poster) | Ksenia Konyushkova · Christos Kaplanis · Serkan Cabi · Misha Denil
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy (Poster) | Saeid Asgari · Joao Monteiro
- Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage (Poster) | Zhi Gao · Bofei Zhang · Pengxiang Li · Xiaojian Ma · Tao Yuan · Yue Fan · Yuwei Wu · Yunde Jia · Song-Chun Zhu · Qing Li
- Aviary: Training Language Agents on Challenging Scientific Tasks (Poster) | Siddharth Narayanan · James Braza · Ryan-Rhys Griffiths · Manvitha Ponnapati · Albert Bou · Jon Laurent · Ori Kabeli · Geemi Wellawatte · Sam Cox · Samuel Rodriques · Andrew White
- Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning (Poster) | Anja Šurina · Amin Mansouri · Amal Seddas · Maryna Viazovska · Emmanuel Abbe · Caglar Gulcehre
- SCOPE: Improving LLM Conversations with Efficient Semantic Space Planning (Poster) | Zhiliang Chen · Xinyuan Niu · Chuan Sheng Foo · Bryan Kian Hsiang Low
- Self-correction for OOD generalization (Poster) | Vanya Bannihatti Kumar · Abhinav Rao · Aditi Raghunathan