Workshop
VerifAI: AI Verification in the Wild
Celine Lee · Wenting Zhao · Ameesh Shah · Theo X. Olausson · Tao Yu · Sean Welleck
This workshop explores the intersection of scale-driven generative artificial intelligence (AI) and the correctness-focused principles of verification. Formal analysis tools such as theorem provers, satisfiability solvers, and execution monitoring have demonstrated success in ensuring properties of interest across a range of tasks in software development and mathematics where precise reasoning is necessary. However, these methods face scaling challenges. Recently, generative AI such as large language models (LLMs) has been explored as a scalable and adaptable option to create solutions in these settings. The effectiveness of AI in these settings increases with more compute and data, but unlike traditional formalisms, they are built around probabilistic methods – not correctness by construction. In the VerifAI: AI Verification in the Wild workshop we invite papers and discussions that discuss how to bridge these two fields. Potential angles include, but are not limited to the following: generative AI for formal methods, formal methods for generative AI, AI as verifiers, datasets and benchmarks, and a special theme: LLMs for code generation. We welcome novel methodologies, analytic contributions, works in progress, negative results, andreview and positional papers that will foster discussion. We will also have a track for tiny or short papers.
Schedule
|
Sat 5:55 p.m. - 6:00 p.m.
|
Opening Remarks
(
Opening Remarks
)
>
SlidesLive Video |
🔗 |
|
Sat 6:00 p.m. - 6:30 p.m.
|
Keynote 1: Kevin Ellis
(
Invited Talk
)
>
SlidesLive Video |
Kevin Ellis 🔗 |
|
Sat 6:30 p.m. - 7:00 p.m.
|
Keynote 2: Shan Lu
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sat 7:00 p.m. - 7:30 p.m.
|
Keynote 3: Sida Wang
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sat 7:30 p.m. - 8:00 p.m.
|
Coffee Break
(
Coffee Break
)
>
|
🔗 |
|
Sat 7:30 p.m. - 7:35 p.m.
|
Presentation
|
🔗 |
|
Sat 8:00 p.m. - 9:00 p.m.
|
Poster Session 1
(
Poster Session
)
>
|
🔗 |
|
Sat 9:00 p.m. - 10:00 p.m.
|
Lunch Break
(
Lunch
)
>
|
🔗 |
|
Sat 10:00 p.m. - 10:30 p.m.
|
Keynote 4: Emily First
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sat 10:30 p.m. - 11:00 p.m.
|
Keynote 5: Charlie Snell
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sat 11:00 p.m. - 11:30 p.m.
|
Keynote 6: Nikitha Rao
(
Invited Talk
)
>
SlidesLive Video |
🔗 |
|
Sat 11:30 p.m. - 12:00 a.m.
|
Poster Session 2
(
Poster Session
)
>
|
🔗 |
|
Sun 12:00 a.m. - 12:15 a.m.
|
Orals 1: AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
(
Oral Presentation
)
>
SlidesLive Video |
Pranjal Aggarwal 🔗 |
|
Sun 12:15 a.m. - 12:30 a.m.
|
Orals 2: Self-Steering Language Models
(
Oral Presentation
)
>
SlidesLive Video |
🔗 |
|
Sun 12:30 a.m. - 12:45 a.m.
|
Orals 3: Exact Certification of (Graph) Neural Networks Against Label Poisoning
(
Oral Presentation
)
>
SlidesLive Video |
🔗 |
|
Sun 12:45 a.m. - 1:00 a.m.
|
Orals 4: Scaling Test-Time Compute Without Verification or RL is Suboptimal
(
Oral Presentation
)
>
SlidesLive Video |
🔗 |
|
Sun 1:00 a.m. - 2:00 a.m.
|
Discussion Panel
(
Panel
)
>
SlidesLive Video |
🔗 |
|
Sun 2:00 a.m. - 2:05 a.m.
|
Closing Remarks
(
Closing Remarks
)
>
|
🔗 |
|
-
|
CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network ( Poster ) > link | Jia-Hau Bai · Chi-Ting Liu · Yu Wang · Fu-Chieh Chang · Pei-Yuan Wu 🔗 |
|
-
|
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction ( Poster ) > link | Suozhi Huang · Peiyang Song · Robert Joseph George · anima anandkumar 🔗 |
|
-
|
Reinforcement Learning with LTL and $\omega$-Regular Objectives via Optimality-Preserving Translation to Average Rewards ( Poster ) > link | Xuan-Bach Le · Dominik Wagner · Leon Witzman · Alexander Rabinovich · Luke Ong 🔗 |
|
-
|
Self-Steering Language Models ( Oral ) > link | Gabriel Grand · Joshua B Tenenbaum · Vikash Mansinghka · Alexander Lew · Jacob Andreas 🔗 |
|
-
|
Using GPUs And LLMs Can Be Satisfying for Nonlinear Real Arithmetic Problems ( Poster ) > link | Christopher Brix · Julia Walczak · Nils Lommen · Thomas Noll 🔗 |
|
-
|
Temporal Consistency for LLM Reasoning Process Error Identification ( Poster ) > link | Jiacheng Guo · Yue Wu · Jiahao Qiu · Kaixuan Huang · Xinzhe Juan · Ling Yang · Mengdi Wang 🔗 |
|
-
|
Synthesis and Verification of String Stable Control for Interconnected Systems via Neural sISS Certificate ( Poster ) > link | Jingyuan Zhou · Haoze Wu · Longhao Yan · Kaidi Yang 🔗 |
|
-
|
Exact Certification of (Graph) Neural Networks Against Label Poisoning ( Oral ) > link | Mahalakshmi Sabanayagam · Lukas Gosch · Stephan Günnemann · Debarghya Ghoshdastidar 🔗 |
|
-
|
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers (Abridged) ( Poster ) > link | Shalev Lifshitz · Sheila McIlraith · Yilun Du 🔗 |
|
-
|
MathConstruct: Challenging LLM Reasoning with Constructive Proofs ( Poster ) > link | Jasper Dekoninck · Mislav Balunovic · Nikola Jovanović · Ivo Petrov · Martin Vechev 🔗 |
|
-
|
Scaling Randomized Smoothing to state-of-the-art Vision-Language Models ( Poster ) > link | Emmanouil Seferis 🔗 |
|
-
|
LLMV-AgE: Verifying LLM-Guided Planning for Agentic Exploration in Open-World RL ( Poster ) > link | Haotian Chi · Songwei Zhao · Ivor Tsang · Yew-Soon Ong · Hechang Chen · Yi Chang · Haiyan Yin 🔗 |
|
-
|
Verifying Omega-regular Properties of Neural Network-Controlled Systems via Proof Certificates ( Poster ) > link | Peixin Wang · Jianhao Bai · Dapeng Zhi · Min Zhang · Luke Ong 🔗 |
|
-
|
Neural Abstract Interpretation ( Poster ) > link | Shaurya Gomber · Gagandeep Singh 🔗 |
|
-
|
CRANE: Reasoning with constrained LLM generation ( Poster ) > link | Debangshu Banerjee · Tarun Suresh · Shubham Dipak Ugare · Sasa Misailovic · Gagandeep Singh 🔗 |
|
-
|
Learning Automata from Demonstrations, Examples, and Natural Language ( Poster ) > link | Marcell Vazquez-Chanlatte · Karim Elmaaroufi · Stefan Witwicki · Matei Zaharia · Sanjit Seshia 🔗 |
|
-
|
Multi-Turn Code Generation Through Single-Step Rewards ( Poster ) > link | Arnav Kumar Jain · Gonzalo Gonzalez-Pumariega · Wayne Chen · Alexander Rush · Wenting Zhao · Sanjiban Choudhury 🔗 |
|
-
|
ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Format Restriction, and Column Exploration ( Poster ) > link | Minghang Deng · Ashwin Ramachandran · Canwen Xu · Lanxiang Hu · Zhewei Yao · Anupam Datta · Hao Zhang 🔗 |
|
-
|
LipShiFT: A Certifiably Robust Shift-based Vision Transformer ( Poster ) > link | Rohan Menon · Nicola Franco · Stephan Günnemann 🔗 |
|
-
|
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs ( Poster ) > link | Kavi Gupta · Kate Sanders · Armando Solar-Lezama 🔗 |
|
-
|
Training and Verifying robust Kolmogorov-Arnold Networks ( Poster ) > link | Bjoern Heiderich · Max-Lion Schumacher · Marco Huber 🔗 |
|
-
|
Type-Constrained Code Generation with Language Models ( Poster ) > link | Niels Mündler · Jingxuan He · Hao Wang · Koushik Sen · Dawn Song · Martin Vechev 🔗 |
|
-
|
Guided Proof Search Using Large Language Models and Lemma Extraction in Coq ( Poster ) > link | Tarun Prasad · Nada Amin 🔗 |
|
-
|
Leveraging Large Language Models to Repair High-level Robot Controllers from Assumption Violations ( Poster ) > link | Qian Meng · Jin Zhou · Kilian Weinberger · Hadas Kress-Gazit 🔗 |
|
-
|
AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement ( Oral ) > link | Pranjal Aggarwal · Bryan Parno · Sean Welleck 🔗 |
|
-
|
On the Query Complexity of Verifier-Assisted Language Generation ( Poster ) > link | Edoardo Botta · Yuchen Li · Aashay Mehta · Jordan Ash · Cyril Zhang · Andrej Risteski 🔗 |
|
-
|
NO STRESS NO GAIN: STRESS TESTING BASED SELF-CONSISTENCY FOR OLYMPIAD PROGRAMMING ( Poster ) > link | Kunal Singh · Sayandeep Bhowmick · Pradeep Moturi · Siva Gollapalli 🔗 |
|
-
|
Tasks, Challenges, and Paths Towards AI for Software Engineering ( Poster ) > link | Alex Gu · Naman Jain · Wen-Ding Li · Manish Shetty · Yijia Shao · Ziyang Li · Diyi Yang · Koushik Sen · Kevin Ellis · Armando Solar-Lezama 🔗 |
|
-
|
ABSINT-AI: Language Models for Abstract Interpretation ( Poster ) > link | Michael Wang · Kexin Pei · Armando Solar-Lezama 🔗 |
|
-
|
Toward Trustworthy Neural Program Synthesis ( Poster ) > link | Wen-Ding Li · Darren Key · Kevin Ellis 🔗 |
|
-
|
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies ( Poster ) > link | Bartosz Piotrowski · Witold Drzewakowski · Konrad Staniszewski · Piotr Miłoś 🔗 |
|
-
|
Scaling Test-Time Compute Without Verification or RL is Suboptimal ( Oral ) > link | Amrith Setlur · Nived Rajaraman · Sergey Levine · Aviral Kumar 🔗 |