Oral
Fri Apr 24 11:15 AM -- 11:25 AM (PDT) @ 203 A/B
SWINGARENA: Adversarial Programming Arena for Long-context GitHub Issue Solving
[OpenReview]
Oral
Fri Apr 24 11:27 AM -- 11:37 AM (PDT) @ 203 A/B
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
[OpenReview]
Oral
Fri Apr 24 11:39 AM -- 11:49 AM (PDT) @ 203 A/B
EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
[OpenReview]
Oral
Fri Apr 24 11:51 AM -- 12:01 PM (PDT) @ 203 A/B
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
[OpenReview]
Oral
Fri Apr 24 12:03 PM -- 12:13 PM (PDT) @ 203 A/B
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
[OpenReview]
Oral
Fri Apr 24 12:15 PM -- 12:25 PM (PDT) @ 203 A/B
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
[OpenReview]