Timezone: America/Sao_Paulo

THU 23 APR
10:30 a.m.
Orals 10:30-11:40
[10:30] Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
[10:42] T³: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
[10:54] MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
[11:06] Verifying Chain-of-Thought Reasoning via its Computational Graph
[11:18] Revela: Dense Retriever Learning via Language Modeling
[11:30] RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
[10:42] Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
[10:54] Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
[11:06] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
[11:18] Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
[11:30] GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
[11:42] Neon: Negative Extrapolation From Self-Training Improves Image Generation
(ends 12:00 PM)
Orals 10:30-11:16
[10:30] Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
[10:42] RefineStat: Efficient Exploration for Probabilistic Program Synthesis
[10:54] Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
[11:06] TileLang: Bridge Programmability and Performance in Modern Neural Kernels
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] One for Two: A Unified Framework for Imbalanced Graph Classification via Dynamic Balanced Prototype
[10:42] Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering
[10:54] Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
[11:06] Multi-Domain Transferable Graph Gluing for Building Graph Foundation Models
[11:18] Modality-free Graph In-context Alignment
[11:30] Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
[11:42] Exchangeability of GNN Representations with Applications to Graph Retrieval
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Information Shapes Koopman Representation
[10:42] On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
[10:54] Similarity-aware Non-Convex Federated Optimization
[11:06] On the Wasserstein Geodesic Principal Component Analysis of probability measures
[11:18] Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
[11:30] Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
[11:42] Gaussian certified unlearning in high dimensions: A hypothesis testing approach
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
[10:42] Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
[10:54] The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
[11:06] Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
[11:18] LLM Fingerprinting via Semantically Conditioned Watermarks
[11:30] Steering the Herd: A Framework for LLM-based Control of Social Learning
[11:42] Every Language Model Has a Forgery-Resistant Signature
(ends 12:00 PM)
Posters 10:30-1:00
(ends 1:00 PM)
3:15 p.m.
Orals 3:15-4:37
[3:15] LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
[3:27] EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
[3:39] Token-Importance Guided Direct Preference Optimization
[3:51] P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
[4:03] Reasoning without Training: Your Base Model is Smarter Than You Think
[4:15] LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
[4:27] Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] High-dimensional Analysis of Synthetic Data Selection
[3:27] How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
[3:39] Sequences of Logits Reveal the Low Rank Structure of Language Models
[3:51] Intrinsic Entropy of Context Length Scaling in LLMs
[4:03] From Markov to Laplace: How Mamba In-Context Learns Markov Chains
[4:15] The Coverage Principle: How Pre-Training Enables Post-Training
[4:27] Quantitative Bounds for Length Generalization in Transformers
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
[3:27] Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
[3:39] On the Generalization Capacities of MLLMs for Spatial Intelligence
[3:51] DepthLM: Metric Depth from Vision Language Models
[4:03] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
[4:15] Multimodal Aligned Semantic Knowledge for Unpaired Image-text Matching
[4:27] Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction–Reasoning Synergy
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
[3:27] Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
[3:39] LLMs Get Lost In Multi-Turn Conversation
[3:51] How Reliable is Language Model Micro-Benchmarking?
[4:03] AdAEM: An Adaptively and Automated Extensible Evaluation Method of LLMs' Value Difference
[4:15] What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
[4:27] EigenBench: A Comparative Behavioral Measure of Value Alignment
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] Generative Human Geometry Distribution
[3:27] Depth Anything 3: Recovering the Visual Space from Any Views
[3:39] Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
[3:51] Monocular Normal Estimation via Shading Sequence Estimation
[4:03] Radiometrically Consistent Gaussian Surfels for Inverse Rendering
[4:15] True Self-Supervised Novel View Synthesis is Transferable
[4:27] cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
[3:27] Probabilistic Kernel Function for Fast Angle Testing
[3:39] Differentially Private Domain Discovery
[3:51] Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks
[4:03] Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
[4:15] A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
[4:27] Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
(ends 4:45 PM)
Posters 3:15-5:45
(ends 5:45 PM)

FRI 24 APR
10:30 a.m.
Orals 10:30-11:52
[10:30] ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
[10:42] Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
[10:54] In-The-Flow Agentic System Optimization for Effective Planning and Tool Use
[11:06] Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
[11:18] AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
[11:30] GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
[11:42] Speculative Actions: A Lossless Framework for Faster AI Agents
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
[10:42] SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
[10:54] Partition Generative Modeling: Masked Modeling Without Masks
[11:06] NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
[11:18] TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
[11:30] VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
[11:42] UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
[10:42] WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
[10:54] Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
[11:06] How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
[11:18] In-Place Test-Time Training
[11:30] Softmax Transformers are Turing-Complete
[11:42] Pre-training under infinite compute
(ends 12:00 PM)
Orals 10:30-11:28
[10:30] MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning
[10:42] Generative Universal Verifier as Multimodal Meta-Reasoner
[10:54] Visual Planning: Let's Think Only with Images
[11:06] MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
[11:18] Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
(ends 12:00 PM)
Orals 10:30-11:40
[10:30] The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm
[10:42] Temporal superposition and feature geometry of RNNs under memory demands
[10:54] Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
[11:06] Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization
[11:18] Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
[11:30] HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers
(ends 12:00 PM)
Orals 10:30-11:40
[10:30] Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
[10:42] Premise Selection for a Lean Hammer
[10:54] Exploring Synthesizable Chemical Space with Iterative Pathway Refinements
[11:06] mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
[11:18] It's All Just Vectorization: einx, a Universal Notation for Tensor Operations
[11:30] Exploratory Causal Inference in SAEnce
(ends 12:00 PM)
Posters 10:30-1:00
(ends 1:00 PM)
3:15 p.m.
Orals 3:15-4:25
[3:15] ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
[3:27] MrRoPE: Mixed-radix Rotary Position Embedding
[3:39] Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
[3:51] FlashRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
[4:03] Mamba-3: Improved Sequence Modeling using State Space Principles
[4:15] Energy-Based Transformers are Scalable Learners and Thinkers
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] WoW!: World Models in a Closed-Loop World
[3:27] Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
[3:39] Exploratory Diffusion Model for Unsupervised Reinforcement Learning
[3:51] Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
[4:03] Rodrigues Network for Learning Robot Actions
[4:15] Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization
[4:27] Compositional Diffusion with Guided search for Long-Horizon Planning
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
[3:27] Hallucination Begins Where Saliency Drops
[3:39] Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
[3:51] MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
[4:03] Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
[4:15] WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
[4:27] Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
(ends 4:45 PM)
Orals 3:15-4:25
[3:15] SWINGARENA: Adversarial Programming Arena for Long-context GitHub Issue Solving
[3:27] BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
[3:39] EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
[3:51] Agent Data Protocol
[4:03] AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
[4:15] MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
(ends 4:45 PM)
Orals 3:15-4:01
[3:15] OpenThoughts: Data Recipes for Reasoning Models
[3:27] FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
[3:39] SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
[3:51] Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
(ends 4:45 PM)
Posters 3:15-5:45
(ends 5:45 PM)

SAT 25 APR
10:30 a.m.
Orals 10:30-11:52
[10:30] Diffusion Language Model Knows the Answer Before It Decodes
[10:42] On the Reasoning Abilities of Masked Diffusion Language Models
[10:54] Planner Aware Path Learning in Diffusion Language Models Training
[11:06] Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
[11:18] Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
[11:30] p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
[11:42] Latent Speech-Text Transformer
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
[10:42] Instilling an Active Mind in Avatars via Cognitive Simulation
[10:54] FlashWorld: High-quality 3D Scene Generation within Seconds
[11:06] MotionStream: Real-Time Video Generation with Interactive Motion Controls
[11:18] EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
[11:30] PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
[11:42] TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Semi-Supervised Preference Optimization with Limited Feedback
[10:42] TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
[10:54] Multiplayer Nash Preference Optimization
[11:06] The Art of Scaling Reinforcement Learning Compute for LLMs
[11:18] To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
[11:30] SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
[11:42] Why DPO is a Misspecified Estimator and How to Fix It
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective
[10:42] Characterizing the Discrete Geometry of ReLU Networks
[10:54] InfoNCE Induces Gaussian Distribution
[11:06] Navigating the Latent Space Dynamics of Neural Models
[11:18] Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
[11:30] Addressing divergent representations from causal interventions on neural networks
[11:42] FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability–Plasticity Tradeoff
(ends 12:00 PM)
Orals 10:30-11:52
[10:30] Uncover Underlying Correspondence for Robust Multi-view Clustering
[10:42] WAFT: Warping-Alone Field Transforms for Optical Flow
[10:54] InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
[11:06] DTO-KD: Dynamic Trade-off Optimization for Effective Knowledge Distillation
[11:18] AnyUp: Universal Feature Upsampling
[11:30] Generating metamers of human scene understanding
[11:42] Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
(ends 12:00 PM)
Orals 10:30-11:28
[10:30] BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals
[10:42] A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
[10:54] CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
[11:06] Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
[11:18] From movement to cognitive maps: recurrent neural networks reveal how locomotor development shapes hippocampal spatial coding
(ends 12:00 PM)
Posters 10:30-1:00
(ends 1:00 PM)
3:15 p.m.
Orals 3:15-4:25
[3:15] TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
[3:27] Online Learning and Equilibrium Computation with Ranking Feedback
[3:39] Non-Asymptotic Analysis of (Sticky) Track-and-Stop
[3:51] Conformal Robustness Control: A New Strategy for Robust Decision
[4:03] Optimistic Task Inference for Behavior Foundation Models
[4:15] Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] TabStruct: Measuring Structural Fidelity of Tabular Data
[3:27] Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
[3:39] The Spacetime of Diffusion Models: An Information Geometry Perspective
[3:51] PetaGAIL++: Utility Optimized Private Trajectory Generation with Imitation Learning
[4:03] Structured Flow Autoencoders: Learning Structured Probabilistic Representations with Flow Matching
[4:15] Spherical Watermark: Encryption-Free, Lossless Watermarking for Diffusion Models
[4:27] Latent Fourier Transform
(ends 4:45 PM)
Orals 3:15-4:37
[3:15] Task-free Adaptive Meta Black-box Optimization
[3:27] Differentiable Model Predictive Control on the GPU
[3:39] Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
[3:51] Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
[4:03] Learning to Segment for Vehicle Routing Problems
[4:15] Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces
[4:27] AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
(ends 4:45 PM)
Orals 3:15-4:01
[3:15] Train-before-Test Harmonizes Language Model Rankings
[3:27] LLM DNA: Tracing Model Evolution via Functional Representations
[3:39] Hubble: a Model Suite to Advance the Study of LLM Memorization
[3:51] Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
(ends 4:45 PM)
Orals 3:15-4:25
[3:15] Reliable Weak-to-Strong Monitoring of LLM Agents
[3:27] CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
[3:39] OpenApps: Simulating Environment Variations to Measure UI Agent Reliability
[3:51] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
[4:03] CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
[4:15] WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
(ends 4:45 PM)
Orals 3:15-4:25
[3:15] RealBench: A Benchmark for Complex Physical Systems with Real-World Data
[3:27] Quotient-Space Diffusion Model
[3:39] DCFold: Efficient Protein Structure Generation with Single Forward Pass
[3:51] Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
[4:03] FALCON: Few-step Accurate Likelihoods for Continuous Flows
[4:15] Fast training of accurate physics-informed neural networks without gradient descent
(ends 4:45 PM)
Posters 3:15-5:45
(ends 5:45 PM)

SUN 26 APR
9 a.m.
Workshop:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)
Workshop:
(ends 5:00 PM)

MON 27 APR
9 a.m.