ICLR
2026
Papers
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
Testing Most Influential Sets
Variational Deep Learning via Implicit Regularization
Generating metamers of human scene understanding
Adaptive gradient descent on Riemannian manifolds and its applications to Gaussian variational inference
Accelerated co-design of robots through morphological pretraining
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
Capturing Visual Environment Structure Correlates with Control Performance
Scaling Goal-conditioned Reinforcement Learning with Multistep Quasimetric Distances
Stochastic Neural Networks for Causal Inference with Missing Confounders
Premise Selection for a Lean Hammer
Two (narrow) heads are better than (an arbitrarily wide) one
K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
TTT3R: 3D Reconstruction as Test-Time Training
SUSD: Structured Unsupervised Skill Discovery through State Factorization
Human3R: Everyone Everywhere All at Once
Flow Caching for Autoregressive Video Generation
Sequential Parallel Duality in Prefix Scannable Models
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming
Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Remotely Detectable Robot Policy Watermarking
CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation
Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression
ProReGen: Progressive Residual Generation under Attribute Correlations
Breaking the Total Variance Barrier: Sharp Sample Complexity for Linear Heteroscedastic Bandits with Fixed Action Set
Towards a Sharp Analysis of Learning Offline $f$-Divergence-Regularized Contextual Bandits
LMask: Learn to Solve Constrained Routing Problems with Lazy Masking
Action Chunking and Data Augmentation Yield Exponential Improvements for Imitation Learning in Continuous Spaces
Reconstruction Alignment Improves Unified Multimodal Models
FastVGGT: Fast Visual Geometry Transformer
Compute-Optimal Quantization-Aware Training
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
Learning Concept Bottleneck Models from Mechanistic Explanations
COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences
Architecture-Agnostic Test-Time Adaptation via Backprop-Free Embedding Alignment
QuRL: Rubrics As Judge For Open-Ended Question Answering
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
Bradley-Terry and Multi-Objective Reward Modeling Are Complementary
Intention-Conditioned Flow Occupancy Models
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Inferring brain plasticity rule under long-term stimulation with structured recurrent dynamics
Robust Equation Structure learning with Adaptive Refinement
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design
Unbalanced Soft-Matching Distance For Neural Representational Comparison With Partial Unit Correspondence
Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient Estimation
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Mamba-3: Improved Sequence Modeling using State Space Principles
LINK: Learning Instance-level Knowledge from Vision-Language Models for Human-Object Interaction Detection
Partition Generative Modeling: Masked Modeling Without Masks
H$^3$DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning
Negative Pre-activations Differentiate Syntax
On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation
Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization
Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation
Reinforced Latent Reasoning for LLM-based Recommendation
Improving Attributed Long-form Question Answering with Intent Awareness
Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation
How To Open the Black Box: Modern Models for Mechanistic Interpretability
FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming
Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion
STDDN: A Physics-Guided Deep Learning Framework for Crowd Simulation
Are Deep Speech Denoising Models Robust to Adversarial Noise?
LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models
Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Grasp Any Region: Prompting MLLM to Understand the Dense World
Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
PRISM: Progressive Robust Learning for Open-World Continual Category Discovery
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence
Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
Teach2Eval: An Interaction-Driven LLMs Evaluation Method via Teaching Effectiveness
Efficient Adversarial Attacks on High-dimensional Offline Bandits
Structured Flow Autoencoders: Learning Structured Probabilistic Representations with Flow Matching
Markovian Transformers for Informative Language Modeling
Enhancing Complex Symbolic Logical Reasoning of Large Language Models via Sparse Multi-Agent Debate
Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling
Aria: an Agent for Retrieval and Iterative Auto-Formalization via Dependency Graph
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective
Deep FlexQP: Accelerated Nonlinear Programming via Deep Unfolding
GenSR: Symbolic regression based on equation generative space
Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
Human-Object Interaction via Automatically Designed VLM-Guided Motion Policy
EdgeCape: Edge Weight Prediction For Category-Agnostic Pose Estimation
On the Interaction of Compressibility and Adversarial Robustness
RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility
Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning
UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
Flow-based Conformal Prediction for Multi-dimensional Time Series
UnLoc: Leveraging Depth Uncertainties for Floorplan Localization
When More is Less: Understanding Chain-of-Thought Length in LLMs
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
RF-MatID: Dataset and Benchmark for Radio Frequency Material Identification
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
Navigating the Latent Space Dynamics of Neural Models
To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking
Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
Learning Explicit Single-Cell Dynamics Using ODE Representations
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy
When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs
Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry
Text2Grad: Reinforcement Learning from Natural Language Feedback
Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
Read the Room: Video Social Reasoning with Mental-Physical Causal Chains
Diverse Text Decoding via Iterative Reweighting
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
Intrinsic training dynamics of deep neural networks
ActiveCQ: Active Estimation of Causal Quantities
Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism
Learning Facts at Scale with Active Reading
Libra: Effective yet Efficient Load Balancing for Large-scale MoE Inference
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Training-free Counterfactual Explanation for Temporal Graph Model Inference
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
Exploiting Low-Dimensional Manifold of Features for Few-shot Whole Slide Image Classification
Causal Interpretation of Neural Network Computations with Contribution Decomposition (CODEC)
Sharpness-Aware Machine Unlearning
MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion
SAM 3: Segment Anything with Concepts
Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting
Robust Denoising Neural Reranker for Recommender Systems
DrugTrail: Explainable Drug Discovery via Structured Reasoning and Druggability‑Tailored Preference Optimization
Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
Householder-Diagonalized Linear Attention (HDLA): Utilizing Enhanced Decay Mechanism for Efficient Sequence Modeling
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
Provably Explaining Neural Additive Models
Towards Self-Evolving Agent Benchmarks: Validatable Agent Trajectory via Test-Time Exploration
A Scalable Constant-Factor Approximation Algorithm for $W_p$ Optimal Transport
LLMs Get Lost In Multi-Turn Conversation
Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
How Stable is the Next Token? A Geometric View of LLM Prediction Stability
EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
SliderQuant: Accurate Post-Training Quantization for LLMs
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
Adaptive Gaussian Expansion for On-the-fly Category Discovery
From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting
Topological Flow Matching
FALCON: Few-step Accurate Likelihoods for Continuous Flows
The Human Brain as a Dynamic Mixture of Expert Models in Video Understanding
A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition
Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
Graph Diffusion Transformers are In-Context Molecular Designers
Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics
Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
Unveiling the Cognitive Compass: Theory-of-Mind–Guided Multimodal Emotion Reasoning
Gauge-invariant representation holonomy
STARK: Strategic Team of Agents for Refining Kernels
DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage
Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
Pre-training under infinite compute
Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality
Protein Structure Tokenization via Geometric Byte Pair Encoding
MnemoDyn: Learning Resting State Dynamics from $40$K fMRI sequences
Rectifying LLM Thought from Lens of Optimization
Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
Nef-Net+: Adapting Electrocardio Panorama in the wild
Certifying the Full YOLO Pipeline: A Probabilistic Verification Approach
Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning
Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
Unleashing Guidance Without Classifiers for Human-Object Interaction Animation
Bayesian Neural Networks for Functional ANOVA Model
OptimSyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
Repurposing Foundation Model for Generalizable Medical Time Series Classification
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
FSPO: Few-Shot Optimization of Synthetic Preferences Effectively Personalizes to Real Users
Source-Guided Flow Matching
GeomMotif: A Benchmark for Arbitrary Geometric Preservation in Protein Generation
Online Decision-Focused Learning
EventFlash: Towards Efficient MLLMs for Event-Based Vision
BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals
Deconstructing Positional Information: From Attention Logits to Training Biases
Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
Counterfactual Structural Causal Bandits
Fine-Grained Activation Steering: Steering Less, Achieving More
Prompt Curriculum Learning for Efficient LLM Post-Training
MotionStream: Real-Time Video Generation with Interactive Motion Controls
Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
Locally Subspace-Informed Neural Operators for Efficient Multiscale PDE Solving
Memento: Toward an All-Day Proactive Assistant for Ultra-Long Streaming Video
Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems
MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design
DiffPBR: Point-Based Rendering via Spatial-Aware Residual Diffusion
RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty
Bi-Lipschitz Autoencoder With Injectivity Guarantee
LearnIR: Learnable Posterior Sampling for Real-World Image Restoration
JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
Motion-Aligned Word Embeddings for Text-to-Motion Generation
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Evolving Graph Structured Programs for Circuit Generation with Large Language Models
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling
A Hierarchical Circuit Symbolic Discovery Framework for Efficient Logic Optimization
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds
MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models
Neodragon: Mobile Video Generation Using Diffusion Transformer
PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation
The Value of Information in Human-AI Decision-making
Out of the Memory Barrier: A Highly Memory-Efficient Training System for LLMs with Million-Token Contexts
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
Converge Faster, Talk Less: Hessian-Informed Federated Zeroth-Order Optimization
Open-Set Semantic Gaussian Splatting SLAM with Expandable Representation
MoL: Adaptive Mixture-of-Length Reasoning for Efficient Question Answering with Context
Variation in Verification: Understanding Verification Dynamics in Large Language Models
SceneCOT: Eliciting Chain-of-Thought Reasoning in 3D Scenes
Reconciling Visual Perception and Generation in Diffusion Models
Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
Uncertainty-Aware 3D Reconstruction for Dynamic Underwater Scenes
Diffusion Language Models are Provably Optimal Parallel Samplers
Uncertainty-Aware Gaussian Map for Vision-Language Navigation
Learning to Weight Parameters for Data Attribution
TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models
Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
Beyond Visual Reconstruction Quality: Object Perception-aware 3D Gaussian Splatting for Autonomous Driving
A New Approach to Controlling Linear Dynamical Systems
Error Feedback for Muon and Friends
Thinking as Society: Multi-Social-Agent Self-Distillation for Multimodal Misinformation Detection
RefTool: Reference-Guided Tool Creation for Knowledge-Intensive Reasoning
Causality ≠ Invariance: Function vs Concept Vectors in LLMs
On the $O(1/T)$ Convergence of Alternating Gradient Descent–Ascent in Bilinear Games
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
Mechanism of Task-oriented Information Removal in In-context Learning
Constrained Diffusion for Protein Design with Hard Structural Constraints
Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
SparseD: Sparse Attention for Diffusion Language Models
Distributionally Robust Linear Regression with Block Lewis Weights
RayI2P: Learning Rays for Image-to-Point Cloud Registration
G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation
Efficient Zero-shot Inpainting with Decoupled Diffusion Guidance
MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interaction Potentials
PointRePar: SpatioTemporal Point Relation Parsing for Robust Category-Unified 3D Tracking
MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control
Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Square Peg, Round Hole: Plugging Non-Sequential Data into Sequential Language Models
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
The Deleuzian Representation Hypothesis
DaVinci: Reinforcing Visual-Structural Syntax in MLLMs for Generalized Scientific Diagram Parsing
Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
Nonparametric Contextual Online Bilateral Trade
LoC-Decomp: LLM Autoformalization via Logical Concept Decomposition and Iterative Feedback Correction
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Diffusion Diffusion Process
CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation
A General Spatio-Temporal Backbone with Scalable Contextual Pattern Bank for Urban Continual Forecasting
Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
Monitoring Decomposition Attacks with Lightweight Sequential Monitors
Compositional Generalization through Gradient Search in Nonparametric Latent Space
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
FrugalRAG: Less is More in RL Finetuning for Multi-hop Question Answering
Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models
PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning
Evidence for Limited Metacognition in LLMs
Counterfactual LLM-based Framework for Measuring Rhetorical Style
GraphOmni: A Comprehensive and Extensible Benchmark Framework for Large Language Models on Graph-theoretic Tasks
TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation
Revisiting [CLS] and Patch Token Interaction in Vision Transformers
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
Representation Alignment for Diffusion Transformers without External Components
Controllable Sequence Editing for Biological and Clinical Trajectories
Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
Vision-Language-Action Instruction Tuning: From Understanding to Manipulation
AtC: Aggregate-then-Calibrate for Human-centered Assessment
Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics
BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Multi-state Protein Design with DynamicMPNN
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
OpenApps: Simulating Environment Variations to Measure UI Agent Reliability
Concept-based Adversarial Attack: a Probabilistic Perspective
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevant Assessment for IR Benchmarks
ICYM2I: The illusion of multimodal informativeness under missingness
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
SigmaDock: Untwisting Molecular Docking with Fragment-Based SE(3) Diffusion
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
GmNet: Revisiting Gating Mechanisms From A Frequency View
Enhancing Hallucination Detection through Noise Injection
Discovering and Steering Interpretable Concepts in Large Generative Music Models
CO3: Contrasting Concepts Compose Better
Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting
Learning to Generate Unit Test via Adversarial Reinforcement Learning
Contrastive Diffusion Guidance for Spatial Inverse Problems
Contrastive Predictive Coding Done Right for Mutual Information Estimation
How Catastrophic is Your LLM? Certifying Risk in Conversation
Symmetry-Aware Bayesian Optimization via Max Kernels
Hybrid Reinforcement: when reward is sparse, better to be dense
Discrete Adjoint Matching
RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization
Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
The 99% Success Paradox: When Near-Perfect Retrieval Equals Random Selection
Jet Expansions: Restructuring LLM Computation for Model Inspection
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
Enhancing Trustworthiness of Fine-Tuned LLMs via Regularized Subset Selection
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
AQER: A Scalable and Efficient Data Loader for Digital Quantum Computers
On Fairness of Task Arithmetic: The Role of Task Vectors
Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation
EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
Dissecting Non-Determinism in Large Language Models
Divide, Conquer, and Standardize — A Recursive Architecture for Multi-Agent Systems (MAS)
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Maximizing Asynchronicity in Event-based Neural Networks
Effect of Parallel Environments and Rollout Steps in PPO
WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
Native Adaptive Solution Expansion for Diffusion-based Combinatorial Optimization
ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings
Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
Singleton-Optimized Conformal Prediction
Good allocations from bad estimates
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
Steering Autoregressive Music Generation with Recursive Feature Machines
One protein is all you need
HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series
Beyond Sequential Reranking: Reranker-Guided Search Improves Reasoning Intensive Retrieval
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
Predicting LLM Output Length via Entropy-Guided Representations
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Evaluating Data Influence in Meta Learning
$\textit{MADFormer}$: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation
AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing
EMFuse: Energy-based Model Fusion for Decision Making
DR-GGAD: Dual Residual Centering for Mitigating Anomaly Non‑Discriminativity in Generalist Graph Anomaly Detection
Composable Sparse Subnetworks via Maximum-Entropy Principle
Controlling Repetition in Protein Language Models
Predictive Differential Training Guided by Training Dynamics
A Step to Decouple Optimization in 3DGS
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
The Price of Amortized inference in Sparse Autoencoders
Dissecting Representation Misalignment in Contrastive Learning via Influence Function
Generative Value Conflicts Reveal LLM Priorities
A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
The logical expressiveness of topological neural networks
ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective
Scaling Linear Attention with Sparse State Expansion
Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
TrajTok: What makes for a good trajectory tokenizer in behavior generation?
Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving
Pixel-Perfect Puppetry: Precision-Guided Enhancement for Face Image and Video Editing
Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
A$^2$TG: Adaptive Anisotropic Textured Gaussians for Efficient 3D Scene Representation
Characterizing Pattern Matching and Its Limits on Compositional Task Structures
MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
Minimax-Optimal Aggregation for Density Ratio Estimation
Train-before-Test Harmonizes Language Model Rankings
Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems
Layerwise Federated Learning for Heterogeneous Quantum Clients using Quorus
A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs
Model Misspecification in Simulation-Based Inference - Recent Advances and Open Challenges
Spectral Bellman Method: Unifying Representation and Exploration in RL
SmellNet: A Large-scale Dataset for Real-world Smell Recognition
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
Fisher-Rao Sensitivity for Out-of-Distribution Detection in Deep Neural Networks
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
EgoBrain: Synergizing Minds and Eyes For Human Action Understanding
MoCa: Modeling Object Consistency for 3D Camera Control in Video Generation
Joint Shadow Generation and Relighting via Light-Geometry Interaction Maps
PERSISTENCE SPHERES: BI-CONTINUOUS REPRESENTATIONS OF PERSISTENCE DIAGRAMS
Instance-wise Adaptive Scheduling via Derivative-Free Meta-Learning
SecP-Tuning: Efficient Privacy-Preserving Prompt Tuning for Large Language Models via MPC
Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
ProtoKV: Long-context Knowledges Are Already Well-Organized Before Your Query
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics
One-Step Video Restoration via Diffusion Adversarial Post-Training
OmniCVR: A Benchmark for Omni-Composed Video Retrieval with Vision, Audio, and Text
CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaption
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
Beyond Outliers: A Study of Optimizers Under Quantization
NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
DND: Boosting Large Language Models with Dynamic Nested Depth
Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
FastVMT: Eliminating Redundancy in Video Motion Transfer
GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models
RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo
Graph Representational Learning: When Does More Expressivity Hurt Generalization?
How Reliable is Language Model Micro-Benchmarking?
Universal Model Routing for Efficient LLM Inference
SelvaBox: A high‑resolution dataset for tropical tree crown detection
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
Federated Learning with Profile Mapping under Distribution Shifts and Drifts
ICaRus: Identical Cache Reuse for Efficient Multi-Model Inference
Learning Distributions over Permutations and Rankings with Factorized Representations
Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling
Peak-Return Greedy Slicing: Subtrajectory Selection for Transformer-based Offline RL
An Information-Theoretic Parameter-Free Bayesian Framework for Probing Labeled Dependency Trees from Attention Score
LogART: Pushing the Limit of Efficient Logarithmic Post-Training Quantization
TIPO: Text to Image with Text Pre-sampling for Prompt Optimization
Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction
Break the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
ReTrace: Reinforcement Learning-Guided Reconstruction Attacks on Machine Unlearning
Cross-ControlNet: Training-Free Fusion of Multiple Conditions for Text-to-Image Generation
Realtime Video Frame Interpolation using One-Step Diffusion Sampling
Self-Supervised Learning from Structural Invariance
T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation
Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution
Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
Wait, Do We Need to Wait? Revisiting Budget Forcing for Sequential Test-Time Scaling
ARTDECO: Toward High-Fidelity On-the-Fly Reconstruction with Hierarchical Gaussian Structure and Feed-Forward Guidance
Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
BANZ-FS: BANZSL Fingerspelling Dataset
InclusiveVidPose: Bridging the Pose Estimation Gap for Individuals with Limb Deficiencies in Video-Based Motion
When Machine Learning Gets Personal: Evaluating Prediction and Explanation
Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment
PLAGUE: Plug-and-play Framework for Lifelong Adaptive Generation of Multi-turn Exploits
Inheriting Generalizable Knowledge from LLMs to Diverse Vertical Tasks
FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Mordal: Automated Pretrained Model Selection for Vision Language Models
Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control
Strongly Convex Sets in Riemannian Manifolds
Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs
Geometry-aware Policy Imitation
ReIn: Conversational Error Recovery with Reasoning Inception
Learning-Time Encoding Shapes Unlearning in LLMs
Deep Learning with Learnable Product-Structured Activations
The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
Multi-Action Self-Improvement For Neural Combinatorial Optimization
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Efficient Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
On Measuring Influence in Avoiding Undesired Future
MedLesionVQA: A Multimodal Benchmark Emulating Clinical Visual Diagnosis for Body Surface Health
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
From Embedding to Control: Representations for Stochastic Multi-Object Systems
Information Shapes Koopman Representation
TaskCraft: Automated Generation of Agentic Tasks
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection
The Limits of Inference Scaling Through Resampling
Understanding the Dynamics of Forgetting and Generalization in Continual Learning via the Neural Tangent Kernel
Free Energy Mixer
Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces
Learning Massively Multitask World Models for Continuous Control
Subspace Kernel Learning on Tensor Sequences
TIPS: Turn-level Information-Potential Reward Shaping for Search-Augmented LLMs
Efficient Multimodal Spatial Reasoning via Dynamic and Asymmetric Routing
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
FLOWER: A Flow-Matching Solver for Inverse Problems
A Unifying View of Coverage in Linear Off-policy Evaluation
Why Less is More (Sometimes): A Theory of Data Curation
Latent Wasserstein Adversarial Imitation Learning
Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
Learning to summarize user information for personalized reinforcement learning from human feedback
APT: Towards Universal Scene Graph Generation via Plug-in Adaptive Prompt Tuning
The Coverage Principle: How Pre-Training Enables Post-Training
Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning
Edit-Based Flow Matching for Temporal Point Processes
InfoDet: A Dataset for Infographic Element Detection
Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
Feature segregation by signed weights in artificial vision systems and biological models
ReactID: Synchronizing Realistic Actions and Identity in Personalized Video Generation
Efficient Sliced Wasserstein Distance Computation via Adaptive Bayesian Optimization
mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Unifying Stable Optimization and Reference Regularization in RLHF
Teaching LLMs to Admit Uncertainty in OCR
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
Bilevel Optimization with Lower-Level Uniform Convexity: Theory and Algorithm
Implicit Sensing for Fourier Sparse Boolean Functions
MaskCO: Masked Generation Drives Effective Representation Learning and Exploiting for Combinatorial Optimization
FedOpenMatch: Towards Semi-Supervised Federated Learning in Open-Set Environments
Learning Admissible Heuristics for A*: Theory and Practice
CoMind: Towards Community-Driven Agents for Machine Learning Engineering
Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
Emergent Misalignment is Easy, Narrow Misalignment is Hard
Scaling Generalist Data-Analytic Agents
RAP: 3D Rasterization Augmented End-to-End Planning
Scaling with Collapse: Efficient and Predictable Training of LLM Families
PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION
VERIFY: A Novel Multi-Domain Dataset Grounding LTL in Contextual Natural Language via Provable Intermediate Logic
Graph Random Features for Scalable Gaussian Processes
Circuit Insights: Towards Interpretability Beyond Activations
Value Gradient Flow: Behavior-Regularized RL without Regularization
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Multiverse Mechanica: A Testbed for Learning Game Mechanics via Counterfactual Worlds
TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
ViPRA: Video Prediction for Robot Actions
Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
LLMs Must Think Thrice to Solve Executable Counterfactuals
Refine Drugs, Don’t Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation
Part-level Semantic-guided Contrastive Learning for Fine-grained Visual Classification
Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction
Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks
High-dimensional Analysis of Synthetic Data Selection
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
Can Transformers Really Do It All? On the Compatibility of Inductive Biases Across Tasks
Set Representation Auxiliary Learning with Adversarial Encoding Perturbation and Optimization
Cutting the Skip: Training Residual-Free Transformers
SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models
PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference
HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics
Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
SCOPED: Score–Curvature Out-of-distribution Proximity Evaluator for Diffusion
FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring
Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
Seq vs Seq: An Open Suite of Paired Encoders and Decoders
On the Theoretical Limitations of Embedding-Based Retrieval
MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs
RECODE: A Benchmark for Research Code DEvelopment with Interactive Human Feedback
Scalable Second-order Riemannian Optimization for $K$-means Clustering
HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithmic Coding
Video Scene Segmentation with Genre and Duration Signals
Efficient Offline Reinforcement Learning via Peer-Influenced Constraint
Post-training Large Language Models for Diverse High-Quality Responses
GenCape: Structure-Inductive Generative Modeling for Category-Agnostic Pose Estimation
AEGIS: Adversarial Target–Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
Visualizing LLM Latent Space Geometry Through Dimensionality Reduction
EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation
Intrinsic Explanation of Random Subspace Method for Enhanced Security Applications
Label Smoothing Improves Machine Unlearning
Dr.LLM: Dynamic Layer Routing in LLMs
How Base Frequency Shapes RoPE: An Analytical Study of Frequency-Band Formation
Pre-training Limited Memory Language Models with Internal and External Knowledge
Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
Regulating Internal Evidence Flows for Robust Learning Under Spurious Correlations
Learning to Recall with Transformers Beyond Orthogonal Embeddings
GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models
Long-tailed Test-Time Adaptation for Vision-Language Models
Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
Foundation Models for Causal Inference via Prior-Data Fitted Networks
Instilling an Active Mind in Avatars via Cognitive Simulation
Debugging Concept Bottleneck Models through Removal and Retraining
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
Incentives in Federated Learning with Heterogeneous Agents
Estimating Dimensionality of Neural Representations from Finite Samples
Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
LABEL-FREE MITIGATION OF SPURIOUS CORRELATIONS IN VLMS USING SPARSE AUTOENCODERS
Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
Reasoning without Training: Your Base Model is Smarter Than You Think
HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
From Sequential to Parallel: Reformulating Dynamic Programming as GPU Kernels for Large-Scale Stochastic Combinatorial Optimization
WRING Out The Bias: A Rotation-Based Alternative To Projection Debiasing
The Hidden Lattice Geometry of LLMs
Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning
Heterogeneous Agent Q-weighted Policy Optimization
A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex
GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation
BioTamperNet: Affinity-Guided State-Space Model Detecting Tampered Biomedical Images
PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
Beyond Pairwise: Empowering LLM Alignment With (Ranked) Choice Modeling
Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective
Neuron-Level Analysis of Cultural Understanding in Large Language Models
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data
Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
JAPAN: Joint Adaptive Prediction Areas with Normalising Flow
EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
“Noisier” Noise Contrastive Estimation is (Almost) Maximum Likelihood
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models
AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
Any-Order Flexible Length Masked Diffusion
RankFlow: Property-aware Transport for Protein Optimization
Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction
Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting
Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards
Text-Aware Image Restoration with Diffusion Models
Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
RAR: Reversing Visual Attention Re-Sinking for Unlocking Potential in Multimodal Large Language Models
A Structured, Tagged, and Localized Visual Question Answering Dataset with Full Sentence Answers and Scene Graphs for Chest X-ray Images
Short Window Attention Enables Long-Term Memorization
VQ-Transplant: Efficient VQ-Module Integration for Pre-trained Visual Tokenizers
Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
Where’s the Chicken? Unpacking Spatial Awareness in Vision-Language Models
Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
AttTok: Marrying Attribute Tokens with Generative Pre-trained Vision-Language Models towards Medical Image Understanding
Learning residue level protein dynamics with multiscale Gaussians
Differentiable JPEG-based Input Perturbation for Knowledge Distillation Amplification via Conditional Mutual Information Maximization
Consis-GCPO: Consistency-Preserving Group Causal Preference Optimization for Vision Customization
Negotiated Reasoning: On Provably Addressing Relative Over-Generalization
AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
Revela: Dense Retriever Learning via Language Modeling
Joint Adaptation of Uni-modal Foundation Models for Multi-modal Alzheimer's Disease Diagnosis
PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception
H$^3$GNNs: Harmonizing Heterophily and Homophily in GNNs via Self-Supervised Node Encoding
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
A Recovery Guarantee for Sparse Neural Networks
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
Bidirectional Predictive Coding
Flow Matching Policy Gradients
LLM2Fx-Tools: Tool Calling for Music Post-Production
LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion
LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks
FedMC: Federated Manifold Calibration
Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
Rescue: Retrieval Augmented Secure Code Generation
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
ReLaSH: Reconstructing Joint Latent Spaces for Efficient Generation of Synthetic Hypergraphs with Hyperlink Attributes
Triangle Multiplication is All You Need for Biomolecular Structure Representations
Optimizer Choice Matters For The Emergence of Neural Collapse
BAR: Refactor the Basis of Autoregressive Visual Generation
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning
SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
FlexProtein: Joint Sequence and Structure Pretraining for Protein Modeling
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
End-to-End Probabilistic Framework for Learning with Hard Constraints
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Polynomial Convergence of Riemannian Diffusion Models
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Neural+Symbolic Approaches for Interpretable Actor-Critic Reinforcement Learning
QuoKA: Query-Oriented KV Selection for Efficient LLM Prefill
Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
Perception-Aware Policy Optimization for Multimodal Reasoning
What Do Large Language Models Know About Opinions?
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
Evolution of Concepts in Language Model Pre-Training
Multimodal Policy Internalization for Conversational Agents
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Reliability-Adjusted Prioritized Experience Replay
Discrete Variational Autoencoding via Policy Search
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Procedural Mistake Detection via Action Effect Modeling
QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
Value Flows
Learning to Reason in Structured In-context Environments with Reinforcement Learning
Escaping Policy Contraction: Contraction-Aware PPO (CaPPO) for Stable Language Model Fine-Tuning
OpenThoughts: Data Recipes for Reasoning Models
TangleScore: Tangle-Guided Purge and Imprint for Unstructured Knowledge Editing
A Theoretical Analysis of Mamba’s Training Dynamics: Filtering Relevant Features for Generalization in State Space Models
RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference
Task-free Adaptive Meta Black-box Optimization
Adaptive Mamba Neural Operators
Learning to Grasp Anything By Playing with Random Toys
Discovering alternative solutions beyond the simplicity bias in recurrent neural networks
Robust Adversarial Attacks Against Unknown Disturbance via Inverse Gradient Sample
Is In-Context Learning Learning?
Tequila: Deadzone-free Ternary Quantization for Large Language Models
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Learning Ordinal Probabilistic Reward from Preferences
Learning to Maximize Rewards via Reaching Goals
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
INTIMA: A Benchmark for Human-AI Companionship Behavior
A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators
Structure Learning from Time-Series Data with Lag-Agnostic Structural Prior
Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations
MergeTune: Continued Fine-Tuning of Vision-Language Models
Object Fidelity Diffusion for Remote Sensing Image Generation
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
Characterizing the Discrete Geometry of ReLU Networks
Unified 3D Scene Understanding Through Physical World Modeling
LiveResearchBench: Benchmarking Single- and Multi-Agent Systems for Citation-Grounded Deep Research
CoFact: Conformal Factuality Guarantees for Language Models under Distribution Shift
SAGA: Structural Aggregation Guided Alignment with Dynamic View and Neighborhood Order Selection for Multiview Graph Domain Adaptation
Expert Heads: Robust Evidence Identification for Large Language Models
On the Wasserstein Geodesic Principal Component Analysis of probability measures
Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint
Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
LLM Pretraining with Continuous Concepts
Primary-Fine Decoupling for Action Generation in Robotic Imitation
GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments
ELEPHANT: Measuring and understanding social sycophancy in LLMs
Pretrain–Test Task Alignment Governs Generalization in In-Context Learning
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
A Rich Knowledge Space for Scalable Deepfake Detection
Multiplayer Nash Preference Optimization
Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
DepthLM: Metric Depth from Vision Language Models
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
E²LoRA: Efficient and Effective Low-Rank Adaptation with Entropy-Guided Adaptive Sharing
Gradient Intrinsic Dimensionality Alignment: Narrowing The Gap Between Low-Rank Adaptation and Full Fine-Tuning
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
LiTo: Surface Light Field Tokenization
MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
Temporal Graph Thumbnail: Robust Representation Learning with Global Evolutionary Skeleton
Extracting Model Precision from 20 Logprobs
ScalingCache: Extreme Acceleration of DiTs through Difference Scaling and Dynamic Interval Caching
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
A Training-Free Framework for Long Video Understanding via Video-Query-Options Similarity
Is the evidence in 'Language Models Learn to Mislead Humans via RLHF' valid?
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction as Reasoning
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
Towards Persistent Noise-Tolerant Active Learning of Regular Languages with Class Query
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers
Charts Are Not Images: On the Challenges of Scientific Chart Editing
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Equivariant Splitting: Self-supervised learning from incomplete data
Grounding and Enhancing Informativeness and Utility in Dataset Distillation
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
DSA: Efficient Inference For Video Generation Models via Distributed Sparse Attention
Enhancing Communication Compression via Discrepancy-aware Calibration for Federated Learning
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling
AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching
ContextNav: Towards Agentic Multimodal In-Context Learning
SpatialHand: Generative Object Manipulation from 3D Perspective
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
CoAct-1: Computer-using Multi-agent System with Coding Actions
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks
APPLE: Toward General Active Perception via Reinforcement Learning
From REINFORCE to Dr. GRPO: A Unified Perspective on LLM Post-Training
Directional Textual Inversion for Personalized Text-to-Image Generation
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
AUHead: Realistic Emotional Talking Head Generation via Action Units Control
PoseX: AI Defeats Physics-based Methods on Protein Ligand Cross-Docking
Multimodal Dataset Distillation via Phased Teacher Models
GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI
CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
Deep SPI: Safe Policy Improvement via World Models
S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion
Not All Bits Are Equal: How Model Scale Changes Memory-Optimal Reasoning
DePO: Demonstration-guided Policy Optimization for Molecular Optimization
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
NAB: Neural Adaptive Binning for Sparse-View CT reconstruction
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization
Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers
HUMOF: Human Motion Forecasting in Interactive Social Scenes
Patching Gaps In LLM Reasoning With Interventional Training
MEGS²: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning
SVD Provably Denoises Nearest Neighbor Data
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Trust The Typical
Real-Time Robot Execution with Masked Action Chunking
Automatic Dialectic Jailbreak: A Framework for Generating Effective Jailbreak Strategies
Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution
Closing the Modality Gap Aligns Group-Wise Semantics
Training LLMs with LogicReward for Faithful and Rigorous Reasoning
Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking
Multi-Agent Guided Policy Optimization
Bilateral Information-aware Test-time Adaptation for Vision-Language Models
Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
Reducing Class-Wise Performance Disparity via Margin Regularization
Do LLMs Forget What They Should? Evaluating In-Context Forgetting in Large Language Models
TimeSeg: An Information-Theoretic Segment-Wise Explainer for Time-Series Predictions
AudioX: A Unified Framework for Anything-to-Audio Generation
SE-Diff: Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation
Learning Molecular Chirality via Chiral Determinant Kernels
RIVER: Real-time Video Interaction Benchmark
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
Diffusion Models as Dataset Distillation Priors
Deconstructing Guidance: A Semantic Hierarchy for Precise Diffusion Model Editing
FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
Differentially Private Domain Discovery
SeeDNorm: Self-Rescaled Dynamic Normalization
ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
ScaleCap: Scalable Image Captioning via Dual-Modality Debiasing
KaVa: Latent Reasoning via Compressed KV-Cache Distillation
Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
From Markov to Laplace: How Mamba In-Context Learns Markov Chains
GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
Gradient Descent Dynamics of Rank-One Matrix Denoising
A High Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
Discrete Diffusion for Bundle Construction
Guided Policy Optimization under Partial Observability
SRT: Super-Resolution for Time Series via Disentangled Rectified Flow
Preference-based Policy Optimization from Sparse-reward Offline Dataset
ExpGuard: LLM Content Moderation in Specialized Domains
Rethinking Residual Errors in Compensation-based LLM Quantization
Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency
Pareto Variational Autoencoder
Branch and Bound Search for Exact MAP Inference in Credal Networks
SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
LoRA-S: An Efficient Low Rank Adaptation scheme via Sylvester equation
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
Enabling Fine-Tuning of Direct Feedback Alignment via Feedback-Weight Matching
On the Role of Implicit Regularization of Stochastic Gradient Descent in Group Robustness
Gelato: Graph Edit Distance via Autoregressive Neural Combinatorial Optimization
CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs?
Improving Text-guided CAD Prototyping via Modality-Specific Tokenization
MindMix: A Multimodal Foundation Model for Auditory Perception Decoding via Deep Neural-Acoustic Alignment
Improving Set Function Approximation with Quasi-Arithmetic Neural Networks
$\alpha$-DPO: Robust Preference Alignment for Diffusion Models via $\alpha$ Divergence
Bures Generalized Category Discovery
RESA: Bringing Back What Sparse Attention Ignores with Residual Estimation
When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining
The Layered Ontology of Models, Resolving the Epistemological Crisis of AI
FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound
RedSage: A Cybersecurity Generalist LLM
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Visual Jigsaw Post-Training Improves MLLMs
Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature
Fantastic Tractor-Dogs and How Not to Find Them With Open-Vocabulary Detectors
GoR: A Unified and Extensible Generative Framework for Ordinal Regression
Why Adversarially Train Diffusion Models?
$\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
AlphaAgentEvo: Evolution-Oriented Alpha Mining via Self-Evolving Agentic Reinforcement Learning
Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems
Implicit Inversion turns CLIP into a Decoder
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
SCUBA: Salesforce Computer Use Benchmark
Spilled Energy in Large Language Models
PRISM: Enhancing PRotein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning
Don't Look Up (Every Token): Escaping Quadratic Complexity via Geometric Patterns and Algorithms
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
Glance and Focus Reinforcement for Pan-cancer Screening
On Discriminative vs. Generative classifiers: Rethinking MLLMs for Action Understanding
Group Representational Position Embedding
Soft Quality-Diversity Optimization
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs
SongEcho: Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation
Performative Prediction made practical
LLMs Struggle to Balance Reasoning and World Knowledge in Causal Narrative Understanding
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Music Flamingo: Scaling Music Understanding in Audio Language Models
Tug-of-War No More: Harmonizing Accuracy and Robustness in Vision-Language Models via Stability-Aware Task Vector Merging
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
LEGACY: A Lightweight Dynamic Gradient Compression Strategy for Distributed Deep Learning
AgentFold: Long-Horizon Web Agents with Proactive Context Folding
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Agentic Reinforced Policy Optimization
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Capability-Based Scaling Laws for LLM-Based Red-Teaming
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
From Fields to Random Trees
Personalized Reasoning: Just-in-time Personalization and Why LLMs Fail at It
When Bias Helps Learning: Bridging Initial Prejudice and Trainability
Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge
BrowseNet: Knowledge Graph-Based Associative Memory for Contextual Information Retrieval
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
Pretraining with hierarchical memories: separating long-tail and common knowledge
Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
Defining and quantifying compositional structure
BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games
Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs
COMI: Coarse-to-fine Context Compression via Marginal Information Gain
Towards Interpretable Visual Decoding with Attention to Brain Representations
Projected Coupled Diffusion for Test-Time Constrained Joint Generation
Faster Parameter-Free Regret Matching Algorithms
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
The Matthew Effect of AI Programming Assistants: A Hidden Bias in Software Evolution
Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
Critical attention scaling in long-context transformers
KnowProxy: Adapting Large Language Models by Knowledge-guided Proxy
Self-Augmented Visual Contrastive Decoding
Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
Similarity-aware Non-Convex Federated Optimization
Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds
Efficient Differentiable Contact Model with Long-range Influence
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity
Advancing End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training
EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
MrRoPE: Mixed-radix Rotary Position Embedding
RCPU: Rotation-Constrained Error Compensation for Structured Pruning of a Large Language Model
Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?
GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints
PACE: Pretrained Audio Continual Learning
FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning
Latent Fourier Transform
CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
EgoTwin: Dreaming Body and View in First Person
PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models
Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images
A Fair Bayesian Inference through Matched Gibbs Posterior
HEEGNet: Hyperbolic Embeddings for EEG
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
A Statistical Theory of Overfitting for Imbalanced Classification
The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
M4PQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
T-TAMER: Provably Taming Trade-offs in ML Serving
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
Exponential-Wrapped Mechanisms: Differential Privacy on Hadamard Manifolds Made Practical
Global and Local Topology-Aware Graph Generation via Dual Conditioning Diffusion
Geometric Graph Neural Diffusion for Stable Molecular Dynamics
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
Transformers with Endogenous In-Context Learning: Bias Characterization and Mitigation
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
DistDF: Time-series Forecasting Needs Joint-distribution Wasserstein Alignment
LogiConBench: Benchmarking Logical Consistencies of LLMs
TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization
FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability–Plasticity Tradeoff
LLM-as-a-Prophet: Understanding Predictive Intelligence with Prophet Arena
A Probabilistic Hard Concept Bottleneck for Steerable Generative Models
Spotlight on Token Perception for Multimodal Reinforcement Learning
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models
Landing with the Score: Riemannian Optimization through Denoising
Slicing Wasserstein over Wasserstein via Functional Optimal Transport
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows
Faster SVD via Accelerated Newton-Schulz Iteration
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
Flow Where You Want
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
Query-Aware Flow Diffusion for Graph-Based RAG with Retrieval Guarantees
Computer Use Survey - A Visual Survey of Computer Use Agents
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
Loneliness as a Case Study for Social Reward Misalignment
WATS: Wavelet-Aware Temperature Scaling for Reliable Graph Neural Networks
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
The effect of feature resolution on embedding dimension
Discretisation invariance
Ready For General Agents? Let's test it.
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
Revisiting the NetHack Learning Environment
IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
AI Fundamentals: Valuing AI Agents & Data Assets
The Markovian Thinker
Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
SHE-LoRA: Selective Homomorphic Encryption for Federated Tuning with Heterogeneous LoRA
Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
Attention Smoothing Is All You Need For Unlearning
WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols
Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts
Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future
JULI: Jailbreak Large Language Models by Self-Introspection
Language-guided Open-world Video Anomaly Detection under Weak Supervision
Navigating the Manifold — A Geometric Perspective on Diffusion-Based Inverse Problems
Evaluating Machine Learned Inter-Atomic Potentials for a Practical Simulation Workflow
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision–Language Models
Scaling Group Inference for Diverse and High-Quality Generation
Generative AI Archaeology
Understanding and Fixing Bottlenecks in State Space Models: What Recency and Over-Smoothing Tell Us
AdS-GNN - a Conformally Equivariant Graph Neural Network
SteinsGate: Adding Causality to Diffusions for Long Video Generation via Path Integral
UFO-4D: Unposed Feedforward 4D reconstruction from Two Images
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
Towards Strategic Persuasion with Language Models
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
Agentic Collaboration as an Information Bottleneck Problem
Laplacian Kernelized Bandit
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
FlashRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
Large Depth Completion Model from Sparse Observations
Planning with an Embodied Learnable Memory
Sublinear Spectral Clustering Oracle with Little Memory
The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval
The Intricate Dance of Prompt Complexity, Quality, Diversity and Consistency in T2I Models
BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity
Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking
DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher
Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens
Overlap-weighted orthogonal meta-learner for treatment effect estimation over time
ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
PAS: Estimating the target Accuracy before domain adaptation
VADv2: End-to-End Autonomous Driving via Probabilistic Planning
Tell me Habibi, is it Real or Fake?
Benefits and Limitations of Communication in Multi-Agent Reasoning
Stochastic Optimal Control for Continuous-Time fMRI Representation Learning
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Self-Guided Low Light Object Detection Framework
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
Newton Method Revisited: Global Convergence Rates up to $O(1/k^3)$ for Stepsize Schedules and Linesearch Procedures
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
Learning to Interpret Weight Differences in Language Models
SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Learning Collective Variables from BioEmu with Time-Lagged Generation
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
TabStruct: Measuring Structural Fidelity of Tabular Data
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
SimpleFold: Folding Proteins is Simpler than You Think
From U-Nets to DiTs: The Architectural Evolution of Text-to-Image Diffusion Models (2021–2025)
Topology Matters in RTL Circuit Representation Learning
Language Models are Injective and Hence Invertible
One-Shot Exemplars for Class Grounding in Self-Supervised Learning
Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
On the Spectral Differences Between NTK and CNTK and Their Implications for Point Cloud Recognition
Dataset Distillation as Pushforward Optimal Quantization
Are Global Dependencies Necessary? Scalable Time Series Forecasting via Local Cross-Variate Modeling
Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization
Taming the Fragility of KV Cache Eviction in LLM Inference
Sublinear Time Quantum Algorithm for Attention Approximation
R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability
Leveraging Explanation to Improve Generalization of Meta Reinforcement Learning
Permutation-Consistent Variational Encoding for Incomplete Multi-View Multi-Label Classification
OWLEYE: Zero-Shot Learner for Cross-Domain Graph Data Anomaly Detection
Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative
Evolution and compression in LLMs: on the emergence of human-aligned categorization
How reinforcement learning after next-token prediction facilitates learning
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
Global-Recent Semantic Reasoning on Dynamic Text-Attributed Graphs with Large Language Models
Neural Multi-Objective Combinatorial Optimization for Flexible Job Shop Scheduling Problems
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
TESSAR: Geometry-Aware Active Regression via Dynamic Voronoi Tessellation
Relational Feature Caching for Accelerating Diffusion Transformers
UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate Prediction
Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence
Relatron: Automating Relational Machine Learning over Relational Databases
Revisiting Confidence Calibration for Misclassification Detection in VLMs
Grounding Computer Use Agents on Human Demonstrations
Healthcare Insurance Fraud Detection via Continual Fiedler Vector Graph Model
Multi-Object System Identification from Videos
GAS: Enhancing Reward-Cost Balance of Generative Model-assisted Offline Safe RL
Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
Parameter-Efficient Reinforcement Learning using Prefix Optimization
CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts
StreamingThinker: Large Language Models Can Think While Reading
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
Dynamic Novel View Synthesis in High Dynamic Range
Query-Level Uncertainty in Large Language Models
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
RD-HRL: Generating Reliable Sub-Goals for Long-Horizon Sparse-Reward Tasks
Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
Emergent Discrete Controller Modules for Symbolic Planning in Transformers
NetArena: Dynamically Generated LLM Benchmarks for Network Applications
Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
(U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEs
Quantitative Bounds for Length Generalization in Transformers
Bridging Piano Transcription and Rendering via Disentangled Score Content and Style
Predicting Kernel Regression Learning Curves from Only Raw Data Statistics
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Characterizing Deep Research: A Benchmark and Formal Definition
DragFlow: Unleashing DiT Priors with Region-Based Supervision for Drag Editing
The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?
MOAI: Module-Optimizing Architecture for Non-Interactive Secure Transformer Inference
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
A Memory-Efficient Hierarchical Algorithm for Large-scale Optimal Transport Problems
WaterDrum: Watermark-based Data-centric Unlearning Metric
DTP: Delta-Guided Two Stage Pruning for Mamba-based Multimodal Large Language Models
CAR-LoRA: Training Compression-Aware and Robust LoRA Adapters for Evolving LLMs
TRAC: Tensor-Train based Across-layer Compression for Parameter-Efficient Fine-Tuning
MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
Flash-Mono: Feed-Forward Accelerated Gaussian Splatting Monocular SLAM
Reinforcement Learning for Machine Learning Engineering Agents
Translation Heads: Unveiling Attention's Role in LLM Multilingual Translation
SpikeGen: Decoupled “Rods and Cones” Visual Representation Processing with Latent Generative Framework
Doubly-Regressing Approach for Subgroup Fairness
Improving Extreme Wind Prediction with Frequency-Informed Learning
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces
General search techniques without common knowledge for imperfect-information games, and application to superhuman Fog of War chess
Learning a Game by Paying the Agents
TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition
RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation
Convergence of Regret Matching in Potential Games and Constrained Optimization
MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
Using maximal information auxiliary variables to improve synthetic data generation based on TabPFN foundation models
A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components
Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
Sparsity-promoting Fine-tuning for Equivariant Materials Foundation Model
Fast-dLLM v2: Efficient Block-Diffusion LLM
RAPID$^3$: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
Pisces: Cryptography-based Private Retrieval-Augmented Generation with Dual-Path Retrieval
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
Breaking Safety Paradox with Feasible Dual Policy Iteration
VenusX: Unlocking Fine-Grained Functional Understanding of Proteins
Context Learning for Multi-Agent Discussion
PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting
EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations
MLP Memory: A Retriever-Pretrained Memory for Large Language Models
Mobile-GS: Real-time Gaussian Splatting for Mobile Devices
On Code-Induced Reasoning in LLMs
Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation
MASAM: Multimodal Adaptive Sharpness-Aware Minimization for Heterogeneous Data Fusion
Prior-free Tabular Test-time Adaptation
PonderLM: Pretraining Language Models to Ponder in Continuous Space
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
PSDNorm: Temporal Normalization for Deep Learning in Sleep Staging
FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension
KANO: Kolmogorov-Arnold Neural Operator
Reasoning in Space via Grounding in the World
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
Efficient Autoregressive Inference for Transformer Probabilistic Models
AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
Communication-Efficient Decentralized Optimization via Double-Communication Symmetric ADMM
Q&C: When Quantization Meets Cache in Efficient Generation
Towards Physically Executable 3D Gaussian for Embodied Navigation
NGS-Marker: Robust Native Watermarking for 3D Gaussian Splatting
Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Contextual and Seasonal LSTMs for Time Series Anomaly Detection
From Evaluation to Defense: Advancing Safety in Video Large Language Models
JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks
Embedding-Based Context-Aware Reranker
Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM
M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Scaling up Memory for Robotic Control via Experience Retrieval
Multi-Feature Quantized Self-Attention for Fair Large Language Models
DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
Adaptive Conformal Guidance for Learning under Uncertainty
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
Learning from Synthetic Data Improves Multi-hop Reasoning
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
Combinatorial Rising Bandits
DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining
When Language Models Lose Their Mind: The Consequences of Brain Misalignment
Trade-offs in LLM Compute for Reasoning-Intensive Information Retrieval
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
Dynamic Multi-sample Mixup with Gradient Exploration for Open-set Graph Anomaly Detection
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
Learning linear state-space models with sparse system matrices
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
DSSA: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Dynamic Classifier-Free Diffusion Guidance via Online Feedback
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
Multi-Condition Conformal Selection
CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators
Pulp Motion: Framing-aware multimodal camera and human motion generation
Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation
Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
Dual-Branch Representations with Dynamic Gated Fusion and Triple-Granularity Alignment for Deep Multi-View Clustering
SAQ: Stabilizer-Aware Quantum Error Correction Decoder
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
RegionReasoner: Region-Grounded Multi-Round Visual Reasoning
SimpleGVR: A Simple Baseline for Latent-Cascaded Generative Video Super-Resolution
Prompt-Robust Vision-Language Models via Meta-Finetuning
Local Geometry Attention for Time Series Forecasting under Realistic Corruptions
HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection
Predicting LLM Reasoning Performance with Small Proxy Model
Anime-Ready: Controllable 3D Anime Character Generation with Body-Aligned Component-Wise Garment Modeling
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
PSP: Prompt-Guided Self-Training Sampling Policy for Active Prompt Learning
General Exploratory Bonus for Optimistic Exploration in RLHF
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
DeepEyesV2: Toward Agentic Multimodal Model
DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
Fair Conformal Classification via Learning Representation-Based Groups
Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
PepBenchmark: A Standardized Benchmark for Peptide Machine Learning
Attention Is All You Need for KV Cache in Diffusion LLMs
Behavior Learning
Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts
Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering
MVR: Multi-view Video Reward Shaping for Reinforcement Learning
Multilevel Control Functional
Test-Time Optimization of 3D Point Cloud LLM via Manifold-Aware In-Context Guidance and Refinement
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Proper Velocity Neural Networks
HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion
StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
WIMFRIS: WIndow Mamba Fusion and Parameter Efficient Tuning for Referring Image Segmentation
Hierarchical Encoding Tree with Modality Mixup for Cross-modal Hashing
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Certified vs. Empirical Adversarial Robustness via Hybrid Convolutions with Attention Stochasticity
SpareTrain: Fault-Tolerant LLM Training via Low-Cost Dual Modular Redundancy
Conformalized Hierarchical Calibration for Uncertainty-Aware Adaptive Hashing
PRISM: Partial-label Relational Inference with Spatial and Spectral Cues
Disentangled Hierarchical VAE for 3D Human-Human Interaction Generation
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
On Coreset for LASSO Regression Problem with Sensitivity Sampling
Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets
VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
EditAnyShape: Shape-Aware Image Editing via Trajectory-Guided Region Control
Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations
SR-Scientist: Scientific Equation Discovery With Agentic AI
Verifier-free Test-Time Sampling for Vision Language Action Models
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
HoloPart: Generative 3D Part Amodal Segmentation
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design
Influence Dynamics and Stagewise Data Attribution
Batch Pruning by Activation Stability
SWINGARENA: Adversarial Programming Arena for Long-context GitHub Issue Solving
The Pensieve Paradigm: Stateful Language Models with Learned Memory Management
Nonparametric Teaching of Attention Learners
The End of Manual Decoding: Towards Truly End-to-End Language Models
Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification
Understanding the Learning Phases in Self-Supervised Learning via Critical Periods
Dual Randomized Smoothing: Beyond Global Noise Variance
ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
lmgame-Bench: How Good are LLMs at Playing Games?
Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
Sample Lottery: Unsupervised Discovery of Critical Instances for LLM Reasoning
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
Sapiens2
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity
Mapping Overlaps in Benchmarks through Perplexity in the Wild
Some Neural Networks Inherently Preserve Subspace Clustering Structure
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
Fluent Alignment with Disfluent Judges: Post-training for lower-resource languages
LiveWeb-IE: A Benchmark For Online Web Information Extraction
SONATA: Synergistic Coreset Informed Adaptive Temporal Tensor Factorization
Neural Compression of 3D Meshes using Sparse Implicit Representation
OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds
Dual Language Models: Balancing sample-efficiency and overfitting resilience
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
Critique-RL: Training Critiquing Language Models Through Two-Stage RL for Improved Discrimination and Constructive Feedback
AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
GRACE: Generative Representation Learning via Contrastive Policy Optimization
Don't Forget Its Variance! The Minimum Path Variance Principle for Accurate and Stable Score-Based Density Ratio Estimation
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
Harmonized Cone for Feasible and Non-conflict Directions in Training Physics-Informed Neural Networks
ATLAS: Alibaba Dataset and Benchmark for Learning-Augmented Scheduling
The Geometry of Reasoning: Flowing Logics in Representation Space
Inoculation Prompting: Eliciting traits from LLMs during training can reduce trait expression at test-time
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation
Fastcar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
Scalable In-Context Q-Learning
vAttention: Verified Sparse Attention via Sampling
Feedback-driven recurrent quantum neural network universality
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
Parameterization-Based Dataset Distillation of 3D Point Clouds through Learnable Shape Morphing
Dynamic Parameter Reuse Augments Reasoning via Latent Chain of Thought
Semi-Parametric Contextual Pricing with General Smoothness
CompassNav: Steering From Path Imitation to Decision Understanding In Navigation
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Taming Hierarchical Image Coding Optimization: A Spectral Regularization Perspective
Decoupling Primitive with Experts: Dynamic Feature Alignment for Compositional Zero-Shot Learning
Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
Can LLMs Reason Soundly in Law? Auditing Inference Patterns for Legal Judgment
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
Quantized Visual Geometry Grounded Transformer
QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification
Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model
Critic–Adviser–Reviser Cyclic Refinement: Towards High-Quality EMR Corpus Generation with LLMs
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
Delving into Spectral Clustering with Vision-Language Representations
SpaCE-Eval: A Benchmark for Real-World Multi-Modal Reasoning
Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks
Towards Spatial Supersensing in Video
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
Probabilistic Circuits for Uncertainty Quantification
Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
Fine-tuning Done Right in Model Editing
Branched Schrödinger Bridge Matching
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
Neural Graduated Assignment for Maximum Common Edge Subgraphs
Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows
Decomposition of Concept-Level Rules in Visual Scenes
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
OD$^3$: Optimization-free Dataset Distillation for Object Detection
A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
DeRaDiff: Denoising Time Realignment of Diffusion Models
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
Block-wise Adaptive Caching for Accelerating Diffusion Policy
LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
Enhancing Shortcut Models with Cumulative Self-Consistency Loss for One-Step Diffusion
Detect, Decide, Unlearn: A Transfer-Aware Framework for Continual Learning
From Trajectories to Operators — A Unified Flow Map Perspective on Generative Modeling
InfoNCE Induces Gaussian Distribution
Latent Visual Reasoning
Unlearning during Training: Domain-Specific Gradient Ascent for Domain Generalization
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models
TrojanTO: Action-Level Backdoor Attacks Against Trajectory Optimization Models
Output Supervision Can Obfuscate the Chain of Thought
Temporal Geometry of Deep Networks: Hyperbolic Representations of Training Dynamics for Intrinsic Explainability
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
GraphPlanner: Graph-Based Agentic Routing for LLMs
What (and What Not) are Calibrated Probabilities Actually Useful for?
CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
Interference-Isolated Elastic Weight Consolidation and Knowledge Calibration for Incremental Object Detection
Fast Language Generation through Discrete Diffusion Divergence Instruct
Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes
Single-Loop Byzantine-Resilient Federated Bilevel Optimization
Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
On Universality of Deep Equivariant Networks
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting
SketchEvo: Leveraging Drawing Dynamics for Enhanced Image Synthesis
PerfGuard: A Performance-Aware Agent for Visual Content Generation
MoM: Linear Sequence Modeling with Mixture-of-Memories
Hubble: a Model Suite to Advance the Study of LLM Memorization
Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning
Graph homophily booster: Rethinking the role of discrete features on heterophilic graphs
Revisiting Weight Regularization for Low-Rank Continual Learning
Quantile Advantage Estimation for Entropy-Safe Reasoning
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
Matting Anything 2: Towards Video Matting for Anything
Weight-Space Linear Recurrent Neural Networks
Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
Intrinsic Lorentz Neural Network
Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent
Incorporating Expert Priors into Bayesian Optimization via Dynamic Mean Decay
Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control
Virne: A Comprehensive Benchmark for RL-based Network Resource Allocation in NFV
FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
Relative Entropy Pathwise Policy Optimization
MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning
MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
Shuffling the Data, Extrapolating the Step: Sharper Bias In Constant Step-Size SGD
On Discovering Algorithms for Adversarial Imitation Learning
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
STORK: Faster Diffusion and Flow Matching Sampling by Resolving both Stiffness and Structure-Dependence
Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Pretraining with Re-parametrized Self-Attention: Unlocking Generalization in SNN-Based Neural Decoding Across Time, Brains, and Tasks
FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
LDT: Layer-Decomposition Training Makes Networks More Generalizable
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
Test-time Domain Generalization for Image Super-resolution
AdAEM: An Adaptively and Automated Extensible Evaluation Method of LLMs' Value Difference
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
Fair Decision Utility in Human-AI Collaboration: Interpretable Confidence Adjustment for Humans with Cognitive Disparities
Panoptic Pairwise Distortion Graph
Federated ADMM from Bayesian Duality
ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
Light Differentiable Logic Gate Networks
Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes
Asynchronous Matching with Dynamic Sampling for Multimodal Dataset Distillation
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
Interpolation-Based Conditioning of Flow Matching Models for Bioisosteric Ligand Design
Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter
Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
On the Ability of Deep Networks to Learn Symmetries from Data – A Neural Kernel Theory
COSMO-INR: Complex Sinusoidal Modulation for Implicit Neural Representations
Active Learning of 3D Gaussian Splatting with Consistent Region Partition and Robust Pose Estimation
Lossy Common Information in a Learnable Gray-Wyner Network
G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior
PCLR: Progressively Compressed LoRA for Multimodal Continual Instruction Tuning
TEN-DM: Topology-Enhanced Diffusion Model for Spatio-Temporal Event Prediction
Aligning Deep Implicit Preferences by Learning to Reason Defensively
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
MMReD: a Cross-Modal Benchmark for Dense Context Reasoning
Neural Posterior Estimation with Latent Basis Expansions
Direct Doubly Robust Estimation of Conditional Quantile Contrasts
Hourglass Persistence for Graphs, Simplices, and Cells
WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
The Spacetime of Diffusion Models: An Information Geometry Perspective
SinkTrack: Attention Sink based Context Anchoring for Large Language Models
From Assistant to Independent Developer — Are GPTs Ready for Software Development?
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
Latent Denoising Makes Good Visual Tokenizers
TokMem: Tokenized Procedural Memory for Large Language Models
FreeAdapt: Unleashing Diffusion Priors for Ultra-High-Definition Image Restoration
On the Expressiveness of State Space Models via Temporal Logics
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba
SafeMPO: Constrained Reinforcement Learning with Probabilistic Incremental Improvement
Compositional Neuro-Symbolic Concepts in Neural Activities
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data
MotionGPT3: Human Motion as a Second Modality
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
VINCIE: Unlocking In-context Image Editing from Video
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
Selective Rotary Position Embedding
Achieving Expert-Level Agent from Foundation Model via Complexity Curriculum Reinforcement Learning with Synthetic Data
Understanding Dataset Distillation via Spectral Filtering
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Measuring the Intrinsic Dimension of Earth Representations
Diffusion Language Model Knows the Answer Before It Decodes
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Beyond Match Maximization and Fairness: Retention-Objectified Two-Sided Matching
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Scaling Agents via Continual Pre-training
WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
On the Computational Limits of AI4S-RL: A Unified $\varepsilon$-$N$ Analysis
SCoT: Teaching 3D-LLMs to Think Spatially with Million-scale CoT Annotations
Secret-Protected Evolution for Differentially Private Synthetic Text Generation
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Sparse Imagination for Efficient Visual World Model Planning
Reinforcement Learning from Dynamic Critic Feedback for Free-Form Generations
ROC-n-reroll: How verifier imperfection affects test-time scaling
Speculative Speculative Decoding
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
CoMem: Compositional Concept-Graph Memory for Continual Vision–Language Learning
KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning
SLM-MUX: Orchestrating Small Language Models for Reasoning
Pretrain Value, Not Reward: Decoupled Value Policy Optimization
TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
Federated Learning of Quantile Inference under Local Differential Privacy
AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception
Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory?
Rethinking LLM Reasoning: From Explicit Trajectories to Latent Representations
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
Exploring the Potential of Encoder-free Architectures in 3D LMMs
Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation
Soft Tokens, Hard Truths
Implicit Models: Expressive Power Scales with Test-Time Compute
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing
Si-GT: Fast Interconnect Signal Integrity Analysis for Integrated Circuit Design via Graph Transformers
Smooth Calibration Error: Uniform Convergence and Functional Gradient Analysis
Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation
A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models
Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
Cross-Embodied Co-Design for Dexterous Hands
Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models
ArtUV: Artist-style UV Unwrapping
DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights
Delay Flow Matching
Riemannian High-Order Pooling for Brain Foundation Models
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
Generative View Stitching
WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control
Lifelong Embodied Navigation Learning
KDP: Simplifying Representation Dynamics in Kernel Space
Seeing Through Words: Controlling Visual Retrieval Quality with Language
Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation
To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
Fast and Interpretable Protein Substructure Alignment via Optimal Transport
Decoupling the Class Label and the Target Concept in Machine Unlearning
EAST: Early Action Prediction Sampling Strategy with Token Masking
Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology
JailbreakLoRA: Your Downloaded LoRA from Sharing Platforms might be Unsafe
Towards Understanding Valuable Preference Data for Large Language Model Alignment
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
Deep Global-sense Hard-negative Discriminative Generation Hashing for Cross-modal Retrieval
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples
Autonomous Play with Correspondence-Driven Trajectory Warping
Quotient-Space Diffusion Model
ExGRPO: Learning to Reason from Prior Successes
Diagnosing and Improving Diffusion Models by Estimating Optimal Loss Value
Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
Robust Spiking Neural Networks Against Adversarial Attacks
Neural Dynamics Self-Attention for Spiking Transformers
GGBall: Graph Generative Model on Poincaré Ball
RealBench: A Benchmark for Complex Physical Systems with Real-World Data
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
Imagine How To Change: Explicit Procedure Modeling for Change Captioning
KLAS: Using Similarity to Stitch Neural Networks for an Improved Accuracy-Efficiency Tradeoff
Beyond Markovian Drifts: Action-Biased Geometric Walks with Memory for Personalized Summarization
CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control
USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning Capabilities of LLMs as Urban Agents
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
Koopman-Assisted Trajectory Synthesis: A Data Augmentation Framework for Offline Imitation Learning
Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Reasoning Large Language Models
Locality-Attending Vision Transformer
A Unified Federated Framework for Trajectory Data Preparation via LLMs
AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning
Plan then Act: Bi-level CAD Command Sequence Generation
Learning Global Hypothesis Space for Enhancing Synergistic Reasoning Chain
ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning
MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems
The Overthinking Predicament: When Reasoning Hurts Ranking
Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval
The Power of Small Initialization in Noisy Low-Tubal-Rank Tensor Recovery
A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems
All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
Go-Browse: Training Web Agents with Structured Exploration
Map as a Prompt: Learning Multi-Modal Spatial-Signal Foundation Models for Cross-scenario Wireless Localization
Agent Data Protocol
From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
Parallel Multimodal Diffusion Language Models for Thinking-Aware Editing and Generation
Dynamic Texture Modeling of 3D Clothed Gaussian Avatars from a Single Video
Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts
Learned Meta-Tokens for Language Modeling
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
Understanding Routing Mechanism in Mixture-of-Experts Language Models
Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models
LSA: Layer-wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error
Designing Rules to Pick a Rule: Aggregation by Consistency
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
Closing the Gap Between Text and Speech Understanding in LLMs
Fast Data Mixture Optimization via Gradient Descent
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
PT$^2$-LLM: Post-Training Ternarization for Large Language Models
Regret-Guided Search Control for Efficient Learning in AlphaZero
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization
Differentiable Simulation of Hard Contacts with Soft Gradients for Learning and Control
Mode-conditioning unlocks superior test-time compute scaling
Buckingham $\pi$-Invariant Test‑Time Projection for Robust PDE Surrogate Modeling
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
Depth Anything 3: Recovering the Visual Space from Any Views
Trace Anything: Representing Any Video in 4D via Trajectory Fields
Modeling Others' Minds as Code
Training Large Language Models To Reason In Parallel With Global Forking Tokens
InputDSA: Demixing, then comparing recurrent and externally driven dynamics
Learning to Answer from Correct Demonstrations
SciTS: Scientific Time Series Understanding and Generation with LLMs
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Study of Training Dynamics for Memory-Constrained Fine-Tuning
ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Forest-Based Graph Learning for Semi-Supervised Node Classification
Uncover Underlying Correspondence for Robust Multi-view Clustering
Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
Memorizing Long-tail Data Can Help Generalization Through Composition
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection
R-WoM: Retrieval-augmented World Model For Computer-use Agents
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
The Art of Scaling Reinforcement Learning Compute for LLMs
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
Understanding and improving Shampoo and SOAP via Kullback-Leibler Minimization
ViTSP: A Vision Language Models Guided Framework for Large-Scale Traveling Salesman Problems
Robust Optimization for Mitigating Reward Hacking with Correlated Proxies
Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning
SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing
Hierarchical Semantic-Acoustic Modeling via Semi-Discrete Residual Representations for Expressive End-to-End Speech Synthesis
ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation
Debiased and Denoised Projection Learning for Incomplete Multi-view Clustering
Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
VoG: Enhancing LLM Reasoning through Stepwise Verification on Knowledge Graphs
SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models
Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
Zeros can be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores
RL makes MLLMs see better than SFT
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Understanding Collaboration Mechanism In VAE Recommender Systems
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
Thicker and Quicker: The Jumbo Token for Fast Plain Vision Transformers
FlowGen: Synthesizing Diverse Flowcharts to Enhance and Benchmark MLLM Reasoning
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
From Gradient Volume to Shapley Fairness: Towards Fair Multi-Task Learning
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Neural Networks Learn Multi-Index Models Near the Information-Theoretic Limit
Breaking Barriers: Do Reinforcement Fine-tuning Gains Transfer To Unseen Domains?
Does Higher Interpretability Imply Better Utility? A Pairwise Analysis on Sparse Autoencoders
Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
Distributionally Robust Cooperative Multi-agent Reinforcement Learning with Value Factorization
Embodied Navigation Foundation Model
Rapid Training of Hamiltonian Graph Networks Using Random Features
Lookup multivariate Kolmogorov-Arnold Networks
Supporting Multimodal Intermediate Fusion with Informatic Constraint and Distribution Coherence
ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection
Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for One-/Two-step High-Fidelity Audio Generation
Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
Rethinking the Diffusion Model from a Langevin Perspective
MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation
FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding
Model Predictive Adversarial Imitation Learning for Planning from Observation
Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis
From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
Identifying Robust Neural Pathways: Few-Shot Adversarial Mask Tuning for Vision-Language Models
Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
Stable and Scalable Deep Predictive Coding Networks with Meta Prediction Errors
sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals
Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
Frequency-Domain Better than Time-Domain for Causal Structure Recovery in Dynamical Systems on Networks
Flow Actor-Critic for Offline Reinforcement Learning
RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
Heterogeneous Front-Door Effects: Debiased Estimation with Quasi-Oracle Guarantees
LightCtrl: Training-free Controllable Video Relighting
STEM: Scaling Transformers with Embedding Modules
Towards Prompt-Robust Machine-Generated Text Detection
Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer
Do Large Language Models Know What They Are Capable Of?
Gumbel Distillation for Parallel Text Generation
Deep Think with Confidence
Multi-agent Coordination via Flow Matching
Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts
Cautious Weight Decay
Mechanistic Detection and Mitigation of Hallucination in Large Reasoning Models
Enhancing Agentic Search via Data Synthesis on Hierarchical Constraint Satisfaction
Loc$^{2}$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching
A Genetic Algorithm for Navigating Synthesizable Molecular Spaces
gen2seg: Generative Models Enable Generalizable Instance Segmentation
WAFT: Warping-Alone Field Transforms for Optical Flow
Does Weak-to-strong Generalization Happen under Spurious Correlations?
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
Visual Planning: Let's Think Only with Images
QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response
Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
LFQA-E: Carefully Benchmarking Long-form QA Evaluation
Learning to Parallel: Accelerating Diffusion Large Language Models via Adaptive Parallel Decoding
EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
CARD: Towards Conditional Design of Multi-agent Topological Structures
EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning
SPACeR: Self-Play Anchoring with Centralized Reference Models
Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning
3DSMT: A Hybrid Spiking Mamba-Transformer for Point Cloud Analysis
DiffuDETR: Rethinking Detection Transformers with Diffusion Process
Revisiting Matrix Sketching in Linear Bandits: Achieving Sublinear Regret via Dyadic Block Sketching
GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
Pusa V1.0: Unlocking Temporal Control in Pretrained Video Diffusion Models via Vectorized Timestep Adaptation
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
Counterfactual Explanations on Robust Perceptual Geodesics
STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure TransFormer for Offline Multi-task Multi-agent Reinforcement Learning
Composite Optimization with Error Feedback: the Dual Averaging Approach
Enabling arbitrary inference in spatio-temporal dynamic systems: A physics-inspired perspective
Dual Goal Representations
Continual Low-Rank Adapters for LLM-based Generative Recommender Systems
Generalized Spherical Neural Operators: Green’s Function Formulation
Decoupled Q-Chunking
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
Readout Representation: Redefining Neural Codes by Input Recovery
Bridging Radiology and Pathology Foundation Models via Concept-Based Multimodal Co-Adaptation
Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
Adaptive Thinking: Large Language Models Know When to Think in Latent Space
Beyond Instance-Level Alignment: Dual-Level Optimal Transport for Audio-Text Retrieval
Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems
Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis
Learning Patient-Specific Disease Dynamics With Latent Flow Matching For Longitudinal Imaging Generation
Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities
Entropy-preserving reinforcement learning
RigidSSL: Rigidity-based Geometric Pretraining for Protein Generation
The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport
DVLA-RL: Dual-Level Vision–Language Alignment with Reinforcement Learning Gating for Few-Shot Learning
AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
Flow Autoencoders are Effective Protein Tokenizers
Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs
Mesh Splatting for End-to-end Multiview Surface Reconstruction
DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
Beyond the Known: An Unknown-Aware Large Language Model for Open-Set Text Classification
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
Beyond the Heatmap: A Rigorous Evaluation of Component Impact in MCTS-Based TSP Solvers
The Geometry and Topology of Circuits: the Manifolds of Modular Addition
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers
Topology-Preserved Auto-regressive Mesh Generation in the Manner of Weaving Silk
Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions
The Adversarial Conditioning Paradox: Why Attacked Inputs Are More Stable, Not Less
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
Once-More: Continuous Self-Correction for Large Language Models via Perplexity-Guided Intervention
How Far Can Unsupervised RLVR Scale LLM Training?
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
OFMU: Optimization-Driven Framework for Machine Unlearning
Spectral Attention Steering for Prompt Highlighting
Disentanglement of Variations with Multimodal Generative Modeling
GRO-RAG: Gradient-aware Re-rank Optimization for Multi-source Retrieval-Augmented Generation
Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions
Compositional Visual Planning via Inference-Time Diffusion Scaling
Detection of unknown unknowns in autonomous systems
Gradient-Normalized Smoothness for Optimization with Approximate Hessians
PerFit: Exploring Personalization Shifts in Representation Space of LLMs
Compositional Diffusion with Guided search for Long-Horizon Planning
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
GARLIC: Graph Attention-based Relational Learning of Multivariate Time Series in Intensive Care
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
Process-Verified Reinforcement Learning for Theorem Proving via Lean
Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
DiSRouter: Distributed Self-Routing for LLM Selections
Point Prompting: Counterfactual Tracking with Video Diffusion Models
Sequential Information Bottleneck Fusion: Towards Robust and Generalizable Multi-Modal Brain Tumor Segmentation
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
Visual Prompt-Agnostic Evolution
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
WebArbiter: A Generative Reasoning Process Reward Model for Web Agents
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
RAG4DMC: Retrieval-Augmented Generation for Data-Level Modality Completion
Geometric-Mean Policy Optimization
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
InnoGym: Benchmarking the Innovation Potential of AI Agents
Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
Beyond Linear Processing: Dendritic Bilinear Integration in Spiking Neural Networks
Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning
Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
FoNE: Precise Single-Token Number Embeddings via Fourier Features
SIM-CoT: Supervised Implicit Chain-of-Thought
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
Hilbert-Guided Sparse Local Attention
Multi-Scale Hypergraph Meets LLMs: Aligning Large Language Models for Time Series Analysis
HistoPrism: Unlocking Functional Pathway Analysis from Pan-Cancer Histology via Gene Expression Prediction
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
Language Models Use Lookbacks to Track Beliefs
High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
A Tale of Two Smoothness Notions: Adaptive Optimizers and Non-Euclidean Descent
On Natural Ways to Generate and Their Provable Power
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models
OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
DA$^2$: Depth Anything in Any Direction
SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
From Predictors to Samplers via the Training Trajectory
FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Better Bounds for the Distributed Experts Problem
CatalystBench: A Comprehensive Multi-Task Benchmark for Advancing Language Models in Catalysis Science
Gradient-Direction-Aware Density Control for 3D Gaussian Splatting
Count Counts: Motivating Exploration in LLM Reasoning with Count-based Intrinsic Rewards
Streaming Visual Geometry Transformer
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
Deep Learning for Subspace Regression
Latent Diffusion Model without Variational Autoencoder
QKV Projections Require a Fraction of Their Memory
S2GO: Streaming Sparse Gaussian Occupancy
Uniform Discrete Diffusion with Metric Path for Video Generation
Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
CoDA: From Text-to-Image Diffusion Models to Truly Training-Free Dataset Distillation
ACE-Bench: Benchmarking Agentic Coding in End-to-End Development of Complex Features
Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
PE-SGD: Differentially Private Deep Learning via Evolution of Gradient Subspace for Text
LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
Forge: Compiling a Unified Abstraction into Scalable Kernels for Linear Attention
Universal and Efficient Loading Balancing for RL Training of Large Multimodal Models
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
Dynamic Early Exit in Reasoning Models
TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment
Smarter Not Harder: Generative Process Evaluation with Intrinsic-Signal Driving and Ability-Adaptive Reward Shaping
Controllable Video Generation with Provable Disentanglement
ConRep4CO: Contrastive Representation Learning of Combinatorial Optimization Instances across Types
Learning for Highly Faithful Explainability
Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
Efficient Credal Prediction through Decalibration
Intrinsic Entropy of Context Length Scaling in LLMs
Anchored Supervised Fine-Tuning
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
Evolution of Flash Attention
Anatomy-aware Representation Learning for Medical Ultrasound
Generative Human Geometry Distribution
ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training
Implicit 4D Gaussian Splatting for Fast Motion with Large Inter-Frame Displacements
StyliTruth: Unlocking Stylized yet Truthful LLM Generation via Disentangled Steering
COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics
Unified Registration of Cortical and Subcortical Structures
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
$\boldsymbol{\partial^\infty}$-Grid: Differentiable Grid Representations for Fast and Accurate Solutions to Differential Equations
RL's Razor: Why Online Reinforcement Learning Forgets Less
Plug, Play, and Fortify: A Low-Cost Module for Robust Multimodal Image Understanding Models
ReFocusEraser: Refocusing for Small Object Removal with Robust Context-Shadow Repair
UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction
PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection
Greater than the Sum of Its Parts: Building Substructure into Protein Encoding Models
Identifiability and recoverability in self-supervised models
Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
GTool: Graph Enhanced Tool Planning with Large Language Model
Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning
ReDDiT: Rehashing Noise for Discrete Visual Generation
Adjusting Prediction Model Through Wasserstein Geodesic for Causal Inference
Matching without Group Barrier for Heterogeneous Treatment Effect Estimation
Sample Reward Soups: Query-efficient Multi-Reward Guidance for Text-to-Image Diffusion Models
Robustness in the Face of Partial Identifiability in Reward Learning
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
TS$^2$: Training with Sparsemax+, Testing with Softmax for Accurate and Diverse LLM Fine-Tuning
Learning to Orchestrate Agents in Natural Language with the Conductor
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Trinity: An Evolved LLM Coordinator
ULTRA-360: Unconstrained Dataset for Large-scale Temporal 3D Reconstruction across Altitudes and Omnidirectional Views
AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution
Distribution-Aware Multi-Granularity Phase Coding: Towards Lower Conversion Error for Spike-Driven Large Language Models
Test-Time Adaptation without Source Data for Out-of-Domain Bioactivity Prediction
A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
Radiometrically Consistent Gaussian Surfels for Inverse Rendering
No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings
SPR$^2$Q: Static Priority-based Rectifier Routing Quantization for Image Super-Resolution
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
Constant Degree Matrix-Driven Incomplete Multi-View Clustering via Connectivity-Structure and Embedding Tensor Learning
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
Spatial Structure and Selective Text Jointly Facilitate Image Clustering
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing
Robust Selective Activation with Randomized Temporal K-Winner-Take-All in Spiking Neural Networks for Continual Learning
D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping
On learning linear dynamical systems in context with attention layers
BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
Joint Optimization for 4D Human-Scene Reconstruction in the Wild
Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding
Understanding the Mechanisms of Fast Hyperparameter Transfer
MAGO: Beyond Fixed Hyperparameters with Multi-Objective Pareto Optimization for Hybrid LLM Reasoning
Shift-and-Sum Quantization for Visual Autoregressive Models
Value Matching: Scalable and Gradient-Free Reward-Guided Flow Adaptation
PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
EXP-Bench: Can AI Conduct AI Research Experiments?
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
MindPilot: Closed-loop Visual Stimulation Optimization for Brain Modulation with EEG-guided Diffusion
Risk-Sensitive Reinforcement Learning for Alleviating Exploration Dilemmas in Large Language Models
Evaluating Text Creativity across Diverse Domains: a Dataset and Large Language Model Evaluator
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
APC-RL: Exceeding data-driven behavior priors with adaptive policy composition
Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
Token-level Data Selection for Safe LLM Fine-tuning
Efficient and Sharp Off-Policy Learning under Unobserved Confounding
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Learning Efficient and Interpretable Multi-Agent Communication
Counterfactual Reasoning for Retrieval-Augmented Generation
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs
PEAR: Phase Entropy Aware Reward for Efficient Reasoning
Differentially Private Equilibrium Finding in Polymatrix Games
Rethinking Consistent Multi-Label Classification under Inexact Supervision
GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection
Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring
Proximal Supervised Fine-Tuning
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
Developmental Federated Tuning: A Cognitive-Inspired Paradigm for Efficient LLM Adaptation
Data Selection for LLM Alignment Using Fine-Grained Preferences
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning
CerebraGloss: Instruction-Tuning a Large Vision-Language Model for Fine-Grained Clinical EEG Interpretation
Efficient Message-Passing Transformer for Error Correcting Codes
Understanding and Improving Continuous LLM Adversarial Training via In-context Learning Theory
Q-learning with Posterior Sampling
Causal Discovery via Quantile Partial Effect
DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents
Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets
SPELL: Self-Play Reinforcement Learning for evolving Long-Context Language Models
Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents
Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping
FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism
AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Learning on a Razor’s Edge: Identifiability and Singularity of Polynomial Neural Networks
Sampling Complexity of TD and PPO in RKHS
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
MIRACLE: Model-free Imitation and Reinforcement Learning for Adaptive Cut-Selection
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
MoMa: A Simple Modular Learning Framework for Material Property Prediction
Query-Guided Spatial–Temporal–Frequency Interaction for Music Audio–Visual Question Answering
Video-As-Prompt: Unified Semantic Control for Video Generation
Graphon Cross-Validation: Assessing Models on Network Data
Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization
Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies
Misalignments and RL Failure Modes in the Early Stage of Superintelligence
Globally aware optimization with resurgence
Faster Vision Transformers with Adaptive Patches
Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
LEAP: Local ECT-Based Learnable Positional Encodings for Graphs
From movement to cognitive maps: recurrent neural networks reveal how locomotor development shapes hippocampal spatial coding
WithAnyone: Toward Controllable and ID Consistent Image Generation
Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
Unified In-Context Video Editing
Wavelet Predictive Representations for Non-Stationary Reinforcement Learning
NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction
Entropy-Based Block Pruning for Efficient Large Language Models
Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?
On the Impact of the Utility in Semivalue-based Data Valuation
CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation
DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing
VFScale: Intrinsic Reasoning through Verifier-Free Test-time Scalable Diffusion Model
GTA1: GUI Test-time Scaling Agent
Fast training of accurate physics-informed neural networks without gradient descent
SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network
DVD-Quant: Data-free Video Diffusion Transformers Quantization
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
TetraGT: Tetrahedral Geometry-Driven Explicit Token Interactions with Graph Transformer for Molecular Representation Learning
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
Low-pass Personalized Subgraph Federated Recommendation
Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
Provably Accelerated Imaging with Restarted Inertia and Score-based Image Priors
CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering
Narrow Finetuning Leaves Clearly Readable Traces in the Activation Differences
HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
Plan-Answer-Refine-on-Graph: Structured Planning and Self-Refinement for Large Language Model Reasoning on Knowledge Graphs
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
STVG-R1: Incentivizing Instance-Level Reasoning and Grounding in Videos via Reinforcement Learning
ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance Mask
reAR: Rethinking Visual Autoregressive Models via Token-wise Consistency Regularization
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Teach to Reason Safely: Policy-Guided Safety Tuning for MLRMs
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
GhostEI-Bench: Do Mobile Agent Resilience to Environmental Injection in Dynamic On-Device Environments?
Imitation Learning as Return Distribution Matching
Lightweight Spatio-Temporal Modeling via Temporally Shifted Distillation for Real-Time Accident Anticipation
Learning Koopman Representations with Controllability Guarantees
Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization
RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
Aurora: Towards Universal Generative Multimodal Time Series Forecasting
CheckMate! Watermarking Graph Diffusion Models in Polynomial Time
Safety Subspaces are Not Linearly Distinct: A Fine-Tuning Case Study
Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
ConfHit: Conformal Generative Design via Nested Testing
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
Leveraging Pretrained Knowledge at Inference Time: LoRA-Gated Contrastive Decoding for Multilingual Factual Language Generation in Adapted LLMs
Difference-Aware Retrieval Policies for Imitation Learning
Missingness Bias Calibration in Feature Attribution Explanations
K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control
GAP: Gradient Adjustment with Phase-guidance for Robust Vision-Proprioception Policies in Robotic Manipulation
TINY BUT MIGHTY: A SOFTWARE-HARDWARE CO-DESIGN APPROACH FOR EFFICIENT MULTIMODAL INFERENCE ON BATTERY-POWERED SMALL DEVICES
Watermarking Diffusion Language Models
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
Provable Guarantees for Automated Circuit Discovery in Mechanistic Interpretability
VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis
Unlocking Long-Horizon Agentic Search with Large-Scale End-to-End RL
Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability
Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains
STEDiff: Revealing the Spatial and Temporal Redundancy of Backdoor Attacks in Text-to-Image Diffusion Models
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
RLP: Reinforcement as a Pretraining Objective
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning
CortiLife: A Unified Framework for Cortical Representation Learning across the Lifespan
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
I-DRUID: Layout to image generation via instance-disentangled representation and unpaired data
Flowing Through States: Neural ODE Regularization for Reinforcement Learning
Spectral-guided Physical Dynamics Distillation
TusoAI: Agentic Optimization for Scientific Methods
Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
Programming with Pixels: Can Computer-Use Agents do Software Engineering?
OrthoSolver: A Neural Proper Orthogonal Decomposition Solver For PDEs
Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement
AP-OOD: Attention Pooling for Out-of-Distribution Detection
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Textual Equilibrium Propagation for Deep Compound AI Systems
Lossless Vocabulary Reduction for Auto-Regressive Language Models
DUET: Optimizing Training Data Mixtures via Coarse, Noisy Feedback from Unseen Evaluation Tasks
S$^2$-Guidance: Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
When Thinking Backfires: Mechanistic Insights into Reason-induced Misalignment
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
Latent Concept Disentanglement in Transformer-based Language Models
Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning
HDR-4DGS: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Vision Language Models are Biased
Scaling Direct Feedback Learning with Theoretical Guarantees
High-dimensional Mean-Field Games by Particle-based Flow Matching
QeRL: Beyond Efficiency - Quantization-enhanced Reinforcement Learning for LLMs
There Was Never a Bottleneck in Concept Bottleneck Models
Divide and Abstract: Autoformalization via Decomposition and Abstraction Learning
Learning Dynamic Causal Graphs Under Parametric Uncertainty via Polynomial Chaos Expansions
VeriRole: Verifiable Role-Awareness through Hint-Guided Reinforcement Learning
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
Do We Really Need Permutations? Impact of Width Expansion on Linear Mode Connectivity
Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks
Is Finer Better? The Limits of Microscaling Formats in Large Language Models
ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression
Dual Distillation for Few-Shot Anomaly Detection
Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis
Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
Superficial Safety Alignment Hypothesis
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
Ads that Stick: Near-Optimal Ad Optimization through Psychological Behavior Models
Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
MambaSL: Exploring Single-Layer Mamba for Time Series Classification
Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis
Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
Theoretical Guarantees for Causal Discovery on Large Random Graphs
Parallel Token Generation for Language Models
Measure Twice, Cut Once: A Semantic-Oriented Approach to Video Temporal Localization with Video LLMs
Cooperative Sheaf Neural Networks
Multi-Marginal Flow Matching with Adversarially Learnt Interpolants
The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
Oracle-efficient Hybrid Learning with Constrained Adversaries
Think Then Embed: Generative Context Improves Multimodal Embedding
Social Agents: Collective Intelligence Improves LLM Predictions
Reducing Contextual Stochastic Bilevel Optimization via Structured Function Approximation
Dual-Path Condition Alignment for Diffusion Transformers
Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Generation
PetaGAIL++: Utility Optimized Private Trajectory Generation with Imitation Learning
LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models
VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
Multimodal Classification via Total Correlation Maximization
AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
CARPRT: Class-Aware Zero-Shot Prompt Reweighting for Vision-Language Model
In-Context Algorithm Emulation in Fixed-Weight Transformers
Transformers as a Measure-Theoretic Associative Memory: A Statistical Perspective
Identifying and Evaluating Inactive Heads in Pretrained LLMs
LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations
PICABench: How Far are We from Physical Realistic Image Editing?
Zero-Sacrifice Lifelong Adversarial Defense for Pre-Trained Encoders
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
WebDS: An End-to-End Benchmark for Web-based Data Science
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
Simplex Constrained Sparse Optimization via Tail Screening
Animal behavioral analysis and neural encoding with transformer-based self-supervised pretraining
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies
A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input
Faithfulness Under the Distribution: A New Look at Attribution Evaluation
Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
Towards Safe and Optimal Online Bidding: A Modular Look-ahead Lyapunov Framework
ViPO: Visual Preference Optimization at Scale
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization
No outlier channels but with outlier blocks
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models
CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation
Do 3D Large Language Models Really Understand 3D Spatial Relationships?
KL-Regularized Reinforcement Learning is Designed to Mode Collapse
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
Eigen-1: Scientific Reasoning through Adaptive Multi-Agent Refinement and Monitor-based RAG
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
GRAM-DTI: Adaptive Multimodal Representation Learning for Drug–Target Interaction Prediction
WOW-Seg: A Word-free Open World Segmentation Model
Can we generate portable representations for clinical time series data using LLMs?
EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models
Out-of-Distribution Graph Models Merging
Escaping the Homophily Trap: A Threshold-free Graph Outlier Detection Framework via Clustering-guided Edge Reweighting
A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
Multi-Resolution Score-Based Variational Graphical Diffusion for Causal Inference on Latent Systems
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Hidden Patterns in Chain-of-Thought Reasoning
SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
Robust Federated Inference
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
Minimax Optimal Adversarial Reinforcement Learning
Dichotomous Diffusion Policy Optimization
DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies
ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks
One-Step Flow Q-Learning: Addressing the Diffusion Policy Bottleneck in Offline Reinforcement Learning
TrajFlow: Nation-wide Pseudo GPS Trajectory Generation with Flow Matching Models
PixNerd: Pixel Neural Field Diffusion
PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Verification of the Implicit World Model in a Generative Model via Adversarial Sequences
Culture in Action: Evaluating Text-to-Image Models through Social Activities
ASTRAEA: A Token-wise Acceleration Framework for Video Diffusion Transformers
An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems
IDER: IDEMPOTENT EXPERIENCE REPLAY FOR RELIABLE CONTINUAL LEARNING
Concept-TRAK: Understanding how diffusion models learn concepts through concept attribution
Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
ReSplat: Degradation-agnostic Feed-forward Gaussian Splatting via Self-guided Residual Diffusion
EmoPrefer: Can Large Language Models Understand Human Emotion Preferences?
AudioTrust: Benchmarking The Multifaceted Trustworthiness of Audio Large Language Models
Improved Quality, Synchrony, and Preference Alignment for Joint Audio-Video Generation
SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
CP-Agent: Context‑Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach
Unbiased Gradient Estimation for Event Binning via Functional Backpropagation
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Architectures
Product-Quantised Image Representation for High-Quality Image Synthesis
AutoQVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization
Low rank adaptation of chemical foundation models generate effective odorant representations
On the Reasoning Abilities of Masked Diffusion Language Models
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Method
GaitSnippet: Gait Recognition Beyond Unordered Sets and Ordered Sequences
Activation Steering for LLM Alignment via a Unified ODE-Based Framework
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
More Than What Was Chosen: LLM-based Explainable Recommendation Beyond Noisy User Preferences
Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
MergOPT: A Merge-Aware Optimizer for Robust Model Merging
TFHE-Coder: Evaluating LLM Agents for secure Fully Homomorphic Encryption Code Generation
CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
Consistency Geodesic Bridge: Image Restoration with Pretrained Diffusion Models
Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Reconstructing KV Caches with Cross-Layer Fusion for Enhanced Transformers
Scaling Speech Tokenizers with Diffusion Autoencoders
LLMS ON TRIAL: Evaluating Judicial Fairness For Large Language Models
Systematic Biosafety Evaluation of DNA Language Models under Jailbreak Attacks
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
TRACEDET: HALLUCINATION DETECTION FROM THE DECODING TRACE OF DIFFUSION LARGE LANGUAGE MODELS
Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
MAPSS: Manifold-based Assessment of Perceptual Source Separation
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
Inconsistency Biases in Dynamic Data Pruning
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
IF-VidCap: Can Video Caption Models Follow Instructions?
RADAR: Reasoning–Ability and Difficulty-Aware Routing in Language Models
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
YuE: Scaling Open Foundation Models for Long-Form Music Generation
Segment-Level Attribution for Selective Learning of Long Reasoning Traces
Towards Sustainable Investment Policies Informed by Opponent Shaping
Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards
(Token-Level) InfoRMIA: Stronger Membership Inference and Privacy Assessment for LLMs
Real-Time Motion-Controllable Autoregressive Video Diffusion
Modality-free Graph In-context Alignment
Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
Flow Matching with Semidiscrete Couplings
BioBO: Biology-informed Bayesian Optimization for Perturbation Design
Transfer Paramatters: Optimal per-Module Hyperparameters Across All Scaling Axes
Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation
Beyond Text-Only: Towards Multimodal Table Retrieval in Open-World
When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
SCI-Verifier: Scientific Verifier with Thinking
WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
DanceTogether: Generating Interactive Multi-Person Video without Identity Drifting
CellAgent: LLM-Driven Multi-Agent Framework for Natural Language-Based Single-Cell Analysis
MASS: MoErging through Adaptive Subspace Selection
TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
TOUCH: Text-guided Controllable Generation of Free-Form Hand-Object Interactions
ViMo: A Generative Visual GUI World Model for App Agents
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Score-Based Density Estimation from Pairwise Comparisons
Diffusion Negative Preference Optimization Made Simple
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime
VLMgineer: Vision-Language Models as Robotic Toolsmiths
ASMIL: Attention-Stabilized Multiple Instance Learning for Whole-Slide Imaging
CSRv2: Unlocking Ultra-Sparse Embeddings
Figma2Code: Automating Multimodal Design to Code in the Wild
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Replicable Reinforcement Learning with Linear Function Approximation
ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
Fixing the Broken Compass: Diagnosing and Improving Inference-Time Reward Modeling
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
XIL: Cross-Expanding Incremental Learning
Multiple Token Divergence: Measuring and Steering In-Context Computation Density
LeRobot: An Open-Source Library for End-to-End Robot Learning
Differential Fine-Tuning Large Language Models Towards Better Diverse Reasoning Abilities
SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
Identity-Free Deferral For Unseen Experts
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
DynamicInfer: Runtime-Aware Sparse Offloading for LLMs Inference on a Consumer-Grade GPU
Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
ToolTree: Efficient LLM Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
SoFlow: Solution Flow Models for One-Step Generative Modeling
Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
Soft Equivariance Regularization for Invariant Self-Supervised Learning
Spiking Discrepancy Transformer for Point Cloud Analysis
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
SMixer: Rethinking Efficient-Training and Event-Driven SNNs
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
One Patch Doesn’t Fit All: Adaptive Patching for Native-Resolution Multimodal Large Language Models
HippoTune: A Hippocampal Associative Loop–Inspired Fine-Tuning Method for Continual Learning
Using Graph Neural Networks in Reinforcement Learning: A Practical Guide
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition
Autoregressive Visual Decoding from EEG Signals
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Relative Value Learning
Machine Unlearning under Retain–Forget Entanglement
On the Generalization Capacities of MLLMs for Spatial Intelligence
Exploring State-Space Models for Data-Specific Neural Representations
Diversity-Aware Online Prompt Assignment to Generative Models
Path Matters: Unveiling Geometric Implicit Bias via Curvature-Aware Sparse View Optimization
Reversible Primitive–Composition Alignment for Continual Vision–Language Learning
Meta-UCF: Unified Task-Conditioned LoRA Generation for Continual Learning in Large Language Models
Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Vision–Language Continual Learning
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Adaptive Nonlinear Compression for Large Foundation Models
Generative Universal Verifier as Multimodal Meta-Reasoner
Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
Can Language Models Discover Scaling Laws?
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
PathChat-SegR1: Reasoning Segmentation in Pathology via SO-GRPO
EVALUATING MEMORY IN LLM AGENTS VIA INCREMENTAL MULTI-TURN INTERACTIONS
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
OmniPortrait: Fine-Grained Personalized Portrait Synthesis via Pivotal Optimization
Count Bridges enable Modeling and Deconvolving Transcriptomics
LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition
Primal-Dual Policy Optimization for Adversarial Linear CMDPs
Provable Separations between Memorization and Generalization in Diffusion Models
Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in its Latent Thoughts
CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling
Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
TimeSeriesExamAgent: Creating TimeSeries Reasoning Benchmarks at Scale
Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation
Distributions as Actions: A Unified Framework for Diverse Action Spaces
A Physics-Inspired Optimizer: Velocity Regularized Adam
LiFR-Seg: Anytime High-Frame-Rate Segmentation via Event-Guided Propagation
Test-Time Poisoned Sample Detection by Exploiting Shallow Malicious Matching in Backdoored CLIP
From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation
DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization
Human Behavior Atlas: Benchmarking Unified Psychological And Social Behavior Understanding
FlashWorld: High-quality 3D Scene Generation within Seconds
Alignment-Enhanced Integration of Connectivity and Spectral Sparse in Dynamic Sparse Training of LLM
CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model
Condition Matters in Full-head 3D GANs
First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
HLD: Approximate Hierarchical Linguistic Distribution Modeling for LLM-Generated Text Detection
Towards Understanding the Shape of Representations in Protein Language Models
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
High-Probability Bounds for the Last Iterate of Clipped SGD
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data
Sparse Autoencoders Trained on the Same Data Learn Different Features
Random-projection ensemble dimension reduction
Enhancing Molecular Property Predictions by Learning from Bond Modelling and Interactions
Rethinking Causal Mask Attention for Vision-Language Inference
Taming Curvature: Architecture Warm-up for Stable Transformer Training
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
Causal Discovery in the Wild: A Voting-Theoretic Ensemble Approach
LiveClin: A Live Clinical Benchmark without Leakage
Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data
FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation
Learning to Reason for Hallucination Span Detection
VOGUE: Unified Understanding, Generation, and Editing for Videos
An Overview of Subliminal Learning
Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas
Robustify Spiking Neural Networks via Dominant Singular Deflation under Heterogeneous Training Vulnerability
Why AI Evaluations Need Error Bars
DGNet: Learning Spatiotemporal PDEs with Discrete Green Networks
Improving Black-Box Generative Attacks via Generator Semantic Consistency
Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields
Adaptive Acquisition Selection for Bayesian Optimization with Large Language Models
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better
SDErasure: Concept-Specific Trajectory Shifting for Concept Erasure via Adaptive Diffusion Classifier
TPDiff: Temporal Pyramid Video Diffusion Model
PINFDiT: Energy-Based Physics-Informed Diffusion Transformers for General-purpose Time Series Tasks
ResearchRubrics: A Benchmark of Prompts and Rubrics For Deep Research Agents
BaseReward: A Strong Baseline for Multimodal Reward Model
TTS Can Speak in Any Style with Any Voice
KVComm: Enabling Efficient LLM Communication through Selective KV Sharing
Reducing information dependency does not cause training data privacy. Adversarially non-robust features do.
Learning to Generate Stylized Handwritten Text via a Unified Representation of Style, Content, and Noise
Test-Time Iterative Error Correction for Efficient Diffusion Models
OrchestrationBench: LLM-Driven Agentic Planning and Tool Use in Multi-Domain Scenarios
PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
SkyEvents: A Large-Scale Event-enhanced UAV Dataset for Robust 3D Scene Reconstruction
Towards a Foundation Model for Crowdsourced Label Aggregation
Multi-ReduNet: Interpretable Class-Wise Decomposition of ReduNet
Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Towards Personalized Deep Research: Benchmarks and Evaluations
Hot Fuzz: Temperature-Tunable Composition of Diffusion models with Fuzzy Logic
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts
Learning Human Habits with Rule-Guided Active Inference
Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
In-context learning of representations can be explained by induction circuits
Softmax Transformers are Turing-Complete
A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers
D&R: Recovery-based AI-Generated Text Detection via a Single Black-box LLM Call
Flow-Disentangled Feature Importance
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Separable Neural Networks: Approximation Theory, NTK Regime, and Preconditioned Gradient Descent
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors
FASA: FREQUENCY-AWARE SPARSE ATTENTION
Two-Layer Convolutional Autoencoders Trained on Normal Data Provably Detect Unseen Anomalies
A General Framework for Black-Box Attacks Under Cost Asymmetry
Multi-Domain Transferable Graph Gluing for Building Graph Foundation Models
Uncertainty Estimation via Hyperspherical Confidence Mapping
Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion
Robust LLM Unlearning via Post Judgment and Multi-round Thinking
Generative Bayesian Optimization: Generative Models as Acquisition Functions
Understanding and Relaxing the Limitations of Transformers for Linear Algebra
Scaling Behavior of Discrete Diffusion Language Models
Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
Learning to Reason via Mixture-of-Thought for Logical Reasoning
Conditionally Whitened Generative Models for Probabilistic Time Series Forecasting
Adaptive Social Learning via Mode Policy Optimization for Language Agents
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation
Evaluating SAE interpretability without generating explanations
MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
An efficient, provably optimal, practical algorithm for the 0-1 loss linear classification problem
3D Aware Region Prompted Vision Language Model
Beyond Speedup - Utilizing KV Cache for Sampling and Reasoning
A Bayesian Nonparametric Framework For Learning Disentangled Representations
Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging
OrthoRF: Exploring Orthogonality in Object-Centric Representations
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation
Uncovering Robot Vulnerabilities through Semantic Potential Fields
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
Analysis of approximate linear programming solution to Markov decision problem with log barrier function
Let's (not) just put things in Context: Test-time Training for Long-context LLMs
Unifying Diffusion and Autoregression for Generalizable Vision-Language-Action Model
VIRTUE: Visual-Interactive Text-Image Universal Embedder
Product of Experts for Visual Generation
Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation
DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects
FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
No labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
AFD-INSTRUCTION: A Comprehensive Antibody Instruction Dataset with Functional Annotations for LLM-Based Understanding and Design
TurboBoA: Faster and Exact Attention-aware Quantization without Backpropagation
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
Null-Space Filtering for Data-free Continual Model Merging: Preserving Transparency, Promoting Fidelity
Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization
Tree-sliced Sobolev IPM
Efficient Submodular Maximization for Sums of Concave over Modular Functions
ICDiffAD: Implicit Conditioning Diffusion Model for Time Series Anomaly Detection
Adaptive Hopfield Network: Rethinking Similarities in Associative Memory
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy
Online Prediction of Stochastic Sequences with High Probability Regret Bounds
Exchangeability of GNN Representations with Applications to Graph Retrieval
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
Theoretical Modeling of Large Language Model Self-Improvement Training Dynamics Through Solver-Verifier Gap
GaussianFusion: Unified 3D Gaussian Representation for Multi-Modal Fusion Perception
Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
FACT: Fine-grained Across-variable Convolution for Multivariate Time Series Forecasting
Sample-efficient evidence estimation of score based priors for model selection
PixelCraft: A Multi-Agent system for High-Fidelity Visual Reasoning on Structured Images
Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
CARL: Preserving Causal Structure in Representation Learning
CellDuality: Unlocking Biological Reasoning in LLMs with Self-Supervised RLVR
Tackling Heavy-Tailed Q-Value Bias in Offline-to-Online Reinforcement Learning with Laplace-Robust Modeling
UniCA: Unified Covariate Adaptation for Time Series Foundation Model
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
Universal Properties of Activation Sparsity in Modern Large Language Models
The Counting Power of Transformers
LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
Hybrid Training for Vision-Language-Action Models
Contact Wasserstein Geodesics for Non-Conservative Schrödinger Bridges
ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation
VisualPRM400K: An Effective Dataset for Training Multimodal Process Reward Models
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Graph-Theoretic Intrinsic Reward: Guiding RL with Effective Resistance
MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference
Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed
Language as a Window Into the Mind: How NLP and LLMs Advance Human Sciences
Selective Data Removal for Distributional Machine Unlearning
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
Towards Dynamic Interleaving Optimizers
Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later
GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization
Flow Expansion via Verifier-Constrained Noised State Space Exploration
MeSH: Memory-as-State-Highways for Recursive Transformers
D-AR: Diffusion via Autoregressive Models
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models
Scalable Oversight via Partitioned Human Supervision
I2Mole: Interaction-aware Invariant Molecular Learning For Generalizable Property Prediction
$p\textrm{-less}$ Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
ContextIF: Enhancing Instruction-Following through Context Reward
Empowering Multi-Robot Cooperation via Sequential World Models
A Revisit of Active Sequential Prediction-Powered Mean Estimation
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration
Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models
Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
Structural Inference: Interpreting Small Language Models with Susceptibilities
Online Decision Making with Generative Action Sets
TINKER: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
FineNib: A Query Synthesizer For Static Analysis of Security Vulnerabilities
3DCS: Datasets and Benchmark for Evaluating Conformational Sensitivity in Molecular Representations
MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
Concepts' Information Bottleneck Models
Near-Optimal Online Deployment and Routing for Streaming LLMs
STORM: Synergistic Cross-Scale Spatio-Temporal Modeling for Weather Forecasting
AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Unveiling Super Experts in Mixture-of-Experts Large Language Models
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
TileLang: Bridge Programmability and Performance in Modern Neural Kernels
Gogo: Group-wise granularity-ordered codec for stable and efficient speech generation
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
Fed-Duet: Dual Expert-Orchestrated Framework for Continual Federated Vision-Language Learning
A Study on PAVE Specification for Learnware
On the Alignment Between Supervised and Self-Supervised Contrastive Learning
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
ChemEval: A Multi-level and Fine-grained Chemical Capability Evaluation for Large Language Models
Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
Unified and Efficient Multi-view Clustering from Probabilistic Perspective
Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering
Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding
Masked Generative Policy for Robotic Control
Amortising Inference and Meta-Learning Priors in Neural Networks
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning
SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
Point-Focused Attention Meets Context-Scan State Space: Robust Biological Visual Perception for Point Cloud Representation
PILOT-Bench: Probabilistic Interaction for LLM Operations in Tool-driven Scenarios
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
Fast Convergence of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks
WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
One Skill, Many Websites: Learning Generalizable Skills Through Polymorphic Abstraction
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training
MotionWeaver: Holistic 4D-Anchored Framework for Multi-Humanoid Image Animation
Mitigating Mismatch within Reference-based Preference Optimization
Universal Multi-Domain Translation via Diffusion Routers
Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection
CL-DPS: A Contrastive Learning Approach to Blind Nonlinear Inverse Problem Solving via Diffusion Posterior Sampling
RFS: Reinforcement learning with Residual flow steering for dexterous manipulation
Non-Autoregressive Generation for Agentic Multi-Turn Interaction
wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models
UrbanGS: Efficient and Scalable Architecture for Geometrically Accurate Large-Scene Reconstruction
A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints
The Lattice Geometry of Neural Network Quantization: A Short Equivalence Proof of GPTQ and Babai's algorithm
TP-Spikformer: Token Pruned Spiking Transformer
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Rethinking Pareto Frontier: On the Optimal Trade-offs in Fair Classification
Aurelius: Relation Aware Text-to-Audio Generation At Scale
Membrane Potential Perturbation Dynamic Is Total Variation
Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
DCFold: Efficient Protein Structure Generation with Single Forward Pass
Who Matters Matters: Agent-Specific Conservative Offline MARL
Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
End-to-end Listen, Look, Speak and Act
VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery
Untraceable DeepFakes via Traceable Fingerprint Elimination
A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits
Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
Empowering LLM Tool Invocation with Tool-call Reward Model
Decoding Inner Speech with an End-to-End Brain-to-Text Neural Interface
EMBridge: Enhancing Gesture Generalization from EMG Signals Through Cross-modal Representation Learning
Fine-tuning Behavioral Cloning Policies with Preference‑Based Reinforcement Learning
NewtonGen: Physics-consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression
Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
Toward Efficient Exploration by Large Language Model Agents
CoDA: Agentic Systems for Collaborative Data Visualization
Neural Collapse in Multi-Task Learning
A One-shot Framework for Directed Evolution of Antibodies
Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference
Head collapse, features stay: why replay needs big buffers
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking
Ice Cream Doesn’t Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper
Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving
Enhancing Visual Token Representations for Video Large Language Models via Training-free Spatial-Temporal Pooling and Gridding
ChronoEdit: Towards Temporal Reasoning for In-Context Image Editing and World Simulation
OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation
Spike-based Digital Brain: a novel fundamental model for brain activity analysis
Exploratory Causal Inference in SAEnce
TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
Experience-based Knowledge Correction for Robust Planning in Minecraft
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
Mango-GS: Enhancing Spatio-Temporal Consistency in Dynamic Scenes Reconstruction using Multi-Frame Node-Guided 4D Gaussian Splatting
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss
Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors
Learning Unified Representation of 3D Gaussian Splatting
FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models towards Adam‑Scale Speed
Arbitrary-Order Block SignSGD for Memory-Efficient LLM Fine-Tuning
Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning
HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs
Universal Value-Function Uncertainties
Seesaw: Accelerating Training by Balancing Batch Size and Learning Rate Scheduling
Let's Explore Step by Step: Generating Provable Formal Statements with Deductive Exploration
Riemannian Variational Flow Matching for Material and Protein Design
Improving LLM Alignment with References
Flow Along the $K$-Amplitude for Generative Modeling
Discovering Hierarchical Software Engineering Agents via Bandit Optimization
Beyond Softmax and Entropy: $f$-Regularized Policy Gradients with Coupled Parametrizations
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
Grounded Test-Time Adaptation for LLM Agents
NIMO: a Nonlinear Interpretable MOdel
Learning an Image Editing Model without Image Editing Pairs
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
DiscoX: Benchmarking Discourse-Level Translation in Expert Domains
From Observations to Events: Event-Aware World Models for Reinforcement Learning
CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval
The Serial Scaling Hypothesis
TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
HOTA: Hamiltonian framework for Optimal Transport Advection
Transfer Learning in Infinite Width Feature Learning Networks
Characterizing and Mitigating Reasoning Drift in Large Language Models
Einstein Fields: A Neural Perspective To Computational General Relativity
RobotArena $\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation
Personalized Collaborative Learning with Affinity-Based Variance Reduction
A Resolution-Agnostic Geometric Transformer for Chromosome Modeling Using Inertial Frame
On the Sample Complexity of GNNs
ETGS: Explicit Thermodynamics Gaussian Splatting for Dynamic Thermal Reconstruction
Tina: Tiny Reasoning Models via LoRA
INSTANT: Compressing Gradients and Activations for Resource-Efficient Training
Towards One-step Causal Video Generation via Adversarial Self-Distillation
SeRI: Gradient-Free Sensitive Region Identification in Decision-Based Black-Box Attacks
ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers
BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation
Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs
Sample-efficient and Scalable Exploration in Continuous-Time RL
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative
Breaking and Fixing Defenses Against Control Flow Hijacking in Multi-Agent Systems
Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling
An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes
Subquadratic Algorithms and Hardness for Attention with Any Temperature
Don’t Pass@$k$: A Bayesian Framework for Large Language Model Evaluation
GTM: A General Time-series Model for Enhanced Representation Learning of Time-Series data
TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with a Temporal-Aware Multimodal Model
Towards Revealing the Effect of Batch Size Scheduling on Pre-training
SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection
Policy Newton Algorithm in Reproducing Kernel Hilbert Space
Reasoning on Time-Series for Financial Technical Analysis
A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning across Broad Atlases and Disorders
DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
Falcon: Fast Proximal Linearization of Normalized Cuts for Unsupervised Image Segmentation
CoLA: Co-Calibrated Logit Adjustment for Long-Tailed Semi-Supervised Learning
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
On the Expressive Power of GNNs for Boolean Satisfiability
Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics
LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking
WideSearch: Benchmarking Agentic Broad Info-Seeking
Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents
Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction
Unlocking the Power of Co-Occurrence in CLIP: A DualPrompt-Driven Method for Training-Free Zero-Shot Multi-Label Classification
Can Vision–Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective
OPRIDE: Efficient Offline Preference-based Reinforcement Learning via In-Dataset Exploration
SPIKE-RL: Video-LLMs meet Bayesian Surprise
Enough is as good as a feast: A Comprehensive Analysis of How Reinforcement Learning Mitigates Task Conflicts in LLMs
MENLO: From Preferences to Proficiency – Evaluating and Modeling Native-like Quality Across 47 Languages
Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
Same Content, Different Representations: A Controlled Study for Table QA
MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with $0.1K$ Parameters
Rethinking the Gold Standard: Why Discrete Curvature Fails to Fully Capture Over-squashing in GNNs?
Unified Multi-Modal Interactive and Reactive 3D Motion Generation via Rectified Flow
Accelerating Inference for Multilayer Neural Networks with Quantum Computers
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
ProSafePrune: Projected Safety Pruning for Mitigating Over-Refusal in LLMs
FullPart: Generating each 3D Part at Full Resolution
LaVCa: LLM-assisted Visual Cortex Captioning
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Virtual Community: An Open World for Humans, Robots, and Society
Polynomial, trigonometric, and tropical activations
No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization
Improving 2D Diffusion Models for 3D Medical Imaging with Inter‑Slice Consistent Stochasticity
Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective
Following the Navigation: Enhancing Small Language Models Contextual Reasoning with LLM Guidance
ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval
A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling
Beyond Uniformity: Regularizing Implicit Neural Representations through a Lipschitz Lens
CLIP-FMoE: Scalable CLIP via Fused Mixture-of-Experts with Enforced Specialization
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insights
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Online Conformal Prediction with Adversarial Feedback via Regret Minimization
Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions
CORDS - Continuous Representations of Discrete Structures
When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
How does the optimizer implicitly bias the model merging loss landscape?
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning
Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
Distributionally Robust Classification for Multi-source Unsupervised Domain Adaptation
Co-occurring Associated REtained concepts in Diffusion Unlearning
CTBench: Cryptocurrency Time Series Generation Benchmark
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation
Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding
Model-Guided Microstimulation Steers Primate Visual Behavior
Multihead Mixture of Experts for Classification of Gigapixel Pathology Images
M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
Detecting Temporal Misalignment Attacks in Multimodal Fusion for Autonomous Driving
Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling
Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM
Convex Efficient Coding
d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
MobileKGQA: On-Device KGQA System on Dynamic Mobile Environments
Derandomized Online-to-Non-convex Conversion for Stochastic Weakly Convex Optimization
Benchmarking Multi-Agent Reinforcement Learning in Power Grid Operations
Samples Are Not Equal: A Sample Selection Approach for Deep Clustering
Matching multiple experts: on the exploitability of multi-agent imitation learning
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
Self-Improving Loops for Visual Robotic Planning
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
KeepLoRA: Continual Learning with Residual Gradient Adaptation
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
CTRL&SHIFT: High-quality Geometry-Aware Object Manipulation in Visual Generation
Market Games for Generative Models: Equilibria, Welfare, and Strategic Entry
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
3DGEER: 3D Gaussian Rendering Made Exact and Efficient for Generic Cameras
Conditional Independent Component Analysis For Estimating Causal Structure with Latent Variables
ICPO: Provable and Practical In-Context Policy Optimization for Test-Time Scaling
LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization
Accelerating Materials Design via LLM-Guided Evolutionary Search
Structured Reasoning for LLMs: A Unified Framework for Efficiency and Explainability
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Weak-to-Strong Generalization with Failure Trajectories
Detecting Data Contamination in LLMs via In-Context Learning
Test-Time Training Done Right
Learning Correlated Reward Models: Statistical Barriers and Opportunities
Topology of Reasoning: Retrieved Cell Complex-Augmented Generation for Textual Graph Question Answering
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
Fresh in memory: Training-order recency is linearly encoded in language model activations
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
The Tutor-Pupil Augmentation: Enhancing Learning and Interpretability via Input Corrections
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Video Unlearning via Low-Rank Refusal Vector
GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes
Evaluating Language Models' Evaluations of Games
DeMo: Decoupled Momentum Optimization
In-Context Algebra
Tucker-FNO: Tensor Tucker-Fourier Neural Operator and its Universal Approximation Theory
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Mitigating Noise Shift in Denoising Generative Models with Noise Awareness Guidance
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
Arbitrary-Shaped Image Generation via Spherical Neural Field Diffusion
Inlier-Centric Post-Training Quantization for Object Detection Models
LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding
Point-UQ: An Uncertainty-Quantification Paradigm for Point Cloud Few-Shot Class Incremental Learning
Welfarist Formulations for Diverse Similarity Search
Feature compression is the root cause of adversarial fragility in neural networks
Frequency-aware Dynamic Gaussian Splatting
Adaptive Concept Discovery for Interpretable Few-Shot Text Classification
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Independence Test for Linear Non-Gaussian Data and Applications in Causal Discovery
Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective
Multi-objective Large Language Model Alignment with Hierarchical Experts
Non-Asymptotic Analysis of Efficiency in Conformalized Regression
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Scaling Attention via Feature Sparsity
Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
Summaries as Centroids for Interpretable and Scalable Text Clustering
Decoupling Positional and Symbolic Attention in Transformers
DIFFSPARSE: Accelerating Diffusion Transformers with Learned Token Sparsity
Scheduling Your LLM Reinforcement Learning with Reasoning Trees
Light-X: Generative 4D Video Rendering with Camera and Illumination Control
Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
DeepRAG: Thinking to Retrieve Step by Step for Large Language Models
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Video
FAME: Formal Abstract Minimal Explanation for neural networks
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and Implications for Mechanistic Interpretability
Fairness-Aware Multi-view Evidential Learning with Adaptive Prior
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
Routing, Cascades, and User Choice for LLMs
From Parameters to Behaviors: Unsupervised Compression of the Policy Space
Decision Aggregation under Quantal Response
Taming Polysemanticity in LLMs: Theory-Grounded Feature Recovery via Sparse Autoencoders
Unveiling the Mechanism of Continuous Representation Full-Waveform Inversion: A Wave Based Neural Tangent Kernel Framework
QPrompt-R1: Real-Time Reasoning for Domain-Generalized Semantic Segmentation via Group-Relative Query Alignment
PETRI: Learning Unified Cell Embeddings from Unpaired Modalities via Early-Fusion Joint Reconstruction
VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip
Task-Aware Data Selection via Proxy-Label Enhanced Distribution Matching for LLM Finetuning
Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in LLMs
Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
Information Estimation with Discrete Diffusion
Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models
Frayed RoPE and Long Inputs: A Geometric Perspective
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
LANE: Label-Aware Noise Elimination for Fine-Grained Text Classification
Energy-Based Transformers are Scalable Learners and Thinkers
Reassessing Layer Pruning in LLMs: New Insights and Methods
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning
The human knowledge loophole in the 'bitter lesson' for LLMs
Reliable Weak-to-Strong Monitoring of LLM Agents
scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
AbdCTBench: Learning Clinical Biomarker Representations from Abdominal Surface Geometry
Diverse Dictionary Learning
Positional Encoding Field
Demystifying Supervision Data Generalization in Multimodal LMs
Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation
Bilinear relational structure fixes reversal curse and enables consistent model editing
Fair Graph Machine Learning under Adversarial Missingness Processes
Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment
FeDaL: Federated Dataset Learning for General Time Series Foundation Models
Sharp asymptotic theory for Q-learning with \texttt{LD2Z} learning rate and its generalization
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Content-Aware Mamba for Learned Image Compression
ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning
Interactive Agents to Overcome Underspecificity in Software Engineering
Streaming Autoregressive Video Generation via Diagonal Distillation
Paradigm Shift of GNN Explainer from Label Space to Prototypical Representation Space
Toward Principled Flexible Scaling for Self-Gated Neural Activation
QVGen: Pushing the Limit of Quantized Video Generative Models
Maximizing Incremental Information Entropy for Contrastive Learning
Quasi-Equivariant Metanetworks
Proximal Diffusion Neural Sampler
Fewer Battles, More Gain: An Information-Efficient Framework for Arena-based LLM Evaluation
Computing Equilibrium beyond Unilateral Deviation
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
ConvRec-R1: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
How Transformers Learn Causal Structures In-Context: Explainable Mechanism Meets Theoretical Guarantee
Scaling Multi-Task Bayesian Optimization with Large Language Models
MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation
SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning
AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
AnyUp: Universal Feature Upsampling
WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
Low-Pass Filtering Improves Behavioral Alignment of Vision Models
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
EA3D: Event-Augmented 3D Diffusion for Generalizable Novel View Synthesis
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
HSG-12M: A Large-Scale Dataset of Spatial Multigraphs from the Energy Spectra of non-Hermitian Crystals
TableMaster: A Recipe to Advance Table Understanding with Language Models
Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models
Towards Better Optimization For Listwise Preference in Diffusion Models
How Text Quality Interventions Reshape Neural Scaling Laws for LLMs: Empirical Study
AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval
ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution
A Generalized Geometric Theoretical Framework of Centroid Discriminant Analysis for Linear Classification of Multi-dimensional Data
Zero-shot Forecasting by Simulation Alone
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences
Topological Anomaly Quantification for Semi-supervised Graph Anomaly Detection
TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
Multimodal Aligned Semantic Knowledge for Unpaired Image-text Matching
Beyond Spectra: Eigenvector Overlaps in Loss Geometry
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation
UniOD: A Universal Model for Outlier Detection across Diverse Domains
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
IGC-Net for conditional average potential outcome estimation over time
SigLIP-HD by Fine-to-Coarse Supervision
IA2: Alignment with ICL Activations improves Supervised Fine-Tuning
True Self-Supervised Novel View Synthesis is Transferable
Financial fraud collusion among generative AI agents in social networks
StoryAlign: Evaluating and Training Reward Models for Story Generation
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
Automating the Refinement of Reinforcement Learning Specifications
When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
Diffusion Alignment as Variational Expectation-Maximization
EXPO: Stable Reinforcement Learning with Expressive Policies
OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models
Type-Compliant Adaptation Cascades
Are Reasoning LLMs Robust to Interventions on their Chain-of-Thought?
PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing
Searching for Privacy Risks in LLM Agents via Simulation
BioMD: All-atom Generative Model for Biomolecular Dynamics Simulation
TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks
AlphaFlow: Understanding and Improving MeanFlow Models
InfoScan: Information-Efficient Visual Scanning via Resource-Adaptive Walks
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
PCPO: Proportionate Credit Policy Optimization for Preference Alignment of Image Generation Models
Poly-attention: a general scheme for higher-order self-attention
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models
Reliable Probabilistic Forecasting of Irregular Time Series through Marginalization-Consistent Flows
UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking
Image Quality Assessment for Embodied AI
Single-stream Policy Optimization
Inference-time scaling of diffusion models through classical search
Divergence-Free Neural Networks with Application to Image Denoising
Differentiable Model Predictive Control on the GPU
Synthetic Bootstrapped Pretraining
Free Point-wise Anomaly Detection via Fold-bifurcation
LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel
Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
Thought Branches: Interpreting LLM Reasoning Requires Resampling
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning
Learning Exposure Mapping Functions for Inferring Heterogeneous Peer Effects
Distillation of Large Language Models via Concrete Score Matching
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Conformal Robustness Control: A New Strategy for Robust Decision
Why DPO is a Misspecified Estimator and How to Fix It
DeAltHDR: Learning HDR Video Reconstruction from Degraded Alternating Exposure Sequences
DispViT: Direct Stereo Disparity Regression with a Single-Stream Vision Transformer
Split Happens (But Your Video Model Can Be Edited)
Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data
Sparse Attention Adaptation for Long Reasoning
Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness
Zebra-CoT: A Dataset for Interleaved Vision-Language Reasoning
Estimating Worst-Case Frontier Risks of Open-Weight LLMs
A Sharp KL-Convergence Analysis for Diffusion Models under Minimal Assumptions
Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
Learning AND–OR Templates for Compositional Representation in Art and Design
Only Brains Align with Brains: Cross-Region Patterns Expose Limits of Normative Models
A Dense Subset Index for Collective Query Coverage
All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting
Out of the Shadows: Exploring a Latent Space for Neural Network Verification
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
Addressing divergent representations from causal interventions on neural networks
Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
One step further with Monte-Carlo sampler to guide diffusion better
Copy-Paste to Mitigate Large Language Model Hallucinations
Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation
EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
Enhanced Generative Model Evaluation with Clipped Density and Coverage
Physics-Informed Inference Time Scaling for Solving High-Dimensional Partial Differential Equations
From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization
A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling
CPQS-Tuning: A Model Self-Perception-Based Data Filtering Algorithm for Efficient Instruction Fine-Tuning
Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions
Batch and Sequential Unlearning for Neural Networks
CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction
Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs
Translating Flow to Policy via Hindsight Online Imitation
In-Place Test-Time Training
Topological Causal Effects
World2Minecraft: Occupancy-Driven simulated scenes Construction
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Less is more: Clustered Cross-Covariance Control for Offline RL
LightMem: Lightweight and Efficient Memory-Augmented Generation
CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark
AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent
Mixed-Curvature Tree-Sliced Wasserstein Distance
OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework
Agentic Context Engineering: Learning Comprehensive Contexts for Self-Improving Language Models
Group-Normalized Implicit Value Optimization for Language Models
QuRL: Low-Precision Reinforcement Learning for Efficient Reasoning
Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
Arbitrary Generative Video Interpolation
Spatially Guided Training for Vision-Language-Action Model
Declarative Audio Editing with Audio Language Model
Constitutional Classifiers++: Production-Grade Defenses against Universal Jailbreaks
Trajectory Generation with Conservative Value Guidance for Offline Reinforcement Learning
Motion-R1: Enhancing Motion Generation with Decomposed Chain-of-Thought and RL Binding
OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models
Hidden Breakthroughs in Language Model Training
P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
Online Navigation Refinement: Achieving Lane-Level Guidance by Associating Standard-Definition and Online Perception Maps
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Online Rounding and Learning Augmented Algorithms for Facility Location
Don't Throw Away Your Pretrained Model
Benchmarking Overton Pluralism in LLMs
Dual Perspectives on Non-Contrastive Self-Supervised Learning
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
From Natural Alignment to Conditional Controllability in Multimodal Dialogue
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
Dual-Scale World Models for LLM Agents towards Hard-Exploration Problems
Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
Pyramid Patchification Flow for Visual Generation
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Why Prototypes Collapse: Diagnosing and Preventing Partial Collapse in Prototypical Self-Supervised Learning
Reverse-Engineered Reasoning for Open-Ended Generation
On the Shelf Life of Finetuned LLM-Judges: Future Proofing, Backward Compatibility, and Question Generalization
Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
HiCache: A Plug-in Scaled-Hermite Upgrade for Taylor-Style Cache-then-Forecast Diffusion Acceleration
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation
PROTDYN: A FOUNDATION PROTEIN LANGUAGE MODEL FOR THERMODYNAMICS AND DYNAMICS GENERATION
Forget Forgetting: Continual Learning in a World of Abundant Memory
GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
Achieving low-bit Muon through subspace preservation and grid quantization
SketchingReality: From Freehand Scene Sketches to Photorealistic Images
TrimR: Verifier-based Training-Free Thinking Trimming for Efficient Test-Time Scaling
CONCUR: A Framework for Continual Constrained and Unconstrained Routing
Thyme: Think Beyond Images
Dynamic Reflections: Probing Video Representations with Text Alignment
ActivationReasoning: Logical Reasoning in Latent Activation Spaces
Block Recurrent Dynamics in Vision Transformers
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning
Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
Cross-Timestep: 3D Diffusion Model with Trans-temporal Memory LSTM and Adaptive Priori Decoding Strategy for Medical Segmentation
GoalRank: Group-Relative Optimization for a Large Ranking Model
FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
DynaGuard: A Dynamic Guardian Model With User-Defined Policies
Sequences of Logits Reveal the Low Rank Structure of Language Models
Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
Semi-Supervised Preference Optimization with Limited Feedback
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Neural Synchrony Between Socially Interacting Language Models
Towards Knowledge-and-Data-Driven Organic Reaction Prediction: RAG-Enhanced and Reasoning-Powered Hybrid System with LLMs
Precise and Interpretable Editing of Code Knowledge in Large Language Models
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
ClarifyVC: Clarifying Ambiguous Commands in Vehicle Control with a Hybrid Data Augmentation Pipeline
StochasTok: Improving Fine-Grained Subword Understanding in LLMs
Near Optimal Robust Federated Learning Against Data Poisoning Attack
Representing local protein environments with machine learning force fields
Meta-RL Induces Exploration in Language Agents
Are LLMs Really Not Knowledgeable? Mining the Submerged Knowledge in LLMs' Memory
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
What Exactly Does Guidance Do in Masked Discrete Diffusion Models
Medical thinking with multiple images
Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges
DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving
Advancing Complex Video Object Segmentation via Progressive Concept Construction
Log Probability Tracking of LLM APIs
Disentangling Length Bias in Preference Learning via Response-Conditioned Modeling
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Long-Text-to-Image Generation via Compositional Prompt Decomposition
Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
Bridging Degradation Discrimination and Generation for Universal Image Restoration
Compositional amortized inference for large-scale hierarchical Bayesian models
dParallel: Learnable Parallel Decoding for dLLMs
Self-Speculative Decoding Accelerates Lossless Inference in Any-Order and Any-Subset Autoregressive Models
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
Riemannian Optimization on Relaxed Indicator Matrix Manifold
Block-sample MAC-Bayes generalization bounds
Not All Documents Are What You Need for Extracting Instruction Tuning Data
Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Robust Multi-Objective Controlled Decoding of Large Language Models
From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
Modeling the Density of Pixel-level Self-supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT
Disentangling the Factors of Convergence between Brains and Computer Vision Models
UNDERSTANDING TRANSFORMERS FOR TIME SERIES FORECASTING: A CASE STUDY ON MOIRAI
How hard is learning to cut? Trade-offs and sample complexity
PreferThinker: Reasoning-based Personalized Image Preference Assessment
Revisiting Parameter Server in LLM Post-Training
PAT3D: Physics-Augmented Text-to-3D Scene Generation
Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection
Learning under Quantization for High-Dimensional Linear Regression
Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis
Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models
LLMs Process Lists With General Filter Heads
Towards Better Branching Policies: Leveraging the Sequential Nature of Branch-and-Bound Tree
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning
An Ensemble Framework for Unbiased Language Model Watermarking
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations
Cost-Aware Dynamic Tree Construction for Efficient Large Language Model Inference
Information-based Value Iteration Networks for Decision Making Under Uncertainty
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
Anchor Frame Bridging for Coherent First-Last Frame Video Generation
Overtone: Cyclic Patch Modulation for Cleaner, Faster Physics Emulators
C-Evolve: Consensus-based Evolution for Prompt Groups
Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
FACT: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Omni-IML: Towards Unified Interpretable Image Manipulation Localization
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
DMAP: A Distribution Map for Text
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
Exploring the Design Space of Transition Matching
Battery Fault: A Comprehensive Dataset and Benchmark for Battery Fault Diagnosis
SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
TrainRef: Curating Data with Label Distribution and Minimal Reference for Accurate Prediction and Reliable Confidence
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
Guaranteed Simply Connected Mesh Reconstruction from an Unorganized Point Cloud
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
Should We Still Pretrain Encoders with Masked Language Modeling?
Latent-Guided Reasoning: Empowering Small LLMs with Large-Model Thinking
CaTS: Calibrated Test-Time Scaling for Efficient LLM Inference
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
CLUE: Conflict-guided Localization for LLM Unlearning Framework
Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
Tuning the burn-in phase in training recurrent neural networks improves their performance
Any-Subgroup Equivariant Networks via Symmetry Breaking
On the Design of One-step Diffusion via Shortcutting Flow Paths
Recurrent Action Transformer with Memory
Revisiting Tree-Sliced Wasserstein Distance Through the Lens of the Fermat–Weber Problem
Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow
DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration
Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the Diverse Framework
Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
AC-Sampler: Accelerate and Correct Diffusion Sampling with Metropolis-Hastings Algorithm
Unifying Complexity-Theoretic Perspectives on Provable Explanations
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
DiCache: Let Diffusion Model Determine Its Own Cache
Learning to Solve Orienteering Problem with Time Windows and Variable Profits
Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
Exploring Cross-Modal Flows for Few-Shot Learning
Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
Probing in the Dark: State Entropy Maximization for POMDPs
DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
Learn to Guide Your Diffusion Model
A Law of Data Reconstruction for Random Features (And Beyond)
3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion Models
Combination-of-Experts with Knowledge Sharing for Cross-Task Vehicle Routing Problems
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Convergence of Muon with Newton-Schulz
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
AetherCode: Evaluating LLMs’ Ability to Win In Premier Programming Competitions
On the Thinking-Language Modeling Gap in Large Language Models
Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents
RADAR: Learning to Route with Asymmetry-aware Distance Representations
Discrete Latent Features Ablate Adversarial Attack: A Robust Prompt Tuning Framework for VLMs
An Improved Model-free Decision-estimation Coefficient with Applications in Adversarial MDPs
Online Learning and Equilibrium Computation with Ranking Feedback
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator
Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
Explainable $K$-means Neural Networks for Multi-view Clustering
Jacobian Aligned Random Forests
FARTrack: Fast Autoregressive Visual Tracking with High Performance
Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork
PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting
Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Composer: A Search Framework for Hybrid Neural Architecture Design
floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
In Good GRACES: Principled Teacher Selection for Knowledge Distillation
ProxyAttn: Guided Sparse Attention via Representative Heads
STAT: Skill-Targeted Adaptive Training
MTVCraft: Tokenizing 4D Motion for Arbitrary Character Animation
Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks
Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens
Why Ask One When You Can Ask $k$? Learning-to-Defer to the Top-$k$ Experts
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
MuonBP: Faster Muon via Block-Periodic Orthogonalization
BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning
Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
Black-Box Privacy Attacks on Shared Representations in Multitask Learning
EasyCreator: Empowering 4D Creation through Video Inpainting
When Foundation Models are One-Liners: Limitations and Future Directions for Time Series Anomaly Detection
Private Rate-Constrained Optimization with Applications to Fair Learning
Noise Tolerance of Distributionally Robust Learning
How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation
An Open-Ended Benchmark and Formal Framework for Adjuvant Research with MLLM
Beyond Student: An Asymmetric Network for Neural Network Inheritance
OVID: Open-Vocabulary Intrusion Detection
Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation
Statistical Guarantees in the Search for Less Discriminatory Algorithms
LongLive: Real-time Interactive Long Video Generation
SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?
Synchronizing Probabilities in Model-Driven Lossless Compression
SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models
Corner Gradient Descent
Constantly Improving Image Models Need Constantly Improving Benchmarks
AlignFlow: Improving Flow-based Generative Models with Semi-Discrete Optimal Transport
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
Rethinking Radiology Report Generation: From Narrative Flow to Topic-Guided Findings
DPQuant: Efficient and Private Model Training via Dynamic Quantization Scheduling
Scaling Synthetic Task Generation for Agents via Exploration
IC-Custom: Diverse Image Customization via In-Context Learning
FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing
Don't Just Fine-tune the Agent, Tune the Environment
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals
RPM: Reasoning-Level Personalization for Black-Box Large Language Models
Learning Hierarchical and Geometry-Aware Graph Representations for Text-to-CAD
SAM-Veteran: An MLLM-Based Human-like SAM Agent for Reasoning Segmentation
SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
Robust Reward Modeling via Causal Rubrics
Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
Command-V: Training-Free Representation Finetuning Transfer
Making, Not Taking, the Best of N
Trust-Region Adaptive Policy Optimization
Object-Centric Refinement for Enhanced Zero-Shot Segmentation
ORCaS: Unsupervised Depth Completion via Occluded Region Completion as Supervision
Optimizing Data Augmentation through Bayesian Model Selection
Human-AI Curation Synergy: Scaling Preference Data Curation via Human-Guided AI Feedback
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
Steering and Rectifying Latent representation manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
The Natural Geometry of Code: Hyperbolic Representation Learning for Program Reasoning
Detective SAM: Adaptive AI-Image Forgery Localization
Pseudo-Non-Linear Data Augmentation: A Constrained Energy Minimization Viewpoint
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Textual Space
Measurement Score-Based Diffusion Model
M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
Beyond Text-to-Image: Liberating Generation with a Unified Discrete Diffusion Model
Neyman-Pearson Classification under Both Null and Alternative Distributions Shift
Scaling Bayesian Experimental Design to High-Dimensions with Information-Guided Diffusion
Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting
CORE: Concept-Oriented Reinforcement for Bridging the Definition–Application Gap in Mathematical Reasoning
$\ell_1$ Latent Distance based Continuous-time Graph Representation
Calibrating Verbalized Confidence with Self-Generated Distractors
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
SSDi8: Accurate and Efficient 8-bit Quantization for State Space Duality
Terminal Velocity Matching
Discrete Bayesian Sample Inference for Graph Generation
Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
Training Large Reasoning Models Efficiently via Progressive Thought Encoding
ReVeal: Self-Evolving Code Agents via Reliable Self-Verification
Data Aware and Scalable Sensitivity Analysis for Decision Tree Ensembles
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
Guiding Mixture-of-Experts with Temporal Multimodal Interactions
Empowering Small VLMs to Think with Dynamic Memorization and Exploration
Consistent Noisy Latent Rewards for Trajectory Preference Optimization in Diffusion Models
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models
Why is Your Language Model a Poor Implicit Reward Model?
Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation
Data Provenance for Image Auto-Regressive Generation
FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data
PYRREGULAR: A Unified Framework for Irregular Time Series, with Classification Benchmarks
TangoFlux: Text to Audio Generation with CLAP-Ranked Preference Optimization
MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Convergent Differential Privacy Analysis for General Federated Learning
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
Language-Instructed Vision Embeddings for Controllable and Generalizable Perception
Are we measuring oversmoothing in graph neural networks correctly?
Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
Monotone Near-Zero-Sum Games
Computational Bottlenecks for Denoising Diffusions
SURGE: Surprise-Guided Token Reduction for Efficient Video Understanding with VLMs
Target-Aware Video Diffusion Models
PRISM: Festina Lente Proactivity—Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
Supporting High-Stakes Decision Making Through Interactive Preference Elicitation in the Latent Space
Learning Flexible Forward Trajectories for Masked Molecular Diffusion
TNT: Improving Chunkwise Training for Test-Time Memorization
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
Measuring Uncertainty Calibration
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
Probing to Refine: Reinforcement Distillation of LLM Reasoners via Explanatory Inversion
PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter
Extending Fourier Neural Operators for Modeling Parameterized and Coupled PDEs
Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization
OpenEstimate: Evaluating LLMs on Probabilistic Estimation with Real-World Data
DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning
Revisiting Multimodal Positional Encoding in Vision–Language Models
TandemFoilSet: Datasets for Flow Field Prediction of Tandem-Airfoil Through the Reuse of Single Airfoils
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models
Efficient Morphology–Control Co-Design via Stackelberg PPO under Non-Differentiable Leader–Follower Interfaces
Towards Real-World Routing with Neural Combinatorial Optimization
Enhancing Multi-Image Understanding through Delimiter Token Scaling
Continuous Chain of Thought: Parallel Exploration and Reasoning through a Theoretical Lens
All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment
R4: Nested Reasoning-Retrieval for Reward Modeling in Role-Playing Agents
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
Fairness via Independence: A General Regularization Framework for Machine Learning
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach
When Flatness Does (Not) Guarantee Adversarial Robustness
Generalized Parallel Scaling with Interdependent Generations
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
Watermark-based Attribution of AI-Generated Images
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
LLM Fingerprinting via Semantically Conditioned Watermarks
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
FutureFill: Fast Generation from Convolutional Sequence Models
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
Controllable diffusion-based generation for multi-channel biological data
AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models
Temporal Slowness in Central Vision Drives Semantic Object Learning
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
PAMDP: Interact to Persona Alignment via a Partially Observable Markov Decision Process
PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
Kimi-Dev: Agentless Training as Skill Prior for SWE-agents
On the identifiability of causal graphs with multiple environments
Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation
OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
Mini-cluster Guided Long-tailed Deep Clustering
Learning is Forgetting; LLM Training As Lossy Compression
Muon Outperforms Adam in Tail-End Associative Memory Learning
Latent Stochastic Interpolants
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
Is On-Policy Data always the Best Choice for Direct Preference Optimization-Based LM Alignment?
Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention
Adaptive Logit Adjustment for Debiasing Multimodal Language Models
Play to Generalize: Learning to Reason Through Game Play
PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
Weight Space Representation Learning on Diverse NeRF Architectures
Offline Reinforcement Learning with Adaptive Feature Fusion
Multi-Synaptic Cooperation: A Bio-Inspired Framework for Robust and Scalable Continual Learning
Text summarization via global structure awareness
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
EvolProver: Advancing Automated theorem proving by Evolving Formalized Problems via Symmetry and Difficulty
GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning
Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging
Breaking Gradient Temporal Collinearity for Robust Spiking Neural Networks
FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models
Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
What Matters for Batch Online Reinforcement Learning in Robotics?
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
Exploring Diverse Generation Paths via Inference-time Stiefel Activation Steering
Mathesis: Towards Formal Theorem Proving from Natural Languages
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
Hallucination-aware Intermediate Representation Editing in Large Vision-Language Models
Explainable Mixture Models through Differentiable Rule Learning
Sampling-aware Adversarial Attacks Against Large Language Models
CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design
Reevaluating Policy Gradient Methods for Imperfect-Information Games
Adaptive Conformal Prediction via Mixture-of-Experts Gating Similarity
HiTeA: Hierarchical Temporal Alignment for Training-Free Long-Video Temporal Grounding
Every Language Model Has a Forgery-Resistant Signature
Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
EffiVMT: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use
Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
Q-Learning with Adjoint Matching
DeepAFL: Deep Analytic Federated Learning
Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
DTO-KD: Dynamic Trade-off Optimization for Effective Knowledge Distillation
FlowSymm: Physics–Aware, Symmetry–Preserving Graph Attention for Network Flow Completion
Any-Order Any-Subset AutoRegressive Model
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws
Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
Critical Confabulation: Can LLMs Hallucinate for Social Good?
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
Knowledge Fusion of Large Language Models via Modular SkillPacks
Error as Signal: Stiffness-Aware Diffusion Sampling via Embedded Runge-Kutta Guidance
Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks
Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
Planned Diffusion
Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Explainable LLM Unlearning through Reasoning
Flatter Tokens are More Valuable for Speculative Draft Model Training
SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery
Expert Divergence Learning for MoE-based Language Models
Causally Robust Preference Learning with Reasons
Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection
Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
Structure-Aware Graph Hypernetworks for Neural Program Synthesis
A Noise is Worth Diffusion Guidance
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
Quantized Gradient Projection for Memory-Efficient Continual Learning
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
Learning More with Less: A Dynamic Dual-Level Down-Sampling Framework for Efficient Policy Optimization
Codified Finite-state Machines for Role-playing
Flock: A Knowledge Graph Foundation Model via Learning on Random Walks
Disentangled Representation Learning for Parametric Partial Differential Equations
OpenAgentSafety: A Comprehensive Framework For Evaluating Real-World AI Agent Safety
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
Kevin: Multi-Turn RL for Generating CUDA Kernels
VideoAgentTrek: Computer-Use Pretraining from Unlabeled Videos
Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport
What matters for Representation Alignment: Global Information or Spatial Structure?
Programming by Backprop: Learning Behaviour from Symbolic Descriptions
Low-Latency Neural LiDAR Compression with 2D Context Models
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
Mixture of Contexts for Long Video Generation
Improving LLM-based Global Optimization with Search Space Partitioning
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
How to train data-efficient LLMs
PepTri: Tri-Guided All-Atom Diffusion for Peptide Design via Physics, Evolution, and Mutual Information
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Reducing Semantic Mismatch in Brain-to-Text Decoding Through Personalized Multimodal Masking
RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding
Sparling: End-to-End Spatial Concept Learning via Extremely Sparse Activations
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
Causal-Steer: Disentangled Continuous Style Control without Parallel Corpora
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
Channel-Aware Mixed-Precision Quantization for Efficient Long-Context Inference
Latent Adaptation of Foundation Policies for Sim-to-Real Transfer
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Generative Blocks World: Moving Things Around in Pictures
The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Death of the Novel(ty): Beyond N-Gram Novelty as a Metric for Textual Creativity
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras
Bridging the Gap Between Promise and Performance for FP4 Quantization
Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models
DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
Softmax is not Enough (for Adaptive Conformal Classification)
Prompt-MII: Meta-Learning Instruction Induction for LLMs
A Statistical Benchmark for Diffusion Posterior Sampling Algorithms
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
PICS: Pairwise Image Compositing with Spatial Interactions
AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
Generalization Below the Edge of Stability: The Role of Data Geometry
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Fracture-GS: Dynamic Fracture Simulation with Physics-Integrated Gaussian Splatting
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting
Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices
Towards True Speech-to-Speech Models Without Text Guidance
Draft-based Approximate Inference for LLMs
Captain Cinema: Towards Short Movie Generation
Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations
Aegis: Automated Error Generation and Identification for Multi-Agent Systems
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
Clustering by Denoising: Latent plug-and-play diffusion for single-cell embeddings
Your Language Model Secretly Contains Personality Subnetworks
DiVE-k: Differential Visual Reasoning for Fine-Grained Image Recognition
Bandit Learning in Matching Markets Robust to Adversarial Corruptions
SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration
Queue Length Regret Bounds for Contextual Queueing Bandits
Activation Steering with a Feedback Controller
Reinforcing Diffusion Models by Direct Group Preference Optimization
Enforcing Axioms for AI Alignment under Loss-Based Rules
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
BBQ: Boosting Quantization Entropy with Bell Box Quantization
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
Variance-Dependent Regret Lower Bounds for Contextual Bandits
Analyzing and Evaluating Unbiased Language Model Watermark
Rate-Distortion Optimized Communication for Collaborative Perception
Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations
Towards Learned Optimization Free Lunch
Consistent Low-Rank Approximation
On Smoothness Bounds for Non-Clairvoyant Scheduling with Predictions
The Forecast After the Forecast: A Post-Processing Shift in Time Series
Conjuring Semantic Similarity
Online Alignment as Perceptual Loss
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
Mean-Field Neural Differential Equations: A Game-Theoretic Approach to Sequence Prediction
On Entropy Control in LLM-RL Algorithms
Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval
Scalable Random Wavelet Features: Efficient Non-Stationary Kernel Approximation with Convergence Guarantees
Training-Free Determination of Network Width via Neural Tangent Kernel
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
FlexHiNM-GP: Flexible Hierarchical Pruning via Region Allocation and Channel Permutation
Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization
How to Lose Inherent Counterfactuality in Reinforcement Learning
Quantization-Aware Diffusion Models For Maximum Likelihood Training
Interactive Learning of Single-Index Models via Stochastic Gradient Descent
SNaX: sparse narrow accelerated mixture of experts
Elastic Optimal Transport: Theory, Application, and Empirical Evaluation
Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation
Almost Bayesian: Dynamics of SGD Through Singular Learning Theory
Data-to-Energy Stochastic Dynamics
WILD-Diffusion: A WDRO Inspired Training Method for Diffusion Models under Limited Data
Multi-Agent Debate with Memory Masking
Graph Tokenization for Bridging Graphs and Transformers
Reconstruct Anything Model: a lightweight foundation model for computational imaging
Deep Hierarchical Learning with Nested Subspace Networks
The Softmax Bottleneck Does Not Limit the Probabilities of the Most Likely Tokens
$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization
Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry
Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization
Fantastic Pretraining Optimizers and Where to Find Them
Diversified Multinomial Logit Contextual Bandits
Improving Semantic Proximity in English-Centric Information Retrieval through Cross-Lingual Alignment
Conditioned Initialization for Attention
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation
Cyber-Zero: Training Cybersecurity Agents without Runtime
Multi-Head Low-Rank Attention
Learning from Label Proportions via Proportional Value Classification
A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
A Study of Posterior Stability in Time-Series Latent Diffusion
MoSA: Mosaic Shared Adaptation of Large Language Models
AutoDV: An End-to-End Deep Learning Model for High-Dimensional Data Visualization
Rethinking Continual Learning with Progressive Neural Collapse
Covariate-Guided Clusterwise Linear Regression for Generalization to Unseen Data
Control Tax: The Price of Keeping AI in Check
Riemannian Federated Learning via Averaging Gradient Streams
Learning in Prophet Inequalities with Noisy Observations
Unified Vision–Language Modeling via Concept Space Alignment
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs
Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition
Identifiability Challenges in Sparse Linear Ordinary Differential Equations
Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
Pretraining Scaling Laws for Generative Evaluations of Language Models
Riesz Neural Operator for Solving Partial Differential Equations
SciNav: A Principled Agent Framework for Scientific Coding Tasks
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
LatentQA: Teaching LLMs to Decode Activations Into Natural Language
QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining
D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction
EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation
Jailbreak Transferability Emerges from Shared Representations
Distributed Algorithms for Euclidean Clustering
Pursuing Minimal Sufficiency in Spatial Reasoning
jqBench: a benchmark for reading and editing JSON from natural language and/or examples
Gradient-Based Program Synthesis with Neurally Interpreted Languages
Scalable Chain of Thoughts via Elastic Reasoning
Efficient Estimation of Kernel Surrogate Models for Task Attribution
Gauge Flow Matching: Efficient Constrained Generative Modeling over General Convex Set and Beyond
Learning to Play Multi-Follower Bayesian Stackelberg Games
Diffusion Transformers with Representation Autoencoders
Secure Inference for Diffusion Models via Unconditional Scores
Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance
Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization
Combinatorial Bandit Bayesian Optimization for Tensor Outputs
Language Identification in the Limit with Computational Trace
Depth Anything with Any Prior
AdaCache: Adaptive Caching and Context Augmentation for Efficient LLM Serving
Multi-Task Low-Rank Model Adaptation
Enhancing LLMs for Knowledge Base Question Answering by Chain-of-Decomposition
To View Transform or Not to View Transform: NeRF-based Pre-training Perspective
Beyond Short Steps in Frank-Wolfe Algorithms
On the trade-off between expressivity and privacy in graph representation learning
SONIC: Spectral Oriented Neural Invariant Convolutions
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
Robust Preference Alignment via Directional Neighborhood Consensus
PROS: Towards Compute-Efficient RLVR via Rollout Prefix Reuse
Distractor-free Generalizable 3D Gaussian Splatting
LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards
Quantum machine learning advantages beyond hardness of evaluation
Convergence of Actor-Critic gradient flow for entropy regularised MDPs in general spaces
Confident and Adaptive Generative Speech Recognition via Conformal Risk Control
Dimension-Free Decision Calibration for Nonlinear Loss Functions
Rethinking Code Similarity for Automated Algorithm Design with LLMs
Continuous Audio Language Models
Optimizing Canaries for Privacy Auditing with Metagradient Descent
VITA: Vision-to-Action Flow Matching Policy
Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas
Unified Analyses for Hierarchical Federated Learning: Topology Selection under Data Heterogeneity
Sharing State Between Prompts and Programs
MATHMO: Automated Mathematical Modeling Through Adaptive Search
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
Fair Policy Aggregation from Standard Policy Optimization
On the Interpolation Effect of Score Smoothing in Diffusion Models
Token Distillation: Attention-Aware Input Embeddings for New Tokens
Time-Gated Multi-Scale Flow Matching for Time-Series Imputation
Skirting Additive Error Lower Bounds for Private Turnstile Streams
Learning to Reason over Continuous Tokens with Reinforcement Learning
Persona Features Control Emergent Misalignment
Automata Learning and Identification of the Support of Language Models
Prediction with Expert Advice under Local Differential Privacy
Neologism Learning for Controllability and Self-Verbalization
In-Context Learning for Pure Exploration
An Information-Theoretical Framework For Optimizing Experimental Design To Distinguish Probabilistic Neural Codes
Learning Shrinks the Hard Tail: Training‑Dependent Inference Scaling in a Solvable Linear Model
Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
MOBODY: Model-Based Off-Dynamics Offline Reinforcement Learning
TSLM: Tree-Structured Language Modeling for Divergent Thinking
Splat Regression Models
SWERank: Software Issue Localization with Code Ranking
Infinite Horizon Markov Economies
Robust Decision-Making with Partially Calibrated Forecasters
Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
Log-Linear Attention
Strategic Obfuscation of Deceptive Reasoning in Language Models
How Dark Patterns Manipulate Web Agents
LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design
Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI
From Pixels to Semantics: Unified Facial Action Representation Learning for Micro-Expression Analysis
Learning multimodal dictionary decompositions with group-sparse autoencoders
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Weak-to-Strong Diffusion
Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Flow Map Learning via Games
Polychromic Objectives for Reinforcement Learning
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
In Context Semi-Supervised Learning
Steering Language Models with Weight Arithmetic
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
Random Controlled Differential Equations
Submodular Function Minimization with Dueling Oracle
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition
Online Inventory Optimization in Non-Stationary Environment
Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
Autoregressive Image Generation with Randomized Parallel Decoding
Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
Understanding and Improving Hyperbolic Deep Reinforcement Learning
Merge before Forget: A Single LoRA Continual Learning via Continual Merging
Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
Towards a Theoretical Understanding of In-context Learning: Stability and Non-I.I.D Generalisation
Micro-Macro Coupled Koopman Modeling on Graph for Traffic Flow Prediction
Naming to Learn: Class Incremental Learning for Vision-Language Model with Unlabeled Data
Enhancing Image-Conditional Coverage in Segmentation: Adaptive Thresholding via Differentiable Miscoverage Loss
LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
Disco: Densely-overlapping Cell Instance Segmentation via Adjacency-aware Collaborative Coloring
Reward Model Routing in Alignment
Defending against Backdoor Attacks via Module Switching
3D-aware Disentangled Representation for Compositional Reinforcement Learning
Probabilistic Kernel Function for Fast Angle Testing
REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering
PMDformer: Patch-Mean Decoupling Transformer for Long-term Forecasting
Joint Discriminative-Generative Modeling via Dual Adversarial Training
Enhancing Vision Transformers for Object Detection via Context-Aware Token Selection and Packing
Trapped by simplicity: When Transformers fail to learn from noisy features
SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
HiPO: Self-Hint Policy Optimization for RLVR
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
ST-HHOL: Spatio-Temporal Hierarchical Hypergraph Online Learning for Crime Prediction
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
How Muon’s Spectral Design Benefits Generalization: A Study on Imbalanced Data
Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment
A Problem-Oriented Perspective and Anchor Verification for Code Optimization
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
Prompt and Parameter Co-Optimization for Large Language Models
Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting
Learning to Reason Efficiently with Discounted Reinforcement Learning
ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization
Neural Latent Arbitrary Lagrangian-Eulerian Grids for Fluid-Solid Interaction
Efficient Agent Training for Computer Use
Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
Optimal Transport-Induced Samples against Out-of-Distribution Overconfidence
Signal Structure-Aware Gaussian Splatting for Large-Scale Scene Reconstruction
Safe Exploration via Policy Priors
Action-Guided Attention for Video Action Anticipation
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems
Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks
Extreme Weather Nowcasting via Local Precipitation Pattern Prediction
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
PU-Bench: A Unified Benchmark for Rigorous and Reproducible PU Learning
MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
Large Language Model Compression with Global Rank and Sparsity Optimization
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
Fine-Grained Class-Conditional Distribution Balancing for Debiased Learning
Image Inpainting with Preference Alignment
Composition-Grounded Instruction Synthesis for Visual Reasoning
WARC-Bench: Web Archive based Benchmark for GUI Subtask Executions
Learning Boltzmann Generators via Constrained Mass Transport
Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics
SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
TS-DDAE: A novel Temporal-Spectral Denoising Diffusion AutoEncoder for Wireless Signal Recognition Model Pre-training
MSCR: Exploring the Vulnerability of LLMs’ Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters
Genomic Foundationless Models: Pretraining Does Not Promise Performance
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
Consistency-Driven Calibration and Matching for Few-Shot Class Incremental Learning
CityLens: Evaluating Large Vision-Language Models for Urban Socioeconomic Sensing
Efficient Reasoning with Balanced Thinking
IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra
LoRAGen: Structure-Aware Weight Space Learning for LoRA Generation
UniF$^2$ace: A $\underline{Uni}$fied $\underline{F}$ine-grained $\underline{Face}$ Understanding and Generation Model
Image is All You Need: Towards Efficient and Effective Large Language Model-Based Recommender Systems
Knowledge Distillation for Large Language Models through Residual Learning
Discovering Novel LLM Experts via Task-Capability Coevolution
HFSTI-Net: Hierarchical Frequency-spatial-temporal Interactions for Video Polyp Segmentation
Lipschitz Bandits with Stochastic Delayed Feedback
DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts
Enhancing Language Model Reasoning with Structured Multi-Level Modeling
WavePolyp: Video Polyp Segmentation via Hierarchical Wavelet-Based Feature Aggregation and Inter-Frame Divergence Perception
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Model
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
Energy-Efficient Random Variate Generation via Compressed Lookup Tables
Efficient algorithms for Incremental Metric Bipartite Matching
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training
Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning
Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion
Towards Text-Mask Consistency in Medical Image Segmentation
Exploring Mode Connectivity in Krylov Subspace for Domain Generalization
MICLIP: Learning to Interpret Representation in Vision Models
Flatness Guided Test-Time Adaptation for Vision-Language Models
When and Where to Reset Matters for Long-Term Test-Time Adaptation
iFusion: Integrating Dynamic Interest Streams via Diffusion Model for Click-Through Rate Prediction
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models
Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors
Neural Optimal Transport Meets Multivariate Conformal Prediction
Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
Self-Aligned Reward: Towards Effective and Efficient Reasoners
PMI: Flow-Based Inversion Correction via Proximal Operator
From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
Practical estimation of the optimal classification error with soft labels and calibration
Relationship Alignment for View-aware Multi-view Clustering
Latent Planning Emerges with Scale
SpatiaLab: Can Vision–Language Models Perform Spatial Reasoning in the Wild?
Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
C-Voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
Modeling Interference for Treatment Effect Estimation in Network Dynamic Environment
Credit-Budgeted ICPC-Style Coding: When LLM Agents Must Pay for Every Decision
Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization
Optimistic Task Inference for Behavior Foundation Models
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
MILCO: Learned Sparse Retrieval Across Languages via a Multilingual Connector
TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in LLM Routing
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
Rethinking Model Calibration through Spectral Entropy Regularization in Medical Image Segmentation
Knowing When to Quit: Probabilistic Early Exits for Speech Separation Networks
DADA: Dual Averaging with Distance Adaptation
Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems by Exploiting Knowledge Asymmetry
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
Can Large Language Models Match the Conclusions of Systematic Reviews?
SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
Temporal Test-Time Adaptation with State-Space Models
LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models
STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
Semantic-Enhanced Time-Series Forecasting via Large Language Models
A State-Transition Framework for Efficient LLM Reasoning
Advancing Spatiotemporal Representations in Spiking Neural Networks via Parametric Invertible Transformation
Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
Random Anchors with Low-rank Decorrelated Learning: A Minimalist Pipeline for Class-Incremental Medical Image Classification
MergePRAG: Orthogonal Merging of Passage-experts for Multi-hop Parametric RAG
Equilibrium Language Models
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
Concept Insertion Success over Time in Diffusion Models through Prompt-Conditioned Interventions
Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
Sign-SGD via Parameter-Free Optimization
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations
Sem-MoE: Semantic-aware Model-Data Collaborative Scheduling for Efficient MoE Inference
FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation
Solving Football by Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information
Thompson Sampling via Fine-Tuning of LLMs
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting
Positional Preservation Embedding for Multimodal Large Language Models
Enhanced Continual Learning of Vision-Language Models with Model Fusion
Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems
Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding
Aligner, Diagnose Thyself: A Meta-Learning Paradigm for Fusing Intrinsic Feedback in Preference Alignment
Automated Stateful Specialization for Adaptive Agent Systems
Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
Demystifying Emergent Exploration in Goal-Conditioned RL
Perturbed Dynamic Time Warping: A Probabilistic Framework and Generalized Variants
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings
A Bayesian Nonparametric Framework for Private, Fair, and Balanced Tabular Data Synthesis
FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching
From Cheap Geometry to Expensive Physics: Elevating Neural Operators via Latent Shape Pretraining
What Layers When: Learning to Skip Compute in LLMs with Residual Gates
Reference Guided Skill Discovery
Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation
ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization
On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
Robust Preference Optimization: Aligning Language Models with Noisy Preference Feedback
Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
DISCO: Diversifying Sample Condensation for Accelerating Model Evaluation
Learning to Lie: Reinforcement Learning Attacks Damage Human-AI Teams and Teams of LLMs
Distribution-informed Online Conformal Prediction
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
UNITE: Universal kNowledge Integration from Task-specific Experts
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
Multi-LLM Adaptive Conformal Inference for Reliable LLM Response
GNN Explanations that do not Explain and How to find Them
Soft-Masked Diffusion Language Models
The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
Angle K-Means
Code Driven Planning with Domain-Adaptive Selector
Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
CooperTrim: Adaptive Data Selection for Uncertainty-Aware Cooperative Perception
CASteer: Cross-Attention Steering for Controllable Concept Erasure
Latent Wavelet Diffusion For Ultra High-Resolution Image Synthesis
Lean Finder: Semantic Search for Mathlib That Understands User Intents
Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
Chessformer: A Unified Architecture for Chess Modeling
Judo: A Juxtaposed Domain-oriented Multimodal Reasoner for Industrial Anomaly QA
Bridging Successor Measure and Online Policy Learning with Flow Matching-Based Representations
Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR
Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?
Finite-Time Analysis of Actor-Critic Methods with Deep Neural Network Approximation
Test-Time Scaling with Reflective Generative Model
Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training
In-Context Learning of Temporal Point Processes with Foundation Inference Models
Balancing the Experts: Unlocking LoRA-MoE for GRPO via Mechanism-Aware Rewards
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
UnigramLM: An Attempt at Writing The Missing Manual
Stop Guessing: Choosing the Optimization-Consistent Uncertainty Measurement for Evidential Deep Learning
OVSeg3R: Learn Open-vocabulary Instance Segmentation from 2D via 3D Reconstruction
Opponent Shaping in LLM Agents
Take Note: Your Molecular Dataset Is Probably Aligned
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
Let OOD Feature Exploring Vast Predefined Classifiers
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
POEMetric: The Last Stanza of Humanity
Sharp Monocular View Synthesis in Less Than a Second
Bayesian Ensemble for Sequential Decision-Making
A Function-Centric Graph Neural Network Approach for Predicting Electron Densities
Visual Autoregressive Modeling for Instruction-Guided Image Editing
DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
Demystifying Deep Search: A Holistic Evaluation with Hint-free Multi-Hop Questions and Factorised Metrics
CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints
Multimodal Dataset Distillation Made Simple by Prototype-guided Data Synthesis
Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
Graph Mixing Additive Networks
GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent System
Long Chain-of-Thought Reasoning Across Languages
Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
FACET: A Fragment-Aware Conformer Ensemble Transformer
FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows
THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
Language Model Planning from an Information Theoretic Perspective
Content Promotion as a Strategic Game: How to Design Agentic Publishers for the Evolving Search Ecosystem in the GenAI Era?
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Rodrigues Network for Learning Robot Actions
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
Bongard-RWR+: Real-World Representations of Fine-Grained Concepts in Bongard Problems
ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models
Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks
Post-hoc Probabilistic Vision-Language Models
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
FlowRL: Matching Reward Distributions for LLM Reasoning
Discovering Diverse Behaviors via Temporal Contrastive Learning
Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
Adaptive Augmentation-Aware Latent Learning for Robust LiDAR Semantic Segmentation
Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization
Inter-Agent Relative Representations for Multi-Agent Option Discovery
Tracing the Principles Behind Modern Diffusion Models
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Measuring Bias Amplification in Multi-Agent Systems with Large Language Models
Diffusion as Infinite HVAEs: Do Diffusion Models Generalize Better than Deep VAEs?
Learning Dynamics Feature Representation via Policy Attention for Dynamic Path Planning in Urban Road Networks
On the Universality and Complexity of GNN for Solving Second-order Cone Programs
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
Gradient-Based Diversity Optimization with Differentiable Top-$k$ Objective
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
DR-Submodular Maximization with Stochastic Biased Gradients: Classical and Quantum Gradient Algorithms
Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
A Formal Controllability Toolkit for Black-Box Generative Models
A Graph Meta-Network for Learning on Kolmogorov–Arnold Networks
Pitfalls in Evaluating Language Model Forecasters
Activation Function Design Sustains Plasticity in Continual Learning
Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning
Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning
LCA: Local Classifier Alignment for Continual Learning
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Neural Message-Passing on Attention Graphs for Hallucination Detection
AsyncBEV: Cross-modal flow alignment in Asynchronous 3D Object Detection
Budget Alignment: Making Models Reason in the User's Language
SERUM: Simple, Efficient, Robust, and Unifying Marking for Diffusion-based Image Generation
Inference-Time Personalized Safety Control via Paired Difference-in-Means Intervention
On The Expressive Power of GNN Derivatives
LS-Merge: Merging Language Models in Latent Space
Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
Helmsman: Autonomous Synthesis of Federated Learning Systems via Multi-Agent Collaboration
All Code, No Thought: Language Models Struggle to Reason in Ciphered Language
Enhancing Diffusion-Based Sampling with Molecular Collective Variables
KernelFusion: Zero-Shot Blind Super-Resolution via Patch Diffusion
Learning from the Electronic Structure of Molecules across the Periodic Table
Difference Predictive Coding for Training Spiking Neural Networks
Early Signs of Steganographic Capabilities in Frontier LLMs
AntigenLM: Structure-Aware DNA Language Modeling for Influenza
Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
Multi-Scale Diffusion-Guided Graph Learning with Power-Smoothing Random Walk Contrast for Multi-View Clustering
K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model
Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
PerSpectra: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
Human-LLM Collaborative Feature Engineering for Tabular Data
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Understanding the Implicit Biases of Design Choices for Time Series Foundation Models
Boosting Open Set Recognition Performance through Modulated Representation Learning
STAR: Strategy-driven Automatic Jailbreak Red-teaming For Large Language Model
CIMemories: A Compositional Benchmark For Contextual Integrity In LLMs
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection
Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility
Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
On the Convergence of Two-Layer Kolmogorov-Arnold Networks with First-Layer Training
RefineBench: Evaluating Refinement Capability in Language Models
BOLT: Decision‑Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots
A Unifying Framework for Causal Imitation Learning with Hidden Confounders
Iterative Training of Physics-Informed Neural Networks with Fourier-enhanced Features
Avey Bidirectional Architecture
SuperF: Neural Implicit Fields for Multi-Image Super-Resolution
Mapping Post-Training Forgetting in Language Models at Scale
Contextual Causal Bayesian Optimisation
Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
Hippoformer: Integrating Hippocampus-inspired Spatial Memory with Transformers
Benchmarking LLM Tool-Use in the Wild
SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation
Task-Agnostic Amortized Multi-Objective Optimization
Rethinking JEPA: Compute‑Efficient Video Self-Supervised Learning with Frozen Teachers
HiDivDrop: Vision Token Reduction in MLLMs via Late Injection and Differentiable Top-K
Provably Tracking Equivalent Mechanistic Interpretations Across Neural Networks
Towards Improvisational TAMP: Learning Low-Level Shortcuts in Abstract Planning Graphs
EditLens: Quantifying the Extent of AI Editing in Text
Alignment-Weighted DPO: A principled reasoning approach to improve alignment
Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
Weight Decay may matter more than µP for Learning Rate Transfer in Practice
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS
La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
CogMoE: Signal-Quality–Guided Multimodal MoE for Cognitive Load Prediction
Statistical Guarantees for Offline Domain Randomization
Gistify: Codebase-Level Understanding via Runtime Execution
ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation
Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
Training Dynamics Impact Post-Training Quantization Robustness
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network
Patronus: Interpretable Diffusion Models with Prototypes
Bayesian Robust Cooperative Multi-Agent Reinforcement Learning Against Unknown Adversaries
Improved $\ell_{p}$ Regression via Iteratively Reweighted Least Squares
NEO — No-Optimization Test-Time Adaptation through Latent Re-Centering
LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
GuardAlign: Robust Safety Alignment in Multimodal Large Language Models
You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging
Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation
UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
In-Context Watermarks for Large Language Models
MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Exposing Mixture and Annotating Confusion for Active Universal Test-Time Adaptation
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
Learning to Reason without External Rewards
VERINA: Benchmarking Verifiable Code Generation
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting
DELTA-Code: How RL Unlocks and Transfers New Programming Algorithms in LLMs
RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents
Drugging the Undruggable: Benchmarking and Modeling Fragment-Based Screening
Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle
Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
Multiple-Prediction-Powered Inference
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Models
GneissWeb: Preparing High Quality Data for LLMs at Scale
COSMOS: A Hybrid Adaptive Optimizer for Efficient Training of Large Language Models
Boosting for Predictive Sufficiency
Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster
WIMLE: Uncertainty‑Aware World Models with IMLE for Sample‑Efficient Continuous Control
DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems
A Benchmark for Deep Information Synthesis
Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Query-Specific Causal Graph Pruning Under Tiered Knowledge
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Initialization Schemes for Kolmogorov–Arnold Networks: An Empirical Study
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Antithetic Noise in Diffusion Models
VLM-Guided Adaptive Negative Prompting for Creative Generation
LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference
NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models
Federated Graph-Level Clustering Network with Dual Knowledge Separation
Horizon Imagination: Efficient On-Policy Training in Diffusion World Models
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo
TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture
Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
dLLM - Rethinking Generation Beyond Autoregressive Models
Distilling and Adapting: A Topology-Aware Framework for Zero-Shot Interaction Prediction in Multiplex Biological Networks
AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
Learning to Segment for Vehicle Routing Problems
RL for Reasoning by Adaptively Revealing Rationales
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
Distributional value gradients for stochastic environments
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
Towards a Universally Transferable Acceleration Method for Density Functional Theory
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
Probability Distributions Computed by Hard-Attention Transformers
Generalization of Diffusion Models Arises with a Balanced Representation Space
Revisit Visual Prompt Tuning: The Expressiveness of Prompt Experts
Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning
A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
GAVEL: Towards Rule-Based Safety through Activation Monitoring
MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation
EAMET: Robust Massive Model Editing via Embedding Alignment Optimization
Test-Time Alignment for Large Language Models via Textual Model Predictive Control
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
Improving Feasibility via Fast Autoencoder-Based Projections
Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Multilingual Routing in Mixture-of-Experts
Energy-Regularized Sequential Model Editing on Hyperspheres
Steering MoE LLMs via Expert (De)Activation
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution
FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization
Speculative Actions: A Lossless Framework for Faster AI Agents
On the Predictive Power of Representation Dispersion in Language Models
Distilling to Hybrid Attention Models via KL-Guided Layer Selection
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
Sparkle: A Robust and Versatile Representation for Point Cloud-based Human Motion Capture
Can Vision-Language Models Answer Face to Face Questions in the Real-World?
SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
SPRIG: Improving Large Language Model Performance by System Prompt Optimization
The Curious Case of In-Training Compression of State Space Models
Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
Cartridges: Lightweight and general-purpose long context representations via self-study
Reducing Symmetry Increase in Equivariant Neural Networks
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
VisCoder2: Building Multi-Language Visualization Coding Agents
ARINBEV: Bird's-Eye View Layout Estimation with Conditional Autoregressive Model
Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding
GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization
Frequency-Balanced Retinal Representation Learning with Mutual Information Regularization
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
CREPE: Controlling diffusion with REPlica Exchange
Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
Quartet of Diffusions: Structure-Aware Point Cloud Generation through Part and Symmetry Guidance
RNE: plug-and-play diffusion inference-time control and energy-based training
Pallatom-Ligand: an All-Atom Diffusion Model for Designing Ligand-Binding Proteins
Accelerated Parallel Tempering via Neural Transports
Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design
Probing Rotary Position Embeddings through Frequency Entropy
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
ALM-MTA: Front-Door Causal Multi-Touch Attribution Method for Creator-Ecosystem Optimization
A Joint Diffusion Model with Pre-Trained Priors for RNA Sequence–Structure Co-Design
Can Speech LLMs Think while Listening?
Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity
Directional Sheaf Hypergraph Networks: Unifying Learning on Directed and Undirected Hypergraphs
Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks
Improving Autoregressive Video Modeling with History Understanding
Keep the Best, Forget the Rest: Reliable Alignment with Order-Aware Preference Optimization
One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
Exploring Synthesizable Chemical Space with Iterative Pathway Refinements
EvA: Evolutionary Attacks on Graphs
AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining
Vid2World: Crafting Video Diffusion Models to Interactive World Models
Bi-Criteria Metric Distortion
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Bayesian Post Training Enhancement of Regression Models with Calibrated Rankings
Boosting Medical Visual Understanding From Multi-Granular Language Learning
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning
BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
Panda: A pretrained forecast model for chaotic dynamics
Noise Stability of Transformer Models
QUEST: A robust attention formulation using query-modulated spherical attention
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
GOLDILOCS: General Object-Level Detection and Labeling of Changes in Scenes
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Temporal Generalization: A Reality Check
FAST‑DIPS: Adjoint‑Free Analytic Steps and Hard‑Constrained Likelihood Correction for Diffusion‑Prior Inverse Problems
Towards Efficient, Adaptive, and Unified Reinforcement Mid-Training
UniHM: Unified Dexterous Hand Manipulation with Vision Language Model
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
Learnable Fractional Superlets with a Spectro-Temporal Emotion Encoder for Speech Emotion Recognition
Discovering heterogeneous synaptic plasticity rules via large-scale neural evolution
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents
Train on Validation (ToV): Fast data selection with applications to fine-tuning
Reliable Fine-Grained Evaluation of Natural Language Math Proofs
Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization
A Biologically Plausible Dense Associative Memory with Exponential Capacity
NRGPT: An Energy-based Alternative for GPT
SkillFactory: Self-Distillation for Learning Cognitive Behaviors
Scaling Laws for Diffusion Transformers
DoFlow: Flow-based Generative Models for Interventional and Counterfactual Forecasting on Time Series
The State of Reinforcement Finetuning for Transformer-based Generative Agents
Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
Learnable Sparsity for Vision Generative Models
COSA: Context-aware Output-Space Adapter for Test-Time Adaptation in Time Series Forecasting
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
Diverse Text-to-Image Generation via Contrastive Noise Optimization
Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
Special Unitary Parameterized Estimators of Rotation
Membership Privacy Risks of Sharpness Aware Minimization
PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning
Can LLMs Move Beyond Short Exchanges to Realistic Therapy Conversations?
Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programming
Bayesian Parameter Shift Rules in Variational Quantum Eigensolvers
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Understanding
Expertise Can Be Helpful for Reinforcement Learning-based Macro Placement
Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning LLMs
VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Astra: General Interactive World Model with Autoregressive Denoising
Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
Learning What Matters Now: Dynamic Preference Inference under Contextual Shifts
Contamination Detection for VLMs Using Multi‑Modal Semantic Perturbations
VGR: Visual Grounded Reasoning
Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning
PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
Fostering Video Reasoning via Next-Event Prediction
Learning Heterogeneous Degradation Representation for Real-World Super-Resolution
SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
Variational Reasoning for Language Models
Reinforcing General Reasoning Without Verifiers
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Heuristic-Based Ideation for Guiding LLMs Toward Structured Creativity
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
Efficient Turing Machine Simulation with Transformers
PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
Toward Complex-Valued Neural Networks for Waveform Generation
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
MARS - A Foundational Map Auto-Regressor
Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors
Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting Without Disclosure
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Traceable Black-Box Watermarks For Federated Learning
Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement
Understanding the Role of Training Data in Test-Time Scaling
PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting
ChainGPT: Dual-Reasoning Model with Recurrent Depth and Multi-Rank State Updates
When LLMs get significantly worse: A statistical approach to detect model degradations
Purrception: Variational Flow Matching for Vector-Quantized Image Generation
ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
ORION: Decoupling and Alignment for Unified Autoregressive Understanding and Generation
Expressive and Invariant Graph Learning via Canonical Tree Cover Neural Networks
Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs
Strong Correlations Induce Cause Only Predictions in Transformer Training
Teaching Metric Distance to Discrete Autoregressive Language Models
Disentangling Knowledge Representations for Large Language Model Editing
Referring Layer Decomposition
Bayesian Evidence-Driven Prototype Evolution for Federated Domain Adaptation
A Cognitive Process-Inspired Architecture for Subject-Agnostic Brain Visual Decoding
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
LLM Unlearning with LLM Beliefs
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
Enhancing Learning with Noisy Labels via Rockafellian Relaxation
ATOM: A Pretrained Neural Operator for Multitask Molecular Dynamics
LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
HARP: Hallucination Detection via Reasoning Subspace Projection
MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition
Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs
Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models
Predictive CVaR Q-learning
TianQuan-S2S: A Subseasonal-to-Seasonal Global Weather Model via Incorporate Climatology State
Geometry-aware 4D Video Generation for Robot Manipulation
Hierarchy Decoding: A Training-free Parallel Decoding Strategy for Diffusion Large Language Models
Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding
Unlearning Evaluation through Subset Statistical Independence
Toward Conservative Planning from Preferences in Offline Reinforcement Learning
CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence under Partial Observations
MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
Bridging the Distribution Gap to Harness Pretrained Diffusion Priors for Super-Resolution
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval
Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks
ProRe: A Proactive Reward System for GUI Agents via Reasoner–Actor Collaboration
Transitive RL: Value Learning via Divide and Conquer
Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry
Retrospective Sparse Attention for Efficient Long-Context Generation
Secure Outlier-Aware Large Language Model Inference
Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
FakeXplain: AI-Generated Images Detection via Human-Aligned Grounded Reasoning
In-The-Flow Agentic System Optimization for Effective Planning and Tool Use
Variational Inference for Cyclic Learning
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
DES-LOC: Desynced Low Communication Adaptive Optimizers for Foundation Models
Cascadia: An Efficient Cascade Serving System for Large Language Models
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
FARI: Robust One-Step Inversion for Watermarking in Diffusion Models
Segment Any Events with Language
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
Logit‑KL Flow Matching: Non‑Autoregressive Text Generation via Sampling‑Hybrid Inference
Symmetric Space Learning for Combinatorial Generalization
Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design
NeRV-Diffusion: Diffuse Implicit Neural Representation for Video Synthesis
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
Adaptive Mixture of Disentangled Experts for Dynamic Graphs under Distribution Shifts
EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Hierarchical Prototype Learning for Semantic Segmentation
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Towards Efficient Constraint Handling in Neural Solvers for Routing Problems
Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
Context Tokens are Anchors: Understanding the Repetition Curse in Diffusion MLLMs from an Information Flow Perspective
DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection
Hallucination Begins Where Saliency Drops
Revisiting Long-context Modeling from Context Denoising Perspective
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
How to Square Tensor Networks and Circuits Without Squaring Them
PRISON: Unmasking the Criminal Potential of Large Language Models
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
ROGA: Scaling Generalist Agents for Office Productivity Tasks via Tool Generation
GradPruner: Gradient-guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs
AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning
xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
Uncertainty-driven Embedding Convolution
Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
PALC: Preference Alignment via Logit Calibration
Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database
Scalable and Adaptive Trust-Region Learning via Projection Convex Hull
Universal Beta Splatting
PrismAudio: Decomposed Chain-of-Thought and Multi-dimensional Rewards for Video-to-Audio Generation
Unlocking the Potential of Weighting Methods in Federated Learning Through Communication Compression
DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets
GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
Learning From Dictionary: Enhancing Robustness of Machine-Generated Text Detection in Zero-Shot Language via Adversarial Training
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
Generalization of RLVR Using Causal Reasoning as a Testbed
CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
Using cognitive models to reveal value trade-offs in language models
Learning Brain Representation with Hierarchical Visual Embeddings
Priors in time: Missing inductive biases for language model interpretability
FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion
Dynamic Speculative Agent Planning
Complexity- and Statistics-Guided Anomaly Detection in Time Series Foundation Models
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
Toward Practical Equilibrium Propagation: Brain-inspired Recurrent Neural Network with Feedback Regulation and Residual Connections
Robust Adaptive Multi-Step Predictive Shielding
MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure
Interleaving Reasoning for Better Text-to-Image Generation
Adaptive Regularization for Large-Scale Sparse Feature Embedding Models
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
Generalizable Heuristic Generation Through LLMs with Meta-Optimization
Learning From the Past with Cascading Eligibility Traces
Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection
Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
PARD: Accelerating LLM Inference with Low‑Cost PARallel Draft Model Adaptation
GOOD: Geometry-guided Out-of-Distribution Modeling for Open-set Test-time Adaptation in Point Cloud Semantic Segmentation
Graph-based Nearest Neighbors with Dynamic Updates via Random Walk-Based Analysis
Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling
Mitigating Privacy Risk via Forget Set-Free Unlearning
GNN-as-Judge: Unleashing the Power of LLMs for Graph Few-shot Semi-supervised Learning with GNN Feedback
PoLi-RL: A Point-to-List Reinforcement Learning Framework for Conditional Semantic Textual Similarity
GenDR: Lighten Generative Detail Restoration
Learning Ising Models under Hard Constraints using One Sample
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Prior-aware and Context-guided Group Sampling for Active Probabilistic Subsampling
Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
ULD-Net: Enabling Ultra-Low-Degree Fully Polynomial Networks for Homomorphically Encrypted Inference
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
What Generative Search Engines Like and How to Optimize Web Content Cooperatively
Automatic Image-Level Morphological Trait Annotation for Organismal Images
NextQuill: Causal Preference Modeling for Enhancing LLM Personalization
Unsupervised Invariant Risk Minimization
Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization
BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Numerion: A Multi-Hypercomplex Model for Time Series Forecasting
Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
Speech World Model: Causal State–Action Planning with Explicit Reasoning for Speech
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
An Optimal Diffusion Approach to Quadratic Rate-Distortion Problems: New Solution and Approximation Methods
Diffusion Bridge Variational Inference for Deep Gaussian Processes
Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
vCache: Verified Semantic Prompt Caching
Cautious Optimizers: Improving Training with One Line of Code
Correlated Policy Optimization in Multi-Agent Subteams
Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
Latent Geometry-Driven Network Automata for Complex Network Dismantling
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detections
Towards Sampling Data Structures for Tensor Products in Turnstile Streams
Getting Your LLMs Ready for Reinforcement Learning with Lightweight SFT
Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams
When Agents “Misremember” Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems
Hot PATE: Private Aggregation of Distributions for Diverse Tasks
Parameterized Hardness of Zonotope Containment and Neural Network Verification
InfoBridge: Mutual Information estimation via Bridge Matching
NDAD: Negative-Direction Aware Decoding for Large Language Models via Controllable Hallucination Signal Injection
TopoFormer: Topology Meets Attention for Graph Learning
DPad: Efficient Diffusion Language Models with Suffix Dropout
SIGMA-GEN: Structure and Identity Guided Multi-Subject Assembly for Image Generation
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks
DistillKac: Few-Step Image Generation via Damped Wave Equations
Neural Force Field: Few-shot Learning of Generalized Physical Reasoning
MILPnet: A Multi-Scale Architecture with Geometric Feature Sequence Representations for Advancing MILP Problems
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
QuadGPT: Native Quadrilateral Mesh Generation with Autoregressive Models
Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
Building spatial world models from sparse transitional episodic memories
Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
Enabling True Global Perception in State Space Models for Visual Tasks
Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
GraphShield: Graph-Theoretic Modeling of Network-Level Dynamics for Robust Jailbreak Detection
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
Hierarchical Multi-Scale Molecular Conformer Generation with Structural Awareness
Estimating Semantic Alphabet Size for LLM Uncertainty Quantification
The Seismic Wavefield Common Task Framework
GUIDE: Gated Uncertainty-Informed Disentangled Experts for Long-tailed Recognition
Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning
Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme
THE SELF-RE-WATERMARKING TRAP: FROM EXPLOIT TO RESILIENCE
Composition of Memory Experts for Diffusion World Models
Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets
Stable coresets: Unleashing the power of uniform sampling
Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining
Online time series prediction using feature adjustment
Generating Directed Graphs with Dual Attention and Asymmetric Encoding
Data-Centric Lessons To Improve Speech-Language Pretraining
Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation
SubDyve: Subgraph-Driven Dynamic Propagation for Virtual Screening Enhancement Controlling False Positive
Revenue Maximization Under Sequential Price Competition Via The Estimation Of $s$-Concave Demand Functions
A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features
Consistent Text-to-Image Generation via Scene De-Contextualization
Shrinking Proteins with Diffusion
Antibody: Strengthening Defense Against Harmful Fine-Tuning for Large Language Models via Attenuating Harmful Gradient Influence
Physically-Guided Optical Inversion Enable Non-Contact Side-Channel Attack on Isolated Screens
Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning
DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation
Deep Latent Variable Model based Vertical Federated Learning with Flexible Alignment and Labeling Scenarios
AgentPO: Enhancing Multi-Agent Collaboration via Reinforcement Learning
Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
Learning to Be Uncertain: Pre-training World Models with Horizon-Calibrated Uncertainty
It's All Just Vectorization: einx, a Universal Notation for Tensor Operations
Adaptive Width Neural Networks
AttriCtrl: A Generalizable Framework for Controlling Semantic Attribute Intensity in Diffusion Models
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
Coarse-to-Fine Learning of Dynamic Causal Structures
Agentic Reinforcement Learning with Implicit Step Rewards
Self-Speculative Masked Diffusions
PRISM: Controllable Diffusion for Compound Image Restoration with Scientific Fidelity
CLARC: C/C++ Benchmark for Robust Code Search
Toward Enhancing Representation Learning in Federated Multi-Task Settings
PolyGraphScore: a classifier-based metric for evaluating graph generative models
Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs
Difficulty–Diversity Collaborative Filtering for Data-Efficient LLM Fine-Tuning
Hierarchical Multi-Stage Recovery Framework for Kronecker Compressed Sensing
Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
Reasoning Boosts Opinion Alignment in LLMs
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis
Bridging the performance-gap between target-free and target-based reinforcement learning
Best-of-Infinity: Asymptotic Performance of Test-Time Compute
Leveraging Discrete Function Decomposability for Scientific Design
ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes
Dens3R: A Foundation Model for 3D Geometry Prediction
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
Laplacian Multi-scale Flow Matching for Generative Modeling
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Low-Rank Few-Shot Node Classification by Node-Level Graph Diffusion
Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^{\pi}$ Realizability for Deterministic Dynamics
TEDM: Time Series Forecasting with Elucidated Diffusion Models
A foundation model with multi-variate parallel attention to generate neuronal activity
When Shift Happens - Confounding Is to Blame
Memorization Through the Lens of Sample Gradients
SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
BIRD: Behavior Induction via Representation-structure Distillation
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
Calibrated Information Bottleneck for Trusted Multi-modal Clustering
OSCAR: Online Soft Compression for RAG
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
ARFlow: Auto-regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting
Risk-Sensitive Agent Compositions
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
Transferable and Stealthy Adversarial Attacks on Large Vision-Language Models
Nudging the Boundaries of LLM Reasoning
Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
PAC-Bayes bounds for cumulative loss in Continual Learning
Random Spiking Neural Networks are Stable and Spectrally Simple
Sample-Efficient Distributionally Robust Multi-Agent Reinforcement Learning via Online Interaction
What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
The Price of Robustness: Stable Classifiers Need Overparameterization
UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes
Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Finite-Time Convergence Analysis of ODE-based Generative Models for Stochastic Interpolants
Learning to Adapt: In-Context Learning Beyond Stationarity
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
TAVAE: A VAE with Adaptable Priors Explains Contextual Modulation in the Visual Cortex
Histopathology-Genomics Multi-modal Structural Representation Learning for Data-Efficient Precision Oncology
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors
TRIDENT: Cross-Domain Trajectory Spatio-Temporal Representation via Distance-Preserving Triplet Learning
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models
Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection
Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning
SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator
Boosting Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading
A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
InnovatorBench: Evaluating Agents’ Ability to Conduct Innovative AI Research
HiddenEcho: Mitigating Noise Amplification in Differentially Private LLMs with Hidden-State Correction
REMem: Reasoning with Episodic Memory in Language Agent
Learning Survival Distributions with Individually Calibrated Asymmetric Laplace Distribution
Designing Affine-Invariant Neural Networks for Photometric Corruption Robustness and Generalization
Convergence Analysis of Tsetlin Machines for Basic Boolean Operators under Noise-Free and Noisy Training Conditions
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Convex Dominance in Deep Learning: A Scaling Law of Loss and Learning Rate
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
Q-Learning with Fine-Grained Gap-Dependent Regret
Stacked from One: Multi-Scale Self-Injection for Context Window Extension
DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
Resurfacing the Instance-only Dependent Label Noise Model through Loss Correction
SiNGER: A Clearer Voice Distills Vision Transformers Further
Residual Feature Integration is Sufficient to Prevent Negative Transfer
Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions
pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
Privacy-Protected Causal Survival Analysis Under Distribution Shift
VoMP: Predicting Volumetric Mechanical Property Fields
SCAD: Super-Class-Aware Debiasing for Long-Tailed Semi-Supervised Learning
Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
Flipping the Dialogue: Training and Evaluating User Language Models
Attend to the Active: Structure-Aware Dynamic Attention in LLMs for Compositional Instruction Following
Aligning Collaborative View Recovery and Tensorial Subspace Learning via Latent Representation for Incomplete Multi-View Clustering
Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Grouping Nodes with known Value Differences: A lossless UCT-based Abstraction Algorithm
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Optimal Robust Subsidy Policies for Irrational Agent in Principal-Agent MDPs
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions
ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection
Transformers are Inherently Succinct
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
Never Saddle: Reparameterized Steepest Descent as Mirror Flow
SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML
SCRAPL: Scattering Transform with Random Paths for Machine Learning
Bayesian Influence Functions for Hessian-Free Data Attribution
HeurekaBench: A Benchmarking Framework for AI Co-scientist
$AutoDrive\text{-}P^3$: Unified Chain of Perception–Prediction–Planning Thought via Reinforcement Fine-Tuning
PlantRSR: A New Plant Dataset and Method for Reference-based Super-Resolution
Algorithm Generation via Creative Ideation
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
FlowNIB: An Information Bottleneck Analysis of Bidirectional vs. Unidirectional Language Models
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations
Hyperbolic Aware Minimization: Implicit Bias for Sparsity
Decision-Theoretic Approaches for Improved Learning-Augmented Algorithms
Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
Repurposing Synthetic Data for Fine-grained Search Agent Supervision
TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
Scalable Offline Model-Based RL with Action Chunks
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
Metis: Training LLMs with FP4 Quantization
DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging
SafeMoE: Safe Fine-Tuning for MoE LLMs by Aligning Harmful Input Routing
Shift-Tolerant Allocation via Black-Litterman Using Conditional Diffusion Estimates
Video-LevelGauge: Investigating Contextual Positional Bias in Video Language Models
Curvature-Guided Task Synergy for Skeleton based Temporal Action Segmentation
TD-MoE: Tensor Decomposition for MoE Models
Learning Robust Intervention Representations with Delta Embeddings
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
Reformulation for Pretraining Data Augmentation
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
Compositional Generalization from Learned Skills via CoT Training: A Theoretical and Structural Analysis for Reasoning
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
Language and Experience: A Computational Model of Social Learning in Complex Tasks
Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
KinemaDiff: Towards Diffusion for Coherent and Physically Plausible Human Motion Prediction
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
LLMs Can Hide Text in Other Text of the Same Length
CodeGenGuard: A Robust Watermark for Code Generation Models
Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling
LLM DNA: Tracing Model Evolution via Functional Representations
AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
Learning Retrieval Models with Sparse Autoencoders
$\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving
DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning
Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Action-Free Offline-To-Online RL via Discretised State Policies
Learning-Augmented Moment Estimation on Time-Decay Models
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
ATGen: Adversarial Reinforcement Learning for Test Case Generation
Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
DeNOTS: Stable Deep Neural ODEs for Time Series
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
SysMoBench: Evaluating AI on Formally Specifying Complex Real-World Systems
Sci2Pol: Evaluating and Fine-tuning LLMs on Scientific-to-Policy Brief Generation
MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs
Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing
Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora
ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Outrageously Large Context Windows via RACE Attention -- A Family of Non-Linear Attention that can be calculated in Strictly Linear-Time
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
EigenBench: A Comparative Behavioral Measure of Value Alignment
Time-to-Move: Training-Free Motion-Controlled Video Generation via Dual-Clock Denoising
Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs with Application to Glucose Prediction
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
IAGA: Identity-Aware Gaussian Approximation for Efficient 3D Molecular Generation
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
Long-Context Generalization with Sparse Attention
SNAPHARD CONTRAST LEARNING
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency
VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
DeepPrim: a Physics-Driven 3D Short-term Weather Forecaster via Primitive Equation Learning
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
FOCUS: Efficient Keyframe Selection for Long Video Understanding
station2radar: query‑conditioned gaussian splatting for precipitation field
Tighter Performance Theory of FedExProx
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Incentive-Aligned LLM Summaries
Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction
Look-ahead Reasoning with a Learned Model in Imperfect Information Games
Many Eyes, One Mind: Temporal Multi-Perspective and Progressive Distillation for Spiking Neural Networks
PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection
InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
Active Learning for Decision Trees with Provable Guarantees
Horseshoe Splatting: Handling Structural Sparsity for Uncertainty-Aware Gaussian-Splatting Radiance Field Rendering
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Search Arena: Analyzing Search-Augmented LLMs
What Matters for Bioacoustic Encoding
Choices Speak Louder than Questions
Token-based Audio Inpainting via Discrete Diffusion
Much Ado About Noising: Do Flow Models Actually Make Better Control Policies?
Matched Data, Better Models: Target Aligned Data Filtering with Sparse Features
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
Homeostatic Adaptation of Optimal Population Codes under Metabolic Stress
Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning
Enabling Your Forensic Detector Know How Well It Performs on Distorted Samples
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
Eliminating VAE for Fast and High-Resolution Generative Detail Restoration
Long-range Modeling and Processing of Multimodal Event Sequences
Incomplete Data, Complete Dynamics: A Diffusion Approach
Metric $k$-clustering using only Weak Comparison Oracles
Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
Bayesian Test-Time Adaptation via Dirichlet feature projection and GMM-Driven Inference for Motor Imagery EEG Decoding
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods
Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks
SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
Mitigating Hallucination in Vision-Language Model with Depth and Spatial-aware Key-Value Refinement
Natural Identifiers for Privacy and Data Audits in Large Language Models
Unified Privacy Guarantees for Decentralized Learning via Matrix Factorization
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling
ConvT3: Structured State Kernels for Convolutional State Space Models
On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
Lifelong Learning with Behavior Consolidation for Vehicle Routing
Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting
SpectraLLM: Uncovering the Ability of LLMs for Molecule Structure Elucidation from Multi-Spectra
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
Change Point Localization and Inference in Dynamic Multilayer Networks
BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers
Physics-informed learning under mixing: How physical knowledge speeds up learning
Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
StylOS: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
CRONOS: Continuous time reconstruction for 4D medical longitudinal series
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
InfGen: Scenario Generation as Next Token Group Prediction
ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning
Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning
PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning
VCWorld: A Biological World Model for Virtual Cell Simulation
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
Dynamical properties of dense associative memory
Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents
CroCoDiLight: Repurposing Cross-View Completion Encoders for Relighting
Interaction Field Matching: Overcoming Limitations of Electrostatic Models
AutoCode: LLMs as Problem Setters for Competitive Programming
DemoGrasp: Universal Dexterous Grasping from a Single Demonstration
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Multiplicative Diffusion Models: Beyond Gaussian Latents
Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs
SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation
Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
SiMO: Single-Modality-Operable Multimodal Collaborative Perception
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
Bridging ML and algorithms: comparison of hyperbolic embeddings
Self-Destructive Language Models
ComPhy: Composing Physical Models with end-to-end Alignment
VUDG: A Dataset for Video Understanding Domain Generalization
LayerSync: Self-aligning Intermediate Layers
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
Detecting Invariant Manifolds in ReLU-Based RNNs
Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
EigenScore: OOD Detection using Posterior Covariance in Diffusion Models
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning
ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization
Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses
ASIDE: Architectural Separation of Instructions and Data in Language Models
Constraint-guided Hardware-aware NAS through Gradient Modification
Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
Reinforcement Unlearning via Group Relative Policy Optimization
Medical Interpretability and Knowledge Maps of Large Language Models
Tractability via Low Dimensionality: The Parameterized Complexity of Training Quantized Neural Networks
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space
Tracing and Reversing Edits in LLMs: A Study on Rank-One Model Edits
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
Inducing Dyslexia in Vision Language Models
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Atomic HINs: Entity-Attribute Duality for Heterogeneous Graph Modeling
PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA
Exploring the Basin-Like Loss Landscape in Large Language Models
GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine
Bridging Explainability and Embeddings: BEE Aware of Spuriousness
Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation
Fast Frank–Wolfe Algorithms with Adaptive Bregman Step-Size for Weakly Convex Functions
A New Initialization to Control Gradients in Sinusoidal Neural Networks
Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction
HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games
A Single Architecture for Representing Invariance Under Any Space Group
PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection
Tokenisation over Bounded Alphabets is Hard
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Conformal Prediction for Long-Tailed Classification
From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
Diffusion-DFL: Decision-focused Diffusion Models for Stochastic Optimization
IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction
RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States
Constrained Decoding of Diffusion LLMs with Context-Free Grammars
MoGen: Detailed Neuronal Morphology Generation via Point Cloud Flow Matching
Online Black-Box Prompt Optimization with Regret Guarantees under Noisy Feedback
Imitating the Truth: Attention-aware Truth-Guided Enhancement for Hallucination Mitigation in Large Vision-Language Models
Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement
Beyond Entity Correlations: Disentangling Event Causal Puzzles in Temporal Knowledge Graphs
BoreaRL: A Multi-Objective Reinforcement Learning Environment for Climate-Adaptive Boreal Forest Management
Learning Self-Critiquing Mechanisms for Region-Guided Chest X-Ray Report Generation
Revisting Node Affinity Prediction In Temporal Graphs
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
THE PATH OF LEAST RESISTANCE: GUIDING LLM REASONING TRAJECTORIES WITH PREFIX CONSENSUS
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Guidance Watermarking for Diffusion Models
Hierarchical Concept-based Interpretable Models
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
Reliable Evaluation of MRI Motion Correction: Dataset and Insights
LLaVAction: evaluating and training multi-modal large language models for action understanding
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs
TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding
Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking
FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching
SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion
Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
PGRF-Net: A Prototype-Guided Relational Fusion Network for Diagnostic Multivariate Time-Series Anomaly Detection
Robust Generalized Schrödinger Bridge via Sparse Variational Gaussian Processes
On the Convergence Direction of Gradient Descent
Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
ATPO: ADAPTIVE TREE POLICY OPTIMIZATION FOR MULTI-TURN MEDICAL DIALOGUE
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
EnvSocial-Diff: A Diffusion-Based Crowd Simulation Model with Environmental Conditioning and Individual-Group Interaction
New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Inferring the Invisible: Neuro-Symbolic Rule Discovery for Missing Value Imputation
GRADIEND: Feature Learning within Neural Networks Exemplified through Biases
Topology and geometry of the learning space of ReLU networks: connectivity and singularities
A Unification of Discrete, Gaussian, and Simplicial Diffusion
Conformalized Survival Counterfactuals Prediction for General Right-Censored Data
RM-R1: Reward Modeling as Reasoning
Code World Models for General Game Playing
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning
Quasi-Monte Carlo Methods Enable Extremely Low-Dimensional Deep Generative Models
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Learning Data-Efficient and Generalizable Neural Operators via Fundamental Physics Knowledge
LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
Differentiable Lifting for Topological Neural Networks
Tensor learning with orthogonal, Lorentz, and symplectic symmetries
Diversity-Enhanced Reasoning for Subjective Questions
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
GeoFAR: Geography-Informed Frequency-Aware Super-Resolution for Climate Data
Mechanistic Independence: A Principle for Identifiable Disentangled Representations
Bandits with Single-Peaked Preferences and Limited Resources
ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction
Context and Diversity Matter: The Emergence of In-Context Learning in World Models
Spatially Informed Autoencoders for Interpretable Visual Representation Learning
DiffSDA: Unsupervised Diffusion Sequential Disentanglement Across Modalities
ChunkTabPFN: Training-free Long Context
Celo: Training Versatile Learned Optimizers on a Compute Diet
A Case for Library-Level k-Means Binning in Histogram Gradient-Boosted Trees
NeoBERT: A Next Generation BERT
Chimera: State Space Models Beyond Sequences
MobileCLIP2: Improving Multi-Modal Reinforced Training
Light of Normals: Unified Feature Representation for Universal Photometric Stereo
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
HalluEntity: Benchmarking and Understanding Entity-Level Hallucination Detection
Generalized Compressed Sensing for Image Reconstruction with Diffusion Probabilistic Models
Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling
LC-PLM: Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers
On the stability of gradient descent with second order dynamics for time-varying cost functions
ODNet: Opinion Dynamics-Inspired Neural Message Passing for Graphs and Hypergraphs
Adversarial Robustness of Graph Transformers
t-SNE Exaggerates Clusters, Provably
Information Theoretic Guarantees For Policy Alignment In Large Language Models
CNN Interpretability with Multivector Tucker Saliency Maps for Self-Supervised Models
Faster Diffusion Through Temporal Attention Decomposition
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness
Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
Assessing Robustness via Score-Based Adversarial Image Generation
Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
Directed Exploration in Reinforcement Learning from Linear Temporal Logic
Model Tensor Planning
LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets
SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis
Relational Graph Transformer
PCF Learned Sort: a Learning Augmented Sort Algorithm with O(n log log n) Expected Complexity
Encoder-only Next Token Prediction
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
Multi-Bellman operator for convergence of Q-learning with linear function approximation
Online Selective Conformal Inference: Errors and Solutions
Leveraging a Simulator for Learning Causal Representations from Post-Treatment Covariates for CATE
Learning Deformable Body Interactions With Adaptive Spatial Tokenization
Setting the Record Straight on Transformer Oversmoothing
Enhancing Vision-Language Model with Unmasked Token Alignment
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans
Slicing the Gaussian Mixture Wasserstein Distance
Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria
Measuring and Mitigating Rapport Bias of Large Language Models under Multi-Agent Social Interactions
Distributed Quasi-Newton Method for Fair and Fast Federated Learning
Adaptive Mesh Quantization for Neural PDE Solvers
Discrete Audio Tokens: More Than a Survey!
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
Statistical Guarantees for Approximate Stationary Points of Shallow Neural Networks
Inverse Scaling in Test-Time Compute
Influence-Preserving Proxies for Gradient-Based Data Selection in LLM FineTuning
Training Dynamics of Learning 3D-Rotational Equivariance
Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts
AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers
Variational Pseudo Marginal Methods for Jet Reconstruction in Particle Physics
Amortized Inference of Causal Models via Conditional Fixed-Point Iterations
An Information-Theoretic Lower Bound on the Generalization Error of Autoencoders
MaRS: Memory-Adaptive Routing for Reliable Capacity Expansion and Knowledge Retention
Synergistic Benefits of Joint Molecule Generation and Property Prediction
Tabby: A Language Model Architecture for Tabular and Structured Data Synthesis
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
On The Fragility of Benchmark Contamination Detection in Reasoning Models
What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching
TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning
A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
Membership Inference Attacks Against Fine-tuned Diffusion Language Models
One for Two: A Unified Framework for Imbalanced Graph Classification via Dynamic Balanced Prototype
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
Video-GPT via Next Clip Diffusion
SFT Doesn’t Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
Policy Contrastive Decoding for Robotic Foundation Models
Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
Autoregressive-based Progressive Coding for Ultra-Low Bitrate Image Compression
Beyond Uniformity: Sample and Frequency Meta Weighting for Post-Training Quantization of Diffusion Models
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction–Reasoning Synergy
Chart Deep Research in LVLMs via Parallel Relative Policy Optimization
CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
GenCompositor: Generative Video Compositing with Diffusion Transformer
The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment
Bound by semanticity: universal laws governing the generalization-identification tradeoff
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
Optimal transport unlocks end-to-end learning for single-molecule localization
Learning Dynamics of Logits Debiasing for Long-Tailed Semi-Supervised Learning
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
Tight Bounds for Schrodinger Potential Estimation in Unpaired Data Translation
From atom to space: A region-based readout function for spatial properties of materials
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
Confident Block Diagonal Structure-Aware Invariable Graph Completion for Incomplete Multi-view Clustering
Towards Lossless Memory-efficient Training of Spiking Neural Networks via Gradient Checkpointing and Spike Compression
The Achilles’ Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
FERD: Fairness-Enhanced Data-Free Adversarial Robustness Distillation
AesCoder: Code Aesthetics with Agentic Reward Feedback
Overshoot and Shrinkage in Classifier-Free Guidance: From Theory to Practice
Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
Protection against Source Inference Attacks in Federated Learning
Disentangled representation learning through unsupervised symmetry group discovery
Pose-RFT: Aligning MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning
Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation
SPICE: Submodular Penalized Information–Conflict Selection for Efficient Large Language Model Training
Cat-PO: Cross-modal Adaptive Token-rewards for Preference Optimization in Truthful Multimodal LLMs
AdaSpec: Adaptive Spectrum for Enhanced Node Distinguishability
Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks
PCB-Bench: Benchmarking LLMs for Printed Circuit Board Placement and Routing
Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX
Artistic Style and the Play of Neural Style Representations
Neural Hamilton–Jacobi Characteristic Flows for Optimal Transport
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
GPS: Directed Acyclic Graph guided Proactive Information Seeking in Large Language Models
ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning
Latent-to-Data Cascaded Diffusion Models for Unconditional Time Series Generation
Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
Beyond Skeletons: Learning Animation Directly from Driving Videos with Same2X Training Strategy
BigMac3D: A Big Macaque Motion and Animation Dataset Bridging Image and 3D Pose Representations
Steering the Herd: A Framework for LLM-based Control of Social Learning
Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration
Stochastic Self-Organization in Multi-Agent Systems
SWE-RM: Execution-free Feedback for Software Engineering Agents
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation
: One LLM Token for Explicit Graph Structural Understanding
Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs
MOLM: Mixture of LoRA Markers
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
Monocular Normal Estimation via Shading Sequence Estimation
Automated Interpretability Metrics Do Not Distinguish Trained and Random Transformers
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
Bridging Input Feature Spaces Towards Graph Foundation Models
Towards Improved Sentence Representations using Token Graphs
Local Entropy Search over Descent Sequences for Bayesian Optimization
Learning from Historical Activations in Graph Neural Networks
Fractional-Order Spiking Neural Network
Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
VeriTrail: Closed-Domain Hallucination Detection with Traceability
Tree Search for LLM Agent Reinforcement Learning
Video-KTR: Reinforcing Video Reasoning via Key Token Attribution
SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines
Verifying Chain-of-Thought Reasoning via its Computational Graph
Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning
EntropyLong: Effective Long-Context Training via Predictive Uncertainty
Cross-Modal Redundancy and the Geometry of Vision–Language Embeddings
The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Random Label Prediction Heads for Studying and Controlling Memorization in Deep Neural Networks
Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex
HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
Certified Evaluation of Model-Level Explanations for Graph Neural Networks
Automated Formalization via Conceptual Retrieval-Augmented LLMs
NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
Scaling Agent Learning via Experience Synthesis
Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders
OccDriver: Future Occupancy Guided Dual-branch Trajectory Planner in Autonomous Driving
Benchmarking Open-ended Segmentation
Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
DRBench: A Realistic Benchmark for Enterprise Deep Research
GEM: A Gym for Generalist LLMs
SSVPO: Effective Step-Level Credit Assignment for RL Training of Language Models
Steering Diffusion Models Towards Credible Content Recommendation
Decomposed Attention Fusion in MLLMs for Training-free Video Reasoning Segmentation
Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
MIAM: Modality Imbalance-Aware Masking for Multimodal Ecological Applications
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
Bird's-eye-view Informed Reasoning Driver
Lightweight Transformer for EEG Classification via Balanced Signed Graph Algorithm Unrolling
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL
From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation
DirMoE: Dirichlet-Routed Mixture of Experts
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
Solving the 2-norm k-hyperplane clustering problem via multi-norm formulations
Efficient Prediction of Large Protein Complexes via Subunit-Guided Hierarchical Refinement
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
Sheaves Reloaded: A Direction Awakening
DeepSADR: Deep Transfer Learning with Subsequence Interaction and Adaptive Readout for Cancer Drug Response Prediction
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Decomposing Extrapolative Problem Solving: Spatial Transfer and Length Scaling with Map Worlds
LeanForPhysics: Comprehensive Reasoning Framework for University-level Physics in Lean4
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization
VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis
W-EDIT: A Wavelet-Based Frequency-Aware Framework for Text-Driven Image Editing
Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
Divid: Disentangled Spatial-Temporal Modeling within LLMs for Temporally Grounded Video Understanding
Splat Feature Solver
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
Offline Preference-Based Value Optimization
Boolean Satisfiability via Imitation Learning
FastAvatar: Towards Unified and Fast 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
GenCP: Towards Generative Modeling Paradigm of Coupled physics with Application to Fluid-Structure Interaction
EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
Scale-wise Distillation of Diffusion Models
Revisiting Global Text Conditioning in Diffusion Transformers
Understanding VLMs Spatial Mental Modeling Capability from Limited Views
FlowSearcher: Synthesizing Memory-Guided Agentic Workflows for Web Information Seeking
Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
Reasoning-Driven Multimodal LLM for Domain Generalization
StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning
ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
Seeing What’s Not There: Negation Understanding Needs More Than Training
Ghost in the Cloud: Your Geo-Distributed Large Language Models Training is Easily Manipulated
Revisiting Nonstationary Kernel Design for Multi-Output Gaussian Processes
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
Efficient Test-Time Scaling for Small Vision-Language Models
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning
MIMIC-Bench: Exploring the User-Like Thinking and Mimicking Capabilities of Multimodal Large Language Models
VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization
No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection
Token-Efficient Long-Term Interest Sketching and Internalized Reasoning for LLM-based Recommendation
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks
Sample Efficient Offline RL via T-Symmetry Enforced Latent State-Stitching
Physics vs Distributions: Pareto Optimal Flow Matching with Physics Constraints
SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Non-Collaborative User Simulators for Tool Agents
Influence without Confounding: Causal Discovery from Temporal Data with Long-term Carry-over Effects
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
Knowledge Externalization: Reversible Unlearning and Modular Retrieval in Multimodal Large Language Models
Diversity-Incentivized Exploration for Versatile Reasoning
Neuron-Aware Data Selection in Instruction Tuning for Large Language Models
FACM: Flow-Anchored Consistency Models
From "Sure" to "Sorry": Detecting Jailbreak in Large Vision Language Model via JailNeurons
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition
Unlocking Full Efficiency of Token Filtering in Large Language Model Training
Spherical Watermark: Encryption-Free, Lossless Watermarking for Diffusion Models
Entropy-Monitored Kernelized Token Distillation for Audio-Visual Compression
Revisiting the Past: Data Unlearning with Model State History
DNOD: Deformable Neural Operators for Object Detection in SAR Images
In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
Recover Cell Tensor: Diffusion-Equivalent Tensor Completion for Fluorescence Microscopy Imaging
Next Visual Granularity Generation
Knowledge Editing with Subspace-Aware Key-Value Mappings
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
The Effect of Attention Head Count on Transformer Approximation
DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands
ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection
Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
Comparing the learning dynamics of in-context learning and fine-tuning in language models
Multifidelity Simulation-based Inference for Computationally Expensive Simulators
Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis
Latent Speech-Text Transformer
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers
Unified Vision-Language-Action Model
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
RepSpec: Structural Re-parameterized Draft Model Training for Speculative Decoding
Improving Code Localization with Repository Memory
Group Verification-based Policy Optimization for Interactive Coding Agents
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
Biologically Plausible Learning via Bidirectional Spike-Based Distillation
CDBridge: A Cross-omics Post-training Bridge Strategy for Context-aware Biological Modeling
RedacBench: Can AI Erase Your Secrets?
ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination
Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis
Oversmoothing, "Oversquashing", Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning
Post-Training Quantization for Video Matting
Temporal superposition and feature geometry of RNNs under memory demands
Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices
Epistemic Uncertainty Quantification To Improve Decisions From Black-Box Models
CrossPL: Systematic Evaluation of Large Language Models for Cross Programming Language Interoperating Code Generation
An Efficient SE(p)-Invariant Transport Metric Driven by Polar Transport Discrepancy-based Representation
TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation
Optimizing Agent Planning for Security and Autonomy
Correlations in the Data Lead to Semantically Rich Feature Geometry Under Superposition
ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation
SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Efficient Learning on Large Graphs using a Densifying Regularity Lemma
ResCP: Reservoir Conformal Prediction for Time Series Forecasting
Learning Escorted Protocols For Multistate Free-Energy Estimation
Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
Carré du champ flow matching: better quality-generalisation tradeoff in generative models
gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity
Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
Planner Aware Path Learning in Diffusion Language Models Training
Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators
TGM: A Modular and Efficient Library for Machine Learning on Temporal Graphs
OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective
Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
HDR-NSFF: High Dynamic Range Neural Scene Flow Fields
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
Bootstrapping MLLM for Weakly‑Supervised Class‑Agnostic Object Counting
Variation-aware Flexible 3D Gaussian Editing
Enhancing Sparse Event Detection in Healthcare Time-Series via Adaptive Gate of Context–Detail Interaction
Byzantine-Robust Federated Learning with Learnable Aggregation Weights
LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
Align-SAM: Seeking Flatter Minima for Better Cross-Subset Alignment
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks
What Scales in Cross-Entropy Scaling Law?
Diagnosing Failures in Generalization from Task-Relevant Representational Geometry
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
Massive Editing for Large Language Models Based on Dynamic Weight Generation
MLE-Smith: Scaling MLE Tasks with Automated Multi-agent Pipeline
WorldGym: World Model as An Environment for Policy Evaluation
Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models
Reusing Pre-Training Data at Test Time is a Compute Multiplier
Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
Attribution-Guided Decoding
SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
Feed-forward Human Performance Capture via Progressive Canonical Space Updates
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
What Lies Beyond the View? Actively Constructing Spatial Beliefs in Foundation Models
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Splat the Net: Radiance Fields with Splattable Neural Primitives
Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
VideoNSA: Native Sparse Attention Scales Video Understanding
Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
Cannistraci-Hebb Training on Ultra-Sparse Spiking Neural Networks
Off-Policy Safe Reinforcement Learning with Cost-Constrained Optimistic Exploration
Token-Importance Guided Direct Preference Optimization
Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
Group Critical-token Policy Optimization for Autoregressive Image Generation
Reward Models Inherit Value Biases from Pretraining
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
AutoDA-Timeseries: Automated Data Augmentation for Time Series
TaCo: A Benchmark for Lossless and Lossy Codecs of Heterogeneous Tactile Data
Knowledge Distillation as Decontamination? Revisiting the “Data Laundering” Concern
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change
Exploratory Diffusion Model for Unsupervised Reinforcement Learning
Learning a distance measure from the information-estimation geometry of data
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research
KRAMABENCH: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
BoGrape: Bayesian optimization over graphs with shortest-path encoded
Reverse Distillation: Disentangling and Scaling Protein Language Model Representations
Evaluating GFlowNet from partial episodes for stable and flexible policy-based training
Temporally Detailed Hypergraph Neural ODE for Type 2 Diabetes Progression Modeling
SumRA: Parameter Efficient Fine-tuning with Singular Value Decomposition and Summed Orthogonal Basis
Generative Modeling from Black-Box Corruptions via Self-Consistent Stochastic Interpolants
Conformalized Decision Risk Assessment
Internal Evaluation of Density-Based Clusterings with Noise
NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making
AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators
Real-Time Reasoning Agents in Evolving Environments
Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
Continuous multinomial logistic regression for neural decoding
Bures-Wasserstein Flow Matching for Graph Generation
Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method
Tackling the XAI Disagreement Problem with Adaptive Feature Grouping
Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
Talking Points: Describing and Localizing Pixels
LORE: Jointly Learning The Intrinsic Dimensionality and Relative Similarity Structure from Ordinal Data
Safe Continuous-time Multi-Agent Reinforcement Learning via Epigraph Form
Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning
Obfuscated Activations Bypass LLM Latent-Space Defenses
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
WoW!: World Models in a Closed-Loop World
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Why We Need New Benchmarks for Local Intrinsic Dimension Estimation
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
FreeViS: Training-free Video Stylization with Inconsistent References
Transducing Language Models
On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study
BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
Transformers Learn Latent Mixture Models In-Context via Mirror Descent
Fair Classification by Direct Intervention on Operating Characteristics
Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
Steering Embedding Models with Geometric Rotation: Mapping Semantic Relationships Across Languages and Models
LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR
What happens when generative AI models train recursively on each others' outputs?
TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness
Hyden: A Hybrid Dual-Path Encoder for Monocular Geometry of High-resolution Images
Tackling Time-Series Forecasting Generalization via Mitigating Concept Drift
TimeSliver: Symbolic-Linear Decomposition for Explainable Time Series Classification
CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm
Semantic-Aware Diffusion LLM Inference With Adaptive Block Size
Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor–EM Method
SAIR: Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset
Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
KV-Cache Transform Coding for Compact Storage in LLM Inference
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers
Scaling Sequence-to-Sequence Generative Neural Rendering
Zephyrus: An Agentic Framework for Weather Science
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
Emergent Coordination in Multi-Agent Language Models
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation
Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences
RewardEval: Advancing Reward Model Evaluation
Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
LLMs as Rules Oracles: Exploring Real-World Multimodal Reasoning in Tabletop Strategy Game Environments
Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss
CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives
Seeing What’s Wrong: A Trajectory-Guided Approach to Caption Error Detection
WALT: Web Agents that Learn Tools
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Scaling Laws and Symmetry, Evidence from Neural Force Fields
The Expressive Limits of Diagonal SSMs for State-Tracking
Soft-Di[M]O: Improved one-step Image Discrete Model
Distributionally Robust Optimization via Generative Ambiguity Modeling
PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning