ICLR
2026
Papers
Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision–Language Models
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
AFD-INSTRUCTION: A Comprehensive Antibody Instruction Dataset with Functional Annotations for LLM-Based Understanding and Design
Robust Adaptive Multi-Step Predictive Shielding
Multi-state Protein Sequence Design with DynamicMPNN
TIPS: Turn-level Information-Potential Reward Shaping for Search-Augmented LLMs
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Exploring Synthesizable Chemical Space with Iterative Pathway Refinements
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Controllable diffusion-based generation for multi-channel biological data
Ice Cream Doesn’t Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference
Geometry-aware 4D Video Generation for Robot Manipulation
VCWorld: A Biological World Model for Virtual Cell Simulation
Free Lunch for Stabilizing Rectified Flow Inversion
Neologism Learning for Controllability and Self-Verbalization
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Learning Explicit Single-Cell Dynamics Using ODE Representations
Softmax is not Enough (for Adaptive Conformal Classification)
Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
LS-Merge: Merging Language Models in Latent Space
Projected Coupled Diffusion for Test-Time Constrained Joint Generation
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning
Nonparametric Contextual Online Bilateral Trade
Unified Brain Surface and Volume Registration
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
Distilling and Adapting: A Topology-Aware Framework for Zero-Shot Interaction Prediction in Multiplex Biological Networks
DNOD: Deformable Neural Operators for Object Detection in SAR Images
Synergistic Benefits of Joint Molecule Generation and Property Prediction
Can Language Models Discover Scaling Laws?
Amortized Inference of Causal Models via Conditional Fixed-Point Iterations
Inference-time scaling of diffusion models through classical search
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
Statistical Guarantees for Approximate Stationary Points of Shallow Neural Networks
Discrete Audio Tokens: More Than a Survey!
Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria
Slicing the Gaussian Mixture Wasserstein Distance
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans
Enhancing Vision-Language Model with Unmasked Token Alignment
Setting the Record Straight on Transformer Oversmoothing
Leveraging a Simulator for Learning Causal Representations from Post-Treatment Covariates for CATE
Online Selective Conformal Inference: Errors and Solutions
Multi-Bellman operator for convergence of Q-learning with linear function approximation
Training Large Reasoning Models Efficiently via Progressive Thought Encoding
PCF Learned Sort: a Learning Augmented Sort Algorithm with O(n log log n) Expected Complexity
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets
Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
FlowRL: Matching Reward Distributions for LLM Reasoning
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
CNN Interpretability with Multivector Tucker Saliency Maps for Self-Supervised Models
Information Theoretic Guarantees For Policy Alignment In Large Language Models
Adversarial Robustness of Graph Transformers
On the stability of gradient descent with second order dynamics for time-varying cost functions
LC-PLM: Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers
HalluEntity: Benchmarking and Understanding Entity-Level Hallucination Detection
Controllable Sequence Editing for Biological and Clinical Trajectories
Simplex Constrained Sparse Optimization via Tail Screening
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
DGNet: Discrete Green Networks for Data-Efficient Learning of Spatiotemporal PDEs
Computer Use Survey - A Visual Survey of Computer Use Agents
From U-Nets to DiTs: The Architectural Evolution of Text-to-Image Diffusion Models (2021–2025)
Loneliness as a Case Study for Social Reward Misalignment
What (and What Not) are Calibrated Probabilities Actually Useful for?
Discretisation invariance
Dissecting Non-Determinism in Large Language Models
Model Misspecification in Simulation-Based Inference - Recent Advances and Open Challenges
Revisiting the NetHack Learning Environment
AI Fundamentals: Valuing AI Agents & Data Assets
Why AI Evaluations Need Error Bars
Where’s the Chicken? Unpacking Spatial Awareness in Vision-Language Models
From REINFORCE to Dr. GRPO: A Unified Perspective on LLM Post-Training
BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
From Trajectories to Operators — A Unified Flow Map Perspective on Generative Modeling
Language as a Window Into the Mind: How NLP and LLMs Advance Human Sciences
Don't Look Up (Every Token): Escaping Quadratic Complexity via Geometric Patterns and Algorithms
Dynamic Parameter Reuse Augments Reasoning via Latent Chain of Thought
Navigating the Manifold — A Geometric Perspective on Diffusion-Based Inverse Problems
How To Open the Black Box: Modern Models for Mechanistic Interpretability
Generative AI Archaeology
Is the evidence in 'Language Models Learn to Mislead Humans via RLHF' valid?
Tracing the Principles Behind Modern Diffusion Models
Performative Prediction made practical
Divide, Conquer, and Standardize — A Recursive Architecture for Multi-Agent Systems (MAS)
Probabilistic Circuits for Uncertainty Quantification
Extracting Model Precision from 20 Logprobs
GEM: A Gym for Generalist LLMs
LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization
EXPO: Stable Reinforcement Learning with Expressive Policies
What Matters for Batch Online Reinforcement Learning in Robotics?
Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers
Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks
AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining
Matching without Group Barrier for Heterogeneous Treatment Effect Estimation
Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
Attribution-Guided Decoding
Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
Lightweight Transformer for EEG Classification via Balanced Signed Graph Algorithm Unrolling
FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
Language Models Use Lookbacks to Track Beliefs
Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
IA2: Alignment with ICL Activations improves Supervised Fine-Tuning
Bridging ML and algorithms: comparison of hyperbolic embeddings
FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding
Benchmarking Overton Pluralism in LLMs
Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems
WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
On the Universality and Complexity of GNN for Solving Second-order Cone Programs
Gauge Flow Matching: Efficient Constrained Generative Modeling over General Convex Set and Beyond
TRIDENT: Cross-Domain Trajectory Spatio-Temporal Representation via Distance-Preserving Triplet Learning
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
Detect, Decide, Unlearn: A Transfer-Aware Framework for Continual Learning
DiffSDA: Unsupervised Diffusion Sequential Disentanglement Across Modalities
Independence Test for Linear Non-Gaussian Data and Applications in Causal Discovery
Conditional Independent Component Analysis for Estimating Causal Structure with Latent Variables
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
Effect of Parallel Environments and Rollout Steps in PPO
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
Expert Divergence Learning for MoE-based Language Models
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
The Layered Ontology of Models, Resolving the Epistemological Crisis of AI
PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
Do 3D Large Language Models Really Understand 3D Spatial Relationships?
Diverse Text Decoding via Iterative Reweighting
Depth Anything 3: Recovering the Visual Space from Any Views
Poly-attention: a general scheme for higher-order self-attention
Pursuing Minimal Sufficiency in Spatial Reasoning
Streaming Autoregressive Video Generation via Diagonal Distillation
Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Graphon Cross-Validation: Assessing Models on Network Data
Multi-Object System Identification from Videos
PointRePar: SpatioTemporal Point Relation Parsing for Robust Category-Unified 3D Tracking
PYRREGULAR: A Unified Framework for Irregular Time Series, with Classification Benchmarks
Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory?
IDER: IDempotent Experience Replay for Reliable Continual Learning
On the Wasserstein Geodesic Principal Component Analysis of probability measures
Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems
Planner Aware Path Learning in Diffusion Language Models Training
OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
ConvT3: Structured State Kernels for Convolutional State Space Models
Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Celo: Training Versatile Learned Optimizers on a Compute Diet
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Celo2: Towards Learned Optimization Free Lunch
Beyond Student: An Asymmetric Network for Neural Network Inheritance
Adaptive Mesh Quantization for Neural PDE Solvers
Gauge-invariant representation holonomy
Multi-Marginal Flow Matching with Adversarially Learnt Interpolants
Optimizing Canaries for Privacy Auditing with Metagradient Descent
P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
End-to-End Probabilistic Framework for Learning with Hard Constraints
wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models
Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
Robust Multi-Objective Controlled Decoding of Large Language Models
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
Action Chunking and Data Augmentation Yield Exponential Improvements in Behavior Cloning for Continuous Spaces
Wait, Do We Need to Wait? Revisiting Budget Forcing for Sequential Test-Time Scaling
Remotely Detectable Robot Policy Watermarking
When and Where to Reset Matters for Long-Term Test-Time Adaptation
Dynamical properties of dense associative memory
Token-Based Audio Inpainting via Discrete Diffusion
Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges
One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning
Enhanced Generative Model Evaluation with Clipped Density and Coverage
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Why Ask One When You Can Ask $k$? Learning-to-Defer to the Top-$k$ Experts
Soft Quality-Diversity Optimization
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Probabilistic Kernel Function for Fast Angle Testing
Negative Pre-activations Differentiate Syntax
Intention-Conditioned Flow Occupancy Models
Learning to Maximize Rewards via Reaching Goals
ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols
LMask: Learn to Solve Constrained Routing Problems with Lazy Masking
Toward Complex-Valued Neural Networks for Waveform Generation
Safety Subspaces are Not Linearly Distinct: A Fine-Tuning Case Study
KL-Regularized Reinforcement Learning for Generative Modelling is Designed to Mode Collapse
LoRA-S: An Efficient Low Rank Adaptation scheme via Sylvester equation
Value Flows
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
LapFlow: Laplacian Multi-scale Flow Matching for Generative Modeling
Spatial Mental Modeling from Limited Views
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
Station2Radar: Query‑Conditioned Gaussian Splatting for Precipitation Field
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining
From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity
ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
Visualizing LLM Latent Space Geometry Through Dimensionality Reduction
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data
Align-SAM: Seeking Flatter Minima for Better Cross-Subset Alignment
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
DAK-UCB: Diversity-Aware Prompt Routing for LLMs and Generative Models
AdS-GNN - a Conformally Equivariant Graph Neural Network
Constraint-guided Hardware-aware NAS through Gradient Modification
Scaling Goal-conditioned Reinforcement Learning with Multistep Quasimetric Distances
Verifying Chain-of-Thought Reasoning via Its Computational Graph
NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
Null-Space Filtering for Data-Free Continual Model Merging: Preserving Stability, Promoting Plasticity
Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting
Bridging Successor Measure and Online Policy Learning with Flow Matching-Based Representations
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
Unifying Stable Optimization and Reference Regularization in RLHF
LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis
Evaluating Machine Learned Inter-Atomic Potentials for a Practical Simulation Workflow
Imitating the Truth: Attention-aware Truth-Guided Enhancement for Hallucination Mitigation in Large Vision-Language Models
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures
Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
PixelCraft: A Multi-Agent system for High-Fidelity Visual Reasoning on Structured Images
NEO — No-Optimization Test-Time Adaptation through Latent Re-Centering
Content Promotion as a Strategic Game: How to Design Agentic Publishers for the Evolving Search Ecosystem in the GenAI Era?
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds
The Adversarial Conditioning Paradox: Why Attacked Inputs Are More Stable, Not Less
SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints
Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter
OpenThoughts: Data Recipes for Reasoning Models
From Parameters to Behaviors: Unsupervised Compression of the Policy Space
Scheduling Your LLM Reinforcement Learning with Reasoning Trees
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling
MobileKGQA: On-Device KGQA System on Dynamic Mobile Environments
Energy-Regularized Sequential Model Editing on Hyperspheres
Latent Speech-Text Transformer
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
Beyond Text-to-Image: Liberating Generation with a Unified Discrete Diffusion Model
Path Matters: Unveiling Geometric Implicit Bias via Curvature-Aware Sparse View Optimization
Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Continual Multimodal Learning
Meta-UCF: Unified Task-Conditioned LoRA Generation for Continual Learning in Large Language Models
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
CoMem: Compositional Concept-Graph Memory for Vision–Language Adaptation
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
GNN Explanations that do not Explain and How to find Them
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion
Planned Diffusion
Proximal Diffusion Neural Sampler
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
Reasoning-Driven Multimodal LLM for Domain Generalization
Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies
Error Feedback for Muon and Friends
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
Scaling Agents via Continual Pre-training
Multi-Condition Conformal Selection
VideoAgentTrek: Computer-Use Pretraining from Unlabeled Videos
Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
Efficient Differentiable Contact Model with Long-range Influence
EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers
Bradley-Terry and Multi-Objective Reward Modeling Are Complementary
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
Multi-Head Low-Rank Attention
COSMO-INR: Complex Sinusoidal Modulation for Implicit Neural Representations
Multiplayer Nash Preference Optimization
Using Graph Neural Networks in Reinforcement Learning: A Practical Guide
Mitigating Privacy Risk via Forget Set-Free Unlearning
How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use
A Structured, Tagged, and Localized Visual Question Answering Dataset with Full Sentence Answers and Scene Graphs for Chest X-ray Images
Distillation of Large Language Models via Concrete Score Matching
Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective
AC-Sampler: Accelerate and Correct Diffusion Sampling with Metropolis-Hastings Algorithm
EigenBench: A Comparative Behavioral Measure of Value Alignment
Multifidelity Simulation-based Inference for Computationally Expensive Simulators
Hierarchical Concept-based Interpretable Models
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
Transferable and Stealthy Adversarial Attacks on Large Vision-Language Models
Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
Reversible Primitive–Composition Alignment for Continual Vision–Language Learning
From Pixels to Semantics: Unified Facial Action Representation Learning for Micro-Expression Analysis
From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting
NeoBERT: A Next Generation BERT
Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4
SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion
GenCtrl -- A Formal Controllability Toolkit for Generative Models
Reconstructing KV Caches with Cross-Layer Fusion for Enhanced Transformers
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
CHAMMI-75: Pre-training multi-channel models with heterogeneous microscopy images
WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
Lookup multivariate Kolmogorov-Arnold Networks
Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
ICDiffAD: Implicit Conditioning Diffusion Model for Time Series Anomaly Detection
FastVMT: Eliminating Redundancy in Video Motion Transfer
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
Embedding-Based Context-Aware Reranker
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
Rethinking Residual Errors in Compensation-based LLM Quantization
Random Spiking Neural Networks are Stable and Spectrally Simple
NGS-Marker: Robust Native Watermarking for 3D Gaussian Splatting
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
WRING Out The Bias: A Rotation-Based Alternative To Projection Debiasing
CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts
3D-aware Disentangled Representation for Compositional Reinforcement Learning
Pulp Motion: Framing-aware multimodal camera and human motion generation
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Prompt-MII: Meta-Learning Instruction Induction for LLMs
HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithmic Coding
Distilling Causal Signals for One-Shot Directed Evolution of Antibodies
DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging
PCPO: Proportionate Credit Policy Optimization for Preference Alignment of Image Generation Models
Square Peg, Round Hole: Plugging Non-Sequential Data into Sequential Language Models
From Data Statistics to Feature Geometry: How Correlations Shape Superposition
Layerwise Federated Learning for Heterogeneous Quantum Clients using Quorus
Riemannian Federated Learning via Averaging Gradient Streams
Scalable Random Wavelet Features: Efficient Non-Stationary Kernel Approximation with Convergence Guarantees
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
Enhancing Communication Compression via Discrepancy-aware Calibration for Federated Learning
QuRL: Rubrics As Judge For Open-Ended Question Answering
Amortising Inference and Meta-Learning Priors in Neural Networks
Temporal Test-Time Adaptation with State-Space Models
Optimistic Task Inference for Behavior Foundation Models
Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics
ResiliBench: Evaluating Agentic Workflow Adaptation in Stochastic Environments
Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling
Corner Gradient Descent
Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
The End of Manual Decoding: Towards Truly End-to-End Language Models
A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
Structured Reasoning for LLMs: A Unified Framework for Efficiency and Explainability
PETRI: Learning Unified Cell Embeddings from Unpaired Modalities via Early-Fusion Joint Reconstruction
Riemannian Optimization on Relaxed Indicator Matrix Manifold
Improving the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models
SysMoBench: Evaluating AI on Formally Specifying Complex Real-World Systems
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
DoFlow: Flow-based Generative Models for Interventional and Counterfactual Forecasting on Time Series
Flow-based Conformal Prediction for Multi-dimensional Time Series
Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance
Log-Linear Attention
From Embedding to Control: Representations for Stochastic Multi-Object Systems
Multilevel Control Functional
xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
QuoKA: Query-Oriented KV Selection for Efficient LLM Prefill
ConRep4CO: Contrastive Representation Learning of Combinatorial Optimization Instances across Types
AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
Derandomized Online-to-Non-convex Conversion for Stochastic Weakly Convex Optimization
Music Flamingo: Scaling Music Understanding in Audio Language Models
Recurrent Action Transformer with Memory
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems
FALCON: Few-step Accurate Likelihoods for Continuous Flows
Causality ≠ Invariance: Function and Concept Vectors in LLMs
Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
MoM: Linear Sequence Modeling with Mixture-of-Memories
Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
Action-Guided Attention for Video Action Anticipation
LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design
Preference-based Policy Optimization from Sparse-reward Offline Dataset
Clustering by Denoising: Latent plug-and-play diffusion for single-cell embeddings
Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells
Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting
Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes
ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance Mask
Motion-Aligned Word Embeddings for Text-to-Motion Generation
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
Parallel Token Prediction for Language Models
PICS: Pairwise Image Compositing with Spatial Interactions
Weight-Space Linear Recurrent Neural Networks
Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
Unveiling the Basin-Like Loss Landscape in Large Language Models
S2GO: Streaming Sparse Gaussian Occupancy
Death of the Novel(ty): Beyond N-Gram Novelty as a Metric for Textual Creativity
Latent Geometry-Driven Network Automata for Complex Network Dismantling
vCache: Verified Semantic Prompt Caching
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
Canonical Tree Cover Neural Networks for Expressive and Invariant Graph Learning
DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs
PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning
LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks
CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection
Sparling: End-to-End Spatial Concept Learning via Extremely Sparse Activations
Iterative Training of Physics-Informed Neural Networks with Fourier-enhanced Features
On the Mechanisms of Collaborative Learning in VAE Recommenders
SAQ: Stabilizer-Aware Quantum Error Correction Decoder
Opponent Shaping in LLM Agents
When Agents “Misremember” Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems
Hot PATE: Private Aggregation of Distributions for Diverse Tasks
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
Parameterized Hardness of Zonotope Containment and Neural Network Verification
THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Designing Rules to Pick a Rule: Aggregation by Consistency
Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning
Sampling-aware Adversarial Attacks Against Large Language Models
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
Latent Wasserstein Adversarial Imitation Learning
Landing with the Score: Riemannian Optimization through Denoising
Benchmarking ECG FMs: A Reality Check Across Clinical Tasks
ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
Learning-Augmented Moment Estimation on Time-Decay Models
Flow Actor-Critic for Offline Reinforcement Learning
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Efficient algorithms for Incremental Metric Bipartite Matching
MnemoDyn: Learning Resting State Dynamics from $40$K FMRI sequences
DistillKac: Few-Step Image Generation via Damped Wave Equations
RL for Reasoning by Adaptively Revealing Rationales
Trust The Typical
Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization
LLM Pretraining with Continuous Concepts
Error as Signal: Stiffness-Aware Diffusion Sampling via Embedded Runge-Kutta Guidance
Hierarchical Prototype Learning for Semantic Segmentation
Segment-Level Attribution for Selective Learning of Long Reasoning Traces
Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning
FAST‑DIPS: Adjoint‑Free Analytic Steps and Hard‑Constrained Likelihood Correction for Diffusion‑Prior Inverse Problems
Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
Token-Efficient Item Representation via Images for LLM Recommender Systems
Motion-R1: Enhancing Motion Generation with Decomposed Chain-of-Thought and RL Binding
Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics
Adaptive Conformal Prediction via Mixture-of-Experts Gating Similarity
Reevaluating Policy Gradient Methods for Imperfect-Information Games
Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
Steering MoE LLMs via Expert (De)Activation
Precise and Interpretable Editing of Code Knowledge in Large Language Models
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
FlowGen: Synthesizing Diverse Flowcharts to Enhance and Benchmark MLLM Reasoning
Diffusion-DFL: Decision-focused Diffusion Models for Stochastic Optimization
Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations
Hierarchical Multi-Scale Molecular Conformer Generation
NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples
Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
Information Shapes Koopman Representation
Can Large Language Models Match the Conclusions of Systematic Reviews?
COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics
Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models
Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization
Thompson Sampling via Fine-Tuning of LLMs
Adaptive Logit Adjustment for Debiasing Multimodal Language Models
Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
Post-training Large Language Models for Diverse High-Quality Responses
Vision-Language-Action Instruction Tuning: From Understanding to Manipulation
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
Physics vs Distributions: Pareto Optimal Flow Matching with Physics Constraints
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching
RCPU: Rotation-Constrained Error Compensation for Structured Pruning of Large Language Models
Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
Diversified Multinomial Logit Contextual Bandits
The Self-Re-Watermarking Trap: From Exploit to Resilience
Logit‑KL Flow Matching: Non‑Autoregressive Text Generation via Sampling‑Hybrid Inference
Attention Smoothing Is All You Need For Unlearning
HOTA: Hamiltonian framework for Optimal Transport Advection
On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets
Stable coresets: Unleashing the power of uniform sampling
When Shift Happens - Confounding Is to Blame
Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning
ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization
Evolution and compression in LLMs: on the emergence of human-aligned categorization
Beyond Hearing: Learning Task-Agnostic ExG Representations from Earphones via Physiology-Informed Tokenization
FACT: Fine-grained Across-variable Convolution for Multivariate Time Series Forecasting
Online time series prediction using feature adjustment
Generating Directed Graphs with Dual Attention and Asymmetric Encoding
Good Allocations from Bad Estimates
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation
Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features
Carré du champ flow matching: better quality-generalisation tradeoff in generative models
Bi-Lipschitz Autoencoder With Injectivity Guarantee
Towards Text-Mask Consistency in Medical Image Segmentation
Supporting High-Stakes Decision Making Through Interactive Preference Elicitation in the Latent Space
Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning
Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences
Consistent Text-to-Image Generation via Scene De-Contextualization
KV Cache Transform Coding for Compact Storage in LLM Inference
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation
Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
Scaling Laws and Symmetry, Evidence from Neural Force Fields
Shrinking Proteins with Diffusion
Antibody: Strengthening Defense Against Harmful Fine-Tuning for Large Language Models via Attenuating Harmful Gradient Influence
ICaRus: Identical Cache Reuse for Efficient Multi-Model Inference
Improving 2D Diffusion Models for 3D Medical Imaging with Inter‑Slice Consistent Stochasticity
PoseX: AI Defeats Physics-based Methods on Protein Ligand Cross-Docking
The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics
Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
Smooth Calibration Error: Uniform Convergence and Functional Gradient Analysis
On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
Experience-based Knowledge Correction for Robust Planning in Minecraft
Transducing Language Models
MetaMuse: Algorithm Generation via Creative Ideation
Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models
Mapping Semantic & Syntactic Relationships with Geometric Rotation
Deep Latent Variable Model based Vertical Federated Learning with Flexible Alignment and Labeling Scenarios
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
A Function-Centric Graph Neural Network Approach for Predicting Electron Densities
Data-Aware and Scalable Sensitivity Analysis for Decision Tree Ensembles
Revisiting Node Affinity Prediction In Temporal Graphs
Enhanced Continual Learning of Vision-Language Models with Model Fusion
Take Note: Your Molecular Dataset Is Probably Aligned
UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
Block-Sample MAC-Bayes Generalization Bounds
Bridging Input Feature Spaces Towards Graph Foundation Models
Calibrating Verbalized Confidence with Self-Generated Distractors
Can we generate portable representations for clinical time series data using LLMs?
Understanding and improving Shampoo and SOAP via Kullback-Leibler Minimization
Boosting Open Set Recognition Performance through Modulated Representation Learning
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Measurement Score-Based Diffusion Model
AttriCtrl: A Generalizable Framework for Controlling Semantic Attribute Intensity in Diffusion Models
A Bayesian Nonparametric Framework For Learning Disentangled Representations
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
On learning linear dynamical systems in context with attention layers
RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
Agentic Reinforcement Learning with Implicit Step Rewards
Content-Aware Mamba for Learned Image Compression
In-Context Multi-Objective Optimization
BANZ-FS: BANZSL Fingerspelling Dataset
Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems
Generalization of Diffusion Models Arises with a Balanced Representation Space
Membership Inference Attacks Against Fine-tuned Diffusion Language Models
Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
Learning Hierarchical and Geometry-Aware Graph Representations for Text-to-CAD
Aligner, Diagnose Thyself: A Meta-Learning Paradigm for Fusing Intrinsic Feedback in Preference Alignment
Automatic Image-Level Morphological Trait Annotation for Organismal Images
Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
DiffuDETR: Rethinking Detection Transformers with Denoising Diffusion Process
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
VisCoder2: Building Multi-Language Visualization Coding Agents
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
End-to-end Listen, Look, Speak and Act
Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
PolyGraph Discrepancy: a classifier-based metric for graph generation
Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?
Directional Textual Inversion for Personalized Text-to-Image Generation
Video Unlearning via Low-Rank Refusal Vector
SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs
A Recovery Guarantee for Sparse Neural Networks
Difficulty–Diversity Collaborative Filtering for Data-Efficient LLM Fine-Tuning
Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains?
High-Probability Bounds for the Last Iterate of Clipped SGD
HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection
MARL2Grid-TR: A Multi-Agent RL Benchmark in Power Grid Operations
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning
Online Inventory Optimization in Non-Stationary Environment
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment
Reasoning Boosts Opinion Alignment in LLMs
Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis
ChainGPT: Dual-Reasoning Model with Recurrent Depth and Multi-Rank State Updates
Black-Box Privacy Attacks on Shared Representations in Multitask Learning
Assessing Robustness via Score-Based Adversarial Image Generation
Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient Estimation
Learning Data-Efficient and Generalizable Neural Operators via Fundamental Physics Knowledge
ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies
BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models
Convergence of Muon with Newton-Schulz
Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
In Good GRACES: Principled Teacher Selection for Knowledge Distillation
Entropy-Monitored Kernelized Token Distillation for Audio-Visual Compression
Information Estimation with Discrete Diffusion
Composer: A Search Framework for Hybrid Neural Architecture Design
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits
Sequential Parallel Duality in Prefix Scannable Models
OpenApps: Simulating Environment Variations to Measure UI Agent Reliability
Bridging the performance-gap between target-free and target-based reinforcement learning
Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork
Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning
EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
Leveraging Discrete Function Decomposability for Scientific Design
Feature segregation by signed weights in artificial vision systems and biological models
Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts
Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning
A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control
Byzantine-Robust Federated Learning with Learnable Aggregation Weights
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
Polynomial Convergence of Riemannian Diffusion Models
How reinforcement learning after next-token prediction facilitates learning
Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
Graph-based Nearest Neighbors with Dynamic Updates via Random Walks
DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models
Dens3R: A Foundation Model for 3D Geometry Prediction
Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization
Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
Constrained Diffusion for Protein Design with Hard Structural Constraints
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Diffusion as Infinite HVAEs: Do Diffusion Models Generalize Better than Deep VAEs?
Low-Rank Few-Shot Node Classification by Node-Level Graph Diffusion
Evaluating GFlowNet from partial episodes for stable and flexible policy-based training
Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^{\pi}$ Realizability for Deterministic Dynamics
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
Efficient Learning on Large Graphs using a Densifying Regularity Lemma
OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Any-Subgroup Equivariant Networks via Symmetry Breaking
Memorization Through the Lens of Sample Gradients
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
An Information-Theoretic Parameter-Free Bayesian Framework for Probing Labeled Dependency Trees from Attention Score
Combinatorial Rising Bandits
Alignment-Enhanced Integration of Connectivity and Spectral Sparsity in Dynamic Sparse Training of LLM
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
Battery Fault: A Comprehensive Dataset and Benchmark for Battery Fault Diagnosis
TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
A Bayesian Nonparametric Framework for Private, Fair, and Balanced Tabular Data Synthesis
ELEPHANT: Measuring and understanding social sycophancy in LLMs
ODNet: Opinion Dynamics-Inspired Neural Message Passing for Graphs and Hypergraphs
Calibrated Information Bottleneck for Trusted Multi-modal Clustering
KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs
GneissWeb: Preparing High Quality Data for LLMs at Scale
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
LLMs Process Lists With General Filter Heads
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
PAT3D: Physics-Augmented Text-to-3D Scene Generation
Merge before Forget: A Single LoRA Continual Learning via Continual Merging
Hippoformer: Integrating Hippocampus-inspired Spatial Memory with Transformers
Implicit Inversion turns CLIP into a Decoder
A Theoretical Analysis of Mamba’s Training Dynamics: Filtering Relevant Features for Generalization in State Space Models
ConfHit: Conformal Generative Design with Oracle-Free Guarantees
Fluent Alignment with Disfluent Judges: Post-training for lower-resource languages
Fixing the Broken Compass: Diagnosing and Improving Inference-Time Reward Modeling
Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
Reliability-Adjusted Prioritized Experience Replay
CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
Score-Based Density Estimation from Pairwise Comparisons
Statistical Guarantees for Offline Domain Randomization
A Rich Knowledge Space for Scalable Deepfake Detection
Co-LoRA: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
Sample-Efficient Distributionally Robust Multi-Agent Reinforcement Learning via Online Interaction
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Next Visual Granularity Generation
Long-Text-to-Image Generation via Compositional Prompt Decomposition
Any-Order Flexible Length Masked Diffusion
Certified Evaluation of Model-Level Explanations for Graph Neural Networks
Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
Near Optimal Robust Federated Learning Against Data Poisoning Attack
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Neural Synchrony Between Socially Interacting Language Models
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Learning to Adapt: In-Context Learning Beyond Stationarity
Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping
JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
Trade-offs in LLM Compute for Reasoning-Intensive Information Retrieval
TAVAE: A VAE with Adaptable Priors Explains Contextual Modulation in the Visual Cortex
Efficient Submodular Maximization for Sums of Concave over Modular Functions
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection
CONCUR: A Framework for Continual Constrained and Unconstrained Routing
Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning
Sheaves Reloaded: A Direction Awakening
Enhancing Learning with Noisy Labels via Rockafellian Relaxation
REMem: Reasoning with Episodic Memory in Language Agent
Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
GraphOmni: A Comprehensive and Extensible Benchmark Framework for Large Language Models on Graph-theoretic Tasks
Pitfalls in Evaluating Language Model Forecasters
When Machine Learning Gets Personal: Evaluating Prediction and Explanation
DiVE-k: Differential Visual Reasoning for Fine-Grained Image Recognition
Convergence Analysis of Tsetlin Machines under Noise-Free and Noisy Training Conditions: From $2$ Bits to $k$ Bits
Multi-Scale Hypergraph Meets LLMs: Aligning Large Language Models for Time Series Analysis
Training Large Language Models To Reason In Parallel With Global Forking Tokens
KRAMABENCH: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
Learning to Answer from Correct Demonstrations
Train on Validation (ToV): Fast data selection with applications to fine-tuning
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
FlowNIB: An Information Bottleneck Analysis of Bidirectional vs. Unidirectional Language Models
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
Extreme Weather Nowcasting via Local Precipitation Pattern Prediction
Cooperative Sheaf Neural Networks
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
Beyond Pairwise: Empowering LLM Alignment With (Ranked) Choice Modeling
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
TandemFoilSet: Datasets for Flow Field Prediction of Tandem-Airfoil Through the Reuse of Single Airfoils
Online Rounding and Learning Augmented Algorithms for Facility Location
Tackling Time-Series Forecasting Generalization via Mitigating Concept Drift
The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
Generating metamers of human scene understanding
World-In-World: World Models in a Closed-Loop World
CoFact: Conformal Factuality Guarantees for Language Models under Covariate Shift
CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
XIL: Cross-Expanding Incremental Learning
Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
: One LLM Token for Explicit Graph Structural Understanding
ContextIF: Enhancing Instruction-Following through Context Reward
Improving Attributed Long-form Question Answering with Intent Awareness
VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization
Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
EIP: Weighted Ranking of LLMs by Quantifying Question Difficulty
Toward Practical Equilibrium Propagation: Brain-inspired Recurrent Neural Network with Feedback Regulation and Residual Connections
PerSpectra: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
SparseD: Sparse Attention for Diffusion Language Models
ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
An Information-Theoretic Framework For Optimizing Experimental Design To Distinguish Probabilistic Neural Codes
Lipschitz Bandits with Stochastic Delayed Feedback
Improving Feasibility via Fast Autoencoder-Based Projections
Reducing Symmetry Increase in Equivariant Neural Networks
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Reward Models Inherit Value Biases from Pretraining
Discrete Diffusion for Bundle Construction
AbdCTBench: Learning Clinical Biomarker Representations from Abdominal Surface Geometry
BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
How to Cure Newton for Unlearning Neural Networks? An Empirical Study from the Hessian Perspective
Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions
Can Speech LLMs Think while Listening?
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Monocular Normal Estimation via Shading Sequence Estimation
Divide and Abstract: Autoformalization via Decomposition and Abstraction Learning
Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models
FACET: A Fragment-Aware Conformer Ensemble Transformer
SoFlow: Solution Flow Models for One-Step Generative Modeling
WebArbiter: A Generative Reasoning Process Reward Model for Web Agents
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
INTIMA: A Benchmark for Human-AI Companionship Behavior
A Dense Subset Index for Collective Query Coverage
Bayesian Parameter Shift Rules in Variational Quantum Eigensolvers
Overshoot and Shrinkage in Classifier-Free Guidance: From Theory to Practice
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
When LLMs get significantly worse: A statistical approach to detect model degradations
Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding
Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data
Variational Inference for Cyclic Learning
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
Change Point Localization and Inference in Dynamic Multilayer Networks
A Generalized Geometric Theoretical Framework of Centroid Discriminant Analysis for Linear Classification of Multi-dimensional Data
IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction
When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets
DISK: Differentiable Sparse Kernel Complex for Efficient Spatially-Variant Convolution
Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices
Distributed Quasi-Newton Method for Fair and Fast Federated Learning
Learning Exposure Mapping Functions for Inferring Heterogeneous Peer Effects
Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation
Thought Branches: Interpreting LLM Reasoning Requires Resampling
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel
MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning
Differentiable Model Predictive Control on the GPU
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
Queue Length Regret Bounds for Contextual Queueing Bandits
When Flatness Does (Not) Guarantee Adversarial Robustness
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?
Residual Feature Integration is Sufficient to Prevent Negative Transfer
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
General search techniques without common knowledge for imperfect-information games, and application to superhuman Fog of War chess
TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks
Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions
pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
Zephyrus: An Agentic Framework for Weather Science
SmellNet: A Dataset for Sensor-Based Smell Recognition and Mixture Prediction
Privacy-Protected Causal Survival Analysis Under Distribution Shift
Solving the 2-norm k-hyperplane clustering problem via multi-norm formulations
Off-Policy Safe Reinforcement Learning with Cost-Constrained Optimistic Exploration
SCAD: Super-Class-Aware Debiasing for Long-Tailed Semi-Supervised Learning
Type-Compliant Adaptation Cascades
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
Large Language Model Compression with Global Rank and Sparsity Optimization
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
More Than What Was Chosen: LLM-based Explainable Recommendation Beyond Noisy User Preferences
Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
Learning multimodal dictionary decompositions with group-sparse autoencoders
Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis
Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models
OFMU: Optimization-Driven Framework for Machine Unlearning
Energy-Based Transformers are Scalable Learners and Thinkers
Chart Deep Research in LVLMs via Parallel Relative Policy Optimization
Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents
Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning
EA3D: Event-Augmented 3D Diffusion for Generalizable Novel View Synthesis
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
Bayesian Influence Functions for Hessian-Free Data Attribution
Optimizing Agent Planning for Security and Autonomy
Grounding Computer Use Agents on Human Demonstrations
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
HeurekaBench: A Benchmarking Framework for AI Co-scientist
High-dimensional Analysis of Synthetic Data Selection
Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
Spectral Attention Steering for Prompt Highlighting
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
Hyperbolic Aware Minimization: Implicit Bias for Sparsity
Toward Principled Flexible Scaling for Self-Gated Neural Activation
Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
Neodragon: Mobile Video Generation Using Diffusion Transformer
Temporal Generalization: A Reality Check
Are Deep Speech Denoising Models Robust to Adversarial Noise?
Enhancing Hallucination Detection through Noise Injection
Hubble: a Model Suite to Advance the Study of LLM Memorization
MergeTune: Continued Fine-Tuning of Vision-Language Models
Fair Graph Machine Learning under Adversarial Missingness Processes
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
Scalable Offline Model-Based RL with Action Chunks
Building spatial world models from sparse transitional episodic memories
SWERank: Software Issue Localization with Code Ranking
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
Reliable Weak-to-Strong Monitoring of LLM Agents
Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding
Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization
High-dimensional Mean-Field Games by Particle-based Flow Matching
Knowledge Distillation as Decontamination? Revisiting the “Data Laundering” Concern in Classification Tasks
CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
Helmsman: Autonomous Synthesis of Federated Learning Systems via Collaborative LLM Agents
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
Curvature-Guided Task Synergy for Skeleton based Temporal Action Segmentation
On the Expressiveness of State Space Models via Temporal Logics
Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Compositional Generalization from Learned Skills via CoT Training: A Theoretical and Structural Analysis for Reasoning
MAD-Logic: Multi-Agent Debate Enhances Symbolic Translation and Reasoning
FAME: Formal Abstract Minimal Explanation for Neural Networks
RPM: Reasoning-Level Personalization for Black-Box Large Language Models
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
Never Saddle for Reparameterized Steepest Descent as Mirror Flow
Language and Experience: A Computational Model of Social Learning in Complex Tasks
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra
Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
CodeGenGuard: A Watermark for Code Generation Models
An Information-Theoretic Lower Bound on the Generalization Error of Autoencoders
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
Mitigating Noise Shift in Denoising Generative Models with Noise Awareness Guidance
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
LLM DNA: Tracing Model Evolution via Functional Representations
AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
The Tutor-Pupil Augmentation: Enhancing Learning and Interpretability via Input Corrections
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Fresh in memory: Training-order recency is linearly encoded in language model activations
Secure Outlier-Aware Large Language Model Inference
Latent Fourier Transform
Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation
A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
Converge Faster, Talk Less: Hessian-Informed Federated Zeroth-Order Optimization
Weak-to-Strong Generalization with Failure Trajectories
Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
LORE: Jointly Learning The Intrinsic Dimensionality and Relative Similarity Structure from Ordinal Data
Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
Modeling Others' Minds as Code
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization
Faster Vision Transformers with Adaptive Patches
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Self-Improving Loops for Visual Robotic Planning
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
Positional Encoding Field
Low-pass Personalized Subgraph Federated Recommendation
Parameter-Efficient Reinforcement Learning using Prefix Optimization
Co-occurring Associated REtained concepts in Diffusion Unlearning
MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion
DeNOTS: Stable Deep Neural ODEs for Time Series
Sci2Pol: Evaluating and Fine-tuning LLMs on Scientific-to-Policy Brief Generation
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Model-Guided Microstimulation Steers Primate Visual Behavior
Frequency-Domain Better than Time-Domain for Causal Structure Recovery in Dynamical Systems on Networks
SCRAPL: Scattering Transform with Random Paths for Machine Learning
AirQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR
RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training
A Scalable Constant-Factor Approximation Algorithm for $W_p$ Optimal Transport
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Generative Modeling from Black-Box Corruptions via Self-Consistent Stochastic Interpolants
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
LoRAGen: Structure-Aware Weight Space Learning for LoRA Generation
Beyond Uniformity: Regularizing Implicit Neural Representations through a Lipschitz Lens
Tackling the XAI Disagreement Problem with Adaptive Feature Grouping
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
Polynomial, trigonometric, and tropical activations
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
Virtual Community: An Open World for Humans, Robots, and Society
LaVCa: LLM-assisted Visual Cortex Captioning
DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence
Unified Multi-Modal Interactive and Reactive 3D Motion Generation via Rectified Flow
Contextual Causal Bayesian Optimisation
MARS - A Foundational Map Auto-Regressor
Same Content, Different Representations: A Controlled Study for Table QA
Direct Doubly Robust Estimation of Conditional Quantile Contrasts
SPIKE-RL: Video-LLMs meet Bayesian Surprise
Adaptive Regularization for Large-Scale Sparse Feature Embedding Models
Mapping Overlaps in Benchmarks through Perplexity in the Wild
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
GAGA: Gaussianity-Aware Gaussian Approximation for Efficient 3D Molecular Generation
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
SPACeR: Self-Play Anchoring with Centralized Reference Models
Internal Evaluation of Density-Based Clusterings with Noise
Distribution-Aware Multi-Granularity Phase Coding: Towards Lower Conversion Error for Spike-Driven Large Language Models
Oversmoothing, "Oversquashing", Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning
On the Expressive Power of GNNs for Boolean Satisfiability
AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
Long-Context Generalization with Sparse Attention
DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization
Enhancing Language Model Reasoning with Structured Multi-Level Modeling
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
SHE-LoRA: Selective Homomorphic Encryption for Federated Tuning with Heterogeneous LoRA
Subquadratic Algorithms and Hardness for Attention with Any Temperature
Does Higher Interpretability Imply Better Utility? A Pairwise Analysis on Sparse Autoencoders
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative
RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Spectral-guided Physical Dynamics Distillation
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
Steer Away from Mode Collisions: Improving Composition in Diffusion Models
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
Universal Value-Function Uncertainties
Does Weak-to-strong Generalization Happen under Spurious Correlations?
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
Incomplete Data, Complete Dynamics: A Diffusion Approach
Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
ROGA: Scaling Generalist Agents for Office Productivity Tasks via Tool Generation
CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
Unified 3D Scene Understanding Through Physical World Modeling
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
VERIFY: A Novel Multi-Domain Dataset Grounding LTL in Contextual Natural Language via Provable Intermediate Logic
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Enforcing Axioms for AI Alignment under Loss-Based Rules
HDR-NSFF: High Dynamic Range Neural Scene Flow Fields
When Language Models Lose Their Mind: The Consequences of Brain Misalignment
Learning Koopman Representations with Controllability Guarantees
Unsupervised Representation Learning - an Invariant Risk Minimization Perspective
Search Arena: Analyzing Search-Augmented LLMs
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
TIMESLIVER: SYMBOLIC-LINEAR DECOMPOSITION FOR EXPLAINABLE TIME SERIES CLASSIFICATION
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Toward Efficient Exploration by Large Language Model Agents
Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
Much Ado About Noising: Dispelling the Myths of Generative Robotic Control
Human-LLM Collaborative Feature Engineering for Tabular Data
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
Plan then Act: Bi-level CAD Command Sequence Generation
Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams
MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation
The Lattice Geometry of Neural Network Quantization: A Short Equivalence Proof of GPTQ and Babai's Algorithm
Membership Privacy Risks of Sharpness Aware Minimization
Universal Multi-Domain Translation via Diffusion Routers
A Probabilistic Hard Concept Bottleneck for Steerable Generative Models
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models
Following the Navigation: Enhancing Small Language Models Contextual Reasoning with LLM Guidance
Convergence of an actor-critic gradient flow for entropy regularised MDPs in general spaces
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher
Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods
Frequency-Balanced Retinal Representation Learning with Mutual Information Regularization
Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design
On the Alignment Between Supervised and Self-Supervised Contrastive Learning
Fed-Duet: Dual Expert-Orchestrated Framework for Continual Federated Vision-Language Learning
Epistemic Uncertainty Quantification To Improve Decisions From Black-Box Models
Gogo: Group-wise granularity-ordered codec for stable and efficient speech generation
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
Random Controlled Differential Equations
Towards a Transferable Acceleration Method for Density Functional Theory
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
Concepts' Information Bottleneck Models
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
T-TAMER: Provably Taming Trade-offs in ML Serving
What happens when generative AI models train recursively on each others' outputs?
From Predictors to Samplers via the Training Trajectory
QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities
TSLM: Tree-Structured Language Modeling for Divergent Thinking
Federated ADMM from Bayesian Duality
Identifying and Evaluating Inactive Heads in Pretrained LLMs
SpectraLLM: Uncovering the Ability of LLMs for Molecule Structure Elucidation from Multi-Spectra
In-Context Algebra
KANO: Kolmogorov-Arnold Neural Operator
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers
Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving
Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization
Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
Rethinking Model Calibration through Spectral Entropy Regularization in Medical Image Segmentation
A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
Physics-informed learning under mixing: How physical knowledge speeds up learning
Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
Towards Scalable Oversight via Partitioned Human Supervision
MergOPT: A Merge-Aware Optimizer for Robust Model Merging
Rodrigues Network for Learning Robot Actions
MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion
Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis
SceneStreamer: Continuous Scenario Generation as Next Token Group Prediction
Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning
Composable Sparse Subnetworks via Maximum-Entropy Principle
Hybrid Training for Vision-Language-Action Models
The Counting Power of Transformers
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Scalable Energy-Based Models via Adversarial Training: Unifying Discrimination and Generation
Can Vision-Language Models Answer Face to Face Questions in the Real-World?
PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
WARC-Bench: Web Archive based Benchmark for GUI Subtask Executions
Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling
Token-level Data Selection for Safe LLM Fine-tuning
Neuron-Level Analysis of Cultural Understanding in Large Language Models
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
TileLang: Bridge Programmability and Performance in Modern Neural Kernels
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
Deep FlexQP: Accelerated Nonlinear Programming via Deep Unfolding
Predicting LLM Reasoning Performance with Small Proxy Model
Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization
Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
TurboBoA: Faster and Exact Attention-aware Quantization without Backpropagation
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models
Query-Specific Causal Graph Pruning Under Tiered Knowledge
Analysis of approximate linear programming solution to Markov decision problem with log barrier function
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
Towards Lossless Memory-efficient Training of Spiking Neural Networks via Gradient Checkpointing and Spike Compression
Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
OrthoRF: Exploring Orthogonality in Object-Centric Representations
Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction
Soft-Masked Diffusion Language Models
RefineBench: Evaluating Refinement Capability of Language Models via Checklists
Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
DeepSADR: Deep Transfer Learning with Subsequence Interaction and Adaptive Readout for Cancer Drug Response Prediction
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
Joint Optimization for 4D Human-Scene Reconstruction in the Wild
From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents
Inlier-Centric Post-Training Quantization for Object Detection Models
Detective SAM: Adaptive AI-Image Forgery Localization
Fair Reinforcement Learning for Just AI
GRAM-DTI: Adaptive Multimodal Representation Learning for Drug–Target Interaction Prediction
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
Uncertainty Estimation via Hyperspherical Confidence Mapping
Bayesian Robust Cooperative Multi-Agent Reinforcement Learning Against Unknown Adversaries
On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy
CARD: Towards Conditional Design of Multi-agent Topological Structures
Federated Graph-Level Clustering Network with Dual Knowledge Separation
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks
Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
How to Lose Inherent Counterfactuality in Reinforcement Learning
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Paradigm Shift of GNN Explainer from Label Space to Prototypical Representation Space
MergePRAG: Orthogonal Merging of Passage-experts for Multi-hop Parametric RAG
Dataset Distillation as Pushforward Optimal Quantization
A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
AutoCode: LLMs as Problem Setters for Competitive Programming
PepBenchmark: A Standardized Benchmark for Peptide Machine Learning
Multiplicative Diffusion Models: Beyond Gaussian Latents
Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs
Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge
Test-Time Adaptation without Source Data for Out-of-Domain Bioactivity Prediction
Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning
Regret-Guided Search Control for Efficient Learning in AlphaZero
VenusX: Unlocking Fine-Grained Functional Understanding of Proteins
Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
PINFDiT: Energy-Based Physics-Informed Diffusion Transformers for General-purpose Time Series Tasks
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
Light Differentiable Logic Gate Networks
Spilled Energy in Large Language Models
MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design
USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning Capabilities of LLMs as Urban Agents
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles
Self-Destructive Language Models
Adaptive Acquisition Selection for Bayesian Optimization with Large Language Models
Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields
Learning from the Electronic Structure of Molecules across the Periodic Table
Learning to Reason for Hallucination Span Detection
SiNGER: A Clearer Voice Distills Vision Transformers Further
Deep Learning with Learnable Product-Structured Activations
Detecting Invariant Manifolds in ReLU-Based RNNs
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
Sharing State Between Prompts and Programs
Open-Set Semantic Gaussian Splatting SLAM with Expandable Representation
Point-wise Anomaly Detection via Fold-bifurcation ODE
The Art of Scaling Reinforcement Learning Compute for LLMs
Symmetric Space Learning for Combinatorial Generalization
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
Video-GPT via Next Clip Diffusion
CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction
Let's (not) just put things in Context: Test-time Training for Long-context LLMs
Enhancing Sparse Event Detection in Healthcare Time-Series via Adaptive Gate of Context–Detail Interaction
Test-Time Alignment for Large Language Models via Textual Model Predictive Control
CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization
First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
A Joint Diffusion Model with Pre-Trained Priors for RNA Sequence-Structure Co-Design
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
Disentanglement of Variations with Multimodal Generative Modeling
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic
A Single Architecture for Representing Invariance Under Any Space Group
Vision Language Models are Biased
BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
Topology and geometry of the learning space of ReLU networks: connectivity and singularities
A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input
Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevant Assessment for IR Benchmarks
SAGE: Spatial-visual Adaptive Graph Exploration for Efficient Visual Place Recognition
TS$^2$: Training with Sparsemax+, Testing with Softmax for Accurate and Diverse LLM Fine-Tuning
Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
Learning to Reason in Structured In-context Environments with Reinforcement Learning
The 99% Success Paradox: When Near-Perfect Retrieval Equals Random Selection
Learning linear state-space models with sparse system matrices
STORK: Faster Diffusion and Flow Matching Sampling by Resolving both Stiffness and Structure-Dependence
Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
Adaptive Conformal Guidance for Learning under Uncertainty
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
CSRv2: Unlocking Ultra-Sparse Embeddings
Flock: A Knowledge Graph Foundation Model via Learning on Random Walks
Seeing Through the PRISM: Compound & Controllable Restoration of Scientific Images
AsyncBEV: Cross-modal flow alignment in Asynchronous 3D Object Detection
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses
Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Evolution of Flash Attention
TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
Efficient Turing Machine Simulation with Transformers
Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed
Control Tax: The Price of Keeping AI in Check
Reducing information dependency does not cause training data privacy. Adversarially non-robust features do.
Reinforcement Unlearning via Group Relative Policy Optimization
Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
Medical Interpretability and Knowledge Maps of Large Language Models
Prior-free Tabular Test-time Adaptation
Angle K-Means
FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization
Low rank adaptation of chemical foundation models generate effective odorant representations
VITA: Vision-to-Action Flow Matching Policy
NRGPT: An Energy-based Alternative for GPT
On Fairness of Task Arithmetic: The Role of Task Vectors
Causal-Steer: Disentangled Continuous Style Control without Parallel Corpora
Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning LLMs
Learning to summarize user information for personalized reinforcement learning from human feedback
Nef-Net v2: Adapting Electrocardio Panorama in the wild
AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild
Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning
Disentangling Knowledge Representations for Large Language Model Editing
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
Special Unitary Parameterized Estimators of Rotation
NAB: Neural Adaptive Binning for Sparse-View CT reconstruction
Inducing Dyslexia in Vision Language Models
Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss
CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework
Towards Dynamic Interleaving Optimizers
Atomic HINs: Entity-Attribute Duality for Heterogeneous Graph Modeling
Scalable In-Context Q-Learning
PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA
One-Step Flow Q-Learning: Addressing the Diffusion Policy Bottleneck in Offline Reinforcement Learning
GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine
Temporal Concept Dynamics in Diffusion Models via Prompt-Conditioned Interventions
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Superficial Safety Alignment Hypothesis
Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering
CellAgent: LLM-Driven Multi-Agent Framework for Natural Language-Based Single-Cell Analysis
Bridging Explainability and Embeddings: BEE Aware of Spuriousness
Locally Subspace-Informed Neural Operators for Efficient Multiscale PDE Solving
Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation
Enabling Fine-Tuning of Direct Feedback Alignment via Feedback-Weight Matching
Representing local protein environments with machine learning force fields
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining
The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
SAGA: Structural Aggregation Guided Alignment with Dynamic View and Neighborhood Order Selection for Multiview Graph Domain Adaptation
No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection
Multiple-Prediction-Powered Inference
Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives
SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials
Bound by semanticity: universal laws governing the generalization-identification tradeoff
Learning a Game by Paying the Agents
Influence Dynamics and Stagewise Data Attribution
HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
COOPERTRIM: Adaptive Data Selection for Uncertainty-Aware Cooperative Perception
Conformal Prediction for Long-Tailed Classification
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
Inferring the Invisible: Neuro-Symbolic Rule Discovery for Missing Value Imputation
Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning
MuonBP: Faster Muon via Block-Periodic Orthogonalization
RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization
Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows
Unleashing Guidance Without Classifiers for Human-Object Interaction Animation
The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
WebDS: An End-to-End Benchmark for Web-based Data Science
Enhancing Multi-Image Understanding through Delimiter Token Scaling
Online Black-Box Prompt Optimization with Regret Guarantees under Noisy Feedback
Greater than the Sum of Its Parts: Building Substructure into Protein Encoding Models
FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows
Translating Flow to Policy via Hindsight Online Imitation
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow
RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks
RAP: 3D Rasterization Augmented End-to-End Planning
Decomposed Attention Fusion in MLLMs for Training-free Video Reasoning Segmentation
When Bias Meets Trainability: Connecting Theories of Initialization
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
Seeing What’s Wrong: A Trajectory-Guided Approach to Caption Error Detection
Sampling Complexity of TD and PPO in RKHS
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
Learning Self-Critiquing Mechanisms for Region-Guided Chest X-Ray Report Generation
Causal Interpretation of Neural Network Computations with Contribution Decomposition
Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL
DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection
IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
A Physics-Inspired Optimizer: Velocity Regularized Adam
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
Unified Vision-Language-Action Model
SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows
VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
Latent Wavelet Diffusion For Ultra High-Resolution Image Synthesis
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning
Learning Human Habits with Rule-Guided Active Inference
SimpleFold: Folding Proteins is Simpler than You Think
Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
Learning Robust Intervention Representations with Delta Embeddings
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
The Expressive Limits of Diagonal SSMs for State-Tracking
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
Beyond Sequential Reranking: Reranker-Guided Search Improves Reasoning Intensive Retrieval
DTP: Delta-Guided Two Stage Pruning for Mamba-based Multimodal Large Language Models
Lean Finder: Semantic Search for Mathlib That Understands User Intents
TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs
Reverse Distillation: Consistently Scaling Protein Language Model Representations
Differentiable Simulation of Hard Contacts with Soft Gradients for Learning and Control
3DGEER: 3D Gaussian Rendering Made Exact and Efficient for Generic Cameras
Neural Message-Passing on Attention Graphs for Hallucination Detection
Distributions as Actions: A Unified Framework for Diverse Action Spaces
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding
GARLIC: Graph Attention-based Relational Learning of Multivariate Time Series in Intensive Care
Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
FHE-Coder: Benchmarking Secure Agentic Code Generation for Fully Homomorphic Encryption
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching
Bongard-RWR+: Real-World Representations of Fine-Grained Concepts in Bongard Problems
YuE: Scaling Open Foundation Models for Long-Form Music Generation
ALM-MTA: Front-Door Causal Multi-Touch Attribution Method for Creator-Ecosystem Optimization
Reducing Contextual Stochastic Bilevel Optimization via Structured Function Approximation
Best-of-Infinity: Asymptotic Performance of Test-Time LLM Ensembling
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
Is Finer Better? The Limits of Microscaling Formats in Large Language Models
Fast training of accurate physics-informed neural networks without gradient descent
Lifelong Embodied Navigation Learning
EAST: Early Action Prediction Sampling Strategy with Token Masking
Spatial Structure and Selective Text Jointly Facilitate Image Clustering
Learning from Synthetic Data Improves Multi-hop Reasoning
When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
SLM-MUX: Orchestrating Small Language Models for Reasoning
Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
Masked Generative Policy for Robotic Control
Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba
CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts
Numerion: A Multi-Hypercomplex Model for Time Series Forecasting
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
EnvSocial-Diff: A Diffusion-Based Crowd Simulation Model with Environmental Conditioning and Individual-Group Interaction
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
SIM-CoT: Supervised Implicit Chain-of-Thought
Rapid Training of Hamiltonian Graph Networks Using Random Features
Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
ProReGen: Progressive Residual Generation under Attribute Correlations
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Understanding and Improving Hyperbolic Deep Reinforcement Learning
Robust Reward Modeling via Causal Rubrics
Circuit Insights: Towards Interpretability Beyond Activations
Tight Bounds for Schrodinger Potential Estimation in Unpaired Data Translation
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Buckingham $\pi$-Invariant Test‑Time Projection for Robust PDE Surrogate Modeling
MICLIP: Learning to Interpret Representation in Vision Models
Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models
GRADIEND: Feature Learning within Neural Networks Exemplified through Biases
Boosting for Predictive Sufficiency
Imagine How To Change: Explicit Procedure Modeling for Change Captioning
Conformalized Survival Counterfactuals Prediction for General Right-Censored Data
PMDformer: Patch-Mean Decoupling Information Transformer for Long-term Forecasting
Splat Regression Models
Towards Persistent Noise-Tolerant Active Learning of Regular Languages with Class Query
TaCo: A Benchmark for Lossless and Lossy Codecs of Heterogeneous Tactile Data
A Faster Parameter-Free Regret Matching Algorithm
Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Learning Mixtures of Linear Dynamical Systems via Hybrid Tensor-EM Method
Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
Random-projection ensemble dimension reduction
Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Micro-Macro Coupled Koopman Modeling on Graph for Traffic Flow Prediction
Emergent Discrete Controller Modules for Symbolic Planning in Transformers
The Deleuzian Representation Hypothesis
DPad: Efficient Diffusion Language Models with Suffix Dropout
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method
UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes
PALC: Preference Alignment via Logit Calibration
FlexHiNM-GP: Flexible Hierarchical Pruning via Region Allocation and Channel Permutation
PAS: Estimating the target accuracy before domain adaptation
GeoFAR: Geography-Informed Frequency-Aware Super-Resolution for Climate Data
ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
Tuning the burn-in phase in training recurrent neural networks improves their performance
Faster Diffusion Through Temporal Attention Decomposition
Adaptive Social Learning via Mode Policy Optimization for Language Agents
Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation
TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA
Spatially Informed Autoencoders for Interpretable Visual Representation Learning
Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models
K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control
Reliable Probabilistic Forecasting of Irregular Time Series through Marginalization-Consistent Flows
Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
The human knowledge loophole in the 'bitter lesson' for LLMs
UnigramLM: An Attempt at Writing The Missing Manual
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
ChunkTabPFN: Training-free Long Context
STABLE: Shift-Tolerant Allocation via Black-Litterman Using Conditional Diffusion Estimates
Artistic Style and the Play of Neural Style Representations
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
PU-BENCH: A Unified Benchmark for Rigorous and Reproducible PU Learning
The effect of feature resolution on embedding dimension
dLLM - Rethinking Generation Beyond Autoregressive Models
Flow Where You Want
Forget Forgetting: Continual Learning in a World of Abundant Memory
A Case for Library-Level k-Means Binning in Histogram Gradient-Boosted Trees
Chimera: State Space Models Beyond Sequences
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness
TINKER: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
How Text Quality Interventions Reshape Neural Scaling Laws for LLMs: Empirical Study
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models
Aurora: Towards Universal Generative Multimodal Time Series Forecasting
SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
Inverse Scaling in Test-Time Compute
HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy
Training Dynamics of Learning 3D-Rotational Equivariance
AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers
TS-DDAE: A Novel Temporal-Spectral Denoising Diffusion AutoEncoder for Wireless Signal Recognition Model Pre-training
BrowseNet: Graph-Based Associative Memory for Contextual Information Retrieval
Variational Pseudo Marginal Methods for Jet Reconstruction in Particle Physics
Tabby: A Language Model Architecture for Tabular and Structured Data Synthesis
Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
Machine Unlearning under Retain–Forget Entanglement
WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport
ST-HHOL: Spatio-Temporal Hierarchical Hypergraph Online Learning for Crime Prediction
T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation
Neural Compression of 3D Meshes using Sparse Implicit Representation
Debugging Concept Bottleneck Models through Removal and Retraining
Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
NextQuill: Causal Preference Modeling for Enhancing LLM Personalization
On the Interpolation Effect of Score Smoothing in Diffusion Models
STORM: Synergistic Cross-Scale Spatio-Temporal Modeling for Weather Forecasting
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
Flipping the Dialogue: Training and Evaluating User Language Models
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting Without Disclosure
LLMs Get Lost In Multi-Turn Conversation
SSDi8: Accurate and Efficient 8-bit Quantization for State Space Duality
Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
Learning Facts at Scale with Active Reading
CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models
Pre-training under infinite compute
Reinforcement Learning for Machine Learning Engineering Agents
Distributionally Robust Classification for Multi-source Unsupervised Domain Adaptation
FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation
MLE-Smith: Scaling MLE Tasks with Automated Multi-agent Pipeline
QKV Projections Require a Fraction of Their Memory
Action-Free Offline-To-Online RL via Discretised State Policies
Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets
PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference
WorldGym: World Model as An Environment for Policy Evaluation
EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations
Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?
Faithfulness Under the Distribution: A New Look at Attribution Evaluation
PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation
PGRF-Net: A Prototype-Guided Relational Fusion Network for Diagnostic Multivariate Time-Series Anomaly Detection
Markovian Transformers for Informative Language Modeling
Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in LLMs
Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
CRONOS: Continuous time reconstruction for 4D medical longitudinal series
Advancing Complex Video Object Segmentation via Progressive Concept Construction
Private Rate-Constrained Optimization with Applications to Fair Learning
Feature compression is the root cause of adversarial fragility in neural networks
Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models
Deconstructing Guidance: A Semantic Hierarchy for Precise Diffusion Model Editing
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
Triangle Multiplication is All You Need for Biomolecular Structure Representations
MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss
Robust Federated Inference
TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Sparse Imagination for Efficient Visual World Model Planning
Taming Polysemanticity in LLMs: Theory-Grounded Feature Recovery via Sparse Autoencoders
How Transformers Learn Causal Structures In-Context: Explainable Mechanism Meets Theoretical Guarantee
TrojanTO: Action-Level Backdoor Attacks Against Trajectory Optimization Models
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning
ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse
Fine-tuning Done Right in Model Editing
Exploring Diverse Generation Paths via Inference-time Stiefel Activation Steering
MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models
Learning to Weight Parameters for Training Data Attribution
CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering
Inter-Agent Relative Representations for Multi-Agent Option Discovery
SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards
CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Rethinking the Diffusion Model from a Langevin Perspective
Optimizing Data Augmentation through Bayesian Model Selection
OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
Transformers with Endogenous In-Context Learning: Bias Characterization and Mitigation
TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with Temporal-Aware Multimodal Models
TrimR: Verifier-based Training-Free Thinking Trimming for Efficient Test-Time Scaling
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
MoMa: A Simple Modular Learning Framework for Material Property Prediction
Relational Graph Transformer
InnovatorBench: Evaluating Agents’ Ability to Conduct Innovative AI Research
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
Rethinking Code Similarity for Automated Algorithm Design with LLMs
Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning
Generative Human Geometry Distribution
SPELL: Self-Play Reinforcement Learning for Evolving Long-Context Language Models
Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment
UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking
Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation
PT$^2$-LLM: Post-Training Ternarization for Large Language Models
Budget Alignment: Making Models Reason in the User's Language
Discrete Latent Features Ablate Adversarial Attack: A Robust Prompt Tuning Framework for VLMs
Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
RAG4DMC: Retrieval-Augmented Generation for Data-Level Modality Completion
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
How does the optimizer implicitly bias the model merging loss landscape?
Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors
Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
Scaling with Collapse: Efficient and Predictable Training of LLM Families
Read the Room: Video Social Reasoning with Mental-Physical Causal Chains
Learning AND–OR Templates for Compositional Representation in Art and Design
Ready For General Agents? Let's test it.
Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programming
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals
SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs
Quotient-Space Diffusion Models
How Reliable is Language Model Micro-Benchmarking?
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
FlexRibbon: Joint Sequence and Structure Pretraining for Protein Modeling
Reusing Pre-Training Data at Test Time is a Compute Multiplier
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
DiffBED: Scaling Bayesian Experimental Design to High-Dimensions
PARD: Accelerating LLM Inference with Low‑Cost PARallel Draft Model Adaptation
Counterfactual Structural Causal Bandits
Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling
The Price of Robustness: Stable Classifiers Need Overparameterization
FlexLinearAttention: Compiling a Unified Abstraction into Scalable Kernels for Linear Attention
MILCO: Learned Sparse Retrieval Across Languages via a Multilingual Connector
P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context
Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning
Beyond Magnitude: Leveraging Direction of RLVR Updates for LLM Reasoning
Implicit 4D Gaussian Splatting for Fast Motion with Large Inter-Frame Displacements
ReactID: Synchronizing Realistic Actions and Identity in Personalized Video Generation
An Overview of Subliminal Learning
PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
Generalization in LLM Problem Solving: The Case of the Shortest Path
AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution
Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment
When Thinking Backfires: Mechanistic Insights into Reason-induced Misalignment
Misalignments and RL Failure Modes in the Early Stage of Superintelligence
ASMIL: Attention-Stabilized Multiple Instance Learning for Whole-Slide Imaging
Learning Boltzmann Generators via Constrained Mass Transport
Learning Collective Variables from BioEmu with Time-Lagged Generation
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation
Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model
DSA: Efficient Inference For Video Generation Models via Distributed Sparse Attention
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective
Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Reverse-Engineered Reasoning for Open-Ended Generation
Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?
Learning Flexible Forward Trajectories for Masked Molecular Diffusion
ProRe: A Proactive Reward System for GUI Agents via Reasoner–Actor Collaboration
In-context learning of representations can be explained by induction circuits
When More is Less: Understanding Chain-of-Thought Length in LLMs
Enhancing Image-Conditional Coverage in Segmentation: Adaptive Thresholding via Differentiable Miscoverage Loss
Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Mechanism of Task-oriented Information Removal in In-context Learning
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers
Relative Entropy Pathwise Policy Optimization
Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
Fair Conformal Classification via Learning Representation-Based Groups
Fine-tuning Behavioral Cloning Policies with Preference‑Based Reinforcement Learning
Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors
Verifier-free Test-Time Sampling for Vision-Language-Action Models
Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
Selective Rotary Position Embedding
ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection
Harmonized Cone for Feasible and Non-conflict Directions in Training Physics-Informed Neural Networks
Covariate-Guided Clusterwise Linear Regression for Generalization to Unseen Data
NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
Efficient Test-Time Scaling for Small Vision-Language Models
IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
Human Behavior Atlas: Benchmarking Unified Psychological And Social Behavior Understanding
Testing Fourier Sparsity via Implicit Sensing
VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
Fast and Interpretable Protein Substructure Alignment via Optimal Transport
Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection
Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
CompassNav: Steering From Path Imitation to Decision Understanding In Navigation
Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders
When Foundation Models are One-Liners: Limitations and Future Directions for Time Series Anomaly Detection
Towards Sampling Data Structures for Tensor Products in Turnstile Streams
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
$\boldsymbol{\partial^\infty}$-Grid: A Neural Differential Equation Solver with Differentiable Feature Grids
SUIT: Knowledge Editing with Subspace-Aware Key-Value Mappings
Edit-Based Flow Matching for Temporal Point Processes
Reliable Fine-Grained Evaluation of Natural Language Math Proofs
Model Tensor Planning
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Bures-Wasserstein Flow Matching for Graph Generation
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Overtone: Cyclic Patch Modulation for Clean, Efficient, and Flexible Physics Emulators
KDP: Simplifying Representation Dynamics in Kernel Space
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
SpareTrain: Fault-Tolerant LLM Training via Low-Cost Dual Modular Redundancy
Tracing and Reversing Edits in LLMs
SPR$^2$Q: Static Priority-based Rectifier Routing Quantization for Image Super-Resolution
Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks
UniVideo: Unified Understanding, Generation, and Editing for Videos
Bidirectional Predictive Coding
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
RM-R1: Reward Modeling as Reasoning
Flow Map Learning Via Non-Gradient Vector Flow
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
Estimating Semantic Alphabet Size for LLM Uncertainty Quantification
Rethinking Causal Mask Attention for Vision-Language Inference
ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
MIAM: Modality Imbalance-Aware Masking for Multimodal Ecological Applications
ASIDE: Architectural Separation of Instructions and Data in Language Models
Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization
Adaptive Width Neural Networks
LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
A New Initialization to Control Gradients in Sinusoidal Neural Networks
On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation
MobileCLIP2: Improving Multi-Modal Reinforced Training
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training
Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling
MENLO: From Preferences to Proficiency – Evaluating and Modeling Native-like Quality Across 47 Languages
EntropyLong: Effective Long-Context Training via Predictive Uncertainty
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
The Spacetime of Diffusion Models: An Information Geometry Perspective
Influence-Preserving Proxies for Gradient-Based Data Selection in LLM FineTuning
Riemannian Variational Flow Matching for Material and Protein Design
SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
DRBench: A Realistic Benchmark for Enterprise Deep Research
Continual Low-Rank Adapters for LLM-based Generative Recommender Systems
Towards Understanding Valuable Preference Data for Large Language Model Alignment
Rethinking LLM Reasoning: From Explicit Trajectories to Latent Representations
Scaling Generalist Data-Analytic Agents
FlowSearcher: Synthesizing Memory-Guided Agentic Workflows for Web Information Seeking
Online Navigation Refinement: Achieving Lane-Level Guidance by Associating Standard-Definition and Online Perception Maps
Long Chain-of-Thought Reasoning Across Languages
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Process-Verified Reinforcement Learning for Theorem Proving via Lean
Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking
Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences
Multiverse Mechanica: A Testbed for Learning Game Mechanics via Counterfactual Worlds
LH-DECEPTION: Simulating and Understanding LLM Deceptive Behaviors in Long-Horizon Interactions
SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?
Astra: General Interactive World Model with Autoregressive Denoising
VLM-Guided Adaptive Negative Prompting for Creative Generation
StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
Towards Improved Sentence Representations using Token Graphs
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Video
Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding
Tractability via Low Dimensionality: The Parameterized Complexity of Training Quantized Neural Networks
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
Purrception: Variational Flow Matching for Vector-Quantized Image Generation
FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound
Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity
TetraGT: Tetrahedral Geometry-Driven Explicit Token Interactions with Graph Transformer for Molecular Representation Learning
Flower: A Flow-Matching Solver for Inverse Problems
Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
RedacBench: Can AI Erase Your Secrets?
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
Neural Multi-Objective Combinatorial Optimization for Flexible Job Shop Scheduling Problems
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Scaling Speech Tokenizers with Diffusion Autoencoders
Relatron: Automating Relational Machine Learning over Relational Databases
On the Thinking-Language Modeling Gap in Large Language Models
TRACEDET: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models
VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
Graph Random Features for Scalable Gaussian Processes
Beyond Linear Processing: Dendritic Bilinear Integration in Spiking Neural Networks
Towards True Speech-to-Speech Models Without Text Guidance
Bilinear representation mitigates reversal curse and enables consistent model editing
Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
From Markov to Laplace: How Mamba In-Context Learns Markov Chains
MotionStream: Real-Time Video Generation with Interactive Motion Controls
Why We Need New Benchmarks for Local Intrinsic Dimension Estimation
Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens
A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
TabStruct: Measuring Structural Fidelity of Tabular Data
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs
FoNE: Precise Single-Token Number Embeddings via Fourier Features
Learning to Recall with Transformers Beyond Orthogonal Embeddings
PRISM: Progressive Robust Learning for Open-World Continual Category Discovery
Tracking Equivalent Mechanistic Interpretations Across Neural Networks
Query-Guided Spatial–Temporal–Frequency Interaction for Music Audio–Visual Question Answering
Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models
PRISM: Enhancing PRotein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control
On the Impact of the Utility in Semivalue-based Data Valuation
OSCAR: Online Soft Compression for RAG
Sublinear Spectral Clustering Oracle with Little Memory
Learning Retrieval Models with Sparse Autoencoders
Fast Frank–Wolfe Algorithms with Adaptive Bregman Step-Size for Weakly Convex Functions
R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability
Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
Faster SVD via Accelerated Newton-Schulz Iteration
SAM 3: Segment Anything with Concepts
CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
Capability-Based Scaling Trends for LLM-Based Red-Teaming
Adaptive Thinking: Large Language Models Know When to Think in Latent Space
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Post-hoc Probabilistic Vision-Language Models
Hybrid Reinforcement: when reward is sparse, better to be dense
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
Learning under Quantization for High-Dimensional Linear Regression
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
Temporally Detailed Hypergraph Neural ODE for Disease Progression Modeling
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
DeepAFL: Deep Analytic Federated Learning
Online Learning and Equilibrium Computation with Ranking Feedback
SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
Empowering Small VLMs to Think with Dynamic Memorization and Exploration
Beyond the Heatmap: A Rigorous Evaluation of Component Impact in MCTS-Based TSP Solvers
Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation
ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning
NeRV-Diffusion: Diffuse Implicit Neural Representation for Video Synthesis
CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
Sample-efficient evidence estimation of score based priors for model selection
FakeXplain: AI-Generated Image Detection via Human-Aligned Grounded Reasoning
G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning
On the Ability of Deep Networks to Learn Symmetries from Data – A Neural Kernel Theory
ICYM$^2$I: The illusion of multimodal informativeness under missingness
Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems
Gumbel Distillation for Parallel Text Generation
CityLens: Evaluating Large Vision-Language Models for Urban Socioeconomic Sensing
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
InputDSA: Demixing, then comparing recurrent and externally driven dynamics
KnowProxy: Adapting Large Language Models by Knowledge-guided Proxy
Offline Preference-Based Value Optimization
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
HEEGNet: Hyperbolic Embeddings for EEG
Exposing Mixture and Annotating Confusion for Active Universal Test-Time Adaptation
PrefDisco: Benchmarking Proactive Personalized Reasoning
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
Bridging Degradation Discrimination and Generation for Universal Image Restoration
AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
EXP-Bench: Can AI Conduct AI Research Experiments?
Continuous multinomial logistic regression for neural decoding
Dynamic Classifier-Free Diffusion Guidance via Online Feedback
FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
Newton Method Revisited: Global Convergence Rates up to $O(1/k^3)$ for Stepsize Schedules and Linesearch Procedures
Closing the Gap Between Text and Speech Understanding in LLMs
Diffusion Models as Dataset Distillation Priors
JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks
AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Pixel-Perfect Puppetry: Precision-Guided Enhancement for Face Image and Video Editing
Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter
ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection
Bridging the Distribution Gap to Harness Pretrained Diffusion Priors for Super-Resolution
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
Culture in Action: Evaluating Text-to-Image Models through Social Activities
Sample-efficient and Scalable Exploration in Continuous-Time RL
OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
The Intricate Dance of Prompt Complexity, Quality, Diversity and Consistency in T2I Models
Directed Exploration in Reinforcement Learning from Linear Temporal Logic
Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control
PixNerd: Pixel Neural Field Diffusion
LightCtrl: Training-free Controllable Video Relighting
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors
Radiometrically Consistent Gaussian Surfels for Inverse Rendering
Topology-Preserved Auto-regressive Mesh Generation in the Manner of Weaving Silk
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
True Self-Supervised Novel View Synthesis is Transferable
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos
Human3R: Everyone Everywhere All at Once
Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting
Universal Beta Splatting
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
Rethinking Radiology Report Generation: From Narrative Flow to Topic-Guided Findings
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
Characterizing Deep Research: A Benchmark and Formal Definition
Cat-PO: Cross-modal Adaptive Token-rewards for Preference Optimization in Truthful Multimodal LLMs
Dynamic Reflections: Probing Video Representations with Text Alignment
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
Spotlight on Token Perception for Multimodal Reinforcement Learning
Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer
MambaSL: Exploring Single-Layer Mamba for Time Series Classification
Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning
Memento: Toward an All-Day Proactive Assistant for Ultra-Long Streaming Video
QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining
Efficient Multimodal Spatial Reasoning via Dynamic and Asymmetric Routing
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Unveiling the Cognitive Compass: Theory-of-Mind–Guided Multimodal Emotion Reasoning
PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection
Video-LevelGauge: Investigating Contextual Positional Bias in Video Language Models
On The Fragility of Benchmark Contamination Detection in Reasoning Models
Hallucination-aware Intermediate Representation Edit in Large Vision-Language Models
SONIC: Spectral Oriented Neural Invariant Convolutions
PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models
Multimodal Classification via Total Correlation Maximization
OmniCVR: A Benchmark for Omni-Composed Video Retrieval with Vision, Audio, and Text
WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data
You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging
COMI: Coarse-to-fine Context Compression via Marginal Information Gain
PerFit: Exploring Personalization Shifts in Representation Space of LLMs
From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning
Fractional-Order Spiking Neural Network
FedOpenMatch: Towards Semi-Supervised Federated Learning in Open-Set Environments
Decoupling Positional and Symbolic Attention in Transformers
Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence
Understanding and Fixing Bottlenecks in State Space Models: What Recency and Over-Smoothing Tell Us
Learning Deformable Body Interactions With Adaptive Spatial Tokenization
Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
Seeing Through Words: Controlling Visual Retrieval Quality with Language Models
gen2seg: Generative Models Enable Generalizable Instance Segmentation
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper
Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation
Enhancing Vision Transformers for Object Detection via Context-Aware Token Selection and Packing
Differentially Private Domain Discovery
DPQuant: Efficient and Private Model Training via Dynamic Quantization Scheduling
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
CIMemories: A Compositional Benchmark For Contextual Integrity In LLMs
Scaling Laws for Diffusion Transformers
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
GAVEL: Towards Rule-Based Safety through Activation Monitoring
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
Robust LLM Unlearning via Post Judgment and Multi-round Thinking
Discrete Adjoint Matching
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
Learning Dynamic Causal Graphs Under Parametric Uncertainty via Polynomial Chaos Expansions
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
Fairness via Independence: A General Regularization Framework for Machine Learning
From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning
Decoupling the Class Label and the Target Concept in Machine Unlearning
Fairness-Aware Multi-view Evidential Learning with Adaptive Prior
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Sparse Autoencoders Trained on the Same Data Learn Different Features
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking
Watermarking Diffusion Language Models
The Price of Amortized inference in Sparse Autoencoders
Obfuscated Activations Bypass LLM Latent-Space Defenses
Enhancing Trustworthiness of Fine-Tuned LLMs via Regularized Subset Selection
MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Convergence of Regret Matching in Potential Games and Constrained Optimization
Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models
LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking
SERUM: Simple, Efficient, Robust, and Unifying Marking for Diffusion-based Image Generation
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
Distributionally Robust Cooperative Multi-agent Reinforcement Learning with Value Factorization
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
Fair Decision Utility in Human-AI Collaboration: Interpretable Confidence Adjustment for Humans with Cognitive Disparities
Latent Adaptation of Foundation Policies for Sim-to-Real Transfer
Guidance Watermarking for Diffusion Models
Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
Label Smoothing Improves Machine Unlearning
Fair Classification by Direct Intervention on Operating Characteristics
Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs
Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks
Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
Regulating Internal Alignment Flows for Robust Learning Under Spurious Correlations
Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
SDErasure: Concept-Specific Trajectory Shifting for Concept Erasure via Adaptive Diffusion Classifier
Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems by Exploiting Knowledge Asymmetry
GhostEI-Bench: Do Mobile Agent Resilience to Environmental Injection in Dynamic On-Device Environments?
Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text
The Achilles’ Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
STAR: Strategy-driven Automatic Jailbreak Red-teaming For Large Language Model
Towards a Theoretical Understanding of In-context Learning: Stability and Non-I.I.D Generalisation
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
WaterDrum: Watermark-based Data-centric Unlearning Metric
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Traceable Black-Box Watermarks For Federated Learning
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
Visual Compositional Tuning
Automata Learning and Identification of the Support of Language Models
scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
Tokenization to Transfer: Do Genomic Foundation Models Learn Good Representations?
Bridging Radiology and Pathology Foundation Models via Concept-Based Multimodal Co-Adaptation
Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding
FrugalRAG: Less is More in RL Finetuning for Multi-hop Question Answering
TPDiff: Temporal Pyramid Video Diffusion Model
Generalized Spherical Neural Operators: Green’s Function Formulation
Fast Convergence of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks
t-SNE Exaggerates Clusters, Provably
A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems
AetherCode: Evaluating LLMs’ Ability to Win In Premier Programming Competitions
LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations
Compositional Diffusion with Guided search for Long-Horizon Planning
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs
Pyramid Patchification Flow for Visual Generation
RFS: Reinforcement learning with Residual flow steering for dexterous manipulation
Scaling up Memory for Robotic Control via Experience Retrieval
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
Embodied Navigation Foundation Model
Influence without Confounding: Causal Discovery from Temporal Data with Long-term Carry-over Effects
Why Prototypes Collapse: Diagnosing and Preventing Partial Collapse in Prototypical Self-Supervised Learning
Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
Multi-Scale Diffusion-Guided Graph Learning with Power-Smoothing Random Walk Contrast for Multi-View Clustering
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
GRACE: Generative Representation Learning via Contrastive Policy Optimization
Behavior Learning (BL)
Beyond Entity Correlations: Disentangling Event Causal Puzzles in Temporal Knowledge Graphs
Learning Unified Representation of 3D Gaussian Splatting
CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark
FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation
UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
ARTDECO: Toward High-Fidelity On-the-Fly Reconstruction with Hierarchical Gaussian Structure and Feed-Forward Guidance
Q-Learning with Adjoint Matching
Proper Velocity Neural Networks
Supporting Multimodal Intermediate Fusion with Informatic Constraint and Distribution Coherence
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
TokMem: One-Token Procedural Memory for Large Language Models
Gradient-Based Program Synthesis with Neurally Interpreted Languages
Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
TRAC: Tensor-Train based Across-layer Compression for Parameter-Efficient Fine-Tuning
Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
RNE: plug-and-play diffusion inference-time control and energy-based training
JAPAN: Joint Adaptive Prediction Areas with Normalising Flow
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
Accelerated Parallel Tempering via Neural Transports
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
CoLA: Co-Calibrated Logit Adjustment for Long-Tailed Semi-Supervised Learning
Pretraining with hierarchical memories: separating long-tail and common knowledge
Rethinking Continual Learning with Progressive Neural Collapse
A Statistical Benchmark for Diffusion-Posterior-Sampling Algorithms
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks
Multi-Agent Guided Policy Optimization
Scalable and Adaptive Trust-Region Learning via Projection Convex Hull
Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning
Native Adaptive Solution Expansion for Diffusion-based Combinatorial Optimization
Learning Admissible Heuristics for A*: Theory and Practice
Provably Accelerated Imaging with Restarted Inertia and Score-based Image Priors
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
The Power of Small Initialization in Noisy Low-Tubal-Rank Tensor Recovery
Local Entropy Search over Descent Sequences for Bayesian Optimization
Improving LLM-based Global Optimization with Search Space Partitioning
Composite Optimization with Error Feedback: the Dual Averaging Approach
DeMo: Decoupled Momentum Optimization
A Memory-Efficient Hierarchical Algorithm for Large-scale Optimal Transport Problems
Beyond Outliers: A Study of Optimizers Under Quantization
Improved $\ell_{p}$ Regression via Iteratively Reweighted Least Squares
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
Sublinear Time Quantum Algorithm for Attention Approximation
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
Adaptive Nonlinear Compression for Large Foundation Models
Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
PEAR: Phase Entropy Aware Reward for Efficient Reasoning
DemoGrasp: Universal Dexterous Grasping from a Single Demonstration
Learning Dynamics of Logits Debiasing for Long-Tailed Semi-Supervised Learning
SliderQuant: Accurate Post-Training Quantization for LLMs
Test-Time Training Done Right
Scalable Second-order Riemannian Optimization for $K$-means Clustering
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data
Beyond Uniformity: Sample and Frequency Meta Weighting for Post-Training Quantization of Diffusion Models
MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning
On the Interaction of Compressibility and Adversarial Robustness
Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
Deconstructing Positional Information: From Attention Logits to Training Biases
TD-MoE: Tensor Decomposition for MoE Models
Attend to the Active: Structure-Aware Dynamic Attention in LLMs for Compositional Instruction Following
Token Alignment Heads: Unveiling Attention's Role in LLM Multilingual Translation
Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
Critical attention scaling in long-context transformers
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
Group Representational Position Encoding
AlignFlow: Improving Flow-based Generative Models with Semi-Discrete Optimal Transport
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Condition Matters in Full-head 3D GANs
DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
GenDR: Lighten Generative Detail Restoration
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
Latent Stochastic Interpolants
When would Vision-Proprioception Policies Fail in Robotic Manipulation?
Follow-Your-Preference: Towards Preference-Aligned Image Inpainting
Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value
Out of the Memory Barrier: A Highly Memory-Efficient Training System for LLMs with Million-Token Contexts
REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering
ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping
Steering Autoregressive Music Generation with Recursive Feature Machines
Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Perturbed Dynamic Time Warping: A Probabilistic Framework and Generalized Variants
A General Spatio-Temporal Backbone with Scalable Contextual Pattern Bank for Urban Continual Forecasting
Learning an Image Editing Model without Image Editing Pairs
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
Cross-Modal Redundancy and the Geometry of Vision–Language Embeddings
GGBall: Graph Generative Model on Poincaré Ball
Learn to Guide Your Diffusion Model
Provable Separations between Memorization and Generalization in Diffusion Models
Predicting LLM Output Length via Entropy-Guided Representations
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
Eliminating VAE for Fast and High-Resolution Generative Detail Restoration
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
On the Design of One-step Diffusion via Shortcutting Flow Paths
AlphaFlow: Understanding and Improving MeanFlow Models
InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
Heuristic-Based Ideation for Guiding LLMs Toward Structured Creativity
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Generalized Compressed Sensing for Image Reconstruction with Diffusion Probabilistic Models
Summaries as Centroids for Interpretable and Scalable Text Clustering
(U)NFV: (Un)Supervised Neural Finite Volume Methods for Solving Hyperbolic PDEs
MASAM: Multimodal Adaptive Sharpness-Aware Minimization for Heterogeneous Data Fusion
FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Understanding
FASA: FREQUENCY-AWARE SPARSE ATTENTION
Equivariant Splitting: Self-supervised learning from incomplete data
Robust Generalized Schrödinger Bridge via Sparse Variational Gaussian Processes
Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
Gelato: Graph Edit Distance via Autoregressive Neural Combinatorial Optimization
Defining and quantifying compositional structure
Bilateral Information-aware Test-time Adaptation for Vision-Language Models
Dual Randomized Smoothing: Beyond Global Noise Variance
Contamination Detection for VLMs Using Multi‑Modal Semantic Perturbations
Single-Loop Byzantine-Resilient Federated Bilevel Optimization
Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
On the $O(1/T)$ Convergence of Alternating Gradient Descent–Ascent in Bilinear Games
Improving Black-Box Generative Attacks via Generator Semantic Consistency
A Benchmark for Deep Information Synthesis
The Effect of Attention Head Count on Transformer Approximation
Training-Free Determination of Network Width via Neural Tangent Kernel
Softmax Transformers are Turing-Complete
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
SCOPED: Score–Curvature Out-of-distribution Proximity Evaluator for Diffusion
Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
Product of Experts for Visual Generation
VoG: Enhancing LLM Reasoning through Stepwise Verification on Knowledge Graphs
Geometry-aware Policy Imitation
Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts
HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
On Universality of Deep Equivariant Networks
Noise Tolerance of Distributionally Robust Learning
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
Understanding and Relaxing the Limitations of Transformers for Linear Algebra
WILD-Diffusion: A WDRO Inspired Training Method for Diffusion Models under Limited Data
Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method
TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use
Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models
ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors
Sequences of Logits Reveal the Low Rank Structure of Language Models
Finite-Time Convergence Analysis of ODE-based Generative Models for Stochastic Interpolants
From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
The Serial Scaling Hypothesis
Fisher-Rao Sensitivity for Out-of-Distribution Detection in Deep Neural Networks
The Coverage Principle: How Pre-Training Enables Post-Training
Efficient Zero-shot Inpainting with Decoupled Diffusion Guidance
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Probability Distributions Computed by Autoregressive Transformers
Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws
Probing in the Dark: State Entropy Maximization for POMDPs
Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment
C-Evolve: Consensus-based Evolution for Prompt Groups
Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration
Early Signs of Steganographic Capabilities in Frontier LLMs
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
CORDS - Continuous Representations of Discrete Structures
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Skirting Additive Error Barriers for Private Turnstile Streams
Stable and Scalable Deep Predictive Coding Networks with Meta-Prediction Errors
SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration
Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection
MLP Memory: A Retriever-Pretrained Memory for Large Language Models
VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery
Bures-Isotropy Alignment: Manifold Learning of Generalized Category Discovery
HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
FictionalQA: A Dataset for Studying Memorization and Knowledge Acquisition
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
Encoder-only Next Token Prediction
Do We Really Need Permutations? Impact of Model Width on Linear Mode Connectivity
DynaGuard: A Dynamic Guardian Model With User-Defined Policies
Zebra-CoT: A Dataset for Interleaved Vision-Language Reasoning
FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
Rethinking Consistent Multi-Label Classification Under Inexact Supervision
Decomposition of Concept-Level Rules in Visual Scenes
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
Aligning Deep Implicit Preferences by Learning to Reason Defensively
Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents
SigmaDock: Untwisting Molecular Docking with Fragment-Based SE(3) Diffusion
Geometric Graph Neural Diffusion for Stable Molecular Dynamics Simulations
From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
Drugging the Undruggable: Benchmarking and Modeling Fragment-Based Screening
Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space
EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models
TianQuan-S2S: A Subseasonal-to-Seasonal Global Weather Model via Incorporate Climatology State
Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
A Resolution-Agnostic Geometric Transformer for Chromosome Modeling Using Inertial Frame
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
Uncover Underlying Correspondence for Robust Multi-view Clustering
EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning
BigMaQ: A Big Macaque Motion and Animation Dataset Bridging Image and 3D Pose Representations
Learning to Generate Stylized Handwritten Text via a Unified Representation of Style, Content, and Noise
ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
Bandit Learning in Matching Markets Robust to Adversarial Corruptions
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Benchmarking Open-ended Segmentation
Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
HFSTI-Net: Hierarchical Frequency-spatial-temporal Interactions for Video Polyp Segmentation
A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
OrthoSolver: A Neural Proper Orthogonal Decomposition Solver For PDEs
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
$\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Multi-Synaptic Cooperation: A Bio-Inspired Framework for Robust and Scalable Continual Learning
3D Aware Region Prompted Vision Language Model
QVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization
PERSISTENCE SPHERES: BI-CONTINUOUS REPRESENTATIONS OF PERSISTENCE DIAGRAMS
ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning
Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion
RAPID$^3$: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing
Evolving Graph Structured Programs for Circuit Generation with Large Language Models
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
AgentFold: Long-Horizon Web Agents with Proactive Context Folding
CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs?
SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System
SciTS: Scientific Time Series Understanding and Generation with LLMs
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
Theoretical Guarantees for Causal Discovery on Large Random Graphs
A Hierarchical Circuit Symbolic Discovery Framework for Efficient Logic Optimization
ViMo: A Generative Visual GUI World Model for App Agents
How Far Can Unsupervised RLVR Scale LLM Training?
RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields
Latent-Guided Reasoning: Empowering Small LLMs with Large-Model Thinking
EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting
Mean-Field Neural Differential Equations: A Game-Theoretic Approach to Sequence Prediction
Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectories?
Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction
Code Driven Planning with Domain-Adaptive Selector
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
DepthLM: Metric Depth from Vision Language Models
EAMET: ROBUST MASSIVE MODEL EDITING VIA EMBEDDING ALIGNMENT OPTIMIZATION
Enhancing Molecular Property Predictions by Learning from Bond Modelling and Interactions
OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
One for Two: A Unified Framework for Imbalanced Graph Classification via Dynamic Balanced Prototype
FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning
PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting
Efficient Autoregressive Inference for Transformer Probabilistic Models
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
CTBench: Cryptocurrency Time Series Generation Benchmark
Semantic-Enhanced Time-Series Forecasting via Large Language Models
TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices
Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
RRNCO: Towards Real-World Routing with Neural Combinatorial Optimization
Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
Two-Layer Convolutional Autoencoders Trained on Normal Data Provably Detect Unseen Anomalies
Towards Strategic Persuasion with Language Models
Code World Models for General Game Playing
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction–Reasoning Synergy
CaTS: Calibrated Test-Time Scaling for Efficient LLM Reasoning
Learning to Play Multi-Follower Bayesian Stackelberg Games
Can Transformers Really Do It All? On the Compatibility of Inductive Biases Across Tasks
The Lattice Representation Hypothesis of Large Language Models
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
Meta-RL Induces Exploration in Language Agents
Q-learning with Posterior Sampling
Confident and Adaptive Generative Speech Recognition via Risk Control
AntigenLM: Structure-Aware DNA Language Modeling for Influenza
ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation
sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals
Token-Efficient Long-Term Interest Sketching and Internalized Reasoning for LLM-based Recommendation
Flowing Through States: Neural ODE Regularization for Reinforcement Learning
OrchestrationBench: LLM-Driven Agentic Planning and Tool Use in Multi-Domain Scenarios
TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
Rethinking Reasoning in Document Ranking: Why Chain-of-Thought Falls Short
Laplacian Kernelized Bandit
INSTANT: Compressing Gradients and Activations for Resource-Efficient Training
An Ensemble Framework for Unbiased Language Model Watermarking
Distributionally Robust Linear Regression with Block Lewis Weights
On Entropy Control in LLM-RL Algorithms
Anchor Frame Bridging for Coherent First-Last Frame Video Generation
Learning to Reason Efficiently with Discounted Reinforcement Learning
Diffusion Transformers with Representation Autoencoders
LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models
Consistent Low-Rank Approximation
Better Bounds for the Distributed Experts Problem
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
Distributed Algorithms for Euclidean Clustering
Do LLMs Forget What They Should? Evaluating In-Context Forgetting in Large Language Models
Cascadia: An Efficient Cascade Serving System for Large Language Models
TESSAR: Geometry-Aware Active Regression via Dynamic Voronoi Tessellation
DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS
Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval
Plan-Answer-Refine-on-Graph: Structured Planning and Self-Refinement for Large Language Model Reasoning on Knowledge Graphs
Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
Are LLMs Really Not Knowledgeable? Mining the Submerged Knowledge in LLMs' Memory
FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models towards Adam‑Scale Speed
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
Multimodal Policy Internalization for Conversational Agents
Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing
Query-Aware Flow Diffusion for Graph-Based RAG with Retrieval Guarantees
Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics
Evidence for Limited Metacognition in LLMs
LightMem: Lightweight and Efficient Memory-Augmented Generation
From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization
PrismAudio: Decomposed Chain-of-Thought and Multi-dimensional Rewards for Video-to-Audio Generation
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Efficient Reasoning with Balanced Thinking
Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs
The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
Is In-Context Learning Learning?
Teach2Eval: An Interaction-Driven LLMs Evaluation Method via Teaching Effectiveness
Characterizing the Discrete Geometry of ReLU Networks
Arbitrary-Shaped Image Generation via Spherical Neural Field Diffusion
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Scaling Knowledge Graph Construction through Synthetic Data Generation and Distillation
SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
Credit-Budgeted ICPC-Style Coding: When Agents Must Pay for Every Decision
FSPO: Few-Shot Optimization of Synthetic Preferences Effectively Personalizes to Real Users
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation
DiscoX: Benchmarking Discourse-Level Translation in Expert Domains
FlexiVoice: Enabling Flexible Style Control in Zero-Shot TTS with Natural Language Instructions
CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting
CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
Identity-Free Deferral For Unseen Experts
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction
MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
SuperF: Neural Implicit Fields for Multi-Image Super-Resolution
Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens
Hierarchical Semantic-Acoustic Modeling via Semi-Discrete Residual Representations for Expressive End-to-End Speech Synthesis
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Critique-RL: Training Language Models For Critiquing Through Two-Stage Reinforcement Learning
Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration
Representational Alignment Across Model Layers and Brain Regions with Multi-Level Optimal Transport
ReDDiT: Rehashing Noise for Discrete Visual Generation
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
HUMOF: Human Motion Forecasting in Interactive Social Scenes
Towards Physically Executable 3D Gaussian for Embodied Navigation
Divid: Disentangled Spatial-Temporal Modeling within LLMs for Temporally Grounded Video Understanding
PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing
Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration
LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization
GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator
Toward Conservative Planning from Human-AI Preferences in Reinforcement Learning
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
Imitation Learning as Return Distribution Matching
Heterogeneous Agent Q-weighted Policy Optimization
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
LongLive: Real-time Interactive Long Video Generation
Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI
SR-Scientist: Scientific Equation Discovery With Agentic AI
BaseReward: A Strong Baseline for Multimodal Reward Model
Matting Anything 2: Towards Video Matting for Anything
Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting
OptimSyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
UrbanGS: Efficient and Scalable Architecture for Geometrically Accurate Large-Scene Reconstruction
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models
Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data
GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
Autoregressive Visual Decoding from EEG Signals
Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
Learning Brain Representation with Hierarchical Visual Embeddings
MoGen: Detailed Neuronal Morphology Generation via Point Cloud Flow Matching
Riemannian High-Order Pooling for Brain Foundation Models
A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
On the identifiability of causal graphs with multiple environments
Accelerating Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection
Neural Dynamics Self-Attention for Spiking Transformers
Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
Neuro-Symbolic Decoding of Neural Activity
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
AtC: Aggregate-then-Calibrate for Human-centered Assessment
On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization
Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
One Patch Doesn’t Fit All: Adaptive Patching for Native-Resolution Multimodal Large Language Models
MindMix: A Multimodal Foundation Model for Auditory Perception Decoding via Deep Neural-Acoustic Alignment
AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators
Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
TINY BUT MIGHTY: A SOFTWARE-HARDWARE CO-DESIGN APPROACH FOR EFFICIENT MULTIMODAL INFERENCE ON BATTERY-POWERED SMALL DEVICES
“Noisier” Noise Contrastive Estimation is (Almost) Maximum Likelihood
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
Real-Time Reasoning Agents in Evolving Environments
FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving
Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
Virne: A Comprehensive Benchmark for RL-based Network Resource Allocation in NFV
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
PAGE-4D: Disentangled Pose and Geometry Estimation for VGGT-4D Perception
ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework
Prior-aware and Context-guided Group Sampling for Active Probabilistic Subsampling
I-DRUID: Layout to image generation via instance-disentangled representation and unpaired data
Reinforced Latent Reasoning for LLM-based Recommendation
SPICE: Submodular Penalized Information–Conflict Selection for Efficient Large Language Model Training
SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery
Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation
Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization
Unified Analyses for Hierarchical Federated Learning: Topology Selection under Data Heterogeneity
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering
Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry
Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems
LearnIR: Learnable Posterior Sampling for Real-World Image Restoration
ChronoEdit: Towards Temporal Reasoning for In-Context Image Editing and World Simulation
D-AR: Diffusion via Autoregressive Models
Learning to Solve Orienteering Problem with Time Windows and Variable Profits
Beyond the Known: An Unknown-Aware Large Language Model for Open-Set Text Classification
Recover Cell Tensor: Diffusion-Equivalent Tensor Completion for Fluorescence Microscopy Imaging
Cross-ControlNet: Training-Free Fusion of Multiple Conditions for Text-to-Image Generation
Point Prompting: Counterfactual Tracking with Video Diffusion Models
FlashWorld: High-quality 3D Scene Generation within Seconds
LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
Scaling Attention via Feature Sparsity
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
Efficient Resource-Constrained Training of Transformers via Subspace Optimization
AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
DA$^{2}$: Depth Anything in Any Direction
TTT3R: 3D Reconstruction as Test-Time Training
Adaptive Hopfield Network: Rethinking Similarities in Associative Memory
Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
ULTRA-360: Unconstrained Dataset for Large-scale Temporal 3D Reconstruction across Altitudes and Omnidirectional Views
Sparkle: A Robust and Versatile Representation for Point Cloud-based Human Motion Capture
STVG-R1: Incentivizing Instance-Level Reasoning and Grounding in Videos via Reinforcement Learning
MULTIMODALITY AS SUPERVISION: SELF-SUPERVISED SPECIALIZATION TO THE TEST ENVIRONMENT VIA MULTIMODALITY
MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Closing the Modality Gap Aligns Group-Wise Semantics
Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics
A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering
CERTIFIED VS. EMPIRICAL ADVERSARIAL ROBUSTNESS VIA HYBRID CONVOLUTIONS WITH ATTENTION STOCHASTICITY
CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence under Partial Observations
Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
VisualPRM400K: An Effective Dataset for Training Multimodal Process Reward Models
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
Exploring the Potential of Encoder-free Architectures in 3D LMMs
Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
Falcon: Fast Proximal Linearization of Normalized Cuts for Unsupervised Image Segmentation
Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Omni-IML: Towards Unified Interpretable Image Manipulation Localization
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
SpaCE-Eval: A Benchmark for Real-World Multi-Modal Reasoning
Multilingual Routing in Mixture-of-Experts
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
GaussianFusion: Unified 3D Gaussian Representation for Multi-Modal Fusion Perception
GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insights
Learned Meta-Tokens for Language Modeling
GOLDILOCS: GENERAL OBJECT-LEVEL DETECTION AND LABELING OF CHANGES IN SCENES
Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection
Some Neural Networks Inherently Preserve Subspace Clustering Structure
Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection
Procedural Mistake Detection via Action Effect Modeling
DVLA-RL: Dual-Level Vision–Language Alignment with Reinforcement Learning Gating for Few-Shot Learning
GUIDE: Gated Uncertainty-Informed Disentangled Experts for Long-tailed Recognition
Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
APT: Towards Universal Scene Graph Generation via Plug-in Adaptive Prompt Tuning
Cautious Weight Decay
SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
Searching for Privacy Risks in LLM Agents via Simulation
Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization
Learning Concept Bottleneck Models from Mechanistic Explanations
General Exploratory Bonus for Optimistic Exploration in RLHF
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
MOLM: Mixture of LoRA Markers
Computing Equilibrium beyond Unilateral Deviation
Unlearning Evaluation through Subset Statistical Independence
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
LLM Fingerprinting via Semantically Conditioned Watermarks
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
How Catastrophic is Your LLM? Certifying Risks in Conversation
Unlocking Long-Horizon Agentic Search with Large-Scale End-to-End RL
MSCR: Exploring the Vulnerability of LLMs’ Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
FARI: Robust One-Step Inversion for Watermarking in Diffusion Models
GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models
Explainable Mixture Models through Differentiable Rule Learning
From Evaluation to Defense: Advancing Safety in Video Large Language Models
STEDiff: Revealing the Spatial and Temporal Redundancy of Backdoor Attacks in Text-to-Image Diffusion Models
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
Readout Representation: Redefining Neural Codes by Input Recovery
FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
From “Sure” to “Sorry”: Detecting Jailbreak in Large Vision Language Model via JailNeurons
Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching
PRISON: Unmasking the Criminal Potential of Large Language Models
Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
Market Games for Generative Models: Equilibria, Welfare, and Strategic Entry
LDT: Layer-Decomposition Training Makes Networks More Generalizable
Test-time Domain Generalization for Image Super-resolution
A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
Finite-Time Analysis of Actor-Critic Methods with Deep Neural Network Approximation
An efficient, provably optimal algorithm for the 0-1 loss linear classification problem
Learning the Inverse Temperature of Ising Models under Hard Constraints using One Sample
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Why Adversarially Train Diffusion Models?
Feedback-driven recurrent quantum neural network universality
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Characterizing and Mitigating Reasoning Drift in Large Language Models
Neural+Symbolic Approaches for Interpretable Actor-Critic Reinforcement Learning
Provably Explaining Neural Additive Models
Minimax Optimal Adversarial Reinforcement Learning
Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning
Watermark-based Attribution of AI-Generated Content
Token-Importance Guided Direct Preference Optimization
Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
Multi-LLM Adaptive Conformal Inference for Reliable LLM Response
A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
EditLens: Quantifying the Extent of AI Editing in Text
Singleton-Optimized Conformal Prediction
VGR: Visual Grounded Reasoning
Branch and Bound Search for Exact MAP Inference in Credal Networks
Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis
Enough is as good as a feast: A Comprehensive Analysis of How Reinforcement Learning Mitigates Task Conflicts in LLMs
Tackling Heavy-Tailed Q-Value Bias in Offline-to-Online Reinforcement Learning with Laplace-Robust Modeling
Learning What Matters Now: Dynamic Preference Inference under Contextual Shifts
ROC-n-reroll: How verifier imperfection affects test-time scaling
Point-Focused Attention Meets Context-Scan State Space: Robust Biological Visual Perception for Point Cloud Representation
Proximal Supervised Fine-Tuning
Locality-Attending Vision Transformer
From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism
The State of Reinforcement Finetuning for Transformer-based Agents
Peak-Return Greedy Slicing: Subtrajectory Selection for Transformer-based Offline RL
Part-level Semantic-guided Contrastive Learning for Fine-grained Visual Classification
Bayesian Test-Time Adaptation via Dirichlet feature projection and GMM-Driven Inference for Motor Imagery EEG Decoding
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Spectral Bellman Method: Unifying Representation and Exploration in RL
On Coreset for LASSO Regression Problem with Sensitivity Sampling
ReVeal: Self-Evolving Code Agents via Reliable Self-Verification
CORE: Concept-Oriented Reinforcement for Bridging the Definition–Application Gap in Mathematical Reasoning
Flow Matching Policy Gradients
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
HippoTune: A Hippocampal Associative Loop–Inspired Fine-Tuning Method for Continual Learning
EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget
Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training
Learning Massively Multitask World Models for Continuous Control
GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments
RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
Self-Aligned Reward: Towards Effective and Efficient Reasoners
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
Sample Lottery: Unsupervised Discovery of Critical Instances for LLM Reasoning
ExGRPO: Learning to Reason from Experience
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
Medical thinking with multiple images
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents
Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies
Guided Policy Optimization under Partial Observability
LiTo: Surface Light Field Tokenization
Who Matters Matters: Agent-Specific Conservative Offline MARL
On Predictability of Reinforcement Learning Dynamics for Large Language Models
ChemEval: A Multi-level and Fine-grained Chemical Capability Evaluation for Large Language Models
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Negotiated Reasoning: On Provably Addressing Relative Over-Generalization
UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction
Multi-Action Self-Improvement For Neural Combinatorial Optimization
From Observations to Events: Event-Aware World Models for Reinforcement Learning
Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits
Online Decision Making with Generative Action Sets
Spike-based Digital Brain: a novel fundamental model for brain activity analysis
PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks
Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
RankFlow: Property-aware Transport for Protein Optimization
Multi-objective Large Language Model Alignment with Hierarchical Experts
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits
Planning with an Embodied Learnable Memory
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
SFT Doesn’t Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
Go-Browse: Training Web Agents with Structured Exploration
Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
Conformal Robustness Control: A New Strategy for Robust Decision
On the Computational Limits of AI4S-RL: A Unified $\varepsilon$-$N$ Analysis
EMFuse: Energy-based Model Fusion for Decision Making
A Biologically Plausible Dense Associative Memory with Exponential Capacity
Provable and Practical In-Context Policy Optimization for Self-Improvement
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Method
EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
Flow Caching for Autoregressive Video Generation
AutoDA-Timeseries: Automated Data Augmentation for Time Series
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
ORCaS: Unsupervised Depth Completion via Occluded Region Completion as Supervision
NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
PerfGuard: A Performance-Aware Agent for Visual Content Generation
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
NIMO: a Nonlinear Interpretable MOdel
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization
MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics
MaskCO: Masked Generation Drives Effective Representation Learning and Exploiting for Combinatorial Optimization
CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model
Predicting Kernel Regression Learning Curves from Only Raw Data Statistics
LatentQA: Teaching LLMs to Decode Activations Into Natural Language
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
PRISM: Partial-label Relational Inference with Spatial and Spectral Cues
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs
SONA: Learning Conditional, Unconditional, and Matching-Aware Discriminator
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?
Diverse Dictionary Learning
Taming Hierarchical Image Coding Optimization: A Spectral Regularization Perspective
DeepRAG: Thinking to Retrieve Step by Step for Large Language Models
Robust Preference Alignment via Directional Neighborhood Consensus
PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models
FACT: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
SpatiaLab: Can Vision–Language Models Perform Spatial Reasoning in the Wild?
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
World2Minecraft: Occupancy-Driven Simulated Scenes Construction
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
LFQA-E: Carefully Benchmarking Long-form QA Evaluation
FastVGGT: Fast Visual Geometry Transformer
SELF-HARMONY: LEARNING TO HARMONIZE SELF-SUPERVISION AND SELF-PLAY IN TEST-TIME REINFORCEMENT LEARNING
TrajTok: What makes for a good trajectory tokenizer in behavior generation?
Pretraining Scaling Laws for Generative Evaluations of Language Models
Fewer Battles, More Gain: An Information-Efficient Framework for Arena-based LLM Evaluation
lmgame-Bench: How Good are LLMs at Playing Games?
Adjusting Prediction Model Through Wasserstein Geodesic for Causal Inference
Statistical and structural identifiability in representation learning
On Measuring Influence in Avoiding Undesired Future
Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
IGC-Net for conditional average potential outcome estimation over time
Efficient and Sharp Off-Policy Learning under Unobserved Confounding
Decoupling Primitive with Experts: Dynamic Feature Alignment for Compositional Zero-Shot Learning
Beyond Text-Only: Towards Multimodal Table Retrieval in Open-World
The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?
Multi-ReduNet: Interpretable Class-Wise Decomposition of ReduNet
Token Distillation: Attention-Aware Input Embeddings for New Tokens
Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
Dual Perspectives on Non-Contrastive Self-Supervised Learning
Confident Block Diagonal Structure-Aware Invariable Graph Completion for Incomplete Multi-view Clustering
Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry
Mixture of Mini Experts: Overcoming the Linear Layer Bottleneck in Multiple Instance Learning
Quantized Gradient Projection for Memory-Efficient Continual Learning
Compositional amortized inference for large-scale hierarchical Bayesian models
Efficient Credal Prediction through Decalibration
Play to Generalize: Learning to Reason Through Game Play
Pseudo-Non-Linear Data Augmentation: A Constrained Energy Minimization Viewpoint
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
It's All Just Vectorization: einx, a Universal Notation for Tensor Operations
Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation
RADAR: Learning to Route with Asymmetry-aware Distance Representations
Combination-of-Experts with Knowledge Sharing for Cross-Task Vehicle Routing Problems
FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming
Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making
Energy-Efficient Random Variate Generation via Compressed Lookup Tables
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Unlocking the Potential of Weighting Methods in Federated Learning Through Communication Compression
Speculative Actions: A Lossless Framework for Faster AI Agents
Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design
RepSpec: Structural Re-parameterized Draft Model Training for Speculative Decoding
AdaCache: Adaptive Caching and Context Augmentation for Efficient LLM Serving
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
Sign-SGD via Parameter-Free Optimization
Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation
SkillFactory: Self-Distillation for Learning Cognitive Behaviors
ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression
Learning from Historical Activations in Graph Neural Networks
ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
Compute-Optimal Quantization-Aware Training
Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
Hierarchy Decoding: A Training-free Parallel Decoding Strategy for Diffusion Large Language Models
Low-Pass Filtering Improves Behavioral Alignment of Vision Models
Scaling Direct Feedback Learning with Jacobian Alignment Guarantees
Mini-cluster Guided Long-tailed Deep Clustering
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
A State-Transition Framework for Efficient LLM Reasoning
Learning is Forgetting; LLM Training As Lossy Compression
Probing Rotary Position Embeddings through Frequency Entropy
FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
Frequency Bands in RoPE: Base Frequency and Context Length Shape the Interpolation–Extrapolation Trade-off
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
ETGS: Explicit Thermodynamics Gaussian Splatting for Dynamic Thermal Reconstruction
FOCUS: Efficient Keyframe Selection for Long Video Understanding
OwlEye: Zero-Shot Learner for Cross-Domain Graph Data Anomaly Detection
Exploring the Design Space of Transition Matching
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Antithetic Noise in Diffusion Models
Unbiased Object Detection Beyond Frequency with Visually Prompted Image Synthesis
Robust Adversarial Attacks Against Unknown Disturbance via Inverse Gradient Sample
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
MrRoPE: Mixed-radix Rotary Position Embedding
Flatter Tokens are More Valuable for Speculative Draft Model Training
Adaptive Concept Discovery for Interpretable Few-Shot Text Classification
STAT: Skill-Targeted Adaptive Training
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving
Retrospective Sparse Attention for Efficient Long-Context Generation
BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
Topological Flow Matching
Controlling Repetition in Protein Language Models
Dynamic Multi-sample Mixup with Gradient Exploration for Open-set Graph Anomaly Detection
Adaptive Mixture of Disentangled Experts for Dynamic Graph Out-of-Distribution Generalization
Global-Recent Semantic Reasoning on Dynamic Text-Attributed Graphs with Large Language Models
PonderLM: Pretraining Language Models to Ponder in Continuous Space
Conditioned Initialization for Attention
Rethinking the Gold Standard: Why Discrete Curvature Fails to Fully Capture Over-squashing in GNNs?
Cutting the Skip: Training Residual-Free Transformers
Panda: A pretrained forecast model for chaotic dynamics
The Logical Expressiveness of Topological Neural Networks
Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding
Topology of Reasoning: Retrieved Cell Complex-Augmented Generation for Textual Graph Question Answering
Minimax Sample Complexity of Graph Neural Networks: Lower Bounds and Structural Effects
ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks
PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning
Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations
In-Context Watermarks for Large Language Models
The Curious Case of In-Training Compression of State Space Models
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
UniOD: A Universal Model for Outlier Detection across Diverse Domains
SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization
Global and Local Topology-Aware Graph Generation via Dual Conditioning Diffusion
Tucker-FNO: Tensor Tucker-Fourier Neural Operator and its Universal Approximation Theory
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Quantitative Bounds for Length Generalization in Transformers
Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs
Leveraging Explanation to Improve Generalization of Meta Reinforcement Learning
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval
Flow Along the $K$-Amplitude for Generative Modeling
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Towards Knowledge‑and‑Data‑Driven Organic Reaction Prediction: RAG‑Enhanced and Reasoning‑Powered Hybrid System with LLMs
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
Scaling Group Inference for Diverse and High-Quality Generation
Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs
Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems
FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Uniform Discrete Diffusion with Metric Path for Video Generation
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
Humanline: Online Alignment as Perceptual Loss
AudioTrust: Benchmarking The Multifaceted Trustworthiness of Audio Large Language Models
Scalable Chain of Thoughts via Elastic Reasoning
On the Generalization Capacities of MLLMs for Spatial Intelligence
Graph Representational Learning: When Does More Expressivity Hurt Generalization?
Think Then Embed: Generative Context Improves Multimodal Embedding
Fastcar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
Uncertainty-Aware 3D Reconstruction for Dynamic Underwater Scenes
Analyzing and Evaluating Unbiased Language Model Watermark
Flow Autoencoders are Effective Protein Tokenizers
Fast Data Mixture Optimization via Gradient Descent
Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds
Can LLMs Reason Soundly in Law? Auditing Inference Patterns for Legal Judgment
CAR-LoRA: Training Compression-Aware and Robust LoRA Adapters for Evolving LLMs
RD-HRL: Generating Reliable Sub-Goals for Long-Horizon Sparse-Reward Tasks
Synthetic Bootstrapped Pretraining
WavePolyp: Video Polyp Segmentation via Hierarchical Wavelet-Based Feature Aggregation and Inter-Frame Divergence Perception
Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes
LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity
DrugTrail: Interpretable Drug Discovery via Structured Reasoning and Druggability‑Tailored Preference Optimization
Latent Denoising Makes Good Tokenizers
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
Advancing Spatiotemporal Representations in Spiking Neural Networks via Parametric Invertible Transformation
Web-CogReasoner: Towards Multimodal Knowledge-Induced Cognitive Reasoning for Web Agents
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems
Incorporating Expert Priors into Bayesian Optimization via Dynamic Mean Decay
Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment
AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
Robust Spiking Neural Networks Against Adversarial Attacks
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
PROS: Towards Compute-Efficient RLVR via Rollout Prefix Reuse
Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
PACE: Pretrained Audio Continual Learning
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
Thyme: Think Beyond Images
Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
Scaling Agent Learning via Experience Synthesis
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
From Assistant to Independent Developer — Are GPTs Ready for Software Development?
Frequency-aware Dynamic Gaussian Splatting
SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
ProSafePrune: Projected Safety Pruning for Mitigating Over-Refusal in LLMs
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
ArtUV: Artist-style UV Unwrapping
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching
Learning-Time Encoding Shapes Unlearning in LLMs
Towards Safe and Optimal Online Bidding: A Modular Look-ahead Lyapunov Framework
Oracle-efficient Hybrid Learning with Constrained Adversaries
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
Evaluating Text Creativity across Diverse Domains: a Dataset and Large Language Model Evaluator
Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
Fine-Grained Activation Steering: Steering Less, Achieving More
Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
CrossPL: Systematic Evaluation of Large Language Models for Cross Programming Language Interoperating Code Generation
Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming
Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery
Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
Aurelius: Relation Aware Text-to-Audio Generation At Scale
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Bridging Piano Transcription and Rendering via Disentangled Score Content and Style
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization
Data Provenance for Image Auto-Regressive Generation
Latent Diffusion Model without Variational Autoencoder
FACM: Flow-Anchored Consistency Models
EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
Learning Patient-Specific Disease Dynamics With Latent Flow Matching For Longitudinal Imaging Generation
FreeViS: Training-free Video Stylization with Inconsistent References
Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling
MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure
ORION: Decoupling and Alignment for Unified Autoregressive Understanding and Generation
Time-to-Move: Training-Free Motion-Controlled Video Generation via Dual-Clock Denoising
Realtime Video Frame Interpolation using One-Step Diffusion Sampling
CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models
Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts
Generative Blocks World: Moving Things Around in Pictures
FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
Reconstruct Anything Model: a lightweight general model for computational imaging
MotionWeaver: Holistic 4D-Anchored Framework for Multi-Humanoid Image Animation
$\alpha$-DPO: Robust Preference Alignment for Diffusion Models via $\alpha$ Divergence
Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
MoCa: Modeling Object Consistency for 3D Camera Control in Video Generation
Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
reAR: Rethinking Visual Autoregressive Models via Token-wise Consistency Regularization
CASteer: Cross-Attention Steering for Controllable Concept Erasure
DVD-Quant: Data-free Video Diffusion Transformers Quantization
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
Splat the Net: Radiance Fields with Splattable Neural Primitives
CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis
Generative View Stitching
PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
Secondary Motion-Aware 3D Clothed Gaussian Avatars from Monocular Videos
ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation
$\ell_1$ Latent Distance based Continuous-time Graph Representation
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval
TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
On Discriminative vs. Generative classifiers: Rethinking MLLMs for Action Understanding
Long-tailed Test-Time Adaptation for Vision-Language Models
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
Mordal: Automated Pretrained Model Selection for Vision Language Models
LLMs as Rules Oracles: Exploring Real-World Multimodal Reasoning in Tabletop Strategy Game Environments
Sapiens2
Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts
Mitigating Hallucination in Vision-Language Model with Depth and Spatial-aware Key-Value Refinement
Reasoning on Time-Series for Financial Technical Analysis
RL makes MLLMs see better than SFT
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
Visual Planning: Let's Think Only with Images
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
HiTeA: Hierarchical Temporal Alignment for Training-Free Long-Video Temporal Grounding
SAM-Veteran: An MLLM-Based Human-like SAM Agent for Reasoning Segmentation
Thicker and Quicker: The Jumbo Token for Fast Plain Vision Transformers
Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
Point-UQ: An Uncertainty-Quantification Paradigm for Point Cloud Few-Shot Class Incremental Learning
Measure Twice, Cut Once: A Semantic-Oriented Approach to Video Temporal Localization with Video LLMs
Multimodal Dataset Distillation Made Simple by Prototype-Guided Data Synthesis
PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing
JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent
VINCIE: Unlocking In-context Image Editing from Video
IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
Cross-Timestep: 3D Diffusion Model with Trans-temporal Memory LSTM and Adaptive Priori Decoding Strategy for Medical Segmentation
WOW-Seg: A Word-free Open World Segmentation Model
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
OVSeg3R: Learn Open-vocabulary Instance Segmentation from 2D via 3D Reconstruction
GmNet: Revisiting Gating Mechanisms From A Frequency View
Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
OD$^3$: Optimization-free Dataset Distillation for Object Detection
BioTamperNet: Affinity-Guided State-Space Model Detecting Tampered Biomedical Images
KernelFusion: Zero-Shot Blind Super-Resolution via Patch Diffusion
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation
Parameterization-Based Dataset Distillation of 3D Point Clouds through Learnable Shape Morphing
Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
Bootstrapping MLLM for Weakly‑Supervised Class‑Agnostic Object Counting
Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization
DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts
Exponential-Wrapped Mechanisms: Differential Privacy on Hadamard Manifolds Made Practical
An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes
PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints
Expertise Can Be Helpful for Reinforcement Learning-based Macro Placement
Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
ULD-Net: Enabling Ultra-Low-Degree Fully Polynomial Networks for Homomorphically Encrypted Inference
INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy
EnsembleSHAP: Faithful and Certifiably Robust Attribution for Random Subspace Method
Federated Learning with Profile Mapping under Distribution Shifts and Drifts
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
SecP-Tuning: Efficient Privacy-Preserving Prompt Tuning for Large Language Models via MPC
A Unified Total Variation Framework for Membrane Potential Perturbation Dynamic
RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
RedSage: A Cybersecurity Generalist LLM
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Sharpness-Aware Machine Unlearning
Evolution of Concepts in Language Model Pre-Training
DualEdit: Mitigating Safety Fallback in LLM Backdoor Editing via Affirmation-Refusal Regulation
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
Priors in time: Missing inductive biases for language model interpretability
Testing Most Influential Sets
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
Not All Documents Are What You Need for Extracting Instruction Tuning Data
Untraceable DeepFakes via Traceable Fingerprint Elimination
Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
A Fair Bayesian Inference through Matched Gibbs Posterior
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs
Speech World Model: Causal State–Action Planning with Explicit Reasoning for Speech
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
Doubly-Regressing Approach for Subgroup Fairness
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Semi-Supervised Preference Optimization with Limited Feedback
Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration
What Do Large Language Models Know About Opinions?
VUDG: A Dataset for Video Understanding Domain Generalization
Dimension-Free Decision Calibration for Nonlinear Loss Functions
Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit
The Forecast After the Forecast: A Post-Processing Shift in Time Series
Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners
Gradient Descent Dynamics of Rank-One Matrix Denoising
Block Recurrent Dynamics in Vision Transformers
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
Learnable Sparsity for Vision Generative Models
Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression
Pretraining with Re-parametrized Self-Attention: Unlocking Generalization in SNN-Based Neural Decoding Across Time, Brains, and Tasks
Temporal superposition and feature geometry of RNNs under memory demands
Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
Debiased and Denoised Representation Learning for Incomplete Multi-view Clustering
How hard is learning to cut? Trade-offs and sample complexity
Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Reasoning Large Language Models
OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework
Stop Guessing: Choosing the Optimization-Consistent Uncertainty Measurement for Evidential Deep Learning
A Unifying View of Coverage in Linear Off-policy Evaluation
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
Welfarist Formulations for Diverse Similarity Search
Less Is More: Clustered Cross-Covariance Control for Offline RL
Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
QeRL: Beyond Efficiency - Quantization-enhanced Reinforcement Learning for LLMs
Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach
Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL
Reward Model Routing in Alignment
A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models
Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models
Topological Causal Effects
Mango-GS: Enhancing Spatio-Temporal Consistency in Dynamic Scenes Reconstruction using Multi-Frame Node-Guided 4D Gaussian Splatting
Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions
SSVPO: Effective Step-Level Credit Assignment for RL Training of Language Models
Learning Efficient and Interpretable Multi-Agent Communication
Towards Better Branching Policies: Leveraging the Sequential Nature of Branch-and-Bound Tree
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Group-Normalized Implicit Value Optimization for Language Models
Contractive Diffusion Policies
Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Near-Optimal Online Deployment and Routing for Streaming LLMs
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
Learning Dynamics Feature Representation via Policy Attention for Dynamic Path Planning in Urban Road Networks
Information-based Value Iteration Networks for Decision Making Under Uncertainty
WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
RewardBench 2: Advancing Reward Model Evaluation
AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
Risk-Sensitive Reinforcement Learning for Alleviating Exploration Dilemmas in Large Language Models
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
Disentangled Representation Learning for Parametric Partial Differential Equations
ViPO: Visual Preference Optimization at Scale
Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
Hallucination Begins Where Saliency Drops
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Representation Alignment for Diffusion Transformers without External Components
Speculative Speculative Decoding
Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
How Stable is the Next Token? A Geometric View of LLM Prediction Stability
Instance-wise Adaptive Scheduling via Derivative-Free Meta-Learning
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
LLMs Can Hide Text in Other Text of the Same Length
Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?
S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion
BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games
SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas
ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval
Joint Adaptation of Uni-modal Foundation Models for Multi-modal Alzheimer's Disease Diagnosis
CoMind: Towards Community-Driven Agents for Machine Learning Engineering
Training-free Counterfactual Explanation for Temporal Graph Model Inference
Neyman-Pearson Classification under Both Null and Alternative Distributions Shift
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
CL-DPS: A Contrastive Learning Approach to Blind Nonlinear Inverse Problem Solving via Diffusion Posterior Sampling
Reducing Class-Wise Performance Disparity via Margin Regularization
Tug-of-War No More: Harmonizing Accuracy and Robustness in Vision-Language Models via Stability-Aware Task Vector Merging
Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
MeSH: Memory-as-State-Highways for Recursive Transformers
CellDuality: Unlocking Biological Reasoning in LLMs with Self-Supervised RLVR
FeDaL: Federated Dataset Learning for General Time Series Foundation Models
Beyond Speedup - Utilizing KV Cache for Sampling and Reasoning
CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control
OmniPortrait: Fine-Grained Personalized Portrait Synthesis via Pivotal Optimization
PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation
Diverse Text-to-Image Generation via Contrastive Noise Optimization
Muon Outperforms Adam in Tail-End Associative Memory Learning
Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models
Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective
Healthcare Insurance Fraud Detection via Continual Fiedler Vector Graph Model
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Neural Optimal Transport Meets Multivariate Conformal Prediction
Kevin: Multi-Turn RL for Generating CUDA Kernels
Learning to Reason via Mixture-of-Thought for Logical Reasoning
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
Kimi-Dev: Agentless Training as Skill Prior for SWE-agents
FedMC: Federated Manifold Calibration
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Mixed-Curvature Tree-Sliced Wasserstein Distance
GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning
Learning Escorted Protocols For Multistate Free-Energy Estimation
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
Understanding and Improving Continuous LLM Adversarial Training via In-context Learning Theory
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Almost Bayesian: Dynamics of SGD Through Singular Learning Theory
Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation
Query-Level Uncertainty in Large Language Models
The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics
Foundation Models for Causal Inference via Prior-Data Fitted Networks
Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code
Modeling Interference for Treatment Effect Estimation in Network Dynamic Environment
Permutation-Consistent Variational Encoding for Incomplete Multi-View Multi-Label Classification
Temporal Slowness in Central Vision Drives Semantic Object Learning
Self-Supervised Learning from Structural Invariance
Beyond Instance-Level Alignment: Dual-Level Optimal Transport for Audio-Text Retrieval
Disentangled representation learning through unsupervised symmetry group discovery
There Was Never a Bottleneck in Concept Bottleneck Models
Verification of the Implicit World Model in a Generative Model via Adversarial Sequences
Uncertainty-driven Embedding Convolution
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Explainable $ K $-means Neural Networks for Multi-view Clustering
SeeDNorm: Self-Rescaled Dynamic Normalization
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
Stochastic Optimal Control for Continuous-Time fMRI Representation Learning
RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference
One-Shot Exemplars for Class Grounding in Self-Supervised Learning
Adversarial Encoding Perturbation and Synthesis for Set Representation Auxiliary Learning
Command-V: Training-Free Representation Finetuning Transfer
Elastic Optimal Transport: Theory, Application, and Empirical Evaluation
Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature
SWINGARENA: Adversarial Programming Arena for Long-context GitHub Issue Solving
SmartDJ: Declarative Audio Editing with Audio Language Model
Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning
Don’t Pass@k: A Bayesian Framework for Large Language Model Evaluation
$p\textrm{-less}$ Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
Efficient Agent Training for Computer Use
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
Towards Self-Evolving Agent Benchmarks: Validatable Agent Trajectory via Test-Time Exploration
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
MIMIC-Bench: Exploring the User-Like Thinking and Mimicking Capabilities of Multimodal Large Language Models
Language-guided Open-world Video Anomaly Detection under Weak Supervision
Strongly Convex Sets in Riemannian Manifolds
Towards Personalized Deep Research: Benchmarks and Evaluations
Data-Centric Lessons To Improve Speech-Language Pretraining
Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo
Efficient Sliced Wasserstein Distance Computation via Adaptive Bayesian Optimization
Trinity: An Evolved LLM Coordinator
DR-Submodular Maximization with Stochastic Biased Gradients: Classical and Quantum Gradient Algorithms
LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals
Unlocking Full Efficiency of Token Filtering in Large Language Model Training
From Sequential to Parallel: Reformulating Dynamic Programming as GPU Kernels for Large-Scale Stochastic Combinatorial Optimization
A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing
C-Voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
Achieving low-bit Muon through subspace preservation and grid quantization
Samples Are Not Equal: A Sample Selection Approach for Deep Clustering
Massive Editing for Large Language Models Based on Dynamic Weight Generation
Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
Dr.LLM: Dynamic Layer Routing in LLMs
ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
Metis: Training LLMs with FP4 Quantization
Counterfactual Reasoning for Retrieval-Augmented Generation
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition
In Context Semi-Supervised Learning
Channel-Aware Mixed-Precision Quantization for Efficient Long-Context Inference
Expert Heads: Robust Evidence Identification for Large Language Models
ProxyAttn: Guided Sparse Attention via Representative Heads
RESA: Bringing Back What Sparse Attention Ignores with Residual Estimation
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database
PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
Discrete Bayesian Sample Inference for Graph Generation
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
What Exactly Does Guidance Do in Masked Discrete Diffusion Models
Structured Flow Autoencoders: Learning Structured Probabilistic Representations with Flow Matching
Learning residue level protein dynamics with multiscale Gaussians
Contact Wasserstein Geodesics for Non-Conservative Schrödinger Bridges
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
Dual-Path Condition Alignment for Diffusion Transformers
Open Data Synthesis for Deep Research
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Attention Is All You Need for KV Cache in Diffusion LLMs
ProtoKV: Long-context Knowledges Are Already Well-Organized Before Your Query
Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation
Self-Speculative Decoding Accelerates Lossless Inference in Any-Order and Any-Subset Autoregressive Models
One step further with Monte-Carlo sampler to guide diffusion better
Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
ReLaSH: Reconstructing Joint Latent Spaces for Efficient Generation of Synthetic Hypergraphs with Hyperlink Attributes
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models
Graph homophily booster: Reimagining the role of discrete features in heterophilic graph learning
DR-GGAD: Dual Residual Centering for Mitigating Anomaly Non-Discriminativity in Generalist Graph Anomaly Detection
Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling
Forest-Based Graph Learning for Semi-Supervised Node Classification
Trapped by simplicity: When Transformers fail to learn from noisy features
TrainRef: Curating Data with Label Distribution and Minimal Reference for Accurate Prediction and Reliable Confidence
Variational Deep Learning via Implicit Regularization
How Dark Patterns Manipulate Web Agents
Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
SeRI: Gradient-Free Sensitive Region Identification in Decision-Based Black-Box Attacks
Intrinsic Entropy of Context Length Scaling in LLMs
How Muon’s Spectral Design Benefits Generalization: A Study on Imbalanced Data
Noise Stability of Transformer Models
Benchmarking LLM Tool-Use in the Wild
Efficient Adversarial Attacks on High-dimensional Offline Bandits
Understanding the Learning Phases in Self-Supervised Learning via Critical Periods
Train-before-Test Harmonizes Language Model Rankings
E²LoRA: Efficient and Effective Low-Rank Adaptation with Entropy-Guided Adaptive Sharing
An Information Theoretic Perspective on Agentic System Design
Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation
Libra: Effective yet Efficient Load Balancing for Large-scale MoE Inference
SAIR: Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset
The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models
Interpolation-Based Conditioning of Flow Matching Models for Bioisosteric Ligand Design
RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation
Protein Structure Tokenization via Geometric Byte Pair Encoding
Branched Schrödinger Bridge Matching
Refine Drugs, Don’t Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion
GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance
CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
OVID: Open-Vocabulary Intrusion Detection
Histopathology-Genomics Multi-modal Structural Representation Learning for Data-Efficient Precision Oncology
Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
Random Anchors with Low-rank Decorrelated Learning: A Minimalist Pipeline for Class-Incremental Medical Image Classification
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
VeriRole: Verifiable Role-Awareness through Hint-Guided Reinforcement Learning
MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning
Gradient Intrinsic Dimensionality Alignment: Narrowing The Gap Between Low-Rank Adaptation and Full Fine-Tuning
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
MedLesionVQA: A Multimodal Benchmark Emulating Clinical Visual Diagnosis for Body Surface Health
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Critic–Adviser–Reviser Cyclic Refinement: Towards High-Quality EMR Corpus Generation with LLMs
The Limits of Inference Scaling Through Resampling
Unveiling the Mechanism of Continuous Representation Full-Waveform Inversion: A Wave Based Neural Tangent Kernel Framework
CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI
RF-MatID: Dataset and Benchmark for Radio Frequency Material Identification
BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity
M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
Uncertainty-Aware Gaussian Map for Vision-Language Navigation
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints
Accelerated co-design of robots through morphological pretraining
AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
Bird's-eye-view Informed Reasoning Driver
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
Steering Language Models with Weight Arithmetic
Enabling arbitrary inference in spatio-temporal dynamic systems: A physics-inspired perspective
SE-Diff: Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation
Contextual and Seasonal LSTMs for Time Series Anomaly Detection
CLIP-FMoE: Scalable CLIP via Fused Mixture-of-Experts with Enforced Specialization
Accelerating Inference for Multilayer Neural Networks with Quantum Computers
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
GPS: Graph-guided Proactive Information Seeking in Large Language Models
Scale-wise Distillation of Diffusion Models
Learnable Fractional Superlets with a Spectro-Temporal Emotion Encoder for Speech Emotion Recognition
IterResearch: Rethinking Long-Horizon Agents with Interaction Scaling
Data Selection for LLM Alignment Using Fine-Grained Preferences
Rethinking Global Text Conditioning in Diffusion Transformers
Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
SUSD: Structured Unsupervised Skill Discovery through State Factorization
PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
A Problem-Oriented Perspective and Anchor Verification for Code Optimization
LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
POEMetric: The Last Stanza of Humanity
AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
Scaling Behavior of Discrete Diffusion Language Models
Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies
Neuron-Aware Data Selection in Instruction Tuning for Large Language Models
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
RefTool: Reference-Guided Tool Creation for Knowledge-Intensive Reasoning
Inheriting Generalizable Knowledge from LLMs to Diverse Vertical Tasks
Implicit Regularization of SGD Reduces Shortcut Learning
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Should We Still Pretrain Encoders with Masked Language Modeling?
ClarifyVC: Clarifying Ambiguous Commands in Vehicle Control with a Hybrid Data Augmentation Pipeline
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
TableMaster: A Recipe to Advance Table Understanding with Language Models
Learning to Generate Unit Test via Adversarial Reinforcement Learning
SPRIG: Improving Large Language Model Performance by System Prompt Optimization
DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
Code Aesthetics with Agentic Reward Feedback
GRO-RAG: Gradient-aware Re-rank Optimization for Multi-source Retrieval-Augmented Generation
Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling
DreamOn: Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas
Repurposing Synthetic Data for Fine-grained Search Agent Supervision
Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation
The Human Brain as a Dynamic Mixture of Expert Models in Video Understanding
Towards Interpretable Visual Decoding with Attention to Brain Representations
CerebraGloss: Instruction-Tuning a Large Vision-Language Model for Fine-Grained Clinical EEG Interpretation
Convex Efficient Coding
Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors
A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex
Animal behavioral analysis and neural encoding with transformer-based self-supervised pretraining
SMixer: Rethinking Efficient-Training and Event-Driven SNNs
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading
CoDA: Agentic Systems for Collaborative Data Visualization
LoC-Decomp: LLM Autoformalization via Logical Concept Decomposition and Iterative Feedback Correction
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
EvolProver: Advancing Automated theorem proving by Evolving Formalized Problems via Symmetry and Difficulty
iFusion: Integrating Dynamic Interest Streams via Diffusion Model for Click-Through Rate Prediction
Topology Matters in RTL Circuit Representation Learning
A General Framework for Black-Box Attacks Under Cost Asymmetry
Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents
PCB-Bench: Benchmarking LLMs for Printed Circuit Board Placement and Routing
CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation
GenCompositor: Generative Video Compositing with Diffusion Transformer
Constantly Improving Image Models Need Constantly Improving Benchmarks
ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing
Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale
Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
PICABench: How Far are We from Physical Realistic Image Editing?
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models
LayerSync: Self-aligning Intermediate Layers
Real-Time Motion-Controllable Autoregressive Video Diffusion
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
QVGen: Pushing the Limit of Quantized Video Generative Models
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
Controllable Video Generation with Provable Disentanglement
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model
Video-As-Prompt: Unified Semantic Control for Video Generation
Instilling an Active Mind in Avatars via Cognitive Simulation
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning
CAD-Tokenizer: Towards Text-Based CAD Prototyping via Modality-Specific Tokenization
GenFusion: Feed-forward Human Performance Capture via Progressive Canonical Space Updates
Distractor-free Generalizable 3D Gaussian Splatting
QuadGPT: Native Quadrilateral Mesh Generation with Autoregressive Models
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
Cambrian-S: Towards Spatial Supersensing in Video
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
Thinking as Society: Multi-Social-Agent Self-Distillation for Multimodal Misinformation Detection
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
RayI2P: Learning Rays for Image-to-Point Cloud Registration
Panoptic Pairwise Distortion Graph
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of mUlti-turn jailbrEaks
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lightweight Spatio-Temporal Modeling via Temporally Shifted Distillation for Real-Time Accident Anticipation
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
PlantRSR: A New Plant Dataset and Method for Reference-based Super-Resolution
Perception-Aware Policy Optimization for Multimodal Reasoning
SCUBA: Salesforce Computer Use Benchmark
ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection
AnyUp: Universal Feature Upsampling
MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models
Fostering Video Reasoning via Next-Event Prediction
TOUCH: Text-guided Controllable Generation of Free-Form Hand-Object Interactions
Sequential Information Bottleneck Fusion: Towards Robust and Generalizable Multi-Modal Brain Tumor Segmentation
Low-Latency Neural LiDAR Compression with 2D Context Models
Maximizing Asynchronicity in Event-based Neural Networks
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
KinemaDiff: Towards Diffusion for Coherent and Physically Plausible Human Motion Prediction
FARTrack: Fast Autoregressive Visual Tracking with High Performance
CortiLife: A Unified Framework for Cortical Representation Learning across the Lifespan
Self-Guided Low Light Object Detection Framework
Federated Learning of Quantile Inference under Local Differential Privacy
HiddenEcho: Mitigating Noise Amplification in Differentially Private LLMs with Hidden-State Correction
Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
Information-Theoretic Membership Inference for Granular Quantification of Memorization
Secret-Protected Evolution for Differentially Private Synthetic Text Generation
Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Revisiting Confidence Calibration for Misclassification Detection in VLMs
Mechanistic Detection and Mitigation of Hallucination in Large Reasoning Models
Learning for Highly Faithful Explainability
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
Evaluating Data Influence in Meta Learning
Dissecting Representation Misalignment in Contrastive Learning via Influence Function
Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts
JailbreakLoRA: Your Downloaded LoRA from Sharing Platforms might be Unsafe
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings
Naming to Learn: Class Incremental Learning for Vision-Language Model with Unlabeled Data
Bi-Criteria Metric Distortion
Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs
Why DPO is a Misspecified Estimator and How to Fix It
InfoNCE Induces Gaussian Distribution
Automated Interpretability Metrics Do Not Distinguish Trained and Random Transformers
Unveiling Super Experts in Mixture-of-Experts Large Language Models
A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers
Quantum machine learning advantages beyond hardness of evaluation
Matched Data, Better Models: Target Aligned Data Filtering with Sparse Autoencoders
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MOBODY: Model-Based Off-Dynamics Offline Reinforcement Learning
Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions
Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
Transitive RL: Value Learning via Divide and Conquer
TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
Reference Grounded Skill Discovery
Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM
Policy Newton Algorithm in Reproducing Kernel Hilbert Space
GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning
Safe Continuous-time Multi-Agent Reinforcement Learning via Epigraph Form
STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure TransFormer for Offline Multi-task Multi-agent Reinforcement Learning
GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent System
HiPO: Self-Hint Policy Optimization for RLVR
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
Revisiting Matrix Sketching in Linear Bandits: Achieving Sublinear Regret via Dyadic Block Sketching
Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning
Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
Automating the Refinement of Reinforcement Learning Specifications
LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision
Efficient Morphology-Control Co-Design via Stackelberg Proximal Policy Optimization
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Scaling Large Vision-Language Model RL Training via Efficient Load Balancing
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Geometric-Mean Policy Optimization
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
Count Counts: Motivating Exploration in LLM Reasoning with Count-based Intrinsic Rewards
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
Activation Steering with a Feedback Controller
Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
On the Predictive Power of Representation Dispersion in Language Models
SCI-Verifier: Scientific Verifier with Thinking
Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement
Comparing the learning dynamics of in-context learning and fine-tuning in language models
Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning
To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
AttTok: Marrying Attribute Tokens with Generative Pre-trained Vision-Language Models towards Medical Image Understanding
Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution
InclusiveVidPose: Bridging the Pose Estimation Gap for Individuals with Limb Deficiencies in Video-Based Motion
Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM
A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning across Broad Atlases and Disorders
Consis-GCPO: Consistency-Preserving Group Causal Preference Optimization for Vision Customization
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Taming Curvature: Architecture Warm-up for Stable Transformer Training
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation
DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Can LLMs Move Beyond Short Exchanges to Realistic Therapy Conversations?
Cartridges: Lightweight and general-purpose long context representations via self-study
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
CPQS-Tuning: A Model Self-Perception-Based Data Filtering Algorithm for Efficient Instruction Fine-Tuning
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
Are we measuring oversmoothing in graph neural networks correctly?
Language-Instructed Vision Embeddings for Controllable and Generalizable Perception
Test-Time Optimization of 3D Point Cloud LLM via Manifold-Aware In-Context Guidance and Refinement
Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
A High Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
Emergent Misalignment is Easy, Narrow Misalignment is Hard
Reinforcing General Reasoning Without Verifiers
AlphaAgentEvo: Evolution-Oriented Alpha Mining via Self-Evolving Agentic Reinforcement Learning
IF-VidCap: Can Video Caption Models Follow Instructions?
Disentangling the Factors of Convergence between Brains and DINOv3
PreferThinker: Reasoning-based Personalized Image Preference Assessment
Modeling the Density of Pixel-level Self-supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT
CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
Physics-Informed Inference Time Scaling for Solving High-Dimensional Partial Differential Equations
Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
Tree Search for LLM Agent Reinforcement Learning
Topological Anomaly Quantification for Semi-supervised Graph Anomaly Detection
MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment
Can Vision–Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective
Anchored Supervised Fine-Tuning
Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers
UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective
Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts
MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference
RLP: Reinforcement as a Pretraining Objective
RL's Razor: Why Online Reinforcement Learning Forgets Less
Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis
AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
Fast-dLLM v2: Efficient Block-Diffusion LLM
StreamingThinker: Large Language Models Can Think While Reading
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback
Variance-Dependent Regret Lower Bounds for Contextual Bandits
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
Hilbert-Guided Sparse Local Attention
Slicing Wasserstein over Wasserstein via Functional Optimal Transport
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge
Context Tokens are Anchors: Understanding the Repeat Curse in dMLLMs from an Information Flow Perspective
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
Causal Discovery via Quantile Partial Effect
Entropy-Based Block Pruning for Efficient Large Language Models
Free Energy Mixer
Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding
TusoAI: Agentic Optimization for Scientific Methods
Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning
Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models
Learning From the Past with Cascading Eligibility Traces
AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
MindPilot: Closed-loop Visual Stimulation Optimization for Brain Modulation with EEG-guided Diffusion
Measuring Uncertainty Calibration
Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
Balancing the Experts: Unlocking LoRA-MoE for GRPO via Mechanism-Aware Rewards
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
Leveraging Pretrained Knowledge at Inference Time: LoRA-Gated Contrastive Decoding for Multilingual Factual Language Generation in Adapted LLMs
MATHMO: Automated Mathematical Modeling Through Adaptive Search
Every Language Model Has a Forgery-Resistant Signature
DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing
LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
Reconciling Visual Perception and Generation in Diffusion Models
Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Models
Relative Value Learning
SWE-RM: Execution-free Feedback for Software Engineering Agents
Detection of unknown unknowns in autonomous systems
Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
Strong Correlations Induce Cause Only Predictions in Transformer Training
Persona Features Control Emergent Misalignment
Mixture of Contexts for Long Video Generation
Polychromic Objectives for Reinforcement Learning
Codified Finite-state Machines for Role-playing
Segment Any Events with Language
WithAnyone: Toward Controllable and ID Consistent Image Generation
MOAI: Module-Optimizing Architecture for Non-Interactive Secure Transformer Inference
Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Symmetry-Aware Bayesian Optimization via Max Kernels
Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry
Learning to Segment for Vehicle Routing Problems
Inoculation Prompting: Eliciting traits from LLMs during training can reduce trait expression at test-time
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
Hierarchical Multi-Stage Recovery Framework for Kronecker Compressed Sensing
Time-Gated Multi-Scale Flow Matching for Time-Series Imputation
RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo
Knowledge Externalization: Reversible Unlearning and Modular Retrieval in Multimodal Large Language Models
TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation
ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Weak-to-Strong Diffusion with Reflection
Knowledge Fusion of Large Language Models via Modular SkillPacks
Critical Confabulation: Can LLMs Hallucinate for Social Good?
Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability
WideSearch: Benchmarking Agentic Broad Info-Seeking
Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
Diffusion Bridge Variational Inference for Deep Gaussian Processes
Unlocking the Power of Co-Occurrence in CLIP: A DualPrompt-Driven Method for Training-Free Zero-Shot Multi-Label Classification
Teaching VLMs to Admit Uncertainty in OCR from Lossy Visual Inputs
Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
Prompt Curriculum Learning for Efficient LLM Post-Training
Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations
Captain Cinema: Towards Short Movie Generation
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Why Keep Your Doubts to Yourself? Trading Visual Uncertainties among Vision-Language Models
Fracture-GS: Dynamic Fracture Simulation with Physics-Integrated Gaussian Splatting
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
MIRACLE: Model-free Imitation and Reinforcement Learning for Adaptive Cut-Selection
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
Predictive Differential Training Guided by Training Dynamics
DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
Cautious Optimizers: Improving Training with One Line of Code
RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras
Seq vs Seq: An Open Suite of Paired Encoders and Decoders
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
BioBO: Biology-informed Bayesian Optimization for Perturbation Design
Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding
Active Learning of 3D Gaussian Splatting with Consistent Region Partition and Robust Pose Estimation
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
GoR: A Unified and Extensible Generative Framework for Ordinal Regression
Joint Shadow Generation and Relighting via Light-Geometry Interaction Maps
SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation
Dual-Branch Representations with Dynamic Gated Fusion and Triple-Granularity Alignment for Deep Multi-View Clustering
Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning
Getting Your LLMs Ready for Reinforcement Learning with Lightweight SFT
Reducing Semantic Mismatch in Brain-to-Text Decoding Through Personalized Multimodal Masking
PepTri: Tri-Guided All-Atom Diffusion for Peptide Design via Physics, Evolution, and Mutual Information
Constant Degree Matrix-Driven Incomplete Multi-View Clustering via Connectivity-Structure and Embedding Tensor Learning
How to train data-efficient LLMs
IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs
Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval
InfoBridge: Mutual Information estimation via Bridge Matching
Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
GaitSnippet: Gait Recognition Beyond Unordered Sets and Ordered Sequences
SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design
What matters for Representation Alignment: Global Information or Spatial Structure?
Revisiting Long-context Modeling from Context Denoising Perspective
Revisiting [CLS] and Patch Token Interaction in Vision Transformers
LSA: Layer-wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error
RIVER: A Real-Time Interaction Benchmark for Video LLMs
Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models
OpenAgentSafety: A Comprehensive Framework For Evaluating Real-World AI Agent Safety
TopoFormer: Topology Meets Attention for Graph Learning
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
RAR: Reversing Visual Attention Re-Sinking for Unlocking Potential in Multimodal Large Language Models
Conformalized Decision Risk Assessment
Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
Adaptive Mamba Neural Operators
Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data
Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
Lossless Vocabulary Reduction for Auto-Regressive Language Models
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Structure-Aware Graph Hypernetworks for Neural Program Synthesis
Demystifying Deep Search: A Holistic Evaluation with Hint-free Multi-Hop Questions and Factorised Metrics
Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection
Primary-Fine Decoupling for Action Generation in Robotic Imitation
Causally Robust Reward Learning from Reason-Augmented Preference Feedback
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks
Neural Force Field: Few-shot Learning of Generalized Physical Reasoning
Repurposing Foundation Model for Generalizable Medical Time Series Classification
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
ReTrace: Reinforcement Learning-Guided Reconstruction Attacks on Machine Unlearning
CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization
Discrete Variational Autoencoding via Policy Search
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
Relational Feature Caching for Accelerating Diffusion Transformers
ViPRA: Video Prediction for Robot Actions
Online Decision-Focused Learning
A universal compression theory for lottery ticket hypothesis and neural scaling laws
Differentiable JPEG-based Input Perturbation for Knowledge Distillation Amplification via Conditional Mutual Information Maximization
FlowSymm: Physics–Aware, Symmetry–Preserving Graph Attention for Network Flow Completion
From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems
Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
A Minimum Variance Path Principle for Accurate and Stable Score-Based Density Ratio Estimation
Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
AutoDV: An End-to-End Deep Learning Model for High-Dimensional Data Visualization
Solving Football by Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information
Mobile-GS: Real-time Gaussian Splatting for Mobile Devices
LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference
Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
GraphShield: Graph-Theoretic Modeling of Network-Level Dynamics for Robust Jailbreak Detection
Revisiting Nonstationary Kernel Design for Multi-Output Gaussian Processes
Partition Generative Modeling: Masked Modeling Without Masks
Escaping Policy Contraction: Contraction-Aware PPO (CaPPO) for Stable Language Model Fine-Tuning
From atom to space: A region-based readout function for spatial properties of materials
SumRA: Parameter Efficient Fine-tuning with Singular Value Decomposition and Summed Orthogonal Basis
Lossy Common Information in a Learnable Gray-Wyner Network
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling
An Optimal Diffusion Approach to Quadratic Rate-Distortion Problems: New Solution and Approximation Methods
Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models
Breaking Gradient Temporal Collinearity for Robust Spiking Neural Networks
MAPSS: Manifold-based Assessment of Perceptual Source Separation
ScalingCache: Extreme Acceleration of DiTs through Difference Scaling and Dynamic Interval Caching
Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging
Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology
Relationship Alignment for View-aware Multi-view Clustering
Text summarization via global structure awareness
Pallatom-Ligand: an All-Atom Diffusion Model for Designing Ligand-Binding Proteins
Reinforcement Mid-Training
Robust Selective Activation with Randomized Temporal K-Winner-Take-All in Spiking Neural Networks for Continual Learning
Weight Space Representation Learning on Diverse NeRF Architectures
The Seismic Wavefield Common Task Framework
Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention
Is On-Policy Data always the Best Choice for Direct Preference Optimization-Based LM Alignment?
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
Resurfacing the Instance-only Dependent Label Noise Model through Loss Correction
FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
DiffPBR: Point-Based Rendering via Spatial-Aware Residual Diffusion
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
PAMDP: Interact to Persona Alignment via a Partially Observable Markov Decision Process
SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
Generalizable Heuristic Generation Through LLMs with Meta-Optimization
OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research
Visual Jigsaw Post-Training Improves MLLMs
ExpGuard: LLM Content Moderation in Specialized Domains
FutureFill: Fast Generation from Convolutional Sequence Models
DADA: Dual Averaging with Distance Adaptation
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
STDDN: A Physics-Guided Deep Learning Framework for Crowd Simulation
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation
R4: Nested Reasoning-Retrieval for Reward Modeling in Role-Playing Agents
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
Composition of Memory Experts for Diffusion World Models
Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition
PoLi-RL: A Point-to-List Reinforcement Learning Framework for Conditional Semantic Textual Similarity
Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models
GAS: Enhancing Reward-Cost Balance of Generative Model-assisted Offline Safe RL
Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers
Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Revisiting Multimodal Positional Encoding in Vision–Language Models
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
Pareto Variational Autoencoder
Bayesian Ensemble for Sequential Decision-Making
Vision-Zero: Scalable VLM Self-Evolution via Multi-Agent Self-Play
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Extending Fourier Neural Operators for Modeling Parameterized and Coupled PDEs
Revenue Maximization Under Sequential Price Competition Via The Estimation Of $s$-Concave Demand Functions
VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis
Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
The Value of Information in Human-AI Decision-making
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
Probing to Refine: Reinforcement Distillation of LLM Reasoners via Explanatory Inversion
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
TNT: Improving Chunkwise Training for Test-Time Memorization
Towards Efficient Constraint Handling in Neural Solvers for Routing Problems
Estimating Worst-Case Frontier Risks of Open-Weight LLMs
Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation
Model Predictive Adversarial Imitation Learning for Planning from Observation
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing
Autoregressive Image Generation with Randomized Parallel Decoding
NewtonGen: Physics-consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
PRISM: Festina Lente Proactivity—Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents
SURGE: Surprise-Guided Token Reduction for Efficient Video Understanding with VLMs
Computational Bottlenecks for Denoising Diffusions
Monotone Near-Zero-Sum Games: A Generalization of Convex-Concave Minimax
Optimal Transport-Induced Samples against Out-of-Distribution Overconfidence
DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
Real-Time Robot Execution with Masked Action Chunking
Do LLM Agents Know How to Ground, Recover, and Assess? Evaluating Epistemic Competence in Information-Seeking Agents
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in LLM Routing
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Glance and Focus Reinforcement for Pan-cancer Screening
Pretrain Value, Not Reward: Decoupled Value Policy Optimization
Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data
Contextual Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
No outlier channels but with outlier blocks
Aligning Collaborative View Recovery and Tensorial Subspace Learning via Latent Representation for Incomplete Multi-View Clustering
All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Building Massively Multimodal Foundation Models with Interaction-aware Mixture-of-Experts
Random Label Prediction Heads for Studying Memorization in Deep Neural Networks
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization
FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation
Reconstruction Alignment Improves Unified Multimodal Models
Bayesian Evidence-Driven Prototype Evolution for Federated Domain Adaptation
MILPnet: A Multi-Scale Architecture with Geometric Feature Sequence Representations for Advancing MILP Problems
Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Partial Soft-Matching Distance For Neural Representational Comparison With Partial Unit Correspondence
Learning to Be Uncertain: Pre-training World Models with Horizon-Calibrated Uncertainty
Optimal Robust Subsidy Policies for Irrational Agent in Principal-Agent MDPs
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
LLaVAction: evaluating and training multi-modal large language models for action understanding
Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space
Source-Guided Flow Matching
Video-KTR: Reinforcing Video Reasoning via Key Token Attribution
An Efficient SE(p)-Invariant Transport Metric Driven by Polar Transport Discrepancy-based Representation
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
The Natural Geometry of Code: Hyperbolic Representation Learning for Program Reasoning
Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement
Why is Your Language Model a Poor Implicit Reward Model?
EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
Object-Centric Refinement for Enhanced Zero-Shot Segmentation
Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Trust-Region Adaptive Policy Optimization
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
Making, Not Taking, the Best of N
Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models
Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion
CLARC: C/C++ Benchmark for Robust Code Search
Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding
OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities
Don't Just Fine-tune the Agent, Tune the Environment
Offline Reinforcement Learning with Adaptive Feature Fusion
Flash-Mono: Feed-Forward Accelerated Gaussian Splatting Monocular SLAM
AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception
STARK: Strategic Team of Agents for Refining Kernels
Cognitive models can reveal interpretable value trade-offs in language models
Toward Enhancing Representation Learning in Federated Multi-Task Settings
VLMgineer: Vision-Language Models as Robotic Toolsmiths
Latent-to-Data Cascaded Diffusion Models for Unconditional Time Series Generation
Statistical Guarantees in the Search for Less Discriminatory Algorithms
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
Demystifying The Mechanisms Behind Emergent Exploration in Goal-Conditioned RL
Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement
An Open-Ended Benchmark and Formal Framework for Adjuvant Research with MLLM
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
Test-Time Scaling with Reflective Generative Model
EasyCreator: Empowering 4D Creation through Video Inpainting
Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact
Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning
Robust Decision-Making with Partially Calibrated Forecasters
HarmonyGNNs: Harmonizing Heterophily and Homophily in GNNs via Self-Supervised Node Encoding
Discovering and Steering Interpretable Concepts in Large Generative Music Models
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora
MTVCraft: Tokenizing 4D Motion for Arbitrary Character Animation
Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces
Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
Equilibrium Language Models
Jacobian Aligned Random Forests
All Code, No Thought: Language Models Struggle to Reason in Ciphered Language
Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing
SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
Learning to Reason over Continuous Tokens with Reinforcement Learning
An Improved Model-free Decision-estimation Coefficient with Applications in Adversarial MDPs
Interactive Learning of Single-Index Models via Stochastic Gradient Descent
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
Interleaving Reasoning for Better Text-to-Image Generation
ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution
Graph Diffusion Transformers are In-Context Molecular Designers
Gistify: Codebase-Level Understanding via Runtime Execution
HLD: Approximate Hierarchical Linguistic Distribution Modeling for LLM-Generated Text Detection
LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
Adaptive Augmentation-Aware Latent Learning for Robust LiDAR Semantic Segmentation
CoAct-1: Computer-using Multi-agent System with Coding Actions
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
Identifying Robust Neural Pathways: Few-Shot Adversarial Mask Tuning for Vision-Language Models
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment
Subspace Kernel Learning on Tensor Sequences
Exploring Cross-Modal Flows for Few-Shot Learning
FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
DiCache: Let Diffusion Model Determine Its Own Cache
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Evaluating Language Models' Evaluations of Games
Structure Learning from Time-Series Data with Lag-Agnostic Structural Prior
Improving Autoregressive Video Modeling with History Understanding
Robust Equation Structure Learning with Adaptive Refinement
Unifying Formal Explanations: A Complexity-Theoretic Perspective
Universal Model Routing for Efficient LLM Inference
Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents
Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
TEDM: Time Series Forecasting with Elucidated Diffusion Models
CREPE: Controlling diffusion with REPlica Exchange
Avey-B
Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models
Evaluating SAE interpretability without generating explanations
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
TGM: A Modular and Efficient Library for Machine Learning on Temporal Graphs
On the Theoretical Limitations of Embedding-Based Retrieval
House Of Dextra: Cross-Embodied Co-Design for Dexterous Hands
Mitigating Mismatch within Reference-based Preference Optimization
ContextNav: Towards Agentic Multimodal In-Context Learning
Reinforcing Diffusion Models by Direct Group Preference Optimization
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
CLUE: Conflict-guided Localization for LLM Unlearning Framework
Nano3D: A Training-Free Approach for Efficient 3D Editing Without Masks
MoSA: Mosaic Shared Adaptation of Large Language Models
A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components
BIRD: Behavior Induction via Representation-structure Distillation
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Fantastic Tractor-Dogs and How Not to Find Them With Open-Vocabulary Detectors
Neural Latent Arbitrary Lagrangian-Eulerian Grids for Fluid-Solid Interaction
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
FERD: Fairness-Enhanced Data-Free Adversarial Robustness Distillation
RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment
Graph Tokenization for Bridging Graphs and Transformers
Latent Visual Reasoning
Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
Difference Predictive Coding for Training Spiking Neural Networks
Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation
Towards Better Optimization For Listwise Preference in Diffusion Models
Automatic Dialectic Jailbreak: A Framework for Generating Effective Jailbreak Strategies
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Defending against Backdoor Attacks via Module Switching
A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
Demystifying Supervision Data Generalization in Multimodal LMs
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series
LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection
SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
ARFlow: Auto-regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
Risk-Sensitive Agent Compositions
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
From Cheap Geometry to Expensive Physics: A Physics-agnostic Pretraining Framework for Neural Operators
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning
UNDERSTANDING TRANSFORMERS FOR TIME SERIES FORECASTING: A CASE STUDY ON MOIRAI
Self-Speculative Masked Diffusions
When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation
Object Fidelity Diffusion for Remote Sensing Image Generation
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
MoL: Adaptive Mixture-of-Length Reasoning for Efficient Question Answering with Context
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
Developmental Federated Tuning: A Cognitive-Inspired Paradigm for Efficient LLM Adaptation
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
THE PATH OF LEAST RESISTANCE: GUIDING LLM REASONING TRAJECTORIES WITH PREFIX CONSENSUS
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Nudging the Boundaries of LLM Reasoning
A Training-Free Framework for Long Video Understanding via Video-Query-Options Similarity
Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors
Group Critical-token Policy Optimization for Autoregressive Image Generation
PAC-Bayes bounds for cumulative loss in Continual Learning
dParallel: Learnable Parallel Decoding for dLLMs
APPLE: Toward General Active Perception via Reinforcement Learning
On Discovering Algorithms for Adversarial Imitation Learning
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Disentangling Length Bias in Preference Learning via Response-Conditioned Modeling
Discovering heterogeneous synaptic plasticity rules via large-scale neural evolution
What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
Log Probability Tracking of LLM APIs
In-Context Learning of Temporal Point Processes with Foundation Inference Models
DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations
SiMO: Single-Modality-Operable Multimodal Collaborative Perception
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models
K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model
STEM: SCALING TRANSFORMERS WITH EMBEDDING MODULES
StochasTok: Improving Fine-Grained Subword Understanding in LLMs
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
RESCUE: Retrieval Augmented Secure Code Generation
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
APC-RL: Exceeding data-driven behavior priors with adaptive policy composition
Steering Diffusion Models Towards Credible Content Recommendation
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
GoalRank: Group-Relative Optimization for a Large Ranking Model
How to Square Tensor Networks and Circuits Without Squaring Them
ActivationReasoning: Logical Reasoning in Latent Activation Spaces
Measuring and Mitigating Rapport Bias of Large Language Models under Multi-Agent Social Interactions
Minimax-Optimal Aggregation for Density Ratio Estimation
Diffusion Language Model Knows the Answer Before It Decodes
SketchingReality: From Freehand Scene Sketches to Photorealistic Images
A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
ProTDyn: A Foundation Protein Language Model for Thermodynamics and Dynamics Generation
Learning Survival Distributions with Individually Calibrated Asymmetric Laplace Distribution
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
Exploring Mode Connectivity in Krylov Subspace for Domain Generalization
Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation
Counterfactual LLM-based Framework for Measuring Rhetorical Style
Bandits with Single-Peaked Preferences and Limited Resources
Designing Affine-Invariant Neural Networks for Photometric Corruption Robustness and Generalization
Quasi-Monte Carlo Methods Enable Extremely Low-Dimensional Deep Generative Models
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Globally aware optimization with resurgence
RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation
ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
Variational Reasoning for Language Models
AP-OOD: Attention Pooling for Out-of-Distribution Detection
Learning More with Less: A Dynamic Dual-Level Down-Sampling Framework for Efficient Policy Optimization
Q-Learning with Fine-Grained Gap-Dependent Regret
NDAD: Negative-Direction Aware Decoding for Large Language Models via Controllable Hallucination Signal Injection
Conformalized Hierarchical Calibration for Uncertainty-Aware Adaptive Hashing
From Natural Alignment to Conditional Controllability in Multimodal Dialogue
Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval
EdgeCape: Edge Weight Prediction For Category-Agnostic Pose Estimation
Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification
SLAP: Shortcut Learning for Abstract Planning
QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
Visual Autoregressive Modeling for Instruction-Guided Image Editing
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
Hidden Breakthroughs in Language Model Training
DND: Boosting Large Language Models with Dynamic Nested Depth
Discovering Novel LLM Experts via Task-Capability Coevolution
GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models
Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation
Trajectory Generation with Conservative Value Guidance for Offline Reinforcement Learning
KaVa: Latent Reasoning via Compressed KV-Cache Distillation
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
Arbitrary Generative Video Interpolation
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
Latent Concept Disentanglement in Transformer-based Language Models
AdaSpec: Adaptive Spectrum for Enhanced Node Distinguishability
A Step to Decouple Optimization in 3DGS
QuRL: Low-Precision Reinforcement Learning for Efficient Reasoning
VQ-Transplant: Efficient VQ-Module Integration for Pre-trained Visual Tokenizers
GenCP: Towards Generative Modeling Paradigm of Coupled physics
Differentiable Lifting for Topological Neural Networks
ATOM: A Pretrained Neural Operator for Multitask Molecular Dynamics
Revela: Dense Retriever Learning via Language Modeling
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Robustness in the Face of Partial Identifiability in Reward Learning
Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection
GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation
SkyEvents: A Large-Scale Event-enhanced UAV Dataset for Robust 3D Scene Reconstruction
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Internal Planning in Language Models: Characterizing Horizon and Branch Awareness
Natural Identifiers for Privacy and Data Audits in Large Language Models
AUHead: Realistic Emotional Talking Head Generation via Action Units Control
Beyond Spectra: Eigenvector Overlaps in Loss Geometry
Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains
Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning
In-Place Test-Time Training
DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands
Convex Dominance in Deep Learning I: A Scaling Law of Loss and Learning Rate
Bilevel Optimization with Lower-Level Uniform Convexity: Theory and Algorithm
Reformulation for Pretraining Data Augmentation
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
SIGMA-Gen: Structure and Identity Guided Multi-Subject Assembly for Image Generation
Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs
RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks
Multimodal Aligned Semantic Knowledge for Unpaired Image-text Matching
Directional Sheaf Hypergraph Networks: Unifying Learning on Directed and Undirected Hypergraphs
DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
Pre-training Limited Memory Language Models with Internal and External Knowledge
Bayesian Neural Networks for Functional ANOVA Model
Gradient-Based Diversity Optimization with Differentiable Top-$k$ Objective
Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate Prediction
WALT: Web Agents that Learn Tools
Boosting Medical Visual Understanding From Multi-Granular Language Learning
Addressing divergent representations from causal interventions on neural networks
Shortcut Diffusion Training with Cumulative Consistency Loss: An Optimal Control View
DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
Out of the Shadows: Exploring a Latent Space for Neural Network Verification
UniHM: Unified Dexterous Hand Manipulation with Vision Language Model
Sharp Monocular View Synthesis in Less Than a Second
Only Brains Align with Brains: Cross-Region Alignment Patterns Expose Limits of Normative Models
A Statistical Theory of Overfitting for Imbalanced Classification
Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs
Modality-free Graph In-context Alignment
Teach to Reason Safely: Policy-Guided Safety Tuning for MLRMs
TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
A Sharp KL Convergence Analysis for Diffusion Models under Minimal Assumptions
Block-wise Adaptive Caching for Accelerating Diffusion Policy
gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity
DispViT: Direct Stereo Disparity Regression with a Single-Stream Vision Transformer
GraphUniverse: Synthetic Graph Generation for Evaluating Inductive Generalization
GradPruner: Gradient-guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs
DeAltHDR: Learning HDR Video Reconstruction from Degraded Alternating Exposure Sequences
On the Convergence of Two-Layer Kolmogorov-Arnold Networks with First-Layer Training
RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
GTool: Graph Enhanced Tool Planning with Large Language Model
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction
Boosting Entropy with Bell Box Quantization
DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
Learning From Dictionary: Enhancing Robustness of Machine-Generated Text Detection in Zero-Shot Language via Adversarial Training
Efficient Offline Reinforcement Learning via Peer-Influenced Constraint
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
Rectifying LLM Thought from Lens of Optimization
DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
LiveWeb-IE: A Benchmark For Online Web Information Extraction
Dual-Scale World Memory for LLM Agents towards Hard-Exploration Problems
Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning
Single-stream Policy Optimization
GeomMotif: A Benchmark for Arbitrary Geometric Preservation in Protein Generation
Bayesian Post Training Enhancement of Regression Models with Calibrated Rankings
Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks
Image Quality Assessment for Embodied AI
Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility
PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models
ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework
Decoupled Q-Chunking
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
TimeSeg: An Information-Theoretic Segment-Wise Explainer for Time-Series Predictions
h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network
What Layers When: Learning to Skip Compute in LLMs with Residual Gates
InfoScan: Information-Efficient Visual Scanning via Resource-Adaptive Walks
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
OccDriver: Future Occupancy Guided Dual-branch Trajectory Planner in Autonomous Driving
Are Reasoning LLMs Robust to Interventions on their Chain-of-Thought?
Dual Goal Representations
Differential Fine-Tuning Large Language Models Towards Better Diverse Reasoning Abilities
Revisiting the Past: Data Unlearning with Model State History
Learning Distributions over Permutations and Rankings with Factorized Representations
PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement
Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
Divergence-Free Neural Networks with Application to Image Denoising
Temporal Graph Thumbnail: Robust Representation Learning with Global Evolutionary Skeleton
StoryAlign: Evaluating and Training Reward Models for Story Generation
Generalization Below the Edge of Stability: The Role of Data Geometry
Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative
DirMoE: Dirichlet-Routed Mixture of Experts
Inconsistency Biases in Dynamic Data Pruning
Diagnosing Generalization Failures from Representational Geometry Markers
Navigating the Latent Space Dynamics of Neural Models
COSMOS: A Hybrid Adaptive Optimizer for Efficient Training of Large Language Models
WATS: Wavelet-Aware Temperature Scaling for Reliable Graph Neural Networks
SimpleGVR: A Simple Baseline for Latent-Cascaded Generative Video Super-Resolution
Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Grouping Nodes with known Value Differences: A lossless UCT-based Abstraction Algorithm
DragFlow: Unleashing DiT Priors with Region-Based Supervision for Drag Editing
Vid2World: Crafting Video Diffusion Models to Interactive World Models
MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
On Smoothness Bounds for Non-Clairvoyant Scheduling with Predictions
Nonparametric Teaching of Attention Learners
Training Dynamics Impact Post-Training Quantization Robustness
Neural Graduated Assignment for Maximum Common Edge Subgraphs
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking
SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models
SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION
Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism
Escaping the Homophily Trap: A Threshold-free Graph Outlier Detection Framework via Clustering-guided Edge Reweighting
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
Transformers are Inherently Succinct
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
Let's Split Up: Zero-Shot Classifier Edits for Fine-Grained Video Understanding
Detecting Data Contamination in LLMs via In-Context Learning
SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML
Neural Hamilton–Jacobi Characteristic Flows for Optimal Transport
SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines
Dynamic Speculative Agent Planning
Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
Cannistraci-Hebb Training on Ultra-Sparse Spiking Neural Networks
DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
Understanding the Role of Training Data in Test-Time Scaling
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
Adaptive Gaussian Expansion for On-the-fly Category Discovery
Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection
Quantized Visual Geometry Grounded Transformer
SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning
Temporal Geometry of Deep Networks: Hyperbolic Representations of Training Dynamics for Intrinsic Explainability
MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation
Efficient Message-Passing Transformer for Error Correcting Codes
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
Tokenisation over Bounded Alphabets is Hard
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Routing, Cascades, and User Choice for LLMs
Activation Function Design Sustains Plasticity in Continual Learning
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
On the trade-off between expressivity and privacy in graph representation learning
Quasi-Equivariant Metanetworks
Non-Asymptotic Analysis of Efficiency in Conformalized Regression
Maximizing Incremental Information Entropy for Contrastive Learning
Decision-Theoretic Approaches for Improved Learning-Augmented Algorithms
CircuitSense: A Hierarchical MLLM Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
Post-Training Quantization for Video Matting
TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
Exploratory Diffusion Model for Unsupervised Reinforcement Learning
Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering
CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
Causal Discovery in the Wild: A Voting-Theoretic Ensemble Approach
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
UNITE: Universal kNowledge Integration from Task-specific Experts
WIMFRIS: WIndow Mamba Fusion and Parameter Efficient Tuning for Referring Image Segmentation
Sharp asymptotic theory for Q-learning with LD2Z learning rate and its generalization
InfoDet: A Dataset for Infographic Element Detection
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting
RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility
Conjuring Semantic Similarity
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
PCLR: Progressively Compressed LoRA for Multimodal Continual Instruction Tuning
Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
Prompt and Parameter Co-Optimization for Large Language Models
ResCP: Reservoir Conformal Prediction for Time Series Forecasting
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
SpikeGen: Decoupled “Rods and Cones” Visual Representation Processing with Latent Generative Framework
Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures
Graph-Theoretic Intrinsic Reward: Guiding RL with Effective Resistance
Frayed RoPE and Long Inputs: A Geometric Perspective
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting
DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining
SafeMoE: Safe Fine-Tuning for MoE LLMs by Aligning Harmful Input Routing
QPrompt-R1: Real-Time Reasoning for Domain-Generalized Semantic Segmentation via Group-Relative Query Alignment
Decision Aggregation under Quantal Response
BOLT: Decision-Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots
Interference-Isolated Elastic Weight Consolidation and Knowledge Calibration for Incremental Object Detection
LLM-as-a-Prophet: Understanding Predictive Intelligence with Prophet Arena
OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning
Riesz Neural Operator for Solving Partial Differential Equations
Unified In-Context Video Editing
Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
jqBench: a benchmark for reading and editing JSON from natural language and/or examples
Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
Small Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and Implications for Mechanistic Interpretability
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Inference-Time Personalized Safety Control via Paired Difference-in-Means Intervention
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
Zeros can be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores
Light-X: Generative 4D Video Rendering with Camera and Illumination Control
The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm
StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning
LogART: Pushing the Limit of Efficient Logarithmic Post-Training Quantization
DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity
Optimal transport unlocks end-to-end learning for single-molecule localization
CogMoE: Signal-Quality–Guided Multimodal MoE for Cognitive Load Prediction
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
A Study of Posterior Stability in Time-Series Latent Diffusion
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
Scaling Sequence-to-Sequence Generative Neural Rendering
Jailbreak Transferability Emerges from Shared Representations
Wavelet Predictive Representations for Non-Stationary Reinforcement Learning
TaskCraft: Automated Generation of Agentic Tasks
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
Automated Stateful Specialization for Adaptive Agent Systems
Memorizing Long-tail Data Can Help Generalization Through Composition
Koopman-Assisted Trajectory Synthesis: A Data Augmentation Framework for Offline Imitation Learning
Non-Collaborative User Simulators for Tool Agents
Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition
A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction
On the Spectral Differences Between NTK and CNTK and Their Implications for Point Cloud Recognition
ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings
Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
Learning to Orchestrate Agents in Natural Language with the Conductor
Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
On The Expressive Power of GNN Derivatives
Soft Equivariance Regularization for Invariant Self-Supervised Learning
SketchEvo: Leveraging Drawing Dynamics for Enhanced Image Synthesis
Correlated Policy Optimization in Multi-Agent Subteams
Trion: FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of LLMs
Formal Mechanistic Interpretability: Automated Circuit Discovery with Provable Guarantees
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
Grounding and Enhancing Informativeness and Utility in Dataset Distillation
TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization
Agentic Reinforced Policy Optimization
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
TangleScore: Tangle-Guided Purge and Imprint for Unstructured Knowledge Editing
Learning Correlated Reward Models: Statistical Barriers and Opportunities
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Batch Pruning by Activation Stability
Concept-TRAK: Understanding how diffusion models learn concepts through concept attribution
Learning in Prophet Inequalities with Noisy Observations
UnLoc: Leveraging Depth Uncertainties for Floorplan Localization
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
SongEcho: Towards Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation
Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems
CTRL&SHIFT: High-quality Geometry-Aware Object Manipulation in Visual Generation
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
ATGen: Adversarial Reinforcement Learning for Test Case Generation
NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds
From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting
VeriTrail: Closed-Domain Hallucination Detection with Traceability
DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents
HARP: Hallucination Detection via Reasoning Subspace Projection
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Detecting Temporal Misalignment Attacks in Multimodal Fusion for Autonomous Driving
Emergent Coordination in Multi-Agent Language Models
DMAP: A Distribution Map for Text
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
Transformers Learn Latent Mixture Models In-Context via Mirror Descent
Flatness Guided Test-Time Adaptation for Vision-Language Models
SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
Data-to-Energy Stochastic Dynamics
Infinite Horizon Markov Economies
Distilling to Hybrid Attention Models via KL-Guided Layer Selection
Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
Steering the Herd: A Framework for LLM-based Control of Social Learning
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding
Pisces: Cryptography-based Private Retrieval-Augmented Generation with Dual-Path Retrieval
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
Generative Value Conflicts Reveal LLM Priorities
ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
Two (narrow) heads are better than (an arbitrarily wide) one
Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data
Compositional Generalization through Gradient Search in Nonparametric Latent Space
Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization
Knowing When to Quit: Probabilistic Early Exits for Speech Separation Networks
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling
Multimodal Dataset Distillation via Phased Teacher Models
A Law of Data Reconstruction for Random Features (And Beyond)
Dichotomous Diffusion Policy Optimization
Scaling Linear Attention Capacity with Sparse State Expansion
Exploring State-Space Models for Data-Specific Neural Representations
Task-Aware Data Selection via Proxy-Label Enhanced Distribution Matching for LLM Finetuning
MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection
Tricks or Traps? A Deep Dive into RL for LLM Reasoning
No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
EventFlash: Towards Efficient MLLMs for Event-Based Vision
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
FullPart: Generating each 3D Part at Full Resolution
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction
The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge
MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with $0.1K$ Parameters
Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval
Shuffling the Data, Extrapolating the Step: Sharper Bias In Constant Step-Size SGD
Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later
DTO-KD: Dynamic Trade-off Optimization for Effective Knowledge Distillation
OPRIDE: Efficient Offline Preference-based Reinforcement Learning via In-Dataset Exploration
BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
EgoTwin: Dreaming Body and View in First Person
Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction
ATLAS: Alibaba Dataset and Benchmark for Learning-Augmented Scheduling
Understanding the Mechanisms of Fast Hyperparameter Transfer
H$^3$DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning
Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
Weight Decay may matter more than µP for Learning Rate Transfer in Practice
LANE: Label-Aware Noise Elimination for Fine-Grained Text Classification
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
Beyond Visual Reconstruction Quality: Object Perception-aware 3D Gaussian Splatting for Autonomous Driving
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment
Efficient Estimation of Kernel Surrogate Models for Task Attribution
SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection
Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
GTM: A General Time-series Model for Enhanced Representation Learning of Time-Series data
AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning
CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation
Mesh Splatting for End-to-end Multiview Surface Reconstruction
RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling
Breaking and Fixing Defenses Against Control Flow Hijacking in Multi-Agent Systems
On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond
Tighter Performance Theory of FedExProx
Policy Contrastive Decoding for Robotic Foundation Models
FastAvatar: Towards Unified and Fast 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
Towards One-step Causal Video Generation via Adversarial Self-Distillation
Tina: Tiny Reasoning Models via LoRA
EMBridge: Enhancing Gesture Generalization from EMG Signals Through Cross-modal Representation Learning
SONATA: Synergistic Coreset Informed Adaptive Temporal Tensor Factorization
PateGAIL++: Utility Optimized Private Trajectory Generation with Imitation Learning
LLM2Fx-Tools: Tool Calling for Music Post-Production
A Genetic Algorithm for Navigating Synthesizable Molecular Spaces
Personalized Collaborative Learning with Affinity-Based Variance Reduction
COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences
Einstein Fields: A Neural Perspective To Computational General Relativity
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
Transfer Learning in Infinite Width Feature Learning Networks
Incentive-Aligned Multi-Source LLM Summaries
ReFocusEraser: Refocusing for Small Object Removal with Robust Context-Shadow Repair
Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction
TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
Learning to Reason without External Rewards
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR
A Graph Meta-Network for Learning on Kolmogorov–Arnold Networks
CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation
FreeAdapt: Unleashing Diffusion Priors for Ultra-High-Definition Image Restoration
Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies
Test-Time Adaptation for LLM Agents via Environment Interaction
WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control
DaVinci: Reinforcing Visual-Structural Syntax in MLLMs for Generalized Scientific Diagram Parsing
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization
Robust Optimization for Mitigating Reward Hacking with Correlated Proxies
CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints
Incentives in Federated Learning with Heterogeneous Agents
Dynamic Early Exit in Reasoning Models
Look-ahead Reasoning with a Learned Model in Imperfect Information Games
Let's Explore Step by Step: Generating Provable Formal Statements with Deductive Exploration
Seesaw: Accelerating Training by Balancing Batch Size and Learning Rate Scheduling
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
Learning to Grasp Anything By Playing with Random Toys
PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection
Arbitrary-Order Block SignSGD for Memory-Efficient LLM Fine-Tuning
Active Learning for Decision Trees with Provable Guarantees
Intrinsic Lorentz Neural Network
Horseshoe Splatting: Handling Structural Sparsity for Uncertainty-Aware Gaussian-Splatting Radiance Field Rendering
Label-Free Mitigation of Spurious Correlations in VLMs using Sparse Autoencoders
In-Context Learning for Pure Exploration
Fine-Grained Class-Conditional Distribution Balancing for Debiased Learning
Local Geometry Attention for Time Series Forecasting under Realistic Corruptions
Variation-aware Flexible 3D Gaussian Editing
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
SVD Provably Denoises Nearest Neighbor Data
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
Biologically Plausible Learning via Bidirectional Spike-Based Distillation
Exploratory Causal Inference in SAEnce
Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning
From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation
OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
Enhancing Visual Token Representations for Video Large Language Models via Training-free Spatial-Temporal Pooling and Gridding
DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving
Sample Reward Soups: Query-efficient Multi-Reward Guidance for Text-to-Image Diffusion Models
A Unified Federated Framework for Trajectory Data Preparation via LLMs
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatrics
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
AVEX: What Matters for Animal Vocalization Encoding
Continuous Audio Language Models
Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference
Neural Collapse in Multi-Task Learning
Causal Score Conditioning for Multi-Resolution Latent Systems
PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression
Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection
Choices Speak Louder than Questions
Consistency-Driven Calibration and Matching for Few-Shot Class Incremental Learning
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
Keep the Best, Forget the Rest: Reliable Alignment with Order-Aware Preference Optimization
A cross-species neural foundation model for end-to-end speech decoding
MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling
Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
Human-Object Interaction via Automatically Designed VLM-Guided Motion Policy
A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems
Smarter Not Harder: Generative Process Evaluation with Intrinsic-Signal Driving and Ability‑Adaptive Reward Shaping
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
UniF$^2$ace: A $\underline{Uni}$fied $\underline{F}$ine-grained $\underline{Face}$ Understanding and Generation Model
Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
Boolean Satisfiability via Imitation Learning
Homeostatic Adaptation of Optimal Population Codes under Metabolic Stress
DCFold: Efficient Protein Structure Generation with Single Forward Pass
On Code-Induced Reasoning in LLMs
Scalable Multi-Task Low-Rank Model Adaptation
Rethinking Pareto Frontier: On the Optimal Trade-offs in Fair Classification
COSA: Context-aware Output-Space Adapter for Test-Time Adaptation in Time Series Forecasting
TP-Spikformer: Token Pruned Spiking Transformer
Learning on a Razor’s Edge: Identifiability and Singularity of Polynomial Neural Networks
CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs
Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
Long-range Modeling and Processing of Multimodal Event Sequences
Test-Time Poisoned Sample Detection by Exploiting Shallow Malicious Matching in Backdoored CLIP
Metric $k$-clustering using only Weak Comparison Oracles
3DSMT: A Hybrid Spiking Mamba-Transformer for Point Cloud Analysis
Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards
PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction For Continual Learning
Pretrain–Test Task Alignment Governs Generalization in In-Context Learning
Enhancing Diffusion-Based Sampling with Molecular Collective Variables
WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
RegionReasoner: Region-Grounded Multi-Round Visual Reasoning
Learning Shrinks the Hard Tail: Training‑Dependent Inference Scaling in a Solvable Linear Model
DiSRouter: Distributed Self-Routing for LLM Selections
Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning
Unified and Efficient Multi-view Clustering from Probabilistic Perspective
What Generative Search Engines Like and How to Optimize Web Content Cooperatively
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
Mode-conditioning unlocks superior test-time compute scaling
Contrastive Predictive Coding Done Right for Mutual Information Estimation
Denoising Neural Reranker for Recommender Systems
A Study on PAVE Specification for Learnware
Learning from Label Proportions via Proportional Value Classification
Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation
Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
Output Supervision Can Obfuscate the Chain of Thought
Unified Privacy Guarantees for Decentralized Learning via Matrix Factorization
Improving Set Function Approximation with Quasi-Arithmetic Neural Networks
Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
Reinforcement Learning via Value Gradient Flow
Do Large Language Models Know What They Are Capable Of?
On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting
MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
JULI: Jailbreak Large Language Models by Self-Introspection
Safe Exploration via Policy Priors
3DCS: Datasets and Benchmark for Evaluating Conformational Sensitivity in Molecular Representations
Structural Inference: Interpreting Small Language Models with Susceptibilities
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
Revisiting Active Sequential Prediction-Powered Mean Estimation
Empowering Multi-Robot Cooperation via Sequential World Models
Strategic Obfuscation of Deceptive Reasoning in Language Models
I2Mole: Interaction-aware Invariant Molecular Learning For Generalizable Property Prediction
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models
Intrinsic training dynamics of deep neural networks
Breaking the Total Variance Barrier: Sharp Sample Complexity for Linear Heteroscedastic Bandits with Fixed Action Set
Verifier-Constrained Flow Expansion for Discovery Beyond the Data
Heads collapse, features stay: Why Replay needs big buffers
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks
GenCape: Structure-Inductive Generative Modeling for Category-Agnostic Pose Estimation
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
Distributional Machine Unlearning via Selective Data Removal
Learning Heterogeneous Degradation Representation for Real-World Super-Resolution
D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping
RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback
Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems
ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation
Universal Properties of Activation Sparsity in Modern Large Language Models
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
SRT: Super-Resolution for Time Series via Disentangled Rectified Flow
UniCA: Unified Covariate Adaptation for Time Series Foundation Model
Distribution-informed Online Conformal Prediction
CARL: Preserving Causal Structure in Representation Learning
Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
Mamba-3: Improved Sequence Modeling using State Space Principles
Beyond Markovian Drifts: Action-Biased Geometric Walks with Memory for Personalized Summarization
QUEST: A robust attention formulation using query-modulated spherical attention
CDBridge: A Cross-omics Post-training Bridge Strategy for Context-aware Biological Modeling
Semi-Parametric Contextual Pricing with General Smoothness
Theoretical Modeling of Large Language Model Self-Improvement Training Dynamics Through Solver-Verifier Gap
Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval
There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training
MotionGPT3: Human Motion as a Second Modality
Householder-Diagonalized Linear Attention (HDLA): Utilizing Enhanced Decay Mechanism for Efficient Sequence Modeling
WAFT: Warping-Alone Field Transforms for Optical Flow
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Monitoring Decomposition Attacks with Lightweight Sequential Monitors
Exchangeability of GNN Representations with Applications to Graph Retrieval
Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion
Online Prediction of Stochastic Sequences with High Probability Regret Bounds
InnoGym: Benchmarking the Innovation Potential of AI Agents
VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
Tree-sliced Sobolev IPM
Beyond Short Steps in Frank-Wolfe Algorithms
TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture
Exploiting Low-Dimensional Manifold of Features for Few-Shot Whole Slide Image Classification
No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
MMReD: a Cross-Modal Benchmark for Dense Context Reasoning
DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects
Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation
VIRTUE: Visual-Interactive Text-Image Universal Embedder
A Cognitive Process-Inspired Architecture for Subject-Agnostic Brain Visual Decoding
Latent Planning Emerges with Scale
Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking
Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness
Zero-Overhead Introspection for Adaptive Test-Time Compute
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
SinkTrack: Attention Sink based Context Anchoring for Large Language Models
LLMs Struggle to Balance Reasoning and World Knowledge in Causal Narrative Understanding
Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging
Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
CroCoDiLight: Repurposing Cross-View Completion Encoders for Relighting
SelvaBox: A high‑resolution dataset for tropical tree crown detection
MaRS: Memory-Adaptive Routing for Reliable Capacity Expansion and Knowledge Retention
Conditionally Whitened Generative Models for Probabilistic Time Series Forecasting
Interaction Field Matching: Overcoming Limitations of Electrostatic Models
StyliTruth: Unlocking Stylized yet Truthful LLM Generation via Disentangled Steering
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
Generative Bayesian Optimization: Generative Models as Acquisition Functions
Deep Global-sense Hard-negative Discriminative Generation Hashing for Cross-modal Retrieval
Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion
Ghost in the Cloud: Your Geo-Distributed Large Language Models Training is Easily Manipulated
To View Transform or Not to View Transform: NeRF-based Pre-training Perspective
Flow-Disentangled Feature Importance
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
Autonomous Functional Play with Correspondence-Driven Trajectory Warping
Non-Convex Federated Optimization under Cost-Aware Client Selection
Composition-Grounded Data Synthesis for Visual Reasoning
Separable Neural Networks: Approximation Theory, NTK Regime, and Preconditioned Gradient Descent
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
D&R: Recovery-based AI-Generated Text Detection via a Single Black-box LLM Call
Talking Points: Describing and Localizing Pixels
Autoregressive-based Progressive Coding for Ultra-Low Bitrate Image Compression
Sample Efficient Offline RL via T-Symmetry Enforced Latent State-Stitching
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Towards a Foundation Model for Crowdsourced Label Aggregation
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
KVComm: Enabling Efficient LLM Communication through Selective KV Sharing
EvA: Evolutionary Attacks on Graphs
Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training
Towards Sustainable Investment Policies Informed by Opponent Shaping
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
Gradient-Normalized Smoothness for Optimization with Approximate Hessians
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
EmoPrefer: Can Large Language Models Understand Human Emotion Preferences?
ToolTree: Efficient LLM Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
Multi-Agent Debate with Memory Masking
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
Tell me Habibi, is it Real or Fake?
Revisit Visual Prompt Tuning: The Expressiveness of Prompt Experts
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better
Context Learning for Multi-Agent Discussion
Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning
ComPhy: Composing Physical Models with end-to-end Alignment
A$^2$TG: Adaptive Anisotropic Textured Gaussians for Efficient 3D Scene Representation
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
Robustify Spiking Neural Networks via Dominant Singular Deflation under Heterogeneous Training Vulnerability
Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas
Compositional Visual Planning via Inference-Time Diffusion Scaling
Learning Molecular Chirality via Chiral Determinant Kernels
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation
MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation
LiveClin: A Live Clinical Benchmark without Leakage
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
Let OOD Feature Exploring Vast Predefined Classifiers
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
LEGACY: A Lightweight Dynamic Gradient Compression Strategy for Distributed Deep Learning
EigenScore: OOD Detection using Posterior Covariance in Diffusion Models
Towards Understanding the Shape of Representations in Protein Language Models
Knowledge Distillation for Large Language Models through Residual Learning
Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification
Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation
The Softmax Bottleneck Does Not Limit the Probabilities of the Most Likely Tokens
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
Seeing What’s Not There: Negation Understanding Needs More Than Training
Signal Structure-Aware Gaussian Splatting for Large-Scale Scene Reconstruction
Generalization of RLVR Using Causal Reasoning as a Testbed
LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models
Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
Generative Universal Verifier as Multimodal Meta-Reasoner
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
EgoBrain: Synergizing Minds and Eyes For Human Action Understanding
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Shift-and-Sum Quantization for Visual Autoregressive Models
PQGAN: Product-Quantised Image Representation for High-Quality Image Synthesis
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
DynamicInfer: Runtime-Aware Sparse Offloading for LLMs Inference on a Consumer-Grade GPU
TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models
Multiple Token Divergence: Measuring and Steering In-Context Computation Density
FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability–Plasticity Tradeoff
Reasoning in Space via Grounding in the World
All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting
Replicable Reinforcement Learning with Linear Function Approximation
TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment
Figma2Code: Automating Multimodal Design to Code in the Wild
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime
ActiveCQ: Active Estimation of Causal Quantities
LINK: Learning Instance-level Knowledge from Vision-Language Models for Human-Object Interaction Detection
MASS: MoErging through Adaptive Subspace Selection
Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Aria: an Agent for Retrieval and Iterative Auto-Formalization via Dependency Graph
Are Global Dependencies Necessary? Scalable Time Series Forecasting via Local Cross-Variate Modeling
$AutoDrive\text{-}P^3$: Unified Chain of Perception–Prediction–Planning Thought via Reinforcement Fine-Tuning
ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization
RL Grokking Recipe: How Does RL Unlock and Transfer New Algorithms in LLMs?
The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
LLMs on Trial: Evaluating Judicial Fairness For Large Language Models
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents
Dual-objective Language Models: Training Efficiency Without Overfitting
Understanding Cross-layer Contributions to Mixture-of-Experts Routing in LLMs
Trace Anything: Representing Any Video in 4D via Trajectory Fields
UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework
Study of Training Dynamics for Memory-Constrained Fine-Tuning
Combinatorial Bandit Bayesian Optimization for Tensor Outputs
Submodular Function Minimization with Dueling Oracle
Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX
PSDNorm: Temporal Normalization for Deep Learning in Sleep Staging
TEN-DM: Topology-Enhanced Diffusion Model for Spatio-Temporal Event Prediction
Identifiability Challenges in Sparse Linear Ordinary Differential Equations
On the Reasoning Abilities of Masked Diffusion Language Models
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
Unbiased Gradient Estimation for Event Binning via Functional Backpropagation
RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding
A New Approach to Controlling Linear Dynamical Systems
NetArena: Dynamic Benchmarks for AI Agents in Network Automation
Score-based Greedy Search for Structure Identification of Partially Observed Causal Models
An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems
LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent
LitmusValues: Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
Breaking Safety Paradox with Feasible Dual Policy Iteration
TrajFlow: Nation-wide Pseudo GPS Trajectory Generation with Flow Matching Models
In-Context Algorithm Emulation in Fixed-Weight Transformers
Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
Contrastive Diffusion Guidance for Spatial Inverse Problems
From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
HistoPrism: Unlocking Functional Pathway Analysis from Pan-Cancer Histology via Gene Expression Prediction
Task-free Adaptive Meta Black-box Optimization
Capturing Visual Environment Structure Correlates with Control Performance
Test-Time Iterative Error Correction for Efficient Diffusion Models
High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
Certifying the Full YOLO Pipeline: A Probabilistic Verification Approach
Q&C: When Quantization Meets Cache in Efficient Generation
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring
Composition of Pretrained Diffusion Models: A Logic-Based Calculus
PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
Ads that Stick: Near-Optimal Ad Optimization through Psychological Behavior Models
Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards
CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
SubDyve: Subgraph-Driven Dynamic Propagation for Virtual Screening Enhancement Controlling False Positive
Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization
Unlearning during Training: Domain-Specific Gradient Ascent for Domain Generalization
LiFR-Seg: Anytime High-Frame-Rate Segmentation via Event-Guided Propagation
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network
Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach
DUET: Optimizing LLM Training Data Mixtures via Noisy Feedback from Unseen, Downstream Evaluation Tasks
Learning Global Hypothesis Space for Enhancing Synergistic Reasoning Chain
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
Programming with Pixels: Can Computer-Use Agents do Software Engineering?
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Soft Tokens, Hard Truths
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
Diversity-Incentivized Exploration for Versatile Reasoning
ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code
Optimizer Choice Matters For The Emergence of Neural Collapse
Tequila: Trapping-free Ternary Quantization for Large Language Models
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Enabling True Global Perception in State Space Models for Visual Tasks
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Out-of-Distribution Graph Models Merging
CheckMate! Watermarking Graph Diffusion Models in Polynomial Time
Rate-Distortion Optimized Pragmatic Communication for Collaborative Perception
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
Improving Code Localization with Repository Memory
Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
VFScale: Intrinsic Reasoning through Verifier-Free Test-time Scalable Diffusion Model
FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning
Debiased Front-Door Learners for Heterogeneous Effects
SciNav: A General Agent Framework for Scientific Coding Tasks
UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images
Discovering alternative solutions beyond the simplicity bias in recurrent neural networks
GenSR: Symbolic regression based on equation generative space
LEAP: Local ECT-Based Learnable Positional Encodings for Graphs
SteinsGate: Adding Causality to Diffusions for Long Video Generation via Path Integral
LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
Deep Think with Confidence
Why Less is More (Sometimes): A Theory of Data Curation
NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design
Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring
Differentially Private Equilibrium Finding in Polymatrix Games
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
MVR: Multi-view Video Reward Shaping for Reinforcement Learning
Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint
Architecture-Agnostic Test-Time Adaptation via Backprop-Free Embedding Alignment
Value Matching: Scalable and Gradient-Free Reward-Guided Flow Adaptation
LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Convergent Differential Privacy Analysis for General Federated Learning
MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
DanceTogether: Generating Interactive Multi-Person Video without Identity Drifting
Constrained Decoding of Diffusion LLMs with Context-Free Grammars
Asynchronous Matching with Dynamic Sampling for Multimodal Dataset Distillation
From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces
Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement
Plug, Play, and Fortify: A Low-Cost Module for Robust Multimodal Image Understanding Models
Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
PSP: Prompt-Guided Self-Training Sampling Policy for Active Prompt Learning
Spiking Discrepancy Transformer for Point Cloud Analysis
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
BoreaRL: A Multi-Objective Reinforcement Learning Environment for Climate-Adaptive Boreal Forest Management
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Social Agents: Collective Intelligence Improves LLM Predictions
QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response
PE-SGD: Differentially Private Deep Learning via Evolution of Gradient Subspace for Text
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
Secure Inference for Diffusion Models via Unconditional Scores
Gradient-Direction-Aware Density Control for 3D Gaussian Splatting
Delay Flow Matching
Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
Distributional value gradients for stochastic environments
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
Using maximal information auxiliary variables to improve synthetic data generation based on TabPFN foundation models
Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
Learning to Interpret Weight Differences in Language Models
One protein is all you need
MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interatomic Potentials
Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
An evolutionary perspective on modes of learning in Transformers
Understanding the Implicit Biases of Design Choices for Time Series Foundation Models
One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
Quantization-Aware Diffusion Models For Maximum Likelihood Training
Diffusion Language Models are Provably Optimal Parallel Samplers
BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
R-WoM: Retrieval-augmented World Model For Computer-use Agents
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
From Fields to Random Trees
SCoT: Teaching 3D-LLMs to Think Spatially with Million-scale CoT Annotations
Anatomy-aware Representation Learning for Medical Ultrasound
Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
Reliable Evaluation of MRI Motion Correction: Dataset and Insights
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
A foundation model with multi-variate parallel attention to generate neuronal activity
AgentPO: Enhancing Multi-Agent Collaboration via Reinforcement Learning
Streaming Visual Geometry Transformer
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
Enhancing LLMs for Knowledge Base Question Answering by Chain-of-Decomposition
Learning a distance measure from the information-estimation geometry of data
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
Count Bridges enable Modeling and Deconvolving Transcriptomic Data
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
SNAPHARD CONTRAST LEARNING
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
Pusa V1.0: Unlocking Temporal Control in Pretrained Video Diffusion Models via Vectorized Timestep Adaptation
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Textual Equilibrium Propagation for Deep Compound AI Systems
QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification
PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning
Unified Vision–Language Modeling via Concept Space Alignment
ReIn: Conversational Error Recovery with Reasoning Inception
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Flow Matching with Semidiscrete Couplings
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking
ReSplat: Degradation-agnostic Feed-forward Gaussian Splatting via Self-guided Residual Diffusion
Prompt-Robust Vision-Language Models via Meta-Finetuning
Visual Prompt-Agnostic Evolution
LCA: Local Classifier Alignment for Continual Learning
ScaleCap: Scalable Image Captioning via Dual-Modality Debiasing
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
Rethinking JEPA: Compute-Efficient Video Self-Supervised Learning with Frozen Teachers
GTA1: GUI Test-time Scaling Agent
HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion
On the Convergence Direction of Gradient Descent
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models
Once-More: Continuous Self-Correction for Large Language Models via Perplexity-Guided Intervention
Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting
Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration
PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
Target-Aware Video Diffusion Models
Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution
Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
On The Geometry and Topology of Representations: the Manifolds of Modular Addition
BAR: Refactor the Basis of Autoregressive Visual Generation
Chessformer: A Unified Architecture for Chess Modeling
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
Hyden: A Hybrid Dual-Path Encoder for Monocular Geometry of High-resolution Images
Spherical Watermark: Encryption-Free, Lossless Watermarking for Diffusion Models
Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis
ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
HoloPart: Generative 3D Part Amodal Segmentation
Adaptive gradient descent on Riemannian manifolds and its applications to Gaussian variational inference
From Gradient Volume to Shapley Fairness: Towards Fair Multi-Task Learning
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems
Multi-agent Coordination via Flow Matching
Transformers as Measure-Theoretic Associative Memory: A Statistical Perspective and Minimax Optimality
Fantastic Pretraining Optimizers and Where to Find Them
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models
Charts Are Not Images: On the Challenges of Scientific Chart Editing
Deep SPI: Safe Policy Improvement via World Models
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring
A Unification of Discrete, Gaussian, and Simplicial Diffusion
Stochastic Neural Networks for Causal Inference with Missing Confounders
SafeMPO: Constrained Reinforcement Learning with Probabilistic Incremental Improvement
Cyber-Zero: Training Cybersecurity Agents without Runtime
Patronus: Interpretable Diffusion Models with Prototypes
Disco: Densely-overlapping Cell Instance Segmentation via Adjacency-aware Collaborative Coloring
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Don't Throw Away Your Pretrained Model
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
Practical estimation of the optimal classification error with soft labels and calibration
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Language Identification in the Limit with Computational Trace
SuperMAN: Interpretable and Expressive Networks over Temporally Sparse Heterogeneous Data
Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
Protection against Source Inference Attacks in Federated Learning
Tensor learning with orthogonal, Lorentz, and symplectic symmetries
Diversity-Enhanced Reasoning for Subjective Questions
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals
Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
Dynamic Novel View Synthesis in High Dynamic Range
AQER: A Scalable and Efficient Data Loader for Digital Quantum Computers
Draft-based Approximate Inference for LLMs
Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
Language Models are Injective and Hence Invertible
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Understanding Dataset Distillation via Spectral Filtering
Benefits and Limitations of Communication in Multi-Agent Reasoning
Map as a Prompt: Learning Multi-Modal Spatial-Signal Foundation Models for Cross-scenario Wireless Localization
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
Overlap-weighted orthogonal meta-learner for treatment effect estimation over time
Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
Learning Ordinal Probabilistic Reward from Preferences
Mechanistic Independence: A Principle for Identifiable Disentangled Representations
Multi-Feature Quantized Self-Attention for Fair Large Language Models
Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction
Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
Context and Diversity Matter: The Emergence of In-Context Learning in World Models
Efficient Prediction of Large Protein Complexes via Subunit-Guided Hierarchical Refinement
Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
VERINA: Benchmarking Verifiable Code Generation
Automated Formalization via Conceptual Retrieval-Augmented LLMs
Reassessing Layer Pruning in LLMs: New Insights and Methods
LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion
vAttention: Verified Sparse Attention via Sampling
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
VideoNSA: Native Sparse Attention Scales Video Understanding
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
DeepEyesV2: Toward Agentic Multimodal Model
Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
A Noise is Worth Diffusion Guidance
Explainable LLM Unlearning through Reasoning
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
EffiVMT: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition
Decomposing LLM Computation with Jets
Dual Distillation for Few-Shot Anomaly Detection
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
CatalystBench: A Comprehensive Multi-Task Benchmark for Advancing Language Models in Catalysis Science
Generalized Parallel Scaling with Interdependent Generations
Lifelong Learning with Behavior Consolidation for Vehicle Routing
OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data
RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States
Delving into Spectral Clustering with Vision-Language Representations
Teaching Metric Distance to Discrete Autoregressive Language Models
Stochastic Self-Organization in Multi-Agent Systems
Complexity- and Statistics-Guided Anomaly Detection in Time Series Foundation Models
LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
Revisiting Tree-Sliced Wasserstein Distance Through the Lens of the Fermat–Weber Problem
AudioX: A Unified Framework for Anything-to-Audio Generation
PA3FF: Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation
Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
Consistent Noisy Latent Rewards for Trajectory Preference Optimization in Diffusion Models
LLM Unlearning with LLM Beliefs
Mapping Post-Training Forgetting in Language Models at Scale
Distributionally Robust Optimization via Generative Ambiguity Modeling
Terminal Velocity Matching
Revisiting Weight Regularization for Low-Rank Continual Learning
Coarse-to-Fine Learning of Dynamic Causal Structures
What Scales in Cross-Entropy Scaling Law?
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Scaling Synthetic Task Generation for Agents via Exploration
DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference
Synchronizing Probabilities in Model-Driven Lossless Compression
WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control
Premise Selection for a Lean Hammer
Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design
ARINBEV: Bird's-Eye View Layout Estimation with Conditional Autoregressive Model
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
Inferring brain plasticity rule under long-term stimulation with structured recurrent dynamics
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
Text-Aware Image Restoration with Diffusion Models
Guaranteed Simply Connected Mesh Reconstruction from an Unorganized Point Cloud
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
W-EDIT: A Wavelet-Based Frequency-Aware Framework for Text-Driven Image Editing
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Dynamic Weight Grafting: Localizing Finetuned Factual Knowledge in Transformers
The Geometry of Reasoning: Flowing Logics in Representation Space
Estimating Dimensionality of Neural Representations from Finite Samples
Revisiting Parameter Server in LLM Post-Training
MAGO: Beyond Fixed Hyperparameters with Multi-Objective Pareto Optimization for Hybrid LLM Reasoning
IC-Custom: Diverse Image Customization via In-Context Learning
VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
Measuring the Intrinsic Dimension of Earth Representations
BoGrape: Bayesian optimization over graphs with shortest-path encoded
HiCache: A Plug-in Scaled-Hermite Upgrade for Taylor-Style Cache-then-Forecast Diffusion Acceleration
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
Physically-Guided Optical Inversion Enable Non-Contact Side-Channel Attack on Isolated Screens
Scaling Multi-Task Bayesian Optimization with Large Language Models
Pose-RFT: Aligning MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning
Spatially Guided Training for Vision-Language-Action Model
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Astraea: A Token-wise Acceleration Framework for Video Diffusion Transformers
AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent
Initialization Schemes for Kolmogorov–Arnold Networks: An Empirical Study
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction as Reasoning
GranViT: A Fine-Grained Vision Model For Autoregressive Multimodal Large Language Models
Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation
TIPO: Text to Image with Text Pre-sampling for Prompt Optimization
Copy-Paste to Mitigate Large Language Model Hallucinations
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Video Scene Segmentation with Genre and Duration Signals
Sparse Attention Adaptation for Long Reasoning
Short Window Attention Enables Long-Term Memorization
DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle
Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
Eigen-Agent: Adaptive Multi-Agent Scientific Reasoning with Monitor-Based RAG
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
VoMP: Predicting Volumetric Mechanical Property Fields
Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
Diffusion Alignment as Variational Expectation-Maximization
GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs
Zero-shot Forecasting by Simulation Alone
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions
GuardAlign: Test-time Safety Alignment in Multimodal Large Language Models
GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
SigLIP-HD by Fine-to-Coarse Supervision
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
Best-of-three-worlds Analysis for Dueling Bandits with Borda Winner
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters
DistDF: Time-series Forecasting Needs Joint-distribution Wasserstein Alignment
Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation
Causal Imitation Learning under Expert-Observable and Expert-Unobservable Confounding
ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning
SpatialHand: Generative Object Manipulation from 3D Perspective
Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
Characterizing Pattern Matching and Its Limits on Compositional Task Structures
MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
LogiConBench: Benchmarking Logical Consistencies of LLMs
DeRaDiff: Denoising Time Realignment of Diffusion Models
Learning to Lie: Adversarial Attacks on Human-AI Teams and LLMs
KeepLoRA: Continual Learning with Residual Gradient Adaptation
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
Concept-based Adversarial Attack: a Probabilistic Perspective
d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
Text2Grad: Reinforcement Learning from Natural Language Feedback
Group Verification-based Policy Optimization for Interactive Coding Agents
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
The Matthew Effect of AI Programming Assistants: A Hidden Bias in Software Evolution
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing
Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs with Application to Glucose Prediction
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with $\boldsymbol{f}$-SoftArgmax Parameterization & Coupled Regularization
References Improve LLM Alignment in Non-Verifiable Domains
Many Eyes, One Mind: Temporal Multi-Perspective and Progressive Distillation for Spiking Neural Networks
Si-GT: Fast Interconnect Signal Integrity Analysis for Integrated Circuit Design via Graph Transformers
Understanding the Dynamics of Forgetting and Generalization in Continual Learning via the Neural Tangent Kernel
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
Empowering LLM Tool Invocation with Tool-call Reward Model
PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting
Light of Normals: Unified Feature Representation for Universal Photometric Stereo
Splat Feature Solver
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
BioMD: All-atom Generative Model for Biomolecular Dynamics Simulation
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Tree-based Search
RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training
Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation
SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
Enabling Your Forensic Detector Know How Well It Performs on Distorted Samples
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
Depth Anything with Any Prior
Improving Extreme Wind Prediction with Frequency-Informed Learning
Large Depth Completion Model from Sparse Observations
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
GOOD: Geometry-guided Out-of-Distribution Modeling for Open-set Test-time Adaptation in Point Cloud Semantic Segmentation
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Beyond Skeletons: Learning Animation Directly from Driving Videos with Same2X Training Strategy
Communication-Efficient Decentralized Optimization via Double-Communication Symmetric ADMM
Deep Learning for Subspace Regression
Hierarchical Encoding Tree with Modality Mixup for Cross-modal Hashing
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
DeepPrim: a Physics-Driven 3D Short-term Weather Forecaster via Primitive Equation Learning
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD
Entropy-preserving reinforcement learning
PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation
Variation in Verification: Understanding Verification Dynamics in Large Language Models
PathChat-SegR1: Reasoning Segmentation in Pathology via SO-GRPO
LeRobot: An Open-Source Library for End-to-End Robot Learning
Diffusion Negative Preference Optimization Made Simple
Mathesis: Towards Formal Theorem Proving from Natural Languages
Vision-SR1: Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization
SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination
Quartet of Diffusions: Structure-Aware Point Cloud Generation through Part and Symmetry Guidance
Anime-Ready: Controllable 3D Anime Character Generation with Body-Aligned Component-Wise Garment Modeling
Predictive CVaR Q-learning
Referring Layer Decomposition
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
CARPRT: Class-Aware Zero-Shot Prompt Reweighting for Vision-Language Model
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
Missingness Bias Calibration in Feature Attribution Explanations
Difference-Aware Retrieval Policies for Imitation Learning
CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation
From movement to cognitive maps: recurrent neural networks reveal how locomotor development shapes hippocampal spatial coding
Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
DES-LOC: Desynced Low Communication Adaptive Optimizers for Foundation Models
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
Disentangled Hierarchical VAE for 3D Human-Human Interaction Generation
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning
Omni-Weather: A Unified Multimodal Model for Weather Radar Understanding and Generation
Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models
Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning
AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval
MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
Neural Posterior Estimation with Latent Basis Expansions
Counterfactual Explanations on Robust Perceptual Geodesics
Your Language Model Secretly Contains Personality Subnetworks
Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Prediction with Expert Advice under Local Differential Privacy
GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Loc$^{2}$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching
MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation