ICLR
2026
Papers
FastVMT: Eliminating Redundancy in Video Motion Transfer
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
AFD-INSTRUCTION: A Comprehensive Antibody Instruction Dataset with Functional Annotations for LLM-Based Understanding and Design
SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
Traceable Black-Box Watermarks For Federated Learning
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
How to Cure Newton for Unlearning Neural Networks? An Empirical Study from the Hessian Perspective
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
FALCON: Few-step Accurate Likelihoods for Continuous Flows
When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
Multi-state Protein Sequence Design with DynamicMPNN
FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
Leveraging Discrete Function Decomposability for Scientific Design
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models
Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Exploring Synthesizable Chemical Space with Iterative Pathway Refinements
Fast and Interpretable Protein Substructure Alignment via Optimal Transport
OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
TetraGT: Tetrahedral Geometry-Driven Explicit Token Interactions with Graph Transformer for Molecular Representation Learning
Learning Collective Variables from BioEmu with Time-Lagged Generation
Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
Controllable diffusion-based generation for multi-channel biological data
Smooth Calibration Error: Uniform Convergence and Functional Gradient Analysis
Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation
VCWorld: A Biological World Model for Virtual Cell Simulation
Visual Compositional Tuning
Controllable Sequence Editing for Biological and Clinical Trajectories
scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
Learning Explicit Single-Cell Dynamics Using ODE Representations
How To Open the Black Box: Modern Models for Mechanistic Interpretability
Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
Tokenization to Transfer: Do Genomic Foundation Models Learn Good Representations?
Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
Bridging Radiology and Pathology Foundation Models via Concept-Based Multimodal Co-Adaptation
Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
FrugalRAG: Less is More in RL Finetuning for Multi-hop Question Answering
CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning
Nonparametric Contextual Online Bilateral Trade
Unified Brain Surface and Volume Registration
TPDiff: Temporal Pyramid Video Diffusion Model
Distilling and Adapting: A Topology-Aware Framework for Zero-Shot Interaction Prediction in Multiplex Biological Networks
RefineBench: Evaluating Refinement Capability of Language Models via Checklists
Physics vs Distributions: Pareto Optimal Flow Matching with Physics Constraints
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows
What (and What Not) are Calibrated Probabilities Actually Useful for?
Generalized Spherical Neural Operators: Green’s Function Formulation
Incomplete Data, Complete Dynamics: A Diffusion Approach
Neural Multi-Objective Combinatorial Optimization for Flexible Job Shop Scheduling Problems
Fast Convergence of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks
Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
DGNet: Discrete Green Networks for Data-Efficient Learning of Spatiotemporal PDEs
A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems
AetherCode: Evaluating LLMs’ Ability to Win In Premier Programming Competitions
LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations
Divide, Conquer, and Standardize — A Recursive Architecture for Multi-Agent Systems (MAS)
An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets
DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training
BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models
Compositional Diffusion with Guided search for Long-Horizon Planning
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
The 99% Success Paradox: When Near-Perfect Retrieval Equals Random Selection
MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs
RFS: Reinforcement learning with Residual flow steering for dexterous manipulation
VITA: Vision-to-Action Flow Matching Policy
Scaling up Memory for Robotic Control via Experience Retrieval
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling
T-TAMER: Provably Taming Trade-offs in ML Serving
ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
Embodied Navigation Foundation Model
Influence without Confounding: Causal Discovery from Temporal Data with Long-term Carry-over Effects
Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks
Matching without Group Barrier for Heterogeneous Treatment Effect Estimation
GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
Conditional Independent Component Analysis for Estimating Causal Structure with Latent Variables
On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
Pre-training under infinite compute
Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
Multi-Scale Diffusion-Guided Graph Learning with Power-Smoothing Random Walk Contrast for Multi-View Clustering
Bridging ML and algorithms: comparison of hyperbolic embeddings
FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding
Boosting Open Set Recognition Performance through Modulated Representation Learning
Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems
Adaptive Width Neural Networks
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
Random Controlled Differential Equations
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
GRACE: Generative Representation Learning via Contrastive Policy Optimization
TRIDENT: Cross-Domain Trajectory Spatio-Temporal Representation via Distance-Preserving Triplet Learning
Behavior Learning (BL)
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
Beyond Entity Correlations: Disentangling Event Causal Puzzles in Temporal Knowledge Graphs
Learning Unified Representation of 3D Gaussian Splatting
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark
Detect, Decide, Unlearn: A Transfer-Aware Framework for Continual Learning
FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation
Learning Robust Intervention Representations with Delta Embeddings
UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
Inferring the Invisible: Neuro-Symbolic Rule Discovery for Missing Value Imputation
KDP: Simplifying Representation Dynamics in Kernel Space
ARTDECO: Toward High-Fidelity On-the-Fly Reconstruction with Hierarchical Gaussian Structure and Feed-Forward Guidance
LLM Pretraining with Continuous Concepts
AirQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
Effect of Parallel Environments and Rollout Steps in PPO
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
Expert Divergence Learning for MoE-based Language Models
Multilingual Routing in Mixture-of-Experts
TabStruct: Measuring Structural Fidelity of Tabular Data
Proper Velocity Neural Networks
Closing the Modality Gap Aligns Group-Wise Semantics
On the Wasserstein Geodesic Principal Component Analysis of probability measures
VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
Depth Anything 3: Recovering the Visual Space from Any Views
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
Learning Human Habits with Rule-Guided Active Inference
Graphon Cross-Validation: Assessing Models on Network Data
Supporting Multimodal Intermediate Fusion with Informatic Constraint and Distribution Coherence
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching
AsyncBEV: Cross-modal flow alignment in Asynchronous 3D Object Detection
Spatially Informed Autoencoders for Interpretable Visual Representation Learning
PolyGraph Discrepancy: a classifier-based metric for graph generation
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
Knowledge Distillation as Decontamination? Revisiting the “Data Laundering” Concern in Classification Tasks
TokMem: One-Token Procedural Memory for Large Language Models
ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
Gradient-Based Program Synthesis with Neurally Interpreted Languages
Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
Prior-free Tabular Test-time Adaptation
TRAC: Tensor-Train based Across-layer Compression for Parameter-Efficient Fine-Tuning
Beyond Student: An Asymmetric Network for Neural Network Inheritance
Adaptive Mesh Quantization for Neural PDE Solvers
GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback
Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
Evaluating GFlowNet from partial episodes for stable and flexible policy-based training
Score-Based Density Estimation from Pairwise Comparisons
RNE: plug-and-play diffusion inference-time control and energy-based training
Scalable Random Wavelet Features: Efficient Non-Stationary Kernel Approximation with Convergence Guarantees
CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
JAPAN: Joint Adaptive Prediction Areas with Normalising Flow
Wait, Do We Need to Wait? Revisiting Budget Forcing for Sequential Test-Time Scaling
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
Accelerated Parallel Tempering via Neural Transports
Latent Geometry-Driven Network Automata for Complex Network Dismantling
When and Where to Reset Matters for Long-Term Test-Time Adaptation
Supporting High-Stakes Decision Making Through Interactive Preference Elicitation in the Latent Space
CoLA: Co-Calibrated Logit Adjustment for Long-Tailed Semi-Supervised Learning
Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges
Train on Validation (ToV): Fast data selection with applications to fine-tuning
Enhanced Generative Model Evaluation with Clipped Density and Coverage
On the Ability of Deep Networks to Learn Symmetries from Data – A Neural Kernel Theory
Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data
Pretraining with hierarchical memories: separating long-tail and common knowledge
Soft Quality-Diversity Optimization
Dynamic Parameter Reuse Augments Reasoning via Latent Chain of Thought
Rethinking Continual Learning with Progressive Neural Collapse
Random-projection ensemble dimension reduction
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
A Statistical Benchmark for Diffusion-Posterior-Sampling Algorithms
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks
Null-Space Filtering for Data-Free Continual Model Merging: Preserving Stability, Promoting Plasticity
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
LMask: Learn to Solve Constrained Routing Problems with Lazy Masking
Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts
CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design
KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation
Station2Radar: Query‑Conditioned Gaussian Splatting for Precipitation Field
Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity
Large Language Model Compression with Global Rank and Sparsity Optimization
Visualizing LLM Latent Space Geometry Through Dimensionality Reduction
ConRep4CO: Contrastive Representation Learning of Combinatorial Optimization Instances across Types
Align-SAM: Seeking Flatter Minima for Better Cross-Subset Alignment
Scalable and Adaptive Trust-Region Learning via Projection Convex Hull
FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization
Native Adaptive Solution Expansion for Diffusion-based Combinatorial Optimization
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
Deep FlexQP: Accelerated Nonlinear Programming via Deep Unfolding
Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs
DISK: Differentiable Sparse Kernel Complex for Efficient Spatially-Variant Convolution
A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints
Riemannian Optimization on Relaxed Indicator Matrix Manifold
Discretisation invariance
Provably Accelerated Imaging with Restarted Inertia and Score-based Image Priors
Efficient Submodular Maximization for Sums of Concave over Modular Functions
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
The Power of Small Initialization in Noisy Low-Tubal-Rank Tensor Recovery
Reducing Contextual Stochastic Bilevel Optimization via Structured Function Approximation
Off-Policy Safe Reinforcement Learning with Cost-Constrained Optimistic Exploration
Proximal Diffusion Neural Sampler
Local Entropy Search over Descent Sequences for Bayesian Optimization
Improving LLM-based Global Optimization with Search Space Partitioning
Thompson Sampling via Fine-Tuning of LLMs
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
Error Feedback for Muon and Friends
Byzantine-Robust Federated Learning with Learnable Aggregation Weights
SpareTrain: Fault-Tolerant LLM Training via Low-Cost Dual Modular Redundancy
Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
Efficient Differentiable Contact Model with Long-range Influence
Composite Optimization with Error Feedback: the Dual Averaging Approach
Query-Specific Causal Graph Pruning Under Tiered Knowledge
Using Graph Neural Networks in Reinforcement Learning: A Practical Guide
DeMo: Decoupled Momentum Optimization
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
A Memory-Efficient Hierarchical Algorithm for Large-scale Optimal Transport Problems
Beyond Outliers: A Study of Optimizers Under Quantization
Improved $\ell_{p}$ Regression via Iteratively Reweighted Least Squares
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
Sublinear Time Quantum Algorithm for Attention Approximation
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
Adaptive Nonlinear Compression for Large Foundation Models
Alignment-Enhanced Integration of Connectivity and Spectral Sparsity in Dynamic Sparse Training of LLM
Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models
NeoBERT: A Next Generation BERT
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
DemoGrasp: Universal Dexterous Grasping from a Single Demonstration
Learning Dynamics of Logits Debiasing for Long-Tailed Semi-Supervised Learning
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
SliderQuant: Accurate Post-Training Quantization for LLMs
Lookup multivariate Kolmogorov-Arnold Networks
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
Test-Time Training Done Right
Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
Neodragon: Mobile Video Generation Using Diffusion Transformer
MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models
Fresh in memory: Training-order recency is linearly encoded in language model activations
Rethinking Residual Errors in Compensation-based LLM Quantization
Scalable Second-order Riemannian Optimization for $K$-means Clustering
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data
Extracting Model Precision from 20 Logprobs
Is the evidence in 'Language Models Learn to Mislead Humans via RLHF' valid?
Constraint-guided Hardware-aware NAS through Gradient Modification
CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts
DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging
PYRREGULAR: A Unified Framework for Irregular Time Series, with Classification Benchmarks
Beyond Uniformity: Sample and Frequency Meta Weighting for Post-Training Quantization of Diffusion Models
SmellNet: A Dataset for Sensor-Based Smell Recognition and Mixture Prediction
MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning
Towards a Transferable Acceleration Method for Density Functional Theory
Square Peg, Round Hole: Plugging Non-Sequential Data into Sequential Language Models
Temporal Generalization: A Reality Check
On the Interaction of Compressibility and Adversarial Robustness
From Data Statistics to Feature Geometry: How Correlations Shape Superposition
From REINFORCE to Dr. GRPO: A Unified Perspective on LLM Post-Training
LoRAGen: Structure-Aware Weight Space Learning for LoRA Generation
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
Amortising Inference and Meta-Learning Priors in Neural Networks
Diffusion Transformers with Representation Autoencoders
Temporal Test-Time Adaptation with State-Space Models
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
Deconstructing Positional Information: From Attention Logits to Training Biases
NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
Corner Gradient Descent
EIP: Weighted Ranking of LLMs by Quantifying Question Difficulty
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
Light Differentiable Logic Gate Networks
TD-MoE: Tensor Decomposition for MoE Models
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models
SysMoBench: Evaluating AI on Formally Specifying Complex Real-World Systems
Attend to the Active: Structure-Aware Dynamic Attention in LLMs for Compositional Instruction Following
TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks
ViMo: A Generative Visual GUI World Model for App Agents
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
DPad: Efficient Diffusion Language Models with Suffix Dropout
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality
Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses
Token Alignment Heads: Unveiling Attention's Role in LLM Multilingual Translation
LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks
Log-Linear Attention
Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel
QuoKA: Query-Oriented KV Selection for Efficient LLM Prefill
The Counting Power of Transformers
Selective Rotary Position Embedding
Critical attention scaling in long-context transformers
Emergent Discrete Controller Modules for Symbolic Planning in Transformers
Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
Towards Dynamic Interleaving Optimizers
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
Group Representational Position Encoding
Membership Inference Attacks Against Fine-tuned Diffusion Language Models
A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
Derandomized Online-to-Non-convex Conversion for Stochastic Weakly Convex Optimization
Spectral Attention Steering for Prompt Highlighting
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Music Flamingo: Scaling Music Understanding in Audio Language Models
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
Let's (not) just put things in Context: Test-time Training for Long-context LLMs
AlignFlow: Improving Flow-based Generative Models with Semi-Discrete Optimal Transport
OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Online Inventory Optimization in Non-Stationary Environment
The Spacetime of Diffusion Models: An Information Geometry Perspective
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
MoM: Linear Sequence Modeling with Mixture-of-Memories
Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design
Condition Matters in Full-head 3D GANs
Riemannian Variational Flow Matching for Material and Protein Design
Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes
DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
Learning to Reason Efficiently with Discounted Reinforcement Learning
Unveiling the Basin-Like Loss Landscape in Large Language Models
Layerwise Federated Learning for Heterogeneous Quantum Clients using Quorus
Parameterized Hardness of Zonotope Containment and Neural Network Verification
GenDR: Lighten Generative Detail Restoration
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
vCache: Verified Semantic Prompt Caching
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning
On the Mechanisms of Collaborative Learning in VAE Recommenders
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
Latent Stochastic Interpolants
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
Confident and Adaptive Generative Speech Recognition via Risk Control
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
End-to-End Probabilistic Framework for Learning with Hard Constraints
Follow-Your-Preference: Towards Preference-Aligned Image Inpainting
MnemoDyn: Learning Resting State Dynamics from $40$K FMRI sequences
Trust The Typical
Uncertainty Estimation via Hyperspherical Confidence Mapping
In Good GRACES: Principled Teacher Selection for Knowledge Distillation
Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
Motion-R1: Enhancing Motion Generation with Decomposed Chain-of-Thought and RL Binding
Precise and Interpretable Editing of Code Knowledge in Large Language Models
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Unifying Stable Optimization and Reference Regularization in RLHF
Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation
Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
Gumbel Distillation for Parallel Text Generation
Generative Human Geometry Distribution
On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value
Setting the Record Straight on Transformer Oversmoothing
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Free Lunch for Stabilizing Rectified Flow Inversion
REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering
ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping
Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
Steering Autoregressive Music Generation with Recursive Feature Machines
Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
FACT: Fine-grained Across-variable Convolution for Multivariate Time Series Forecasting
Carré du champ flow matching: better quality-generalisation tradeoff in generative models
Perturbed Dynamic Time Warping: A Probabilistic Framework and Generalized Variants
FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion
Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
A General Spatio-Temporal Backbone with Scalable Contextual Pattern Bank for Urban Continual Forecasting
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
Scalable Energy-Based Models via Adversarial Training: Unifying Discrimination and Generation
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
In-Context Algebra
Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models
From U-Nets to DiTs: The Architectural Evolution of Text-to-Image Diffusion Models (2021–2025)
Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
Cross-Modal Redundancy and the Geometry of Vision–Language Embeddings
A Bayesian Nonparametric Framework For Learning Disentangled Representations
Generalization of Diffusion Models Arises with a Balanced Representation Space
VisCoder2: Building Multi-Language Visualization Coding Agents
GGBall: Graph Generative Model on Poincaré Ball
CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling
Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
Learn to Guide Your Diffusion Model
Provable Separations between Memorization and Generalization in Diffusion Models
Predicting LLM Output Length via Entropy-Guided Representations
Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
The human knowledge loophole in the 'bitter lesson' for LLMs
Assessing Robustness via Score-Based Adversarial Image Generation
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
Eliminating VAE for Fast and High-Resolution Generative Detail Restoration
ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies
Efficient Autoregressive Inference for Transformer Probabilistic Models
Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
Generating Directed Graphs with Dual Attention and Asymmetric Encoding
Consistent Text-to-Image Generation via Scene De-Contextualization
Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
MambaSL: Exploring Single-Layer Mamba for Time Series Classification
DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models
Tracing the Principles Behind Modern Diffusion Models
Diffusion as Infinite HVAEs: Do Diffusion Models Generalize Better than Deep VAEs?
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Flow Actor-Critic for Offline Reinforcement Learning
On the Design of One-step Diffusion via Shortcutting Flow Paths
Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
AlphaFlow: Understanding and Improving MeanFlow Models
Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets
InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
SCOPED: Score–Curvature Out-of-distribution Proximity Evaluator for Diffusion
Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
ODNet: Opinion Dynamics-Inspired Neural Message Passing for Graphs and Hypergraphs
GneissWeb: Preparing High Quality Data for LLMs at Scale
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
ConfHit: Conformal Generative Design with Oracle-Free Guarantees
Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives
Edit-Based Flow Matching for Temporal Point Processes
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping
JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
Trade-offs in LLM Compute for Reasoning-Intensive Information Retrieval
Projected Coupled Diffusion for Test-Time Constrained Joint Generation
ICDiffAD: Implicit Conditioning Diffusion Model for Time Series Anomaly Detection
Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection
Sheaves Reloaded: A Direction Awakening
GraphOmni: A Comprehensive and Extensible Benchmark Framework for Large Language Models on Graph-theoretic Tasks
Pitfalls in Evaluating Language Model Forecasters
Multi-Scale Hypergraph Meets LLMs: Aligning Large Language Models for Time Series Analysis
An Information-Theoretic Lower Bound on the Generalization Error of Autoencoders
Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding
Cooperative Sheaf Neural Networks
TandemFoilSet: Datasets for Flow Field Prediction of Tandem-Airfoil Through the Reuse of Single Airfoils
Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning
Composable Sparse Subnetworks via Maximum-Entropy Principle
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs
On Universality of Deep Equivariant Networks
One for Two: A Unified Framework for Imbalanced Graph Classification via Dynamic Balanced Prototype
Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs
Noise Tolerance of Distributionally Robust Learning
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models
Virtual Community: An Open World for Humans, Robots, and Society
Distributionally Robust Linear Regression with Block Lewis Weights
Point-wise Anomaly Detection via Fold-bifurcation ODE
Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
Distributed Quasi-Newton Method for Fair and Fast Federated Learning
Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method
Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning
When Flatness Does (Not) Guarantee Adversarial Robustness
Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets
ICYM$^2$I: The illusion of multimodal informativeness under missingness
Enhancing Learning with Noisy Labels via Rockafellian Relaxation
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Sequences of Logits Reveal the Low Rank Structure of Language Models
Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models
How Transformers Learn Causal Structures In-Context: Explainable Mechanism Meets Theoretical Guarantee
Finite-Time Convergence Analysis of ODE-based Generative Models for Stochastic Interpolants
From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
The Serial Scaling Hypothesis
Fisher-Rao Sensitivity for Out-of-Distribution Detection in Deep Neural Networks
The Coverage Principle: How Pre-Training Enables Post-Training
A New Initialization to Control Gradients in Sinusoidal Neural Networks
Synergistic Benefits of Joint Molecule Generation and Property Prediction
Does Weak-to-strong Generalization Happen under Spurious Correlations?
Self-Destructive Language Models
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling
DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs
LC-PLM: Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers
Probability Distributions Computed by Autoregressive Transformers
Never Saddle for Reparameterized Steepest Descent as Mirror Flow
Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws
Probing in the Dark: State Entropy Maximization for POMDPs
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment
Early Signs of Steganographic Capabilities in Frontier LLMs
SSDi8: Accurate and Efficient 8-bit Quantization for State Space Duality
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
CORDS - Continuous Representations of Discrete Structures
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
Path Matters: Unveiling Geometric Implicit Bias via Curvature-Aware Sparse View Optimization
CoMem: Compositional Concept-Graph Memory for Vision–Language Adaptation
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
Stable and Scalable Deep Predictive Coding Networks with Meta-Prediction Errors
Reversible Primitive–Composition Alignment for Continual Vision–Language Learning
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
Latent-Guided Reasoning: Empowering Small LLMs with Large-Model Thinking
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
Internal Evaluation of Density-Based Clusterings with Noise
Distribution-Aware Multi-Granularity Phase Coding: Towards Lower Conversion Error for Spike-Driven Large Language Models
DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
Does Higher Interpretability Imply Better Utility? A Pairwise Analysis on Sparse Autoencoders
INSTANT: Compressing Gradients and Activations for Resource-Efficient Training
GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
Enabling Fine-Tuning of Direct Feedback Alignment via Feedback-Weight Matching
MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Tracking Equivalent Mechanistic Interpretations Across Neural Networks
GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection
MLP Memory: A Retriever-Pretrained Memory for Large Language Models
Bures-Isotropy Alignment: Manifold Learning of Generalized Category Discovery
PALC: Preference Alignment via Logit Calibration
ConvT3: Structured State Kernels for Convolutional State Space Models
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
Do We Really Need Permutations? Impact of Model Width on Linear Mode Connectivity
T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation
Decomposition of Concept-Level Rules in Visual Scenes
Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Aligning Deep Implicit Preferences by Learning to Reason Defensively
Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
TileLang: Bridge Programmability and Performance in Modern Neural Kernels
Predicting LLM Reasoning Performance with Small Proxy Model
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents
Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction
DeepSADR: Deep Transfer Learning with Subsequence Interaction and Adaptive Readout for Cancer Drug Response Prediction
Fair Reinforcement Learning for Just AI
GRAM-DTI: Adaptive Multimodal Representation Learning for Drug–Target Interaction Prediction
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
SigmaDock: Untwisting Molecular Docking with Fragment-Based SE(3) Diffusion
How to Lose Inherent Counterfactuality in Reinforcement Learning
Geometric Graph Neural Diffusion for Stable Molecular Dynamics Simulations
PepBenchmark: A Standardized Benchmark for Peptide Machine Learning
Faster SVD via Accelerated Newton-Schulz Iteration
FlexRibbon: Joint Sequence and Structure Pretraining for Protein Modeling
Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge
Test-Time Adaptation without Source Data for Out-of-Domain Bioactivity Prediction
Online Learning and Equilibrium Computation with Ranking Feedback
VenusX: Unlocking Fine-Grained Functional Understanding of Proteins
Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning
PINFDiT: Energy-Based Physics-Informed Diffusion Transformers for General-purpose Time Series Tasks
Constrained Diffusion for Protein Design with Hard Structural Constraints
Numerion: A Multi-Hypercomplex Model for Time Series Forecasting
MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles
GAGA: Gaussianity-Aware Gaussian Approximation for Efficient 3D Molecular Generation
Learning from the Electronic Structure of Molecules across the Periodic Table
Drugging the Undruggable: Benchmarking and Modeling Fragment-Based Screening
Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models
TianQuan-S2S: A Subseasonal-to-Seasonal Global Weather Model via Incorporate Climatology State
Extreme Weather Nowcasting via Local Precipitation Pattern Prediction
Shrinking Proteins with Diffusion
CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction
CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
A Joint Diffusion Model with Pre-Trained Priors for RNA Sequence-Structure Co-Design
HeurekaBench: A Benchmarking Framework for AI Co-scientist
SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
AntigenLM: Structure-Aware DNA Language Modeling for Influenza
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
CellAgent: LLM-Driven Multi-Agent Framework for Natural Language-Based Single-Cell Analysis
A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input
Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
A Resolution-Agnostic Geometric Transformer for Chromosome Modeling Using Inertial Frame
Exploring Diverse Generation Paths via Inference-time Stiefel Activation Steering
Learning linear state-space models with sparse system matrices
Benchmarking ECG FMs: A Reality Check Across Clinical Tasks
Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
Can we generate portable representations for clinical time series data using LLMs?
Evolution of Flash Attention
From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
A Structured, Tagged, and Localized Visual Question Answering Dataset with Full Sentence Answers and Scene Graphs for Chest X-ray Images
Nef-Net v2: Adapting Electrocardio Panorama in the wild
Understanding and improving Shampoo and SOAP via Kullback-Leibler Minimization
FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound
ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Improving 2D Diffusion Models for 3D Medical Imaging with Inter-Slice Consistent Stochasticity
Locally Subspace-Informed Neural Operators for Efficient Multiscale PDE Solving
Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving
Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
Bayesian Parameter Shift Rules in Variational Quantum Eigensolvers
Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials
The Tutor-Pupil Augmentation: Enhancing Learning and Interpretability via Input Corrections
OrthoSolver: A Neural Proper Orthogonal Decomposition Solver For PDEs
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
Vision-Language-Action Instruction Tuning: From Understanding to Manipulation
Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
Translating Flow to Policy via Hindsight Online Imitation
Multi-Synaptic Cooperation: A Bio-Inspired Framework for Robust and Scalable Continual Learning
RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks
RAP: 3D Rasterization Augmented End-to-End Planning
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
QVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization
ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning
Unified Vision-Language-Action Model
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Beyond Sequential Reranking: Reranker-Guided Search Improves Reasoning Intensive Retrieval
Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
Differentiable Simulation of Hard Contacts with Soft Gradients for Learning and Control
TaCo: A Benchmark for Lossless and Lossy Codecs of Heterogeneous Tactile Data
Content Promotion as a Strategic Game: How to Design Agentic Publishers for the Evolving Search Ecosystem in the GenAI Era?
AgentFold: Long-Horizon Web Agents with Proactive Context Folding
CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs?
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System
Training Large Language Models To Reason In Parallel With Global Forking Tokens
Verifier-free Test-Time Sampling for Vision-Language-Action Models
Lifelong Embodied Navigation Learning
Rethinking Code Similarity for Automated Algorithm Design with LLMs
Masked Generative Policy for Robotic Control
Hybrid Training for Vision-Language-Action Models
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Weight-Space Linear Recurrent Neural Networks
Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models
PMDformer: Patch-Mean Decoupling Information Transformer for Long-term Forecasting
Splat Regression Models
A Faster Parameter-Free Regret Matching Algorithm
Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting
Mean-Field Neural Differential Equations: A Game-Theoretic Approach to Sequence Prediction
Learning Koopman Representations with Controllability Guarantees
Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting
Micro-Macro Coupled Koopman Modeling on Graph for Traffic Flow Prediction
Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction
Adaptive Social Learning via Mode Policy Optimization for Language Agents
Reliable Probabilistic Forecasting of Irregular Time Series through Marginalization-Consistent Flows
STABLE: Shift-Tolerant Allocation via Black-Litterman Using Conditional Diffusion Estimates
Aurora: Towards Universal Generative Multimodal Time Series Forecasting
DeNOTS: Stable Deep Neural ODEs for Time Series
TS-DDAE: A Novel Temporal-Spectral Denoising Diffusion AutoEncoder for Wireless Signal Recognition Model Pre-training
Language as a Window Into the Mind: How NLP and LLMs Advance Human Sciences
ST-HHOL: Spatio-Temporal Hierarchical Hypergraph Online Learning for Crime Prediction
TIMESLIVER: Symbolic-Linear Decomposition for Explainable Time Series Classification
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
Learning Facts at Scale with Active Reading
CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
When Foundation Models are One-Liners: Limitations and Future Directions for Time Series Anomaly Detection
TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale
Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?
PGRF-Net: A Prototype-Guided Relational Fusion Network for Diagnostic Multivariate Time-Series Anomaly Detection
Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
CRONOS: Continuous time reconstruction for 4D medical longitudinal series
CTBench: Cryptocurrency Time Series Generation Benchmark
MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss
Semantic-Enhanced Time-Series Forecasting via Large Language Models
Towards True Speech-to-Speech Models Without Text Guidance
Faster Diffusion Through Temporal Attention Decomposition
TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
InnovatorBench: Evaluating Agents’ Ability to Conduct Innovative AI Research
Two-Layer Convolutional Autoencoders Trained on Normal Data Provably Detect Unseen Anomalies
CaTS: Calibrated Test-Time Scaling for Efficient LLM Reasoning
Learning to Play Multi-Follower Bayesian Stackelberg Games
Token-Efficient Long-Term Interest Sketching and Internalized Reasoning for LLM-based Recommendation
STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
An Overview of Subliminal Learning
Rethinking Reasoning in Document Ranking: Why Chain-of-Thought Falls Short
Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning
In-context learning of representations can be explained by induction circuits
RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers
LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Reliable Fine-Grained Evaluation of Natural Language Math Proofs
Fixing the Broken Compass: Diagnosing and Improving Inference-Time Reward Modeling
Do LLMs Forget What They Should? Evaluating In-Context Forgetting in Large Language Models
Tackling the XAI Disagreement Problem with Adaptive Feature Grouping
DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS
Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval
Plan-Answer-Refine-on-Graph: Structured Planning and Self-Refinement for Large Language Model Reasoning on Knowledge Graphs
Are LLMs Really Not Knowledgeable? Mining the Submerged Knowledge in LLMs' Memory
REMem: Reasoning with Episodic Memory in Language Agent
Multimodal Policy Internalization for Conversational Agents
Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models
Improving Attributed Long-form Question Answering with Intent Awareness
GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics
Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis
LightMem: Lightweight and Efficient Memory-Augmented Generation
From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization
How Reliable is Language Model Micro-Benchmarking?
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning
PrismAudio: Decomposed Chain-of-Thought and Multi-dimensional Rewards for Video-to-Audio Generation
Efficient Reasoning with Balanced Thinking
RM-R1: Reward Modeling as Reasoning
Flow Map Learning Via Non-Gradient Vector Flow
Beyond Uniformity: Regularizing Implicit Neural Representations through a Lipschitz Lens
Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization
Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs
The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Teach2Eval: An Interaction-Driven LLMs Evaluation Method via Teaching Effectiveness
MILCO: Learned Sparse Retrieval Across Languages via a Multilingual Connector
BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization
Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning
Arbitrary-Shaped Image Generation via Spherical Neural Field Diffusion
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
CNN Interpretability with Multivector Tucker Saliency Maps for Self-Supervised Models
Scaling Knowledge Graph Construction through Synthetic Data Generation and Distillation
On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation
Credit-Budgeted ICPC-Style Coding: When Agents Must Pay for Every Decision
LLMs Get Lost In Multi-Turn Conversation
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling
Toward Complex-Valued Neural Networks for Waveform Generation
FSPO: Few-Shot Optimization of Synthetic Preferences Effectively Personalizes to Real Users
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation
Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation
MENLO: From Preferences to Proficiency – Evaluating and Modeling Native-like Quality Across 47 Languages
DiscoX: Benchmarking Discourse-Level Translation in Expert Domains
EntropyLong: Effective Long-Context Training via Predictive Uncertainty
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
Influence-Preserving Proxies for Gradient-Based Data Selection in LLM FineTuning
SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
End-to-end Listen, Look, Speak and Act
DRBench: A Realistic Benchmark for Enterprise Deep Research
Gogo: Group-wise granularity-ordered codec for stable and efficient speech generation
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
Towards Understanding Valuable Preference Data for Large Language Model Alignment
FlexiVoice: Enabling Flexible Style Control in Zero-Shot TTS with Natural Language Instructions
Rethinking LLM Reasoning: From Explicit Trajectories to Latent Representations
PT$^2$-LLM: Post-Training Ternarization for Large Language Models
Scaling Generalist Data-Analytic Agents
FlowSearcher: Synthesizing Memory-Guided Agentic Workflows for Web Information Seeking
Long Chain-of-Thought Reasoning Across Languages
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
Can Speech LLMs Think while Listening?
Same Content, Different Representations: A Controlled Study for Table QA
Process-Verified Reinforcement Learning for Theorem Proving via Lean
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction
Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
Dissecting Non-Determinism in Large Language Models
Token-Based Audio Inpainting via Discrete Diffusion
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
Enhancing Sparse Event Detection in Healthcare Time-Series via Adaptive Gate of Context–Detail Interaction
Representational Alignment Across Model Layers and Brain Regions with Multi-Level Optimal Transport
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion
LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking
Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator
Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
Hippoformer: Integrating Hippocampus-inspired Spatial Memory with Transformers
RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization
An Information-Theoretic Framework For Optimizing Experimental Design To Distinguish Probabilistic Neural Codes
Autoregressive Visual Decoding from EEG Signals
Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
Model-Guided Microstimulation Steers Primate Visual Behavior
Learning from Synthetic Data Improves Multi-hop Reasoning
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
Learning Brain Representation with Hierarchical Visual Embeddings
MoGen: Detailed Neuronal Morphology Generation via Point Cloud Flow Matching
Inducing Dyslexia in Vision Language Models
Riemannian High-Order Pooling for Brain Foundation Models
A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks
Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
Learning Mixtures of Linear Dynamical Systems via Hybrid Tensor-EM Method
Accelerating Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection
Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining
Modeling Others' Minds as Code
Continuous multinomial logistic regression for neural decoding
Towards Lossless Memory-efficient Training of Spiking Neural Networks via Gradient Checkpointing and Spike Compression
Neural Dynamics Self-Attention for Spiking Transformers
Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
Neuro-Symbolic Decoding of Neural Activity
QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities
AtC: Aggregate-then-Calibrate for Human-centered Assessment
On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization
Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
Transducing Language Models
A Hierarchical Circuit Symbolic Discovery Framework for Efficient Logic Optimization
Read the Room: Video Social Reasoning with Mental-Physical Causal Chains
Calibrating Verbalized Confidence with Self-Generated Distractors
LaVCa: LLM-assisted Visual Cortex Captioning
Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
MindMix: A Multimodal Foundation Model for Auditory Perception Decoding via Deep Neural-Acoustic Alignment
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators
TINY BUT MIGHTY: A SOFTWARE-HARDWARE CO-DESIGN APPROACH FOR EFFICIENT MULTIMODAL INFERENCE ON BATTERY-POWERED SMALL DEVICES
MIAM: Modality Imbalance-Aware Masking for Multimodal Ecological Applications
Real-Time Reasoning Agents in Evolving Environments
FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving
Evolving Graph Structured Programs for Circuit Generation with Large Language Models
K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control
Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
SpectraLLM: Uncovering the Ability of LLMs for Molecule Structure Elucidation from Multi-Spectra
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
SWERank: Software Issue Localization with Code Ranking
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
Virne: A Comprehensive Benchmark for RL-based Network Resource Allocation in NFV
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
Human Behavior Atlas: Benchmarking Unified Psychological And Social Behavior Understanding
PAGE-4D: Disentangled Pose and Geometry Estimation for VGGT-4D Perception
ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework
Prior-aware and Context-guided Group Sampling for Active Probabilistic Subsampling
Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models
Dynamic Classifier-Free Diffusion Guidance via Online Feedback
I-DRUID: Layout to image generation via instance-disentangled representation and unpaired data
Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning
SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery
Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences
Error as Signal: Stiffness-Aware Diffusion Sampling via Embedded Runge-Kutta Guidance
FlowGen: Synthesizing Diverse Flowcharts to Enhance and Benchmark MLLM Reasoning
Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization
FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing
LapFlow: Laplacian Multi-scale Flow Matching for Generative Modeling
Long-Text-to-Image Generation via Compositional Prompt Decomposition
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
Monocular Normal Estimation via Shading Sequence Estimation
M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
Overshoot and Shrinkage in Classifier-Free Guidance: From Theory to Practice
On the stability of gradient descent with second order dynamics for time-varying cost functions
Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks
Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry
LearnIR: Learnable Posterior Sampling for Real-World Image Restoration
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
Mitigating Noise Shift in Denoising Generative Models with Noise Awareness Guidance
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
ChronoEdit: Towards Temporal Reasoning for In-Context Image Editing and World Simulation
Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models
D-AR: Diffusion via Autoregressive Models
Closing the Gap Between Text and Speech Understanding in LLMs
Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding
Diffusion Models as Dataset Distillation Priors
Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
Beyond the Known: An Unknown-Aware Large Language Model for Open-Set Text Classification
AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
Recover Cell Tensor: Diffusion-Equivalent Tensor Completion for Fluorescence Microscopy Imaging
Pixel-Perfect Puppetry: Precision-Guided Enhancement for Face Image and Video Editing
Cross-ControlNet: Training-Free Fusion of Multiple Conditions for Text-to-Image Generation
Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter
Point Prompting: Counterfactual Tracking with Video Diffusion Models
ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection
Bridging the Distribution Gap to Harness Pretrained Diffusion Priors for Super-Resolution
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
FlashWorld: High-quality 3D Scene Generation within Seconds
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
Culture in Action: Evaluating Text-to-Image Models through Social Activities
OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
The Intricate Dance of Prompt Complexity, Quality, Diversity and Consistency in T2I Models
LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
Learning to Maximize Rewards via Reaching Goals
Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
Directed Exploration in Reinforcement Learning from Linear Temporal Logic
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control
PixNerd: Pixel Neural Field Diffusion
Temporal Concept Dynamics in Diffusion Models via Prompt-Conditioned Interventions
Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
LightCtrl: Training-free Controllable Video Relighting
cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors
Radiometrically Consistent Gaussian Surfels for Inverse Rendering
Dens3R: A Foundation Model for 3D Geometry Prediction
G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior
Topology-Preserved Auto-regressive Mesh Generation in the Manner of Weaving Silk
True Self-Supervised Novel View Synthesis is Transferable
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos
Budget Alignment: Making Models Reason in the User's Language
DA$^{2}$: Depth Anything in Any Direction
Human3R: Everyone Everywhere All at Once
TTT3R: 3D Reconstruction as Test-Time Training
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
ULTRA-360: Unconstrained Dataset for Large-scale Temporal 3D Reconstruction across Altitudes and Omnidirectional Views
Universal Beta Splatting
Sparkle: A Robust and Versatile Representation for Point Cloud-based Human Motion Capture
STVG-R1: Incentivizing Instance-Level Reasoning and Grounding in Videos via Reinforcement Learning
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
Rethinking Radiology Report Generation: From Narrative Flow to Topic-Guided Findings
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction–Reasoning Synergy
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
Characterizing Deep Research: A Benchmark and Formal Definition
A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
Cat-PO: Cross-modal Adaptive Token-rewards for Preference Optimization in Truthful Multimodal LLMs
TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence
Dynamic Reflections: Probing Video Representations with Text Alignment
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions
VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
Spotlight on Token Perception for Multimodal Reinforcement Learning
Deep Latent Variable Model based Vertical Federated Learning with Flexible Alignment and Labeling Scenarios
Weak-to-Strong Generalization with Failure Trajectories
GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Video
AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer
A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Unified Multi-Modal Interactive and Reactive 3D Motion Generation via Rectified Flow
SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning
CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence under Partial Observations
MergeTune: Continued Fine-Tuning of Vision-Language Models
IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
VisualPRM400K: An Effective Dataset for Training Multimodal Process Reward Models
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
Memento: Toward an All-Day Proactive Assistant for Ultra-Long Streaming Video
Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining
Efficient Multimodal Spatial Reasoning via Dynamic and Asymmetric Routing
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
SPR$^2$Q: Static Priority-based Rectifier Routing Quantization for Image Super-Resolution
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
Unveiling the Cognitive Compass: Theory-of-Mind–Guided Multimodal Emotion Reasoning
Query-Guided Spatial–Temporal–Frequency Interaction for Music Audio–Visual Question Answering
Enhancing Multi-Image Understanding through Delimiter Token Scaling
PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
Exploring the Potential of Encoder-free Architectures in 3D LMMs
Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
Video-LevelGauge: Investigating Contextual Positional Bias in Video Language Models
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
Hallucination-aware Intermediate Representation Edit in Large Vision-Language Models
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
SONIC: Spectral Oriented Neural Invariant Convolutions
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
Omni-IML: Towards Unified Interpretable Image Manipulation Localization
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
SpaCE-Eval: A Benchmark for Real-World Multi-Modal Reasoning
SPIKE-RL: Video-LLMs meet Bayesian Surprise
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
PPE: Positional Preservation Embedding for Token Compression in Multimodal Large Language Models
Multimodal Classification via Total Correlation Maximization
OmniCVR: A Benchmark for Omni-Composed Video Retrieval with Vision, Audio, and Text
WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
GaussianFusion: Unified 3D Gaussian Representation for Multi-Modal Fusion Perception
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insights
You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging
Detective SAM: Adaptive AI-Image Forgery Localization
COMI: Coarse-to-fine Context Compression via Marginal Information Gain
PerFit: Exploring Personalization Shifts in Representation Space of LLMs
ROGA: Scaling Generalist Agents for Office Productivity Tasks via Tool Generation
Hierarchical Prototype Learning for Semantic Segmentation
Action-Guided Attention for Video Action Anticipation
GOLDILOCS: GENERAL OBJECT-LEVEL DETECTION AND LABELING OF CHANGES IN SCENES
Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection
From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning
Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection
Fractional-Order Spiking Neural Network
Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
Procedural Mistake Detection via Action Effect Modeling
FedOpenMatch: Towards Semi-Supervised Federated Learning in Open-Set Environments
Probabilistic Circuits for Uncertainty Quantification
DVLA-RL: Dual-Level Vision–Language Alignment with Reinforcement Learning Gating for Few-Shot Learning
Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence
What happens when generative AI models train recursively on each others' outputs?
Understanding and Fixing Bottlenecks in State Space Models: What Recency and Over-Smoothing Tell Us
Learning Deformable Body Interactions With Adaptive Spatial Tokenization
Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method
Enhancing Vision-Language Model with Unmasked Token Alignment
Discrete Audio Tokens: More Than a Survey!
Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers
GUIDE: Gated Uncertainty-Informed Disentangled Experts for Long-tailed Recognition
Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation
Seeing Through Words: Controlling Visual Retrieval Quality with Language Models
gen2seg: Generative Models Enable Generalizable Instance Segmentation
BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers
Learning AND–OR Templates for Compositional Representation in Art and Design
PixelCraft: A Multi-Agent system for High-Fidelity Visual Reasoning on Structured Images
Content-Aware Mamba for Learned Image Compression
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper
Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
Benchmarking Open-ended Segmentation
Enhancing Vision Transformers for Object Detection via Context-Aware Token Selection and Packing
APT: Towards Universal Scene Graph Generation via Plug-in Adaptive Prompt Tuning
Inlier-Centric Post-Training Quantization for Object Detection Models
PointRePar: SpatioTemporal Point Relation Parsing for Robust Category-Unified 3D Tracking
Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation
Differentially Private Domain Discovery
DPQuant: Efficient and Private Model Training via Dynamic Quantization Scheduling
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
Privacy-Protected Causal Survival Analysis Under Distribution Shift
CIMemories: A Compositional Benchmark For Contextual Integrity In LLMs
Reducing information dependency does not cause training data privacy. Adversarially non-robust features do.
Searching for Privacy Risks in LLM Agents via Simulation
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
General Exploratory Bonus for Optimistic Exploration in RLHF
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
Optimizing Agent Planning for Security and Autonomy
CoFact: Conformal Factuality Guarantees for Language Models under Covariate Shift
Ready For General Agents? Let's test it.
GAVEL: Towards Rule-Based Safety through Activation Monitoring
Loneliness as a Case Study for Social Reward Misalignment
Flow Where You Want
Online Selective Conformal Inference: Errors and Solutions
Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks
Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
Robust LLM Unlearning via Post Judgment and Multi-round Thinking
Spilled Energy in Large Language Models
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective
Learning Dynamic Causal Graphs Under Parametric Uncertainty via Polynomial Chaos Expansions
MOLM: Mixture of LoRA Markers
Computing Equilibrium beyond Unilateral Deviation
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations
Fairness via Independence: A General Regularization Framework for Machine Learning
Attention Smoothing Is All You Need For Unlearning
From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning
Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models
Decoupling the Class Label and the Target Concept in Machine Unlearning
Enhancing Hallucination Detection through Noise Injection
Fairness-Aware Multi-view Evidential Learning with Adaptive Prior
Unlearning Evaluation through Subset Statistical Independence
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders
Debugging Concept Bottleneck Models through Removal and Retraining
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
LLMs Process Lists With General Filter Heads
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
When Thinking Backfires: Mechanistic Insights into Reason-induced Misalignment
Sparse Autoencoders Trained on the Same Data Learn Different Features
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking
Watermarking Diffusion Language Models
The Price of Amortized inference in Sparse Autoencoders
LLM Fingerprinting via Semantically Conditioned Watermarks
Obfuscated Activations Bypass LLM Latent-Space Defenses
Certified Evaluation of Model-Level Explanations for Graph Neural Networks
Learning Concept Bottleneck Models from Mechanistic Explanations
Enhancing Trustworthiness of Fine-Tuned LLMs via Regularized Subset Selection
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
CodeGenGuard: A Watermark for Code Generation Models
How Catastrophic is Your LLM? Certifying Risks in Conversation
SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
Unlocking Long-Horizon Agentic Search with Large-Scale End-to-End RL
MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction
Towards Scalable Oversight via Partitioned Human Supervision
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Convergence of Regret Matching in Potential Games and Constrained Optimization
Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models
SERUM: Simple, Efficient, Robust, and Unifying Marking for Diffusion-based Image Generation
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
MSCR: Exploring the Vulnerability of LLMs’ Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
Fair Decision Utility in Human-AI Collaboration: Interpretable Confidence Adjustment for Humans with Cognitive Disparities
Guidance Watermarking for Diffusion Models
FARI: Robust One-Step Inversion for Watermarking in Diffusion Models
Label Smoothing Improves Machine Unlearning
GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models
Fair Classification by Direct Intervention on Operating Characteristics
Explainable Mixture Models through Differentiable Rule Learning
From Evaluation to Defense: Advancing Safety in Video Large Language Models
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
STEDiff: Revealing the Spatial and Temporal Redundancy of Backdoor Attacks in Text-to-Image Diffusion Models
Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks
Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
Hierarchical Concept-based Interpretable Models
Reward Models Inherit Value Biases from Pretraining
Regulating Internal Alignment Flows for Robust Learning Under Spurious Correlations
Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks
Enhancing Image-Conditional Coverage in Segmentation: Adaptive Thresholding via Differentiable Miscoverage Loss
Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks
SDErasure: Concept-Specific Trajectory Shifting for Concept Erasure via Adaptive Diffusion Classifier
SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis
BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
Readout Representation: Redefining Neural Codes by Input Recovery
FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems by Exploiting Knowledge Asymmetry
GhostEI-Bench: Do Mobile Agent Resilience to Environmental Injection in Dynamic On-Device Environments?
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text
The Achilles’ Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
From "Sure" to "Sorry": Detecting Jailbreak in Large Vision Language Model via JailNeurons
ELEPHANT: Measuring and understanding social sycophancy in LLMs
Beyond Match Maximization and Fairness: Retention-Optimized Two-Sided Matching
STAR: Strategy-driven Automatic Jailbreak Red-teaming For Large Language Model
Fair Graph Machine Learning under Adversarial Missingness Processes
Towards a Theoretical Understanding of In-context Learning: Stability and Non-I.I.D Generalisation
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
PRISON: Unmasking the Criminal Potential of Large Language Models
Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
Mapping Overlaps in Benchmarks through Perplexity in the Wild
Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory?
AutoCode: LLMs as Problem Setters for Competitive Programming
Fed-Duet: Dual Expert-Orchestrated Framework for Continual Federated Vision-Language Learning
Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
WaterDrum: Watermark-based Data-centric Unlearning Metric
UnigramLM: An Attempt at Writing The Missing Manual
CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection
Market Games for Generative Models: Equilibria, Welfare, and Strategic Entry
LDT: Layer-Decomposition Training Makes Networks More Generalizable
Test-time Domain Generalization for Image Super-resolution
Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation
Learning under Quantization for High-Dimensional Linear Regression
Variational Inference for Cyclic Learning
Bayesian Influence Functions for Hessian-Free Data Attribution
Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
The effect of feature resolution on embedding dimension
Finite-Time Analysis of Actor-Critic Methods with Deep Neural Network Approximation
Automata Learning and Identification of the Support of Language Models
An efficient, provably optimal algorithm for the 0-1 loss linear classification problem
Learning the Inverse Temperature of Ising Models under Hard Constraints using One Sample
A Recovery Guarantee for Sparse Neural Networks
Near Optimal Robust Federated Learning Against Data Poisoning Attack
Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Why Adversarially Train Diffusion Models?
Feedback-driven recurrent quantum neural network universality
Towards Persistent Noise-Tolerant Active Learning of Regular Languages with Class Query
Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Implicit Inversion turns CLIP into a Decoder
Characterizing and Mitigating Reasoning Drift in Large Language Models
Neural+Symbolic Approaches for Interpretable Actor-Critic Reinforcement Learning
Provably Explaining Neural Additive Models
GARLIC: Graph Attention-based Relational Learning of Multivariate Time Series in Intensive Care
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
t-SNE Exaggerates Clusters, Provably
Minimax Optimal Adversarial Reinforcement Learning
Aligner, Diagnose Thyself: A Meta-Learning Paradigm for Fusing Intrinsic Feedback in Preference Alignment
Decoupling Positional and Symbolic Attention in Transformers
Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow
Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
Concepts' Information Bottleneck Models
Pyramid Patchification Flow for Visual Generation
Detecting Invariant Manifolds in ReLU-Based RNNs
Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning
Watermark-based Attribution of AI-Generated Content
When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
Statistical Guarantees for Offline Domain Randomization
Token-Importance Guided Direct Preference Optimization
Why Prototypes Collapse: Diagnosing and Preventing Partial Collapse in Prototypical Self-Supervised Learning
Landing with the Score: Riemannian Optimization through Denoising
Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control
LoRA-S: An Efficient Low Rank Adaptation scheme via Sylvester equation
Generating metamers of human scene understanding
Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams
High-Probability Bounds for the Last Iterate of Clipped SGD
Multi-LLM Adaptive Conformal Inference for Reliable LLM Response
Human-LLM Collaborative Feature Engineering for Tabular Data
In-Context Multi-Objective Optimization
A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
Towards Sampling Data Structures for Tensor Products in Turnstile Streams
Graph Random Features for Scalable Gaussian Processes
Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
EditLens: Quantifying the Extent of AI Editing in Text
Singleton-Optimized Conformal Prediction
Is Finer Better? The Limits of Microscaling Formats in Large Language Models
VGR: Visual Grounded Reasoning
Branch and Bound Search for Exact MAP Inference in Credal Networks
Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis
Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions
Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future
Enough is as good as a feast: A Comprehensive Analysis of How Reinforcement Learning Mitigates Task Conflicts in LLMs
Statistical Guarantees for Approximate Stationary Points of Shallow Neural Networks
Tackling Heavy-Tailed Q-Value Bias in Offline-to-Online Reinforcement Learning with Laplace-Robust Modeling
Learning What Matters Now: Dynamic Preference Inference under Contextual Shifts
ROC-n-reroll: How verifier imperfection affects test-time scaling
Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Point-Focused Attention Meets Context-Scan State Space: Robust Biological Visual Perception for Point Cloud Representation
Proximal Supervised Fine-Tuning
Locality-Attending Vision Transformer
Scalable Offline Model-Based RL with Action Chunks
From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism
Preference-based Policy Optimization from Sparse-reward Offline Dataset
Q-Learning with Adjoint Matching
The State of Reinforcement Finetuning for Transformer-based Agents
Peak-Return Greedy Slicing: Subtrajectory Selection for Transformer-based Offline RL
Reevaluating Policy Gradient Methods for Imperfect-Information Games
Part-level Semantic-guided Contrastive Learning for Fine-grained Visual Classification
Model Misspecification in Simulation-Based Inference - Recent Advances and Open Challenges
XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
Bayesian Test-Time Adaptation via Dirichlet feature projection and GMM-Driven Inference for Motor Imagery EEG Decoding
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
Spectral Bellman Method: Unifying Representation and Exploration in RL
On Coreset for LASSO Regression Problem with Sensitivity Sampling
ReVeal: Self-Evolving Code Agents via Reliable Self-Verification
CORE: Concept-Oriented Reinforcement for Bridging the Definition–Application Gap in Mathematical Reasoning
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning
Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning
Flow Matching Policy Gradients
Entropy-Monitored Kernelized Token Distillation for Audio-Visual Compression
Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?
Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
HippoTune: A Hippocampal Associative Loop–Inspired Fine-Tuning Method for Continual Learning
EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget
Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
Learning Massively Multitask World Models for Continuous Control
GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments
RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
Self-Aligned Reward: Towards Effective and Efficient Reasoners
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
Understanding and Improving Hyperbolic Deep Reinforcement Learning
Sample Lottery: Unsupervised Discovery of Critical Instances for LLM Reasoning
ExGRPO: Learning to Reason from Experience
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria
Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Medical thinking with multiple images
Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents
Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
From Parameters to Behaviors: Unsupervised Compression of the Policy Space
GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies
Guided Policy Optimization under Partial Observability
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
LiTo: Surface Light Field Tokenization
MAD-Logic: Multi-Agent Debate Enhances Symbolic Translation and Reasoning
Who Matters Matters: Agent-Specific Conservative Offline MARL
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems
SPACeR: Self-Play Anchoring with Centralized Reference Models
Multi-Agent Guided Policy Optimization
ChemEval: A Multi-level and Fine-grained Chemical Capability Evaluation for Large Language Models
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Negotiated Reasoning: On Provably Addressing Relative Over-Generalization
Learning to summarize user information for personalized reinforcement learning from human feedback
R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability
Sample-Efficient Distributionally Robust Multi-Agent Reinforcement Learning via Online Interaction
Inter-Agent Relative Representations for Multi-Agent Option Discovery
UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction
Multi-Action Self-Improvement For Neural Combinatorial Optimization
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
From Observations to Events: Event-Aware World Models for Reinforcement Learning
Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits
Online Decision Making with Generative Action Sets
Spike-based Digital Brain: a novel fundamental model for brain activity analysis
PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks
Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
RankFlow: Property-aware Transport for Protein Optimization
Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning
Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning
Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
Multi-objective Large Language Model Alignment with Hierarchical Experts
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
Contextual Causal Bayesian Optimisation
A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits
Learning Admissible Heuristics for A*: Theory and Practice
Regret-Guided Search Control for Efficient Learning in AlphaZero
Planning with an Embodied Learnable Memory
From Embedding to Control: Representations for Stochastic Multi-Object Systems
Conformalized Survival Counterfactuals Prediction for General Right-Censored Data
Enhancing Language Model Reasoning with Structured Multi-Level Modeling
Navigating the Manifold — A Geometric Perspective on Diffusion-Based Inverse Problems
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Latent Adaptation of Foundation Policies for Sim-to-Real Transfer
SFT Doesn’t Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
Lipschitz Bandits with Stochastic Delayed Feedback
AdS-GNN - a Conformally Equivariant Graph Neural Network
Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
Go-Browse: Training Web Agents with Structured Exploration
AI Fundamentals: Valuing AI Agents & Data Assets
Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks
Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
Don't Look Up (Every Token): Escaping Quadratic Complexity via Geometric Patterns and Algorithms
Why AI Evaluations Need Error Bars
Simplex Constrained Sparse Optimization via Tail Screening
From Trajectories to Operators — A Unified Flow Map Perspective on Generative Modeling
The Adversarial Conditioning Paradox: Why Attacked Inputs Are More Stable, Not Less
Conformal Robustness Control: A New Strategy for Robust Decision
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
On the Computational Limits of AI4S-RL: A Unified $\varepsilon$-$N$ Analysis
EMFuse: Energy-based Model Fusion for Decision Making
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
A Biologically Plausible Dense Associative Memory with Exponential Capacity
Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
Provable and Practical In-Context Policy Optimization for Self-Improvement
PEAR: Phase Entropy Aware Reward for Efficient Reasoning
Slicing the Gaussian Mixture Wasserstein Distance
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Method
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
COSMO-INR: Complex Sinusoidal Modulation for Implicit Neural Representations
NextQuill: Causal Preference Modeling for Enhancing LLM Personalization
EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
Flow Caching for Autoregressive Video Generation
AutoDA-Timeseries: Automated Data Augmentation for Time Series
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
ORCaS: Unsupervised Depth Completion via Occluded Region Completion as Supervision
dLLM - Rethinking Generation Beyond Autoregressive Models
NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
PerfGuard: A Performance-Aware Agent for Visual Content Generation
T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
NIMO: a Nonlinear Interpretable MOdel
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization
MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics
MaskCO: Masked Generation Drives Effective Representation Learning and Exploiting for Combinatorial Optimization
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model
Predicting Kernel Regression Learning Curves from Only Raw Data Statistics
LatentQA: Teaching LLMs to Decode Activations Into Natural Language
Neural Synchrony Between Socially Interacting Language Models
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
PRISM: Partial-label Relational Inference with Spatial and Spectral Cues
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs
SONA: Learning Conditional, Unconditional, and Matching-Aware Discriminator
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?
TAVAE: A VAE with Adaptable Priors Explains Contextual Modulation in the Visual Cortex
Diverse Dictionary Learning
Taming Hierarchical Image Coding Optimization: A Spectral Regularization Perspective
Reducing Symmetry Increase in Equivariant Neural Networks
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
When would Vision-Proprioception Policies Fail in Robotic Manipulation?
DeepRAG: Thinking to Retrieve Step by Step for Large Language Models
Robust Preference Alignment via Directional Neighborhood Consensus
PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models
FACT: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
Bridging Degradation Discrimination and Generation for Universal Image Restoration
Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
SpatiaLab: Can Vision–Language Models Perform Spatial Reasoning in the Wild?
Rethinking the Diffusion Model from a Langevin Perspective
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
Tabby: A Language Model Architecture for Tabular and Structured Data Synthesis
P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
World2Minecraft: Occupancy-Driven Simulated Scenes Construction
Out of the Memory Barrier: A Highly Memory-Efficient Training System for LLMs with Million-Token Contexts
TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
Bound by semanticity: universal laws governing the generalization-identification tradeoff
When Language Models Lose Their Mind: The Consequences of Brain Misalignment
$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
LFQA-E: Carefully Benchmarking Long-form QA Evaluation
MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
FastVGGT: Fast Visual Geometry Transformer
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Inverse Scaling in Test-Time Compute
HalluEntity: Benchmarking and Understanding Entity-Level Hallucination Detection
Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
Evaluating Machine Learned Inter-Atomic Potentials for a Practical Simulation Workflow
TrajTok: What makes for a good trajectory tokenizer in behavior generation?
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
Pretraining Scaling Laws for Generative Evaluations of Language Models
Heuristic-Based Ideation for Guiding LLMs Toward Structured Creativity
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
Fewer Battles, More Gain: An Information-Efficient Framework for Arena-based LLM Evaluation
Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking
SCRAPL: Scattering Transform with Random Paths for Machine Learning
Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts
Bi-Lipschitz Autoencoder With Injectivity Guarantee
Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators
High-dimensional Analysis of Synthetic Data Selection
lmgame-Bench: How Good are LLMs at Playing Games?
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Adjusting Prediction Model Through Wasserstein Geodesic for Causal Inference
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
Statistical and structural identifiability in representation learning
On Measuring Influence in Avoiding Undesired Future
Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning
IGC-Net for conditional average potential outcome estimation over time
Efficient and Sharp Off-Policy Learning under Unobserved Confounding
Multiverse Mechanica: A Testbed for Learning Game Mechanics via Counterfactual Worlds
DiffSDA: Unsupervised Diffusion Sequential Disentanglement Across Modalities
Decoupling Primitive with Experts: Dynamic Feature Alignment for Compositional Zero-Shot Learning
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
Unsupervised Representation Learning - an Invariant Risk Minimization Perspective
Thought Branches: Interpreting LLM Reasoning Requires Resampling
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
XIL: Cross-Expanding Incremental Learning
Beyond Text-Only: Towards Multimodal Table Retrieval in Open-World
The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?
Multi-ReduNet: Interpretable Class-Wise Decomposition of ReduNet
Token Distillation: Attention-Aware Input Embeddings for New Tokens
Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
Dual Perspectives on Non-Contrastive Self-Supervised Learning
IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
Symmetric Space Learning for Combinatorial Generalization
Confident Block Diagonal Structure-Aware Invariable Graph Completion for Incomplete Multi-view Clustering
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
Antibody: Strengthening Defense Against Harmful Fine-Tuning for Large Language Models via Attenuating Harmful Gradient Influence
Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry
Generalized Compressed Sensing for Image Reconstruction with Diffusion Probabilistic Models
Mixture of Mini Experts: Overcoming the Linear Layer Bottleneck in Multiple Instance Learning
Quantized Gradient Projection for Memory-Efficient Continual Learning
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision–Language Models
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
Energy-Regularized Sequential Model Editing on Hyperspheres
Clustering by Denoising: Latent plug-and-play diffusion for single-cell embeddings
COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics
Compositional amortized inference for large-scale hierarchical Bayesian models
Efficient Credal Prediction through Decalibration
Conformal Prediction for Long-Tailed Classification
DoFlow: Flow-based Generative Models for Interventional and Counterfactual Forecasting on Time Series
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
QKV Projections Require a Fraction of Their Memory
Play to Generalize: Learning to Reason Through Game Play
Federated ADMM from Bayesian Duality
PU-BENCH: A Unified Benchmark for Rigorous and Reproducible PU Learning
When Shift Happens - Confounding Is to Blame
Pseudo-Non-Linear Data Augmentation: A Constrained Energy Minimization Viewpoint
Summaries as Centroids for Interpretable and Scalable Text Clustering
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
Greater than the Sum of Its Parts: Building Substructure into Protein Encoding Models
Sharing State Between Prompts and Programs
Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
Convergence Analysis of Tsetlin Machines under Noise-Free and Noisy Training Conditions: From $2$ Bits to $k$ Bits
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
It's All Just Vectorization: einx, a Universal Notation for Tensor Operations
Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation
Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
RADAR: Learning to Route with Asymmetry-aware Distance Representations
Combination-of-Experts with Knowledge Sharing for Cross-Task Vehicle Routing Problems
FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming
Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
Energy-Efficient Random Variate Generation via Compressed Lookup Tables
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
CompassNav: Steering From Path Imitation to Decision Understanding In Navigation
Unlocking the Potential of Weighting Methods in Federated Learning Through Communication Compression
Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective
Speculative Actions: A Lossless Framework for Faster AI Agents
Planner Aware Path Learning in Diffusion Language Models Training
Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design
The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
RepSpec: Structural Re-parameterized Draft Model Training for Speculative Decoding
Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis
AdaCache: Adaptive Caching and Context Augmentation for Efficient LLM Serving
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
Special Unitary Parameterized Estimators of Rotation
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
Reasoning Boosts Opinion Alignment in LLMs
PAS: Estimating the target accuracy before domain adaptation
Medical Interpretability and Knowledge Maps of Large Language Models
Where’s the Chicken? Unpacking Spatial Awareness in Vision-Language Models
MobileCLIP2: Improving Multi-Modal Reinforced Training
Sign-SGD via Parameter-Free Optimization
Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation
When Machine Learning Gets Personal: Evaluating Prediction and Explanation
Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
SkillFactory: Self-Distillation for Learning Cognitive Behaviors
ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
Toward Principled Flexible Scaling for Self-Gated Neural Activation
(U)NFV: (Un)Supervised Neural Finite Volume Methods for Solving Hyperbolic PDEs
Online Black-Box Prompt Optimization with Regret Guarantees under Noisy Feedback
Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression
Learning from Historical Activations in Graph Neural Networks
ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
Compute-Optimal Quantization-Aware Training
Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
Curvature-Guided Task Synergy for Skeleton based Temporal Action Segmentation
Hierarchy Decoding: A Training-free Parallel Decoding Strategy for Diffusion Large Language Models
Low-Pass Filtering Improves Behavioral Alignment of Vision Models
Scaling Direct Feedback Learning with Jacobian Alignment Guarantees
TS$^2$: Training with Sparsemax+, Testing with Softmax for Accurate and Diverse LLM Fine-Tuning
Learning Self-Critiquing Mechanisms for Region-Guided Chest X-Ray Report Generation
MASAM: Multimodal Adaptive Sharpness-Aware Minimization for Heterogeneous Data Fusion
Exposing Mixture and Annotating Confusion for Active Universal Test-Time Adaptation
Mini-cluster Guided Long-tailed Deep Clustering
Efficient Resource-Constrained Training of Transformers via Subspace Optimization
FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension
On learning linear dynamical systems in context with attention layers
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
A State-Transition Framework for Efficient LLM Reasoning
Scaling Attention via Feature Sparsity
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Understanding
FASA: Frequency-Aware Sparse Attention
Learning is Forgetting; LLM Training As Lossy Compression
NRGPT: An Energy-based Alternative for GPT
Probing Rotary Position Embeddings through Frequency Entropy
Multi-Head Low-Rank Attention
Sequential Parallel Duality in Prefix Scannable Models
IA2: Alignment with ICL Activations improves Supervised Fine-Tuning
FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
Frequency Bands in RoPE: Base Frequency and Context Length Shape the Interpolation–Extrapolation Trade-off
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
ETGS: Explicit Thermodynamics Gaussian Splatting for Dynamic Thermal Reconstruction
Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
FOCUS: Efficient Keyframe Selection for Long Video Understanding
Equivariant Splitting: Self-supervised learning from incomplete data
OwlEye: Zero-Shot Learner for Cross-Domain Graph Data Anomaly Detection
Hierarchical Multi-Scale Molecular Conformer Generation
Exploring the Design Space of Transition Matching
Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation
SoFlow: Solution Flow Models for One-Step Generative Modeling
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
Generative Modeling from Black-Box Corruptions via Self-Consistent Stochastic Interpolants
Soft-Masked Diffusion Language Models
Antithetic Noise in Diffusion Models
Unbiased Object Detection Beyond Frequency with Visually Prompted Image Synthesis
Robust Adversarial Attacks Against Unknown Disturbance via Inverse Gradient Sample
Robust Generalized Schrödinger Bridge via Sparse Variational Gaussian Processes
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
MrRoPE: Mixed-radix Rotary Position Embedding
Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
Flatter Tokens are More Valuable for Speculative Draft Model Training
NeRV-Diffusion: Diffuse Implicit Neural Representation for Video Synthesis
Adaptive Concept Discovery for Interpretable Few-Shot Text Classification
Measurement Score-Based Diffusion Model
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
STAT: Skill-Targeted Adaptive Training
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning
Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
Relatron: Automating Relational Machine Learning over Relational Databases
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving
Retrospective Sparse Attention for Efficient Long-Context Generation
BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation
STEER AWAY FROM MODE COLLISIONS: IMPROVING COMPOSITION IN DIFFUSION MODELS
wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models
A Probabilistic Hard Concept Bottleneck for Steerable Generative Models
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation
Video-GPT via Next Clip Diffusion
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
Topological Flow Matching
Controlling Repetition in Protein Language Models
Dynamic Multi-sample Mixup with Gradient Exploration for Open-set Graph Anomaly Detection
Adaptive Mixture of Disentangled Experts for Dynamic Graph Out-of-Distribution Generalization
Global-Recent Semantic Reasoning on Dynamic Text-Attributed Graphs with Large Language Models
Chart Deep Research in LVLMs via Parallel Relative Policy Optimization
PonderLM: Pretraining Language Models to Ponder in Continuous Space
Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells
Performative Prediction made practical
: One LLM Token for Explicit Graph Structural Understanding
Conditioned Initialization for Attention
Rethinking the Gold Standard: Why Discrete Curvature Fails to Fully Capture Over-squashing in GNNs?
Can Transformers Really Do It All? On the Compatibility of Inductive Biases Across Tasks
Cutting the Skip: Training Residual-Free Transformers
Oversmoothing, "Oversquashing", Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning
Panda: A pretrained forecast model for chaotic dynamics
Federated Graph-Level Clustering Network with Dual Knowledge Separation
The Logical Expressiveness of Topological Neural Networks
DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding
Topology of Reasoning: Retrieved Cell Complex-Augmented Generation for Textual Graph Question Answering
Minimax Sample Complexity of Graph Neural Networks: Lower Bounds and Structural Effects
ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks
Pursuing Minimal Sufficiency in Spatial Reasoning
Bridging Input Feature Spaces Towards Graph Foundation Models
Gelato: Graph Edit Distance via Autoregressive Neural Combinatorial Optimization
Rapid Training of Hamiltonian Graph Networks Using Random Features
Defining and quantifying compositional structure
Bilateral Information-aware Test-time Adaptation for Vision-Language Models
Dual Randomized Smoothing: Beyond Global Noise Variance
Contamination Detection for VLMs Using Multi‑Modal Semantic Perturbations
VERIFY: A Novel Multi-Domain Dataset Grounding LTL in Contextual Natural Language via Provable Intermediate Logic
Single-Loop Byzantine-Resilient Federated Bilevel Optimization
Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
On the $O(1/T)$ Convergence of Alternating Gradient Descent–Ascent in Bilinear Games
Improving Black-Box Generative Attacks via Generator Semantic Consistency
TRACEDET: HALLUCINATION DETECTION FROM THE DECODING TRACE OF DIFFUSION LARGE LANGUAGE MODELS
A Benchmark for Deep Information Synthesis
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
The Effect of Attention Head Count on Transformer Approximation
Training-Free Determination of Network Width via Neural Tangent Kernel
Softmax Transformers are Turing-Complete
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
Scaling with Collapse: Efficient and Predictable Training of LLM Families
PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning
FAME: Formal Abstract Minimal Explanation for Neural Networks
Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations
In-Context Watermarks for Large Language Models
SHE-LoRA: Selective Homomorphic Encryption for Federated Tuning with Heterogeneous LoRA
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
The Curious Case of In-Training Compression of State Space Models
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
Reusing Pre-Training Data at Test Time is a Compute Multiplier
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
FlexHiNM-GP: Flexible Hierarchical Pruning via Region Allocation and Channel Permutation
A Generalized Geometric Theoretical Framework of Centroid Discriminant Analysis for Linear Classification of Multi-dimensional Data
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
UniOD: A Universal Model for Outlier Detection across Diverse Domains
Deep Learning with Learnable Product-Structured Activations
SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization
Bridging Explainability and Embeddings: BEE Aware of Spuriousness
VoG: Enhancing LLM Reasoning through Stepwise Verification on Knowledge Graphs
PRISM: Enhancing PRotein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
Global and Local Topology-Aware Graph Generation via Dual Conditioning Diffusion
Tracing and Reversing Edits in LLMs
Tucker-FNO: Tensor Tucker-Fourier Neural Operator and its Universal Approximation Theory
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Quantitative Bounds for Length Generalization in Transformers
Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs
Leveraging Explanation to Improve Generalization of Meta Reinforcement Learning
How does the optimizer implicitly bias the model merging loss landscape?
Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
DepthLM: Metric Depth from Vision Language Models
CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval
Flow Along the $K$-Amplitude for Generative Modeling
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Towards Knowledge‑and‑Data‑Driven Organic Reaction Prediction: RAG‑Enhanced and Reasoning‑Powered Hybrid System with LLMs
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
Enforcing Axioms for AI Alignment under Loss-Based Rules
Much Ado About Noising: Dispelling the Myths of Generative Robotic Control
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
Frequency-Balanced Retinal Representation Learning with Mutual Information Regularization
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
Steering MoE LLMs via Expert (De)Activation
Scaling Group Inference for Diverse and High-Quality Generation
Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in LLMs
Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs
Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems
FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Uniform Discrete Diffusion with Metric Path for Video Generation
Understanding and Relaxing the Limitations of Transformers for Linear Algebra
NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction
Adaptive Regularization for Large-Scale Sparse Feature Embedding Models
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
Humanline: Online Alignment as Perceptual Loss
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
AudioTrust: Benchmarking The Multifaceted Trustworthiness of Audio Large Language Models
Scalable Chain of Thoughts via Elastic Reasoning
On the Generalization Capacities of MLLMs for Spatial Intelligence
ContextIF: Enhancing Instruction-Following through Context Reward
ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
Graph Representational Learning: When Does More Expressivity Hurt Generalization?
Think Then Embed: Generative Context Improves Multimodal Embedding
Fastcar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
Uncertainty-Aware 3D Reconstruction for Dynamic Underwater Scenes
Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors
Generative AI Archaeology
Bongard-RWR+: Real-World Representations of Fine-Grained Concepts in Bongard Problems
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Analyzing and Evaluating Unbiased Language Model Watermark
Flow Autoencoders are Effective Protein Tokenizers
Fast Data Mixture Optimization via Gradient Descent
Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds
Can LLMs Reason Soundly in Law? Auditing Inference Patterns for Legal Judgment
CAR-LoRA: Training Compression-Aware and Robust LoRA Adapters for Evolving LLMs
Automatic Image-Level Morphological Trait Annotation for Organismal Images
Efficient Zero-shot Inpainting with Decoupled Diffusion Guidance
RD-HRL: Generating Reliable Sub-Goals for Long-Horizon Sparse-Reward Tasks
Synthetic Bootstrapped Pretraining
Consistent Low-Rank Approximation
WavePolyp: Video Polyp Segmentation via Hierarchical Wavelet-Based Feature Aggregation and Inter-Frame Divergence Perception
Bradley-Terry and Multi-Objective Reward Modeling Are Complementary
Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes
LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity
DrugTrail: Interpretable Drug Discovery via Structured Reasoning and Druggability‑Tailored Preference Optimization
Latent Denoising Makes Good Tokenizers
Distributed Algorithms for Euclidean Clustering
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
Advancing Spatiotemporal Representations in Spiking Neural Networks via Parametric Invertible Transformation
Web-CogReasoner: Towards Multimodal Knowledge-Induced Cognitive Reasoning for Web Agents
RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Incorporating Expert Priors into Bayesian Optimization via Dynamic Mean Decay
Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
FlowNIB: An Information Bottleneck Analysis of Bidirectional vs. Unidirectional Language Models
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment
AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
Robust Spiking Neural Networks Against Adversarial Attacks
Unified Analyses for Hierarchical Federated Learning: Topology Selection under Data Heterogeneity
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration
PROS: Towards Compute-Efficient RLVR via Rollout Prefix Reuse
Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
Misalignments and RL Failure Modes in the Early Stage of Superintelligence
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
Revisiting the NetHack Learning Environment
PACE: Pretrained Audio Continual Learning
Computer Use Survey - A Visual Survey of Computer Use Agents
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Leveraging a Simulator for Learning Causal Representations from Post-Treatment Covariates for CATE
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans
PAT3D: Physics-Augmented Text-to-3D Scene Generation
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
Transformers with Endogenous In-Context Learning: Bias Characterization and Mitigation
Thyme: Think Beyond Images
Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
Scaling Agent Learning via Experience Synthesis
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
Skirting Additive Error Barriers for Private Turnstile Streams
How Text Quality Interventions Reshape Neural Scaling Laws for LLMs: Empirical Study
From Assistant to Independent Developer — Are GPTs Ready for Software Development?
SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration
CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
Frequency-aware Dynamic Gaussian Splatting
SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
ProSafePrune: Projected Safety Pruning for Mitigating Over-Refusal in LLMs
Long-Context Generalization with Sparse Attention
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
ArtUV: Artist-style UV Unwrapping
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
Learning to Reason for Hallucination Span Detection
AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
Learning-Time Encoding Shapes Unlearning in LLMs
Towards Safe and Optimal Online Bidding: A Modular Look-ahead Lyapunov Framework
Oracle-efficient Hybrid Learning with Constrained Adversaries
Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering
HFSTI-Net: Hierarchical Frequency-spatial-temporal Interactions for Video Polyp Segmentation
Spatial Structure and Selective Text Jointly Facilitate Image Clustering
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
ChunkTabPFN: Training-free Long Context
SimpleFold: Folding Proteins is Simpler than You Think
A Case for Library-Level k-Means Binning in Histogram Gradient-Boosted Trees
ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
Evaluating Text Creativity across Diverse Domains: a Dataset and Large Language Model Evaluator
Chimera: State Space Models Beyond Sequences
Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
Sci2Pol: Evaluating and Fine-tuning LLMs on Scientific-to-Policy Brief Generation
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models
Divide and Abstract: Autoformalization via Decomposition and Abstraction Learning
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Adversarial Robustness of Graph Transformers
Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs
Fine-Grained Activation Steering: Steering Less, Achieving More
Information Theoretic Guarantees For Policy Alignment In Large Language Models
Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning
LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness
Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
MetaMuse: Algorithm Generation via Creative Ideation
CrossPL: Systematic Evaluation of Large Language Models for Cross Programming Language Interoperating Code Generation
HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
Discrete Diffusion for Bundle Construction
Model Tensor Planning
SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming
PCF Learned Sort: a Learning Augmented Sort Algorithm with $O(n \log\log n)$ Expected Complexity
Encoder-only Next Token Prediction
Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
LLEMA: Evolutionary Search with LLMs for Multi-Objective Materials Discovery
Multi-Bellman operator for convergence of Q-learning with linear function approximation
Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
Aurelius: Relation Aware Text-to-Audio Generation At Scale
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers
FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
Bridging Piano Transcription and Rendering via Disentangled Score Content and Style
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization
Data Provenance for Image Auto-Regressive Generation
Latent Diffusion Model without Variational Autoencoder
FACM: Flow-Anchored Consistency Models
TINKER: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance
Training Dynamics of Learning 3D-Rotational Equivariance
EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
Amortized Inference of Causal Models via Conditional Fixed-Point Iterations
Learning Patient-Specific Disease Dynamics With Latent Flow Matching For Longitudinal Imaging Generation
FreeViS: Training-free Video Stylization with Inconsistent References
Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling
MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure
ORION: Decoupling and Alignment for Unified Autoregressive Understanding and Generation
Control Tax: The Price of Keeping AI in Check
What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?
Time-to-Move: Training-Free Motion-Controlled Video Generation via Dual-Clock Denoising
Realtime Video Frame Interpolation using One-Step Diffusion Sampling
STORK: Faster Diffusion and Flow Matching Sampling by Resolving both Stiffness and Structure-Dependence
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models
3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts
ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance Mask
Generative Blocks World: Moving Things Around in Pictures
FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
Streaming Autoregressive Video Generation via Diagonal Distillation
Reconstruct Anything Model: a lightweight general model for computational imaging
MotionWeaver: Holistic 4D-Anchored Framework for Multi-Humanoid Image Animation
$\alpha$-DPO: Robust Preference Alignment for Diffusion Models via $\alpha$ Divergence
Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
Learning to Generate Stylized Handwritten Text via a Unified Representation of Style, Content, and Noise
MoCa: Modeling Object Consistency for 3D Camera Control in Video Generation
Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
reAR: Rethinking Visual Autoregressive Models via Token-wise Consistency Regularization
There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
CASteer: Cross-Attention Steering for Controllable Concept Erasure
DVD-Quant: Data-free Video Diffusion Transformers Quantization
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
Splat the Net: Radiance Fields with Splattable Neural Primitives
Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction
CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis
NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?
Generative View Stitching
PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
HDR-NSFF: High Dynamic Range Neural Scene Flow Fields
Implicit 4D Gaussian Splatting for Fast Motion with Large Inter-Frame Displacements
Secondary Motion-Aware 3D Clothed Gaussian Avatars from Monocular Videos
ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation
EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning
Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss
$\ell_1$ Latent Distance based Continuous-time Graph Representation
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
BigMaQ: A Big Macaque Motion and Animation Dataset Bridging Image and 3D Pose Representations
RCPU: Rotation-Constrained Error Compensation for Structured Pruning of Large Language Models
Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields
CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
On Discriminative vs. Generative classifiers: Rethinking MLLMs for Action Understanding
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Long-tailed Test-Time Adaptation for Vision-Language Models
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
Distillation of Large Language Models via Concrete Score Matching
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution
pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment
AC-Sampler: Accelerate and Correct Diffusion Sampling with Metropolis-Hastings Algorithm
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
Mordal: Automated Pretrained Model Selection for Vision Language Models
LLMs as Rules Oracles: Exploring Real-World Multimodal Reasoning in Tabletop Strategy Game Environments
Plan then Act: Bi-level CAD Command Sequence Generation
Sapiens2
The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts
Mitigating Hallucination in Vision-Language Model with Depth and Spatial-aware Key-Value Refinement
Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
Reasoning on Time-Series for Financial Technical Analysis
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding
RL makes MLLMs see better than SFT
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Learning to Answer from Correct Demonstrations
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
Directional Textual Inversion for Personalized Text-to-Image Generation
PERSISTENCE SPHERES: BI-CONTINUOUS REPRESENTATIONS OF PERSISTENCE DIAGRAMS
Visual Planning: Let's Think Only with Images
Autoregressive Models Rival Diffusion Models at ANY-ORDER Generation
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
HiTeA: Hierarchical Temporal Alignment for Training-Free Long-Video Temporal Grounding
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
SAM-Veteran: An MLLM-Based Human-like SAM Agent for Reasoning Segmentation
Thicker and Quicker: The Jumbo Token for Fast Plain Vision Transformers
Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
Calibrated Information Bottleneck for Trusted Multi-modal Clustering
Point-UQ: An Uncertainty-Quantification Paradigm for Point Cloud Few-Shot Class Incremental Learning
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
Measure Twice, Cut Once: A Semantic-Oriented Approach to Video Temporal Localization with Video LLMs
TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use
HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
Multimodal Dataset Distillation Made Simple by Prototype-Guided Data Synthesis
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing
JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
FictionalQA: A Dataset for Studying Memorization and Knowledge Acquisition
FakeXplain: AI-Generated Image Detection via Human-Aligned Grounded Reasoning
The Layered Ontology of Models, Resolving the Epistemological Crisis of AI
Identifying and Evaluating Inactive Heads in Pretrained LLMs
H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
DynaGuard: A Dynamic Guardian Model With User-Defined Policies
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization
Zebra-CoT: A Dataset for Interleaved Vision-Language Reasoning
WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent
VINCIE: Unlocking In-context Image Editing from Video
Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
Decomposed Attention Fusion in MLLMs for Training-free Video Reasoning Segmentation
TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
Cross-Timestep: 3D Diffusion Model with Trans-temporal Memory LSTM and Adaptive Priori Decoding Strategy for Medical Segmentation
SciTS: Scientific Time Series Understanding and Generation with LLMs
WOW-Seg: A Word-free Open World Segmentation Model
Matting Anything 2: Towards Video Matting for Anything
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
OVSeg3R: Learn Open-vocabulary Instance Segmentation from 2D via 3D Reconstruction
ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion
DiVE-k: DIFFERENTIAL VISUAL REASONING FOR FINE-GRAINED IMAGE RECOGNITION
VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization
GmNet: Revisiting Gating Mechanisms From A Frequency View
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectories?
EAST: Early Action Prediction Sampling Strategy with Token Masking
Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images
Reasoning-Driven Multimodal LLM for Domain Generalization
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency
OD$^3$: Optimization-free Dataset Distillation for Object Detection
Reinforced Latent Reasoning for LLM-based Recommendation
BioTamperNet: Affinity-Guided State-Space Model Detecting Tampered Biomedical Images
WebDS: An End-to-End Benchmark for Web-based Data Science
KernelFusion: Zero-Shot Blind Super-Resolution via Patch Diffusion
DiffuDETR: Rethinking Detection Transformers with Denoising Diffusion Process
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation
Parameterization-Based Dataset Distillation of 3D Point Clouds through Learnable Shape Morphing
Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
Bootstrapping MLLM for Weakly‑Supervised Class‑Agnostic Object Counting
Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization
DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts
Exponential-Wrapped Mechanisms: Differential Privacy on Hadamard Manifolds Made Practical
Black-Box Privacy Attacks on Shared Representations in Multitask Learning
A Bayesian Nonparametric Framework for Private, Fair, and Balanced Tabular Data Synthesis
Membership Privacy Risks of Sharpness Aware Minimization
An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes
PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints
FHE-Coder: Benchmarking Secure Agentic Code Generation for Fully Homomorphic Encryption
OpenThoughts: Data Recipes for Reasoning Models
Expertise Can Be Helpful for Reinforcement Learning-based Macro Placement
Mitigating Privacy Risk via Forget Set-Free Unlearning
Secure Outlier-Aware Large Language Model Inference
Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
ULD-Net: Enabling Ultra-Low-Degree Fully Polynomial Networks for Homomorphically Encrypted Inference
INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy
EnsembleSHAP: Faithful and Certifiably Robust Attribution for Random Subspace Method
Federated Learning with Profile Mapping under Distribution Shifts and Drifts
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
SecP-Tuning: Efficient Privacy-Preserving Prompt Tuning for Large Language Models via MPC
INTIMA: A Benchmark for Human-AI Companionship Behavior
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
A Unified Total Variation Framework for Membrane Potential Perturbation Dynamic
SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents
Robust Federated Inference
Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning
When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining
Memorization Through the Lens of Sample Gradients
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
RedSage: A Cybersecurity Generalist LLM
Co-occurring Associated REtained concepts in Diffusion Unlearning
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
TrojanTO: Action-Level Backdoor Attacks Against Trajectory Optimization Models
Sharpness-Aware Machine Unlearning
Evolution of Concepts in Language Model Pre-Training
DualEdit: Mitigating Safety Fallback in LLM Backdoor Editing via Affirmation-Refusal Regulation
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
Priors in time: Missing inductive biases for language model interpretability
ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation
GNN Explanations that do not Explain and How to find Them
Testing Most Influential Sets
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting
Not All Documents Are What You Need for Extracting Instruction Tuning Data
Untraceable DeepFakes via Traceable Fingerprint Elimination
Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models
A Fair Bayesian Inference through Matched Gibbs Posterior
Data-Aware and Scalable Sensitivity Analysis for Decision Tree Ensembles
PARD: Accelerating LLM Inference with Low‑Cost PARallel Draft Model Adaptation
Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
Artistic Style and the Play of Neural Style Representations
Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs
Speech World Model: Causal State–Action Planning with Explicit Reasoning for Speech
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
Doubly-Regressing Approach for Subgroup Fairness
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
Cost-of-Pass: An Economic Framework for Evaluating Language Models
PerSpectra: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
Semi-Supervised Preference Optimization with Limited Feedback
Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration
What Do Large Language Models Know About Opinions?
A Rich Knowledge Space for Scalable Deepfake Detection
ZeroSiam: An Efficient Asymmetry for Test-Time Entropy Optimization without Collapse
TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA
VUDG: A Dataset for Video Understanding Domain Generalization
Towards Strategic Persuasion with Language Models
Learning a Game by Paying the Agents
Code World Models for General Game Playing
Dimension-Free Decision Calibration for Nonlinear Loss Functions
Tuning the burn-in phase in training recurrent neural networks improves their performance
Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Physics-informed learning under mixing: How physical knowledge speeds up learning
Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit
The Forecast After the Forecast: A Post-Processing Shift in Time Series
Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners
Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning
Stable coresets: Unleashing the power of uniform sampling
Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^{\pi}$ Realizability for Deterministic Dynamics
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures
Dynamical properties of dense associative memory
Gradient Descent Dynamics of Rank-One Matrix Denoising
Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
When Bias Meets Trainability: Connecting Theories of Initialization
First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
Block Recurrent Dynamics in Vision Transformers
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
Learnable Sparsity for Vision Generative Models
Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs
Video Unlearning via Low-Rank Refusal Vector
Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression
Markovian Transformers for Informative Language Modeling
Causality ≠ Invariance: Function and Concept Vectors in LLMs
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
Pretraining with Re-parametrized Self-Attention: Unlocking Generalization in SNN-Based Neural Decoding Across Time, Brains, and Tasks
Temporal superposition and feature geometry of RNNs under memory demands
Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models
The Lattice Representation Hypothesis of Large Language Models
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
Debiased and Denoised Representation Learning for Incomplete Multi-view Clustering
How hard is learning to cut? Trade-offs and sample complexity
Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Reasoning Large Language Models
Celo: Training Versatile Learned Optimizers on a Compute Diet
Efficient algorithms for Incremental Metric Bipartite Matching
OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework
A Scalable Constant-Factor Approximation Algorithm for $W_p$ Optimal Transport
Stop Guessing: Choosing the Optimization-Consistent Uncertainty Measurement for Evidential Deep Learning
Q-learning with Posterior Sampling
Convergence of an actor-critic gradient flow for entropy regularised MDPs in general spaces
WARC-Bench: Web Archive based Benchmark for GUI Subtask Executions
A Unifying View of Coverage in Linear Off-policy Evaluation
Change Point Localization and Inference in Dynamic Multilayer Networks
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
Welfarist Formulations for Diverse Similarity Search
Random Spiking Neural Networks are Stable and Spectrally Simple
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Less Is More: Clustered Cross-Covariance Control for Offline RL
ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation
Action-Free Offline-To-Online RL via Discretised State Policies
Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
Spectral-guided Physical Dynamics Distillation
Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
QeRL: Beyond Efficiency - Quantization-enhanced Reinforcement Learning for LLMs
Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach
Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL
Reward Model Routing in Alignment
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning
Intention-Conditioned Flow Occupancy Models
Reliability-Adjusted Prioritized Experience Replay
Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning
AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
Parameter-Efficient Reinforcement Learning using Prefix Optimization
Horizon Imagination: Efficient On-Policy Rollout in Diffusion World Models
3D-aware Disentangled Representation for Compositional Reinforcement Learning
Flowing Through States: Neural ODE Regularization for Reinforcement Learning
Topological Causal Effects
Mango-GS: Enhancing Spatio-Temporal Consistency in Dynamic Scenes Reconstruction using Multi-Frame Node-Guided 4D Gaussian Splatting
Testing Fourier Sparsity via Implicit Sensing
Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions
SSVPO: Effective Step-Level Credit Assignment for RL Training of Language Models
Type-Compliant Adaptation Cascades
Learning Efficient and Interpretable Multi-Agent Communication
Towards Better Branching Policies: Leveraging the Sequential Nature of Branch-and-Bound Tree
Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Group-Normalized Implicit Value Optimization for Language Models
Bayesian Robust Cooperative Multi-Agent Reinforcement Learning Against Unknown Adversaries
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
KRAMABENCH: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
Contractive Diffusion Policies
Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Sample-efficient and Scalable Exploration in Continuous-Time RL
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Near-Optimal Online Deployment and Routing for Streaming LLMs
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
OrchestrationBench: LLM-Driven Agentic Planning and Tool Use in Multi-Domain Scenarios
Learning Dynamics Feature Representation via Policy Attention for Dynamic Path Planning in Urban Road Networks
Iterative Training of Physics-Informed Neural Networks with Fourier-enhanced Features
Information-based Value Iteration Networks for Decision Making Under Uncertainty
GeoFAR: Geography-Informed Frequency-Aware Super-Resolution for Climate Data
WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
Agentic Reinforcement Learning with Implicit Step Rewards
RewardBench 2: Advancing Reward Model Evaluation
When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets
How Far Can Unsupervised RLVR Scale LLM Training?
AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
Risk-Sensitive Reinforcement Learning for Alleviating Exploration Dilemmas in Large Language Models
Laplacian Kernelized Bandit
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
Scaling Goal-conditioned Reinforcement Learning with Multistep Quasimetric Distances
Toward Efficient Exploration by Large Language Model Agents
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
DNOD: Deformable Neural Operators for Object Detection in SAR Images
Vision Language Models are Biased
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
Disentangled Representation Learning for Parametric Partial Differential Equations
ViPO: Visual Preference Optimization at Scale
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
Hallucination Begins Where Saliency Drops
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework
Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts
An Ensemble Framework for Unbiased Language Model Watermarking
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
YuE: Scaling Open Foundation Models for Long-Form Music Generation
AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Representation Alignment for Diffusion Transformers without External Components
A Dense Subset Index for Collective Query Coverage
PCPO: Proportionate Credit Policy Optimization for Preference Alignment of Image Generation Models
FlowRL: Matching Reward Distributions for LLM Reasoning
Speculative Speculative Decoding
Reverse-Engineered Reasoning for Open-Ended Generation
OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
How Stable is the Next Token? A Geometric View of LLM Prediction Stability
Instance-wise Adaptive Scheduling via Derivative-Free Meta-Learning
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
LLMs Can Hide Text in Other Text of the Same Length
Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?
Scheduling Your LLM Reinforcement Learning with Reasoning Trees
S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion
BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games
SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas
Learning Exposure Mapping Functions for Inferring Heterogeneous Peer Effects
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
Positional Encoding Field
ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval
Joint Adaptation of Uni-modal Foundation Models for Multi-modal Alzheimer's Disease Diagnosis
CoMind: Towards Community-Driven Agents for Machine Learning Engineering
Training-free Counterfactual Explanation for Temporal Graph Model Inference
Neyman-Pearson Classification under Both Null and Alternative Distributions Shift
On Entropy Control in LLM-RL Algorithms
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
EigenBench: A Comparative Behavioral Measure of Value Alignment
CL-DPS: A Contrastive Learning Approach to Blind Nonlinear Inverse Problem Solving via Diffusion Posterior Sampling
Reducing Class-Wise Performance Disparity via Margin Regularization
Tug-of-War No More: Harmonizing Accuracy and Robustness in Vision-Language Models via Stability-Aware Task Vector Merging
EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
ASIDE: Architectural Separation of Instructions and Data in Language Models
YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
MeSH: Memory-as-State-Highways for Recursive Transformers
CellDuality: Unlocking Biological Reasoning in LLMs with Self-Supervised RLVR
FeDaL: Federated Dataset Learning for General Time Series Foundation Models
Beyond Speedup - Utilizing KV Cache for Sampling and Reasoning
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors
CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
OmniPortrait: Fine-Grained Personalized Portrait Synthesis via Pivotal Optimization
USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning Capabilities of LLMs as Urban Agents
PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation
Diverse Text-to-Image Generation via Contrastive Noise Optimization
NAB: Neural Adaptive Binning for Sparse-View CT reconstruction
Muon Outperforms Adam in Tail-End Associative Memory Learning
Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models
Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective
Unleashing Guidance Without Classifiers for Human-Object Interaction Animation
Joint Optimization for 4D Human-Scene Reconstruction in the Wild
Better Bounds for the Distributed Experts Problem
Healthcare Insurance Fraud Detection via Continual Fiedler Vector Graph Model
Do 3D Large Language Models Really Understand 3D Spatial Relationships?
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
LH-DECEPTION: Simulating and Understanding LLM Deceptive Behaviors in Long-Horizon Interactions
Spatial Mental Modeling from Limited Views
Hybrid Reinforcement: when reward is sparse, better to be dense
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
Neural Optimal Transport Meets Multivariate Conformal Prediction
Kevin: Multi-Turn RL for Generating CUDA Kernels
Learning to Reason via Mixture-of-Thought for Logical Reasoning
How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
Kimi-Dev: Agentless Training as Skill Prior for SWE-agents
Beyond Hearing: Learning Task-Agnostic ExG Representations from Earphones via Physiology-Informed Tokenization
FedMC: Federated Manifold Calibration
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
On the Thinking-Language Modeling Gap in Large Language Models
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
SAGA: Structural Aggregation Guided Alignment with Dynamic View and Neighborhood Order Selection for Multiview Graph Domain Adaptation
Geometry-aware Policy Imitation
Gauge-invariant representation holonomy
Mixed-Curvature Tree-Sliced Wasserstein Distance
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
OFMU: OPTIMIZATION-DRIVEN FRAMEWORK FOR MACHINE UNLEARNING
GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning
EA3D: Event-Augmented 3D Diffusion for Generalizable Novel View Synthesis
MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs
Learning Escorted Protocols For Multistate Free-Energy Estimation
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
Astra: General Interactive World Model with Autoregressive Denoising
Understanding and Improving Continuous LLM Adversarial Training via In-context Learning Theory
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
SIM-CoT: Supervised Implicit Chain-of-Thought
Almost Bayesian: Dynamics of SGD Through Singular Learning Theory
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation
Query-Level Uncertainty in Large Language Models
The Potential of CoT for Reasoning: A Closer Look at Trace Dynamics
Independence Test for Linear Non-Gaussian Data and Applications in Causal Discovery
ALM-MTA: Front-Door Causal Multi-Touch Attribution Method for Creator-Ecosystem Optimization
Foundation Models for Causal Inference via Prior-Data Fitted Networks
Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code
Modeling Interference for Treatment Effect Estimation in Network Dynamic Environment
Permutation-Consistent Variational Encoding for Incomplete Multi-View Multi-Label Classification
Temporal Slowness in Central Vision Drives Semantic Object Learning
Self-Supervised Learning from Structural Invariance
Difficulty–Diversity Collaborative Filtering for Data-Efficient LLM Fine-Tuning
Reverse Distillation: Consistently Scaling Protein Language Model Representations
Beyond Instance-Level Alignment: Dual-Level Optimal Transport for Audio-Text Retrieval
LLM DNA: Tracing Model Evolution via Functional Representations
Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
Frequency-Domain Better than Time-Domain for Causal Structure Recovery in Dynamical Systems on Networks
Lightweight Transformer for EEG Classification via Balanced Signed Graph Algorithm Unrolling
On the Alignment Between Supervised and Self-Supervised Contrastive Learning
Disentangled representation learning through unsupervised symmetry group discovery
PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning
There Was Never a Bottleneck in Concept Bottleneck Models
Verification of the Implicit World Model in a Generative Model via Adversarial Sequences
Uncertainty-driven Embedding Convolution
Covariate-Guided Clusterwise Linear Regression for Generalization to Unseen Data
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Explainable $ K $-means Neural Networks for Multi-view Clustering
KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs
SeeDNorm: Self-Rescaled Dynamic Normalization
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
Stochastic Optimal Control for Continuous-Time fMRI Representation Learning
OrthoRF: Exploring Orthogonality in Object-Centric Representations
RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
Disentanglement of Variations with Multimodal Generative Modeling
Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
AdAEM: An Adaptively and Automated Extensible Measurement of LLMs' Value Difference
One-Shot Exemplars for Class Grounding in Self-Supervised Learning
Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective
RL for Reasoning by Adaptively Revealing Rationales
Adversarial Encoding Perturbation and Synthesis for Set Representation Auxiliary Learning
Enhanced Continual Learning of Vision-Language Models with Model Fusion
Command-V: Training-Free Representation Finetuning Transfer
Merge before Forget: A Single LoRA Continual Learning via Continual Merging
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
Elastic Optimal Transport: Theory, Application, and Empirical Evaluation
Forget Forgetting: Continual Learning in a World of Abundant Memory
Information Shapes Koopman Representation
Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature
SWINGARENA: Adversarial Programming Arena for Long-context GitHub Issue Solving
Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization
Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient Estimation
SmartDJ: Declarative Audio Editing with Audio Language Model
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference
Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning
Don’t Pass@k: A Bayesian Framework for Large Language Model Evaluation
$p\textrm{-less}$ Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
Probabilistic Kernel Function for Fast Angle Testing
Multi-Condition Conformal Selection
Efficient Agent Training for Computer Use
Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices
Angle K-Means
Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
Boosting for Predictive Sufficiency
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
TESSAR: Geometry-Aware Active Regression via Dynamic Voronoi Tessellation
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models
Graph-based Nearest Neighbors with Dynamic Updates via Random Walks
Variational Pseudo Marginal Methods for Jet Reconstruction in Particle Physics
Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4
Epistemic Uncertainty Quantification To Improve Decisions From Black-Box Models
Towards Self-Evolving Agent Benchmarks: Validatable Agent Trajectory via Test-Time Exploration
Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning
Celo2: Towards Learned Optimization Free Lunch
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change
Learning to Adapt: In-Context Learning Beyond Stationarity
Improving Feasibility via Fast Autoencoder-Based Projections
Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Beyond the Heatmap: A Rigorous Evaluation of Component Impact in MCTS-Based TSP Solvers
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
Convergence of Muon with Newton-Schulz
Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear Programming
MIMIC-Bench: Exploring the User-Like Thinking and Mimicking Capabilities of Multimodal Large Language Models
Language-guided Open-world Video Anomaly Detection under Weak Supervision
Strongly Convex Sets in Riemannian Manifolds
Towards Personalized Deep Research: Benchmarks and Evaluations
Diffusion-DFL: Decision-focused Diffusion Models for Stochastic Optimization
Data-Centric Lessons To Improve Speech-Language Pretraining
Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling
Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
From Predictors to Samplers via the Training Trajectory
Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo
Efficient Sliced Wasserstein Distance Computation via Adaptive Bayesian Optimization
Converge Faster, Talk Less: Hessian-Informed Federated Zeroth-Order Optimization
FZOO: Fast Zeroth-Order Optimizer for Fine‑Tuning Large Language Models towards Adam‑Scale Speed
Adaptive Acquisition Selection for Bayesian Optimization with Large Language Models
Trinity: An Evolved LLM Coordinator
DR-Submodular Maximization with Stochastic Biased Gradients: Classical and Quantum Gradient Algorithms
LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals
Unlocking Full Efficiency of Token Filtering in Large Language Model Training
From Sequential to Parallel: Reformulating Dynamic Programming as GPU Kernels for Large-Scale Stochastic Combinatorial Optimization
A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing
C-Voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
Meta-UCF: Unified Task-Conditioned LoRA Generation for Continual Learning in Large Language Models
Achieving low-bit Muon through subspace preservation and grid quantization
Poly-attention: a general scheme for higher-order self-attention
Samples Are Not Equal: A Sample Selection Approach for Deep Clustering
The Lattice Geometry of Neural Network Quantization: A Short Equivalence Proof of GPTQ and Babai's Algorithm
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
Massive Editing for Large Language Models Based on Dynamic Weight Generation
Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
Dr.LLM: Dynamic Layer Routing in LLMs
ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning
Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences
Query-Aware Flow Diffusion for Graph-Based RAG with Retrieval Guarantees
ChainGPT: Dual-Reasoning Model with Recurrent Depth and Multi-Rank State Updates
SCAD: Super-Class-Aware Debiasing for Long-Tailed Semi-Supervised Learning
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
Metis: Training LLMs with FP4 Quantization
Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning LLMs
Counterfactual Reasoning for Retrieval-Augmented Generation
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition
In Context Semi-Supervised Learning
RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training
Channel-Aware Mixed-Precision Quantization for Efficient Long-Context Inference
Expert Heads: Robust Evidence Identification for Large Language Models
ProxyAttn: Guided Sparse Attention via Representative Heads
RESA: Bringing Back What Sparse Attention Ignores with Residual Estimation
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
Logit‑KL Flow Matching: Non‑Autoregressive Text Generation via Sampling‑Hybrid Inference
Discrete Bayesian Sample Inference for Graph Generation
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
What Exactly Does Guidance Do in Masked Discrete Diffusion Models
Evidence for Limited Metacognition in LLMs
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
EAMET: Robust Massive Model Editing via Embedding Alignment Optimization
On the Interpolation Effect of Score Smoothing in Diffusion Models
Universal Multi-Domain Translation via Diffusion Routers
Structured Flow Autoencoders: Learning Structured Probabilistic Representations with Flow Matching
Learning residue level protein dynamics with multiscale Gaussians
Contact Wasserstein Geodesics for Non-Conservative Schrödinger Bridges
Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
Dual-Path Condition Alignment for Diffusion Transformers
FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows
Open Data Synthesis for Deep Research
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Attention Is All You Need for KV Cache in Diffusion LLMs
Information Estimation with Discrete Diffusion
ProtoKV: Long-context Knowledges Are Already Well-Organized Before Your Query
Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation
Self-Speculative Decoding Accelerates Lossless Inference in Any-Order and Any-Subset Autoregressive Models
One step further with Monte-Carlo sampler to guide diffusion better
Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
ReLaSH: Reconstructing Joint Latent Spaces for Efficient Generation of Synthetic Hypergraphs with Hyperlink Attributes
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models
Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning
Multiplicative Diffusion Models: Beyond Gaussian Latents
Learning to Reason in Structured In-context Environments with Reinforcement Learning
ProReGen: Progressive Residual Generation under Attribute Correlations
Graph homophily booster: Reimagining the role of discrete features in heterophilic graph learning
HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
DR-GGAD: Dual Residual Centering for Mitigating Anomaly Non‑Discriminativity in Generalist Graph Anomaly Detection
Any-Subgroup Equivariant Networks via Symmetry Breaking
Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling
Low-pass Personalized Subgraph Federated Recommendation
LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
Atomic HINs: Entity-Attribute Duality for Heterogeneous Graph Modeling
Forest-Based Graph Learning for Semi-Supervised Node Classification
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
Certified vs. Empirical Adversarial Robustness via Hybrid Convolutions with Attention Stochasticity
Trapped by simplicity: When Transformers fail to learn from noisy features
TrainRef: Curating Data with Label Distribution and Minimal Reference for Accurate Prediction and Reliable Confidence
Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
Are Deep Speech Denoising Models Robust to Adversarial Noise?
Variational Deep Learning via Implicit Regularization
How Dark Patterns Manipulate Web Agents
Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
Is In-Context Learning Learning?
SeRI: Gradient-Free Sensitive Region Identification in Decision-Based Black-Box Attacks
Intrinsic Entropy of Context Length Scaling in LLMs
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
How Muon’s Spectral Design Benefits Generalization: A Study on Imbalanced Data
Noise Stability of Transformer Models
On The Fragility of Benchmark Contamination Detection in Reasoning Models
Characterizing the Discrete Geometry of ReLU Networks
Benchmarking LLM Tool-Use in the Wild
Composer: A Search Framework for Hybrid Neural Architecture Design
Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Continual Multimodal Learning
Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork
FoNE: Precise Single-Token Number Embeddings via Fourier Features
Efficient Adversarial Attacks on High-dimensional Offline Bandits
Compositional Generalization from Learned Skills via CoT Training: A Theoretical and Structural Analysis for Reasoning
Understanding the Learning Phases in Self-Supervised Learning via Critical Periods
Can Language Models Discover Scaling Laws?
Train-before-Test Harmonizes Language Model Rankings
E²LoRA: Efficient and Effective Low-Rank Adaptation with Entropy-Guided Adaptive Sharing
ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
ICaRus: Identical Cache Reuse for Efficient Multi-Model Inference
An Information Theoretic Perspective on Agentic System Design
Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation
CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
Libra: Effective yet Efficient Load Balancing for Large-scale MoE Inference
ResiliBench: Evaluating Agentic Workflow Adaptation in Stochastic Environments
SAIR: Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset
The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models
Scaling Laws and Symmetry, Evidence from Neural Force Fields
DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
PoseX: AI Defeats Physics-based Methods on Protein Ligand Cross-Docking
FACET: A Fragment-Aware Conformer Ensemble Transformer
Adaptive Hopfield Network: Rethinking Similarities in Associative Memory
Interpolation-Based Conditioning of Flow Matching Models for Bioisosteric Ligand Design
Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation
Protein Structure Tokenization via Geometric Byte Pair Encoding
Robust Reward Modeling via Causal Rubrics
Branched Schrödinger Bridge Matching
Refine Drugs, Don’t Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?
Triangle Multiplication is All You Need for Biomolecular Structure Representations
IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra
Zephyrus: An Agentic Framework for Weather Science
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
RIDER: 3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion
GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance
CP-Agent: Context‑Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
OVID: Open-Vocabulary Intrusion Detection
PETRI: Learning Unified Cell Embeddings from Unpaired Modalities via Early-Fusion Joint Reconstruction
CHAMMI-75: Pre-training multi-channel models with heterogeneous microscopy images
PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA
Representing local protein environments with machine learning force fields
Histopathology-Genomics Multi-modal Structural Representation Learning for Data-Efficient Precision Oncology
Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
Can Large Language Models Match the Conclusions of Systematic Reviews?
Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Random Anchors with Low-rank Decorrelated Learning: A Minimalist Pipeline for Class-Incremental Medical Image Classification
OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
VeriRole: Verifiable Role-Awareness through Hint-Guided Reinforcement Learning
AbdCTBench: Learning Clinical Biomarker Representations from Abdominal Surface Geometry
MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning
Gradient Intrinsic Dimensionality Alignment: Narrowing The Gap Between Low-Rank Adaptation and Full Fine-Tuning
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
MedLesionVQA: A Multimodal Benchmark Emulating Clinical Visual Diagnosis for Body Surface Health
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Critic–Adviser–Reviser Cyclic Refinement: Towards High-Quality EMR Corpus Generation with LLMs
The Limits of Inference Scaling Through Resampling
Identity-Free Deferral For Unseen Experts
Unveiling the Mechanism of Continuous Representation Full-Waveform Inversion: A Wave Based Neural Tangent Kernel Framework
CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
Fast training of accurate physics-informed neural networks without gradient descent
KANO: Kolmogorov-Arnold Neural Operator
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
Buckingham $\pi$-Invariant Test‑Time Projection for Robust PDE Surrogate Modeling
Learning Data-Efficient and Generalizable Neural Operators via Fundamental Physics Knowledge
Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI
RF-MatID: Dataset and Benchmark for Radio Frequency Material Identification
BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity
M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
SparseD: Sparse Attention for Diffusion Language Models
Differentiable Model Predictive Control on the GPU
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
When LLMs get significantly worse: A statistical approach to detect model degradations
Uncertainty-Aware Gaussian Map for Vision-Language Navigation
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation
COOPERTRIM: Adaptive Data Selection for Uncertainty-Aware Cooperative Perception
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints
Accelerated co-design of robots through morphological pretraining
Paradigm Shift of GNN Explainer from Label Space to Prototypical Representation Space
AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
Bird's-eye-view Informed Reasoning Driver
Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative
RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation
SAGE: Spatial-visual Adaptive Graph Exploration for Efficient Visual Place Recognition
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
Tackling Time-Series Forecasting Generalization via Mitigating Concept Drift
ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
Steering Language Models with Weight Arithmetic
Battery Fault: A Comprehensive Dataset and Benchmark for Battery Fault Diagnosis
Enabling arbitrary inference in spatio-temporal dynamic systems: A physics-inspired perspective
SE-Diff: Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation
Contextual and Seasonal LSTMs for Time Series Anomaly Detection
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
CLIP-FMoE: Scalable CLIP via Fused Mixture-of-Experts with Enforced Specialization
Death of the Novel(ty): Beyond N-Gram Novelty as a Metric for Textual Creativity
Flipping the Dialogue: Training and Evaluating User Language Models
Accelerating Inference for Multilayer Neural Networks with Quantum Computers
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
GPS: Graph-guided Proactive Information Seeking in Large Language Models
Scale-wise Distillation of Diffusion Models
Learnable Fractional Superlets with a Spectro-Temporal Emotion Encoder for Speech Emotion Recognition
IterResearch: Rethinking Long-Horizon Agents with Interaction Scaling
Data Selection for LLM Alignment Using Fine-Grained Preferences
Rethinking Global Text Conditioning in Diffusion Transformers
Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Hierarchical Semantic-Acoustic Modeling via Semi-Discrete Residual Representations for Expressive End-to-End Speech Synthesis
SUSD: Structured Unsupervised Skill Discovery through State Factorization
CONCUR: A Framework for Continual Constrained and Unconstrained Routing
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
A Problem-Oriented Perspective and Anchor Verification for Code Optimization
LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
POEMetric: The Last Stanza of Humanity
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
BANZ-FS: BANZSL Fingerspelling Dataset
AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Scaling Behavior of Discrete Diffusion Language Models
KnowProxy: Adapting Large Language Models by Knowledge-guided Proxy
Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies
Neuron-Aware Data Selection in Instruction Tuning for Large Language Models
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
RefTool: Reference-Guided Tool Creation for Knowledge-Intensive Reasoning
Inheriting Generalizable Knowledge from LLMs to Diverse Vertical Tasks
Implicit Regularization of SGD Reduces Shortcut Learning
Learning Hierarchical and Geometry-Aware Graph Representations for Text-to-CAD
Latent Speech-Text Transformer
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Segment-Level Attribution for Selective Learning of Long Reasoning Traces
Should We Still Pretrain Encoders with Masked Language Modeling?
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Parallel Token Prediction for Language Models
Fluent Alignment with Disfluent Judges: Post-training for lower-resource languages
ClarifyVC: Clarifying Ambiguous Commands in Vehicle Control with a Hybrid Data Augmentation Pipeline
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
TableMaster: A Recipe to Advance Table Understanding with Language Models
Learning to Generate Unit Test via Adversarial Reinforcement Learning
SPRIG: Improving Large Language Model Performance by System Prompt Optimization
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
Critique-RL: Training Language Models For Critiquing Through Two-Stage Reinforcement Learning
DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning
HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
Code Aesthetics with Agentic Reward Feedback
GRO-RAG: Gradient-aware Re-rank Optimization for Multi-source Retrieval-Augmented Generation
Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling
Latent Wavelet Diffusion For Ultra High-Resolution Image Synthesis
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
DreamOn: Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas
Repurposing Synthetic Data for Fine-grained Search Agent Supervision
Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards
Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
Evolution and compression in LLMs: on the emergence of human-aligned categorization
The Human Brain as a Dynamic Mixture of Expert Models in Video Understanding
Towards Interpretable Visual Decoding with Attention to Brain Representations
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
CerebraGloss: Instruction-Tuning a Large Vision-Language Model for Fine-Grained Clinical EEG Interpretation
Convex Efficient Coding
Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors
Multi-Object System Identification from Videos
HEEGNet: Hyperbolic Embeddings for EEG
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks
A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex
Low rank adaptation of chemical foundation models generate effective odorant representations
Animal behavioral analysis and neural encoding with transformer-based self-supervised pretraining
ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
SMixer: Rethinking Efficient-Training and Event-Driven SNNs
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading
Experience-based Knowledge Correction for Robust Planning in Minecraft
SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
CoDA: Agentic Systems for Collaborative Data Visualization
Lean Finder: Semantic Search for Mathlib That Understands User Intents
LoC-Decomp: LLM Autoformalization via Logical Concept Decomposition and Iterative Feedback Correction
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
EvolProver: Advancing Automated theorem proving by Evolving Formalized Problems via Symmetry and Difficulty
iFusion: Integrating Dynamic Interest Streams via Diffusion Model for Click-Through Rate Prediction
Topology Matters in RTL Circuit Representation Learning
A General Framework for Black-Box Attacks Under Cost Asymmetry
LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR
Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents
PCB-Bench: Benchmarking LLMs for Printed Circuit Board Placement and Routing
CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation
Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers
GenCompositor: Generative Video Compositing with Diffusion Transformer
AttriCtrl: A Generalizable Framework for Controlling Semantic Attribute Intensity in Diffusion Models
Constantly Improving Image Models Need Constantly Improving Benchmarks
ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing
Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration
Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale
Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
PICABench: How Far are We from Physical Realistic Image Editing?
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
ReDDiT: Rehashing Noise for Discrete Visual Generation
TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
LayerSync: Self-aligning Intermediate Layers
Real-Time Motion-Controllable Autoregressive Video Diffusion
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
ReactID: Synchronizing Realistic Actions and Identity in Personalized Video Generation
Anchor Frame Bridging for Coherent First-Last Frame Video Generation
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
QVGen: Pushing the Limit of Quantized Video Generative Models
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
WILD-Diffusion: A WDRO Inspired Training Method for Diffusion Models under Limited Data
Controllable Video Generation with Provable Disentanglement
Unified 3D Scene Understanding Through Physical World Modeling
From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model
Video-As-Prompt: Unified Semantic Control for Video Generation
Instilling an Active Mind in Avatars via Cognitive Simulation
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting
DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics
A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features
CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning
CAD-Tokenizer: Towards Text-Based CAD Prototyping via Modality-Specific Tokenization
Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
HUMOF: Human Motion Forecasting in Interactive Social Scenes
GenFusion: Feed-forward Human Performance Capture via Progressive Canonical Space Updates
Towards Physically Executable 3D Gaussian for Embodied Navigation
Distractor-free Generalizable 3D Gaussian Splatting
Neural Compression of 3D Meshes using Sparse Implicit Representation
UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
QuadGPT: Native Quadrilateral Mesh Generation with Autoregressive Models
IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
Towards Text-Mask Consistency in Medical Image Segmentation
Cambrian-S: Towards Spatial Supersensing in Video
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
Thinking as Society: Multi-Social-Agent Self-Distillation for Multimodal Misinformation Detection
Divid: Disentangled Spatial-Temporal Modeling within LLMs for Temporally Grounded Video Understanding
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
RayI2P: Learning Rays for Image-to-Point Cloud Registration
Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
Panoptic Pairwise Distortion Graph
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of mUlti-turn jailbrEaks
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lightweight Spatio-Temporal Modeling via Temporally Shifted Distillation for Real-Time Accident Anticipation
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
Latent Wasserstein Adversarial Imitation Learning
S2GO: Streaming Sparse Gaussian Occupancy
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
DTP: Delta-Guided Two Stage Pruning for Mamba-based Multimodal Large Language Models
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
PlantRSR: A New Plant Dataset and Method for Reference-based Super-Resolution
Perception-Aware Policy Optimization for Multimodal Reasoning
SCUBA: Salesforce Computer Use Benchmark
ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection
Uncover Underlying Correspondence for Robust Multi-view Clustering
AnyUp: Universal Feature Upsampling
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
Faster Vision Transformers with Adaptive Patches
Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
MARS - A Foundational Map Auto-Regressor
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
Can Vision-Language Models Answer Face to Face Questions in the Real-World?
LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models
Fostering Video Reasoning via Next-Event Prediction
TOUCH: Text-guided Controllable Generation of Free-Form Hand-Object Interactions
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
ASMIL: Attention-Stabilized Multiple Instance Learning for Whole-Slide Imaging
Sequential Information Bottleneck Fusion: Towards Robust and Generalizable Multi-Modal Brain Tumor Segmentation
Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing
NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning
EnvSocial-Diff: A Diffusion-Based Crowd Simulation Model with Environmental Conditioning and Individual-Group Interaction
HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection
No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection
Low-Latency Neural LiDAR Compression with 2D Context Models
Maximizing Asynchronicity in Event-based Neural Networks
ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
SceneStreamer: Continuous Scenario Generation as Next Token Group Prediction
Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration
Seeing What’s Wrong: A Trajectory-Guided Approach to Caption Error Detection
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
KinemaDiff: Towards Diffusion for Coherent and Physically Plausible Human Motion Prediction
FARTrack: Fast Autoregressive Visual Tracking with High Performance
CortiLife: A Unified Framework for Cortical Representation Learning across the Lifespan
Self-Guided Low Light Object Detection Framework
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Seeing Through the PRISM: Compound & Controllable Restoration of Scientific Images
The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics
Federated Learning of Quantile Inference under Local Differential Privacy
HiddenEcho: Mitigating Noise Amplification in Differentially Private LLMs with Hidden-State Correction
Hot PATE: Private Aggregation of Distributions for Diverse Tasks
Private Rate-Constrained Optimization with Applications to Fair Learning
Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
Information-Theoretic Membership Inference for Granular Quantification of Memorization
Secret-Protected Evolution for Differentially Private Synthetic Text Generation
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Revisiting Confidence Calibration for Misclassification Detection in VLMs
Hubble: a Model Suite to Advance the Study of LLM Memorization
Mechanistic Detection and Mitigation of Hallucination in Large Reasoning Models
LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization
On Fairness of Task Arithmetic: The Role of Task Vectors
Learning for Highly Faithful Explainability
Neuron-Level Analysis of Cultural Understanding in Large Language Models
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
Circuit Insights: Towards Interpretability Beyond Activations
Evaluating Data Influence in Meta Learning
Dissecting Representation Misalignment in Contrastive Learning via Influence Function
Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Mechanism of Task-oriented Information Removal in In-context Learning
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
When Agents “Misremember” Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems
The Self-Re-Watermarking Trap: From Exploit to Resilience
Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts
FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning
Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Benchmarking Overton Pluralism in LLMs
JailbreakLoRA: Your Downloaded LoRA from Sharing Platforms might be Unsafe
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
Adaptive Logit Adjustment for Debiasing Multimodal Language Models
Preference Leakage: A Contamination Problem in LLM-as-a-judge
When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
Residual Feature Integration is Sufficient to Prevent Negative Transfer
Naming to Learn: Class Incremental Learning for Vision-Language Model with Unlabeled Data
Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
General search techniques without common knowledge for imperfect-information games, and application to superhuman Fog of War chess
Bi-Criteria Metric Distortion
Designing Rules to Pick a Rule: Aggregation by Consistency
Learning-Augmented Moment Estimation on Time-Decay Models
Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Toward Practical Equilibrium Propagation: Brain-inspired Recurrent Neural Network with Feedback Regulation and Residual Connections
Subquadratic Algorithms and Hardness for Attention with Any Temperature
Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs
Bandit Learning in Matching Markets Robust to Adversarial Corruptions
The Price of Robustness: Stable Classifiers Need Overparameterization
Good Allocations from Bad Estimates
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits
Polynomial Convergence of Riemannian Diffusion Models
A Theoretical Analysis of Mamba’s Training Dynamics: Filtering Relevant Features for Generalization in State Space Models
Why DPO is a Misspecified Estimator and How to Fix It
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
InfoNCE Induces Gaussian Distribution
Tight Bounds for Schrodinger Potential Estimation in Unpaired Data Translation
Mapping Semantic & Syntactic Relationships with Geometric Rotation
Bilinear representation mitigates reversal curse and enables consistent model editing
Feature segregation by signed weights in artificial vision systems and biological models
Automated Interpretability Metrics Do Not Distinguish Trained and Random Transformers
Unveiling Super Experts in Mixture-of-Experts Large Language Models
A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers
Causal Interpretation of Neural Network Computations with Contribution Decomposition
Quantum machine learning advantages beyond hardness of evaluation
Matched Data, Better Models: Target Aligned Data Filtering with Sparse Autoencoders
Learning multimodal dictionary decompositions with group-sparse autoencoders
On the Expressiveness of State Space Models via Temporal Logics
Influence Dynamics and Stagewise Data Attribution
The Deleuzian Representation Hypothesis
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
HOTA: Hamiltonian framework for Optimal Transport Advection
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
Action Chunking and Data Augmentation Yield Exponential Improvements in Behavior Cloning for Continuous Spaces
Queue Length Regret Bounds for Contextual Queueing Bandits
Analysis of approximate linear programming solution to Markov decision problem with log barrier function
Online Rounding and Learning Augmented Algorithms for Facility Location
The Expressive Limits of Diagonal SSMs for State-Tracking
Toward Conservative Planning from Human-AI Preferences in Reinforcement Learning
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
Beyond Pairwise: Empowering LLM Alignment With (Ranked) Choice Modeling
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
MOBODY: Model-Based Off-Dynamics Offline Reinforcement Learning
Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions
Bridging Successor Measure and Online Policy Learning with Flow Matching-Based Representations
From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation
Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
Transitive RL: Value Learning via Divide and Conquer
Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
Structured Reasoning for LLMs: A Unified Framework for Efficiency and Explainability
TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
Universal Value-Function Uncertainties
Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL
Relative Entropy Pathwise Policy Optimization
Multiplayer Nash Preference Optimization
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents
Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
Reference Grounded Skill Discovery
Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM
Bridging the performance-gap between target-free and target-based reinforcement learning
Policy Newton Algorithm in Reproducing Kernel Hilbert Space
Imitation Learning as Return Distribution Matching
GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning
Safe Continuous-time Multi-Agent Reinforcement Learning via Epigraph Form
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents
STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure TransFormer for Offline Multi-task Multi-agent Reinforcement Learning
Best-of-Infinity: Asymptotic Performance of Test-Time LLM Ensembling
Heterogeneous Agent Q-weighted Policy Optimization
Language and Experience: A Computational Model of Social Learning in Complex Tasks
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent System
HiPO: Self-Hint Policy Optimization for RLVR
DAK-UCB: Diversity-Aware Prompt Routing for LLMs and Generative Models
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
Revisiting Matrix Sketching in Linear Bandits: Achieving Sublinear Regret via Dyadic Block Sketching
Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning
Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Code Driven Planning with Domain-Adaptive Selector
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
Automating the Refinement of Reinforcement Learning Specifications
ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
Test-Time Alignment for Large Language Models via Textual Model Predictive Control
LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision
HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithmic Coding
Efficient Morphology-Control Co-Design via Stackelberg Proximal Policy Optimization
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Breaking Barriers: Do Reinforcement Post Training Gains Transfer To Unseen Domains?
WorldGym: World Model as An Environment for Policy Evaluation
Scaling Large Vision-Language Model RL Training via Efficient Load Balancing
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
GEM: A Gym for Generalist LLMs
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Geometric-Mean Policy Optimization
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
Count Counts: Motivating Exploration in LLM Reasoning with Count-based Intrinsic Rewards
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
Activation Steering with a Feedback Controller
Adaptive Conformal Prediction via Mixture-of-Experts Gating Similarity
Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
What Matters for Batch Online Reinforcement Learning in Robotics?
Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
On the Predictive Power of Representation Dispersion in Language Models
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
LongLive: Real-time Interactive Long Video Generation
Reinforcement Learning for Machine Learning Engineering Agents
MuonBP: Faster Muon via Block-Periodic Orthogonalization
Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI
SCI-Verifier: Scientific Verifier with Thinking
Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Comparing the learning dynamics of in-context learning and fine-tuning in language models
CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation
SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning
To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
Reliable Weak-to-Strong Monitoring of LLM Agents
Taming Polysemanticity in LLMs: Theory-Grounded Feature Recovery via Sparse Autoencoders
AttTok: Marrying Attribute Tokens with Generative Pre-trained Vision-Language Models towards Medical Image Understanding
Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution
InclusiveVidPose: Bridging the Pose Estimation Gap for Individuals with Limb Deficiencies in Video-Based Motion
Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM
A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning across Broad Atlases and Disorders
Consis-GCPO: Consistency-Preserving Group Causal Preference Optimization for Vision Customization
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
SR-Scientist: Scientific Equation Discovery With Agentic AI
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
BaseReward: A Strong Baseline for Multimodal Reward Model
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Taming Curvature: Architecture Warm-up for Stable Transformer Training
Scaling Agents via Continual Pre-training
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation
RAG4DMC: Retrieval-Augmented Generation for Data-Level Modality Completion
DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations
Fast Frank–Wolfe Algorithms with Adaptive Bregman Step-Size for Weakly Convex Functions
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Can LLMs Move Beyond Short Exchanges to Realistic Therapy Conversations?
Cartridges: Lightweight and general-purpose long context representations via self-study
Newton Method Revisited: Global Convergence Rates up to $O(1/k^3)$ for Stepsize Schedules and Linesearch Procedures
Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting
Causal-Steer: Disentangled Continuous Style Control without Parallel Corpora
Cascadia: An Efficient Cascade Serving System for Large Language Models
OptimSyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
CPQS-Tuning: A Model Self-Perception-Based Data Filtering Algorithm for Efficient Instruction Fine-Tuning
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
Are we measuring oversmoothing in graph neural networks correctly?
Language-Instructed Vision Embeddings for Controllable and Generalizable Perception
Test-Time Optimization of 3D Point Cloud LLM via Manifold-Aware In-Context Guidance and Refinement
Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
A High Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
Emergent Misalignment is Easy, Narrow Misalignment is Hard
Reinforcing General Reasoning Without Verifiers
PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting
AlphaAgentEvo: Evolution-Oriented Alpha Mining via Self-Evolving Agentic Reinforcement Learning
IF-VidCap: Can Video Caption Models Follow Instructions?
Recurrent Action Transformer with Memory
Disentangling the Factors of Convergence between Brains and DINOv3
PreferThinker: Reasoning-based Personalized Image Preference Assessment
Modeling the Density of Pixel-level Self-supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
Physics-Informed Inference Time Scaling for Solving High-Dimensional Partial Differential Equations
The End of Manual Decoding: Towards Truly End-to-End Language Models
Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
Tree Search for LLM Agent Reinforcement Learning
Topological Anomaly Quantification for Semi-supervised Graph Anomaly Detection
MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment
Helmsman: Autonomous Synthesis of Federated Learning Systems via Collaborative LLM Agents
Discrete Adjoint Matching
Feature compression is the root cause of adversarial fragility in neural networks
Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
Can Vision–Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective
Anchored Supervised Fine-Tuning
Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers
UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective
Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
PrefDisco: Benchmarking Proactive Personalized Reasoning
Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts
UrbanGS: Efficient and Scalable Architecture for Geometrically Accurate Large-Scene Reconstruction
MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference
Tractability via Low Dimensionality: The Parameterized Complexity of Training Quantized Neural Networks
Multi-Marginal Flow Matching with Adversarially Learnt Interpolants
RLP: Reinforcement as a Pretraining Objective
SPELL: Self-Play Reinforcement Learning for Evolving Long-Context Language Models
RL's Razor: Why Online Reinforcement Learning Forgets Less
Neural Message-Passing on Attention Graphs for Hallucination Detection
Optimizing Canaries for Privacy Auditing with Metagradient Descent
Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis
AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
MICLIP: Learning to Interpret Representation in Vision Models
Fast-dLLM v2: Efficient Block-Diffusion LLM
StreamingThinker: Large Language Models Can Think While Reading
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models
Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data
PICS: Pairwise Image Compositing with Spatial Interactions
Opponent Shaping in LLM Agents
WebArbiter: A Generative Reasoning Process Reward Model for Web Agents
Rodrigues Network for Learning Robot Actions
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback
Variance-Dependent Regret Lower Bounds for Contextual Bandits
Diversified Multinomial Logit Contextual Bandits
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
Hilbert-Guided Sparse Local Attention
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Slicing Wasserstein over Wasserstein via Functional Optimal Transport
MARL2Grid-TR: A Multi-Agent RL Benchmark in Power Grid Operations
PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
Online time series prediction using feature adjustment
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment
AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge
Context Tokens are Anchors: Understanding the Repeat Curse in dMLLMs from an Information Flow Perspective
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
Causal Discovery via Quantile Partial Effect
Entropy-Based Block Pruning for Efficient Large Language Models
Efficient Turing Machine Simulation with Transformers
Free Energy Mixer
FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
Token-Efficient Item Representation via Images for LLM Recommender Systems
Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
DistillKac: Few-Step Image Generation via Damped Wave Equations
Low-Rank Few-Shot Node Classification by Node-Level Graph Diffusion
LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding
Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods
Superficial Safety Alignment Hypothesis
When More is Less: Understanding Chain-of-Thought Length in LLMs
TusoAI: Agentic Optimization for Scientific Methods
From Markov to Laplace: How Mamba In-Context Learns Markov Chains
Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning
Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models
Learning From the Past with Cascading Eligibility Traces
AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
MindPilot: Closed-loop Visual Stimulation Optimization for Brain Modulation with EEG-guided Diffusion
Measuring Uncertainty Calibration
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
Balancing the Experts: Unlocking LoRA-MoE for GRPO via Mechanism-Aware Rewards
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
Leveraging Pretrained Knowledge at Inference Time: LoRA-Gated Contrastive Decoding for Multilingual Factual Language Generation in Adapted LLMs
MATHMO: Automated Mathematical Modeling Through Adaptive Search
Every Language Model Has a Forgery-Resistant Signature
DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing
Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection
Beyond Magnitude: Leveraging Direction of RLVR Updates for LLM Reasoning
LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
Search Arena: Analyzing Search-Augmented LLMs
Dataset Distillation as Pushforward Optimal Quantization
Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis
Latent Fourier Transform
Motion-Aligned Word Embeddings for Text-to-Motion Generation
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
Reconciling Visual Perception and Generation in Diffusion Models
On the identifiability of causal graphs with multiple environments
TrimR: Verifier-based Training-Free Thinking Trimming for Efficient Test-Time Scaling
Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics
Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Models
Relative Value Learning
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
SWE-RM: Execution-free Feedback for Software Engineering Agents
One Patch Doesn’t Fit All: Adaptive Patching for Native-Resolution Multimodal Large Language Models
Detection of unknown unknowns in autonomous systems
Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
Softmax is not Enough (for Adaptive Conformal Classification)
“Noisier” Noise Contrastive Estimation is (Almost) Maximum Likelihood
Strong Correlations Induce Cause Only Predictions in Transformer Training
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
Canonical Tree Cover Neural Networks for Expressive and Invariant Graph Learning
Persona Features Control Emergent Misalignment
Mixture of Contexts for Long Video Generation
VideoAgentTrek: Computer-Use Pretraining from Unlabeled Videos
DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage
Polychromic Objectives for Reinforcement Learning
GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra
Codified Finite-state Machines for Role-playing
Segment Any Events with Language
WithAnyone: Toward Controllable and ID Consistent Image Generation
MOAI: Module-Optimizing Architecture for Non-Interactive Secure Transformer Inference
Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling
SPICE: Submodular Penalized Information–Conflict Selection for Efficient Large Language Model Training
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Block-Sample MAC-Bayes Generalization Bounds
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
Neologism Learning for Controllability and Self-Verbalization
VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
Symmetry-Aware Bayesian Optimization via Max Kernels
Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding
Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry
Learning to Segment for Vehicle Routing Problems
Inoculation Prompting: Eliciting traits from LLMs during training can reduce trait expression at test-time
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
MergePRAG: Orthogonal Merging of Passage-experts for Multi-hop Parametric RAG
Hierarchical Multi-Stage Recovery Framework for Kronecker Compressed Sensing
Time-Gated Multi-Scale Flow Matching for Time-Series Imputation
RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo
Knowledge Externalization: Reversible Unlearning and Modular Retrieval in Multimodal Large Language Models
TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation
ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
Weak-to-Strong Diffusion with Reflection
Knowledge Fusion of Large Language Models via Modular SkillPacks
Critical Confabulation: Can LLMs Hallucinate for Social Good?
Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability
WideSearch: Benchmarking Agentic Broad Info-Seeking
The Art of Scaling Reinforcement Learning Compute for LLMs
On the Expressive Power of GNNs for Boolean Satisfiability
SAQ: Stabilizer-Aware Quantum Error Correction Decoder
Diffusion Bridge Variational Inference for Deep Gaussian Processes
Unlocking the Power of Co-Occurrence in CLIP: A DualPrompt-Driven Method for Training-Free Zero-Shot Multi-Label Classification
Teaching VLMs to Admit Uncertainty in OCR from Lossy Visual Inputs
Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
Take Note: Your Molecular Dataset Is Probably Aligned
Prompt Curriculum Learning for Efficient LLM Post-Training
Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations
Captain Cinema: Towards Short Movie Generation
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting
Rethinking Consistent Multi-Label Classification Under Inexact Supervision
Why Keep Your Doubts to Yourself? Trading Visual Uncertainties among Vision-Language Models
Fracture-GS: Dynamic Fracture Simulation with Physics-Integrated Gaussian Splatting
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
MIRACLE: Model-free Imitation and Reinforcement Learning for Adaptive Cut-Selection
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
Predictive Differential Training Guided by Training Dynamics
DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models
Cautious Optimizers: Improving Training with One Line of Code
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras
Seq vs Seq: An Open Suite of Paired Encoders and Decoders
Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
BioBO: Biology-informed Bayesian Optimization for Perturbation Design
Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding
Active Learning of 3D Gaussian Splatting with Consistent Region Partition and Robust Pose Estimation
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
GoR: A Unified and Extensible Generative Framework for Ordinal Regression
Joint Shadow Generation and Relighting via Light-Geometry Interaction Maps
NGS-Marker: Robust Native Watermarking for 3D Gaussian Splatting
SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation
Dual-Branch Representations with Dynamic Gated Fusion and Triple-Granularity Alignment for Deep Multi-View Clustering
Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning
Sparling: End-to-End Spatial Concept Learning via Extremely Sparse Activations
Getting Your LLMs Ready for Reinforcement Learning with Lightweight SFT
Reducing Semantic Mismatch in Brain-to-Text Decoding Through Personalized Multimodal Masking
PepTri: Tri-Guided All-Atom Diffusion for Peptide Design via Physics, Evolution, and Mutual Information
Constant Degree Matrix-Driven Incomplete Multi-View Clustering via Connectivity-Structure and Embedding Tensor Learning
How to train data-efficient LLMs
From Pixels to Semantics: Unified Facial Action Representation Learning for Micro-Expression Analysis
IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs
Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval
InfoBridge: Mutual Information estimation via Bridge Matching
Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
GaitSnippet: Gait Recognition Beyond Unordered Sets and Ordered Sequences
SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
What matters for Representation Alignment: Global Information or Spatial Structure?
Revisiting Long-context Modeling from Context Denoising Perspective
ProRe: A Proactive Reward System for GUI Agents via Reasoner–Actor Collaboration
Revisiting [CLS] and Patch Token Interaction in Vision Transformers
LSA: Layer-wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error
RIVER: A Real-Time Interaction Benchmark for Video LLMs
Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models
OpenAgentSafety: A Comprehensive Framework For Evaluating Real-World AI Agent Safety
TopoFormer: Topology Meets Attention for Graph Learning
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
RAR: Reversing Visual Attention Re-Sinking for Unlocking Potential in Multimodal Large Language Models
Conformalized Decision Risk Assessment
Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
Adaptive Mamba Neural Operators
Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data
Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
Lossless Vocabulary Reduction for Auto-Regressive Language Models
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Structure-Aware Graph Hypernetworks for Neural Program Synthesis
TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization
ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations
Demystifying Deep Search: A Holistic Evaluation with Hint-free Multi-Hop Questions and Factorised Metrics
Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection
Primary-Fine Decoupling for Action Generation in Robotic Imitation
Causally Robust Reward Learning from Reason-Augmented Preference Feedback
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
RedacBench: Can AI Erase Your Secrets?
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure
EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations
Planned Diffusion
Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks
Neural Force Field: Few-shot Learning of Generalized Physical Reasoning
Repurposing Foundation Model for Generalizable Medical Time Series Classification
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
ReTrace: Reinforcement Learning-Guided Reconstruction Attacks on Machine Unlearning
CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization
Discrete Variational Autoencoding via Policy Search
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
On the Universality and Complexity of GNN for Solving Second-order Cone Programs
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
Relational Feature Caching for Accelerating Diffusion Transformers
ViPRA: Video Prediction for Robot Actions
Online Decision-Focused Learning
Gauge Flow Matching: Efficient Constrained Generative Modeling over General Convex Set and Beyond
A universal compression theory for lottery ticket hypothesis and neural scaling laws
Differentiable JPEG-based Input Perturbation for Knowledge Distillation Amplification via Conditional Mutual Information Maximization
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
FlowSymm: Physics–Aware, Symmetry–Preserving Graph Attention for Network Flow Completion
From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems
FAST-DIPS: Adjoint-Free Analytic Steps and Hard-Constrained Likelihood Correction for Diffusion-Prior Inverse Problems
Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
A Minimum Variance Path Principle for Accurate and Stable Score-Based Density Ratio Estimation
DeepAFL: Deep Analytic Federated Learning
Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
AutoDV: An End-to-End Deep Learning Model for High-Dimensional Data Visualization
Solving Football by Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information
Mobile-GS: Real-time Gaussian Splatting for Mobile Devices
LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference
Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
GraphShield: Graph-Theoretic Modeling of Network-Level Dynamics for Robust Jailbreak Detection
Revisiting Nonstationary Kernel Design for Multi-Output Gaussian Processes
Partition Generative Modeling: Masked Modeling Without Masks
Escaping Policy Contraction: Contraction-Aware PPO (CaPPO) for Stable Language Model Fine-Tuning
From atom to space: A region-based readout function for spatial properties of materials
SumRA: Parameter Efficient Fine-tuning with Singular Value Decomposition and Summed Orthogonal Basis
MotionStream: Real-Time Video Generation with Interactive Motion Controls
Lossy Common Information in a Learnable Gray-Wyner Network
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling
An Optimal Diffusion Approach to Quadratic Rate-Distortion Problems: New Solution and Approximation Methods
Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models
Breaking Gradient Temporal Collinearity for Robust Spiking Neural Networks
MAPSS: Manifold-based Assessment of Perceptual Source Separation
ScalingCache: Extreme Acceleration of DiTs through Difference Scaling and Dynamic Interval Caching
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging
Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology
Relationship Alignment for View-aware Multi-view Clustering
Text summarization via global structure awareness
Pallatom-Ligand: an All-Atom Diffusion Model for Designing Ligand-Binding Proteins
Reinforcement Mid-Training
Robust Selective Activation with Randomized Temporal K-Winner-Take-All in Spiking Neural Networks for Continual Learning
Weight Space Representation Learning on Diverse NeRF Architectures
The Seismic Wavefield Common Task Framework
Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention
Is On-Policy Data always the Best Choice for Direct Preference Optimization-Based LM Alignment?
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
Resurfacing the Instance-only Dependent Label Noise Model through Loss Correction
Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models
Any-Order Flexible Length Masked Diffusion
FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
DiffPBR: Point-Based Rendering via Spatial-Aware Residual Diffusion
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
WRING Out The Bias: A Rotation-Based Alternative To Projection Debiasing
CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting
AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
PAMDP: Interact to Persona Alignment via a Partially Observable Markov Decision Process
SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
Generalizable Heuristic Generation Through LLMs with Meta-Optimization
OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research
RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
Visual Jigsaw Post-Training Improves MLLMs
Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
ExpGuard: LLM Content Moderation in Specialized Domains
FutureFill: Fast Generation from Convolutional Sequence Models
DADA: Dual Averaging with Distance Adaptation
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
STDDN: A Physics-Guided Deep Learning Framework for Crowd Simulation
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
Towards Improved Sentence Representations using Token Graphs
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation
R4: Nested Reasoning-Retrieval for Reward Modeling in Role-Playing Agents
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
Composition of Memory Experts for Diffusion World Models
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition
RAPID$^3$: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
PoLi-RL: A Point-to-List Reinforcement Learning Framework for Conditional Semantic Textual Similarity
Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models
GAS: Enhancing Reward-Cost Balance of Generative Model-assisted Offline Safe RL
Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers
Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Revisiting Multimodal Positional Encoding in Vision–Language Models
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
Pareto Variational Autoencoder
Bayesian Ensemble for Sequential Decision-Making
Vision-Zero: Scalable VLM Self-Evolution via Multi-Agent Self-Play
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Extending Fourier Neural Operators for Modeling Parameterized and Coupled PDEs
Revenue Maximization Under Sequential Price Competition Via The Estimation Of $s$-Concave Demand Functions
Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter
VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis
Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
The Value of Information in Human-AI Decision-making
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
Probing to Refine: Reinforcement Distillation of LLM Reasoners via Explanatory Inversion
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
TNT: Improving Chunkwise Training for Test-Time Memorization
Learning Flexible Forward Trajectories for Masked Molecular Diffusion
Towards Efficient Constraint Handling in Neural Solvers for Routing Problems
Estimating Worst-Case Frontier Risks of Open-Weight LLMs
Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation
Model Predictive Adversarial Imitation Learning for Planning from Observation
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing
Autoregressive Image Generation with Randomized Parallel Decoding
NewtonGen: Physics-consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
PRISM: Festina Lente Proactivity—Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents
Efficient Test-Time Scaling for Small Vision-Language Models
SURGE: Surprise-Guided Token Reduction for Efficient Video Understanding with VLMs
Computational Bottlenecks for Denoising Diffusions
Monotone Near-Zero-Sum Games: A Generalization of Convex-Concave Minimax
Optimal Transport-Induced Samples against Out-of-Distribution Overconfidence
Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
Real-Time Robot Execution with Masked Action Chunking
Do LLM Agents Know How to Ground, Recover, and Assess? Evaluating Epistemic Competence in Information-Seeking Agents
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in LLM Routing
SUIT: Knowledge Editing with Subspace-Aware Key-Value Mappings
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Glance and Focus Reinforcement for Pan-cancer Screening
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
Pretrain Value, Not Reward: Decoupled Value Policy Optimization
Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data
Contextual Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
No outlier channels but with outlier blocks
WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols
Aligning Collaborative View Recovery and Tensorial Subspace Learning via Latent Representation for Incomplete Multi-View Clustering
All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Empowering Small VLMs to Think with Dynamic Memorization and Exploration
Building Massively Multimodal Foundation Models with Interaction-aware Mixture-of-Experts
Random Label Prediction Heads for Studying Memorization in Deep Neural Networks
An Information-Theoretic Parameter-Free Bayesian Framework for Probing Labeled Dependency Trees from Attention Score
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Training Large Reasoning Models Efficiently via Progressive Thought Encoding
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization
FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation
Reconstruction Alignment Improves Unified Multimodal Models
Bayesian Evidence-Driven Prototype Evolution for Federated Domain Adaptation
MILPnet: A Multi-Scale Architecture with Geometric Feature Sequence Representations for Advancing MILP Problems
Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Partial Soft-Matching Distance For Neural Representational Comparison With Partial Unit Correspondence
Learning to Be Uncertain: Pre-training World Models with Horizon-Calibrated Uncertainty
Optimal Robust Subsidy Policies for Irrational Agent in Principal-Agent MDPs
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
LLaVAction: evaluating and training multi-modal large language models for action understanding
DiffBED: Scaling Bayesian Experimental Design to High-Dimensions
Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space
Source-Guided Flow Matching
Video-KTR: Reinforcing Video Reasoning via Key Token Attribution
An Efficient SE(p)-Invariant Transport Metric Driven by Polar Transport Discrepancy-based Representation
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
The Natural Geometry of Code: Hyperbolic Representation Learning for Program Reasoning
LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards
Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement
Why is Your Language Model a Poor Implicit Reward Model?
EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
Optimizing Data Augmentation through Bayesian Model Selection
Object-Centric Refinement for Enhanced Zero-Shot Segmentation
Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Trust-Region Adaptive Policy Optimization
Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
Making, Not Taking, the Best of N
Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models
Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion
CLARC: C/C++ Benchmark for Robust Code Search
Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding
OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning
Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
Riemannian Federated Learning via Averaging Gradient Streams
Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities
Don't Just Fine-tune the Agent, Tune the Environment
Offline Reinforcement Learning with Adaptive Feature Fusion
Flash-Mono: Feed-Forward Accelerated Gaussian Splatting Monocular SLAM
AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
STARK: Strategic Team of Agents for Refining Kernels
Cognitive models can reveal interpretable value trade-offs in language models
Deconstructing Guidance: A Semantic Hierarchy for Precise Diffusion Model Editing
Toward Enhancing Representation Learning in Federated Multi-Task Settings
VLMgineer: Vision-Language Models as Robotic Toolsmiths
Latent-to-Data Cascaded Diffusion Models for Unconditional Time Series Generation
Statistical Guarantees in the Search for Less Discriminatory Algorithms
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
Demystifying The Mechanisms Behind Emergent Exploration in Goal-Conditioned RL
Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement
An Open-Ended Benchmark and Formal Framework for Adjuvant Research with MLLM
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
MLE-Smith: Scaling MLE Tasks with Automated Multi-agent Pipeline
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
Test-Time Scaling with Reflective Generative Model
EasyCreator: Empowering 4D Creation through Video Inpainting
Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact
Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning
Robust Decision-Making with Partially Calibrated Forecasters
HarmonyGNNs: Harmonizing Heterophily and Homophily in GNNs via Self-Supervised Node Encoding
Discovering and Steering Interpretable Concepts in Large Generative Music Models
Why Ask One When You Can Ask $k$? Learning-to-Defer to the Top-$k$ Experts
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora
MTVCraft: Tokenizing 4D Motion for Arbitrary Character Animation
Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces
Optimistic Task Inference for Behavior Foundation Models
Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
Equilibrium Language Models
Jacobian Aligned Random Forests
All Code, No Thought: Language Models Struggle to Reason in Ciphered Language
Transferable and Stealthy Adversarial Attacks on Large Vision-Language Models
Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing
SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
Learning to Reason over Continuous Tokens with Reinforcement Learning
An Improved Model-free Decision-estimation Coefficient with Applications in Adversarial MDPs
Discrete Latent Features Ablate Adversarial Attack: A Robust Prompt Tuning Framework for VLMs
Interactive Learning of Single-Index Models via Stochastic Gradient Descent
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
Interleaving Reasoning for Better Text-to-Image Generation
ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution
Graph Diffusion Transformers are In-Context Molecular Designers
Gistify: Codebase-Level Understanding via Runtime Execution
HLD: Approximate Hierarchical Linguistic Distribution Modeling for LLM-Generated Text Detection
LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
Adaptive Augmentation-Aware Latent Learning for Robust LiDAR Semantic Segmentation
CoAct-1: Computer-using Multi-agent System with Coding Actions
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
Identifying Robust Neural Pathways: Few-Shot Adversarial Mask Tuning for Vision-Language Models
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment
Subspace Kernel Learning on Tensor Sequences
CityLens: Evaluating Large Vision-Language Models for Urban Socioeconomic Sensing
Exploring Cross-Modal Flows for Few-Shot Learning
Learning to Solve Orienteering Problem with Time Windows and Variable Profits
FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
DiCache: Let Diffusion Model Determine Its Own Cache
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Evaluating Language Models' Evaluations of Games
NEO — No-Optimization Test-Time Adaptation through Latent Re-Centering
Structure Learning from Time-Series Data with Lag-Agnostic Structural Prior
Improving Autoregressive Video Modeling with History Understanding
Robust Equation Structure Learning with Adaptive Refinement
Unifying Formal Explanations: A Complexity-Theoretic Perspective
Universal Model Routing for Efficient LLM Inference
Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning
Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
TEDM: Time Series Forecasting with Elucidated Diffusion Models
CREPE: Controlling diffusion with REPlica Exchange
Avey-B
Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models
Efficient Learning on Large Graphs using a Densifying Regularity Lemma
Evaluating SAE interpretability without generating explanations
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
TGM: A Modular and Efficient Library for Machine Learning on Temporal Graphs
On the Theoretical Limitations of Embedding-Based Retrieval
House Of Dextra: Cross-Embodied Co-Design for Dexterous Hands
Token-level Data Selection for Safe LLM Fine-tuning
Mitigating Mismatch within Reference-based Preference Optimization
ContextNav: Towards Agentic Multimodal In-Context Learning
Reinforcing Diffusion Models by Direct Group Preference Optimization
BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training
Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
CLUE: Conflict-guided Localization for LLM Unlearning Framework
Nano3D: A Training-Free Approach for Efficient 3D Editing Without Masks
MoMa: A Simple Modular Learning Framework for Material Property Prediction
MoSA: Mosaic Shared Adaptation of Large Language Models
A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components
BIRD: Behavior Induction via Representation-structure Distillation
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Fantastic Tractor-Dogs and How Not to Find Them With Open-Vocabulary Detectors
Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings
Neural Latent Arbitrary Lagrangian-Eulerian Grids for Fluid-Solid Interaction
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
FERD: Fairness-Enhanced Data-Free Adversarial Robustness Distillation
RE-PO: Robust Enhanced Policy Optimization as a General Framework for LLM Alignment
Graph Tokenization for Bridging Graphs and Transformers
Latent Visual Reasoning
Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
C-Evolve: Consensus-based Evolution for Prompt Groups
Difference Predictive Coding for Training Spiking Neural Networks
Overtone: Cyclic Patch Modulation for Clean, Efficient, and Flexible Physics Emulators
Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation
Towards Better Optimization For Listwise Preference in Diffusion Models
Automatic Dialectic Jailbreak: A Framework for Generating Effective Jailbreak Strategies
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Defending against Backdoor Attacks via Module Switching
OSCAR: Online Soft Compression for RAG
A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
Demystifying Supervision Data Generalization in Multimodal LMs
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series
LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection
SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
ARFlow: Auto-regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
Risk-Sensitive Agent Compositions
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
Building spatial world models from sparse transitional episodic memories
From Cheap Geometry to Expensive Physics: A Physics-agnostic Pretraining Framework for Neural Operators
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning
UNDERSTANDING TRANSFORMERS FOR TIME SERIES FORECASTING: A CASE STUDY ON MOIRAI
Self-Speculative Masked Diffusions
When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation
Object Fidelity Diffusion for Remote Sensing Image Generation
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
Robust Multi-Objective Controlled Decoding of Large Language Models
MoL: Adaptive Mixture-of-Length Reasoning for Efficient Question Answering with Context
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
Developmental Federated Tuning: A Cognitive-Inspired Paradigm for Efficient LLM Adaptation
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
THE PATH OF LEAST RESISTANCE: GUIDING LLM REASONING TRAJECTORIES WITH PREFIX CONSENSUS
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Sampling-aware Adversarial Attacks Against Large Language Models
Nudging the Boundaries of LLM Reasoning
A Training-Free Framework for Long Video Understanding via Video-Query-Options Similarity
Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors
Group Critical-token Policy Optimization for Autoregressive Image Generation
PAC-Bayes bounds for cumulative loss in Continual Learning
dParallel: Learnable Parallel Decoding for dLLMs
APPLE: Toward General Active Perception via Reinforcement Learning
On Discovering Algorithms for Adversarial Imitation Learning
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Disentangling Length Bias in Preference Learning via Response-Conditioned Modeling
Discovering heterogeneous synaptic plasticity rules via large-scale neural evolution
What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
Log Probability Tracking of LLM APIs
Advancing Complex Video Object Segmentation via Progressive Concept Construction
Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
Next Visual Granularity Generation
In-Context Learning of Temporal Point Processes with Foundation Inference Models
DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations
SiMO: Single-Modality-Operable Multimodal Collaborative Perception
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models
K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model
STEM: SCALING TRANSFORMERS WITH EMBEDDING MODULES
StochasTok: Improving Fine-Grained Subword Understanding in LLMs
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
RESCUE: Retrieval Augmented Secure Code Generation
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
Counterfactual Structural Causal Bandits
From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting
APC-RL: Exceeding data-driven behavior priors with adaptive policy composition
Steering Diffusion Models Towards Credible Content Recommendation
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
GoalRank: Group-Relative Optimization for a Large Ranking Model
Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting
Multiple-Prediction-Powered Inference
How to Square Tensor Networks and Circuits Without Squaring Them
ActivationReasoning: Logical Reasoning in Latent Activation Spaces
Measuring and Mitigating Rapport Bias of Large Language Models under Multi-Agent Social Interactions
Minimax-Optimal Aggregation for Density Ratio Estimation
Diffusion Language Model Knows the Answer Before It Decodes
SketchingReality: From Freehand Scene Sketches to Photorealistic Images
A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Imitating the Truth: Attention-aware Truth-Guided Enhancement for Hallucination Mitigation in Large Vision-Language Models
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
ProTDyn: A Foundation Protein Language Model for Thermodynamics and Dynamics Generation
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
Learning Survival Distributions with Individually Calibrated Asymmetric Laplace Distribution
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
Exploring Mode Connectivity in Krylov Subspace for Domain Generalization
Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation
KL-Regularized Reinforcement Learning for Generative Modelling is Designed to Mode Collapse
Counterfactual LLM-based Framework for Measuring Rhetorical Style
Bandits with Single-Peaked Preferences and Limited Resources
Designing Affine-Invariant Neural Networks for Photometric Corruption Robustness and Generalization
Quasi-Monte Carlo Methods Enable Extremely Low-Dimensional Deep Generative Models
Sparse Imagination for Efficient Visual World Model Planning
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
Globally aware optimization with resurgence
RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation
ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
Variational Reasoning for Language Models
AP-OOD: Attention Pooling for Out-of-Distribution Detection
Learning More with Less: A Dynamic Dual-Level Down-Sampling Framework for Efficient Policy Optimization
Q-Learning with Fine-Grained Gap-Dependent Regret
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
NDAD: Negative-Direction Aware Decoding for Large Language Models via Controllable Hallucination Signal Injection
Conformalized Hierarchical Calibration for Uncertainty-Aware Adaptive Hashing
From Natural Alignment to Conditional Controllability in Multimodal Dialogue
Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval
EdgeCape: Edge Weight Prediction For Category-Agnostic Pose Estimation
Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification
Online Navigation Refinement: Achieving Lane-Level Guidance by Associating Standard-Definition and Online Perception Maps
SLAP: Shortcut Learning for Abstract Planning
QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
Visual Autoregressive Modeling for Instruction-Guided Image Editing
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
Hidden Breakthroughs in Language Model Training
DND: Boosting Large Language Models with Dynamic Nested Depth
Discovering Novel LLM Experts via Task-Capability Coevolution
GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models
Learned Meta-Tokens for Language Modeling
Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Trajectory Generation with Conservative Value Guidance for Offline Reinforcement Learning
KaVa: Latent Reasoning via Compressed KV-Cache Distillation
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
Robust and Interpretable Adaptation of Equivariant Materials Foundation Models via Sparsity-promoting Fine-tuning
Arbitrary Generative Video Interpolation
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
Latent Concept Disentanglement in Transformer-based Language Models
AdaSpec: Adaptive Spectrum for Enhanced Node Distinguishability
A Step to Decouple Optimization in 3DGS
QuRL: Low-Precision Reinforcement Learning for Efficient Reasoning
VQ-Transplant: Efficient VQ-Module Integration for Pre-trained Visual Tokenizers
Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba
GenCP: Towards Generative Modeling Paradigm of Coupled physics
Differentiable Lifting for Topological Neural Networks
TIPS: Turn-level Information-Potential Reward Shaping for Search-Augmented LLMs
ATOM: A Pretrained Neural Operator for Multitask Molecular Dynamics
Revela: Dense Retriever Learning via Language Modeling
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
Robustness in the Face of Partial Identifiability in Reward Learning
Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection
GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation
SkyEvents: A Large-Scale Event-enhanced UAV Dataset for Robust 3D Scene Reconstruction
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Internal Planning in Language Models: Characterizing Horizon and Branch Awareness
Natural Identifiers for Privacy and Data Audits in Large Language Models
AUHead: Realistic Emotional Talking Head Generation via Action Units Control
Beyond Spectra: Eigenvector Overlaps in Loss Geometry
SAM 3: Segment Anything with Concepts
Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning
Product of Experts for Visual Generation
In-Place Test-Time Training
DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands
Convex Dominance in Deep Learning I: A Scaling Law of Loss and Learning Rate
Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
SiNGER: A Clearer Voice Distills Vision Transformers Further
A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
Bilevel Optimization with Lower-Level Uniform Convexity: Theory and Algorithm
Reformulation for Pretraining Data Augmentation
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
SIGMA-Gen: Structure and Identity Guided Multi-Subject Assembly for Image Generation
Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs
RLAC: Reinforcement Learning with Adversarial Critic for Free-Form Generation Tasks
AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining
A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling
Multimodal Aligned Semantic Knowledge for Unpaired Image-text Matching
Directional Sheaf Hypergraph Networks: Unifying Learning on Directed and Undirected Hypergraphs
EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation
DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning
Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
Pre-training Limited Memory Language Models with Internal and External Knowledge
Bayesian Neural Networks for Functional ANOVA Model
Gradient-Based Diversity Optimization with Differentiable Top-$k$ Objective
Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate Prediction
WALT: Web Agents that Learn Tools
Boosting Medical Visual Understanding From Multi-Granular Language Learning
RPM: Reasoning-Level Personalization for Black-Box Large Language Models
Addressing divergent representations from causal interventions on neural networks
Shortcut Diffusion Training with Cumulative Consistency Loss: An Optimal Control View
DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
Out of the Shadows: Exploring a Latent Space for Neural Network Verification
UniHM: Unified Dexterous Hand Manipulation with Vision Language Model
Sharp Monocular View Synthesis in Less Than a Second
Only Brains Align with Brains: Cross-Region Alignment Patterns Expose Limits of Normative Models
A Statistical Theory of Overfitting for Imbalanced Classification
Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs
Modality-free Graph In-context Alignment
Teach to Reason Safely: Policy-Guided Safety Tuning for MLRMs
TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
A Sharp KL Convergence Analysis for Diffusion Models under Minimal Assumptions
Block-wise Adaptive Caching for Accelerating Diffusion Policy
gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity
DispViT: Direct Stereo Disparity Regression with a Single-Stream Vision Transformer
GraphUniverse: Synthetic Graph Generation for Evaluating Inductive Generalization
GradPruner: Gradient-guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs
DeAltHDR: Learning HDR Video Reconstruction from Degraded Alternating Exposure Sequences
On the Convergence of Two-Layer Kolmogorov-Arnold Networks with First-Layer Training
RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
GTool: Graph Enhanced Tool Planning with Large Language Model
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL Problems
Multifidelity Simulation-based Inference for Computationally Expensive Simulators
TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction
Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses
Boosting Entropy with Bell Box Quantization
Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization
DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
Learning From Dictionary: Enhancing Robustness of Machine-Generated Text Detection in Zero-Shot Language via Adversarial Training
Efficient Offline Reinforcement Learning via Peer-Influenced Constraint
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
Rectifying LLM Thought from Lens of Optimization
DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
LiveWeb-IE: A Benchmark For Online Web Information Extraction
Dual-Scale World Memory for LLM Agents towards Hard-Exploration Problems
Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning
Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models
Single-stream Policy Optimization
GeomMotif: A Benchmark for Arbitrary Geometric Preservation in Protein Generation
Bayesian Post Training Enhancement of Regression Models with Calibrated Rankings
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks
Image Quality Assessment for Embodied AI
UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking
Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility
PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models
ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework
Decoupled Q-Chunking
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
TimeSeg: An Information-Theoretic Segment-Wise Explainer for Time-Series Predictions
h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network
What Layers When: Learning to Skip Compute in LLMs with Residual Gates
InfoScan: Information-Efficient Visual Scanning via Resource-Adaptive Walks
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
OccDriver: Future Occupancy Guided Dual-branch Trajectory Planner in Autonomous Driving
Fair Conformal Classification via Learning Representation-Based Groups
Are Reasoning LLMs Robust to Interventions on their Chain-of-Thought?
KV Cache Transform Coding for Compact Storage in LLM Inference
Dual Goal Representations
Differential Fine-Tuning Large Language Models Towards Better Diverse Reasoning Abilities
Revisiting the Past: Data Unlearning with Model State History
EXPO: Stable Reinforcement Learning with Expressive Policies
Learning Distributions over Permutations and Rankings with Factorized Representations
PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement
Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
Divergence-Free Neural Networks with Application to Image Denoising
Temporal Graph Thumbnail: Robust Representation Learning with Global Evolutionary Skeleton
StoryAlign: Evaluating and Training Reward Models for Story Generation
Generalization Below the Edge of Stability: The Role of Data Geometry
Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative
DirMoE: Dirichlet-Routed Mixture of Experts
Inconsistency Biases in Dynamic Data Pruning
Diagnosing Generalization Failures from Representational Geometry Markers
Navigating the Latent Space Dynamics of Neural Models
COSMOS: A Hybrid Adaptive Optimizer for Efficient Training of Large Language Models
Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization
WATS: Wavelet-Aware Temperature Scaling for Reliable Graph Neural Networks
StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
SimpleGVR: A Simple Baseline for Latent-Cascaded Generative Video Super-Resolution
Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Grouping Nodes with known Value Differences: A lossless UCT-based Abstraction Algorithm
DragFlow: Unleashing DiT Priors with Region-Based Supervision for Drag Editing
Vid2World: Crafting Video Diffusion Models to Interactive World Models
MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition
Attribution-Guided Decoding
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
On Smoothness Bounds for Non-Clairvoyant Scheduling with Predictions
Nonparametric Teaching of Attention Learners
Training Dynamics Impact Post-Training Quantization Robustness
Neural Graduated Assignment for Maximum Common Edge Subgraphs
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking
SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models
SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION
Energy-Based Transformers are Scalable Learners and Thinkers
Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism
Escaping the Homophily Trap: A Threshold-free Graph Outlier Detection Framework via Clustering-guided Edge Reweighting
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
Self-Aug: Query and Entropy Adaptive Decoding for Large Vision-Language Models
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
Transformers are Inherently Succinct
HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
Let's Split Up: Zero-Shot Classifier Edits for Fine-Grained Video Understanding
Detecting Data Contamination in LLMs via In-Context Learning
SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML
Neural Hamilton–Jacobi Characteristic Flows for Optimal Transport
SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines
Dynamic Speculative Agent Planning
Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors
Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
Cannistraci-Hebb Training on Ultra-Sparse Spiking Neural Networks
DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning
Understanding the Role of Training Data in Test-Time Scaling
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
Adaptive Gaussian Expansion for On-the-fly Category Discovery
Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection
Quantized Visual Geometry Grounded Transformer
SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning
Temporal Geometry of Deep Networks: Hyperbolic Representations of Training Dynamics for Intrinsic Explainability
MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation
Efficient Message-Passing Transformer for Error Correcting Codes
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
Tokenisation over Bounded Alphabets is Hard
Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
Routing, Cascades, and User Choice for LLMs
DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher
Activation Function Design Sustains Plasticity in Continual Learning
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
On the trade-off between expressivity and privacy in graph representation learning
Post-hoc Probabilistic Vision-Language Models
Quasi-Equivariant Metanetworks
Non-Asymptotic Analysis of Efficiency in Conformalized Regression
Maximizing Incremental Information Entropy for Contrastive Learning
Hyperbolic Aware Minimization: Implicit Bias for Sparsity
Decision-Theoretic Approaches for Improved Learning-Augmented Algorithms
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
CircuitSense: A Hierarchical MLLM Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
Post-Training Quantization for Video Matting
TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
Exploratory Diffusion Model for Unsupervised Reinforcement Learning
Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering
Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning
CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
Causal Discovery in the Wild: A Voting-Theoretic Ensemble Approach
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
UNITE: Universal kNowledge Integration from Task-specific Experts
WIMFRIS: WIndow Mamba Fusion and Parameter Efficient Tuning for Referring Image Segmentation
Sharp asymptotic theory for Q-learning with LD2Z learning rate and its generalization
InfoDet: A Dataset for Infographic Element Detection
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting
RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility
Conjuring Semantic Similarity
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
PCLR: Progressively Compressed LoRA for Multimodal Continual Instruction Tuning
More Than What Was Chosen: LLM-based Explainable Recommendation Beyond Noisy User Preferences
Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
Prompt and Parameter Co-Optimization for Large Language Models
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
ResCP: Reservoir Conformal Prediction for Time Series Forecasting
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
SpikeGen: Decoupled “Rods and Cones” Visual Representation Processing with Latent Generative Framework
Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures
High-dimensional Mean-Field Games by Particle-based Flow Matching
Graph-Theoretic Intrinsic Reward: Guiding RL with Effective Resistance
Frayed RoPE and Long Inputs: A Geometric Perspective
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting
DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining
SafeMoE: Safe Fine-Tuning for MoE LLMs by Aligning Harmful Input Routing
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
QPrompt-R1: Real-Time Reasoning for Domain-Generalized Semantic Segmentation via Group-Relative Query Alignment
Decision Aggregation under Quantal Response
BOLT: Decision‑Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots
World-In-World: World Models in a Closed-Loop World
Interference-Isolated Elastic Weight Consolidation and Knowledge Calibration for Incremental Object Detection
IDER: IDempotent Experience Replay for Reliable Continual Learning
LLM-as-a-Prophet: Understanding Predictive Intelligence with Prophet Arena
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning
Why We Need New Benchmarks for Local Intrinsic Dimension Estimation
Riesz Neural Operator for Solving Partial Differential Equations
Unified In-Context Video Editing
Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
jqBench: a benchmark for reading and editing JSON from natural language and/or examples
Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
LS-Merge: Merging Language Models in Latent Space
Small Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and Implications for Mechanistic Interpretability
Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Solving the 2-norm k-hyperplane clustering problem via multi-norm formulations
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Inference-Time Personalized Safety Control via Paired Difference-in-Means Intervention
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
Zeros can be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores
Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity
Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
Light-X: Generative 4D Video Rendering with Camera and Illumination Control
The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm
StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning
LogART: Pushing the Limit of Efficient Logarithmic Post-Training Quantization
Theoretical Guarantees for Causal Discovery on Large Random Graphs
DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity
Optimal transport unlocks end-to-end learning for single-molecule localization
Flow-based Conformal Prediction for Multi-dimensional Time Series
CogMoE: Signal-Quality–Guided Multimodal MoE for Cognitive Load Prediction
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals
A Study of Posterior Stability in Time-Series Latent Diffusion
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
Scaling Sequence-to-Sequence Generative Neural Rendering
Jailbreak Transferability Emerges from Shared Representations
Wavelet Predictive Representations for Non-Stationary Reinforcement Learning
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
TaskCraft: Automated Generation of Agentic Tasks
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
Automated Stateful Specialization for Adaptive Agent Systems
Memorizing Long-tail Data Can Help Generalization Through Composition
Koopman-Assisted Trajectory Synthesis: A Data Augmentation Framework for Offline Imitation Learning
Non-Collaborative User Simulators for Tool Agents
Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition
A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction
On the Spectral Differences Between NTK and CNTK and Their Implications for Point Cloud Recognition
ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings
Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
Learning to Orchestrate Agents in Natural Language with the Conductor
Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
On The Expressive Power of GNN Derivatives
Soft Equivariance Regularization for Invariant Self-Supervised Learning
Learning Retrieval Models with Sparse Autoencoders
SketchEvo: Leveraging Drawing Dynamics for Enhanced Image Synthesis
Correlated Policy Optimization in Multi-Agent Subteams
Trion: FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of LLMs
Formal Mechanistic Interpretability: Automated Circuit Discovery with Provable Guarantees
Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
Grounding and Enhancing Informativeness and Utility in Dataset Distillation
TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization
Flower: A Flow-Matching Solver for Inverse Problems
Agentic Reinforced Policy Optimization
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
TangleScore: Tangle-Guided Purge and Imprint for Unstructured Knowledge Editing
Learning Correlated Reward Models: Statistical Barriers and Opportunities
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Batch Pruning by Activation Stability
Concept-TRAK: Understanding how diffusion models learn concepts through concept attribution
Learning in Prophet Inequalities with Noisy Observations
UnLoc: Leveraging Depth Uncertainties for Floorplan Localization
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation
Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
SongEcho: Towards Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation
Scaling Laws for Diffusion Transformers
Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
CTRL&SHIFT: High-quality Geometry-Aware Object Manipulation in Visual Generation
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
Self-Improving Loops for Visual Robotic Planning
ATGen: Adversarial Reinforcement Learning for Test Case Generation
NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds
From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting
VeriTrail: Closed-Domain Hallucination Detection with Traceability
MobileKGQA: On-Device KGQA System on Dynamic Mobile Environments
DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents
HARP: Hallucination Detection via Reasoning Subspace Projection
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Detecting Temporal Misalignment Attacks in Multimodal Fusion for Autonomous Driving
Emergent Coordination in Multi-Agent Language Models
DMAP: A Distribution Map for Text
Rethinking Model Calibration through Spectral Entropy Regularization in Medical Image Segmentation
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
Transformers Learn Latent Mixture Models In-Context via Mirror Descent
Purrception: Variational Flow Matching for Vector-Quantized Image Generation
Flatness Guided Test-Time Adaptation for Vision-Language Models
SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
Combinatorial Rising Bandits
Enhancing Molecular Property Predictions by Learning from Bond Modelling and Interactions
Data-to-Energy Stochastic Dynamics
Infinite Horizon Markov Economies
Negative Pre-activations Differentiate Syntax
Distilling to Hybrid Attention Models via KL-Guided Layer Selection
Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
Distributionally Robust Classification for Multi-source Unsupervised Domain Adaptation
Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
Steering the Herd: A Framework for LLM-based Control of Social Learning
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
Generalization in LLM Problem Solving: The Case of the Shortest Path
Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding
Pisces: Cryptography-based Private Retrieval-Augmented Generation with Dual-Path Retrieval
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
Generative Value Conflicts Reveal LLM Priorities
ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
Two (narrow) heads are better than (an arbitrarily wide) one
Pulp Motion: Framing-aware multimodal camera and human motion generation
When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data
Compositional Generalization through Gradient Search in Nonparametric Latent Space
Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization
Knowing When to Quit: Probabilistic Early Exits for Speech Separation Networks
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling
Multimodal Dataset Distillation via Phased Teacher Models
A Law of Data Reconstruction for Random Features (And Beyond)
Dichotomous Diffusion Policy Optimization
Following the Navigation: Enhancing Small Language Models Contextual Reasoning with LLM Guidance
Scaling Linear Attention Capacity with Sparse State Expansion
Exploring State-Space Models for Data-Specific Neural Representations
Task-Aware Data Selection via Proxy-Label Enhanced Distribution Matching for LLM Finetuning
Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization
MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection
Tricks or Traps? A Deep Dive into RL for LLM Reasoning
No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
Polynomial, trigonometric, and tropical activations
EventFlash: Towards Efficient MLLMs for Event-Based Vision
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
LORE: Jointly Learning The Intrinsic Dimensionality and Relative Similarity Structure from Ordinal Data
FullPart: Generating each 3D Part at Full Resolution
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction
The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge
MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with $0.1K$ Parameters
Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval
Shuffling the Data, Extrapolating the Step: Sharper Bias In Constant Step-Size SGD
Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later
DTO-KD: Dynamic Trade-off Optimization for Effective Knowledge Distillation
OPRIDE: Efficient Offline Preference-based Reinforcement Learning via In-Dataset Exploration
BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
EgoTwin: Dreaming Body and View in First Person
Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction
ATLAS: Alibaba Dataset and Benchmark for Learning-Augmented Scheduling
Understanding the Mechanisms of Fast Hyperparameter Transfer
H$^3$DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning
Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
Falcon: Fast Proximal Linearization of Normalized Cuts for Unsupervised Image Segmentation
Weight Decay may matter more than µP for Learning Rate Transfer in Practice
LANE: Label-Aware Noise Elimination for Fine-Grained Text Classification
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
Disentangling Knowledge Representations for Large Language Model Editing
Beyond Visual Reconstruction Quality: Object Perception-aware 3D Gaussian Splatting for Autonomous Driving
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment
Efficient Estimation of Kernel Surrogate Models for Task Attribution
SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection
TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with Temporal-Aware Multimodal Models
Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
GTM: A General Time-series Model for Enhanced Representation Learning of Time-Series data
AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning
TSLM: Tree-Structured Language Modeling for Divergent Thinking
CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation
Mesh Splatting for End-to-end Multiview Surface Reconstruction
Harmonized Cone for Feasible and Non-conflict Directions in Training Physics-Informed Neural Networks
RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling
Breaking and Fixing Defenses Against Control Flow Hijacking in Multi-Agent Systems
On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond
Tighter Performance Theory of FedExProx
Policy Contrastive Decoding for Robotic Foundation Models
FastAvatar: Towards Unified and Fast 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
Towards One-step Causal Video Generation via Adversarial Self-Distillation
Tina: Tiny Reasoning Models via LoRA
EMBridge: Enhancing Gesture Generalization from EMG Signals Through Cross-modal Representation Learning
SONATA: Synergistic Coreset Informed Adaptive Temporal Tensor Factorization
PateGAIL++: Utility Optimized Private Trajectory Generation with Imitation Learning
LLM2Fx-Tools: Tool Calling for Music Post-Production
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
A Genetic Algorithm for Navigating Synthesizable Molecular Spaces
Personalized Collaborative Learning with Affinity-Based Variance Reduction
Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences
Einstein Fields: A Neural Perspective To Computational General Relativity
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
Transfer Learning in Infinite Width Feature Learning Networks
Incentive-Aligned Multi-Source LLM Summaries
ReFocusEraser: Refocusing for Small Object Removal with Robust Context-Shadow Repair
Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction
TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
Learning to Reason without External Rewards
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR
A Graph Meta-Network for Learning on Kolmogorov–Arnold Networks
CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
OpenFly: A COMPREHENSIVE PLATFORM FOR AERIAL VISION-LANGUAGE NAVIGATION
FreeAdapt: Unleashing Diffusion Priors for Ultra-High-Definition Image Restoration
Learning an Image Editing Model without Image Editing Pairs
Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies
Test-Time Adaptation for LLM Agents via Environment Interaction
WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control
DaVinci: Reinforcing Visual-Structural Syntax in MLLMs for Generalized Scientific Diagram Parsing
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
BOAD: Discovering Hierarchical Software Engineering Agents via Bandit Optimization
Topology and geometry of the learning space of ReLU networks: connectivity and singularities
Robust Optimization for Mitigating Reward Hacking with Correlated Proxies
CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints
Scalable In-Context Q-Learning
Incentives in Federated Learning with Heterogeneous Agents
Dynamic Early Exit in Reasoning Models
Look-ahead Reasoning with a Learned Model in Imperfect Information Games
Let's Explore Step by Step: Generating Provable Formal Statements with Deductive Exploration
Seesaw: Accelerating Training by Balancing Batch Size and Learning Rate Scheduling
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
Learning to Grasp Anything By Playing with Random Toys
PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection
Arbitrary-Order Block SignSGD for Memory-Efficient LLM Fine-Tuning
Active Learning for Decision Trees with Provable Guarantees
Intrinsic Lorentz Neural Network
Horseshoe Splatting: Handling Structural Sparsity for Uncertainty-Aware Gaussian-Splatting Radiance Field Rendering
Label-Free Mitigation of Spurious Correlations in VLMs using Sparse Autoencoders
Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design
In-Context Learning for Pure Exploration
Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control
Fine-Grained Class-Conditional Distribution Balancing for Debiased Learning
Local Geometry Attention for Time Series Forecasting under Realistic Corruptions
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
FlexLinearAttention: Compiling a Unified Abstraction into Scalable Kernels for Linear Attention
Variation-aware Flexible 3D Gaussian Editing
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
SVD Provably Denoises Nearest Neighbor Data
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
Biologically Plausible Learning via Bidirectional Spike-Based Distillation
Exploratory Causal Inference in SAEnce
Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning
From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation
Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
Enhancing Visual Token Representations for Video Large Language Models via Training-free Spatial-Temporal Pooling and Gridding
DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving
Learning Boltzmann Generators via Constrained Mass Transport
Sample Reward Soups: Query-efficient Multi-Reward Guidance for Text-to-Image Diffusion Models
A Unified Federated Framework for Trajectory Data Preparation via LLMs
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatrics
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
AVEX: What Matters for Animal Vocalization Encoding
Continuous Audio Language Models
Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference
Distilling Causal Signals for One-Shot Directed Evolution of Antibodies
Neural Collapse in Multi-Task Learning
Causal Score Conditioning for Multi-Resolution Latent Systems
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression
Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection
Choices Speak Louder than Questions
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Consistency-Driven Calibration and Matching for Few-Shot Class Incremental Learning
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
Keep the Best, Forget the Rest: Reliable Alignment with Order-Aware Preference Optimization
A cross-species neural foundation model for end-to-end speech decoding
MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling
Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
Human-Object Interaction via Automatically Designed VLM-Guided Motion Policy
A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems
VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
Smarter Not Harder: Generative Process Evaluation with Intrinsic-Signal Driving and Ability‑Adaptive Reward Shaping
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
Boolean Satisfiability via Imitation Learning
Homeostatic Adaptation of Optimal Population Codes under Metabolic Stress
DCFold: Efficient Protein Structure Generation with Single Forward Pass
On Code-Induced Reasoning in LLMs
Scalable Multi-Task Low-Rank Model Adaptation
Rethinking Pareto Frontier: On the Optimal Trade-offs in Fair Classification
COSA: Context-aware Output-Space Adapter for Test-Time Adaptation in Time Series Forecasting
TEST-TIME SCALING IN DIFFUSION LLMS VIA HIDDEN SEMI-AUTOREGRESSIVE EXPERTS
TP-Spikformer: Token Pruned Spiking Transformer
Learning on a Razor’s Edge: Identifiability and Singularity of Polynomial Neural Networks
CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs
Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
Long-range Modeling and Processing of Multimodal Event Sequences
Test-Time Poisoned Sample Detection by Exploiting Shallow Malicious Matching in Backdoored CLIP
Metric $k$-clustering using only Weak Comparison Oracles
3DSMT: A Hybrid Spiking Mamba-Transformer for Point Cloud Analysis
EXP-Bench: Can AI Conduct AI Research Experiments?
Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards
PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction For Continual Learning
HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy
Pretrain–Test Task Alignment Governs Generalization in In-Context Learning
Enhancing Diffusion-Based Sampling with Molecular Collective Variables
Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
RegionReasoner: Region-Grounded Multi-Round Visual Reasoning
Learning Shrinks the Hard Tail: Training‑Dependent Inference Scaling in a Solvable Linear Model
DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
DiSRouter: Distributed Self-Routing for LLM Selections
Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning
Unified and Efficient Multi-view Clustering from Probabilistic Perspective
What Generative Search Engines Like and How to Optimize Web Content Cooperatively
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
Mode-conditioning unlocks superior test-time compute scaling
Contrastive Predictive Coding Done Right for Mutual Information Estimation
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
Denoising Neural Reranker for Recommender Systems
A Study on PAVE Specification for Learnware
Learning from Label Proportions via Proportional Value Classification
Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation
Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
Output Supervision Can Obfuscate the Chain of Thought
Unified Privacy Guarantees for Decentralized Learning via Matrix Factorization
Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Improving Set Function Approximation with Quasi-Arithmetic Neural Networks
Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
Reinforcement Learning via Value Gradient Flow
STORM: Synergistic Cross-Scale Spatio-Temporal Modeling for Weather Forecasting
Do Large Language Models Know What They Are Capable Of?
On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting
MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
JULI: Jailbreak Large Language Models by Self-Introspection
Safe Exploration via Policy Priors
3DCS: Datasets and Benchmark for Evaluating Conformational Sensitivity in Molecular Representations
Structural Inference: Interpreting Small Language Models with Susceptibilities
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
Revisiting Active Sequential Prediction-Powered Mean Estimation
Empowering Multi-Robot Cooperation via Sequential World Models
Strategic Obfuscation of Deceptive Reasoning in Language Models
I2Mole: Interaction-aware Invariant Molecular Learning For Generalizable Property Prediction
Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models
Intrinsic training dynamics of deep neural networks
Breaking the Total Variance Barrier: Sharp Sample Complexity for Linear Heteroscedastic Bandits with Fixed Action Set
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning
ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
Verifier-Constrained Flow Expansion for Discovery Beyond the Data
Heads collapse, features stay: Why Replay needs big buffers
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks
GenCape: Structure-Inductive Generative Modeling for Category-Agnostic Pose Estimation
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
Distributional Machine Unlearning via Selective Data Removal
Learning Heterogeneous Degradation Representation for Real-World Super-Resolution
D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping
RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback
Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems
ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation
Universal Properties of Activation Sparsity in Modern Large Language Models
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
SRT: Super-Resolution for Time Series via Disentangled Rectified Flow
UniCA: Unified Covariate Adaptation for Time Series Foundation Model
Distribution-informed Online Conformal Prediction
CARL: Preserving Causal Structure in Representation Learning
Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
Mamba-3: Improved Sequence Modeling using State Space Principles
Beyond Markovian Drifts: Action-Biased Geometric Walks with Memory for Personalized Summarization
Enhancing Communication Compression via Discrepancy-aware Calibration for Federated Learning
QUEST: A robust attention formulation using query-modulated spherical attention
CDBridge: A Cross-omics Post-training Bridge Strategy for Context-aware Biological Modeling
Semi-Parametric Contextual Pricing with General Smoothness
Theoretical Modeling of Large Language Model Self-Improvement Training Dynamics Through Solver-Verifier Gap
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval
There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training
Bidirectional Predictive Coding
MotionGPT3: Human Motion as a Second Modality
Householder-Diagonalized Linear Attention (HDLA): Utilizing Enhanced Decay Mechanism for Efficient Sequence Modeling
WAFT: Warping-Alone Field Transforms for Optical Flow
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Monitoring Decomposition Attacks with Lightweight Sequential Monitors
Exchangeability of GNN Representations with Applications to Graph Retrieval
Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion
Online Prediction of Stochastic Sequences with High Probability Regret Bounds
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
InnoGym: Benchmarking the Innovation Potential of AI Agents
VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
GenCtrl -- A Formal Controllability Toolkit for Generative Models
Tree-sliced Sobolev IPM
A Function-Centric Graph Neural Network Approach for Predicting Electron Densities
Beyond Short Steps in Frank-Wolfe Algorithms
TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture
Exploiting Low-Dimensional Manifold of Features for Few-Shot Whole Slide Image Classification
$\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving
Improving the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
TurboBoA: Faster and Exact Attention-aware Quantization without Backpropagation
No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
MMReD: a Cross-Modal Benchmark for Dense Context Reasoning
DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects
Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation
VIRTUE: Visual-Interactive Text-Image Universal Embedder
A Cognitive Process-Inspired Architecture for Subject-Agnostic Brain Visual Decoding
Latent Planning Emerges with Scale
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
Cautious Weight Decay
Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking
Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness
Zero-Overhead Introspection for Adaptive Test-Time Compute
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
SinkTrack: Attention Sink based Context Anchoring for Large Language Models
LLMs Struggle to Balance Reasoning and World Knowledge in Causal Narrative Understanding
Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging
Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
3D Aware Region Prompted Vision Language Model
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
CroCoDiLight: Repurposing Cross-View Completion Encoders for Relighting
SelvaBox: A high‑resolution dataset for tropical tree crown detection
MaRS: Memory-Adaptive Routing for Reliable Capacity Expansion and Knowledge Retention
Conditionally Whitened Generative Models for Probabilistic Time Series Forecasting
Interaction Field Matching: Overcoming Limitations of Electrostatic Models
StyliTruth: Unlocking Stylized yet Truthful LLM Generation via Disentangled Steering
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
Generative Bayesian Optimization: Generative Models as Acquisition Functions
Deep Global-sense Hard-negative Discriminative Generation Hashing for Cross-modal Retrieval
Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion
Ghost in the Cloud: Your Geo-Distributed Large Language Models Training is Easily Manipulated
Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting Without Disclosure
To View Transform or Not to View Transform: NeRF-based Pre-training Perspective
Flow-Disentangled Feature Importance
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
Autonomous Functional Play with Correspondence-Driven Trajectory Warping
G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation
Non-Convex Federated Optimization under Cost-Aware Client Selection
Composition-Grounded Data Synthesis for Visual Reasoning
Separable Neural Networks: Approximation Theory, NTK Regime, and Preconditioned Gradient Descent
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Safety Subspaces are Not Linearly Distinct: A Fine-Tuning Case Study
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
SuperF: Neural Implicit Fields for Multi-Image Super-Resolution
D&R: Recovery-based AI-Generated Text Detection via a Single Black-box LLM Call
Talking Points: Describing and Localizing Pixels
Autoregressive-based Progressive Coding for Ultra-Low Bitrate Image Compression
Sample Efficient Offline RL via T-Symmetry Enforced Latent State-Stitching
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Towards a Foundation Model for Crowdsourced Label Aggregation
Faithfulness Under the Distribution: A New Look at Attribution Evaluation
PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
KVComm: Enabling Efficient LLM Communication through Selective KV Sharing
EvA: Evolutionary Attacks on Graphs
Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training
Towards Sustainable Investment Policies Informed by Opponent Shaping
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
Gradient-Normalized Smoothness for Optimization with Approximate Hessians
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
Learning to Weight Parameters for Training Data Attribution
EmoPrefer: Can Large Language Models Understand Human Emotion Preferences?
ToolTree: Efficient LLM Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
Multi-Agent Debate with Memory Masking
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
Tell me Habibi, is it Real or Fake?
Revisit Visual Prompt Tuning: The Expressiveness of Prompt Experts
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better
Context Learning for Multi-Agent Discussion
Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning
ComPhy: Composing Physical Models with end-to-end Alignment
A^2TG: Adaptive Anisotropic Textured Gaussians for Efficient 3D Scene Representation
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
Robustify Spiking Neural Networks via Dominant Singular Deflation under Heterogeneous Training Vulnerability
Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas
Compositional Visual Planning via Inference-Time Diffusion Scaling
UniVideo: Unified Understanding, Generation, and Editing for Videos
Learning Molecular Chirality via Chiral Determinant Kernels
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
Open-Set Semantic Gaussian Splatting SLAM with Expandable Representation
Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization
Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation
Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data
MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation
LiveClin: A Live Clinical Benchmark without Leakage
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
Rethinking Causal Mask Attention for Vision-Language Inference
Let OOD Feature Exploring Vast Predefined Classifiers
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
LEGACY: A Lightweight Dynamic Gradient Compression Strategy for Distributed Deep Learning
QuRL: Rubrics As Judge For Open-Ended Question Answering
EigenScore: OOD Detection using Posterior Covariance in Diffusion Models
Towards Understanding the Shape of Representations in Protein Language Models
Knowledge Distillation for Large Language Models through Residual Learning
Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification
Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation
The Softmax Bottleneck Does Not Limit the Probabilities of the Most Likely Tokens
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
Seeing What’s Not There: Negation Understanding Needs More Than Training
Signal Structure-Aware Gaussian Splatting for Large-Scale Scene Reconstruction
Generalization of RLVR Using Causal Reasoning as a Testbed
LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
Generative Universal Verifier as Multimodal Meta-Reasoner
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
EgoBrain: Synergizing Minds and Eyes For Human Action Understanding
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevant Assessment for IR Benchmarks
Continual Low-Rank Adapters for LLM-based Generative Recommender Systems
Shift-and-Sum Quantization for Visual Autoregressive Models
PQGAN: Product-Quantised Image Representation for High-Quality Image Synthesis
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
Verifying Chain-of-Thought Reasoning via Its Computational Graph
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies
DynamicInfer: Runtime-Aware Sparse Offloading for LLMs Inference on a Consumer-Grade GPU
TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models
Multiple Token Divergence: Measuring and Steering In-Context Computation Density
FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability–Plasticity Tradeoff
Reasoning in Space via Grounding in the World
Learning to Recall with Transformers Beyond Orthogonal Embeddings
All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting
Replicable Reinforcement Learning with Linear Function Approximation
TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment
Figma2Code: Automating Multimodal Design to Code in the Wild
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime
ActiveCQ: Active Estimation of Causal Quantities
CSRv2: Unlocking Ultra-Sparse Embeddings
LINK: Learning Instance-level Knowledge from Vision-Language Models for Human-Object Interaction Detection
How reinforcement learning after next-token prediction facilitates learning
MASS: MoErging through Adaptive Subspace Selection
Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
Aria: an Agent for Retrieval and Iterative Auto-Formalization via Dependency Graph
Are Global Dependencies Necessary? Scalable Time Series Forecasting via Local Cross-Variate Modeling
$AutoDrive\text{-}P^3$: Unified Chain of Perception–Prediction–Planning Thought via Reinforcement Fine-Tuning
ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization
RL Grokking Recipe: How Does RL Unlock and Transfer New Algorithms in LLMs?
The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
LLMS ON TRIAL: Evaluating Judicial Fairness For Large Language Models
Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents
Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
Dual-objective Language Models: Training Efficiency Without Overfitting
Understanding Cross-layer Contributions to Mixture-of-Experts Routing in LLMs
Trace Anything: Representing Any Video in 4D via Trajectory Fields
UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
Reinforcement Unlearning via Group Relative Policy Optimization
Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework
Study of Training Dynamics for Memory-Constrained Fine-Tuning
Combinatorial Bandit Bayesian Optimization for Tensor Outputs
Submodular Function Minimization with Dueling Oracle
Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX
PSDNorm: Temporal Normalization for Deep Learning in Sleep Staging
TEN-DM: Topology-Enhanced Diffusion Model for Spatio-Temporal Event Prediction
Identifiability Challenges in Sparse Linear Ordinary Differential Equations
On the Reasoning Abilities of Masked Diffusion Language Models
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
Post-training Large Language Models for Diverse High-Quality Responses
Unbiased Gradient Estimation for Event Binning via Functional Backpropagation
RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding
A New Approach to Controlling Linear Dynamical Systems
NetArena: Dynamic Benchmarks for AI Agents in Network Automation
Score-based Greedy Search for Structure Identification of Partially Observed Causal Models
An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems
LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent
LitmusValues: Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
Breaking Safety Paradox with Feasible Dual Policy Iteration
TrajFlow: Nation-wide Pseudo GPS Trajectory Generation with Flow Matching Models
In-Context Algorithm Emulation in Fixed-Weight Transformers
Matching Multiple Experts: On the Exploitability of Multi-Agent Imitation Learning
Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
Contrastive Diffusion Guidance for Spatial Inverse Problems
From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
HistoPrism: Unlocking Functional Pathway Analysis from Pan-Cancer Histology via Gene Expression Prediction
Task-free Adaptive Meta Black-box Optimization
Capturing Visual Environment Structure Correlates with Control Performance
Test-Time Iterative Error Correction for Efficient Diffusion Models
Multilevel Control Functional
High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning
Zero-Sacrifice Persistent-Robustness Adversarial Defense for Pre-Trained Encoders
Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
Certifying the Full YOLO Pipeline: A Probabilistic Verification Approach
Q&C: When Quantization Meets Cache in Efficient Generation
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring
Composition of Pretrained Diffusion Models: A Logic-Based Calculus
PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
Ads that Stick: Near-Optimal Ad Optimization through Psychological Behavior Models
Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards
$\textit{MADFormer}$: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation
CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
SubDyve: Subgraph-Driven Dynamic Propagation for Virtual Screening Enhancement Controlling False Positive
Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization
Unlearning during Training: Domain-Specific Gradient Ascent for Domain Generalization
LiFR-Seg: Anytime High-Frame-Rate Segmentation via Event-Guided Propagation
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network
Offline Preference-Based Value Optimization
Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach
Grounding Computer Use Agents on Human Demonstrations
DUET: Optimizing LLM Training Data Mixtures via Noisy Feedback from Unseen, Downstream Evaluation Tasks
Learning Global Hypothesis Space for Enhancing Synergistic Reasoning Chain
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
Programming with Pixels: Can Computer-Use Agents do Software Engineering?
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Soft Tokens, Hard Truths
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
Diversity-Incentivized Exploration for Versatile Reasoning
ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code
Optimizer Choice Matters For The Emergence of Neural Collapse
Tequila: Trapping-free Ternary Quantization for Large Language Models
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Enabling True Global Perception in State Space Models for Visual Tasks
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Out-of-Distribution Graph Models Merging
CheckMate! Watermarking Graph Diffusion Models in Polynomial Time
Rate-Distortion Optimized Pragmatic Communication for Collaborative Perception
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
Improving Code Localization with Repository Memory
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
VFScale: Intrinsic Reasoning through Verifier-Free Test-time Scalable Diffusion Model
Remotely Detectable Robot Policy Watermarking
FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning
Debiased Front-Door Learners for Heterogeneous Effects
SciNav: A General Agent Framework for Scientific Coding Tasks
UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images
Discovering alternative solutions beyond the simplicity bias in recurrent neural networks
GenSR: Symbolic regression based on equation generative space
LEAP: Local ECT-Based Learnable Positional Encodings for Graphs
SteinsGate: Adding Causality to Diffusions for Long Video Generation via Path Integral
P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context
LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
Deep Think with Confidence
A Single Architecture for Representing Invariance Under Any Space Group
Why Less is More (Sometimes): A Theory of Data Curation
Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild
D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design
Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring
Differentially Private Equilibrium Finding in Polymatrix Games
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting
MVR: Multi-view Video Reward Shaping for Reinforcement Learning
Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint
Architecture-Agnostic Test-Time Adaptation via Backprop-Free Embedding Alignment
Value Matching: Scalable and Gradient-Free Reward-Guided Flow Adaptation
JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks
LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing
AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Convergent Differential Privacy Analysis for General Federated Learning
MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
DanceTogether: Generating Interactive Multi-Person Video without Identity Drifting
Constrained Decoding of Diffusion LLMs with Context-Free Grammars
Asynchronous Matching with Dynamic Sampling for Multimodal Dataset Distillation
From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces
Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement
Plug, Play, and Fortify: A Low-Cost Module for Robust Multimodal Image Understanding Models
Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
$\boldsymbol{\partial^\infty}$-Grid: A Neural Differential Equation Solver with Differentiable Feature Grids
PSP: Prompt-Guided Self-Training Sampling Policy for Active Prompt Learning
Spiking Discrepancy Transformer for Point Cloud Analysis
Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
BoreaRL: A Multi-Objective Reinforcement Learning Environment for Climate-Adaptive Boreal Forest Management
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Social Agents: Collective Intelligence Improves LLM Predictions
QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response
PE-SGD: Differentially Private Deep Learning via Evolution of Gradient Subspace for Text
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
Secure Inference for Diffusion Models via Unconditional Scores
Gradient-Direction-Aware Density Control for 3D Gaussian Splatting
Delay Flow Matching
Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
Distributional value gradients for stochastic environments
Language Models Use Lookbacks to Track Beliefs
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems
LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
Using maximal information auxiliary variables to improve synthetic data generation based on TabPFN foundation models
Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
A Physics-Inspired Optimizer: Velocity Regularized Adam
Learning to Interpret Weight Differences in Language Models
One-Step Flow Q-Learning: Addressing the Diffusion Policy Bottleneck in Offline Reinforcement Learning
One protein is all you need
MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interatomic Potentials
Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
An evolutionary perspective on modes of learning in Transformers
LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis
Understanding the Implicit Biases of Design Choices for Time Series Foundation Models
One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
Sampling Complexity of TD and PPO in RKHS
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
Quantization-Aware Diffusion Models For Maximum Likelihood Training
Diffusion Language Models are Provably Optimal Parallel Samplers
BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
R-WoM: Retrieval-augmented World Model For Computer-use Agents
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
From Fields to Random Trees
SCoT: Teaching 3D-LLMs to Think Spatially with Million-scale CoT Annotations
Anatomy-aware Representation Learning for Medical Ultrasound
Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
Reliable Evaluation of MRI Motion Correction: Dataset and Insights
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Beyond Linear Processing: Dendritic Bilinear Integration in Spiking Neural Networks
A foundation model with multi-variate parallel attention to generate neuronal activity
AgentPO: Enhancing Multi-Agent Collaboration via Reinforcement Learning
PRISM: Progressive Robust Learning for Open-World Continual Category Discovery
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
Bures-Wasserstein Flow Matching for Graph Generation
Streaming Visual Geometry Transformer
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
3DGEER: 3D Gaussian Rendering Made Exact and Efficient for Generic Cameras
Enhancing LLMs for Knowledge Base Question Answering by Chain-of-Decomposition
Learning a distance measure from the information-estimation geometry of data
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
Reconstructing KV Caches with Cross-Layer Fusion for Enhanced Transformers
Distributions as Actions: A Unified Framework for Diverse Action Spaces
Count Bridges enable Modeling and Deconvolving Transcriptomic Data
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts
SNAPHARD CONTRAST LEARNING
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
Pusa V1.0: Unlocking Temporal Control in Pretrained Video Diffusion Models via Vectorized Timestep Adaptation
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Textual Equilibrium Propagation for Deep Compound AI Systems
Machine Unlearning under Retain–Forget Entanglement
QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification
PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning
Unified Vision–Language Modeling via Concept Space Alignment
ReIn: Conversational Error Recovery with Reasoning Inception
Meta-RL Induces Exploration in Language Agents
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Flow Matching with Semidiscrete Couplings
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking
FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching
SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion
ReSplat: Degradation-agnostic Feed-forward Gaussian Splatting via Self-guided Residual Diffusion
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
Prompt-Robust Vision-Language Models via Meta-Finetuning
Visual Prompt-Agnostic Evolution
LCA: Local Classifier Alignment for Continual Learning
ScaleCap: Scalable Image Captioning via Dual-Modality Debiasing
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
Rethinking JEPA: Compute-Efficient Video Self-Supervised Learning with Frozen Teachers
Temporally Detailed Hypergraph Neural ODE for Disease Progression Modeling
GTA1: GUI Test-time Scaling Agent
HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion
On the Convergence Direction of Gradient Descent
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models
Once-More: Continuous Self-Correction for Large Language Models via Perplexity-Guided Intervention
Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting
Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration
PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
SLM-MUX: Orchestrating Small Language Models for Reasoning
Target-Aware Video Diffusion Models
Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution
Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
On The Geometry and Topology of Representations: the Manifolds of Modular Addition
BAR: Refactor the Basis of Autoregressive Visual Generation
Chessformer: A Unified Architecture for Chess Modeling
Adaptive Thinking: Large Language Models Know When to Think in Latent Space
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
Hyden: A Hybrid Dual-Path Encoder for Monocular Geometry of High-resolution Images
Spherical Watermark: Encryption-Free, Lossless Watermarking for Diffusion Models
Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis
Relational Graph Transformer
ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
Robust Adaptive Multi-Step Predictive Shielding
Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
Diverse Text Decoding via Iterative Reweighting
Value Flows
HoloPart: Generative 3D Part Amodal Segmentation
SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
Adaptive gradient descent on Riemannian manifolds and its applications to Gaussian variational inference
Distributionally Robust Cooperative Multi-agent Reinforcement Learning with Value Factorization
Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
From Gradient Volume to Shapley Fairness: Towards Fair Multi-Task Learning
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems
Multi-agent Coordination via Flow Matching
Transformers as Measure-Theoretic Associative Memory: A Statistical Perspective and Minimax Optimality
Fantastic Pretraining Optimizers and Where to Find Them
InputDSA: Demixing, then comparing recurrent and externally driven dynamics
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models
SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples
Charts Are Not Images: On the Challenges of Scientific Chart Editing
Deep SPI: Safe Policy Improvement via World Models
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring
GRADIEND: Feature Learning within Neural Networks Exemplified through Biases
A Unification of Discrete, Gaussian, and Simplicial Diffusion
Stochastic Neural Networks for Causal Inference with Missing Confounders
WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport
SafeMPO: Constrained Reinforcement Learning with Probabilistic Incremental Improvement
Imagine How To Change: Explicit Procedure Modeling for Change Captioning
Adaptive Conformal Guidance for Learning under Uncertainty
Cyber-Zero: Training Cybersecurity Agents without Runtime
Flock: A Knowledge Graph Foundation Model via Learning on Random Walks
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Patronus: Interpretable Diffusion Models with Prototypes
DSA: Efficient Inference For Video Generation Models via Distributed Sparse Attention
Disco: Densely-overlapping Cell Instance Segmentation via Adjacency-aware Collaborative Coloring
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
Don't Throw Away Your Pretrained Model
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
Practical estimation of the optimal classification error with soft labels and calibration
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Language Identification in the Limit with Computational Trace
SuperMAN: Interpretable and Expressive Networks over Temporally Sparse Heterogeneous Data
Capability-Based Scaling Trends for LLM-Based Red-Teaming
Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
Protection against Source Inference Attacks in Federated Learning
Tensor learning with orthogonal, Lorentz, and symplectic symmetries
Diversity-Enhanced Reasoning for Subjective Questions
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
Geometry-aware 4D Video Generation for Robot Manipulation
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals
Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
Dynamic Novel View Synthesis in High Dynamic Range
AQER: A Scalable and Efficient Data Loader for Digital Quantum Computers
Draft-based Approximate Inference for LLMs
Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
Language Models are Injective and Hence Invertible
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Understanding Dataset Distillation via Spectral Filtering
Co-LoRA: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
Benefits and Limitations of Communication in Multi-Agent Reasoning
Map as a Prompt: Learning Multi-Modal Spatial-Signal Foundation Models for Cross-scenario Wireless Localization
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
Overlap-weighted orthogonal meta-learner for treatment effect estimation over time
Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
Learning Ordinal Probabilistic Reward from Preferences
Mechanistic Independence: A Principle for Identifiable Disentangled Representations
Multi-Feature Quantized Self-Attention for Fair Large Language Models
Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction
Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
Sublinear Spectral Clustering Oracle with Little Memory
Context and Diversity Matter: The Emergence of In-Context Learning in World Models
Efficient Prediction of Large Protein Complexes via Subunit-Guided Hierarchical Refinement
Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
VERINA: Benchmarking Verifiable Code Generation
Automated Formalization via Conceptual Retrieval-Augmented LLMs
Reassessing Layer Pruning in LLMs: New Insights and Methods
LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion
Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
vAttention: Verified Sparse Attention via Sampling
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration
Prompt-MII: Meta-Learning Instruction Induction for LLMs
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
VideoNSA: Native Sparse Attention Scales Video Understanding
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment
DeepEyesV2: Toward Agentic Multimodal Model
RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data
Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
A Noise is Worth Diffusion Guidance
Explainable LLM Unlearning through Reasoning
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
EffiVMT: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Estimating Semantic Alphabet Size for LLM Uncertainty Quantification
Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition
Decomposing LLM Computation with Jets
OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Dual Distillation for Few-Shot Anomaly Detection
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
CatalystBench: A Comprehensive Multi-Task Benchmark for Advancing Language Models in Catalysis Science
Generalized Parallel Scaling with Interdependent Generations
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
RRNCO: Towards Real-World Routing with Neural Combinatorial Optimization
Lifelong Learning with Behavior Consolidation for Vehicle Routing
Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
OpenEstimate: Evaluating LLMs on Reasoning Under Uncertainty with Real-World Data
RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States
Delving into Spectral Clustering with Vision-Language Representations
Teaching Metric Distance to Discrete Autoregressive Language Models
Stochastic Self-Organization in Multi-Agent Systems
Complexity- and Statistics-Guided Anomaly Detection in Time Series Foundation Models
Reducing Belief Deviation in Reinforcement Learning for Active Reasoning of LLM Agents
LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
Revisiting Tree-Sliced Wasserstein Distance Through the Lens of the Fermat–Weber Problem
AudioX: A Unified Framework for Anything-to-Audio Generation
PA3FF: Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation
Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
Consistent Noisy Latent Rewards for Trajectory Preference Optimization in Diffusion Models
LLM Unlearning with LLM Beliefs
Mapping Post-Training Forgetting in Language Models at Scale
Distributionally Robust Optimization via Generative Ambiguity Modeling
Terminal Velocity Matching
Revisiting Weight Regularization for Low-Rank Continual Learning
Beyond Text-to-Image: Liberating Generation with a Unified Discrete Diffusion Model
Coarse-to-Fine Learning of Dynamic Causal Structures
What Scales in Cross-Entropy Scaling Law?
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Scaling Synthetic Task Generation for Agents via Exploration
DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference
Synchronizing Probabilities in Model-Driven Lossless Compression
On the Impact of the Utility in Semivalue-based Data Valuation
WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control
Premise Selection for a Lean Hammer
Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
Scaling Speech Tokenizers with Diffusion Autoencoders
Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search
CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design
ARINBEV: Bird's-Eye View Layout Estimation with Conditional Autoregressive Model
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
Inferring brain plasticity rule under long-term stimulation with structured recurrent dynamics
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
Text-Aware Image Restoration with Diffusion Models
Guaranteed Simply Connected Mesh Reconstruction from an Unorganized Point Cloud
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
W-EDIT: A Wavelet-Based Frequency-Aware Framework for Text-Driven Image Editing
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Dynamic Weight Grafting: Localizing Finetuned Factual Knowledge in Transformers
The Geometry of Reasoning: Flowing Logics in Representation Space
Estimating Dimensionality of Neural Representations from Finite Samples
Revisiting Parameter Server in LLM Post-Training
MAGO: Beyond Fixed Hyperparameters with Multi-Objective Pareto Optimization for Hybrid LLM Reasoning
IC-Custom: Diverse Image Customization via In-Context Learning
VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
Measuring the Intrinsic Dimension of Earth Representations
BoGrape: Bayesian optimization over graphs with shortest-path encoded
HiCache: A Plug-in Scaled-Hermite Upgrade for Taylor-Style Cache-then-Forecast Diffusion Acceleration
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Physically-Guided Optical Inversion Enable Non-Contact Side-Channel Attack on Isolated Screens
Scaling Multi-Task Bayesian Optimization with Large Language Models
Pose-RFT: Aligning MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning
Spatially Guided Training for Vision-Language-Action Model
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Astraea: A Token-wise Acceleration Framework for Video Diffusion Transformers
AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent
Initialization Schemes for Kolmogorov–Arnold Networks: An Empirical Study
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction as Reasoning
GranViT: A Fine-Grained Vision Model For Autoregressive Multimodal Large Language Models
Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation
TIPO: Text to Image with Text Pre-sampling for Prompt Optimization
Copy-Paste to Mitigate Large Language Model Hallucinations
OpenApps: Simulating Environment Variations to Measure UI Agent Reliability
Fine-tuning Done Right in Model Editing
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Video Scene Segmentation with Genre and Duration Signals
Sparse Attention Adaptation for Long Reasoning
Short Window Attention Enables Long-Term Memorization
DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle
Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
Eigen-Agent: Adaptive Multi-Agent Scientific Reasoning with Monitor-Based RAG
Inference-time scaling of diffusion models through classical search
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
VoMP: Predicting Volumetric Mechanical Property Fields
Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
Diffusion Alignment as Variational Expectation-Maximization
GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs
Zero-shot Forecasting by Simulation Alone
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions
Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
GuardAlign: Test-time Safety Alignment in Multimodal Large Language Models
GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
SigLIP-HD by Fine-to-Coarse Supervision
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
Best-of-three-worlds Analysis for Dueling Bandits with Borda Winner
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters
DistDF: Time-series Forecasting Needs Joint-distribution Wasserstein Alignment
Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation
Causal Imitation Learning under Expert-Observable and Expert-Unobservable Confounding
ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning
SpatialHand: Generative Object Manipulation from 3D Perspective
Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
Characterizing Pattern Matching and Its Limits on Compositional Task Structures
MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
LogiConBench: Benchmarking Logical Consistencies of LLMs
Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
DeRaDiff: Denoising Time Realignment of Diffusion Models
Learning to Lie: Adversarial Attacks on Human-AI Teams and LLMs
KeepLoRA: Continual Learning with Residual Gradient Adaptation
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
Concept-based Adversarial Attack: a Probabilistic Perspective
d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
Text2Grad: Reinforcement Learning from Natural Language Feedback
Group Verification-based Policy Optimization for Interactive Coding Agents
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
The Matthew Effect of AI Programming Assistants: A Hidden Bias in Software Evolution
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing
Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs with Application to Glucose Prediction
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling
Some Neural Networks Inherently Preserve Subspace Clustering Structure
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with $\boldsymbol{f}$-SoftArgmax Parameterization & Coupled Regularization
References Improve LLM Alignment in Non-Verifiable Domains
Many Eyes, One Mind: Temporal Multi-Perspective and Progressive Distillation for Spiking Neural Networks
Si-GT: Fast Interconnect Signal Integrity Analysis for Integrated Circuit Design via Graph Transformers
Understanding the Dynamics of Forgetting and Generalization in Continual Learning via the Neural Tangent Kernel
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
Empowering LLM Tool Invocation with Tool-call Reward Model
PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting
Light of Normals: Unified Feature Representation for Universal Photometric Stereo
Splat Feature Solver
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
BioMD: All-atom Generative Model for Biomolecular Dynamics Simulation
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Tree-based Search
RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training
Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation
SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
VLM-Guided Adaptive Negative Prompting for Creative Generation
Enabling Your Forensic Detector Know How Well It Performs on Distorted Samples
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
Depth Anything with Any Prior
Improving Extreme Wind Prediction with Frequency-Informed Learning
Large Depth Completion Model from Sparse Observations
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
GOOD: Geometry-guided Out-of-Distribution Modeling for Open-set Test-time Adaptation in Point Cloud Semantic Segmentation
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Beyond Skeletons: Learning Animation Directly from Driving Videos with Same2X Training Strategy
Communication-Efficient Decentralized Optimization via Double-Communication Symmetric ADMM
Deep Learning for Subspace Regression
Hierarchical Encoding Tree with Modality Mixup for Cross-modal Hashing
Sample-efficient evidence estimation of score based priors for model selection
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
Direct Doubly Robust Estimation of Conditional Quantile Contrasts
DeepPrim: a Physics-Driven 3D Short-term Weather Forecaster via Primitive Equation Learning
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD
Entropy-preserving reinforcement learning
PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation
Variation in Verification: Understanding Verification Dynamics in Large Language Models
PathChat-SegR1: Reasoning Segmentation in Pathology via SO-GRPO
sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals
LeRobot: An Open-Source Library for End-to-End Robot Learning
Diffusion Negative Preference Optimization Made Simple
Mathesis: Towards Formal Theorem Proving from Natural Languages
Vision-SR1: Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization
MergOPT: A Merge-Aware Optimizer for Robust Model Merging
SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination
Quartet of Diffusions: Structure-Aware Point Cloud Generation through Part and Symmetry Guidance
Anime-Ready: Controllable 3D Anime Character Generation with Body-Aligned Component-Wise Garment Modeling
Predictive CVaR Q-learning
Referring Layer Decomposition
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
CARPRT: Class-Aware Zero-Shot Prompt Reweighting for Vision-Language Model
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization
Missingness Bias Calibration in Feature Attribution Explanations
Difference-Aware Retrieval Policies for Imitation Learning
CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation
From movement to cognitive maps: recurrent neural networks reveal how locomotor development shapes hippocampal spatial coding
Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
Revisiting Node Affinity Prediction In Temporal Graphs
DES-LOC: Desynced Low Communication Adaptive Optimizers for Foundation Models
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
Disentangled Hierarchical VAE for 3D Human-Human Interaction Generation
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
Omni-Weather: A Unified Multimodal Model for Weather Radar Understanding and Generation
DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection
Quotient-Space Diffusion Models
Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
BrowseNet: Graph-Based Associative Memory for Contextual Information Retrieval
Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving
Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models
Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds
Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs
Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning
AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval
MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
CARD: Towards Conditional Design of Multi-agent Topological Structures
RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
Neural Posterior Estimation with Latent Basis Expansions
Counterfactual Explanations on Robust Perceptual Geodesics
Your Language Model Secretly Contains Personality Subnetworks
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
Prediction with Expert Advice under Local Differential Privacy
GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
Ice Cream Doesn’t Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference
MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Loc$^{2}$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching
MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation
Embedding-Based Context-Aware Reranker