ICLR
2026
Papers
PRISON: Unmasking the Criminal Potential of Large Language Models
pySpatial: Generating 3D Visual Programs for Zero-Shot Spatial Reasoning
Behavior Learning
Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization
SCRAPL: Scattering Transform with Random Paths for Machine Learning
Scaling Goal-conditioned Reinforcement Learning with Multistep Quasimetric Distances
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
Scaling Sequence-to-Sequence Generative Neural Rendering
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
Predicting Kernel Regression Learning Curves from Only Raw Data Statistics
Eliciting Numerical Predictive Distributions of LLMs Without Auto-Regression
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
Breaking the Total Variance Barrier: Sharp Sample Complexity for Linear Heteroscedastic Bandits with Fixed Action Set
Faster Diffusion Through Temporal Attention Decomposition
Towards a Sharp Analysis of Learning Offline $f$-Divergence-Regularized Contextual Bandits
CIMemories: A Compositional Benchmark For Contextual Integrity In LLMs
Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling
Convex Dominance in Deep Learning: A Scaling Law of Loss and Learning Rate
FlowNIB: An Information Bottleneck Analysis of Bidirectional vs. Unidirectional Language Models
Inferring brain plasticity rule under long-term stimulation with structured recurrent dynamics
RCPU: Rotation-Constrained Error Compensation for Structured Pruning of a Large Language Model
ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks
Toward Complex-Valued Neural Networks for Waveform Generation
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Unbalanced Soft-Matching Distance For Neural Representational Comparison With Partial Unit Correspondence
Efficient Ensemble Conditional Independence Test Framework for Causal Discovery
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
StreamingVLM: Real-Time Understanding for Infinite Video Streams
WAFT: Warping-Alone Field Transforms for Optical Flow
Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs
FreeAdapt: Unleashing Diffusion Priors for Ultra-High-Definition Image Restoration
On the Wings of Imagination: Conflicting Script-based Multi-role Framework for Humor Caption Generation
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation
Defining and quantifying compositional structure
Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
Relational Feature Caching for Accelerating Diffusion Transformers
JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
Statistical Guarantees for Approximate Stationary Points of Shallow Neural Networks
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting
Complexity- and Statistics-Guided Anomaly Detection in Time Series Foundation Models
Finite-Time Analysis of Actor-Critic Methods with Deep Neural Network Approximation
Enabling arbitrary inference in spatio-temporal dynamic systems: A physics-inspired perspective
Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
Bilinear relational structure fixes reversal curse and enables consistent model editing
Structure Learning from Time-Series Data with Lag-Agnostic Structural Prior
TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization
Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
Teach2Eval: An Interaction-Driven LLMs Evaluation Method via Teaching Effectiveness
Bayesian Evidence-Driven Prototype Evolution for Federated Domain Adaptation
Markovian Transformers for Informative Language Modeling
MrRoPE: Mixed-radix Rotary Position Embedding
TEN-DM: Topology-Enhanced Diffusion Model for Spatio-Temporal Event Prediction
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models
ARINBEV: Bird's-Eye View Layout Estimation with Conditional Autoregressive Model
VenusX: Unlocking Fine-Grained Functional Understanding of Proteins
Efficient Adversarial Attacks on High-dimensional Offline Bandits
SUSD: Structured Unsupervised Skill Discovery through State Factorization
ODNet: Opinion Dynamics-Inspired Neural Message Passing for Graphs and Hypergraphs
GenSR: Symbolic regression based on equation generative space
vCache: Verified Semantic Prompt Caching
When Agents “Misremember” Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems
Hot PATE: Private Aggregation of Distributions for Diverse Tasks
DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics
Towards Improved Sentence Representations using Token Graphs
Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
Displacement-Resistant Extensions of DPO with Nonconvex $f$-Divergences
Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
Coarse-to-Fine Learning of Dynamic Causal Structures
Difficulty–Diversity Collaborative Filtering for Data-Efficient LLM Fine-Tuning
Reward Is Enough: LLMs Are In-Context Reinforcement Learners
OSCAR: Online Soft Compression for RAG
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
EdgeCape: Edge Weight Prediction For Category-Agnostic Pose Estimation
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
Rethinking LLM-as-a-Judge: Representation-as-a-Judge with Small Language Models via Semantic Capacity Asymmetry
LLM DNA: Tracing Model Evolution via Functional Representations
LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models
BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
Tracing and Reversing Edits in LLMs: A Study on Rank-One Model Edits
Tensor learning with orthogonal, Lorentz, and symplectic symmetries
The Adversarial Conditioning Paradox: Why Attacked Inputs Are More Stable, Not Less
From Sequential to Parallel: Reformulating Dynamic Programming as GPU Kernels for Large-Scale Stochastic Combinatorial Optimization
TIGaussian: Disentangle Gaussians for Spatial-Awared Text-Image-3D Alignment
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics
Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning
Hierarchy Decoding: A Training-free Parallel Decoding Strategy for Diffusion Large Language Models
Automated Formalization via Conceptual Retrieval-Augmented LLMs
Disco: Densely-overlapping Cell Instance Segmentation via Adjacency-aware Collaborative Coloring
AEGIS: Adversarial Target–Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
InnoGym: Benchmarking the Innovation Potential of AI Agents
EvolProver: Advancing Automated theorem proving by Evolving Formalized Problems via Symmetry and Difficulty
WavePolyp: Video Polyp Segmentation via Hierarchical Wavelet-Based Feature Aggregation and Inter-Frame Divergence Perception
HFSTI-Net: Hierarchical Frequency-spatial-temporal Interactions for Video Polyp Segmentation
MASAM: Multimodal Adaptive Sharpness-Aware Minimization for Heterogeneous Data Fusion
ActiveCQ: Active Estimation of Causal Quantities
Training-free Counterfactual Explanation for Temporal Graph Model Inference
HoloPart: Generative 3D Part Amodal Segmentation
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks
Sharpness-Aware Machine Unlearning
MVCustom: Multi-View Customized Diffusion via Geometric Latent Rendering and Completion
Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation
Riesz Neural Operator for Solving Partial Differential Equations
RESCHED: Rethinking Flexible Job Shop Scheduling from a Transformer-based Architecture with Simplified States
Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
Syncphony: Synchronized Audio-to-Video Generation with Diffusion Transformers
Robust Denoising Neural Reranker for Recommender Systems
Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward
DrugTrail: Explainable Drug Discovery via Structured Reasoning and Druggability‑Tailored Preference Optimization
Householder-Diagonalized Linear Attention (HDLA): Utilizing Enhanced Decay Mechanism for Efficient Sequence Modeling
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
Expressive yet Efficient Feature Expansion with Adaptive Cross-Hadamard Products
How Stable is the Next Token? A Geometric View of LLM Prediction Stability
WebWatcher: Breaking New Frontiers of Vision-Language Deep Research Agent
Byzantine-Robust Federated Learning with Learnable Aggregation Weights
Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI
Towards Efficient, Adaptive, and Unified Reinforcement Mid-Training
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
VCWorld: A Biological World Model for Virtual Cell Simulation
TAVAE: A VAE with Adaptable Priors Explains Contextual Modulation in the Visual Cortex
Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics
FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion
Natural Identifiers for Privacy and Data Audits in Large Language Models
DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Pre-training under infinite compute
FastVGGT: Fast Visual Geometry Transformer
Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning
G-Merging: Graph Models Merging for Parameter-Efficient Multi-Task Knowledge Consolidation
Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format
Bayesian Neural Networks for Functional ANOVA Model
OptimSyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation
Monocular Normal Estimation via Shading Sequence Estimation
The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators
GeomMotif: A Benchmark for Arbitrary Geometric Preservation in Protein Generation
FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting
Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
Ensembling Pruned Attention Heads For Uncertainty-Aware Efficient Transformers
UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers
HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
EAST: Early Action Prediction Sampling Strategy with Token Masking
Counterfactual Structural Causal Bandits
DNOD: Deformable Neural Operators for Object Detection in SAR Images
LongHorizonUI: A Unified Framework for Robust long-horizon Task Automation of GUI Agent
LearnIR: Learnable Posterior Sampling for Real-World Image Restoration
Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
Evolving Graph Structured Programs for Circuit Generation with Large Language Models
A Hierarchical Circuit Symbolic Discovery Framework for Efficient Logic Optimization
HDR-NSFF: High Dynamic Range Neural Scene Flow Fields
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Converge Faster, Talk Less: Hessian-Informed Federated Zeroth-Order Optimization
Open-Set Semantic Gaussian Splatting SLAM with Expandable Representation
Reconciling Visual Perception and Generation in Diffusion Models
Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction
Uncertainty-Aware 3D Reconstruction for Dynamic Underwater Scenes
Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis
Uncertainty-Aware Gaussian Map for Vision-Language Navigation
Master Skill Learning with Policy-Grounded Synergy of LLM-based Reward Shaping and Exploring
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
RayI2P: Learning Rays for Image-to-Point Cloud Registration
SkillFactory: Self-Distillation for Learning Cognitive Behaviors
Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Joint Shadow Generation and Relighting via Light-Geometry Interaction Maps
ForestPersons: A Large-Scale Dataset for Under-Canopy Missing Person Detection
WIMFRIS: WIndow Mamba Fusion and Parameter Efficient Tuning for Referring Image Segmentation
Diverse Text-to-Image Generation via Contrastive Noise Optimization
Evidence for Limited Metacognition in LLMs
Emergent Dexterity Via Diverse Resets and Large-Scale Reinforcement Learning
KANO: Kolmogorov-Arnold Neural Operator
SPARTA: Scalable and Principled Benchmark of Tree-Structured Multi-hop QA over Text and Tables
Beyond Hearing: Learning Task-agnostic ExG Representations from Earphones via Physiology-informed Tokenization
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Mechanism of Task-oriented Information Removal in In-context Learning
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Modality Alignment across Trees on Heterogeneous Hyperbolic Manifolds
Deep Latent Variable Model based Vertical Federated Learning with Flexible Alignment and Labeling Scenarios
What Scales in Cross-Entropy Scaling Law?
Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
Parameterization-Based Dataset Distillation of 3D Point Clouds through Learnable Shape Morphing
Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation
lmgame-Bench: How Good are LLMs at Playing Games?
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
Speech World Model: Causal State–Action Planning with Explicit Reasoning for Speech
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
Good allocations from bad estimates
Context and Diversity Matter: The Emergence of In-Context Learning in World Models
Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing
EMFuse: Energy-based Model Fusion for Decision Making
VUDG: A Dataset for Video Understanding Domain Generalization
MolEditRL: Structure-Preserving Molecular Editing via Discrete Diffusion and Reinforcement Learning
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Repurposing Synthetic Data for Fine-grained Search Agent Supervision
UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective
Earth-Agent: Unlocking the Full Landscape of Earth Observation with Agents
TrajTok: What makes for a good trajectory tokenizer in behavior generation?
Cannistraci-Hebb Training on Ultra-Sparse Spiking Neural Networks
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
Characterizing Pattern Matching and Its Limits on Compositional Task Structures
BranchGRPO: Stable and Efficient GRPO with Structured Branching in Diffusion Models
WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport
Remaining-data-free Machine Unlearning by Suppressing Sample Contribution
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
CompassNav: Steering From Path Imitation to Decision Understanding In Navigation
Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware SSL
SinkTrack: Attention Sink based Context Anchoring for Large Language Models
FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
$AutoDrive\text{-}P^3$: Unified Chain of Perception–Prediction–Planning Thought via Reinforcement Fine-Tuning
Detective SAM: Adaptive AI-Image Forgery Localization
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
FASTer: Toward Powerful and Efficient Autoregressive Vision–Language–Action Models with Learnable Action Tokenizer and Block-wise Decoding
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
Quantile Advantage Estimation for Entropy-Safe Reasoning
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs
Persistence Spheres: Bi-continuous Representations of Persistence Diagrams
Instance-wise Adaptive Scheduling via Derivative-Free Meta-Learning
Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics
Hierarchical Value-Decomposed Offline Reinforcement Learning for Whole-Body Control
OmniCVR: A Benchmark for Omni-Composed Video Retrieval with Vision, Audio, and Text
LoRAGen: Structure-Aware Weight Space Learning for LoRA Generation
Initialization Schemes for Kolmogorov–Arnold Networks: An Empirical Study
Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaption
Multimodal LLM-assisted Evolutionary Search for Programmatic Control Policies
ExpGuard: LLM Content Moderation in Specialized Domains
LiveWeb-IE: A Benchmark For Online Web Information Extraction
MMReD: a Cross-Modal Benchmark for Dense Context Reasoning
Towards Quantization-Aware Training for Ultra-Low-Bit Reasoning LLMs
Beyond Markovian Drifts: Action-Biased Geometric Walks with Memory for Personalized Summarization
All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting
Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler
Federated Learning with Profile Mapping under Distribution Shifts and Drifts
Topology-Preserved Auto-regressive Mesh Generation in the Manner of Weaving Silk
TIPO: Text to Image with Text Pre-sampling for Prompt Optimization
: One LLM Token for Explicit Graph Structural Understanding
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Boosting Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning
On the Computational Limits of AI4S-RL : A Unified $\varepsilon$-$N$ Analysis
Cross-ControlNet: Training-Free Fusion of Multiple Conditions for Text-to-Image Generation
Diagnosing and Improving Diffusion Models by Estimating Optimal Loss Value
Quotient-Space Diffusion Model
Wait, Do We Need to Wait? Revisiting Budget Forcing for Sequential Test-Time Scaling
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
Revisiting [CLS] and Patch Token Interaction in Vision Transformers
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
Safeguarding Multimodal Knowledge Copyright in the RAG-as-a-Service Environment
Inheriting Generalizable Knowledge from LLMs to Diverse Vertical Tasks
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
Learning-Time Encoding Shapes Unlearning in LLMs
ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse
Bound by semanticity: universal laws governing the generalization-identification tradeoff
Topology and geometry of the learning space of ReLU networks: connectivity and singularities
Learning Deformable Body Interactions With Adaptive Spatial Tokenization
Exploring Interpretability for Visual Prompt Tuning with Cross-layer Concepts
Reasoning-Driven Multimodal LLM for Domain Generalization
Joint Adaptation of Uni-modal Foundation Models for Multi-modal Alzheimer's Disease Diagnosis
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
PerSpectra: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments
APT: Towards Universal Scene Graph Generation via Plug-in Adaptive Prompt Tuning
Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning
Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX
RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing
GraphOmni: A Comprehensive and Extensible Benchmark Framework for Large Language Models on Graph-theoretic Tasks
Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting
LMask: Learn to Solve Constrained Routing Problems with Lazy Masking
ResCP: Reservoir Conformal Prediction for Time Series Forecasting
DVD-Quant: Data-free Video Diffusion Transformers Quantization
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
PT$^2$-LLM: Post-Training Ternarization for Large Language Models
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
ViPRA: Video Prediction for Robot Actions
Towards Efficient Constraint Handling in Neural Solvers for Routing Problems
Generalizable Heuristic Generation Through LLMs with Meta-Optimization
PoLi-RL: A Point-to-List Reinforcement Learning Framework for Conditional Semantic Textual Similarity
Differential Fine-Tuning Large Language Models Towards Better Diverse Reasoning Abilities
Hallucination Begins Where Saliency Drops
Context Tokens are Anchors: Understanding the Repetition Curse in Diffusion MLLMs from an Information Flow Perspective
Benchmarking Multi-Agent Reinforcement Learning in Power Grid Operations
Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation
The Effect of Attention Head Count on Transformer Approximation
Slicing the Gaussian Mixture Wasserstein Distance
The Overthinking Predicament: When Reasoning Hurts Ranking
Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval
Angle K-Means
DemoGrasp: Universal Dexterous Grasping from a Single Demonstration
Regulating Internal Evidence Flows for Robust Learning Under Spurious Correlations
NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
Prior-free Tabular Test-time Adaptation
Debugging Concept Bottleneck Models through Removal and Retraining
Multimodal Dataset Distillation Made Simple by Prototype-guided Data Synthesis
Expert Heads: Robust Evidence Identification for Large Language Models
PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
KaVa: Latent Reasoning via Compressed KV-Cache Distillation
Neuron-Level Analysis of Cultural Understanding in Large Language Models
AP-OOD: Attention Pooling for Out-of-Distribution Detection
Prune Redundancy, Preserve Essence: Vision Token Compression in VLMs via Synergistic Importance-Diversity
HiddenEcho: Mitigating Noise Amplification in Differentially Private LLMs with Hidden-State Correction
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning
HiFo-Prompt: Prompting with Hindsight and Foresight for LLM-based Automatic Heuristic Design
Task-Aware Data Selection via Proxy-Label Enhanced Distribution Matching for LLM Finetuning
Reassessing Layer Pruning in LLMs: New Insights and Methods
NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization
BAR: Refactor the Basis of Autoregressive Visual Generation
FlexProtein: Joint Sequence and Structure Pretraining for Protein Modeling
Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning
KernelFusion: Zero-Shot Blind Super-Resolution via Patch Diffusion
Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
An Information-Theoretic Parameter-Free Bayesian Framework for Probing Labeled Dependency Trees from Attention Score
SafeMoE: Safe Fine-Tuning for MoE LLMs by Aligning Harmful Input Routing
Video-GPT via Next Clip Diffusion
Multi-Bellman operator for convergence of Q-learning with linear function approximation
QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
RAP: 3D Rasterization Augmented End-to-End Planning
Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling
Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality
ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
MSCR: Exploring the Vulnerability of LLMs’ Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
SMAN-Bench: A Cross-System Benchmark for Mobile Agents under Single- and Multi-path, Ambiguous, and Noisy Tasks
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
Procedural Mistake Detection via Action Effect Modeling
Learning from the Electronic Structure of Molecules across the Periodic Table
Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
USTBench: Benchmarking and Dissecting Spatiotemporal Reasoning Capabilities of LLMs as Urban Agents
CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control
Diffusion Bridge Variational Inference for Deep Gaussian Processes
A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features
Is the evidence in 'Language Models Learn to Mislead Humans via RLHF' valid?
Seeing Through Words: Controlling Visual Retrieval Quality with Language
Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Train on Validation (ToV): Fast data selection with applications to fine-tuning
Equivariant Splitting: Self-supervised learning from incomplete data
Dual Randomized Smoothing: Beyond Global Noise Variance
Revisiting Weight Regularization for Low-Rank Continual Learning
Multiplicative Diffusion Models: Beyond Gaussian Latents
Enhancing Communication Compression via Discrepancy-aware Calibration for Federated Learning
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
ContextNav: Towards Agentic Multimodal In-Context Learning
Stochastic Self-Organization in Multi-Agent Systems
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
MOLM: Mixture of LoRA Markers
Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies
Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning
Discrete Variational Autoencoding via Policy Search
From REINFORCE to Dr. GRPO: A Unified Perspective on LLM Post-Training
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
Generalization of RLVR Using Causal Reasoning as a Testbed
FS-KAN: Permutation Equivariant Kolmogorov-Arnold Networks via Function Sharing
Bilateral Information-aware Test-time Adaptation for Vision-Language Models
ClarifyVC: Clarifying Ambiguous Commands in Vehicle Control with a Hybrid Data Augmentation Pipeline
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
Selective Rotary Position Embedding
NAB: Neural Adaptive Binning for Sparse-View CT reconstruction
Revenue Maximization Under Sequential Price Competition Via The Estimation Of $s$-Concave Demand Functions
SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
SVD Provably Denoises Nearest Neighbor Data
InclusiveVidPose: Bridging the Pose Estimation Gap for Individuals with Limb Deficiencies in Video-Based Motion
Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language
Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks
Panoptic Pairwise Distortion Graph
Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
LAMDA: A Longitudinal Android Malware Benchmark for Concept Drift Analysis
RepSpec: Structural Re-parameterized Draft Model Training for Speculative Decoding
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
RIVER: Real-time Video Interaction Benchmark
Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
Rescue: Retrieval Augmented Secure Code Generation
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists
Differentially Private Domain Discovery
Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing
Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models
ScaleCap: Scalable Image Captioning via Dual-Modality Debiasing
FlashDLM: Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion
A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA
Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
The human knowledge loophole in the 'bitter lesson' for LLMs
From Markov to Laplace: How Mamba In-Context Learns Markov Chains
Interpolation-Based Conditioning of Flow Matching Models for Bioisosteric Ligand Design
EditBench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans
Measuring LLM Novelty As The Frontier Of Original And High-Quality Output
Autonomous Play with Correspondence-Driven Trajectory Warping
LoRA-S: An Efficient Low Rank Adaptation scheme via Sylvester equation
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
NerVE: Nonlinear Eigenspectrum Dynamics in LLM Feed-Forward Networks
Continual Unlearning for Text-to-Image Diffusion Models: A Regularization Perspective
Certified Evaluation of Model-Level Explanations for Graph Neural Networks
MILCO: Learned Sparse Retrieval Across Languages via a Multilingual Connector
Gelato: Graph Edit Distance via Autoregressive Neural Combinatorial Optimization
Improving Text-guided CAD Prototyping via Modality-Specific Tokenization
Modality-free Graph In-context Alignment
Dissecting Non-Determinism in Large Language Models
Improving Set Function Approximation with Quasi-Arithmetic Neural Networks
LLM as an Algorithmist: Enhancing Anomaly Detectors via Programmatic Synthesis
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
Universal Beta Splatting
RESFL: An Uncertainty-Aware Framework for Responsible Federated Learning by Balancing Privacy, Fairness and Utility
Divide, Conquer, and Standardize — A Recursive Architecture for Multi-Agent Systems (MAS)
Are Deep Speech Denoising Models Robust to Adversarial Noise?
Generalizing Linear Autoencoder Recommenders with Decoupled Expected Quadratic Loss
Automatic Dialectic Jailbreak: A Framework for Generating Effective Jailbreak Strategies
Study of Training Dynamics for Memory-Constrained Fine-Tuning
Learning Energy-Based Generative Models via Potential Flow: A Variational Principle Approach to Probability Density Homotopy Matching
What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
Tabby: A Language Model Architecture for Tabular and Structured Data Synthesis
Synergistic Benefits of Joint Molecule Generation and Property Prediction
On the Ability of Deep Networks to Learn Symmetries from Data – A Neural Kernel Theory
Amortized Inference of Causal Models via Conditional Fixed-Point Iterations
Variational Pseudo Marginal Methods for Jet Reconstruction in Particle Physics
AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers
Training Dynamics of Learning 3D-Rotational Equivariance
Inverse Scaling in Test-Time Compute
Discrete Audio Tokens: More Than a Survey!
Adaptive Mesh Quantization for Neural PDE Solvers
Distributed Quasi-Newton Method for Fair and Fast Federated Learning
Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria
Enhancing Vision-Language Model with Unmasked Token Alignment
Setting the Record Straight on Transformer Oversmoothing
Leveraging a Simulator for Learning Causal Representations from Post-Treatment Covariates for CATE
Online Selective Conformal Inference: Errors and Solutions
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
Encoder-only Next Token Prediction
PCF Learned Sort: a Learning Augmented Sort Algorithm with $O(n \log\log n)$ Expected Complexity
SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis
Model Tensor Planning
Directed Exploration in Reinforcement Learning from Linear Temporal Logic
Assessing Robustness via Score-Based Adversarial Image Generation
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness
CNN Interpretability with Multivector Tucker Saliency Maps for Self-Supervised Models
Information Theoretic Guarantees For Policy Alignment In Large Language Models
Temporal Test-Time Adaptation with State-Space Models
Adversarial Robustness of Graph Transformers
On the stability of gradient descent with second order dynamics for time-varying cost functions
Generalized Compressed Sensing for Image Reconstruction with Diffusion Probabilistic Models
HalluEntity: Benchmarking and Understanding Entity-Level Hallucination Detection
MobileCLIP2: Improving Multi-Modal Reinforced Training
Chimera: State Space Models Beyond Sequences
NeoBERT: A Next Generation BERT
A Case for Library-Level k-Means Binning in Histogram Gradient-Boosted Trees
Celo: Training Versatile Learned Optimizers on a Compute Diet
Simplex Constrained Sparse Optimization via Tail Screening
Faster SVD via Accelerated Newton-Schulz Iteration
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
YuE: Scaling Open Foundation Models for Long-Form Music Generation
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
Flow Where You Want
Trade-offs in LLM Compute for Reasoning-Intensive Information Retrieval
Computer Use Survey - A Visual Survey of Computer Use Agents
Content Promotion as a Strategic Game: How to Design Agentic Publishers for the Evolving Search Ecosystem in the GenAI Era?
dLLM - Rethinking Generation Beyond Autoregressive Models
Loneliness as a Case Study for Social Reward Misalignment
What (and What Not) are Calibrated Probabilities Actually Useful for?
The effect of feature resolution on embedding dimension
Discretisation invariance
Ready For General Agents? Let's test it.
Rethinking the Diffusion Model from a Langevin Perspective
Revisiting the NetHack Learning Environment
AI Fundamentals: Valuing AI Agents & Data Assets
vAttention: Verified Sparse Attention via Sampling
Budget Alignment: Making Models Reason in the User's Language
Artistic Style and the Play of Neural Style Representations
Where’s the Chicken? Unpacking Spatial Awareness in Vision-Language Models
ChunkTabPFN: Training-free Long Context
Heuristic-Based Ideation for Guiding LLMs Toward Structured Creativity
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network
From Trajectories to Operators — A Unified Flow Map Perspective on Generative Modeling
mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
UnigramLM: An Attempt at Writing The Missing Manual
Square Peg, Round Hole: Plugging Non-Sequential Data into Sequential Language Models
Don't Look Up (Every Token): Escaping Quadratic Complexity via Geometric Patterns and Algorithms
Lossy Common Information in a Learnable Gray-Wyner Network
Dynamic Parameter Reuse Augments Reasoning via Latent Chain of Thought
Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future
Effect of Parallel Environments and Rollout Steps in PPO
Navigating the Manifold — A Geometric Perspective on Diffusion-Based Inverse Problems
Evaluating Machine Learned Inter-Atomic Potentials for a Practical Simulation Workflow
Visualizing LLM Latent Space Geometry Through Dimensionality Reduction
UFO-4D: Unposed Feedforward 4D reconstruction from Two Images
How To Open the Black Box: Modern Models for Mechanistic Interpretability
Scaling Group Inference for Diverse and High-Quality Generation
Generative AI Archaeology
Understanding and Fixing Bottlenecks in State Space Models: What Recency and Over-Smoothing Tell Us
Tracing the Principles Behind Modern Diffusion Models
Medical Interpretability and Knowledge Maps of Large Language Models
Performative Prediction made practical
Learning to Maximize Rewards via Reaching Goals
Probabilistic Circuits for Uncertainty Quantification
Extracting Model Precision from 20 Logprobs
Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion
Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework
Provably Explaining Neural Additive Models
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency
PLAGUE: Plug-and-play Framework for Lifelong Adaptive Generation of Multi-turn Exploits
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Spatially Informed Autoencoders for Interpretable Visual Representation Learning
VERINA: Benchmarking Verifiable Code Generation
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA
Laplacian Kernelized Bandit
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
GRAM-DTI: Adaptive Multimodal Representation Learning for Drug–Target Interaction Prediction
Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
Narrow Finetuning Leaves Clearly Readable Traces in the Activation Differences
DR-Submodular Maximization with Stochastic Biased Gradients: Classical and Quantum Gradient Algorithms
Hybrid Reinforcement: when reward is sparse, better to be dense
Semantic-Aware Diffusion LLM Inference With Adaptive Block Size
A Benchmark for Deep Information Synthesis
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
Efficient Prediction of Large Protein Complexes via Subunit-Guided Hierarchical Refinement
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models
Sublinear Spectral Clustering Oracle with Little Memory
When and Where to Reset Matters for Long-Term Test-Time Adaptation
Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers
TusoAI: Agentic Optimization for Scientific Methods
ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
Test-Time Adaptation without Source Data for Out-of-Domain Bioactivity Prediction
Efficient Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection
VoG: Enhancing LLM Reasoning through Stepwise Verification on Knowledge Graphs
Bandits with Single-Peaked Preferences and Limited Resources
CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
Multi-Feature Quantized Self-Attention for Fair Large Language Models
Video-LevelGauge: Investigating Contextual Positional Bias in Video Language Models
Mechanistic Independence: A Principle for Identifiable Disentangled Representations
Learning Ordinal Probabilistic Reward from Preferences
Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval
GeoFAR: Geography-Informed Frequency-Aware Super-Resolution for Climate Data
Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens
Neural+Symbolic Approaches for Interpretable Actor-Critic Reinforcement Learning
Overlap-weighted orthogonal meta-learner for treatment effect estimation over time
ChinaTravel: An Open-Ended Travel Planning Benchmark with Compositional Constraint Validation for Language Agents
PAS: Estimating the target Accuracy before domain adaptation
Off-Policy Evaluation for Ranking Policies under Deterministic Logging Policies
VADv2: End-to-End Autonomous Driving via Probabilistic Planning
Map as a Prompt: Learning Multi-Modal Spatial-Signal Foundation Models for Cross-scenario Wireless Localization
Benefits and Limitations of Communication in Multi-Agent Reasoning
Adapting Self-Supervised Representations as a Latent Space for Efficient Generation
Sparkle: A Robust and Versatile Representation for Point Cloud-based Human Motion Capture
From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
PALC: Preference Alignment via Logit Calibration
Newton Method Revisited: Global Convergence Rates up to $O(1/k^3)$ for Stepsize Schedules and Linesearch Procedures
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
Protein Structure Tokenization via Geometric Byte Pair Encoding
Beyond Simple Graphs: Neural Multi-Objective Routing on Multigraphs
Reward Model Routing in Alignment
Understanding Dataset Distillation via Spectral Filtering
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation
Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images
VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning
SimpleFold: Folding Proteins is Simpler than You Think
Language Models are Injective and Hence Invertible
Learning linear state-space models with sparse system matrices
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
Generalization of Diffusion Models Arises with a Balanced Representation Space
Strategic Scaling of Test-Time Compute: A Bandit Learning Approach
Efficient Resource-Constrained Training of Vision Transformers via Subspace Optimization
MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs
Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Personalized Feature Translation for Expression Recognition: An Efficient Source-Free Domain Adaptation Method
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models
Draft-based Approximate Inference for LLMs
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence
Relatron: Automating Relational Machine Learning over Relational Databases
DPad: Efficient Diffusion Language Models with Suffix Dropout
Multi-Object System Identification from Videos
Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism
AQER: A Scalable and Efficient Data Loader for Digital Quantum Computers
StreamingThinker: Large Language Models Can Think While Reading
The Deleuzian Representation Hypothesis
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
Bures-Wasserstein Flow Matching for Graph Generation
Dynamic Novel View Synthesis in High Dynamic Range
HATSolver: Learning Gröbner Bases with Hierarchical Attention Transformers
Query-Level Uncertainty in Large Language Models
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure TransFormer for Offline Multi-task Multi-agent Reinforcement Learning
Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method
Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
Topological Flow Matching
The Rank and Gradient Lost in Non-stationarity: Sample Weight Decay for Mitigating Plasticity Loss in Reinforcement Learning
RD-HRL: Generating Reliable Sub-Goals for Long-Horizon Sparse-Reward Tasks
D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping
Set Representation Auxiliary Learning with Adversarial Encoding Perturbation and Optimization
BioX-Bridge: Model Bridging for Unsupervised Cross-Modal Knowledge Transfer across Biosignals
Emergent Discrete Controller Modules for Symbolic Planning in Transformers
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation
Bridging Piano Transcription and Rendering via Disentangled Score Content and Style
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
Characterizing Deep Research: A Benchmark and Formal Definition
Efficient Zero-shot Inpainting with Decoupled Diffusion Guidance
Geometry-aware 4D Video Generation for Robot Manipulation
The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?
MedAraBench: Large-scale Arabic Medical Question Answering Dataset and Benchmark
WaterDrum: Watermark-based Data-centric Unlearning Metric
CAR-LoRA: Training Compression-Aware and Robust LoRA Adapters for Evolving LLMs
Learning Dynamics Feature Representation via Policy Attention for Dynamic Path Planning in Urban Road Networks
Trinity: An Evolved LLM Coordinator
Projected Coupled Diffusion for Test-Time Constrained Joint Generation
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
Protection against Source Inference Attacks in Federated Learning
Theory-Grounded Evaluation of Human-Like Fallacy Patterns in LLM Reasoning
TRACE: Your Diffusion Model is Secretly an Instance Edge Detector
Capability-Based Scaling Laws for LLM-Based Red-Teaming
PRISM: Progressive Robust Learning for Open-World Continual Category Discovery
The Hidden Lattice Geometry of LLMs
Probing Rotary Position Embeddings through Frequency Entropy
AgentPO: Enhancing Multi-Agent Collaboration via Reinforcement Learning
Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective
Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Graph Mixing Additive Networks
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
Reinforcement Unlearning via Group Relative Policy Optimization
Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking
Fast-dLLM v2: Efficient Block-Diffusion LLM
A foundation model with multi-variate parallel attention to generate neuronal activity
FOCUS: Efficient Keyframe Selection for Long Video Understanding
InfoNCE Induces Gaussian Distribution
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data
TaCo: A Benchmark for Lossless and Lossy Codecs of Heterogeneous Tactile Data
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
Learning Collective Variables from BioEmu with Time-Lagged Generation
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
Practical estimation of the optimal classification error with soft labels and calibration
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
MLP Memory: A Retriever-Pretrained Memory for Large Language Models
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
Reliable Evaluation of MRI Motion Correction: Dataset and Insights
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference
Code World Models for General Game Playing
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Towards Persistent Noise-Tolerant Active Learning of Regular Languages with Class Query
Scaling Generalist Data-Analytic Agents
Multiverse Mechanica: A Testbed for Learning Game Mechanics via Counterfactual Worlds
Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds
DSA: Efficient Inference For Video Generation Models via Distributed Sparse Attention
RM-R1: Reward Modeling as Reasoning
Communication-Efficient Decentralized Optimization via Double-Communication Symmetric ADMM
Testing Most Influential Sets
Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
Enhancing Diffusion-Based Sampling with Molecular Collective Variables
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
Patronus: Interpretable Diffusion Models with Prototypes
Specialization after Generalization: Towards Understanding Test-Time Training in Foundation Models
Flock: A Knowledge Graph Foundation Model via Learning on Random Walks
CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
Jackpot: Align Actor-Policy Distribution for scalable and stable RL for LLM
Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
Scaling up Memory for Robotic Control via Experience Retrieval
Adaptive Conformal Guidance for Learning under Uncertainty
MindMix: A Multimodal Foundation Model for Auditory Perception Decoding via Deep Neural-Acoustic Alignment
Conformalized Survival Counterfactuals Prediction for General Right-Censored Data
AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization
Fast and Stable Riemannian Metrics on SPD Manifolds via Cholesky Product Geometry
From Assistant to Independent Developer — Are GPTs Ready for Software Development?
Latent Denoising Makes Good Visual Tokenizers
Imagine How To Change: Explicit Procedure Modeling for Change Captioning
Can LLMs Reason Soundly in Law? Auditing Inference Patterns for Legal Judgment
SafeMPO: Constrained Reinforcement Learning with Probabilistic Incremental Improvement
Boosting for Predictive Sufficiency
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Achieving Expert-Level Agent from Foundation Model via Complexity Curriculum Reinforcement Learning with Synthetic Data
Understanding vs. Generation: Navigating Optimization Dilemma in Multimodal Models
Gauge-invariant representation holonomy
A Unification of Discrete, Gaussian, and Simplicial Diffusion
Eliminating VAE for Fast and High-Resolution Generative Detail Restoration
Attention Is All You Need for KV Cache in Diffusion LLMs
Anatomy-aware Representation Learning for Medical Ultrasound
pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation
GRADIEND: Feature Learning within Neural Networks Exemplified through Biases
SCoT: Teaching 3D-LLMs to Think Spatially with Million-scale CoT Annotations
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning
Secret-Protected Evolution for Differentially Private Synthetic Text Generation
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Speculative Speculative Decoding
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
TABLET: A Large-Scale Dataset for Robust Visual Table Understanding
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
FedOpenMatch: Towards Semi-Supervised Federated Learning in Open-Set Environments
Exploring the Potential of Encoder-free Architectures in 3D LMMs
From Fields to Random Trees
Plug-and-Play Compositionality for Boosting Continual Learning with Foundation Models
Feedback-driven recurrent quantum neural network universality
Multimodal Aligned Semantic Knowledge for Unpaired Image-text Matching
Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
Deep SPI: Safe Policy Improvement via World Models
ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination
SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling
Histopathology-Genomics Multi-modal Structural Representation Learning for Data-Efficient Precision Oncology
Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
Charts Are Not Images: On the Challenges of Scientific Chart Editing
KDP: Simplifying Representation Dynamics in Kernel Space
Inferring the Invisible: Neuro-Symbolic Rule Discovery for Missing Value Imputation
MICLIP: Learning to Interpret Representation in Vision Models
SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples
Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing
A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization
Lean Finder: Semantic Search for Mathlib That Understands User Intents
Dynamic Texture Modeling of 3D Clothed Gaussian Avatars from a Single Video
CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models
Helix: Evolutionary Reinforcement Learning for Open-Ended Scientific Problem Solving
Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity
Differentiable Simulation of Hard Contacts with Soft Gradients for Learning and Control
Buckingham $\pi$-Invariant Test-Time Projection for Robust PDE Surrogate Modeling
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice
InputDSA: Demixing, then comparing recurrent and externally driven dynamics
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
SciTS: Scientific Time Series Understanding and Generation with LLMs
Learning Escorted Protocols For Multistate Free-Energy Estimation
Towards Self-Evolving Agent Benchmarks: Validatable Agent Trajectory via Test-Time Exploration
Tight Bounds for Schrodinger Potential Estimation in Unpaired Data Translation
FlashWorld: High-quality 3D Scene Generation within Seconds
Learning to Reason Efficiently with Discounted Reinforcement Learning
VideoAnchor: Reinforcing Subspace-Structured Visual Cues for Coherent Visual-Spatial Reasoning
Circuit Insights: Towards Interpretability Beyond Activations
Diffusion as Infinite HVAEs: Do Diffusion Models Generalize Better than Deep VAEs?
R-WoM: Retrieval-augmented World Model For Computer-use Agents
Forest-Based Graph Learning for Semi-Supervised Node Classification
Multi-agent Coordination via Flow Matching
AdaCache: Adaptive Caching and Context Augmentation for Efficient LLM Serving
PEAR: Phase Entropy Aware Reward for Efficient Reasoning
ViTSP: A Vision Language Models Guided Framework for Large-Scale Traveling Salesman Problems
VoMP: Predicting Volumetric Mechanical Property Fields
ProReGen: Progressive Residual Generation under Attribute Correlations
Debiased and Denoised Projection Learning for Incomplete Multi-view Clustering
Can Vision-Language Models Answer Face to Face Questions in the Real-World?
BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching
Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?
Efficient Offline Reinforcement Learning via Peer-Influenced Constraint
An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
Zeros can be Informative: Masked Binary U-Net for Image Segmentation on Tensor Cores
Leveraging Data to Say No: Memory Augmented Plug-and-Play Selective Prediction
Thicker and Quicker: The Jumbo Token for Fast Plain Vision Transformers
Efficient Autoregressive Inference for Transformer Probabilistic Models
The Expressive Limits of Diagonal SSMs for State-Tracking
From Gradient Volume to Shapley Fairness: Towards Fair Multi-Task Learning
From Predictors to Samplers via the Training Trajectory
Neural Networks Learn Multi-Index Models Near the Information-Theoretic Limit
BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
GAP: Gradient Adjustment with Phase-guidance for Robust Vision-Proprioception Policies in Robotic Manipulation
Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape
ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models
Learned Meta-Tokens for Language Modeling
Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI
Self-Supervised Learning from Structural Invariance
Perception-Aware Policy Optimization for Multimodal Reasoning
Multimodal Policy Internalization for Conversational Agents
Distributionally Robust Cooperative Multi-agent Reinforcement Learning with Value Factorization
The 99% Success Paradox: When Near-Perfect Retrieval Equals Random Selection
Rapid Training of Hamiltonian Graph Networks Using Random Features
Lookup multivariate Kolmogorov-Arnold Networks
Adaptive gradient descent on Riemannian manifolds and its applications to Gaussian variational inference
Supporting Multimodal Intermediate Fusion with Informatic Constraint and Distribution Coherence
WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge
Learning Robust Intervention Representations with Delta Embeddings
Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for One-/Two-step High-Fidelity Audio Generation
Reasoning Boosts Opinion Alignment in LLMs
Optimal Brain Restoration for Joint Quantization and Sparsification of LLMs
Understanding the Role of Training Data in Test-Time Scaling
Emergent Coordination in Multi-Agent Language Models
MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation
Contextual and Seasonal LSTMs for Time Series Anomaly Detection
Value Flows
Personalized Reasoning: Just-in-time Personalization and Why LLMs Fail at It
EnvSocial-Diff: A Diffusion-Based Crowd Simulation Model with Environmental Conditioning and Individual-Group Interaction
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
AlphaAlign: Incentivizing Safety Alignment with Extremely Simplified Reinforcement Learning
Diversity-Enhanced Reasoning for Subjective Questions
EAMET: Robust Massive Model Editing via Embedding Alignment Optimization
Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
Diverse Text Decoding via Iterative Reweighting
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
Self-Supervised Evolution Operator Learning for High-Dimensional Dynamical Systems
Heterogeneous Front-Door Effects: Debiased Estimation with Quasi-Oracle Guarantees
LightCtrl: Training-free Controllable Video Relighting
Towards Prompt-Robust Machine-Generated Text Detection
Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer
Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
Enhancing Learning with Noisy Labels via Rockafellian Relaxation
HiDivDrop: Vision Token Reduction in MLLMs via Late Injection and Differentiable Top-K
Homeostatic Adaptation of Optimal Population Codes under Metabolic Stress
Fast Data Mixture Optimization via Gradient Descent
MARC: Memory-Augmented RL Token Compression for Efficient Video Understanding
Robust Adaptive Multi-Step Predictive Shielding
Cautious Weight Decay
ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
Sampling Complexity of TD and PPO in RKHS
Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning
Enhancing Agentic Search via Data Synthesis on Hierarchical Constraint Satisfaction
Loc$^{2}$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching
A Genetic Algorithm for Navigating Synthesizable Molecular Spaces
A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks
gen2seg: Generative Models Enable Generalizable Instance Segmentation
Does Weak-to-strong Generalization Happen under Spurious Correlations?
Relational Graph Transformer
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Multiplayer Nash Preference Optimization
Guidance Watermarking for Diffusion Models
PoseX: AI Defeats Physics-based Methods on Protein Ligand Cross-Docking
Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling
Spherical Watermark: Encryption-Free, Lossless Watermarking for Diffusion Models
SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation
LC-PLM: Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers
G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
CARD: Towards Conditional Design of Multi-agent Topological Structures
Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning
Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning
Understanding the Implicit Biases of Design Choices for Time Series Foundation Models
Hyden: A Hybrid Dual-Path Encoder for Monocular Geometry of High-resolution Images
Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning
Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
DiffuDETR: Rethinking Detection Transformers with Diffusion Process
GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks
TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation
Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift
BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models
Generalized Spherical Neural Operators: Green’s Function Formulation
Bridging Radiology and Pathology Foundation Models via Concept-Based Multimodal Co-Adaptation
Horseshoe Splatting: Handling Structural Sparsity for Uncertainty-Aware Gaussian-Splatting Radiance Field Rendering
Adaptive Thinking: Large Language Models Know When to Think in Latent Space
CoNavBench: Collaborative Long-Horizon Vision-Language Navigation Benchmark
UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
Learning Patient-Specific Disease Dynamics With Latent Flow Matching For Longitudinal Imaging Generation
Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
Latent Wavelet Diffusion For Ultra High-Resolution Image Synthesis
DVLA-RL: Dual-Level Vision–Language Alignment with Reinforcement Learning Gating for Few-Shot Learning
LogART: Pushing the Limit of Efficient Logarithmic Post-Training Quantization
On the Theoretical Limitations of Embedding-Based Retrieval
PointRePar: SpatioTemporal Point Relation Parsing for Robust Category-Unified 3D Tracking
Physically-Guided Optical Inversion Enable Non-Contact Side-Channel Attack on Isolated Screens
Flow Autoencoders are Effective Protein Tokenizers
Modeling Interference for Treatment Effect Estimation in Network Dynamic Environment
Train Once, Answer All: Many Pretraining Experiments for the Cost of One
DRBench: A Realistic Benchmark for Enterprise Deep Research
SESaMo: Symmetry-Enforcing Stochastic Modulation for Normalizing Flows
Beyond the Known: An Unknown-Aware Large Language Model for Open-Set Text Classification
On the Convergence Direction of Gradient Descent
Chessformer: A Unified Architecture for Chess Modeling
The Geometry and Topology of Circuits: the Manifolds of Modular Addition
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
Predictability Shapes Adaptation: An Evolutionary Perspective on Modes of Learning in Transformers
DeepFRC: An End-to-End Deep Learning Model for Functional Registration and Classification
Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions
TRAC: Tensor-Train based Across-layer Compression for Parameter-Efficient Fine-Tuning
Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity
BrowseNet: Knowledge Graph-Based Associative Memory for Contextual Information Retrieval
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
MatRIS: Toward Reliable and Efficient Pretrained Machine Learning Interaction Potentials
OFMU: Optimization-Driven Framework for Machine Unlearning
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
ThinkOmni: Lifting Textual Reasoning to Omni-modal Scenarios via Guidance Decoding
One protein is all you need
GRO-RAG: Gradient-aware Re-rank Optimization for Multi-source Retrieval-Augmented Generation
Simulating and Understanding Deceptive Behaviors in Long-Horizon Interactions
PerFit: Exploring Personalization Shifts in Representation Space of LLMs
One-Step Flow Q-Learning: Addressing the Diffusion Policy Bottleneck in Offline Reinforcement Learning
DNT: a Deeply Normalized Transformer that can be trained by Momentum SGD
The Price of Robustness: Stable Classifiers Need Overparameterization
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?
Revisiting Global Text Conditioning in Diffusion Transformers
Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Bridging the Distribution Gap to Harness Pretrained Diffusion Priors for Super-Resolution
UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
Learning to Answer from Correct Demonstrations
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
Tokenizing Single-Channel EEG with Time-Frequency Motif Learning
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
Learning to Interpret Weight Differences in Language Models
Riemannian High-Order Pooling for Brain Foundation Models
AUHead: Realistic Emotional Talking Head Generation via Action Units Control
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
CASteer: Cross-Attention Steering for Controllable Concept Erasure
Point Prompting: Counterfactual Tracking with Video Diffusion Models
Efficient Sliced Wasserstein Distance Computation via Adaptive Bayesian Optimization
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction
Matting Anything 2: Towards Video Matting for Anything
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
Multi-Action Self-Improvement For Neural Combinatorial Optimization
RAG4DMC: Retrieval-Augmented Generation for Data-Level Modality Completion
Learning Structure-Semantic Evolution Trajectories for Graph Domain Adaptation
Geometric-Mean Policy Optimization
VibeVoice: Expressive Podcast Generation with Next-Token Diffusion
DES-LOC: Desynced Low Communication Adaptive Optimizers for Foundation Models
Using maximal information auxiliary variables to improve synthetic data generation based on TabPFN foundation models
When More is Less: Understanding Chain-of-Thought Length in LLMs
FREAK: A Fine-grained Hallucination Evaluation Benchmark for Advanced MLLMs
Resisting Contextual Interference in RAG via Parametric-Knowledge Reinforcement
MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
Step-Aware Residual-Guided Diffusion for EEG Spatial Super-Resolution
LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models
Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
A Recovery Guarantee for Sparse Neural Networks
Beyond Linear Processing: Dendritic Bilinear Integration in Spiking Neural Networks
The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models
Latent Concept Disentanglement in Transformer-based Language Models
Revisiting Node Affinity Prediction in Temporal Graphs
How Muon’s Spectral Design Benefits Generalization: A Study on Imbalanced Data
Beyond Distributions: Geometric Action Control for Continuous Reinforcement Learning
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
FoNE: Precise Single-Token Number Embeddings via Fourier Features
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems
SIM-CoT: Supervised Implicit Chain-of-Thought
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions
Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks
When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
GhostEI-Bench: Do Mobile Agent Resilience to Environmental Injection in Dynamic On-Device Environments?
IR-Agent: Expert-Inspired LLM Agents for Structure Elucidation from Infrared Spectra
Read the Room: Video Social Reasoning with Mental-Physical Causal Chains
Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
Constant Degree Matrix-Driven Incomplete Multi-View Clustering via Connectivity-Structure and Embedding Tensor Learning
HistoPrism: Unlocking Functional Pathway Analysis from Pan-Cancer Histology via Gene Expression Prediction
Dual-Branch Representations with Dynamic Gated Fusion and Triple-Granularity Alignment for Deep Multi-View Clustering
Multi-Scale Diffusion-Guided Graph Learning with Power-Smoothing Random Walk Contrast for Multi-View Clustering
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
Breaking the Correlation Plateau: On the Optimization and Capacity Limits of Attention-Based Regressors
Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search
Target-Aware Video Diffusion Models
Language Models Use Lookbacks to Track Beliefs
Distributional value gradients for stochastic environments
AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM
SLM-MUX: Orchestrating Small Language Models for Reasoning
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
Ego-Foresight: Self-supervised Learning of Agent-Aware Representations for Improved RL
Causal Interpretation of Neural Network Computations with Contribution Decomposition (CODEC)
JointDiff: Bridging Continuous and Discrete in Multi-Agent Trajectory Generation
Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster
OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
DA$^2$: Depth Anything in Any Direction
SportR: A Benchmark for Multimodal Large Language Model Reasoning in Sports
The Power of Small Initialization in Noisy Low-Tubal-Rank Tensor Recovery
MVAR: Visual Autoregressive Modeling with Scale and Spatial Markovian Conditioning
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Better Bounds for the Distributed Experts Problem
Gradient-Direction-Aware Density Control for 3D Gaussian Splatting
MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning
Learning Self-Critiquing Mechanisms for Region-Guided Chest X-Ray Report Generation
UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing
Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter
PiCa: Parameter-Efficient Fine-Tuning with Column Space Projection
The Price of Amortized Inference in Sparse Autoencoders
Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
CoDA: From Text-to-Image Diffusion Models to Truly Training-Free Dataset Distillation
ExGRPO: Learning to Reason from Prior Successes
Language-guided Open-world Video Anomaly Detection under Weak Supervision
PE-SGD: Differentially Private Deep Learning via Evolution of Gradient Subspace for Text
Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models
QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response
An Information-Theoretic Lower Bound on the Generalization Error of Autoencoders
Understanding VLMs Spatial Mental Modeling Capability from Limited Views
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Seeing What’s Wrong: A Trajectory-Guided Approach to Caption Error Detection
Social Agents: Collective Intelligence Improves LLM Predictions
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
BoreaRL: A Multi-Objective Reinforcement Learning Environment for Climate-Adaptive Boreal Forest Management
When Bias Helps Learning: Bridging Initial Prejudice and Trainability
Sample Lottery: Unsupervised Discovery of Critical Instances for LLM Reasoning
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
Tree Search for LLM Agent Reinforcement Learning
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration
SMixer: Rethinking Efficient-Training and Event-Driven SNNs
Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis
FlowSearcher: Synthesizing Memory-Guided Agentic Workflows for Web Information Seeking
Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation
AesCoder: Code Aesthetics with Agentic Reward Feedback
Planning with an Embodied Learnable Memory
Efficient Credal Prediction through Decalibration
Decomposed Attention Fusion in MLLMs for Training-free Video Reasoning Segmentation
BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation
Anchored Supervised Fine-Tuning
RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
DragFlow: Unleashing DiT Priors with Region-Based Supervision for Drag Editing
CP-Agent: Context-Aware Multimodal Reasoning for Cellular Morphological Profiling under Chemical Perturbations
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
Advancing Multi-agent Traffic Simulation via R1-Style Reinforcement Fine-Tuning
Three Forward, One Backward: Memory-Efficient Full-Rank Fine-Tuning of Large Models via Extra Forward Passes
Evolution of Flash Attention
Tug-of-War No More: Harmonizing Accuracy and Robustness in Vision-Language Models via Stability-Aware Task Vector Merging
Spiking Discrepancy Transformer for Point Cloud Analysis
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
PSP: Prompt-Guided Self-Training Sampling Policy for Active Prompt Learning
FlowBind: Efficient Any-to-Any Generation with Bidirectional Flows
OrthAlign: Orthogonal Subspace Decomposition for Non-Interfering Multi-Objective Alignment
Trace Anything: Representing Any Video in 4D via Trajectory Fields
Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks
DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection
Diffusion & Adversarial Schrödinger Bridges via Iterative Proportional Markovian Fitting
Learning from Synthetic Data Improves Multi-hop Reasoning
One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
Unified Registration of Cortical and Subcortical Structures
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
$\boldsymbol{\partial^\infty}$-Grid: Differentiable Grid Representations for Fast and Accurate Solutions to Differential Equations
Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models
RL's Razor: Why Online Reinforcement Learning Forgets Less
Plug, Play, and Fortify: A Low-Cost Module for Robust Multimodal Image Understanding Models
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents
PIRN: Prototypical-based Intra-modal Reconstruction with Normality Communication for Multi-modal Anomaly Detection
Greater than the Sum of Its Parts: Building Substructure into Protein Encoding Models
Consistent Text-to-Image Generation via Scene De-Contextualization
SAGA: Structural Aggregation Guided Alignment with Dynamic View and Neighborhood Order Selection for Multiview Graph Domain Adaptation
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
Online Black-Box Prompt Optimization with Regret Guarantees under Noisy Feedback
TangleScore: Tangle-Guided Purge and Imprint for Unstructured Knowledge Editing
Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning
Enhancing Multi-Image Understanding through Delimiter Token Scaling
EigenBench: A Comparative Behavioral Measure of Value Alignment
The Markovian Thinker
Can LLMs Move Beyond Short Exchanges to Realistic Therapy Conversations?
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction as Reasoning
ReDDiT: Rehashing Noise for Discrete Visual Generation
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
Understanding Collaboration Mechanism In VAE Recommender Systems
Critical attention scaling in long-context transformers
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
Asynchronous Matching with Dynamic Sampling for Multimodal Dataset Distillation
Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
Sample Reward Soups: Query-efficient Multi-Reward Guidance for Text-to-Image Diffusion Models
Constrained Decoding of Diffusion LLMs with Context-Free Grammars
TS$^2$: Training with Sparsemax+, Testing with Softmax for Accurate and Diverse LLM Fine-Tuning
Two-Way Is Better Than One: Bidirectional Alignment with Cycle Consistency for Exemplar-Free Class-Incremental Learning
DanceTogether: Generating Interactive Multi-Person Video without Identity Drifting
Once-More: Continuous Self-Correction for Large Language Models via Perplexity-Guided Intervention
ULTRA-360: Unconstrained Dataset for Large-scale Temporal 3D Reconstruction across Altitudes and Omnidirectional Views
AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution
A$^2$Search: Ambiguity-Aware Question Answering with Reinforcement Learning
Radiometrically Consistent Gaussian Surfels for Inverse Rendering
No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings
FRABench and UFEval: Unified Fine-grained Evaluation with Task and Aspect Generalization
MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
Correlations in the Data Lead to Semantically Rich Feature Geometry Under Superposition
Large Depth Completion Model from Sparse Observations
Spatial Structure and Selective Text Jointly Facilitate Image Clustering
Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models
Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization
Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint
Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative
LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing
Temporal superposition and feature geometry of RNNs under memory demands
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks
Meta-Learning Theory-Informed Inductive Biases using Deep Kernel Gaussian Processes
Joint Optimization for 4D Human-Scene Reconstruction in the Wild
Uncertainty-driven Embedding Convolution
Understanding the Mechanisms of Fast Hyperparameter Transfer
Imitation Learning as Return Distribution Matching
Value Matching: Scalable and Gradient-Free Reward-Guided Flow Adaptation
PLoP: Precise LoRA Placement for Efficient Finetuning of Large Models
EXP-Bench: Can AI Conduct AI Research Experiments?
Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning
FlowGen: Synthesizing Diverse Flowcharts to Enhance and Benchmark MLLM Reasoning
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
GaussianFusion: Unified 3D Gaussian Representation for Multi-Modal Fusion Perception
MindPilot: Closed-loop Visual Stimulation Optimization for Brain Modulation with EEG-guided Diffusion
CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
Expanding Reasoning Potential in Foundation Model by Learning Diverse Chains of Thought Patterns
Understanding Routing Mechanism in Mixture-of-Experts Language Models
Risk-Sensitive Reinforcement Learning for Alleviating Exploration Dilemmas in Large Language Models
Breaking Barriers: Do Reinforcement Fine-tuning Gains Transfer To Unseen Domains?
SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Architecture-Agnostic Test-Time Adaptation via Backprop-Free Embedding Alignment
TianQuan-S2S: A Subseasonal-to-Seasonal Global Weather Model via Incorporate Climatology State
ROC-n-reroll: How verifier imperfection affects test-time scaling
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
Unleashing Guidance Without Classifiers for Human-Object Interaction Animation
MVR: Multi-view Video Reward Shaping for Reinforcement Learning
SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
Efficient and Sharp Off-Policy Learning under Unobserved Confounding
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
Mini-cluster Guided Long-tailed Deep Clustering
CityLens: Evaluating Large Vision-Language Models for Urban Socioeconomic Sensing
TopoFormer: Topology Meets Attention for Graph Learning
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs
Differentially Private Equilibrium Finding in Polymatrix Games
Advancing Spatiotemporal Representations in Spiking Neural Networks via Parametric Invertible Transformation
Rethinking Consistent Multi-Label Classification under Inexact Supervision
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
GradPCA: Leveraging NTK Alignment for Reliable Out-of-Distribution Detection
scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction
Adaptive Conformal Anomaly Detection with Time Series Foundation Models for Signal Monitoring
Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning
Peak-Return Greedy Slicing: Subtrajectory Selection for Transformer-based Offline RL
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
OpenThoughts: Data Recipes for Reasoning Models
Data Selection for LLM Alignment Using Fine-Grained Preferences
Critic–Adviser–Reviser Cyclic Refinement: Towards High-Quality EMR Corpus Generation with LLMs
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning
Efficient Message-Passing Transformer for Error Correcting Codes
D$^2$GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction
Understanding and Improving Continuous LLM Adversarial Training via In-context Learning Theory
Instilling an Active Mind in Avatars via Cognitive Simulation
Causal Discovery via Quantile Partial Effect
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
DreamPhase: Offline Imagination and Uncertainty-Guided Planning for Large-Language-Model Agents
SPELL: Self-Play Reinforcement Learning for Evolving Long-Context Language Models
Soft-Di[M]O: Improved one-step Image Discrete Model
Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training
ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
MoM: Linear Sequence Modeling with Mixture-of-Memories
MotionGPT3: Human Motion as a Second Modality
RESTRAIN: From Spurious Votes to Signals — Self-Training RL with Self-Penalization
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
From Curiosity to Caution: Mitigating Reward Hacking for Best-of-$N$ with Pessimism
Time-To-Inconsistency: A Survival Analysis of Large Language Model Robustness to Adversarial Attacks
AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
Seeing Through the Brain: New Insights from Decoding Visual Stimuli with fMRI
Exploring Synthesizable Chemical Space with Iterative Pathway Refinements
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
Self-Aligned Reward: Towards Effective and Efficient Reasoners
Graph Random Features for Scalable Gaussian Processes
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
Exposing Mixture and Annotating Confusion for Active Universal Test-Time Adaptation
Improving Semantic Proximity in English-Centric Information Retrieval through Cross-Lingual Alignment
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
Delay Flow Matching
Query-Guided Spatial–Temporal–Frequency Interaction for Music Audio–Visual Question Answering
Video-As-Prompt: Unified Semantic Control for Video Generation
NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge
Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
Graphon Cross-Validation: Assessing Models on Network Data
A New Paradigm for Genome-wide DNA Methylation Prediction Without Methylation Input
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization
Misalignments and RL Failure Modes in the Early Stage of Superintelligence
Conformal Prediction for Long-Tailed Classification
Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
Evaluating Data Influence in Meta Learning
A Single Architecture for Representing Invariance Under Any Space Group
Neuron-Aware Data Selection in Instruction Tuning for Large Language Models
When Thinking Backfires: Mechanistic Insights into Reason-induced Misalignment
ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
Dissecting Representation Misalignment in Contrastive Learning via Influence Function
Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
CooperTrim: Adaptive Data Selection for Uncertainty-Aware Cooperative Perception
Virne: A Comprehensive Benchmark for RL-based Network Resource Allocation in NFV
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Assembling the Mind's Mosaic: Towards EEG Semantic Intent Decoding
There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
Learning from Historical Activations in Graph Neural Networks
LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration
HardcoreLogic: Challenging Large Reasoning Models with Long-tail Logic Puzzle Games
A Scalable Inter-edge Correlation Modeling in CopulaGNN for Link Sign Prediction
Astra: General Interactive World Model with Autoregressive Denoising
P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context
Unveiling the Cognitive Compass: Theory-of-Mind–Guided Multimodal Emotion Reasoning
SteinsGate: Adding Causality to Diffusions for Long Video Generation via Path Integral
LEAP: Local ECT-Based Learnable Positional Encodings for Graphs
From movement to cognitive maps: recurrent neural networks reveal how locomotor development shapes hippocampal spatial coding
A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration
Influence Dynamics and Stagewise Data Attribution
Unified In-Context Video Editing
Discovering alternative solutions beyond the simplicity bias in recurrent neural networks
What Lies Beyond the View? Actively Constructing Spatial Beliefs in Foundation Models
ProtoKV: Long-context Knowledges Are Already Well-Organized Before Your Query
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Pixel-Perfect Puppetry: Precision-Guided Enhancement for Face Image and Video Editing
Block-wise Adaptive Caching for Accelerating Diffusion Policy
HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion
Fostering Video Reasoning via Next-Event Prediction
Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?
CoDi: Subject-Consistent and Pose-Diverse Text-to-Image Generation
Provably Accelerated Imaging with Restarted Inertia and Score-based Image Priors
VFScale: Intrinsic Reasoning through Verifier-Free Test-time Scalable Diffusion Model
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
GTA1: GUI Test-time Scaling Agent
Task-free Adaptive Meta Black-box Optimization
Remotely Detectable Robot Policy Watermarking
Fast training of accurate physics-informed neural networks without gradient descent
SPR$^2$Q: Static Priority-based Rectifier Routing Quantization for Image Super-Resolution
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
TetraGT: Tetrahedral Geometry-Driven Explicit Token Interactions with Graph Transformer for Molecular Representation Learning
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
The logical expressiveness of topological neural networks
HackWorld: Evaluating Computer-Use Agents on Exploiting Web Application Vulnerabilities
Plan-Answer-Refine-on-Graph: Structured Planning and Self-Refinement for Large Language Model Reasoning on Knowledge Graphs
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
STVG-R1: Incentivizing Instance-Level Reasoning and Grounding in Videos via Reinforcement Learning
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
Low-pass Personalized Subgraph Federated Recommendation
Learning a Game by Paying the Agents
Improving Code Localization with Repository Memory
Unlocking the Essence of Beauty: Advanced Aesthetic Reasoning with Relative-Absolute Policy Optimization
Temporally Detailed Hypergraph Neural ODE for Type 2 Diabetes Progression Modeling
3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
Scaling with Collapse: Efficient and Predictable Training of LLM Families
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
Judo: A Juxtaposed Domain-oriented Multimodal Reasoner for Industrial Anomaly QA
Lightweight Spatio-Temporal Modeling via Temporally Shifted Distillation for Real-Time Accident Anticipation
RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
ARFlow: Auto-regressive Optical Flow Estimation for Arbitrary-Length Videos via Progressive Next-Frame Forecasting
LoC-Decomp: LLM Autoformalization via Logical Concept Decomposition and Iterative Feedback Correction
CheckMate! Watermarking Graph Diffusion Models in Polynomial Time
Out-of-Distribution Graph Models Merging
Readout Representation: Redefining Neural Codes by Input Recovery
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
Towards Understanding Valuable Preference Data for Large Language Model Alignment
SE-Diff: Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods
Discovering Novel LLM Experts via Task-Capability Coevolution
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent System
Leveraging Pretrained Knowledge at Inference Time: LoRA-Gated Contrastive Decoding for Multilingual Factual Language Generation in Adapted LLMs
Robust Generalized Schrödinger Bridge via Sparse Variational Gaussian Processes
HippoTune: A Hippocampal Associative Loop–Inspired Fine-Tuning Method for Continual Learning
Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
Enabling True Global Perception in State Space Models for Visual Tasks
CLAUSE: Agentic Neuro-Symbolic Knowledge Graph Reasoning via Dynamic Learnable Context Engineering
SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
Difference-Aware Retrieval Policies for Imitation Learning
Missingness Bias Calibration in Feature Attribution Explanations
K²-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Watermarking Diffusion Language Models
FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
Tequila: Deadzone-free Ternary Quantization for Large Language Models
A Unified Federated Framework for Trajectory Data Preparation via LLMs
Optimizer Choice Matters For The Emergence of Neural Collapse
ODEBrain: Continuous-Time EEG Graph for Modeling Dynamic Brain Networks
STEDiff: Revealing the Spatial and Temporal Redundancy of Backdoor Attacks in Text-to-Image Diffusion Models
ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code
Diversity-Incentivized Exploration for Versatile Reasoning
SceneCOT: Eliciting Chain-of-Thought Reasoning in 3D Scenes
RLP: Reinforcement as a Pretraining Objective
G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling
Rethinking JEPA: Compute-Efficient Video Self-Supervised Learning with Frozen Teachers
From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning
CortiLife: A Unified Framework for Cortical Representation Learning across the Lifespan
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
Unlearning Evaluation through Subset Statistical Independence
Brain-IT: Image Reconstruction from fMRI via Brain-Interaction Transformer
Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives
Escaping Low-Rank Traps: Interpretable Visual Concept Learning via Implicit Vector Quantization
Programming with Pixels: Can Computer-Use Agents do Software Engineering?
OrthoSolver: A Neural Proper Orthogonal Decomposition Solver For PDEs
FETAL-GAUGE: A Benchmark for Assessing Vision-Language Models in Fetal Ultrasound
On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement
MoDr: Mixture-of-Depth-Recurrent Transformers for Test-Time Reasoning
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection
Learning Global Hypothesis Space for Enhancing Synergistic Reasoning Chain
GenCP: Towards Generative Modeling Paradigm of Coupled physics with Application to Fluid-Structure Interaction
GGBall: Graph Generative Model on Poincaré Ball
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
DUET: Optimizing Training Data Mixtures via Coarse, Noisy Feedback from Unseen Evaluation Tasks
Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression
RealBench: A Benchmark for Complex Physical Systems with Real-World Data
S$^2$-Guidance: Stochastic Self-Guidance for Training-Free Enhancement of Diffusion Models
Counterfactual Reasoning for Retrieval-Augmented Generation
POEMetric: The Last Stanza of Humanity
Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models
Topology Matters in RTL Circuit Representation Learning
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
Grounding Computer Use Agents on Human Demonstrations
Deploying Models to Non-participating Clients in Federated Learning without Fine-tuning: A Hypernetwork-based Approach
Representing local protein environments with machine learning force fields
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Models for Embodied Task Planning
Token-based Audio Inpainting via Discrete Diffusion
A Fair Bayesian Inference through Matched Gibbs Posterior
HDR-4DGS: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos
Fast Frank–Wolfe Algorithms with Adaptive Bregman Step-Size for Weakly Convex Functions
Doubly-Regressing Approach for Subgroup Fairness
Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning
Fastcar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation
RL makes MLLMs see better than SFT
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
PGRF-Net: A Prototype-Guided Relational Fusion Network for Diagnostic Multivariate Time-Series Anomaly Detection
Bridging Explainability and Embeddings: BEE Aware of Spuriousness
Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature
SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network
Compactness and Consistency: A Conjoint Framework for Deep Graph Clustering
FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks
Provably Tracking Equivalent Mechanistic Interpretations Across Neural Networks
Is Finer Better? The Limits of Microscaling Formats in Large Language Models
ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains
LiFR-Seg: Anytime High-Frame-Rate Segmentation via Event-Guided Propagation
Learnable Sparsity for Vision Generative Models
SPICE: Submodular Penalized Information–Conflict Selection for Efficient Large Language Model Training
Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression
A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
Unlearning during Training: Domain-Specific Gradient Ascent for Domain Generalization
Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization
SubDyve: Subgraph-Driven Dynamic Propagation for Virtual Screening Enhancement Controlling False Positive
Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis
Count Counts: Motivating Exploration in LLM Reasoning with Count-based Intrinsic Rewards
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models
Antithetic Noise in Diffusion Models
The Layered Ontology of Models, Resolving the Epistemological Crisis of AI
Superficial Safety Alignment Hypothesis
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
Predicting LLM Output Length via Entropy-Guided Representations
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation
Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards
How Do Transformers Learn to Associate Tokens: Gradient Leading Terms Bring Mechanistic Interpretability
Ads that Stick: Near-Optimal Ad Optimization through Psychological Behavior Models
PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
Micro-Macro Retrieval: Reducing Long-Form Hallucination in Large Language Models
A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models
Concept Insertion Success over Time in Diffusion Models through Prompt-Conditioned Interventions
Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine
Hot Fuzz: Temperature-Tunable Composition of Diffusion models with Fuzzy Logic
FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring
PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA
CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning
Atomic HINs: Entity-Attribute Duality for Heterogeneous Graph Modeling
Parallel Token Generation for Language Models
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
Cooperative Sheaf Neural Networks
Towards Dynamic Interleaving Optimizers
Multi-Marginal Flow Matching with Adversarially Learnt Interpolants
Best-of-Infinity: Asymptotic Performance of Test-Time Compute
Oracle-efficient Hybrid Learning with Constrained Adversaries
Think Then Embed: Generative Context Improves Multimodal Embedding
Reducing Contextual Stochastic Bilevel Optimization via Structured Function Approximation
Dual-Path Condition Alignment for Diffusion Transformers
Certifying the Full YOLO Pipeline: A Probabilistic Verification Approach
LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models
Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
Inducing Dyslexia in Vision Language Models
VideoZoomer: Reinforcement-Learned Temporal Focusing for Long Video Reasoning
LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
CARPRT: Class-Aware Zero-Shot Prompt Reweighting for Vision-Language Model
Enabling Fine-Tuning of Direct Feedback Alignment via Feedback-Weight Matching
Transformers as a Measure-Theoretic Associative Memory: A Statistical Perspective
CodeQuant: Unified Clustering and Quantization for Enhanced Outlier Smoothing in Low-Precision Mixture-of-Experts
The Coverage Principle: How Pre-Training Enables Post-Training
LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations
PICABench: How Far are We from Physical Realistic Image Editing?
Zero-Sacrifice Lifelong Adversarial Defense for Pre-Trained Encoders
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer
CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
High-Dimensional Analysis of Single-Layer Attention for Sparse-Token Classification
ALM-MTA: Front-Door Causal Multi-Touch Attribution Method for Creator-Ecosystem Optimization
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
Improving Human-AI Coordination through Online Adversarial Training and Generative Models
Splat Feature Solver
Animal behavioral analysis and neural encoding with transformer-based self-supervised pretraining
Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping
Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
Multilevel Control Functional
(U)NFV: Supervised and Unsupervised Neural Finite Volume Methods for Solving Hyperbolic PDEs
MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
ManipEvalAgent: Promptable and Efficient Evaluation Framework for Robotic Manipulation Policies
SERUM: Simple, Efficient, Robust, and Unifying Marking for Diffusion-based Image Generation
Capturing Visual Environment Structure Correlates with Control Performance
One-Shot Exemplars for Class Grounding in Self-Supervised Learning
LCA: Local Classifier Alignment for Continual Learning
Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning
Learning to summarize user information for personalized reinforcement learning from human feedback
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
Towards Safe and Optimal Online Bidding: A Modular Look-ahead Lyapunov Framework
Do 3D Large Language Models Really Understand 3D Spatial Relationships?
Visual Prompt-Agnostic Evolution
Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Prompt-Robust Vision-Language Models via Meta-Finetuning
Beyond Static Vision: Scene Dynamic Field Unlocks Intuitive Physics Understanding in Multi-modal Large Language Models
WOW-Seg: A Word-free Open World Segmentation Model
Recover Cell Tensor: Diffusion-Equivalent Tensor Completion for Fluorescence Microscopy Imaging
Single-Loop Byzantine-Resilient Federated Bilevel Optimization
Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models
Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Neural Compression of 3D Meshes using Sparse Implicit Representation
A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems
Can Transformers Really Do It All? On the Compatibility of Inductive Biases Across Tasks
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence
On Fairness of Task Arithmetic: The Role of Task Vectors
Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
NRGPT: An Energy-based Alternative for GPT
Bradley-Terry and Multi-Objective Reward Modeling Are Complementary
WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols
From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation
Contrastive Diffusion Guidance for Spatial Inverse Problems
Predictive CVaR Q-learning
KnowProxy: Adapting Large Language Models by Knowledge-guided Proxy
Counterfactual Explanations on Robust Perceptual Geodesics
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems by Exploiting Knowledge Asymmetry
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning
Matching multiple experts: on the exploitability of multi-agent imitation learning
Tractability via Low Dimensionality: The Parameterized Complexity of Training Quantized Neural Networks
ACE-Bench: Benchmarking Agentic Coding in End-to-End Development of Complex Features
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
In-Context Algorithm Emulation in Fixed-Weight Transformers
TrajFlow: Nation-wide Pseudo GPS Trajectory Generation with Flow Matching Models
PixNerd: Pixel Neural Field Diffusion
Breaking Safety Paradox with Feasible Dual Policy Iteration
Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
PRO-MOF: Policy Optimization with Universal Atomistic Models for Controllable MOF Generation
Verification of the Implicit World Model in a Generative Model via Adversarial Sequences
Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems
Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models
NetArena: Dynamically Generated LLM Benchmarks for Network Applications
Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
Efficient Multimodal Spatial Reasoning via Dynamic and Asymmetric Routing
ReSplat: Degradation-agnostic Feed-forward Gaussian Splatting via Self-guided Residual Diffusion
A New Approach to Controlling Linear Dynamical Systems
SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models
RoRE: Rotary Ray Embedding for Generalised Multi-Modal Scene Understanding
Unbiased Gradient Estimation for Event Binning via Functional Backpropagation
Anime-Ready: Controllable 3D Anime Character Generation with Body-Aligned Component-Wise Garment Modeling
Quartet of Diffusions: Structure-Aware Point Cloud Generation through Part and Symmetry Guidance
IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning
FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching
VITA: Vision-to-Action Flow Matching Policy
Low rank adaptation of chemical foundation models generate effective odorant representations
Post-training Large Language Models for Diverse High-Quality Responses
On the Reasoning Abilities of Masked Diffusion Language Models
FrontierCO: Real-World and Large-Scale Evaluation of Machine Learning Solvers for Combinatorial Optimization
MedLesionVQA: A Multimodal Benchmark Emulating Clinical Visual Diagnosis for Body Surface Health
TokenSeek: Memory Efficient Fine Tuning via Instance-Aware Token Ditching
Robust Federated Inference
Dual Language Models: Balancing sample-efficiency and overfitting resilience
SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking
Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers
CellAgent: LLM-Driven Multi-Agent Framework for Natural Language-Based Single-Cell Analysis
Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement
Don't Settle Too Early: Self-Reflective Remasking for Diffusion Language Models
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Measuring Uncertainty Calibration
AutoLibra: Agent Metric Induction from Open-Ended Human Feedback
Pedagogically-Inspired Data Synthesis for Language Model Knowledge Distillation
Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
Part-level Semantic-guided Contrastive Learning for Fine-grained Visual Classification
Segment-Level Attribution for Selective Learning of Long Reasoning Traces
MergOPT: A Merge-Aware Optimizer for Robust Model Merging
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
Flow Matching with Semidiscrete Couplings
TFHE-Coder: Evaluating LLM Agents for secure Fully Homomorphic Encryption Code Generation
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
DoFlow: Flow-based Generative Models for Interventional and Counterfactual Forecasting on Time Series
Priors in time: Missing inductive biases for language model interpretability
Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy
LLMS ON TRIAL: Evaluating Judicial Fairness For Large Language Models
Systematic Biosafety Evaluation of DNA Language Models under Jailbreak Attacks
Graph Representational Learning: When Does More Expressivity Hurt Generalization?
ReIn: Conversational Error Recovery with Reasoning Inception
Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
ASIDE: Architectural Separation of Instructions and Data in Language Models
Beyond Length: Quantifying Long-Range Information for Long-Context LLM Pretraining Data
Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning
RADAR: Reasoning–Ability and Difficulty-Aware Routing in Language Models
WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context
(Token-Level) InfoRMIA: Stronger Membership Inference and Privacy Assessment for LLMs
PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning
Real-Time Motion-Controllable Autoregressive Video Diffusion
Robustness in Text-Attributed Graph Learning: Insights, Trade-offs, and New Defenses
A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components
BioBO: Biology-informed Bayesian Optimization for Perturbation Design
Activation Steering for LLM Alignment via a Unified ODE-Based Framework
The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
Energy-Regularized Sequential Model Editing on Hyperspheres
DELTA-Code: How RL Unlocks and Transfers New Programming Algorithms in LLMs
Mathesis: Towards Formal Theorem Proving from Natural Languages
Beyond Text-Only: Towards Multimodal Table Retrieval in Open-World
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals
ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
HEEGNet: Hyperbolic Embeddings for EEG
Are Global Dependencies Necessary? Scalable Time Series Forecasting via Local Cross-Variate Modeling
TrojanTO: Action-Level Backdoor Attacks Against Trajectory Optimization Models
Accelerated Parallel Tempering via Neural Transports
MIMIC: Mask-Injected Manipulation Video Generation with Interaction Control
JAPAN: Joint Adaptive Prediction Areas with Normalising Flow
On the Role of Implicit Regularization of Stochastic Gradient Descent in Group Robustness
Aria: an Agent for Retrieval and Iterative Auto-Formalization via Dependency Graph
PRISM: Controllable Diffusion for Compound Image Restoration with Scientific Fidelity
LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification
Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
MASS: MoErging through Adaptive Subspace Selection
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models
How reinforcement learning after next-token prediction facilitates learning
LINK: Learning Instance-level Knowledge from Vision-Language Models for Human-Object Interaction Detection
Genomic Foundationless Models: Pretraining Does Not Promise Performance
TOUCH: Text-guided Controllable Generation of Free-Form Hand-Object Interactions
CSRv2: Unlocking Ultra-Sparse Embeddings
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Diffusion Negative Preference Optimization Made Simple
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime
ASMIL: Attention-Stabilized Multiple Instance Learning for Whole-Slide Imaging
SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization
Figma2Code: Automating Multimodal Design to Code in the Wild
Rethinking LLM Reasoning: From Explicit Trajectories to Latent Representations
The State of Reinforcement Finetuning for Transformer-based Generative Agents
Replicable Reinforcement Learning with Linear Function Approximation
ParaS2S: Benchmarking and Aligning Spoken Language Models for Paralinguistic-aware Speech-to-Speech Interaction
STORK: Faster Diffusion and Flow Matching Sampling by Resolving both Stiffness and Structure-Dependence
IncVGGT: Incremental VGGT for Memory-Bounded Long-Range 3D Reconstruction
Learning to Recall with Transformers Beyond Orthogonal Embeddings
Reasoning in Space via Grounding in the World
FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability–Plasticity Tradeoff
Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning
Multiple Token Divergence: Measuring and Steering In-Context Computation Density
LeRobot: An Open-Source Library for End-to-End Robot Learning
Learning to Reason in Structured In-context Environments with Reinforcement Learning
OD$^3$: Optimization-free Dataset Distillation for Object Detection
A Memory-Efficient Hierarchical Algorithm for Large-scale Optimal Transport Problems
ECHO: Toward Contextual Seq2Seq Paradigms in Large EEG Models
Textual Equilibrium Propagation for Deep Compound AI Systems
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Triangle Multiplication is All You Need for Biomolecular Structure Representations
TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness
Identity-Free Deferral For Unseen Experts
Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
JailbreakLoRA: Your Downloaded LoRA from Sharing Platforms might be Unsafe
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
DynamicInfer: Runtime-Aware Sparse Offloading for LLMs Inference on a Consumer-Grade GPU
Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning
TandemFoilSet: Datasets for Flow Field Prediction of Tandem-Airfoil Through the Reuse of Single Airfoils
On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction
Verifying Chain-of-Thought Reasoning via its Computational Graph
GARLIC: Graph Attention-based Relational Learning of Multivariate Time Series in Intensive Care
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Learning Domain-Aware Task Prompt Representations for Multi-Domain All-in-One Image Restoration
Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
Deconstructing Positional Information: From Attention Logits to Training Biases
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs
Product-Quantised Image Representation for High-Quality Image Synthesis
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
Using Graph Neural Networks in Reinforcement Learning: A Practical Guide
TD-MoE: Tensor Decomposition for MoE Models
TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding
Shift-and-Sum Quantization for Visual Autoregressive Models
Regret-Guided Search Control for Efficient Learning in AlphaZero
From U-Nets to DiTs: The Architectural Evolution of Text-to-Image Diffusion Models (2021–2025)
reAR: Rethinking Visual Autoregressive Models via Token-wise Consistency Regularization
Continual Low-Rank Adapters for LLM-based Generative Recommender Systems
SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition
Pusa V1.0: Unlocking Temporal Control in Pretrained Video Diffusion Models via Vectorized Timestep Adaptation
Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevant Assessment for IR Benchmarks
sleep2vec: Unified Cross-Modal Alignment for Heterogeneous Nocturnal Biosignals
On the Generalization Capacities of MLLMs for Spatial Intelligence
Data-Centric Lessons To Improve Speech-Language Pretraining
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Vision Language Models are Biased
EgoBrain: Synergizing Minds and Eyes For Human Action Understanding
GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
SNAPHARD CONTRAST LEARNING
Multimodality as Supervision: Self-Supervised Specialization to the Test Environment via Multimodality
Generative Universal Verifier as Multimodal Meta-Reasoner
BideDPO: Conditional Image Generation with Simultaneous Text and Condition Alignment
PathChat-SegR1: Reasoning Segmentation in Pathology via SO-GRPO
On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy
Retain and Adapt: Auto-Balanced Model Editing for Open-Vocabulary Object Detection under Domain Shifts
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
Branch and Bound Search for Exact MAP Inference in Credal Networks
LayerSync: Self-aligning Intermediate Layers
Theoretical Analysis of Contrastive Learning under Imbalanced Data: From Training Dynamics to a Pruning Solution
AlignSep: Temporally-Aligned Video-Queried Sound Separation with Flow Matching
OmniPortrait: Fine-Grained Personalized Portrait Synthesis via Pivotal Optimization
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards
Count Bridges enable Modeling and Deconvolving Transcriptomics
LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition
MoCa: Modeling Object Consistency for 3D Camera Control in Video Generation
Signal Structure-Aware Gaussian Splatting for Large-Scale Scene Reconstruction
Variation in Verification: Understanding Verification Dynamics in Large Language Models
Disentanglement of Variations with Multimodal Generative Modeling
Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in its Latent Thoughts
CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling
Seeing What’s Not There: Negation Understanding Needs More Than Training
Disentangled Robot Learning via Separate Forward and Inverse Dynamics Pretraining
TimeSeriesExamAgent: Creating TimeSeries Reasoning Benchmarks at Scale
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation
Panda: A pretrained forecast model for chaotic dynamics
Knowledge Distillation for Large Language Models through Residual Learning
Distributions as Actions: A Unified Framework for Diverse Action Spaces
Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling
A Physics-Inspired Optimizer: Velocity Regularized Adam
Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning
PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation
Reconstructing KV Caches with Cross-Layer Fusion for Enhanced Transformers
Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
Towards Understanding the Shape of Representations in Protein Language Models
Multi-state Protein Design with DynamicMPNN
Fast Proteome-Scale Protein Interaction Retrieval via Residue-Level Factorization
EigenScore: OOD Detection using Posterior Covariance in Diffusion Models
TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models
QuRL: Rubrics As Judge For Open-Ended Question Answering
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
Scaling Agents via Continual Pre-training
Strongly Convex Sets in Riemannian Manifolds
High-Probability Bounds for the Last Iterate of Clipped SGD
WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning
Tokenisation over Bounded Alphabets is Hard
DistMLIP: A Distributed Inference Platform for Machine Learning Interatomic Potentials
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
Neural Message-Passing on Attention Graphs for Hallucination Detection
Learning a distance measure from the information-estimation geometry of data
Relative Entropy Pathwise Policy Optimization
Random-projection ensemble dimension reduction
3DGEER: 3D Gaussian Rendering Made Exact and Efficient for Generic Cameras
CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
Error Feedback for Muon and Friends
VisCoder2: Building Multi-Language Visualization Coding Agents
Enhancing Molecular Property Predictions by Learning from Bond Modelling and Interactions
Towards Personalized Deep Research: Benchmarks and Evaluations
Disentangled Hierarchical VAE for 3D Human-Human Interaction Generation
LEGACY: A Lightweight Dynamic Gradient Compression Strategy for Distributed Deep Learning
Test-Time Alignment for Large Language Models via Textual Model Predictive Control
Bridging Input Feature Spaces Towards Graph Foundation Models
Let OOD Feature Exploring Vast Predefined Classifiers
Rethinking Causal Mask Attention for Vision-Language Inference
Taming Curvature: Architecture Warm-up for Stable Transformer Training
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
Enhancing Sparse Event Detection in Healthcare Time-Series via Adaptive Gate of Context–Detail Interaction
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
LearnPruner: Rethinking Attention-based Token Pruning in Vision Language Models
Machine Unlearning under Retain–Forget Entanglement
EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements
Learning Ising Models under Hard Constraints using One Sample
Numerion: A Multi-Hypercomplex Model for Time Series Forecasting
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
Learning Explicit Single-Cell Dynamics Using ODE Representations
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
LiveClin: A Live Clinical Benchmark without Leakage
Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data
Strategic Planning and Rationalizing on Trees Make LLMs Better Debaters
Learning Adaptive Distribution Alignment with Neural Characteristic Function for Graph Domain Adaptation
Embodied Agents Meet Personalization: Investigating Challenges and Solutions Through the Lens of Memory Utilization
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Learning Molecular Chirality via Chiral Determinant Kernels
AudioTrust: Benchmarking The Multifaceted Trustworthiness of Audio Large Language Models
Entropy-preserving reinforcement learning
Detecting Invariant Manifolds in ReLU-Based RNNs
Deep Learning with Learnable Product-Structured Activations
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
Learning to Reason for Hallucination Span Detection
Mix-Ecom: Towards Mixed-Type E-Commerce Dialogues with Complex Domain Rules
VOGUE: Unified Understanding, Generation, and Editing for Videos
An Overview of Subliminal Learning
Compositional Visual Planning via Inference-Time Diffusion Scaling
Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD
AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
Doubly-Robust LLM-as-a-Judge: Externally Valid Estimation with Imperfect Personas
Robustify Spiking Neural Networks via Dominant Singular Deflation under Heterogeneous Training Vulnerability
AdS-GNN - a Conformally Equivariant Graph Neural Network
Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
Why AI Evaluations Need Error Bars
DGNet: Learning Spatiotemporal PDEs with Discrete Green Networks
Model Misspecification in Simulation-Based Inference - Recent Advances and Open Challenges
Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields
GCGNet: Graph-Consistent Generative Network for Time Series Forecasting with Exogenous Variables
Do Large Language Models Know What They Are Capable Of?
Convergence of Regret Matching in Potential Games and Constrained Optimization
Adaptive Acquisition Selection for Bayesian Optimization with Large Language Models
A$^2$TG: Adaptive Anisotropic Textured Gaussians for Efficient 3D Scene Representation
Enhancing Stability of Physics-Informed Neural Network Training Through Saddle-Point Reformulation
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
ComPhy: Composing Physical Models with end-to-end Alignment
Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning
Context Learning for Multi-Agent Discussion
Late-to-Early Training: LET LLMs Learn Earlier, So Faster and Better
SDErasure: Concept-Specific Trajectory Shifting for Concept Erasure via Adaptive Diffusion Classifier
Spilled Energy in Large Language Models
Light Differentiable Logic Gate Networks
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Revisit Visual Prompt Tuning: The Expressiveness of Prompt Experts
Tell me Habibi, is it Real or Fake?
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation
ToolTree: Efficient LLM Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
TPDiff: Temporal Pyramid Video Diffusion Model
Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
EmoPrefer: Can Large Language Models Understand Human Emotion Preferences?
PMark: Towards Robust and Distortion-free Semantic-level Watermarking with Channel Constraints
Learning to Weight Parameters for Data Attribution
PINFDiT: Energy-Based Physics-Informed Diffusion Transformers for General-purpose Time Series Tasks
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark
Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization
ResearchRubrics: A Benchmark of Prompts and Rubrics For Deep Research Agents
Towards Sustainable Investment Policies Informed by Opponent Shaping
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
BaseReward: A Strong Baseline for Multimodal Reward Model
Bias Similarity Measurement: A Black-Box Audit of Fairness Across LLMs
Generate Any Scene: Scene Graph Driven Data Synthesis for Visual Generation Training
SparseD: Sparse Attention for Diffusion Language Models
Streaming Visual Geometry Transformer
Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
DeepPrim: a Physics-Driven 3D Short-term Weather Forecaster via Primitive Equation Learning
EvA: Evolutionary Attacks on Graphs
TTS Can Speak in Any Style with Any Voice
KVComm: Enabling Efficient LLM Communication through Selective KV Sharing
Reducing information dependency does not cause training data privacy. Adversarially non-robust features do.
AutoCode: LLMs as Problem Setters for Competitive Programming
Learning to Generate Stylized Handwritten Text via a Unified Representation of Style, Content, and Noise
Test-Time Iterative Error Correction for Efficient Diffusion Models
PromptHub: Enhancing Multi-Prompt Visual In-Context Learning with Locality-Aware Fusion, Concentration and Alignment
Beyond Uniformity: Sample and Frequency Meta Weighting for Post-Training Quantization of Diffusion Models
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
SkyEvents: A Large-Scale Event-enhanced UAV Dataset for Robust 3D Scene Reconstruction
Faithfulness Under the Distribution: A New Look at Attribution Evaluation
Towards a Foundation Model for Crowdsourced Label Aggregation
EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
Mitigating Hallucination in Vision-Language Model with Depth and Spatial-aware Key-Value Refinement
Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
Online Decision-Focused Learning
Revisiting Matrix Sketching in Linear Bandits: Achieving Sublinear Regret via Dyadic Block Sketching
Dataset Distillation as Pushforward Optimal Quantization
The Art of Scaling Reinforcement Learning Compute for LLMs
Scalable Second-order Riemannian Optimization for $K$-means Clustering
Edit-Based Flow Matching for Temporal Point Processes
R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration
Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts
Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
MergePRAG: Orthogonal Merging of Passage-experts for Multi-hop Parametric RAG
Sample Efficient Offline RL via T-Symmetry Enforced Latent State-Stitching
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Autoregressive-based Progressive Coding for Ultra-Low Bitrate Image Compression
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation
Learning Human Habits with Rule-Guided Active Inference
CogniMap3D: Cognitive 3D Mapping and Rapid Retrieval
Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification
FALCON: Few-step Accurate Likelihoods for Continuous Flows
Talking Points: Describing and Localizing Pixels
Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
TrustGen: A Platform of Dynamic Benchmarking on the Trustworthiness of Generative Foundation Models
CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
In-context learning of representations can be explained by induction circuits
Softmax Transformers are Turing-Complete
A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers
D&R: Recovery-based AI-Generated Text Detection via a Single Black-box LLM Call
Inoculation Prompting: Eliciting traits from LLMs during training can reduce trait expression at test-time
SuperF: Neural Implicit Fields for Multi-Image Super-Resolution
Fast and Interpretable Protein Substructure Alignment via Optimal Transport
Safety Subspaces are Not Linearly Distinct: A Fine-Tuning Case Study
A Unifying View of Coverage in Linear Off-policy Evaluation
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Separable Neural Networks: Approximation Theory, NTK Regime, and Preconditioned Gradient Descent
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
Negotiated Reasoning: On Provably Addressing Relative Over-Generalization
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors
SlotGCG: Exploiting the Positional Vulnerability in LLMs for Jailbreak Attacks
Composition-Grounded Instruction Synthesis for Visual Reasoning
FASA: Frequency-Aware Sparse Attention
Similarity-aware Non-Convex Federated Optimization
Two-Layer Convolutional Autoencoders Trained on Normal Data Provably Detect Unseen Anomalies
Memento: Toward an All-Day Proactive Assistant for Ultra-Long Streaming Video
Federated Graph-Level Clustering Network with Dual Knowledge Separation
Ghost in the Cloud: Your Geo-Distributed Large Language Models Training is Easily Manipulated
Flow-Disentangled Feature Importance
EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation
There Was Never a Bottleneck in Concept Bottleneck Models
Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting Without Disclosure
A General Framework for Black-Box Attacks Under Cost Asymmetry
Bayesian Robust Cooperative Multi-Agent Reinforcement Learning Against Unknown Adversaries
Multi-Domain Transferable Graph Gluing for Building Graph Foundation Models
PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference
Uncertainty Estimation via Hyperspherical Confidence Mapping
Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion
Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs
SoftCFG: Uncertainty-guided Stable Guidance for Visual Autoregressive Model
Direct Doubly Robust Estimation of Conditional Quantile Contrasts
Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction
Deep Global-sense Hard-negative Discriminative Generation Hashing for Cross-modal Retrieval
Robust LLM Unlearning via Post Judgment and Multi-round Thinking
Generative Bayesian Optimization: Generative Models as Acquisition Functions
Understanding and Relaxing the Limitations of Transformers for Linear Algebra
Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
Scaling Behavior of Discrete Diffusion Language Models
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
3D-aware Disentangled Representation for Compositional Reinforcement Learning
StyliTruth: Unlocking Stylized yet Truthful LLM Generation via Disentangled Steering
Interaction Field Matching: Overcoming Limitations of Electrostatic Models
Fisher-Rao Sensitivity for Out-of-Distribution Detection in Deep Neural Networks
Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization
Learning to Reason via Mixture-of-Thought for Logical Reasoning
Conditionally Whitened Generative Models for Probabilistic Time Series Forecasting
Adaptive Social Learning via Mode Policy Optimization for Language Agents
MaRS: Memory-Adaptive Routing for Reliable Capacity Expansion and Knowledge Retention
SelvaBox: A high-resolution dataset for tropical tree crown detection
Massive Editing for Large Language Models Based on Dynamic Weight Generation
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-Language Navigation
CroCoDiLight: Repurposing Cross-View Completion Encoders for Relighting
Benchmarking Bias Mitigation Toward Fairness Without Harm from Vision to LVLMs
MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
BANZ-FS: BANZSL Fingerspelling Dataset
Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
Prior-aware and Context-guided Group Sampling for Active Probabilistic Subsampling
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
Inlier-Centric Post-Training Quantization for Object Detection Models
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents
An efficient, provably optimal, practical algorithm for the 0-1 loss linear classification problem
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference
Motion Prior Distillation in Time Reversal Sampling for Generative Inbetweening
Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory?
Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
3D Aware Region Prompted Vision Language Model
Gen-DFL: Decision-Focused Generative Learning for Robust Decision Making
Beyond Speedup - Utilizing KV Cache for Sampling and Reasoning
A Bayesian Nonparametric Framework For Learning Disentangled Representations
The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss
Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
RefineBench: Evaluating Refinement Capability in Language Models
Improved high-dimensional estimation with Langevin dynamics and stochastic weight averaging
Semantic-Enhanced Time-Series Forecasting via Large Language Models
LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
Soft-Masked Diffusion Language Models
Enhancing Image-Conditional Coverage in Segmentation: Adaptive Thresholding via Differentiable Miscoverage Loss
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models
LLMs Struggle to Balance Reasoning and World Knowledge in Causal Narrative Understanding
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework
OrthoRF: Exploring Orthogonality in Object-Centric Representations
Is it Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
Any-step Generation via N-th Order Recursive Consistent Velocity Field Estimation
Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Understanding
Grasp Any Region: Prompting MLLM to Understand the Dense World
Convergent Differential Privacy Analysis for General Federated Learning
Hierarchical Encoding Tree with Modality Mixup for Cross-modal Hashing
ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation
Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness
Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets
Uncovering Robot Vulnerabilities through Semantic Potential Fields
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking
Distilling and Adapting: A Topology-Aware Framework for Zero-Shot Interaction Prediction in Multiplex Biological Networks
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs
Analysis of approximate linear programming solution to Markov decision problem with log barrier function
The Pensieve Paradigm: Stateful Language Models with Learned Memory Management
Latent Planning Emerges with Scale
Let's (not) just put things in Context: Test-time Training for Long-context LLMs
A Cognitive Process-Inspired Architecture for Subject-Agnostic Brain Visual Decoding
Unifying Diffusion and Autoregression for Generalizable Vision-Language-Action Model
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
When Foundation Models are One-Liners: Limitations and Future Directions for Time Series Anomaly Detection
VIRTUE: Visual-Interactive Text-Image Universal Embedder
Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation
DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects
Beyond the Heatmap: A Rigorous Evaluation of Component Impact in MCTS-Based TSP Solvers
FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
No labels, No Problem: Training Visual Reasoners with Multimodal Verifiers
AFD-INSTRUCTION: A Comprehensive Antibody Instruction Dataset with Functional Annotations for LLM-Based Understanding and Design
SWE-RM: Execution-free Feedback for Software Engineering Agents
TurboBoA: Faster and Exact Attention-aware Quantization without Backpropagation
VideoPhy-2: A Challenging Action-Centric Physical Commonsense Evaluation in Video Generation
Break the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
Towards Physically Executable 3D Gaussian for Embodied Navigation
$\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving
Exploiting Low-Dimensional Manifold of Features for Few-shot Whole Slide Image Classification
Exposing Weaknesses of Large Reasoning Models through Graph Algorithm Problems
GNN Explanations that do not Explain and How to find Them
TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture
MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games
Null-Space Filtering for Data-free Continual Model Merging: Preserving Transparency, Promoting Fidelity
A Function-Centric Graph Neural Network Approach for Predicting Electron Densities
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization
Deep Learning for Subspace Regression
ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
A Problem-Oriented Perspective and Anchor Verification for Code Optimization
Tree-sliced Sobolev IPM
Exploring Real-Time Super-Resolution: Benchmarking and Fine-Tuning for Streaming Content
Efficient Submodular Maximization for Sums of Concave over Modular Functions
ICDiffAD: Implicit Conditioning Diffusion Model for Time Series Anomaly Detection
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
A Formal Controllability Toolkit for Black-Box Generative Models
Adaptive Hopfield Network: Rethinking Similarities in Associative Memory
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation
INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy
Online Prediction of Stochastic Sequences with High Probability Regret Bounds
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion
Exchangeability of GNN Representations with Applications to Graph Retrieval
Monitoring Decomposition Attacks with Lightweight Sequential Monitors
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Advancing End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-Training
Naming to Learn: Class Incremental Learning for Vision-Language Model with Unlabeled Data
Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
Beyond Skeletons: Learning Animation Directly from Driving Videos with Same2X Training Strategy
Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
Theoretical Modeling of Large Language Model Self-Improvement Training Dynamics Through Solver-Verifier Gap
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Semi-Parametric Contextual Pricing with General Smoothness
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
CDBridge: A Cross-omics Post-training Bridge Strategy for Context-aware Biological Modeling
QUEST: A robust attention formulation using query-modulated spherical attention
Feed-forward Human Performance Capture via Progressive Canonical Space Updates
MoGen: Detailed Neuronal Morphology Generation via Point Cloud Flow Matching
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
Sample-efficient evidence estimation of score based priors for model selection
PixelCraft: A Multi-Agent system for High-Fidelity Visual Reasoning on Structured Images
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
Diffusion Models as Dataset Distillation Priors
Mamba-3: Improved Sequence Modeling using State Space Principles
Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information
GOOD: Geometry-guided Out-of-Distribution Modeling for Open-set Test-time Adaptation in Point Cloud Semantic Segmentation
Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies
Metis: Training LLMs with FP4 Quantization
PHyCLIP: $\ell_1$-Product of Hyperbolic Factors Unifies Hierarchy and Compositionality in Vision-Language Representation Learning
CARL: Preserving Causal Structure in Representation Learning
CellDuality: Unlocking Biological Reasoning in LLMs with Self-Supervised RLVR
Distribution-informed Online Conformal Prediction
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
RegionE: Adaptive Region-Aware Generation for Efficient Image Editing
Disentangled representation learning through unsupervised symmetry group discovery
Tackling Heavy-Tailed Q-Value Bias in Offline-to-Online Reinforcement Learning with Laplace-Robust Modeling
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
UniCA: Unified Covariate Adaptation for Time Series Foundation Model
SRT: Super-Resolution for Time Series via Disentangled Rectified Flow
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
Universal Properties of Activation Sparsity in Modern Large Language Models
Joint Discriminative-Generative Modeling via Dual Adversarial Training
The Counting Power of Transformers
LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
T1: One-to-One Channel-Head Binding for Multivariate Time-Series Imputation
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
Hybrid Training for Vision-Language-Action Models
Rethinking Expressivity and Degradation-Awareness in Attention for All-in-One Blind Image Restoration
ADM-v2: Pursuing Full-Horizon Roll-out in Dynamics Models for Offline Policy Learning and Evaluation
Learning Brain Representation with Hierarchical Visual Embeddings
DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems
Composable Sparse Subnetworks via Maximum-Entropy Principle
VisualPRM400K: An Effective Dataset for Training Multimodal Process Reward Models
Improving Extreme Wind Prediction with Frequency-Informed Learning
Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
Bongard-RWR+: Real-World Representations of Fine-Grained Concepts in Bongard Problems
Learning To Draft: Adaptive Speculative Decoding with Reinforcement Learning
Adaptive Collaboration with Humans: Metacognitive Policy Optimization for Multi-Agent LLMs with Continual Learning
RECODE: A Benchmark for Research Code DEvelopment with Interactive Human Feedback
FastVMT: Eliminating Redundancy in Video Motion Transfer
MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference
Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
Time Optimal Execution of Action Chunk Policies Beyond Demonstration Speed
Language as a Window Into the Mind: How NLP and LLMs Advance Human Sciences
HUMOF: Human Motion Forecasting in Interactive Social Scenes
Learning Heterogeneous Degradation Representation for Real-World Super-Resolution
Selective Data Removal for Distributional Machine Unlearning
UniCon: Unified Framework for Efficient Contrastive Alignment via Kernels
CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators
E²LoRA: Efficient and Effective Low-Rank Adaptation with Entropy-Guided Adaptive Sharing
Training LLMs with LogicReward for Faithful and Rigorous Reasoning
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
Policy Likelihood-based Query Sampling and Critic-Exploited Reset for Efficient Preference-based Reinforcement Learning
Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
Sapiens2
GenCape: Structure-Inductive Generative Modeling for Category-Agnostic Pose Estimation
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
InfGen: Scenario Generation as Next Token Group Prediction
Rodrigues Network for Learning Robot Actions
Global and Local Topology-Aware Graph Generation via Dual Conditioning Diffusion
Contact Wasserstein Geodesics for Non-Conservative Schrödinger Bridges
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
Reference Guided Skill Discovery
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs
Asymptotic analysis of shallow and deep forgetting in replay with neural collapse
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Flow Expansion via Verifier-Constrained Noised State Space Exploration
ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
MeSH: Memory-as-State-Highways for Recursive Transformers
D-AR: Diffusion via Autoregressive Models
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
Intrinsic training dynamics of deep neural networks
YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
ERTACache: Error Rectification and Timesteps Adjustment for Efficient Diffusion
Bringing Stability to Diffusion: Decomposing and Reducing Variance of Training Masked Diffusion Models
Scalable Oversight via Partitioned Human Supervision
Functional MRI Time Series Generation via Wavelet-Based Image Transform and Spectral Flow Matching for Brain Disorder Identification
I2Mole: Interaction-aware Invariant Molecular Learning For Generalizable Property Prediction
StylOS: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness
$p\textrm{-less}$ Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall
Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor–EM Method
AgentFold: Long-Horizon Web Agents with Proactive Context Folding
ContextIF: Enhancing Instruction-Following through Context Reward
Physics-informed learning under mixing: How physical knowledge speeds up learning
Empowering Multi-Robot Cooperation via Sequential World Models
A Revisit of Active Sequential Prediction-Powered Mean Estimation
Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers
A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models
Operator Learning with Domain Decomposition for Geometry Generalization in PDE Solving
BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers
RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
In-Context Algebra
Structural Inference: Interpreting Small Language Models with Susceptibilities
SpectraLLM: Uncovering the Ability of LLMs for Molecule Structure Elucidation from Multi-Spectra
Federated ADMM from Bayesian Duality
MIMIC-Bench: Exploring the User-Like Thinking and Mimicking Capabilities of Multimodal Large Language Models
Online Decision Making with Generative Action Sets
FineNib: A Query Synthesizer For Static Analysis of Security Vulnerabilities
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
3DCS: Datasets and Benchmark for Evaluating Conformational Sensitivity in Molecular Representations
Safe Exploration via Policy Priors
JULI: Jailbreak Large Language Models by Self-Introspection
Sublinear Time Quantum Algorithm for Attention Approximation
MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains
Learning Recursive Multi-Scale Representations for Irregular Multivariate Time Series Forecasting
InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
T-TAMER: Provably Taming Trade-offs in ML Serving
Concepts' Information Bottleneck Models
On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
AsyncBEV: Cross-modal flow alignment in Asynchronous 3D Object Detection
Near-Optimal Online Deployment and Routing for Streaming LLMs
MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling
WARC-Bench: Web Archive based Benchmark for GUI Subtask Executions
STORM: Synergistic Cross-Scale Spatio-Temporal Modeling for Weather Forecasting
AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning
Faster Parameter-Free Regret Matching Algorithms
Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling
Towards a Universally Transferable Acceleration Method for Density Functional Theory
CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
Predicting LLM Reasoning Performance with Small Proxy Model
Unified Privacy Guarantees for Decentralized Learning via Matrix Factorization
Output Supervision Can Obfuscate the Chain of Thought
Characteristic Root Analysis and Regularization for Linear Time Series Forecasting
SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
From Five Dimensions to Many: Large Language Models as Precise and Interpretable Psychological Profilers
Unveiling Super Experts in Mixture-of-Experts Large Language Models
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
TileLang: Bridge Programmability and Performance in Modern Neural Kernels
Gogo: Group-wise granularity-ordered codec for stable and efficient speech generation
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match
Fed-Duet: Dual Expert-Orchestrated Framework for Continual Federated Vision-Language Learning
A Study on PAVE Specification for Learnware
On the Alignment Between Supervised and Self-Supervised Contrastive Learning
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
ULD-Net: Enabling Ultra-Low-Degree Fully Polynomial Networks for Homomorphically Encrypted Inference
HBO: Hierarchical Balancing Optimization for Fine-Tuning Large Language Models
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Contrastive Predictive Coding Done Right for Mutual Information Estimation
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs
ChemEval: A Multi-level and Fine-grained Chemical Capability Evaluation for Large Language Models
Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks
CardioComposer: Leveraging Differentiable Geometry for Compositional Control of Anatomical Diffusion Models
Enabling Your Forensic Detector Know How Well It Performs on Distorted Samples
VLM-Guided Adaptive Negative Prompting for Creative Generation
Nef-Net+: Adapting Electrocardio Panorama in the wild
Mode-conditioning unlocks superior test-time compute scaling
Inverse Reinforcement Learning with Dynamic Reward Scaling for LLM Alignment
Masked Skill Token Training for Hierarchical Off-Dynamics Transfer
DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains
Frequency-Balanced Retinal Representation Learning with Mutual Information Regularization
VidBridge-R1: Bridging QA and Captioning for RL-based Video Understanding Models with Intermediate Proxy Tasks
What Generative Search Engines Like and How to Optimize Web Content Cooperatively
Unified and Efficient Multi-view Clustering from Probabilistic Perspective
EUBRL: Epistemic Uncertainty Directed Bayesian Reinforcement Learning
SR-Scientist: Scientific Equation Discovery With Agentic AI
Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods
Knowledge Exchange with Confidence: Cost-Effective LLM Integration for Reliable and Efficient Visual Question Answering
DiSRouter: Distributed Self-Routing for LLM Selections
Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding
SafeDialBench: A Fine-Grained Safety Evaluation Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
Masked Generative Policy for Robotic Control
Amortising Inference and Meta-Learning Priors in Neural Networks
Curriculum Reinforcement Learning from Easy to Hard Tasks Improves LLM Reasoning
Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
Densemarks: Learning Canonical Embeddings for Human Heads Images via Point Tracks
Point-Focused Attention Meets Context-Scan State Space: Robust Biological Visual Perception for Point Cloud Representation
SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training
Formalising Human-in-the-Loop: Computational Reductions, Failure Modes, and Legal-Moral Responsibility
PILOT-Bench: Probabilistic Interaction for LLM Operations in Tool-driven Scenarios
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds
Fast Convergence of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks
WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
Structured Flow Autoencoders: Learning Structured Probabilistic Representations with Flow Matching
Pretrain–Test Task Alignment Governs Generalization in In-Context Learning
SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation
GRL-SNAM: Geometric Reinforcement Learning with Differential Hamiltonians for Navigation and Mapping in Unknown Environments
HAMLET: Switch Your Vision-Language-Action Model into a History-Aware Policy
A Probabilistic Hard Concept Bottleneck for Steerable Generative Models
One Skill, Many Websites: Learning Generalizable Skills Through Polymorphic Abstraction
Reducing Class-Wise Performance Disparity via Margin Regularization
Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
RiskPO: Risk-based Policy Optimization with Verifiable Reward for LLM Post-Training
Discovering Diverse Behaviors via Temporal Contrastive Learning
Scalable and Adaptive Trust-Region Learning via Projection Convex Hull
MotionWeaver: Holistic 4D-Anchored Framework for Multi-Humanoid Image Animation
Multi-Synaptic Cooperation: A Bio-Inspired Framework for Robust and Scalable Continual Learning
3DSMT: A Hybrid Spiking Mamba-Transformer for Point Cloud Analysis
Universal Multi-Domain Translation via Diffusion Routers
Metric $k$-clustering using only Weak Comparison Oracles
CL-DPS: A Contrastive Learning Approach to Blind Nonlinear Inverse Problem Solving via Diffusion Posterior Sampling
Test-Time Poisoned Sample Detection by Exploiting Shallow Malicious Matching in Backdoored CLIP
Long-range Modeling and Processing of Multimodal Event Sequences
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
RFS: Reinforcement learning with Residual flow steering for dexterous manipulation
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
Locality-Attending Vision Transformer
TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
Curse of Slicing: Why Sliced Mutual Information is a Deceptive Measure of Statistical Dependence
Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
Non-Autoregressive Generation for Agentic Multi-Turn Interaction
CoT Vectors: Transferring and Probing the Reasoning Mechanisms of LLMs
wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models
A Block Coordinate Descent Method for Nonsmooth Composite Optimization under Orthogonality Constraints
UrbanGS: Efficient and Scalable Architecture for Geometrically Accurate Large-Scene Reconstruction
The Lattice Geometry of Neural Network Quantization: A Short Equivalence Proof of GPTQ and Babai's algorithm
Learning on a Razor’s Edge: Identifiability and Singularity of Polynomial Neural Networks
TP-Spikformer: Token Pruned Spiking Transformer
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Rethinking Pareto Frontier: On the Optimal Trade-offs in Fair Classification
Aurelius: Relation Aware Text-to-Audio Generation At Scale
FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation
Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation
Soft Quality-Diversity Optimization
SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
Membrane Potential Perturbation Dynamic Is Total Variation
A General Spatio-Temporal Backbone with Scalable Contextual Pattern Bank for Urban Continual Forecasting
Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams
Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
DCFold: Efficient Protein Structure Generation with Single Forward Pass
ArtUV: Artist-style UV Unwrapping
BioMD: All-atom Generative Model for Biomolecular Dynamics Simulation
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
Light of Normals: Unified Feature Representation for Universal Photometric Stereo
Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
UniF$^2$ace: A $\underline{Uni}$fied $\underline{F}$ine-grained $\underline{Face}$ Understanding and Generation Model
PI-Light: Physics-Inspired Diffusion for Full-Image Relighting
Rethinking Residual Errors in Compensation-based LLM Quantization
FLARE: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
End-to-end Listen, Look, Speak and Act
Smarter Not Harder: Generative Process Evaluation with Intrinsic-Signal Driving and Ability-Adaptive Reward Shaping
Efficient Audio-Visual Speech Separation with Discrete Lip Semantics and Multi-Scale Global-Local Attention
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
VaseVQA-3D: Benchmarking 3D VLMs on Ancient Greek Pottery
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Detect, Decide, Unlearn: A Transfer-Aware Framework for Continual Learning
Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity
A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems
Human-Object Interaction via Automatically Designed VLM-Guided Motion Policy
Plan then Act: Bi-level CAD Command Sequence Generation
PredNext: Explicit Cross-View Temporal Prediction for Unsupervised Learning in Spiking Neural Networks
Untraceable DeepFakes via Traceable Fingerprint Elimination
A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits
Unleashing LLMs in Bayesian Optimization: Preference-Guided Framework for Scientific Discovery
LLMs Must Think Thrice to Solve Executable Counterfactuals
IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs
Condition Matters in Full-head 3D GANs
Empowering LLM Tool Invocation with Tool-call Reward Model
Decoding Inner Speech with an End-to-End Brain-to-Text Neural Interface
EMBridge: Enhancing Gesture Generalization from EMG Signals Through Cross-modal Representation Learning
Learning to Lie: Reinforcement Learning Attacks Damage Human-AI Teams and Teams of LLMs
Keep the Best, Forget the Rest: Reliable Alignment with Order-Aware Preference Optimization
A Structured, Tagged, and Localized Visual Question Answering Dataset with Full Sentence Answers and Scene Graphs for Chest X-ray Images
The Curious Case of In-Training Compression of State Space Models
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
Align-SAM: Seeking Flatter Minima for Better Cross-Subset Alignment
Consistency-Driven Calibration and Matching for Few-Shot Class Incremental Learning
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Much Ado About Noising: Do Flow Models Actually Make Better Control Policies?
Choices Speak Louder than Questions
Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection
Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits
PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression
Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
Towards Reliable Detection of Empty Space: Conditional Marked Point Processes for Object Detection
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
Toward Efficient Exploration by Large Language Model Agents
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
CoDA: Agentic Systems for Collaborative Data Visualization
Neural Collapse in Multi-Task Learning
Multi-Resolution Score-Based Variational Graphical Diffusion for Causal Inference on Latent Systems
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
A One-shot Framework for Directed Evolution of Antibodies
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference
MergeTune: Continued Fine-Tuning of Vision-Language Models
Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
CubeBench: Diagnosing Interactive, Long-Horizon Physical Intelligence under Partial Observations
TIMESLIVER: Symbolic-Linear Decomposition for Explainable Time Series Classification
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking
What Matters for Bioacoustic Encoding
Ice Cream Doesn’t Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
Self-Guided Low Light Object Detection Framework
PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics
Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper
Search Arena: Analyzing Search-Augmented LLMs
Drugging the Undruggable: Benchmarking and Modeling Fragment-Based Screening
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing
Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts
ELLMob: Event-Driven Human Mobility Generation with Self-Aligned LLM Framework
Learning Massively Multitask World Models for Continuous Control
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Learning Boltzmann Generators via Constrained Mass Transport
Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training
Sem-MoE: Semantic-aware Model-Data Collaborative Scheduling for Efficient MoE Inference
Unsupervised Invariant Risk Minimization
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
Implicit 4D Gaussian Splatting for Fast Motion with Large Inter-Frame Displacements
DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving
Enhancing Visual Token Representations for Video Large Language Models via Training-free Spatial-Temporal Pooling and Gridding
ChronoEdit: Towards Temporal Reasoning for In-Context Image Editing and World Simulation
OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
Multimodal Dataset Distillation via Phased Teacher Models
In-The-Flow Agentic System Optimization for Effective Planning and Tool Use
Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
Unlocking Long-Horizon Agentic Search with Large-Scale End-to-End RL
Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM
From Collapse to Control: Understanding and Extending Context Length in Emerging Hybrid Models via Universal Position Interpolation
Spike-based Digital Brain: a novel fundamental model for brain activity analysis
Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning
Exploratory Causal Inference in SAEnce
Biologically Plausible Learning via Bidirectional Spike-Based Distillation
TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
GeoGramBench: Benchmarking the Geometric Program Reasoning in Modern LLMs
One for Two: A Unified Framework for Imbalanced Graph Classification via Dynamic Balanced Prototype
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching
Experience-based Knowledge Correction for Robust Planning in Minecraft
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
Compositional amortized inference for large-scale hierarchical Bayesian models
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
Mango-GS: Enhancing Spatio-Temporal Consistency in Dynamic Scenes Reconstruction using Multi-Frame Node-Guided 4D Gaussian Splatting
Forge: Compiling a Unified Abstraction into Scalable Kernels for Linear Attention
Enough is as good as a feast: A Comprehensive Analysis of How Reinforcement Learning Mitigates Task Conflicts in LLMs
Stochastic Optimal Control for Continuous-Time fMRI Representation Learning
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
Variation-aware Flexible 3D Gaussian Editing
From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
Local Geometry Attention for Time Series Forecasting under Realistic Corruptions
VERIFY: A Novel Multi-Domain Dataset Grounding LTL in Contextual Natural Language via Provable Intermediate Logic
Understanding the Dynamics of Forgetting and Generalization in Continual Learning via the Neural Tangent Kernel
Fine-Grained Class-Conditional Distribution Balancing for Debiased Learning
Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control
MMPD: Diverse Time Series Forecasting via Multi-Mode Patch Diffusion Loss
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors
Label-Free Mitigation of Spurious Correlations in VLMs Using Sparse Autoencoders
Fractional-Order Spiking Neural Network
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
Intrinsic Lorentz Neural Network
Active Learning for Decision Trees with Provable Guarantees
Unified 3D Scene Understanding Through Physical World Modeling
Arbitrary-Order Block SignSGD for Memory-Efficient LLM Fine-Tuning
GneissWeb: Preparing High Quality Data for LLMs at Scale
Revisual-R1: Advancing Multimodal Reasoning From Optimized Cold Start to Staged Reinforcement Learning
Token-Efficient Long-Term Interest Sketching and Internalized Reasoning for LLM-based Recommendation
HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
CompMarkGS: Robust Watermarking for Compressed 3D Gaussian Splatting
PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection
Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness
Fair in Mind, Fair in Action? A Synchronous Benchmark for Understanding and Generation in UMLLMs
Incomplete Data, Complete Dynamics: A Diffusion Approach
Learning to Grasp Anything By Playing with Random Toys
Steering Autoregressive Music Generation with Recursive Feature Machines
Many Eyes, One Mind: Temporal Multi-Perspective and Progressive Distillation for Spiking Neural Networks
ST-HHOL: Spatio-Temporal Hierarchical Hypergraph Online Learning for Crime Prediction
BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
Universal Value-Function Uncertainties
Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception
Explore-on-Graph: Incentivizing Autonomous Exploration of Large Language Models on Knowledge Graphs with Path-refined Reward Modeling
UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction
Seesaw: Accelerating Training by Balancing Batch Size and Learning Rate Scheduling
Divide and Abstract: Autoformalization via Decomposition and Abstraction Learning
Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
Let's Explore Step by Step: Generating Provable Formal Statements with Deductive Exploration
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Riemannian Variational Flow Matching for Material and Protein Design
Look-ahead Reasoning with a Learned Model in Imperfect Information Games
Improving LLM Alignment with References
Dynamic Early Exit in Reasoning Models
Incentives in Federated Learning with Heterogeneous Agents
PepBenchmark: A Standardized Benchmark for Peptide Machine Learning
Scalable In-Context Q-Learning
ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
Learning Unified Representation of 3D Gaussian Splatting
CO3: CONTRASTING CONCEPTS COMPOSE BETTER
CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints
Flow Along the $K$-Amplitude for Generative Modeling
Robust Optimization for Mitigating Reward Hacking with Correlated Proxies
Discovering Hierarchical Software Engineering Agents via Bandit Optimization
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
Beyond Softmax and Entropy: $f$-Regularized Policy Gradients with Coupled Parametrizations
BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
DaVinci: Reinforcing Visual-Structural Syntax in MLLMs for Generalized Scientific Diagram Parsing
ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
WholeBodyVLA: Towards Unified Latent VLA for Whole-body Loco-manipulation Control
Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution
AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
LogiStory: A Logic-Aware Framework for Multi-Image Story Visualization
COMI: Coarse-to-fine Context Compression via Marginal Information Gain
Grounded Test-Time Adaptation for LLM Agents
NIMO: a Nonlinear Interpretable MOdel
Learning an Image Editing Model without Image Editing Pairs
On the Wasserstein Geodesic Principal Component Analysis of probability measures
Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
OpenFly: A COMPREHENSIVE PLATFORM FOR AERIAL VISION-LANGUAGE NAVIGATION
CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
A Graph Meta-Network for Learning on Kolmogorov–Arnold Networks
Learn More with Less: Uncertainty Consistency Guided Query Selection for RLVR
SeRI: Gradient-Free Sensitive Region Identification in Decision-Based Black-Box Attacks
DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
Train-before-Test Harmonizes Language Model Rankings
Multi-Agent Guided Policy Optimization
DiscoX: Benchmarking Discourse-Level Translation in Expert Domains
Positional Preservation Embedding for Multimodal Large Language Models
Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits
From Observations to Events: Event-Aware World Models for Reinforcement Learning
Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
ConRep4CO: Contrastive Representation Learning of Combinatorial Optimization Instances across Types
CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval
EEPO: Exploration-Enhanced Policy Optimization via Sample-Then-Forget
DepthLM: Metric Depth from Vision Language Models
The Serial Scaling Hypothesis
Horizon Imagination: Efficient On-Policy Training in Diffusion World Models
Controllable Video Generation with Provable Disentanglement
TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
Adaptive Mamba Neural Operators
Beyond Structure: Invariant Crystal Property Prediction with Pseudo-Particle Ray Diffraction
HOTA: Hamiltonian framework for Optimal Transport Advection
Realtime Video Frame Interpolation using One-Step Diffusion Sampling
ReFocusEraser: Refocusing for Small Object Removal with Robust Context-Shadow Repair
Incentive-Aligned LLM Summaries
OrchestrationBench: LLM-Driven Agentic Planning and Tool Use in Multi-Domain Scenarios
Transfer Learning in Infinite Width Feature Learning Networks
Characterizing and Mitigating Reasoning Drift in Large Language Models
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks
ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
Einstein Fields: A Neural Perspective To Computational General Relativity
COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences
Consis-GCPO: Consistency-Preserving Group Causal Preference Optimization for Vision Customization
Alignment-Weighted DPO: A principled reasoning approach to improve alignment
RobotArena $\infty$: Unlimited Robot Benchmarking via Real-to-Sim Translation
Personalized Collaborative Learning with Affinity-Based Variance Reduction
A Resolution-Agnostic Geometric Transformer for Chromosome Modeling Using Inertial Frame
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
Time-to-Move: Training-Free Motion-Controlled Video Generation via Dual-Clock Denoising
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
LLM2Fx-Tools: Tool Calling for Music Post-Production
PetaGAIL++: Utility Optimized Private Trajectory Generation with Imitation Learning
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
Process-Verified Reinforcement Learning for Theorem Proving via Lean
Speculative Actions: A Lossless Framework for Faster AI Agents
SONATA: Synergistic Coreset Informed Adaptive Temporal Tensor Factorization
From Neural Networks to Logical Theories: The Correspondence between Fibring Modal Logics and Fibring Neural Networks
On the Sample Complexity of GNNs
ETGS: Explicit Thermodynamics Gaussian Splatting for Dynamic Thermal Reconstruction
Tina: Tiny Reasoning Models via LoRA
INSTANT: Compressing Gradients and Activations for Resource-Efficient Training
REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering
Towards One-step Causal Video Generation via Adversarial Self-Distillation
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
CoMind: Towards Community-Driven Agents for Machine Learning Engineering
ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
Spectral-guided Physical Dynamics Distillation
Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
FastAvatar: Towards Unified and Fast 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
Policy Contrastive Decoding for Robotic Foundation Models
Neural Sum-of-Squares: Certifying the Nonnegativity of Polynomials with Transformers
BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation
Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
Influence-Preserving Proxies for Gradient-Based Data Selection in LLM FineTuning
RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation
Tighter Performance Theory of FedExProx
station2radar: query‑conditioned gaussian splatting for precipitation field
Sample-efficient and Scalable Exploration in Continuous-Time RL
On Natural Ways to Generate and Their Provable Power
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
Demystifying Robot Diffusion Policies: Action Memorization and a Simple Lookup Table Alternative
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Breaking and Fixing Defenses Against Control Flow Hijacking in Multi-Agent Systems
Distilled Pretraining: A modern lens of Data, In-Context Learning and Test-Time Scaling
ORION: Decoupling and Alignment for Unified Autoregressive Understanding and Generation
How Base Frequency Shapes RoPE: An Analytical Study of Frequency-Band Formation
Harmonized Cone for Feasible and Non-conflict Directions in Training Physics-Informed Neural Networks
An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
Subquadratic Algorithms and Hardness for Attention with Any Temperature
Mesh Splatting for End-to-end Multiview Surface Reconstruction
Composite Optimization with Error Feedback: the Dual Averaging Approach
Don’t Pass@$k$: A Bayesian Framework for Large Language Model Evaluation
CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation
AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning
SHE-LoRA: Selective Homomorphic Encryption for Federated Tuning with Heterogeneous LoRA
GTM: A General Time-series Model for Enhanced Representation Learning of Time-Series data
Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
TAMMs: Change Understanding and Forecasting in Satellite Image Time Series with a Temporal-Aware Multimodal Model
Towards Revealing the Effect of Batch Size Scheduling on Pre-training
SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection
Attention, Please! Revisiting Attentive Probing Through the Lens of Efficiency
Policy Newton Algorithm in Reproducing Kernel Hilbert Space
Lifelong Embodied Navigation Learning
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
Reasoning on Time-Series for Financial Technical Analysis
A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning across Broad Atlases and Disorders
The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment
ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Some Neural Networks Inherently Preserve Subspace Clustering Structure
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool
Unified Vision-Language-Action Model
Enhancing Language Model Reasoning with Structured Multi-Level Modeling
Beyond Visual Reconstruction Quality: Object Perception-aware 3D Gaussian Splatting for Autonomous Driving
Disentangling Knowledge Representations for Large Language Model Editing
DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
Long-Context Generalization with Sparse Attention
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction
Weight Decay may matter more than µP for Learning Rate Transfer in Practice
Falcon: Fast Proximal Linearization of Normalized Cuts for Unsupervised Image Segmentation
Implicit Models: Expressive Power Scales with Test-Time Compute
SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
Seek-CAD: A Self-refined Generative Modeling for 3D Parametric CAD Using Local Inference via DeepSeek
Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
On the Expressive Power of GNNs for Boolean Satisfiability
H$^3$DP: Triply‑Hierarchical Diffusion Policy for Visuomotor Learning
Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics
Enhancing Vision Transformers for Object Detection via Context-Aware Token Selection and Packing
On Universality of Deep Equivariant Networks
Oversmoothing, "Oversquashing", Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning
Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Does Higher Interpretability Imply Better Utility? A Pairwise Analysis on Sparse Autoencoders
SPACeR: Self-Play Anchoring with Centralized Reference Models
PCB-Bench: Benchmarking LLMs for Printed Circuit Board Placement and Routing
WideSearch: Benchmarking Agentic Broad Info-Seeking
Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents
IAGA: Identity-Aware Gaussian Approximation for Efficient 3D Molecular Generation
Bi-Criteria Metric Distortion
ATLAS: Alibaba Dataset and Benchmark for Learning-Augmented Scheduling
Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs with Application to Glucose Prediction
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
Mapping Overlaps in Benchmarks through Perplexity in the Wild
Behavioral Embeddings of Programs: A Quasi-Dynamic Approach for Optimization Prediction
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing
Minimax Optimal Adversarial Reinforcement Learning
Adaptive Regularization for Large-Scale Sparse Feature Embedding Models
EgoTwin: Dreaming Body and View in First Person
Unlocking the Power of Co-Occurrence in CLIP: A DualPrompt-Driven Method for Training-Free Zero-Shot Multi-Label Classification
Can Vision–Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective
PMI: Flow-Based Inversion Correction via Proximal Operator
OPRIDE: Efficient Offline Preference-based Reinforcement Learning via In-Dataset Exploration
SPIKE-RL: Video-LLMs meet Bayesian Surprise
DTO-KD: Dynamic Trade-off Optimization for Effective Knowledge Distillation
Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later
QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models
Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations
MENLO: From Preferences to Proficiency – Evaluating and Modeling Native-like Quality Across 47 Languages
FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models
Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability
Characterizing Human Semantic Navigation in Concept Production as Trajectories in Embedding Space
Shuffling the Data, Extrapolating the Step: Sharper Bias In Constant Step-Size SGD
Generalizable End-to-End Tool-Use RL with Synthetic CodeGym
Benchmarking Open-ended Segmentation
Same Content, Different Representations: A Controlled Study for Table QA
Bootstrapping MLLM for Weakly‑Supervised Class‑Agnostic Object Counting
MixLinear: Extreme Low Resource Multivariate Time Series Forecasting with $0.1K$ Parameters
MARS - A Foundational Map Auto-Regressor
Contextual Causal Bayesian Optimisation
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images without Any 3D Knowledge
Rethinking the Gold Standard: Why Discrete Curvature Fails to Fully Capture Over-squashing in GNNs?
Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction
PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference
Unified Multi-Modal Interactive and Reactive 3D Motion Generation via Rectified Flow
Accelerating Inference for Multilayer Neural Networks with Quantum Computers
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
Refine Drugs, Don’t Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation
H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
TS-Attn: Temporal-wise Separable Attention for Multi-Event Video Generation
The Matthew Effect of AI Programming Assistants: A Hidden Bias in Software Evolution
ProSafePrune: Projected Safety Pruning for Mitigating Over-Refusal in LLMs
DeepTRACE: Auditing Deep Research AI Systems for Tracking Reliability Across Citations and Evidence
Retrospective Sparse Attention for Efficient Long-Context Generation
FullPart: Generating each 3D Part at Full Resolution
LaVCa: LLM-assisted Visual Cortex Captioning
LORE: Jointly Learning The Intrinsic Dimensionality and Relative Similarity Structure from Ordinal Data
Virtual Community: An Open World for Humans, Robots, and Society
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
Compute-Optimal Quantization-Aware Training
Unifying Stable Optimization and Reference Regularization in RLHF
It's All Just Vectorization: einx, a Universal Notation for Tensor Operations
Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
From Evaluation to Defense: Advancing Safety in Video Large Language Models
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
EventFlash: Towards Efficient MLLMs for Event-Based Vision
ARTDECO: Toward High-Fidelity On-the-Fly Reconstruction with Hierarchical Gaussian Structure and Feed-Forward Guidance
Polynomial, trigonometric, and tropical activations
No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
Tricks or Traps? A Deep Dive into RL for LLM Reasoning
MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection
GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization
Statistical Advantage of Softmax Attention: Insights from Single-Location Regression
Improving 2D Diffusion Models for 3D Medical Imaging with Inter‑Slice Consistent Stochasticity
Exploring State-Space Models for Data-Specific Neural Representations
Scaling Linear Attention with Sparse State Expansion
Randomization Boosts KV Caching, Learning Balances Query Load: A Joint Perspective
Following the Navigation: Enhancing Small Language Models Contextual Reasoning with LLM Guidance
Dichotomous Diffusion Policy Optimization
A Law of Data Reconstruction for Random Features (And Beyond)
The Intricate Dance of Prompt Complexity, Quality, Diversity and Consistency in T2I Models
ZeroGR: A Generalizable and Scalable Framework for Zero-Shot Generative Retrieval
Tackling the XAI Disagreement Problem with Adaptive Feature Grouping
A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
Lost in Tokenization: Context as the Key to Unlocking Biomolecular Understanding in Scientific LLMs
Salient Object Ranking via Cyclical Perception-Viewing Interaction Modeling
La-Proteina: Atomistic Protein Generation via Partially Latent Flow Matching
Beyond Uniformity: Regularizing Implicit Neural Representations through a Lipschitz Lens
CLIP-FMoE: Scalable CLIP via Fused Mixture-of-Experts with Enforced Specialization
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insights
Closing the Modality Gap Aligns Group-Wise Semantics
Sample Complexity and Representation Ability of Test-time Scaling Paradigms
Generative Modeling from Black-Box Corruptions via Self-Consistent Stochastic Interpolants
TS-DDAE: A novel Temporal-Spectral Denoising Diffusion AutoEncoder for Wireless Signal Recognition Model Pre-training
Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
Knowing When to Quit: Probabilistic Early Exits for Speech Separation Networks
Online Conformal Prediction with Adversarial Feedback via Regret Minimization
Interleave-VLA: Enhancing Robot Manipulation with Image-Text Interleaved Instructions
MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading
Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data
CORDS - Continuous Representations of Discrete Structures
When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger
A Scalable Constant-Factor Approximation Algorithm for $W_p$ Optimal Transport
End-to-End Probabilistic Framework for Learning with Hard Constraints
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Outrageously Large Context Windows via RACE Attention -- A Family of Non-Linear Attention that can be calculated in Strictly Linear-Time
Pulp Motion: Framing-aware multimodal camera and human motion generation
Two (narrow) heads are better than (an arbitrarily wide) one
Music Flamingo: Scaling Music Understanding in Audio Language Models
The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer
How does the optimizer implicitly bias the model merging loss landscape?
Does “Do Differentiable Simulators Give Better Policy Gradients?” Give Better Policy Gradients?
ProPerSim: Developing Proactive and Personalized AI Assistants through User-Assistant Simulation
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
TokMem: Tokenized Procedural Memory for Large Language Models
Generative Value Conflicts Reveal LLM Priorities
Group Verification-based Policy Optimization for Interactive Coding Agents
CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
Interaction-aware Representation Modeling With Co-Occurrence Consistency for Egocentric Hand-Object Parsing
A tale of two tails: Preferred and anti-preferred natural stimuli in visual cortex
Beyond Aggregation: Guiding Clients in Heterogeneous Federated Learning
Beyond Entity Correlations: Disentangling Event Causal Puzzles in Temporal Knowledge Graphs
LEGATO: Large-scale End-to-end Generalizable Approach to Typeset OMR
Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
Pisces: Cryptography-based Private Retrieval-Augmented Generation with Dual-Path Retrieval
RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo
Leveraging Explanation to Improve Generalization of Meta Reinforcement Learning
From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
Decomposing Extrapolative Problem Solving: Spatial Transfer and Length Scaling with Map Worlds
How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
Locally Subspace-Informed Neural Operators for Efficient Multiscale PDE Solving
MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs
Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
Steering the Herd: A Framework for LLM-based Control of Social Learning
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
Math Blind: Failures in Diagram Understanding Undermine Reasoning in MLLMs
M4PQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
Distributionally Robust Classification for Multi-source Unsupervised Domain Adaptation
Co-occurring Associated REtained concepts in Diffusion Unlearning
Trajectory-aware Shifted State Space Models for Online Video Super-Resolution
CTBench: Cryptocurrency Time Series Generation Benchmark
LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation
Distilling to Hybrid Attention Models via KL-Guided Layer Selection
SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization
Frequency-Domain Better than Time-Domain for Causal Structure Recovery in Dynamical Systems on Networks
Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding
Model-Guided Microstimulation Steers Primate Visual Behavior
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Combinatorial Rising Bandits
Multihead Mixture of Experts for Classification of Gigapixel Pathology Images
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Gradient Descent Dynamics of Rank-One Matrix Denoising
Sci2Pol: Evaluating and Fine-tuning LLMs on Scientific-to-Policy Brief Generation
M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding
Flatness Guided Test-Time Adaptation for Vision-Language Models
Purrception: Variational Flow Matching for Vector-Quantized Image Generation
SysMoBench: Evaluating AI on Formally Specifying Complex Real-World Systems
RefineStat: Efficient Exploration for Probabilistic Program Synthesis
DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization
EntropyLong: Effective Long-Context Training via Predictive Uncertainty
DeNOTS: Stable Deep Neural ODEs for Time Series
ReLaSH: Reconstructing Joint Latent Spaces for Efficient Generation of Synthetic Hypergraphs with Hyperlink Attributes
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
Text2Grad: Reinforcement Learning from Natural Language Feedback
Parameter-Efficient Reinforcement Learning using Prefix Optimization
CHAMMI-75: pre-training multi-channel models with heterogeneous microscopy images
Rethinking Model Calibration through Spectral Entropy Regularization in Medical Image Segmentation
DMAP: A Distribution Map for Text
DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning
Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
A Fictional Q&A Dataset for Studying Memorization and Knowledge Acquisition
Detecting Temporal Misalignment Attacks in Multimodal Fusion for Autonomous Driving
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Guided Policy Optimization under Partial Observability
A.I.R.: Enabling Adaptive, Iterative, and Reasoning-based Frame Selection For Video Question Answering
Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling
From Embedding to Control: Representations for Stochastic Multi-Object Systems
StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams
Multi-Scale Hypergraph Meets LLMs: Aligning Large Language Models for Time Series Analysis
Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization
EVEREST: A Transformer for Probabilistic Rare-Event Anomaly Detection with Evidential and Tail-Aware Uncertainty
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM
Convex Efficient Coding
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
HARP: Hallucination Detection via Reasoning Subspace Projection
d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching
MobileKGQA: On-Device KGQA System on Dynamic Mobile Environments
Derandomized Online-to-Non-convex Conversion for Stochastic Weakly Convex Optimization
DISCO: Diversifying Sample Condensation for Accelerating Model Evaluation
Concept-based Adversarial Attack: a Probabilistic Perspective
Samples Are Not Equal: A Sample Selection Approach for Deep Clustering
ArtVIP: Articulated Digital Assets of Visual Realism, Modular Interaction, and Physical Fidelity for Robot Learning
VeriTrail: Closed-Domain Hallucination Detection with Traceability
From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting
Critique-RL: Training Critiquing Language Models Through Two-Stage RL for Improved Discrimination and Constructive Feedback
Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning
NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference
ATGen: Adversarial Reinforcement Learning for Test Case Generation
FSPO: Few-Shot Optimization of Synthetic Preferences Effectively Personalizes to Real Users
Self-Improving Loops for Visual Robotic Planning
TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
Information Shapes Koopman Representation
Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
Faster Vision Transformers with Adaptive Patches
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
KeepLoRA: Continual Learning with Residual Gradient Adaptation
Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Reasoning Large Language Models
CTRL&SHIFT: High-quality Geometry-Aware Object Manipulation in Visual Generation
Market Games for Generative Models: Equilibria, Welfare, and Strategic Entry
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs
Rejuvenating Cross-Entropy Loss in Knowledge Distillation for Recommender Systems
Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
Conditional Independent Component Analysis For Estimating Causal Structure with Latent Variables
ICPO: Provable and Practical In-Context Policy Optimization for Test-Time Scaling
BioTamperNet: Affinity-Guided State-Space Model Detecting Tampered Biomedical Images
NeuralOS: Towards Simulating Operating Systems via Neural Generative Models
Cross-Timestep: 3D Diffusion Model with Trans-temporal Memory LSTM and Adaptive Priori Decoding Strategy for Medical Segmentation
SongEcho: Cover Song Generation via Instance-Adaptive Element-wise Linear Modulation
LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization
Wiki-R1: Incentivizing Multimodal Reasoning for Knowledge-based VQA via Data and Sampling Curriculum
OSIRIS: Bridging Analog Circuit Design and Machine Learning with Scalable Dataset Generation
Action-Free Offline-To-Online RL via Discretised State Policies
Accelerating Materials Design via LLM-Guided Evolutionary Search
Structured Reasoning for LLMs: A Unified Framework for Efficiency and Explainability
Autoregressive Visual Decoding from EEG Signals
DeRaDiff: Denoising Time Realignment of Diffusion Models
Quantitative Bounds for Length Generalization in Transformers
SCOPED: Score–Curvature Out-of-distribution Proximity Evaluator for Diffusion
UnLoc: Leveraging Depth Uncertainties for Floorplan Localization
LLMs as Rules Oracles: Exploring Real-World Multimodal Reasoning in Tabletop Strategy Game Environments
Can Language Models Discover Scaling Laws?
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval
Concept-TRAK: Understanding how diffusion models learn concepts through concept attribution
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Batch Pruning by Activation Stability
Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
LiTo: Surface Light Field Tokenization
Better Learning-Augmented Spanning Tree Algorithms via Metric Forest Completion
Agentic Reinforced Policy Optimization
Weak-to-Strong Generalization with Failure Trajectories
TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use
Test-Time Training Done Right
Learning Correlated Reward Models: Statistical Barriers and Opportunities
From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
Dynamical properties of dense associative memory
InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement
TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization
Grounding and Enhancing Informativeness and Utility in Dataset Distillation
Topology of Reasoning: Retrieved Cell Complex-Augmented Generation for Textual Graph Question Answering
Provable Guarantees for Automated Circuit Discovery in Mechanistic Interpretability
Unveiling Perceptual Artifacts: A Fine-Grained Benchmark for Interpretable AI-Generated Image Detection
FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models
Correlated Policy Optimization in Multi-Agent Subteams
Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation
Secure Outlier-Aware Large Language Model Inference
Fresh in memory: Training-order recency is linearly encoded in language model activations
Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
AutoQVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
SketchEvo: Leveraging Drawing Dynamics for Enhanced Image Synthesis
Learning Retrieval Models with Sparse Autoencoders
Bird's-eye-view Informed Reasoning Driver
The Tutor-Pupil Augmentation: Enhancing Learning and Interpretability via Input Corrections
On The Expressive Power of GNN Derivatives
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
Video Unlearning via Low-Rank Refusal Vector
Learning to Orchestrate Agents in Natural Language with the Conductor
Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution
Test-Time Accuracy-Cost Control in Neural Simulators via Recurrent-Depth
ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings
On the Spectral Differences Between NTK and CNTK and Their Implications for Point Cloud Recognition
Safe Continuous-time Multi-Agent Reinforcement Learning via Epigraph Form
AMPED: Adaptive Multi-objective Projection for balancing Exploration and skill Diversification
Training Deep Normalization-Free Spiking Neural Networks with Lateral Inhibition
DeMo: Decoupled Momentum Optimization
Non-Collaborative User Simulators for Tool Agents
Koopman-Assisted Trajectory Synthesis: A Data Augmentation Framework for Offline Imitation Learning
PoSh: Using Scene Graphs to Guide LLMs-as-a-Judge for Detailed Image Descriptions
Memorizing Long-tail Data Can Help Generalization Through Composition
Proper Velocity Neural Networks
Automated Stateful Specialization for Adaptive Agent Systems
ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection
Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods
Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
TaskCraft: Automated Generation of Agentic Tasks
Tucker-FNO: Tensor Tucker-Fourier Neural Operator and its Universal Approximation Theory
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning
Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling
LogiConBench: Benchmarking Logical Consistencies of LLMs
Mitigating Noise Shift in Denoising Generative Models with Noise Awareness Guidance
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
Arbitrary-Shaped Image Generation via Spherical Neural Field Diffusion
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
Wavelet Predictive Representations for Non-Stationary Reinforcement Learning
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
CodeGenGuard: A Robust Watermark for Code Generation Models
AC-Foley: Reference-Audio-Guided Video-to-Audio Synthesis with Acoustic Transfer
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
LLaVA-4D: Embedding SpatioTemporal Prompt into LMMs for 4D Scene Understanding
Automated Interpretability Metrics Do Not Distinguish Trained and Random Transformers
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Welfarist Formulations for Diverse Similarity Search
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
Feature compression is the root cause of adversarial fragility in neural networks
Frequency-aware Dynamic Gaussian Splatting
Adaptive Concept Discovery for Interpretable Few-Shot Text Classification
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Independence Test for Linear Non-Gaussian Data and Applications in Causal Discovery
S2R-HDR: A Large-Scale Rendered Dataset for HDR Fusion
Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
FakeXplain: AI-Generated Images Detection via Human-Aligned Grounded Reasoning
GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles
Multi-objective Large Language Model Alignment with Hierarchical Experts
AttTok: Marrying Attribute Tokens with Generative Pre-trained Vision-Language Models towards Medical Image Understanding
Non-Asymptotic Analysis of Efficiency in Conformalized Regression
Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals
KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction
Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models
GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies
PonderLM: Pretraining Language Models to Ponder in Continuous Space
Scaling Attention via Feature Sparsity
Streaming Drag-Oriented Interactive Video Manipulation: Drag Anything, Anytime!
CogMoE: Signal-Quality–Guided Multimodal MoE for Cognitive Load Prediction
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Diffusion Process
Language and Experience: A Computational Model of Social Learning in Complex Tasks
Understanding the Learning Phases in Self-Supervised Learning via Critical Periods
Summaries as Centroids for Interpretable and Scalable Text Clustering
Optimal transport unlocks end-to-end learning for single-molecule localization
VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision–Language Models
Decoupling Positional and Symbolic Attention in Transformers
DIFFSPARSE: Accelerating Diffusion Transformers with Learned Token Sparsity
On Optimal Hyperparameters for Differentially Private Deep Transfer Learning
Scheduling Your LLM Reinforcement Learning with Reasoning Trees
Theoretical Guarantees for Causal Discovery on Large Random Graphs
SpaCE-Eval: A Benchmark for Real-World Multi-Modal Reasoning
StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning
Light-X: Generative 4D Video Rendering with Camera and Illumination Control
Finite-Time Convergence Analysis of ODE-based Generative Models for Stochastic Interpolants
Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
Bayesian Test-Time Adaptation via Dirichlet feature projection and GMM-Driven Inference for Motor Imagery EEG Decoding
Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity
MMSearch-Plus: Benchmarking Provenance-Aware Search for Multimodal Browsing Agents
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
On Measuring Influence in Avoiding Undesired Future
Inference-Time Personalized Safety Control via Paired Difference-in-Means Intervention
DeepRAG: Thinking to Retrieve Step by Step for Large Language Models
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Video
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Solving the 2-norm k-hyperplane clustering problem via multi-norm formulations
FAME: Formal Abstract Minimal Explanation for neural networks
LLMs Get Lost In Multi-Turn Conversation
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
Compositional Generalization from Learned Skills via CoT Training: A Theoretical and Structural Analysis for Reasoning
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and Implications for Mechanistic Interpretability
LS-Merge: Merging Language Models in Latent Space
Interpretable 3D Neural Object Volumes for Robust Conceptual Reasoning
Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation
Membership Privacy Risks of Sharpness Aware Minimization
Aurora: Towards Universal Generative Multimodal Time Series Forecasting
CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
Cross-Modal Redundancy and the Geometry of Vision–Language Embeddings
Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency
Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
Special Unitary Parameterized Estimators of Rotation
Fairness-Aware Multi-view Evidential Learning with Adaptive Prior
Mastering Sparse CUDA Generation through Pretrained Models and Deep Reinforcement Learning
SPRIG: Improving Large Language Model Performance by System Prompt Optimization
PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
How Far Can Unsupervised RLVR Scale LLM Training?
On the Expressiveness of State Space Models via Temporal Logics
Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models
Curvature-Guided Task Synergy for Skeleton based Temporal Action Segmentation
SigmaDock: Untwisting Molecular Docking with Fragment-Based SE(3) Diffusion
A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
Shift-Tolerant Allocation via Black-Litterman Using Conditional Diffusion Estimates
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
Helmsman: Autonomous Synthesis of Federated Learning Systems via Multi-Agent Collaboration
OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning
Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
LLM-as-a-Prophet: Understanding Predictive Intelligence with Prophet Arena
SpatialHand: Generative Object Manipulation from 3D Perspective
Routing, Cascades, and User Choice for LLMs
Learning to Generate Unit Test via Adversarial Reinforcement Learning
From Parameters to Behaviors: Unsupervised Compression of the Policy Space
IDER: Idempotent Experience Replay for Reliable Continual Learning
Interference-Isolated Elastic Weight Consolidation and Knowledge Calibration for Incremental Object Detection
DistDF: Time-series Forecasting Needs Joint-distribution Wasserstein Alignment
BOLT: Decision‑Aligned Distillation and Budget-Aware Routing for Constrained Multimodal QA on Robots
Decision Aggregation under Quantal Response
Taming Polysemanticity in LLMs: Theory-Grounded Feature Recovery via Sparse Autoencoders
Unveiling the Mechanism of Continuous Representation Full-Waveform Inversion: A Wave Based Neural Tangent Kernel Framework
QPrompt-R1: Real-Time Reasoning for Domain-Generalized Semantic Segmentation via Group-Relative Query Alignment
PETRI: Learning Unified Cell Embeddings from Unpaired Modalities via Early-Fusion Joint Reconstruction
Fair Classification by Direct Intervention on Operating Characteristics
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge
VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip
DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining
PD$^{2}$GS: Part-Level Decoupling and Continuous Deformation of Articulated Objects via Gaussian Splatting
RedSage: A Cybersecurity Generalist LLM
DisTaC: Conditioning Task Vectors via Distillation for Robust Model Merging
Harnessing Temporal Databases for Systematic Evaluation of Factual Time-Sensitive Question-Answering in LLMs
CLUTCH: Contextualized Language model for Unlocking Text-Conditioned Hand motion modelling in the wild
Long-Context Attention Benchmark: From Kernel Efficiency to Distributed Context Parallelism
Knowledge Distillation as Decontamination? Revisiting the “Data Laundering” Concern
Frayed RoPE and Long Inputs: A Geometric Perspective
Graph-Theoretic Intrinsic Reward: Guiding RL with Effective Resistance
High-dimensional Mean-Field Games by Particle-based Flow Matching
Learning Admissible Heuristics for A*: Theory and Practice
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization
Credit-Budgeted ICPC-Style Coding: When LLM Agents Must Pay for Every Decision
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets
SpikeGen: Decoupled “Rods and Cones” Visual Representation Processing with Latent Generative Framework
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Unleashing Perception-Time Scaling to Multimodal Reasoning Models
Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regression
Evaluating Text Creativity across Diverse Domains: a Dataset and Large Language Model Evaluator
ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis
Prompt and Parameter Co-Optimization for Large Language Models
Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
A Unifying Framework for Causal Imitation Learning with Hidden Confounders
Reliable Weak-to-Strong Monitoring of LLM Agents
Accelerated co-design of robots through morphological pretraining
Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints
When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets
More Than What Was Chosen: LLM-based Explainable Recommendation Beyond Noisy User Preferences
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation
PCLR: Progressively Compressed LoRA for Multimodal Continual Instruction Tuning
GTool: Graph Enhanced Tool Planning with Large Language Model
ASTGI: Adaptive Spatio-Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting
Fair Graph Machine Learning under Adversarial Missingness Processes
Robust Adversarial Attacks Against Unknown Disturbance via Inverse Gradient Sample
Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment
Libra: Effective yet Efficient Load Balancing for Large-scale MoE Inference
InfoDet: A Dataset for Infographic Element Detection
CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
Sharp asymptotic theory for Q-learning with LD2Z learning rate and its generalization
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models
Enhancing Hallucination Detection through Noise Injection
UNITE: Universal kNowledge Integration from Task-specific Experts
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization
Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
Variational Deep Learning via Implicit Regularization
LoRA meets Riemannion: Muon Optimizer for Parametrization-independent Low-Rank Adapters
CreatiDesign: A Unified Multi-Conditional Diffusion Transformer for Creative Graphic Design
Content-Aware Mamba for Learned Image Compression
ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning
Retrieval-of-Thought: Efficient Reasoning via Reusing Thoughts
Temporal Generalization: A Reality Check
Controlling Repetition in Protein Language Models
Interactive Agents to Overcome Underspecificity in Software Engineering
Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
Label Smoothing Improves Machine Unlearning
Streaming Autoregressive Video Generation via Diagonal Distillation
Paradigm Shift of GNN Explainer from Label Space to Prototypical Representation Space
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
TROLL: Trust Regions Improve Reinforcement Learning for Large Language Models
Post-Training Quantization for Video Matting
Neodragon: Mobile Video Generation Using Diffusion Transformer
Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation
Decision-Theoretic Approaches for Improved Learning-Augmented Algorithms
Toward Principled Flexible Scaling for Self-Gated Neural Activation
To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration
Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
QVGen: Pushing the Limit of Quantized Video Generative Models
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving
Maximizing Incremental Information Entropy for Contrastive Learning
Post-hoc Probabilistic Vision-Language Models
Quasi-Equivariant Metanetworks
AtC: Aggregate-then-Calibrate for Human-centered Assessment
TabStruct: Measuring Structural Fidelity of Tabular Data
AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Proximal Diffusion Neural Sampler
Mechanistic Detection and Mitigation of Hallucination in Large Reasoning Models
Fewer Battles, More Gain: An Information-Efficient Framework for Arena-based LLM Evaluation
Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Computing Equilibrium beyond Unilateral Deviation
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
PARD: Accelerating LLM Inference with Low‑Cost PARallel Draft Model Adaptation
A Generalized Geometric Theoretical Framework of Centroid Discriminant Analysis for Linear Classification of Multi-dimensional Data
Entering the Era of Discrete Diffusion Models: A Benchmark for Schrödinger Bridges and Entropic Optimal Transport
Spectral Attention Steering for Prompt Highlighting
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
ConvRec-R1: Training LLM-based Conversational Recommender Systems with Reinforcement Learning
CerebraGloss: Instruction-Tuning a Large Vision-Language Model for Fine-Grained Clinical EEG Interpretation
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
How Transformers Learn Causal Structures In-Context: Explainable Mechanism Meets Theoretical Guarantee
ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting
MAVEN: A Mesh-Aware Volumetric Encoding Network for Simulating 3D Flexible Deformation
RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization
Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
Decoupling the Class Label and the Target Concept in Machine Unlearning
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
SwiftTS: A Swift Selection Framework for Time Series Pre-trained Models via Multi-task Meta-Learning
Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs
QKV Projections Require a Fraction of Their Memory
GarmentGPT: Compositional Garment Pattern Generation via Discrete Latent Tokenization
Quantized Visual Geometry Grounded Transformer
Beyond In-Domain Detection: SpikeScore for Cross-Domain Hallucination Detection
One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
High-dimensional Analysis of Synthetic Data Selection
Adaptive Gaussian Expansion for On-the-fly Category Discovery
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
HeurekaBench: A Benchmarking Framework for AI Co-scientist
Towards a Theoretical Understanding of In-context Learning: Stability and Non-I.I.D Generalisation
AnyUp: Universal Feature Upsampling
Neural Multi-Objective Combinatorial Optimization for Flexible Job Shop Scheduling Problems
Patching Gaps In LLM Reasoning With Interventional Training
WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs
MambaSL: Exploring Single-Layer Mamba for Time Series Classification
DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
Bayesian Influence Functions for Hessian-Free Data Attribution
Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
HYPER: A Foundation Model for Inductive Link Prediction with Knowledge Hypergraphs
Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning
ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
SliderQuant: Accurate Post-Training Quantization for LLMs
Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
Directed Semi-Simplicial Learning with Applications to Brain Activity Decoding
Foundation Visual Encoders Are Secretly Few-Shot Anomaly Detectors
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
QuoKA: Query-Oriented KV Selection for Efficient LLM Prefill
Confident Block Diagonal Structure-Aware Invariable Graph Completion for Incomplete Multi-view Clustering
Efficient Differentiable Contact Model with Long-range Influence
Task-Related Token Compression in Multimodal Large Language Models from an Explainability Perspective
Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration
Towards Interpretable Visual Decoding with Attention to Brain Representations
Dynamic Speculative Agent Planning
SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines
Neural Hamilton–Jacobi Characteristic Flows for Optimal Transport
GuardAlign: Robust Safety Alignment in Multimodal Large Language Models
SNAP-UQ: Self-supervised Next-Activation Prediction for Single-Pass Uncertainty in TinyML
Noise-Adaptive Diffusion Sampling for Inverse Problems Without Task-Specific Tuning
Never Saddle: Reparameterized Steepest Descent as Mirror Flow
Low-Pass Filtering Improves Behavioral Alignment of Vision Models
FARI: Robust One-Step Inversion for Watermarking in Diffusion Models
EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing
Detecting Data Contamination in LLMs via In-Context Learning
Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
Generative Human Geometry Distribution
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
SWINGARENA: Adversarial Programming Arena for Long-context GitHub Issue Solving
All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation
Advancing Universal Deep Learning for Electronic-Structure Hamiltonian Prediction of Materials
EA3D: Event-Augmented 3D Diffusion for Generalizable Novel View Synthesis
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation
HSG-12M: A Large-Scale Dataset of Spatial Multigraphs from the Energy Spectra of non-Hermitian Crystals
Transformers are Inherently Succinct
TableMaster: A Recipe to Advance Table Understanding with Language Models
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration
Scale-wise Distillation of Diffusion Models
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation
MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning
Beware Untrusted Simulators – Reward-Free Backdoor Attacks in Reinforcement Learning
Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards
Dual-Kernel Adapter: Expanding Spatial Horizons for Data-Constrained Medical Image Analysis
Escaping the Homophily Trap: A Threshold-free Graph Outlier Detection Framework via Clustering-guided Edge Reweighting
GoT-R1: Unleashing Reasoning Capability of Autoregressive Visual Generation with Reinforcement Learning
Aligning Deep Implicit Preferences by Learning to Reason Defensively
Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism
Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models
In-Context Compositional Q-Learning for Offline Reinforcement Learning
Energy-Based Transformers are Scalable Learners and Thinkers
Towards Better Optimization For Listwise Preference in Diffusion Models
How Text Quality Interventions Reshape Neural Scaling Laws for LLMs: Empirical Study
AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play?
Why We Need New Benchmarks for Local Intrinsic Dimension Estimation
Robustness of Probabilistic Models to Low-Quality Data: A Multi-Perspective Analysis
Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning
Knowledge Externalization: Reversible Unlearning and Modular Retrieval in Multimodal Large Language Models
Asynchronous Denoising Diffusion Models for Aligning Text-to-Image Generation
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions
Human Behavior Atlas: Benchmarking Unified Psychological And Social Behavior Understanding
Differentiable JPEG-based Input Perturbation for Knowledge Distillation Amplification via Conditional Mutual Information Maximization
Toward Safer Diffusion Language Models: Discovery and Mitigation of Priming Vulnerability
Optimal Robust Subsidy Policies for Irrational Agent in Principal-Agent MDPs
Zero-shot Forecasting by Simulation Alone
PTNET: A PROPOSAL-CENTRIC TRANSFORMER NETWORK FOR 3D OBJECT DETECTION
MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow
SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
Mordal: Automated Pretrained Model Selection for Vision Language Models
When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models
Full-Graph vs. Mini-Batch Training: Comprehensive Analysis from a Batch Size and Fan-Out Size Perspective
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking
Topological Anomaly Quantification for Semi-supervised Graph Anomaly Detection
HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs
Neural Graduated Assignment for Maximum Common Edge Subgraphs
Healthcare Insurance Fraud Detection via Continual Fiedler Vector Graph Model
Training Dynamics Impact Post-Training Quantization Robustness
Nonparametric Teaching of Attention Learners
Large Language Model Compression with Global Rank and Sparsity Optimization
Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics
TyphoonMLA: A Mixed Naive-Absorb MLA Kernel For Shared Prefix
Hubble: a Model Suite to Advance the Study of LLM Memorization
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation
AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
Towards Sampling Data Structures for Tensor Products in Turnstile Streams
MoRA: Missing Modality Low-Rank Adaptation for Visual Recognition
Grouping Nodes with known Value Differences: A lossless UCT-based Abstraction Algorithm
SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
IGC-Net for conditional average potential outcome estimation over time
SimpleGVR: A Simple Baseline for Latent-Cascaded Generative Video Super-Resolution
Multilingual Routing in Mixture-of-Experts
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
WATS: Wavelet-Aware Temperature Scaling for Reliable Graph Neural Networks
Pretraining with Re-parametrized Self-Attention: Unlocking Generalization in SNN-Based Neural Decoding Across Time, Brains, and Tasks
From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation
Navigating the Latent Space Dynamics of Neural Models
Inconsistency Biases in Dynamic Data Pruning
Layerwise Federated Learning for Heterogeneous Quantum Clients using Quorus
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
A State-Transition Framework for Efficient LLM Reasoning
SELF-HARMONY: LEARNING TO HARMONIZE SELF-SUPERVISION AND SELF-PLAY IN TEST-TIME REINFORCEMENT LEARNING
DirMoE: Dirichlet-Routed Mixture of Experts
DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
Financial fraud collusion among generative AI agents in social networks
ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
Learning Efficient and Interpretable Multi-Agent Communication
StoryAlign: Evaluating and Training Reward Models for Story Generation
Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning
Automatic Stage Lighting Control: Is it a Rule-Driven Process or Generative Task?
Temporal Graph Thumbnail: Robust Representation Learning with Global Evolutionary Skeleton
Aligning Collaborative View Recovery and Tensorial Subspace Learning via Latent Representation for Incomplete Multi-View Clustering
Federated Learning of Quantile Inference under Local Differential Privacy
Automating the Refinement of Reinforcement Learning Specifications
Divergence-Free Neural Networks with Application to Image Denoising
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN
Diffusion Alignment as Variational Expectation-Maximization
PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Generation
HiVid: LLM-Guided Video Saliency For Content-Aware VOD And Live Streaming
EXPO: Stable Reinforcement Learning with Expressive Policies
OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
Revisiting the Past: Data Unlearning with Model State History
Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
True Self-Supervised Novel View Synthesis is Transferable
Type-Compliant Adaptation Cascades
Dual Goal Representations
TTT3R: 3D Reconstruction as Test-Time Training
Endowing GPT-4 with a Humanoid Body: Building the Bridge Between Off-the-Shelf VLMs and the Physical World
Are Reasoning LLMs Robust to Interventions on their Chain-of-Thought?
PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing
Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty
SCAD: Super-Class-Aware Debiasing for Long-Tailed Semi-Supervised Learning
Towards Strategic Persuasion with Language Models
xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity
SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers
Privacy-Protected Causal Survival Analysis Under Distribution Shift
SmellNet: A Large-scale Dataset for Real-world Smell Recognition
GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
Zephyrus: An Agentic Framework for Weather Science
pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models
Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions
UNIVERSAL AND EFFICIENT LOADING BALANCING FOR RL TRAINING OF LARGE MULTIMODAL MODELS
Fair Conformal Classification via Learning Representation-Based Groups
Native Reasoning Models: Training Language Models to Reason on Unverifiable Data
OccDriver: Future Occupancy Guided Dual-branch Trajectory Planner in Autonomous Driving
TriQDef: Disrupting Semantic and Gradient Alignment to Prevent Adversarial Patch Transferability in Quantized Neural Networks
AlphaFlow: Understanding and Improving MeanFlow Models
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
General search techniques without common knowledge for imperfect-information games, and application to superhuman Fog of War chess
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models
InfoScan: Information-Efficient Visual Scanning via Resource-Adaptive Walks
What Layers When: Learning to Skip Compute in LLMs with Residual Gates
Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
Compositional Neuro-Symbolic Concepts in Neural Activities
PCPO: Proportionate Credit Policy Optimization for Preference Alignment of Image Generation Models
TimeSeg: An Information-Theoretic Segment-Wise Explainer for Time-Series Predictions
Poly-attention: a general scheme for higher-order self-attention
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models
VER: Vision Expert Transformer for Robot Learning via Foundation Distillation and Dynamic Routing
CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation
Decoupled Q-Chunking
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework
ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity
PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models
Reliable Probabilistic Forecasting of Irregular Time Series through Marginalization-Consistent Flows
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility
Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs
UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Image Quality Assessment for Embodied AI
Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks
Residual Feature Integration is Sufficient to Prevent Negative Transfer
Bayesian Post Training Enhancement of Regression Models with Calibrated Rankings
Single-stream Policy Optimization
Not All Bits Are Equal: How Model Scale Changes Memory-Optimal Reasoning
Inference-time scaling of diffusion models through classical search
Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning
Compositional Diffusion with Guided search for Long-Horizon Planning
RoboPARA: Dual-Arm Robot Planning with Parallel Allocation and Recomposition Across Tasks
Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?
ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection
Learning to Parallel: Accelerating Diffusion Large Language Models via Adaptive Parallel Decoding
Differentiable Model Predictive Control on the GPU
Eigen-1: Scientific Reasoning through Adaptive Multi-Agent Refinement and Monitor-based RAG
VINCIE: Unlocking In-context Image Editing from Video
LFQA-E: Carefully Benchmarking Long-form QA Evaluation
LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel
Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
Learning for Highly Faithful Explainability
Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models
DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
Rectifying LLM Thought from Lens of Optimization
Query-Specific Causal Graph Pruning Under Tiered Knowledge
Scaf-GRPO: Scaffolded Group Relative Policy Optimization for Enhancing LLM Reasoning
DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle
Spotlight on Token Perception for Multimodal Reinforcement Learning
The Human Brain as a Dynamic Mixture of Expert Models in Video Understanding
Learning From Dictionary: Enhancing Robustness of Machine-Generated Text Detection in Zero-Shot Language via Adversarial Training
DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
Thought Branches: Interpreting LLM Reasoning Requires Resampling
Physics-Informed Audio-Geometry-Grid Representation Learning for Universal Sound Source Localization
From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning
Pixel-Level Residual Diffusion Transformer: Scalable 3D CT Volume Generation
Learning Exposure Mapping Functions for Inferring Heterogeneous Peer Effects
Distillation of Large Language Models via Concrete Score Matching
Boosted Trees on a Diet: Compact Models for Resource-Constrained Devices
Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
TRIBE: TRImodal Brain Encoder for whole-brain fMRI response prediction
Multifidelity Simulation-based Inference for Computationally Expensive Simulators
SCUBA: Salesforce Computer Use Benchmark
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Conformal Robustness Control: A New Strategy for Robust Decision
Why DPO is a Misspecified Estimator and How to Fix It
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
Short Window Attention Enables Long-Term Memorization
On the Convergence of Two-Layer Kolmogorov-Arnold Networks with First-Layer Training
DeAltHDR: Learning HDR Video Reconstruction from Degraded Alternating Exposure Sequences
Change Point Localization and Inference in Dynamic Multilayer Networks
GradPruner: Gradient-guided Layer Pruning Enabling Efficient Fine-Tuning and Inference for LLMs
Entropy-Based Block Pruning for Efficient Large Language Models
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Variational Inference for Cyclic Learning
DispViT: Direct Stereo Disparity Regression with a Single-Stream Vision Transformer
STAR: Strategy-driven Automatic Jailbreak Red-teaming For Large Language Model
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
Diagnosing Failures in Generalization from Task-Relevant Representational Geometry
Understanding the Robustness of Distributed Self-Supervised Learning Frameworks Against Non-IID Data
Sparse Attention Adaptation for Long Reasoning
Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin
FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel
Zebra-CoT: A Dataset for Interleaved Vision-Language Reasoning
Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding
A Sharp KL-Convergence Analysis for Diffusion Models under Minimal Assumptions
Video Scene Segmentation with Genre and Duration Signals
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
Structurally Human, Semantically Biased: Detecting LLM-Generated References with Embeddings and GNNs
Stackelberg Coupling of Online Representation Learning and Reinforcement Learning
Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
PrismAudio: Decomposed Chain-of-Thought and Multi-dimensional Rewards for Video-to-Audio Generation
Learning AND–OR Templates for Compositional Representation in Art and Design
Comparing the learning dynamics of in-context learning and fine-tuning in language models
Efficient Reasoning with Balanced Thinking
MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning
Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
A Statistical Theory of Overfitting for Imbalanced Classification
When LLMs get significantly worse: A statistical approach to detect model degradations
Token-Importance Guided Direct Preference Optimization
Only Brains Align with Brains: Cross-Region Patterns Expose Limits of Normative Models
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
Overshoot and Shrinkage in Classifier-Free Guidance: From Theory to Practice
THE END OF MANUAL DECODING: TOWARDS TRULY END-TO-END LANGUAGE MODELS
Local Entropy Search over Descent Sequences for Bayesian Optimization
Chart Deep Research in LVLMs via Parallel Relative Policy Optimization
How Reliable is Language Model Micro-Benchmarking?
Bayesian Parameter Shift Rules in Variational Quantum Eigensolvers
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Generating metamers of human scene understanding
RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
A Dense Subset Index for Collective Query Coverage
UniHM: Unified Dexterous Hand Manipulation with Vision Language Model
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration
Enhancing Trustworthiness of Fine-Tuned LLMs via Regularized Subset Selection
Out of the Shadows: Exploring a Latent Space for Neural Network Verification
DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
Enhancing Shortcut Models with Cumulative Self-Consistency Loss for One-Step Diffusion
Addressing divergent representations from causal interventions on neural networks
Boosting Medical Visual Understanding From Multi-Granular Language Learning
Fine-tuning Done Right in Model Editing
WALT: Web Agents that Learn Tools
Matched Data, Better Models: Target Aligned Data Filtering with Sparse Features
Teach to Reason Safely: Policy-Guided Safety Tuning for MLRMs
LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection
SoFlow: Solution Flow Models for One-Step Generative Modeling
UrbanGraph: Physics-Informed Spatio-Temporal Dynamic Heterogeneous Graphs for Urban Microclimate Prediction
Using Reinforcement Learning to Train Large Language Models to Explain Human Decisions
A Joint Diffusion Model with Pre-Trained Priors for RNA Sequence–Structure Co-Design
One step further with Monte-Carlo sampler to guide diffusion better
FACET: A Fragment-Aware Conformer Ensemble Transformer
Copy-Paste to Mitigate Large Language Model Hallucinations
Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators
Text2Interact: High-Fidelity and Diverse Text-to-Two-Person Interaction Generation
EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
Branched Schrödinger Bridge Matching
Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models
Gradient-Based Diversity Optimization with Differentiable Top-$k$ Objective
Pre-training Limited Memory Language Models with Internal and External Knowledge
Enhanced Generative Model Evaluation with Clipped Density and Coverage
HiGS: History-Guided Sampling for Plug-and-Play Enhancement of Diffusion Models
Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs
SpikePingpong: Spike Vision-based Fast-Slow Pingpong Robot System
Directional Sheaf Hypergraph Networks: Unifying Learning on Directed and Undirected Hypergraphs
Foundation Models for Causal Inference via Prior-Data Fitted Networks
Physics-Informed Inference Time Scaling for Solving High-Dimensional Partial Differential Equations
From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization
Mitigating Privacy Risk via Forget Set-Free Unlearning
A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling
CPQS-Tuning: A Model Self-Perception-Based Data Filtering Algorithm for Efficient Instruction Fine-Tuning
Revisiting Confidence Calibration for Misclassification Detection in VLMs
AlphaBench: Benchmarking Large Language Models in Formulaic Alpha Factor Mining
Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
Reinforcement Learning from Dynamic Critic Feedback for Free-Form Generations
Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Closing the Gap Between Text and Speech Understanding in LLMs
SeeDNorm: Self-Rescaled Dynamic Normalization
Can Speech LLMs Think while Listening?
PoinnCARE: Hyperbolic Multi-Modal Learning for Enzyme Classification
Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions
Adaptive Debiasing Tsallis Entropy for Test-Time Adaptation
Batch and Sequential Unlearning for Neural Networks
Bilevel Optimization with Lower-Level Uniform Convexity: Theory and Algorithm
AbdCTBench: Learning Clinical Biomarker Representations from Abdominal Surface Geometry
Discrete Diffusion for Bundle Construction
CryoSplat: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction
Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs
Translating Flow to Policy via Hindsight Online Imitation
Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach
Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
Out of the Memory Barrier: A Highly Memory-Efficient Training System for LLMs with Million-Token Contexts
Reward Models Inherit Value Biases from Pretraining
DexMove: Learning Tactile-Guided Non-Prehensile Manipulation with Dexterous Hands
In-Place Test-Time Training
Product of Experts for Visual Generation
Improving Feasibility via Fast Autoencoder-Based Projections
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Topological Causal Effects
Mirror Flow Matching with Heavy-Tailed Priors for Generative Modeling on Convex Domains
World2Minecraft: Occupancy-Driven Simulated Scenes Construction
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework
Identifying and Evaluating Inactive Heads in Pretrained LLMs
Lipschitz Bandits with Stochastic Delayed Feedback
Precise and Interpretable Editing of Code Knowledge in Large Language Models
GmNet: Revisiting Gating Mechanisms From A Frequency View
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Language Model Planning from an Information Theoretic Perspective
Less is more: Clustered Cross-Covariance Control for Offline RL
Representation Alignment for Diffusion Transformers without External Components
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond
GAVEL: Towards Rule-Based Safety through Activation Monitoring
MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
LightMem: Lightweight and Efficient Memory-Augmented Generation
Learning Dynamics of Logits Debiasing for Long-Tailed Semi-Supervised Learning
Understanding and Improving Hyperbolic Deep Reinforcement Learning
AgentMath: Empowering Mathematical Reasoning for Large Language Models via Tool-Augmented Agent
From Text to Talk: Audio-Language Model Needs Non-Autoregressive Joint Training
Mixed-Curvature Tree-Sliced Wasserstein Distance
Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection
Robustness in the Face of Partial Identifiability in Reward Learning
Toward Practical Equilibrium Propagation: Brain-inspired Recurrent Neural Network with Feedback Regulation and Residual Connections
Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework
Revela: Dense Retriever Learning via Language Modeling
ASTRAEA: A Token-wise Acceleration Framework for Video Diffusion Transformers
Symmetric Space Learning for Combinatorial Generalization
MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning
ATOM: A Pretrained Neural Operator for Multitask Molecular Dynamics
Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
TIPS: Turn-level Information-Potential Reward Shaping for Search-Augmented LLMs
Agentic Context Engineering: Learning Comprehensive Contexts for Self-Improving Language Models
Differentiable Lifting for Topological Neural Networks
Nasty Adversarial Training: A Probability Sparsity Perspective for Robustness Enhancement
Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba
VQ-Transplant: Efficient VQ-Module Integration for Pre-trained Visual Tokenizers
Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models
Group-Normalized Implicit Value Optimization for Language Models
QuRL: Low-Precision Reinforcement Learning for Efficient Reasoning
Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science
EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning
Arbitrary Generative Video Interpolation
Spatially Guided Training for Vision-Language-Action Model
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
Declarative Audio Editing with Audio Language Model
An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning in Tabular Datasets
Constitutional Classifiers++: Production-Grade Defenses against Universal Jailbreaks
CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
Trajectory Generation with Conservative Value Guidance for Offline Reinforcement Learning
Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
BoRA: Towards More Expressive Low-Rank Adaptation with Block Diversity
Reinforced Latent Reasoning for LLM-based Recommendation
Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation
VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization
Motion-R1: Enhancing Motion Generation with Decomposed Chain-of-Thought and RL Binding
OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
Pose-RFT: Aligning MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning
Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
XIL: Cross-Expanding Incremental Learning
GTR-Bench: Evaluating Geo-Temporal Reasoning in Vision-Language Models
EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer
Path Matters: Unveiling Geometric Implicit Bias via Curvature-Aware Sparse View Optimization
ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models
CodeSense: a Real-World Benchmark and Dataset for Code Semantic Reasoning
DND: Boosting Large Language Models with Dynamic Nested Depth
CoFact: Conformal Factuality Guarantees for Language Models under Distribution Shift
Hidden Breakthroughs in Language Model Training
Reversible Primitive–Composition Alignment for Continual Vision–Language Learning
The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
Scaling Multi-Task Bayesian Optimization with Large Language Models
Transfer Paramatters: Optimal per-Module Hyperparameters Across All Scaling Axes
Visual Autoregressive Modeling for Instruction-Guided Image Editing
Tackling Time-Series Forecasting Generalization via Mitigating Concept Drift
Towards Improvisational TAMP: Learning Low-Level Shortcuts in Abstract Planning Graphs
Flow Matching Policy Gradients
Online Navigation Refinement: Achieving Lane-Level Guidance by Associating Standard-Definition and Online Perception Maps
Unlocking Full Efficiency of Token Filtering in Large Language Model Training
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Online Rounding and Learning Augmented Algorithms for Facility Location
Don't Throw Away Your Pretrained Model
SiNGER: A Clearer Voice Distills Vision Transformers Further
Reverse Distillation: Disentangling and Scaling Protein Language Model Representations
Benchmarking Overton Pluralism in LLMs
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
Dual Perspectives on Non-Contrastive Self-Supervised Learning
Evoking User Memory: Personalizing LLM via Recollection-Familiarity Adaptive Retrieval
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
From Natural Alignment to Conditional Controllability in Multimodal Dialogue
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Conformalized Hierarchical Calibration for Uncertainty-Aware Adaptive Hashing
MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning
NDAD: Negative-Direction Aware Decoding for Large Language Models via Controllable Hallucination Signal Injection
Beyond Pairwise: Empowering LLM Alignment With (Ranked) Choice Modeling
In-Context Watermarks for Large Language Models
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
Extreme Weather Nowcasting via Local Precipitation Pattern Prediction
Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
Variational Reasoning for Language Models
VisuRiddles: Fine-grained Perception is a Primary Bottleneck for Multimodal Large Language Models in Abstract Visual Reasoning
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis
DiMeR: Disentangled Mesh Reconstruction Model with Normal-only Geometry Training
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
Towards Multimodal Time Series Anomaly Detection with Semantic Alignment and Condensed Interaction
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights
Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Improving Attributed Long-form Question Answering with Intent Awareness
ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
AdaRank: Adaptive Rank Pruning for Enhanced Model Merging
Why Prototypes Collapse: Diagnosing and Preventing Partial Collapse in Prototypical Self-Supervised Learning
On the Shelf Life of Finetuned LLM-Judges: Future Proofing, Backward Compatibility, and Question Generalization
SpatiaLab: Can Vision–Language Models Perform Spatial Reasoning in the Wild?
Bottlenecked Transformers: Periodic KV Cache Consolidation for Generalised Reasoning
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
KRAMABENCH: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
HiCache: A Plug-in Scaled-Hermite Upgrade for Taylor-Style Cache-then-Forecast Diffusion Acceleration
Sparse Imagination for Efficient Visual World Model Planning
RewardEval: Advancing Reward Model Evaluation
MobiEdit: Resource-efficient Knowledge Editing for Personalized On-device LLMs
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Convergence Analysis of Tsetlin Machines for Basic Boolean Operators under Noise-Free and Noisy Training Conditions
Micro-Macro Coupled Koopman Modeling on Graph for Traffic Flow Prediction
Designing Affine-Invariant Neural Networks for Photometric Corruption Robustness and Generalization
Point-UQ: An Uncertainty-Quantification Paradigm for Point Cloud Few-Shot Class Incremental Learning
Counterfactual LLM-based Framework for Measuring Rhetorical Style
Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning
RNE: plug-and-play diffusion inference-time control and energy-based training
KL-Regularized Reinforcement Learning is Designed to Mode Collapse
Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation
When Machine Learning Gets Personal: Evaluating Prediction and Explanation
Exploring Mode Connectivity in Krylov Subspace for Domain Generalization
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
Learning Survival Distributions with Individually Calibrated Asymmetric Laplace Distribution
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
REMem: Reasoning with Episodic Memory in Language Agent
PROTDYN: A FOUNDATION PROTEIN LANGUAGE MODEL FOR THERMODYNAMICS AND DYNAMICS GENERATION
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Forget Forgetting: Continual Learning in a World of Abundant Memory
GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
Imitating the Truth: Attention-aware Truth-Guided Enhancement for Hallucination Mitigation in Large Vision-Language Models
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
BoGrape: Bayesian optimization over graphs with shortest-path encoded
Optimizing Agent Planning for Security and Autonomy
Beyond Match Maximization and Fairness: Retention-Objectified Two-Sided Matching
Achieving low-bit Muon through subspace preservation and grid quantization
SSVPO: Effective Step-Level Credit Assignment for RL Training of Language Models
SketchingReality: From Freehand Scene Sketches to Photorealistic Images
Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning
Barriers for Learning in an Evolving World: Mathematical Understanding of Loss of Plasticity
Diffusion Language Model Knows the Answer Before It Decodes
CONCUR: A Framework for Continual Constrained and Unconstrained Routing
Thyme: Think Beyond Images
Minimax-Optimal Aggregation for Density Ratio Estimation
Dynamic Reflections: Probing Video Representations with Text Alignment
Measuring and Mitigating Rapport Bias of Large Language Models under Multi-Agent Social Interactions
ActivationReasoning: Logical Reasoning in Latent Activation Spaces
Block Recurrent Dynamics in Vision Transformers
How to Square Tensor Networks and Circuits Without Squaring Them
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Multiple-Prediction-Powered Inference
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models
TRIDENT: Cross-Domain Trajectory Spatio-Temporal Representation via Distance-Preserving Triplet Learning
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning
Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning
Is Graph Unlearning Ready for Practice? A Benchmark on Efficiency, Utility, and Forgetting
GoalRank: Group-Relative Optimization for a Large Ranking Model
FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL
Probability Distributions Computed by Hard-Attention Transformers
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
DynaGuard: A Dynamic Guardian Model With User-Defined Policies
Steering Diffusion Models Towards Credible Content Recommendation
APC-RL: Exceeding data-driven behavior priors with adaptive policy composition
Learning Concept Bottleneck Models from Mechanistic Explanations
Sequences of Logits Reveal the Low Rank Structure of Language Models
Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis
Geometry-aware Policy Imitation
AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing
LiveResearchBench: Benchmarking Single- and Multi-Agent Systems for Citation-Grounded Deep Research
Semi-Supervised Preference Optimization with Limited Feedback
Multi-Condition Conformal Selection
Contamination Detection for VLMs Using Multi‑Modal Semantic Perturbations
Neural Synchrony Between Socially Interacting Language Models
Towards Knowledge‑and‑Data‑Driven Organic Reaction Prediction: RAG‑Enhanced and Reasoning‑Powered Hybrid System with LLMs
Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Optimization
StochasTok: Improving Fine-Grained Subword Understanding in LLMs
TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Near Optimal Robust Federated Learning Against Data Poisoning Attack
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
Transformers with Endogenous In-Context Learning: Bias Characterization and Mitigation
Fine-Grained Activation Steering: Steering Less, Achieving More
A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators
STEM: SCALING TRANSFORMERS WITH EMBEDDING MODULES
IC-Custom: Diverse Image Customization via In-Context Learning
Meta-RL Induces Exploration in Language Agents
K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model
Are LLMs Really Not Knowledgeable? Mining the Submerged Knowledge in LLMs' Memory
NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models
Perturbed Dynamic Time Warping: A Probabilistic Framework and Generalized Variants
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Any-to-Bokeh: Arbitrary-Subject Video Refocusing with Video Diffusion Model
Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
What Exactly Does Guidance Do in Masked Discrete Diffusion Models
Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters
SiMO: Single-Modality-Operable Multimodal Collaborative Perception
Medical thinking with multiple images
Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning
DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations
Hierarchical Semantic-Acoustic Modeling via Semi-Discrete Residual Representations for Expressive End-to-End Speech Synthesis
Hierarchical Concept-based Interpretable Models
In-Context Learning of Temporal Point Processes with Foundation Inference Models
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
Next Visual Granularity Generation
Advancing Complex Video Object Segmentation via Progressive Concept Construction
Noisy but Valid: Robust Statistical Evaluation of LLMs with Imperfect Judges
Log Probability Tracking of LLM APIs
What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging
Sample-Efficient Distributionally Robust Multi-Agent Reinforcement Learning via Online Interaction
VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis
Discovering heterogeneous synaptic plasticity rules via large-scale neural evolution
Disentangling Length Bias in Preference Learning via Response-Conditioned Modeling
From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents
A Rich Knowledge Space for Scalable Deepfake Detection
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Operator Theory-Driven Autoformulation of MDPs for Control of Queueing Systems
On Discovering Algorithms for Adversarial Imitation Learning
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Proximal Supervised Fine-Tuning
Improved Quality, Synchrony, and Preference Alignment for Joint Audio-Video Generation
Energy-Efficient Random Variate Generation via Compressed Lookup Tables
Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation
Adaptive Methods Are Preferable in High Privacy Settings: An SDE Perspective
APPLE: Toward General Active Perception via Reinforcement Learning
Decoupling Primitive with Experts: Dynamic Feature Alignment for Compositional Zero-Shot Learning
Bridging Degradation Discrimination and Generation for Universal Image Restoration
dParallel: Learnable Parallel Decoding for dLLMs
P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling
Group Critical-token Policy Optimization for Autoregressive Image Generation
Score-Based Density Estimation from Pairwise Comparisons
Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling
Self-Speculative Decoding Accelerates Lossless Inference in Any-Order and Any-Subset Autoregressive Models
Physics-Inspired All-Pair Interaction Learning for 3D Dynamics Modeling
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
A Training-Free Framework for Long Video Understanding via Video-Query-Options Similarity
Nudging the Boundaries of LLM Reasoning
CHROMA: Consistent Harmonization of Multi-View Appearance via Bilateral Grid Prediction
General Exploratory Bonus for Optimistic Exploration in RLHF
WorldGym: World Model as An Environment for Policy Evaluation
AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning
Reliability-Adjusted Prioritized Experience Replay
Fair Decision Utility in Human-AI Collaboration: Interpretable Confidence Adjustment for Humans with Cognitive Disparities
THE PATH OF LEAST RESISTANCE: GUIDING LLM REASONING TRAJECTORIES WITH PREFIX CONSENSUS
GRACE: Generative Representation Learning via Contrastive Policy Optimization
Fixing the Broken Compass: Diagnosing and Improving Inference-Time Reward Modeling
Fluent Alignment with Disfluent Judges: Post-training for lower-resource languages
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
Developmental Federated Tuning: A Cognitive-Inspired Paradigm for Efficient LLM Adaptation
Decomposition of Concept-Level Rules in Visual Scenes
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning
A Theoretical Analysis of Mamba’s Training Dynamics: Filtering Relevant Features for Generalization in State Space Models
Implicit Inversion turns CLIP into a Decoder
Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Hippoformer: Integrating Hippocampus-inspired Spatial Memory with Transformers
CitySeeker: How Do VLMs Explore Embodied Urban Navigation with Implicit Human Needs?
Robust Multi-Objective Controlled Decoding of Large Language Models
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
From Single to Multi-Granularity: Toward Long-Term Memory Association and Selection of Conversational Agents
Flow of Spans: Generalizing Language Models to Dynamic Span-Vocabulary via GFlowNets
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
Modeling the Density of Pixel-level Self-supervised Embeddings for Unsupervised Pathology Segmentation in Medical CT
MAGO: Beyond Fixed Hyperparameters with Multi-Objective Pareto Optimization for Hybrid LLM Reasoning
Disentangling the Factors of Convergence between Brains and Computer Vision Models
When to use Graphs in RAG: A Comprehensive Analysis for Graph Retrieval-Augmented Generation
UNDERSTANDING TRANSFORMERS FOR TIME SERIES FORECASTING: A CASE STUDY ON MOIRAI
On the $O(1/T)$ Convergence of Alternating Gradient Descent–Ascent in Bilinear Games
From Cheap Geometry to Expensive Physics: Elevating Neural Operators via Latent Shape Pretraining
Sheaves Reloaded: A Direction Awakening
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
One2Scene: Geometric Consistent Explorable 3D Scene Generation from a Single Image
Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
PreferThinker: Reasoning-based Personalized Image Preference Assessment
Risk-Sensitive Agent Compositions
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
Revisiting Parameter Server in LLM Post-Training
PAT3D: Physics-Augmented Text-to-3D Scene Generation
Cat-PO: Cross-modal Adaptive Token-rewards for Preference Optimization in Truthful Multimodal LLMs
SecP-Tuning: Efficient Privacy-Preserving Prompt Tuning for Large Language Models via MPC
Cutting the Skip: Training Residual-Free Transformers
Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations
Estimating Dimensionality of Neural Representations from Finite Samples
SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification
Meta-UCF: Unified Task-Conditioned LoRA Generation for Continual Learning in Large Language Models
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
HiMAE: Hierarchical Masked Autoencoders Discover Resolution-Specific Structure in Wearable Time Series
LLMs Process Lists With General Filter Heads
Towards Better Branching Policies: Leveraging the Sequential Nature of Branch-and-Bound Tree
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
KLAS: Using Similarity to Stitch Neural Networks for an Improved Accuracy-Efficiency Tradeoff
One Model for All Tasks: Leveraging Efficient World Models in Multi-Task Planning
MAC-AMP: A Closed-Loop Multi-Agent Collaboration System for Multi-Objective Antimicrobial Peptide Design
iFusion: Integrating Dynamic Interest Streams via Diffusion Model for Click-Through Rate Prediction
An Ensemble Framework for Unbiased Language Model Watermarking
Catalog-Native LLM: Speaking Item-ID dialect with Less Entanglement for Recommendation
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Explorations
Cost-Aware Dynamic Tree Construction for Efficient Large Language Model Inference
A Tale of Two Smoothness Notions: Adaptive Optimizers and Non-Euclidean Descent
Improving Black-Box Generative Attacks via Generator Semantic Consistency
Defending against Backdoor Attacks via Module Switching
Calibrated Information Bottleneck for Trusted Multi-modal Clustering
Information-based Value Iteration Networks for Decision Making Under Uncertainty
Attribution-Guided Decoding
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
ELEPHANT: Measuring and understanding social sycophancy in LLMs
Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
Agentic Collaboration as an Information Bottleneck Problem
Anchor Frame Bridging for Coherent First-Last Frame Video Generation
Overtone: Cyclic Patch Modulation for Cleaner, Faster Physics Emulators
Group Representational Position Embedding
Difference Predictive Coding for Training Spiking Neural Networks
The Geometry of Reasoning: Flowing Logics in Representation Space
C-Evolve: Consensus-based Evolution for Prompt Groups
Internal Evaluation of Density-Based Clusterings with Noise
A Bayesian Nonparametric Framework for Private, Fair, and Balanced Tabular Data Synthesis
Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks
COSMOS: A Hybrid Adaptive Optimizer for Efficient Training of Large Language Models
Robust Spiking Neural Networks Against Adversarial Attacks
FACT: Fine-grained Across-variable Convolution for Multivariate Time Series Forecasting
FACT: a first-principles alternative to the Neural Feature Ansatz for how networks learn representations
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
Intention-Conditioned Flow Occupancy Models
TINKER: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
Latent Visual Reasoning
Bridging Successor Measure and Online Policy Learning with Flow Matching-Based Representations
DRIFT-Net: A Spectral-Coupled Neural Operator for PDEs Learning
Omni-IML: Towards Unified Interpretable Image Manipulation Localization
Test-time Domain Generalization for Image Super-resolution
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
ICYM2I: The illusion of multimodal informativeness under missingness
Robust Preference Optimization: Aligning Language Models with Noisy Preference Feedback
Efficient Degradation-agnostic Image Restoration via Channel-Wise Functional Decomposition and Manifold Regularization
TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models
FERD: Fairness-Enhanced Data-Free Adversarial Robustness Distillation
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
Learning Koopman Representations with Controllability Guarantees
W-EDIT: A Wavelet-Based Frequency-Aware Framework for Text-Driven Image Editing
Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Neural Dynamics Self-Attention for Spiking Transformers
Neural Latent Arbitrary Lagrangian-Eulerian Grids for Fluid-Solid Interaction
Long-tailed Test-Time Adaptation for Vision-Language Models
LDT: Layer-Decomposition Training Makes Networks More Generalizable
Evaluating Cross-Modal Reasoning Ability and Problem Characteristics with Multimodal Item Response Theory
Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning
Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models
Exploring the Design Space of Transition Matching
GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization
Battery Fault: A Comprehensive Dataset and Benchmark for Battery Fault Diagnosis
SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
TrainRef: Curating Data with Label Distribution and Minimal Reference for Accurate Prediction and Reliable Confidence
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
Fantastic Tractor-Dogs and How Not to Find Them With Open-Vocabulary Detectors
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
GUIDE: Gated Uncertainty-Informed Disentangled Experts for Long-tailed Recognition
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change
PTQ4ARVG: Post-Training Quantization for AutoRegressive Visual Generation Models
Alignment-Enhanced Integration of Connectivity and Spectral Sparse in Dynamic Sparse Training of LLM
BIRD: Behavior Induction via Representation-structure Distillation
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
Memorization Through the Lens of Sample Gradients
MoMa: A Simple Modular Learning Framework for Material Property Prediction
Action Chunking and Data Augmentation Yield Exponential Improvements for Imitation Learning in Continuous Spaces
Guaranteed Simply Connected Mesh Reconstruction from an Unorganized Point Cloud
RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
Should We Still Pretrain Encoders with Masked Language Modeling?
Latent-Guided Reasoning: Empowering Small LLMs with Large-Model Thinking
Self-Refining Vision Language Model for Robotic Failure Detection and Reasoning
DeepSADR: Deep Transfer Learning with Subsequence Interaction and Adaptive Readout for Cancer Drug Response Prediction
MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
CaTS: Calibrated Test-Time Scaling for Efficient LLM Inference
Text-Aware Image Restoration with Diffusion Models
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
CLUE: Conflict-guided Localization for LLM Unlearning Framework
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization
Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
Tuning the burn-in phase in training recurrent neural networks improves their performance
BEP: A Binary Error Propagation Algorithm for Binary Neural Networks Training
Long-Text-to-Image Generation via Compositional Prompt Decomposition
EarthSE: A Benchmark Evaluating Earth Scientific Exploration Capability for Large Language Models
Any-Subgroup Equivariant Networks via Symmetry Breaking
Exploratory Diffusion Model for Unsupervised Reinforcement Learning
PACE: Pretrained Audio Continual Learning
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
On the Design of One-step Diffusion via Shortcutting Flow Paths
Mitigating Mismatch within Reference-based Preference Optimization
Token-level Data Selection for Safe LLM Fine-tuning
Cross-Embodied Co-Design for Dexterous Hands
FACM: Flow-Anchored Consistency Models
FaLW: A Forgetting-aware Loss Reweighting for Long-tailed Unlearning
Recurrent Action Transformer with Memory
SCI-Verifier: Scientific Verifier with Thinking
DR-GGAD: Dual Residual Centering for Mitigating Anomaly Non‑Discriminativity in Generalist Graph Anomaly Detection
SpareTrain: Fault-Tolerant LLM Training via Low-Cost Dual Modular Redundancy
VGR: Visual Grounded Reasoning
Revisiting Tree-Sliced Wasserstein Distance Through the Lens of the Fermat–Weber Problem
Stable and Scalable Deep Predictive Coding Networks with Meta Prediction Errors
TGM: A Modular and Efficient Library for Machine Learning on Temporal Graphs
PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
What Do Large Language Models Know About Opinions?
Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator
BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games
Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow
Efficient Learning on Large Graphs using a Densifying Regularity Lemma
Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
Gradient Intrinsic Dimensionality Alignment: Narrowing The Gap Between Low-Rank Adaptation and Full Fine-Tuning
Heterogeneous Agent Q-weighted Policy Optimization
Avey Bidirectional Architecture
TEDM: Time Series Forecasting with Elucidated Diffusion Models
DIVERSE: Disagreement-Inducing Vector Evolution for Rashomon Set Exploration
Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the Diverse Framework
GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^{\pi}$ Realizability for Deterministic Dynamics
Direct Reward Fine-Tuning on Poses for Single Image to 3D Human in the Wild
AC-Sampler: Accelerate and Correct Diffusion Sampling with Metropolis-Hastings Algorithm
Low-Rank Few-Shot Node Classification by Node-Level Graph Diffusion
C-Voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Universal Model Routing for Efficient LLM Inference
Scaling Direct Feedback Learning with Theoretical Guarantees
Unifying Complexity-Theoretic Perspectives on Provable Explanations
OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Improving Autoregressive Video Modeling with History Understanding
ChainGPT: Dual-Reasoning Model with Recurrent Depth and Multi-Rank State Updates
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
Latent Diffusion Model without Variational Autoencoder
Laplacian Multi-scale Flow Matching for Generative Modeling
Evaluating Language Models' Evaluations of Games
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
DiCache: Let Diffusion Model Determine Its Own Cache
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
DAVE: A VLM Vision Encoder for Document Understanding and Web Agents
Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems
Embodied Navigation Foundation Model
Constrained Diffusion for Protein Design with Hard Structural Constraints
FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
Learning to Solve Orienteering Problem with Time Windows and Variable Profits
Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback
Latent Speech-Text Transformer
Exploring Cross-Modal Flows for Few-Shot Learning
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Obfuscated Activations Bypass LLM Latent-Space Defenses
RESA: Bringing Back What Sparse Attention Ignores with Residual Estimation
Subspace Kernel Learning on Tensor Sequences
Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness
LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
Dens3R: A Foundation Model for 3D Geometry Prediction
Probing in the Dark: State Entropy Maximization for POMDPs
Reliable Fine-Grained Evaluation of Natural Language Math Proofs
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
Identifying Robust Neural Pathways: Few-Shot Adversarial Mask Tuning for Vision-Language Models
DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
CoAct-1: Computer-using Multi-agent System with Coding Actions
Vid-LLM: A Compact Video-based 3D Multimodal LLM with Reconstruction–Reasoning Synergy
Slicing Wasserstein over Wasserstein via Functional Optimal Transport
Exploring the Basin-Like Loss Landscape in Large Language Models
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
Adaptive Augmentation-Aware Latent Learning for Robust LiDAR Semantic Segmentation
SAVE: A Generalizable Framework for Multi-Condition Single-Cell Generation with Gene Block Attention
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
Learn to Guide Your Diffusion Model
Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?
Graph-based Nearest Neighbors with Dynamic Updates via Random Walk-Based Analysis
HLD: Approximate Hierarchical Linguistic Distribution Modeling for LLM-Generated Text Detection
Planner Aware Path Learning in Diffusion Language Models Training
IF-VidCap: Can Video Caption Models Follow Instructions?
3D RNA Inverse Design with Reinforcement Learning-Guided Diffusion Models
CircuitNet 3.0: A Multi-Modal Dataset with Task-Oriented Augmentation for AI-Driven Circuit Design
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations
Matching without Group Barrier for Heterogeneous Treatment Effect Estimation
Combination-of-Experts with Knowledge Sharing for Cross-Task Vehicle Routing Problems
Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search
Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
Graph Diffusion Transformers are In-Context Molecular Designers
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Convergence of Muon with Newton-Schulz
HEIST: A Graph Foundation Model for Spatial Transcriptomics and Proteomics Data
Polynomial Convergence of Riemannian Diffusion Models
Why Adversarially Train Diffusion Models?
Interleaving Reasoning for Better Text-to-Image Generation
FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
AlphaAgentEvo: Evolution-Oriented Alpha Mining via Self-Evolving Agentic Reinforcement Learning
FlowRL: Matching Reward Distributions for LLM Reasoning
Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
AetherCode: Evaluating LLMs’ Ability to Win In Premier Programming Competitions
On the Thinking-Language Modeling Gap in Large Language Models
Enhancing Persona Following at Decoding Time via Dynamic Importance Estimation for Role-Playing Agents
VeriRole: Verifiable Role-Awareness through Hint-Guided Reinforcement Learning
RADAR: Learning to Route with Asymmetry-aware Distance Representations
Discrete Latent Features Ablate Adversarial Attack: A Robust Prompt Tuning Framework for VLMs
An Improved Model-free Decision-estimation Coefficient with Applications in Adversarial MDPs
A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control
Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning
SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing
Online Learning and Equilibrium Computation with Ranking Feedback
Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts
Stacked from One: Multi-Scale Self-Injection for Context Window Extension
MATRIX: Mask Track Alignment for Interaction-aware Video Generation
DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
Emergence of Spatial Representation in an Actor-Critic Agent with Hippocampus-Inspired Sequence Generator
Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning
Transferable and Stealthy Adversarial Attacks on Large Vision-Language Models
Explainable $K$-means Neural Networks for Multi-view Clustering
Do We Really Need Permutations? Impact of Width Expansion on Linear Mode Connectivity
Scaling Speech Tokenizers with Diffusion Autoencoders
Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation
Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge
Feature segregation by signed weights in artificial vision systems and biological models
Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning
Jacobian Aligned Random Forests
FARTrack: Fast Autoregressive Visual Tracking with High Performance
Equilibrium Language Models
Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork
PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting
Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought
Hilbert-Guided Sparse Local Attention
Bridging the performance-gap between target-free and target-based reinforcement learning
SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator
Composer: A Search Framework for Hybrid Neural Architecture Design
Premise Selection for a Lean Hammer
floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL
In Good GRACES: Principled Teacher Selection for Knowledge Distillation
PRISM: Partial-label Relational Inference with Spatial and Spectral Cues
ProxyAttn: Guided Sparse Attention via Representative Heads
STAT: Skill-Targeted Adaptive Training
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction
FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving
DePO: Demonstration-guided Policy Optimization for Molecular Optimization
Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability
Optimistic Task Inference for Behavior Foundation Models
Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces
MTVCraft: Tokenizing 4D Motion for Arbitrary Character Animation
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks
Towards Anomaly-Aware Pre-Training and Fine-Tuning for Graph Anomaly Detection
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens
Why Ask One When You Can Ask $k$? Learning-to-Defer to the Top-$k$ Experts
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
Discovering and Steering Interpretable Concepts in Large Generative Music Models
Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
MuonBP: Faster Muon via Block-Periodic Orthogonalization
BeyondBench: Benchmark-Free Evaluation of Reasoning in Language Models
ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
Learning Data-Efficient and Generalizable Neural Operators via Fundamental Physics Knowledge
Binomial Gradient-Based Meta-Learning for Enhanced Meta-Gradient Estimation
Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact
GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
Singleton-Optimized Conformal Prediction
Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
A Biologically Plausible Dense Associative Memory with Exponential Capacity
Learning Facts at Scale with Active Reading
Black-Box Privacy Attacks on Shared Representations in Multitask Learning
EasyCreator: Empowering 4D Creation through Video Inpainting
Cross-Domain Lossy Compression via Rate- and Classification-Constrained Optimal Transport
Bi-directional Bias Attribution: Debiasing Large Language Models without Modifying Prompts
DAMR: Efficient and Adaptive Context-Aware Knowledge Graph Question Answering with LLM-Guided MCTS
NEO — No-Optimization Test-Time Adaptation through Latent Re-Centering
Beyond Outliers: A Study of Optimizers Under Quantization
WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
ViMo: A Generative Visual GUI World Model for App Agents
FlashRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models
Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis
MLE-Smith: Scaling MLE Tasks with Automated Multi-agent Pipeline
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Random Anchors with Low-rank Decorrelated Learning: A Minimalist Pipeline for Class-Incremental Medical Image Classification
Truthfulness Despite Weak Supervision: Evaluating and Training LLMs Using Peer Prediction
Distribution-Aware Multi-Granularity Phase Coding: Towards Lower Conversion Error for Spike-Driven Large Language Models
Private Rate-Constrained Optimization with Applications to Fair Learning
Noise Tolerance of Distributionally Robust Learning
Reinforcement Learning for Machine Learning Engineering Agents
RefTool: Reference-Guided Tool Creation for Knowledge-Intensive Reasoning
TESSAR: Geometry-Aware Active Regression via Dynamic Voronoi Tessellation
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
Agnostics: Learning to Synthesize Code in Any Programming Language with a Universal Reinforcement Learning Environment
PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies
Parallel Multimodal Diffusion Language Models for Thinking-Aware Editing and Generation
Measuring Bias Amplification in Multi-Agent Systems with Large Language Models
Sparsity-promoting Fine-tuning for Equivariant Materials Foundation Model
An Open-Ended Benchmark and Formal Framework for Adjuvant Research with MLLM
GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
Beyond Student: An Asymmetric Network for Neural Network Inheritance
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
Divid: Disentangled Spatial-Temporal Modeling within LLMs for Temporally Grounded Video Understanding
Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets
OVID: Open-Vocabulary Intrusion Detection
CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model
HSIC Bottleneck for Cross-Generator and Domain-Incremental Synthetic Image Detection
Fast Language Generation through Discrete Diffusion Divergence Instruct
Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement
Demystifying Emergent Exploration in Goal-Conditioned RL
Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control
You Point, I Learn: Online Adaptation of Interactive Segmentation Models for Handling Distribution Shifts in Medical Imaging
Real-Time Reasoning Agents in Evolving Environments
SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
Query-Aware Flow Diffusion for Graph-Based RAG with Retrieval Guarantees
FingerTip 20K: A Benchmark for Proactive and Personalized Mobile LLM Agents
On the Impact of the Utility in Semivalue-based Data Valuation
Trapped by simplicity: When Transformers fail to learn from noisy features
Image Inpainting with Preference Alignment
BigMac3D: A Big Macaque Motion and Animation Dataset Bridging Image and 3D Pose Representations
Statistical Guarantees in the Search for Less Discriminatory Algorithms
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
Latent-to-Data Cascaded Diffusion Models for Unconditional Time Series Generation
LongLive: Real-time Interactive Long Video Generation
Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs
Probabilistic Kernel Function for Fast Angle Testing
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity
VLMgineer: Vision-Language Models as Robotic Toolsmiths
CUPID: A Plug-in Framework for Joint Aleatoric and Epistemic Uncertainty Estimation with a Single Model
SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?
Maximizing Asynchronicity in Event-based Neural Networks
PolyGraphScore: a classifier-based metric for evaluating graph generative models
Synchronizing Probabilities in Model-Driven Lossless Compression
Thinking as Society: Multi-Social-Agent Self-Distillation for Multimodal Misinformation Detection
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
Toward Enhancing Representation Learning in Federated Multi-Task Settings
DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
Deconstructing Guidance: A Semantic Hierarchy for Precise Diffusion Model Editing
Using cognitive models to reveal value trade-offs in language models
Corner Gradient Descent
Constantly Improving Image Models Need Constantly Improving Benchmarks
AlignFlow: Improving Flow-based Generative Models with Semi-Discrete Optimal Transport
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
Rethinking Radiology Report Generation: From Narrative Flow to Topic-Guided Findings
STARK: Strategic Team of Agents for Refining Kernels
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry
AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception
DPQuant: Efficient and Private Model Training via Dynamic Quantization Scheduling
Scaling Synthetic Task Generation for Agents via Exploration
Object Fidelity Diffusion for Remote Sensing Image Generation
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Gistify: Codebase-Level Understanding via Runtime Execution
Reinforcing General Reasoning Without Verifiers
LogicXGNN: Grounded Logical Rules for Explaining Graph Neural Networks
Towards Lossless Memory-efficient Training of Spiking Neural Networks via Gradient Checkpointing and Spike Compression
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
Entropy-Monitored Kernelized Token Distillation for Audio-Visual Compression
Flash-Mono: Feed-Forward Accelerated Gaussian Splatting Monocular SLAM
EgoHandICL: Egocentric 3D Hand Reconstruction with In-Context Learning
Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
FlowAlign: Trajectory-Regularized, Inversion-Free Flow-based Image Editing
Searching for Privacy Risks in LLM Agents via Simulation
Don't Just Fine-tune the Agent, Tune the Environment
Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction
SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Automatic Image-Level Morphological Trait Annotation for Organismal Images
UP2You: Fast Reconstruction of Yourself from Unconstrained Photo Collections
Zero-shot HOI Detection with MLLM-based Detector-agnostic Interaction Recognition
Aligner, Diagnose Thyself: A Meta-Learning Paradigm for Fusing Intrinsic Feedback in Preference Alignment
Mixture-of-Experts Can Surpass Dense LLMs Under Strictly Equal Resource
OmniActor: A Generalist GUI and Embodied Agent for 2D&3D Worlds
LUMINA: Detecting Hallucinations in RAG System with Context–Knowledge Signals
Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding
Progressive Online Video Understanding with Evidence-Aligned Timing and Transparent Decisions
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
RPM: Reasoning-Level Personalization for Black-Box Large Language Models
Learning Hierarchical and Geometry-Aware Graph Representations for Text-to-CAD
SAM-Veteran: An MLLM-Based Human-like SAM Agent for Reasoning Segmentation
SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
CLARC: C/C++ Benchmark for Robust Code Search
Robust Reward Modeling via Causal Rubrics
Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
Command-V: Training-Free Representation Finetuning Transfer
QuadGPT: Native Quadrilateral Mesh Generation with Autoregressive Models
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs
Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
Making, Not Taking, the Best of N
Membership Inference Attacks Against Fine-tuned Diffusion Language Models
MoL: Adaptive Mixture-of-Length Reasoning for Efficient Question Answering with Context
MIRA: Memory-Integrated Reinforcement Learning Agent with Limited LLM Guidance
Distributional Consistency Loss: Beyond Pointwise Data Terms in Inverse Problems
Trust-Region Adaptive Policy Optimization
Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
A Step to Decouple Optimization in 3DGS
All Patches Matter, More Patches Better: Enhance AI-Generated Image Detection via Panoptic Patch Learning
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Object-Centric Refinement for Enhanced Zero-Shot Segmentation
TrimR: Verifier-based Training-Free Thinking Trimming for Efficient Test-Time Scaling
From "Sure" to "Sorry": Detecting Jailbreak in Large Vision Language Model via JailNeurons
Optimizing Data Augmentation through Bayesian Model Selection
Human-AI Curation Synergy: Scaling Preference Data Curation via Human-Guided AI Feedback
Self-Speculative Masked Diffusions
Latent Fourier Transform
EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph
Human-LLM Collaborative Feature Engineering for Tabular Data
Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
Steering and Rectifying Latent representation manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection
MIAM: Modality Imbalance-Aware Masking for Multimodal Ecological Applications
Improving Discrete Diffusion Unmasking Policies Beyond Explicit Reference Policies
Agentic Reinforcement Learning with Implicit Step Rewards
Culture in Action: Evaluating Text-to-Image Models through Social Activities
Multi-LLM Adaptive Conformal Inference for Reliable LLM Response
The Natural Geometry of Code: Hyperbolic Representation Learning for Program Reasoning
Si-GT: Fast Interconnect Signal Integrity Analysis for Integrated Circuit Design via Graph Transformers
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
FilMaster: Bridging Cinematic Principles and Generative AI for Automated Film Generation
Graph homophily booster: Rethinking the role of discrete features on heterophilic graphs
AttriCtrl: A Generalizable Framework for Controlling Semantic Attribute Intensity in Diffusion Models
An Efficient SE(p)-Invariant Transport Metric Driven by Polar Transport Discrepancy-based Representation
RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation
Video-KTR: Reinforcing Video Reasoning via Key Token Attribution
CrossPL: Systematic Evaluation of Large Language Models for Cross Programming Language Interoperating Code Generation
Source-Guided Flow Matching
Adaptive Nonlinear Compression for Large Foundation Models
Adaptive Width Neural Networks
Pseudo-Non-Linear Data Augmentation: A Constrained Energy Minimization Viewpoint
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?
Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Textual Space
Measurement Score-Based Diffusion Model
M$^3$E: Continual Vision-and-Language Navigation via Mixture of Macro and Micro Experts
Vid2World: Crafting Video Diffusion Models to Interactive World Models
Beyond Text-to-Image: Liberating Generation with a Unified Discrete Diffusion Model
Neyman-Pearson Classification under Both Null and Alternative Distributions Shift
CoLA: Co-Calibrated Logit Adjustment for Long-Tailed Semi-Supervised Learning
The Achilles’ Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
Learning to Segment for Vehicle Routing Problems
Scaling Bayesian Experimental Design to High-Dimensions with Information-Guided Diffusion
Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs
Deep-ICE: The first globally optimal algorithm for empirical risk minimization of two-layer maxout and ReLU networks
LLaVAction: evaluating and training multi-modal large language models for action understanding
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Understanding and improving Shampoo and SOAP via Kullback-Leibler Minimization
Bridging Past and Future: Distribution-Aware Alignment for Time Series Forecasting
CORE: Concept-Oriented Reinforcement for Bridging the Definition–Application Gap in Mathematical Reasoning
Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
$\ell_1$ Latent Distance based Continuous-time Graph Representation
Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
Can we generate portable representations for clinical time series data using LLMs?
Adjusting Prediction Model Through Wasserstein Geodesic for Causal Inference
Calibrating Verbalized Confidence with Self-Generated Distractors
Learning to Be Uncertain: Pre-training World Models with Horizon-Calibrated Uncertainty
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
SSDi8: Accurate and Efficient 8-bit Quantization for State Space Duality
MILPnet: A Multi-Scale Architecture with Geometric Feature Sequence Representations for Advancing MILP Problems
Terminal Velocity Matching
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving
Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Vision–Language Continual Learning
Reconstruction Alignment Improves Unified Multimodal Models
On Discriminative vs. Generative classifiers: Rethinking MLLMs for Action Understanding
Carré du champ flow matching: better quality-generalisation tradeoff in generative models
Plug-and-Play Fidelity Optimization for Diffusion Transformer Acceleration via Cumulative Error Minimization
Enhanced Continual Learning of Vision-Language Models with Model Fusion
ResWorld: Temporal Residual World Model for End-to-End Autonomous Driving
PlantRSR: A New Plant Dataset and Method for Reference-based Super-Resolution
Global-Recent Semantic Reasoning on Dynamic Text-Attributed Graphs with Large Language Models
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
Discrete Bayesian Sample Inference for Graph Generation
Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
Adaptive Mixture of Disentangled Experts for Dynamic Graphs under Distribution Shifts
Training Large Reasoning Models Efficiently via Progressive Thought Encoding
Early Signs of Steganographic Capabilities in Frontier LLMs
ReVeal: Self-Evolving Code Agents via Reliable Self-Verification
Emergent Misalignment is Easy, Narrow Misalignment is Hard
Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization
LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models
Data Aware and Scalable Sensitivity Analysis for Decision Tree Ensembles
Translation Heads: Unveiling Attention's Role in LLM Multilingual Translation
A High Quality Dataset and Reliable Evaluation for Interleaved Image-Text Generation
From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
Random Label Prediction Heads for Studying and Controlling Memorization in Deep Neural Networks
Mapping Post-Training Forgetting in Language Models at Scale
Cache What Lasts: Token Retention for Memory-Bounded KV Cache in LLMs
The Spacetime of Diffusion Models: An Information Geometry Perspective
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
Steering Embedding Models with Geometric Rotation: Mapping Semantic Relationships Across Languages and Models
Guiding Mixture-of-Experts with Temporal Multimodal Interactions
Empowering Small VLMs to Think with Dynamic Memorization and Exploration
SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis
Consistent Noisy Latent Rewards for Trajectory Preference Optimization in Diffusion Models
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling
MOAI: Module-Optimizing Architecture for Non-Interactive Secure Transformer Inference
Algorithm Generation via Creative Ideation
Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models
Evolution of Concepts in Language Model Pre-Training
AdAEM: An Adaptively and Automated Extensible Evaluation Method of LLMs' Value Difference
Transducing Language Models
On the Eligibility of LLMs for Counterfactual Reasoning: A Decompositional Study
No outlier channels but with outlier blocks
Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning
Why is Your Language Model a Poor Implicit Reward Model?
Smooth Calibration Error: Uniform Convergence and Functional Gradient Analysis
Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation
Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling
Data Provenance for Image Auto-Regressive Generation
FM4NPP: A Scaling Foundation Model for Nuclear and Particle Physics
GOLDILOCS: General Object-Level Detection and Labeling of Changes in Scenes
Multi-Armed Bandits with Minimum Aggregated Revenue Constraints
Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
PYRREGULAR: A Unified Framework for Irregular Time Series, with Classification Benchmarks
ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art
TangoFlux: Text to Audio Generation with CLAP-Ranked Preference Optimization
SAIR: Enabling Deep Learning for Protein-Ligand Interactions with a Synthetic Structural Dataset
Unveiling the Potential of Diffusion Large Language Model in Controllable Generation
Pretrain Value, Not Reward: Decoupled Value Policy Optimization
AudioX: A Unified Framework for Anything-to-Audio Generation
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer
ICaRus: Identical Cache Reuse for Efficient Multi-Model Inference
Glance and Focus Reinforcement for Pan-cancer Screening
Test-Time Optimization of 3D Point Cloud LLM via Manifold-Aware In-Context Guidance and Refinement
PRISM: Enhancing PRotein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations
RF-MatID: Dataset and Benchmark for Radio Frequency Material Identification
Squeeze the Soaked Sponge: Efficient Off-policy RFT for Large Language Model
MAS$^2$: Self-Generative, Self-Configuring, Self-Rectifying Multi-Agent Systems
Antibody: Strengthening Defense Against Harmful Fine-Tuning for Large Language Models via Attenuating Harmful Gradient Influence
Shrinking Proteins with Diffusion
On the Predictive Power of Representation Dispersion in Language Models
PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning
Scaling Laws and Symmetry, Evidence from Neural Force Fields
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
Knowledge Editing with Subspace-Aware Key-Value Mappings
LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in LLM Routing
Highly Efficient and Effective LLMs with Multi-Boolean Architectures
Do LLM Agents Know How to Ground, Recover, and Assess? A Benchmark for Epistemic Competence in Information-Seeking Agents
Real-Time Robot Execution with Masked Action Chunking
Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization
Language-Instructed Vision Embeddings for Controllable and Generalizable Perception
DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
SAM 3: Segment Anything with Concepts
Are we measuring oversmoothing in graph neural networks correctly?
Towards Robust Real-World Multivariate Time Series Forecasting: A Unified Framework for Dependency, Asynchrony, and Missingness
LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation
SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
Optimal Transport-Induced Samples against Out-of-Distribution Overconfidence
$\mathbf{T^3}$: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
ShapeGen4D: Towards High Quality 4D Shape Generation from Videos
Monotone Near-Zero-Sum Games
Computational Bottlenecks for Denoising Diffusions
Towards Spatial Supersensing in Video
SURGE: Surprise-Guided Token Reduction for Efficient Video Understanding with VLMs
Efficient Test-Time Scaling for Small Vision-Language Models
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
Stop Guessing: Choosing the Optimization-Consistent Uncertainty Measurement for Evidential Deep Learning
PRISM: Festina Lente Proactivity—Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents
OVSeg3R: Learn Open-vocabulary Instance Segmentation from 2D via 3D Reconstruction
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
NewtonGen: Physics-consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions
NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization
Model Predictive Adversarial Imitation Learning for Planning from Observation
Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning
Supporting High-Stakes Decision Making Through Interactive Preference Elicitation in the Latent Space
Learning Flexible Forward Trajectories for Masked Molecular Diffusion
TNT: Improving Chunkwise Training for Test-Time Memorization
HiPO: Self-Hint Policy Optimization for RLVR
K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
Enhancing Complex Symbolic Logical Reasoning of Large Language Models via Sparse Multi-Agent Debate
SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions
PMDformer: Patch-Mean Decoupling Transformer for Long-term Forecasting
Balancing the Experts: Unlocking LoRA-MoE for GRPO via Mechanism-Aware Rewards
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
Towards Text-Mask Consistency in Medical Image Segmentation
Probing to Refine: Reinforcement Distillation of LLM Reasoners via Explanatory Inversion
Bi-Lipschitz Autoencoder With Injectivity Guarantee
PhyWorldBench: A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
Operationalizing Data Minimization for Privacy-Preserving LLM Prompting
VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis
Accelerating Eigenvalue Dataset Generation via Chebyshev Subspace Filter
t-SNE Exaggerates Clusters, Provably
Extending Fourier Neural Operators for Modeling Parameterized and Coupled PDEs
SFT Doesn’t Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs
Omni-iEEG: A Large-Scale, Comprehensive iEEG Dataset and Benchmark for Epilepsy Research
GEOMETRY OF UNCERTAINTY: LEARNING METRIC SPACES FOR MULTIMODAL STATE ESTIMATION IN RL
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
FedMC: Federated Manifold Calibration
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Influence without Confounding: Causal Discovery from Temporal Data with Long-term Carry-over Effects
Teaching Metric Distance to Discrete Autoregressive Language Models
ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization
Delving into Spectral Clustering with Vision-Language Representations
Bayesian Ensemble for Sequential Decision-Making
Generating Directed Graphs with Dual Attention and Asymmetric Encoding
ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization
OpenEstimate: Evaluating LLMs on Probabilistic Estimation with Real-World Data
DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning
Revisiting Multimodal Positional Encoding in Vision–Language Models
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Lifelong Learning with Behavior Consolidation for Vehicle Routing
Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
When Shift Happens - Confounding Is to Blame
GAS: Enhancing Reward-Cost Balance of Generative Model-assisted Offline Safe RL
SpineBench: A Clinically Salient, Level-Aware Benchmark Powered by the SpineMed-450k Corpus
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models
Efficient Morphology–Control Co-Design via Stackelberg PPO under Non-Differentiable Leader–Follower Interfaces
Physically Valid Biomolecular Interaction Modeling with Gauss-Seidel Projection
SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
Towards Real-World Routing with Neural Combinatorial Optimization
Explaining Grokking and Information Bottleneck through Neural Collapse Emergence
Stable coresets: Unleashing the power of uniform sampling
RAPID$^3$: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
Continuous Chain of Thought: Parallel Exploration and Reasoning through a Theoretical Lens
RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
Composition of Memory Experts for Diffusion World Models
Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment
R4: Nested Reasoning-Retrieval for Reward Modeling in Role-Playing Agents
From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots
Attention Smoothing Is All You Need For Unlearning
Heterogeneous Federated Fine-Tuning with Parallel One-Rank Adaptation
CERTIFIED VS. EMPIRICAL ADVERSARIAL ROBUSTNESS VIA HYBRID CONVOLUTIONS WITH ATTENTION STOCHASTICITY
CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
Fairness via Independence: A General Regularization Framework for Machine Learning
Logit‑KL Flow Matching: Non‑Autoregressive Text Generation via Sampling‑Hybrid Inference
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
Fingerprinting Deep Neural Networks for Ownership Protection: An Analytical Approach
One Patch Doesn’t Fit All: Adaptive Patching for Native-Resolution Multimodal Large Language Models
When Flatness Does (Not) Guarantee Adversarial Robustness
When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining
THE SELF-RE-WATERMARKING TRAP: FROM EXPLOIT TO RESILIENCE
STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
Generalized Parallel Scaling with Interdependent Generations
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
Watermark-based Attribution of AI-Generated Images
The Forecast After the Forecast: A Post-Processing Shift in Time Series
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
What Happens Next? Anticipating Future Motion by Generating Point Trajectories
STDDN: A Physics-Guided Deep Learning Framework for Crowd Simulation
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
LLM Fingerprinting via Semantically Conditioned Watermarks
Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
DADA: Dual Averaging with Distance Adaptation
FutureFill: Fast Generation from Convolutional Sequence Models
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
Controllable diffusion-based generation for multi-channel biological data
Physics vs Distributions: Pareto Optimal Flow Matching with Physics Constraints
AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
Visual Jigsaw Post-Training Improves MLLMs
T1: Tool-integrated Verification for Test-time Compute Scaling in Small Language Models
CatalystBench: A Comprehensive Multi-Task Benchmark for Advancing Language Models in Catalysis Science
Best-of-N through the Smoothing Lens: KL Divergence and Regret Analysis
Temporal Slowness in Central Vision Drives Semantic Object Learning
Test-Time Scaling with Reflective Generative Model
RainPro-8: An Efficient Deep Learning Model to Estimate Rainfall Probabilities Over 8 Hours
OR-PRM: A Process Reward Model for Algorithmic Problem in Operations Research
Agent Data Protocol
Spectral Bellman Method: Unifying Representation and Exploration in RL
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion
PAMDP: Interact to Persona Alignment via a Partially Observable Markov Decision Process
PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
FLoC: Facility Location-Based Efficient Visual Token Compression for Long Video Understanding
RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots
Dual Distillation for Few-Shot Anomaly Detection
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Moving Beyond Medical Exams: A Clinician-Annotated Fairness Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Multi-modal Data Spectrum: Multi-modal Datasets are Multi-dimensional
Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme
Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
NeRV-Diffusion: Diffuse Implicit Neural Representation for Video Synthesis
MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
Kimi-Dev: Agentless Training as Skill Prior for SWE-agents
On the identifiability of causal graphs with multiple environments
PU-BENCH: A UNIFIED BENCHMARK FOR RIGOROUS AND REPRODUCIBLE PU LEARNING
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
PerfGuard: A Performance-Aware Agent for Visual Content Generation
CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting
Continuous multinomial logistic regression for neural decoding
Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
WRING Out The Bias: A Rotation-Based Alternative To Projection Debiasing
OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
Sequential Information Bottleneck Fusion: Towards Robust and Generalizable Multi-Modal Brain Tumor Segmentation
DiffSDA: Unsupervised Diffusion Sequential Disentanglement Across Modalities
DiffPBR: Point-Based Rendering via Spatial-Aware Residual Diffusion
FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation
FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models
Vision-Language-Action Instruction Tuning: From Understanding to Manipulation
Any-Order Flexible Length Masked Diffusion
Sequential Parallel Duality in Prefix Scannable Models
Memory-Free Continual Learning with Null Space Adaptation for Zero-Shot Vision-Language Models
Resurfacing the Instance-only Dependent Label Noise Model through Loss Correction
Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
Learning is Forgetting; LLM Training As Lossy Compression
Muon Outperforms Adam in Tail-End Associative Memory Learning
Latent Stochastic Interpolants
Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction
Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer
Is On-Policy Data always the Best Choice for Direct Preference Optimization-Based LM Alignment?
Efficient-LVSM: Faster, Cheaper, and Better Large View Synthesis Model via Decoupled Co-Refinement Attention
Noisy-Pair Robust Representation Alignment for Positive-Unlabeled Learning
Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
Adaptive Logit Adjustment for Debiasing Multimodal Language Models
Play to Generalize: Learning to Reason Through Game Play
The Seismic Wavefield Common Task Framework
Jet Expansions: Restructuring LLM Computation for Model Inspection
Thompson Sampling via Fine-Tuning of LLMs
PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
Weight Space Representation Learning on Diverse NeRF Architectures
Robust Selective Activation with Randomized Temporal K-Winner-Take-All in Spiking Neural Networks for Continual Learning
Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models
COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics
Offline Reinforcement Learning with Adaptive Feature Fusion
EditAnyShape: Shape-Aware Image Editing via Trajectory-Guided Region Control
Can Large Language Models Match the Conclusions of Systematic Reviews?
Routing Channel-Patch Dependencies in Time Series Forecasting with Graph Spectral Decomposition
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
Action-Guided Attention for Video Action Anticipation
Learning residue level protein dynamics with multiscale Gaussians
Pallatom-Ligand: an All-Atom Diffusion Model for Designing Ligand-Binding Proteins
Text summarization via global structure awareness
CREPE: Controlling diffusion with REPlica Exchange
Relationship Alignment for View-aware Multi-view Clustering
RankFlow: Property-aware Transport for Protein Optimization
Adapt Data to Model: Adaptive Transformation Optimization for Domain-shared Time Series Foundation Models
STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
Fusing Pixels and Genes: Spatially-Aware Learning in Computational Pathology
GRACE: A Language Model Framework for Explainable Inverse Reinforcement Learning
Robust Fine-tuning of Vision-Language-Action Robot Policies via Parameter Merging
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
ScalingCache: Extreme Acceleration of DiTs through Difference Scaling and Dynamic Interval Caching
Estimating Semantic Alphabet Size for LLM Uncertainty Quantification
Hierarchical Multi-Scale Molecular Conformer Generation with Structural Awareness
MAPSS: Manifold-based Assessment of Perceptual Source Separation
Learnable Fractional Superlets with a Spectro-Temporal Emotion Encoder for Speech Emotion Recognition
Breaking Gradient Temporal Collinearity for Robust Spiking Neural Networks
FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models
Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment
OWLEYE: ZERO-SHOT LEARNER FOR CROSSDOMAIN GRAPH DATA ANOMALY DETECTION
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
Exponential-Wrapped Mechanisms: Differential Privacy on Hadamard Manifolds Made Practical
Spectrum Tuning: Post-Training for Distributional Coverage and In-Context Steerability
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
UniHand: A Unified Model for Diverse Controlled 4D Hand Motion Modeling
DTP: Delta-Guided Two Stage Pruning for Mamba-based Multimodal Large Language Models
What Matters for Batch Online Reinforcement Learning in Robotics?
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
KinemaDiff: Towards Diffusion for Coherent and Physically Plausible Human Motion Prediction
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Models
Exploring Diverse Generation Paths via Inference-time Stiefel Activation Steering
MotionStream: Real-Time Video Generation with Interactive Motion Controls
SumRA: Parameter Efficient Fine-tuning with Singular Value Decomposition and Summed Orthogonal Basis
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
From atom to space: A region-based readout function for spatial properties of materials
ORCaS: Unsupervised Depth Completion via Occluded Region Completion as Supervision
HARDTESTGEN: A High-Quality RL Verifier Generation Pipeline for LLM Algorithmic Coding
Splat the Net: Radiance Fields with Splattable Neural Primitives
Steering MoE LLMs via Expert (De)Activation
Hedonic Neurons: A Mechanistic Mapping of Latent Coalitions in Transformer MLPs
Hallucination-aware Intermediate Representation Editing in Large Vision-Language Models
Explainable Mixture Models through Differentiable Rule Learning
Sampling-aware Adversarial Attacks Against Large Language Models
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Reevaluating Policy Gradient Methods for Imperfect-Information Games
Adaptive Conformal Prediction via Mixture-of-Experts Gating Similarity
Escaping Policy Contraction: Contraction-Aware PPO (CaPPO) for Stable Language Model Fine-Tuning
Partition Generative Modeling: Masked Modeling Without Masks
Revisiting Nonstationary Kernel Design for Multi-Output Gaussian Processes
GraphShield: Graph-Theoretic Modeling of Network-Level Dynamics for Robust Jailbreak Detection
GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models
Modeling Others' Minds as Code
Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
HiTeA: Hierarchical Temporal Alignment for Training-Free Long-Video Temporal Grounding
Every Language Model Has a Forgery-Resistant Signature
LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference
Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization
Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
Mobile-GS: Real-time Gaussian Splatting for Mobile Devices
EffiVMT: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning
Solving Football by Exploiting Equilibrium Structure of 2p0s Differential Games with One-Sided Information
AutoDA-Timeseries: Automated Data Augmentation for Time Series
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use
Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
ADEPT: Continual Pretraining via Adaptive Expansion and Dynamic Decoupled Tuning
Q-Learning with Adjoint Matching
DeepAFL: Deep Analytic Federated Learning
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
Don't Forget Its Variance! The Minimum Path Variance Principle for Accurate and Stable Score-Based Density Ratio Estimation
Trust The Typical
Eliciting Harmful Capabilities by Fine-Tuning on Safeguarded Outputs
Image is All You Need: Towards Efficient and Effective Large Language Model-Based Recommender Systems
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Flow Caching for Autoregressive Video Generation
ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
Intrinsic Entropy of Context Length Scaling in LLMs
FAST‑DIPS: Adjoint‑Free Analytic Steps and Hard‑Constrained Likelihood Correction for Diffusion‑Prior Inverse Problems
Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems
Don't Shift the Trigger: Robust Gradient Ascent for Backdoor Unlearning
Boosting Open Set Recognition Performance through Modulated Representation Learning
From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
Constraint Matters: Multi-Modal Representation for Reducing Mixed-Integer Linear programming
GEM: A Gym for Generalist LLMs
Any-Order Any-Subset AutoRegressive Model
FlowSymm: Physics–Aware, Symmetry–Preserving Graph Attention for Network Flow Completion
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
A universal compression theory: Lottery ticket hypothesis and superpolynomial scaling laws
GenDR: Lighten Generative Detail Restoration
Knowledgeable Language Models as Black-Box Optimizers for Personalized Medicine
Graph Signal Processing Meets Mamba2: Adaptive Filter Bank via Delta Modulation
cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
Building spatial world models from sparse transitional episodic memories
ConvT3: Structured State Kernels for Convolutional State Space Models
InnovatorBench: Evaluating Agents’ Ability to Conduct Innovative AI Research
Beyond Grid-Locked Voxels: Neural Response Functions for Continuous Brain Encoding
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension
Critical Confabulation: Can LLMs Hallucinate for Social Good?
Hierarchical Prototype Learning for Semantic Segmentation
xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
CAPSUL: A Comprehensive Human Protein Benchmark for Subcellular Localization
Knowledge Fusion of Large Language Models via Modular SkillPacks
Error as Signal: Stiffness-Aware Diffusion Sampling via Embedded Runge-Kutta Guidance
SAIL: Self-Amplified Iterative Learning for Diffusion Model Alignment with Minimal Human Feedback
ReTrace: Reinforcement Learning-Guided Reconstruction Attacks on Machine Unlearning
Multi-ReduNet: Interpretable Class-Wise Decomposition of ReduNet
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
Repurposing Foundation Model for Generalizable Medical Time Series Classification
Neural Force Field: Few-shot Learning of Generalized Physical Reasoning
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
LeanForPhysics: Comprehensive Reasoning Framework for University-level Physics in Lean4
ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks
Improving Diffusion Models for Class-imbalanced Training Data via Capacity Manipulation
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
LLM Pretraining with Continuous Concepts
Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
Planned Diffusion
EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations
RL for Reasoning by Adaptively Revealing Rationales
Rethinking Uncertainty Estimation in LLMs: A Principled Single-Sequence Measure
DistillKac: Few-Step Image Generation via Damped Wave Equations
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Explainable LLM Unlearning through Reasoning
RedacBench: Can AI Erase Your Secrets?
Flatter Tokens are More Valuable for Speculative Draft Model Training
CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework
Cascadia: An Efficient Cascade Serving System for Large Language Models
Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach
SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery
Efficient algorithms for Incremental Metric Bipartite Matching
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
$\alpha$-DPO: Robust Preference Alignment for Diffusion Models via $\alpha$ Divergence
Expert Divergence Learning for MoE-based Language Models
Visual Planning: Let's Think Only with Images
Flow Actor-Critic for Offline Reinforcement Learning
Causally Robust Preference Learning with Reasons
Attend to the Active: Structure-Aware Dynamic Attention in LLMs for Compositional Instruction Following
Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction
Primary-Fine Decoupling for Action Generation in Robotic Imitation
Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection
Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
Learning-Augmented Moment Estimation on Time-Decay Models
One-Step Video Restoration via Diffusion Adversarial Post-Training
SIGMA-GEN: STRUCTURE AND IDENTITY GUIDED MULTI-SUBJECT ASSEMBLY FOR IMAGE GENERATION
PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits
Demystifying Deep Search: A Holistic Evaluation with Hint-free Multi-Hop Questions and Factorised Metrics
Scaling Laws Revisited: Modeling the Role of Data Quality in Language Model Pretraining
ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations
ViPO: Visual Preference Optimization at Scale
Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics
CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design
Structure-Aware Graph Hypernetworks for Neural Program Synthesis
Optimal Aggregation of LLM and PRM Signals for Efficient Test-Time Scaling
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Lossless Vocabulary Reduction for Auto-Regressive Language Models
CauKer: Classification Time Series Foundation Models Can Be Pretrained on Synthetic Data
Slow-Fast Policy Optimization: Reposition-Before-Update for LLM Reasoning
The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective
Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks
A Noise is Worth Diffusion Guidance
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs
WithAnyone: Toward Controllable and ID Consistent Image Generation
Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data
DreamSwapV: Mask-guided Subject Swapping for Any Customized Video Editing
Traceable Black-Box Watermarks For Federated Learning
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning
Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification
Quantized Gradient Projection for Memory-Efficient Continual Learning
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
Learning More with Less: A Dynamic Dual-Level Down-Sampling Framework for Efficient Policy Optimization
Conformalized Decision Risk Assessment
Codified Finite-state Machines for Role-playing
RAR: Reversing Visual Attention Re-Sinking for Unlocking Potential in Multimodal Large Language Models
Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks
ContextBench: Modifying Contexts for Targeted Latent Activation and Behaviour Elicitation
NextQuill: Causal Preference Modeling for Enhancing LLM Personalization
Landing with the Score: Riemannian Optimization through Denoising
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Disentangled Representation Learning for Parametric Partial Differential Equations
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
Latent Wasserstein Adversarial Imitation Learning
OpenAgentSafety: A Comprehensive Framework For Evaluating Real-World AI Agent Safety
ReWatch-R1: Boosting Complex Video Reasoning in Large Vision-Language Models through Agentic Data Synthesis
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning
Contact-guided Real2Sim from Monocular Video with Planar Scene Primitives
CoMem: Compositional Concept-Graph Memory for Continual Vision–Language Learning
GPS: Directed Acyclic Graph guided Proactive Information Seeking in Large Language Models
LSA: Layer-wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
ProRe: A Proactive Reward System for GUI Agents via Reasoner–Actor Collaboration
Kevin: Multi-Turn RL for Generating CUDA Kernels
Revisiting Long-context Modeling from Context Denoising Perspective
VideoAgentTrek: Computer-Use Pretraining from Unlabeled Videos
Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport
What matters for Representation Alignment: Global Information or Spatial Structure?
Meta-Adaptive Prompt Distillation for Few-Shot Visual Question Answering
Programming by Backprop: Learning Behaviour from Symbolic Descriptions
Low-Latency Neural LiDAR Compression with 2D Context Models
SYNC: Measuring and Advancing Synthesizability in Structure-Based Drug Design
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Model
THEMIS: Towards Holistic Evaluation of MLLMs for Scientific Paper Fraud Forensics
Permutation-Consistent Variational Encoding for Incomplete Multi-View Multi-Label Classification
Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification
GaitSnippet: Gait Recognition Beyond Unordered Sets and Ordered Sequences
Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
Mixture of Contexts for Long Video Generation
Improving LLM-based Global Optimization with Search Space Partitioning
Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
Human3R: Everyone Everywhere All at Once
InfoBridge: Mutual Information estimation via Bridge Matching
Parameterized Hardness of Zonotope Containment and Neural Network Verification
PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
I-DRUID: Layout to image generation via instance-disentangled representation and unpaired data
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching
DeepEyesV2: Toward Agentic Multimodal Model
Sign-SGD via Parameter-Free Optimization
Code Driven Planning with Domain-Adaptive Selector
WoW!: World Models in a Closed-Loop World
Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds
Grounding-IQA: Grounding Multimodal Language Model for Image Quality Assessment
COSMO-INR: Complex Sinusoidal Modulation for Implicit Neural Representations
Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval
IceCache: Memory-Efficient KV-cache Management for Long-Sequence LLMs
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark
From Pixels to Semantics: Unified Facial Action Representation Learning for Micro-Expression Analysis
Opponent Shaping in LLM Agents
MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
How to train data-efficient LLMs
Learning From the Past with Cascading Eligibility Traces
PepTri: Tri-Guided All-Atom Diffusion for Peptide Design via Physics, Evolution, and Mutual Information
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
The Polar Express: Optimal Matrix Sign Methods and their Application to the Muon Algorithm
SAQ: Stabilizer-Aware Quantum Error Correction Decoder
In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes
Reducing Semantic Mismatch in Brain-to-Text Decoding Through Personalized Multimodal Masking
Iterative Training of Physics-Informed Neural Networks with Fourier-enhanced Features
Getting Your LLMs Ready for Reinforcement Learning with Lightweight SFT
Sparling: End-to-End Spatial Concept Learning via Extremely Sparse Activations
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
Causal-Steer: Disentangled Continuous Style Control without Parallel Corpora
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detections
Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
Depth Anything 3: Recovering the Visual Space from Any Views
Channel-Aware Mixed-Precision Quantization for Efficient Long-Context Inference
SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
Flipping the Dialogue: Training and Evaluating User Language Models
Incorporating Expert Priors into Bayesian Optimization via Dynamic Mean Decay
Neural Optimal Transport Meets Multivariate Conformal Prediction
NGS-Marker: Robust Native Watermarking for 3D Gaussian Splatting
DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
AutoMetrics: Approximate Human Judgments with Automatically Generated Evaluators
ReactID: Synchronizing Realistic Actions and Identity in Personalized Video Generation
Latent Adaptation of Foundation Policies for Sim-to-Real Transfer
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
GenCompositor: Generative Video Compositing with Diffusion Transformer
Generative Blocks World: Moving Things Around in Pictures
Expertise Can Be Helpful for Reinforcement Learning-based Macro Placement
GoR: A Unified and Extensible Generative Framework for Ordinal Regression
How Catastrophic is Your LLM? Certifying Risk in Conversation
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
Sharp Monocular View Synthesis in Less Than a Second
The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
Model-based Offline RL via Robust Value-Aware Model Learning with Implicitly Differentiable Adaptive Weighting
Active Learning of 3D Gaussian Splatting with Consistent Region Partition and Robust Pose Estimation
Latent Geometry-Driven Network Automata for Complex Network Dismantling
Benchmarking LLM Tool-Use in the Wild
Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding
ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
Death of the Novel(ty): Beyond N-Gram Novelty as a Metric for Textual Creativity
Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Seq vs Seq: An Open Suite of Paired Encoders and Decoders
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
Pitfalls in Evaluating Language Model Forecasters
RAVEN: End-to-end Equivariant Robot Learning with RGB Cameras
S2GO: Streaming Sparse Gaussian Occupancy
Robust Equation Structure learning with Adaptive Refinement
Dynamic Classifier-Free Diffusion Guidance via Online Feedback
VideoNSA: Native Sparse Attention Scales Video Understanding
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Bridging the Gap Between Promise and Performance for FP4 Quantization
Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models
DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
Softmax is not Enough (for Adaptive Conformal Classification)
Prompt-MII: Meta-Learning Instruction Induction for LLMs
A Statistical Benchmark for Diffusion Posterior Sampling Algorithms
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models
OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation
Your VAR Model is Secretly an Efficient and Explainable Generative Classifier
AutoTool: Automatic Scaling of Tool-Use Capabilities in RL via Decoupled Entropy Constraints
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
Weight-Space Linear Recurrent Neural Networks
VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
PICS: Pairwise Image Compositing with Spatial Interactions
Unlocking the Potential of Weighting Methods in Federated Learning Through Communication Compression
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
MathNet: A Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
AlphaSAGE: Structure-Aware Alpha Mining via GFlowNets for Robust Exploration
Predictive Differential Training Guided by Training Dynamics
On the Interaction of Compressibility and Adversarial Robustness
Symmetry-Aware Bayesian Optimization via Max Kernels
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
Generalization Below the Edge of Stability: The Role of Data Geometry
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
MIRACLE: Model-free Imitation and Reinforcement Learning for Adaptive Cut-Selection
Motion-Aligned Word Embeddings for Text-to-Motion Generation
Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Fracture-GS: Dynamic Fracture Simulation with Physics-Integrated Gaussian Splatting
Synthesising Counterfactual Explanations via Label-Conditional Gaussian Mixture Variational Autoencoders
Dynamic Multi-sample Mixup with Gradient Exploration for Open-set Graph Anomaly Detection
Why Keep Your Doubts to Yourself? Trading Visual Uncertainties in Multi-Agent Bandit Systems
MnemoDyn: Learning Resting State Dynamics from $40$K FMRI sequences
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems
CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting
Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment
TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices
Towards True Speech-to-Speech Models Without Text Guidance
Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
Captain Cinema: Towards Short Movie Generation
Translate Policy to Language: Flow Matching Generated Rewards for LLM Explanations
KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning
R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability
Prompt Curriculum Learning for Efficient LLM Post-Training
Aegis: Automated Error Generation and Identification for Multi-Agent Systems
Take Note: Your Molecular Dataset Is Probably Aligned
On The Surprising Effectiveness of a Single Global Merging in Decentralized Learning
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
Towards Generalizable PDE Dynamics Forecasting via Physics-Guided Invariant Learning
Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
FastGRPO: Accelerating Policy Optimization via Concurrency-aware Speculative Decoding and Online Draft Learning
Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes
QeRL: Beyond Efficiency - Quantization-enhanced Reinforcement Learning for LLMs
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Method
Clustering by Denoising: Latent plug-and-play diffusion for single-cell embeddings
Lightweight Transformer for EEG Classification via Balanced Signed Graph Algorithm Unrolling
Teaching LLMs to Admit Uncertainty in OCR
Preference-based Policy Optimization from Sparse-reward Offline Dataset
Your Language Model Secretly Contains Personality Subnetworks
DiVE-k: Differential Visual Reasoning for Fine-Grained Image Recognition
Controllable Sequence Editing for Biological and Clinical Trajectories
Information Estimation with Discrete Diffusion
Directional Textual Inversion for Personalized Text-to-Image Generation
Chunking the Critic: A Transformer-based Soft Actor-Critic with N-Step Returns
Diverse Dictionary Learning
Learning Dynamic Causal Graphs Under Parametric Uncertainty via Polynomial Chaos Expansions
Positional Encoding Field
Value Gradient Flow: Behavior-Regularized RL without Regularization
Demystifying Supervision Data Generalization in Multimodal LMs
Native Adaptive Solution Expansion for Diffusion-based Combinatorial Optimization
FeDaL: Federated Dataset Learning for General Time Series Foundation Models
CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis
Cultivating Pluralism In Algorithmic Monoculture: The Community Alignment Dataset
FreeViS: Training-free Video Stylization with Inconsistent References
Embedding-Based Context-Aware Reranker
DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts
Rethinking LLM Evaluation: Can We Evaluate LLMs with 200× Less Data?
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
ShinkaEvolve: Towards Open-Ended and Sample-Efficient Program Evolution
Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences
On the Universality and Complexity of GNN for Solving Second-order Cone Programs
Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations
Transformers Learn Latent Mixture Models In-Context via Mirror Descent
Dr.LLM: Dynamic Layer Routing in LLMs
Q&C: When Quantization Meets Cache in Efficient Generation
Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models
Beyond Spectra: Eigenvector Overlaps in Loss Geometry
UniOD: A Universal Model for Outlier Detection across Diverse Domains
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Geometric Graph Neural Diffusion for Stable Molecular Dynamics
Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation
SigLIP-HD by Fine-to-Coarse Supervision
Characterizing the Discrete Geometry of ReLU Networks
IA2: Alignment with ICL Activations improves Supervised Fine-Tuning
Synthetic Bootstrapped Pretraining
Free Point-wise Anomaly Detection via Fold-bifurcation
Globally aware optimization with resurgence
Distributionally Robust Optimization via Generative Ambiguity Modeling
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
Relative Value Learning
Quasi-Monte Carlo Methods Enable Extremely Low-Dimensional Deep Generative Models
Split Happens (But Your Video Model Can Be Edited)
Hourglass Persistence for Graphs, Simplices, and Cells
Estimating Worst-Case Frontier Risks of Open-Weight LLMs
Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
Bandit Learning in Matching Markets Robust to Adversarial Corruptions
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
SGD with Adaptive Preconditioning: Unified Analysis and Momentum Acceleration
Queue Length Regret Bounds for Contextual Queueing Bandits
Activation Steering with a Feedback Controller
Reinforcing Diffusion Models by Direct Group Preference Optimization
Scaling Laws for Diffusion Transformers
Consistency Geodesic Bridge: Image Restoration with Pretrained Diffusion Models
Enforcing Axioms for AI Alignment under Loss-Based Rules
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
BBQ: Boosting Quantization Entropy with Bell Box Quantization
Online time series prediction using feature adjustment
GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings
Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction
Asynchronous Policy Gradient Aggregation for Efficient Distributed Reinforcement Learning
Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
Hyperbolic Aware Minimization: Implicit Bias for Sparsity
Off-Policy Safe Reinforcement Learning with Cost-Constrained Optimistic Exploration
Variance-Dependent Regret Lower Bounds for Contextual Bandits
Analyzing and Evaluating Unbiased Language Model Watermark
Rate-Distortion Optimized Communication for Collaborative Perception
Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts
Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations
SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation
Towards Learned Optimization Free Lunch
"Noisier" Noise Contrastive Estimation is (Almost) Maximum Likelihood
Consistent Low-Rank Approximation
On Smoothness Bounds for Non-Clairvoyant Scheduling with Predictions
Boolean Satisfiability via Imitation Learning
Conjuring Semantic Similarity
Online Alignment as Perceptual Loss
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
Efficient Turing Machine Simulation with Transformers
AntigenLM: Structure-Aware DNA Language Modeling for Influenza
Causal Discovery in the Wild: A Voting-Theoretic Ensemble Approach
Temporal Geometry of Deep Networks: Hyperbolic Representations of Training Dynamics for Intrinsic Explainability
Mean-Field Neural Differential Equations: A Game-Theoretic Approach to Sequence Prediction
RegionReasoner: Region-Grounded Multi-Round Visual Reasoning
On Entropy Control in LLM-RL Algorithms
Enhancing Multivariate Time Series Forecasting with Global Temporal Retrieval
Activation Function Design Sustains Plasticity in Continual Learning
Beyond Instance-Level Alignment: Dual-Level Optimal Transport for Audio-Text Retrieval
Reusing Pre-Training Data at Test Time is a Compute Multiplier
Scalable Offline Model-Based RL with Action Chunks
Scalable Random Wavelet Features: Efficient Non-Stationary Kernel Approximation with Convergence Guarantees
Training-Free Determination of Network Width via Neural Tangent Kernel
Primal-Dual Policy Optimization for Adversarial Linear CMDPs
Measure Twice, Cut Once: A Semantic-Oriented Approach to Video Temporal Localization with Video LLMs
Reducing Symmetry Increase in Equivariant Neural Networks
Is In-Context Learning Learning?
COSA: Context-aware Output-Space Adapter for Test-Time Adaptation in Time Series Forecasting
Learning What Matters Now: Dynamic Preference Inference under Contextual Shifts
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
FlexHiNM-GP: Flexible Hierarchical Pruning via Region Allocation and Channel Permutation
Pareto Variational Autoencoder
ROGA: Scaling Generalist Agents for Office Productivity Tasks via Tool Generation
PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
How to Lose Inherent Counterfactuality in Reinforcement Learning
Quantization-Aware Diffusion Models For Maximum Likelihood Training
Analytica: Soft Propositional Reasoning for Robust and Scalable LLM-Driven Analysis
Efficient Approximate Posterior Sampling with Annealed Langevin Monte Carlo
Interactive Learning of Single-Index Models via Stochastic Gradient Descent
Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
SNaX: sparse narrow accelerated mixture of experts
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
Elastic Optimal Transport: Theory, Application, and Empirical Evaluation
Diversity-Aware Online Prompt Assignment to Generative Models
Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation
RigidSSL: Rigidity-based Geometric Pretraining for Protein Generation
M$^2$-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
Self-Augmented Visual Contrastive Decoding
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning
String Seed of Thought: Prompting LLMs for Distribution-Faithful and Diverse Generation
Almost Bayesian: Dynamics of SGD Through Singular Learning Theory
Data-to-Energy Stochastic Dynamics
WILD-Diffusion: A WDRO Inspired Training Method for Diffusion Models under Limited Data
CRONOS: Continuous time reconstruction for 4D medical longitudinal series
Bidirectional Predictive Coding
UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes
Multi-Agent Debate with Memory Masking
Graph Tokenization for Bridging Graphs and Transformers
Reconstruct Anything Model: a lightweight foundation model for computational imaging
Noise Stability of Transformer Models
Deep Hierarchical Learning with Nested Subspace Networks
The Softmax Bottleneck Does Not Limit the Probabilities of the Most Likely Tokens
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Architectures
Gumbel Distillation for Parallel Text Generation
$\mathbf{Li_2}$: A Framework on Dynamics of Feature Emergence and Delayed Generalization
Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry
Noise-Aware Generalization: Robustness to In-Domain Noise and Out-of-Domain Generalization
Generative View Stitching
Fantastic Pretraining Optimizers and Where to Find Them
Diversified Multinomial Logit Contextual Bandits
How NOT to benchmark your SITE metric: Beyond Static Leaderboards and Towards Realistic Evaluation
Conditioned Initialization for Attention
Theory of Scaling Laws for In-Context Regression: Depth, Width, Context and Time
Deep Think with Confidence
Pose Prior Learner: Unsupervised Categorical Prior Learning for Pose Estimation
Cyber-Zero: Training Cybersecurity Agents without Runtime
Multi-Head Low-Rank Attention
Bridging ML and algorithms: comparison of hyperbolic embeddings
Uniform Discrete Diffusion with Metric Path for Video Generation
Learning from Label Proportions via Proportional Value Classification
A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
PSDNorm: Temporal Normalization for Deep Learning in Sleep Staging
Taming Hierarchical Image Coding Optimization: A Spectral Regularization Perspective
INTIMA: A Benchmark for Human-AI Companionship Behavior
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
A Study of Posterior Stability in Time-Series Latent Diffusion
GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes
MoSA: Mosaic Shared Adaptation of Large Language Models
Segment Any Events with Language
Compositional Generalization through Gradient Search in Nonparametric Latent Space
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
Q-Learning with Fine-Grained Gap-Dependent Regret
Gradient-Normalized Smoothness for Optimization with Approximate Hessians
Efficient Reinforcement Learning by Guiding World Models with Non-Curated Data
AutoDV: An End-to-End Deep Learning Model for High-Dimensional Data Visualization
Rethinking Continual Learning with Progressive Neural Collapse
Flow-based Conformal Prediction for Multi-dimensional Time Series
Covariate-Guided Clusterwise Linear Regression for Generalization to Unseen Data
Glance for Context: Learning When to Leverage LLMs for Node-Aware GNN-LLM Fusion
Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
FLOWER: A Flow-Matching Solver for Inverse Problems
Soft Tokens, Hard Truths
Who Matters Matters: Agent-Specific Conservative Offline MARL
Learning Distributions over Permutations and Rankings with Factorized Representations
Control Tax: The Price of Keeping AI in Check
Learning to Reason without External Rewards
Riemannian Federated Learning via Averaging Gradient Streams
GNN-as-Judge: Unleashing the Power of LLMs for Graph Few-shot Semi-supervised Learning with GNN Feedback
PAC-Bayes bounds for cumulative loss in Continual Learning
Learning in Prophet Inequalities with Noisy Observations
Unified Vision–Language Modeling via Concept Space Alignment
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
Memory-Statistics Tradeoff in Continual Learning with Structural Regularization
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
Distributionally Robust Linear Regression with Block Lewis Weights
Referring Layer Decomposition
Pay Attention to CTC: Fast and Robust Pseudo-Labelling for Unified Speech Recognition
Identifiability Challenges in Sparse Linear Ordinary Differential Equations
Beyond Sequential Reranking: Reranker-Guided Search Improves Reasoning Intensive Retrieval
Evaluating SAE interpretability without generating explanations
AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory
Sparse Autoencoders Trained on the Same Data Learn Different Features
Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
Q-learning with Posterior Sampling
Pretraining Scaling Laws for Generative Evaluations of Language Models
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Expressiveness of Multi-Neuron Convex Relaxations in Neural Network Certification
SciNav: A Principled Agent Framework for Scientific Coding Tasks
Provable Separations between Memorization and Generalization in Diffusion Models
TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
LatentQA: Teaching LLMs to Decode Activations Into Natural Language
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
QLIP: A Dynamic Quadtree Vision Prior Enhances MLLM Performance Without Retraining
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning
Diffusion-DFL: Decision-focused Diffusion Models for Stochastic Optimization
On The Fragility of Benchmark Contamination Detection in Reasoning Models
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
EquAct: An SE(3)-Equivariant Multi-Task Transformer for 3D Robotic Manipulation
Pretraining with hierarchical memories: separating long-tail and common knowledge
Jailbreak Transferability Emerges from Shared Representations
Distributed Algorithms for Euclidean Clustering
Pursuing Minimal Sufficiency in Spatial Reasoning
jqBench: a benchmark for reading and editing JSON from natural language and/or examples
Gradient-Based Program Synthesis with Neurally Interpreted Languages
Go-Browse: Training Web Agents with Structured Exploration
Scalable Chain of Thoughts via Elastic Reasoning
Efficient Estimation of Kernel Surrogate Models for Task Attribution
Cautious Optimizers: Improving Training with One Line of Code
Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances
Riemannian Optimization on Relaxed Indicator Matrix Manifold
Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows
Block-sample MAC-Bayes generalization bounds
Not All Documents Are What You Need for Extracting Instruction Tuning Data
Dual-Scale World Models for LLM Agents towards Hard-Exploration Problems
How hard is learning to cut? Trade-offs and sample complexity
Gauge Flow Matching: Efficient Constrained Generative Modeling over General Convex Set and Beyond
Learning to Play Multi-Follower Bayesian Stackelberg Games
Diffusion Transformers with Representation Autoencoders
Soft Equivariance Regularization for Invariant Self-Supervised Learning
Secure Inference for Diffusion Models via Unconditional Scores
Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance
Discrete Adjoint Matching
Self-Rewarding Vision-Language Model via Reasoning Decomposition and Multi-Reward Policy Optimization
Efficient Agent Training for Computer Use
GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models
AdaSpec: Adaptive Spectrum for Enhanced Node Distinguishability
MaskCO: Masked Generation Drives Effective Representation Learning and Exploiting for Combinatorial Optimization
MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Toward Conservative Planning from Preferences in Offline Reinforcement Learning
Reasoning without Training: Your Base Model is Smarter Than You Think
Combinatorial Bandit Bayesian Optimization for Tensor Outputs
Language Identification in the Limit with Computational Trace
Cartridges: Lightweight and general-purpose long context representations via self-study
Depth Anything with Any Prior
Egalitarian Gradient Descent: A Simple Approach to Accelerated Grokking
Multimodal Classification via Total Correlation Maximization
What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data
Improved $\ell_{p}$ Regression via Iteratively Reweighted Least Squares
Multi-Task Low-Rank Model Adaptation
Enhancing LLMs for Knowledge Base Question Answering by Chain-of-Decomposition
To View Transform or Not to View Transform: NeRF-based Pre-training Perspective
Identifiability and recoverability in self-supervised models
Beyond Short Steps in Frank-Wolfe Algorithms
On the trade-off between expressivity and privacy in graph representation learning
Nonparametric Contextual Online Bilateral Trade
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
SONIC: Spectral Oriented Neural Invariant Convolutions
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
Robust Preference Alignment via Directional Neighborhood Consensus
Merge before Forget: A Single LoRA Continual Learning via Continual Merging
Taming the Fragility of KV Cache Eviction in LLM Inference
PROS: Towards Compute-Efficient RLVR via Rollout Prefix Reuse
Hidden Patterns in Chain-of-Thought Reasoning
Distractor-free Generalizable 3D Gaussian Splatting
ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting
Hierarchical Multi-Stage Recovery Framework for Kronecker Compressed Sensing
Implicit Sensing for Fourier Sparse Boolean Functions
gLSTM: Mitigating Over-Squashing by Increasing Storage Capacity
Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Poisson Midpoint Method for Log Concave Sampling: Beyond the Strong Error Lower Bounds
Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
LongRLVR: Long-Context Reinforcement Learning Requires Verifiable Context Rewards
Quantum machine learning advantages beyond hardness of evaluation
Convergence of Actor-Critic gradient flow for entropy regularised MDPs in general spaces
An Optimal Diffusion Approach to Quadratic Rate-Distortion Problems: New Solution and Approximation Methods
Confident and Adaptive Generative Speech Recognition via Conformal Risk Control
Dimension-Free Decision Calibration for Nonlinear Loss Functions
A New Initialization to Control Gradients in Sinusoidal Neural Networks
Rethinking Code Similarity for Automated Algorithm Design with LLMs
What happens when generative AI models train recursively on each others' outputs?
Free Energy Mixer
Continuous Audio Language Models
A Primer on SO(3) Action Representations in Deep Reinforcement Learning
KV-Cache Transform Coding for Compact Storage in LLM Inference
Optimizing Canaries for Privacy Auditing with Metagradient Descent
H$^3$GNNs: Harmonizing Heterophily and Homophily in GNNs via Self-Supervised Node Encoding
EditLens: Quantifying the Extent of AI Editing in Text
Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas
Unified Analyses for Hierarchical Federated Learning: Topology Selection under Data Heterogeneity
Sharing State Between Prompts and Programs
ConfHit: Conformal Generative Design via Nested Testing
Verifier-free Test-Time Sampling for Vision Language Action Models
Statistical Guarantees for Offline Domain Randomization
Training Large Language Models To Reason In Parallel With Global Forking Tokens
On learning linear dynamical systems in context with attention layers
Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling
Stochastic Neural Networks for Causal Inference with Missing Confounders
MATHMO: Automated Mathematical Modeling Through Adaptive Search
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
Fair Policy Aggregation from Standard Policy Optimization
On the Interpolation Effect of Score Smoothing in Diffusion Models
Token Distillation: Attention-Aware Input Embeddings for New Tokens
Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models in Biomolecular Design
Inter-Agent Relative Representations for Multi-Agent Option Discovery
Time-Gated Multi-Scale Flow Matching for Time-Series Imputation
GraphPlanner: Graph-Based Agentic Routing for LLMs
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
Self-Destructive Language Models
LLMs Can Hide Text in Other Text of the Same Length
Skirting Additive Error Lower Bounds for Private Turnstile Streams
Learning to Reason over Continuous Tokens with Reinforcement Learning
Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
Persona Features Control Emergent Misalignment
Automata Learning and Identification of the Support of Language Models
Prediction with Expert Advice under Local Differential Privacy
Neologism Learning for Controllability and Self-Verbalization
OpenApps: Simulating Environment Variations to Measure UI Agent Reliability
In-Context Learning for Pure Exploration
An Information-Theoretical Framework For Optimizing Experimental Design To Distinguish Probabilistic Neural Codes
All Code, No Thought: Language Models Struggle to Reason in Ciphered Language
Learning Shrinks the Hard Tail: Training-Dependent Inference Scaling in a Solvable Linear Model
Efficient Testing for Correlation Clustering: Improved Algorithms and Optimal Bounds
Intrinsic Explanation of Random Subspace Method for Enhanced Security Applications
COLD-Steer: Steering Large Language Models via In-Context One-step Learning Dynamics
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Neural Posterior Estimation with Latent Basis Expansions
MOBODY: Model-Based Off-Dynamics Offline Reinforcement Learning
TSLM: Tree-Structured Language Modeling for Divergent Thinking
Expressive and Invariant Graph Learning via Canonical Tree Cover Neural Networks
When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis
Flowing Through States: Neural ODE Regularization for Reinforcement Learning
The Limits of Inference Scaling Through Resampling
Splat Regression Models
Measuring the Intrinsic Dimension of Earth Representations
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
SWERank: Software Issue Localization with Code Ranking
Infinite Horizon Markov Economies
Robust Decision-Making with Partially Calibrated Forecasters
Why Less is More (Sometimes): A Theory of Data Curation
Offline Preference-Based Value Optimization
Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation
Log-Linear Attention
Strategic Obfuscation of Deceptive Reasoning in Language Models
Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis
Deep FlexQP: Accelerated Nonlinear Programming via Deep Unfolding
BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning
How Dark Patterns Manipulate Web Agents
Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
Evolution and compression in LLMs: on the emergence of human-aligned categorization
LLM-Guided Evolutionary Program Synthesis for Quasi-Monte Carlo Design
Natural Language PDDL (NL-PDDL) for Open-world Goal-oriented Commonsense Regression Planning in Embodied AI
Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models
Learning multimodal dictionary decompositions with group-sparse autoencoders
Bures Generalized Category Discovery
Neon: Negative Extrapolation From Self-Training Improves Image Generation
Weak-to-Strong Diffusion
Negative Pre-activations Differentiate Syntax
Align to Misalign: Automatic LLM Jailbreak with Meta-Optimized LLM Judges
Flow Map Learning via Games
Long Chain-of-Thought Reasoning Across Languages
LANE: Label-Aware Noise Elimination for Fine-Grained Text Classification
ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance Mask
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Strong Correlations Induce Cause Only Predictions in Transformer Training
Diffusion Language Models are Provably Optimal Parallel Samplers
Polychromic Objectives for Reinforcement Learning
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
WebDS: An End-to-End Benchmark for Web-based Data Science
Leveraging Discrete Function Decomposability for Scientific Design
On Coreset for LASSO Regression Problem with Sensitivity Sampling
Task-Agnostic Amortized Multi-Objective Optimization
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
In Context Semi-Supervised Learning
Steering Language Models with Weight Arithmetic
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
Detection of unknown unknowns in autonomous systems
On the Lipschitz Continuity of Set Aggregation Functions and Neural Networks for Sets
Random Controlled Differential Equations
When Language Models Lose Their Mind: The Consequences of Brain Misalignment
FrugalRAG: Less is More in RL Finetuning for Multi-hop Question Answering
Characterizing and Optimizing the Spatial Kernel of Multi Resolution Hash Encodings
Submodular Function Minimization with Dueling Oracle
Quadratic Direct Forecast for Training Multi-Step Time-Series Forecast Models
SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation
CONSIGN: Conformal Segmentation Informed by Spatial Groupings via Decomposition
Fine-tuning Behavioral Cloning Policies with Preference‑Based Reinforcement Learning
TINY BUT MIGHTY: A SOFTWARE-HARDWARE CO-DESIGN APPROACH FOR EFFICIENT MULTIMODAL INFERENCE ON BATTERY-POWERED SMALL DEVICES
Online Inventory Optimization in Non-Stationary Environment
Do LLMs Forget What They Should? Evaluating In-Context Forgetting in Large Language Models
Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models
MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure
Designing Rules to Pick a Rule: Aggregation by Consistency
Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
LLM Unlearning with LLM Beliefs
Autoregressive Image Generation with Randomized Parallel Decoding
Pyramid Patchification Flow for Visual Generation
Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Reverse-Engineered Reasoning for Open-Ended Generation
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
Learning to Adapt: In-Context Learning Beyond Stationarity
Reformulation for Pretraining Data Augmentation
Constraint-guided Hardware-aware NAS through Gradient Modification
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents
Learning under Quantization for High-Dimensional Linear Regression
Propaganda AI: An Analysis of Semantic Divergence in Large Language Models
Transitive RL: Value Learning via Divide and Conquer
On Code-Induced Reasoning in LLMs
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
DSSA: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Scaling Agent Learning via Experience Synthesis
Evaluating GFlowNet from partial episodes for stable and flexible policy-based training
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
Uncover Underlying Correspondence for Robust Multi-view Clustering
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
Causality ≠ Invariance: Function vs Concept Vectors in LLMs
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Random Spiking Neural Networks are Stable and Spectrally Simple
Epistemic Uncertainty Quantification To Improve Decisions From Black-Box Models
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
WebArbiter: A Generative Reasoning Process Reward Model for Web Agents
The Value of Information in Human-AI Decision-making
Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks
SPREAD: Sampling-based Pareto front Refinement via Efficient Adaptive Diffusion
Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding