ICLR
2025
Papers
Horizon Generalization in Reinforcement Learning
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL
Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion
E(n) Equivariant Topological Neural Networks
Revealing the 3D Cosmic Web through Gravitationally Constrained Neural Fields
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
LR0.FM: LOW-RESOLUTION ZERO-SHOT CLASSIFICATION BENCHMARK FOR FOUNDATION MODELS
ComPC: Completing a 3D Point Cloud with 2D Diffusion Priors
Controlling Space and Time with Diffusion Models
Towards Generalization Bounds of GCNs for Adversarially Robust Node Classification
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
L-WISE: Boosting human visual category learning through model-based image selection and enhancement
Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport
Periodic Materials Generation using Text-Guided Joint Diffusion Model
Multi-Robot Motion Planning with Diffusion Models
Ask, and it shall be given: On the Turing completeness of prompting
The Hidden Cost of Waiting for Accurate Predictions
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs
Adversarial Attacks on Data Attribution
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Differentiable and Learnable Wireless Simulation with Geometric Transformers
SplatFormer: Point Transformer for Robust 3D Gaussian Splatting
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Reward Learning from Multiple Feedback Types
Temporal Flexibility in Spiking Neural Networks: Towards Generalization Across Time Steps and Deployment Friendliness
Hyper-Connections
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Towards a Unified and Verified Understanding of Group-Operation Networks
Neural Context Flows for Meta-Learning of Dynamical Systems
Guaranteed Generation from Large Language Models
PFGuard: A Generative Framework with Privacy and Fairness Safeguards
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
X-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
Learned Reference-based Diffusion Sampler for multi-modal distributions
The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling
Certified Robustness Under Bounded Levenshtein Distance
SEPARATE: A Simple Low-rank Projection for Gradient Compression in Modern Large-scale Model Training Process
Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search
PICASO: Permutation-Invariant Context Composition with State Space Models
PPT: Patch Order Do Matters In Time Series Pretext Task
Block-Attention for Efficient Prefilling
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Exploring a Principled Framework for Deep Subspace Clustering
GRAIN: Exact Graph Reconstruction from Gradients
Do WGANs succeed because they minimize the Wasserstein Distance? Lessons from Discrete Generators
Event-Driven Online Vertical Federated Learning
Theory on Mixture-of-Experts in Continual Learning
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Revisiting Source-Free Domain Adaptation: a New Perspective via Uncertainty Control
DON’T STOP ME NOW: EMBEDDING BASED SCHEDULING FOR LLMS
Lawma: The Power of Specialization for Legal Annotation
Semialgebraic Neural Networks: From roots to representations
PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify
On Calibration of LLM-based Guard Models for Reliable Content Moderation
When Attention Sink Emerges in Language Models: An Empirical View
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
Unsupervised Multiple Kernel Learning for Graphs via Ordinality Preservation
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Transformers are Universal In-context Learners
CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features
LLM Unlearning via Loss Adjustment with Only Forget Data
BlendRL: A Framework for Merging Symbolic and Neural Policy Learning
Doubly Optimal Policy Evaluation for Reinforcement Learning
Bonsai: Gradient-free Graph Condensation for Node Classification
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
TabWak: A Watermark for Tabular Diffusion Models
Score-based free-form architectures for high-dimensional Fokker-Planck equations
dEBORA: Efficient Bilevel Optimization-based low-Rank Adaptation
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos
MAGNet: Motif-Agnostic Generation of Molecules from Scaffolds
HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents
O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions
Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting
On the Completeness of Invariant Geometric Deep Learning Models
Deep Networks Learn Features From Local Discontinuities in the Label Function
Bias Mitigation in Graph Diffusion Models
Bayesian Analysis of Combinatorial Gaussian Process Bandits
Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning
IFORMER: INTEGRATING CONVNET AND TRANSFORMER FOR MOBILE APPLICATION
Test-Time Ensemble via Linear Mode Connectivity: A Path to Better Adaptation
REVISITING MULTI-PERMUTATION EQUIVARIANCE THROUGH THE LENS OF IRREDUCIBLE REPRESENTATIONS
Neural Multi-Objective Combinatorial Optimization via Graph-Image Multimodal Fusion
Cached Multi-Lora Composition for Multi-Concept Image Generation
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
metabench - A Sparse Benchmark of Reasoning and Knowledge in Large Language Models
Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning
CONGO: Compressive Online Gradient Optimization
AI2TALE: An Innovative Information Theory-based Approach for Learning to Localize Phishing Attacks
Enhancing Federated Domain Adaptation with Multi-Domain Prototype-Based Federated Fine-Tuning
BRAID: Input-driven Nonlinear Dynamical Modeling of Neural-Behavioral Data
Fast training and sampling of Restricted Boltzmann Machines
Model-Agnostic Knowledge Guided Correction for Improved Neural Surrogate Rollout
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Video Action Differencing
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
Multi-Label Node Classification with Label Influence Propagation
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
MoDeGPT: Modular Decomposition for Large Language Model Compression
OptionZero: Planning with Learned Options
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Learning Spatiotemporal Dynamical Systems from Point Process Observations
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
From Tokens to Words: On the Inner Lexicon of LLMs
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
NL-Eye: Abductive NLI For Images
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Residual Kernel Policy Network: Enhancing Stability and Robustness in RKHS-Based Reinforcement Learning
Revisit the Open Nature of Open Vocabulary Semantic Segmentation
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation
Learning from weak labelers as constraints
HELMET: How to Evaluate Long-context Models Effectively and Thoroughly
CREIMBO: Cross-Regional Ensemble Interactions in Multi-view Brain Observations
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
PharmacoMatch: Efficient 3D Pharmacophore Screening via Neural Subgraph Matching
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Coreset Spectral Clustering
From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Fully-inductive Node Classification on Arbitrary Graphs
Bridging Jensen Gap for Max-Min Group Fairness Optimization in Recommendation
Improved Convergence Rate for Diffusion Probabilistic Models
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs
Monet: Mixture of Monosemantic Experts for Transformers
Test-Time Adaptation for Combating Missing Modalities in Egocentric Videos
Optimizing Neural Network Representations of Boolean Networks
6D Object Pose Tracking in Internet Videos for Robotic Manipulation
CO-MOT: Boosting End-to-end Transformer-based Multi-Object Tracking via Coopetition Label Assignment and Shadow Sets
On Bits and Bandits: Quantifying the Regret-Information Trade-off
MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis
Universal generalization guarantees for Wasserstein distributionally robust models
Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors
PETRA: Parallel End-to-end Training with Reversible Architectures
SPD Attack - Prevention of AI Powered Image Editing by Image Immunization
Brain-inspired $L_p$-Convolution benefits large kernels and aligns better with visual cortex
Personality Alignment of Large Language Models
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning
Synthetic continued pretraining
Port-Hamiltonian Architectural Bias for Long-Range Propagation in Deep Graph Networks
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Avoid Overclaims: Summary of Complexity Bounds for Algorithms in Minimization and Minimax Optimization
Towards more rigorous evaluations of language models
How do we interpret the outputs of a neural network trained on classification?
Generative Adversarial Ranking Nets
Generating Less Certain Adversarial Examples Improves Robust Generalization
Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm
Training LLMs over Neurally Compressed Text
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans
Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Revisiting Feature Prediction for Learning Visual Representations from Video
Revisiting In-context Learning Inference Circuit in Large Language Models
Regularized Proportional Fairness Mechanism for Resource Allocation Without Money
Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement
Towards Unbiased Calibration using Meta-Regularization
Optimization with Access to Auxiliary Information
MMD-Regularized Unbalanced Optimal Transport
DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models
$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
SWEb: A Large Web Dataset for the Scandinavian Languages
Do Contemporary Causal Inference Models Capture Real-World Heterogeneity? Findings from a Large-Scale Benchmark
Personalized Representation from Personalized Generation
Disentangling 3D Animal Pose Dynamics with Scrubbed Conditional Latent Variables
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
CycleResearcher: Improving Automated Research via Automated Review
Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations
Test-time Adaptation for Image Compression with Distribution Regularization
Interpretable Compressed Descriptions For Image Generation
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
SigDiffusions: Score-Based Diffusion Models for Time Series via Log-Signature Embeddings
K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
Uncertainty Modeling in Graph Neural Networks via Stochastic Differential Equations
BingoGuard: LLM Content Moderation Tools with Risk Levels
Can a Large Language Model be a Gaslighter?
SONICS: Synthetic Or Not - Identifying Counterfeit Songs
ConcreTizer: Model Inversion Attack via Occupancy Classification and Dispersion Control for 3D Point Cloud Restoration
TeaserGen: Generating Teasers for Long Documentaries
Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency
Identification of Intermittent Temporal Latent Process
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
SINGER: Stochastic Network Graph Evolving Operator for High Dimensional PDEs
Fast Uncovering of Protein Sequence Diversity from Structure
Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Provably Robust Explainable Graph Neural Networks against Graph Perturbation Attacks
Pareto Prompt Optimization
Spectral-Refiner: Accurate Fine-Tuning of Spatiotemporal Fourier Neural Operator for Turbulent Flows
Dense Video Object Captioning from Disjoint Supervision
MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Learning to Select Nodes in Branch and Bound with Sufficient Tree Representation
Aligning Language Models with Demonstrated Feedback
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation
Efficient Learning with Sine-Activated Low-Rank Matrices
Accelerating Goal-Conditioned Reinforcement Learning Algorithms and Research
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
Learning Transformer-based World Models with Contrastive Predictive Coding
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
PhysPDE: Rethinking PDE Discovery and a Physical HYpothesis Selection Benchmark
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography
Trajectory-Class-Aware Multi-Agent Reinforcement Learning
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Valid Conformal Prediction for Dynamic GNNs
Analytic DAG Constraints for Differentiable DAG Learning
Self-Normalized Resets for Plasticity in Continual Learning
Robustness of Quantum Algorithms for Nonconvex Optimization
Intelligence at the Edge of Chaos
GLoRa: A Benchmark to Evaluate the Ability to Learn Long-Range Dependencies in Graphs
xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks
Generalized Consistency Trajectory Models for Image Manipulation
Gradient-Free Generation for Hard-Constrained Systems
Training Free Guided Flow-Matching with Optimal Control
Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization
Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models
KLay: Accelerating Arithmetic Circuits for Neurosymbolic AI
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
Beyond Mere Token Analysis: A Hypergraph Metric Space Framework for Defending Against Socially Engineered LLM Attacks
Interaction Asymmetry: A General Principle for Learning Composable Abstractions
FLOPS: Forward Learning with OPtimal Sampling
Non-Stationary Dueling Bandits Under a Weighted Borda Criterion
Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis
Regularization by Texts for Latent Diffusion Inverse Solvers
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
Block Verification Accelerates Speculative Decoding
Accelerating Training with Neuron Interaction and Nowcasting Networks
Repulsive Latent Score Distillation for Solving Inverse Problems
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
Breaking the $\log(1/\Delta_2)$ Barrier: Better Batched Best Arm Identification with Adaptive Grids
McEval: Massively Multilingual Code Evaluation
MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers
Inverse Scaling: When Bigger Isn't Better
RB-Modulation: Training-Free Stylization using Reference-Based Modulation
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
CREAM: Consistency Regularized Self-Rewarding Language Models
BLEND: Behavior-guided Neural Population Dynamics Modeling via Privileged Knowledge Distillation
What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning
Anyprefer: An Agentic Framework for Preference Data Synthesis
Long-Sequence Recommendation Models Need Decoupled Embeddings
Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
Beyond Random Masking: When Dropout meets Graph Convolutional Networks
Re-evaluating Open-ended Evaluation of Large Language Models
MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory
PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning
Exploring Learning Complexity for Efficient Downstream Dataset Pruning
Post-hoc Reward Calibration: A Case Study on Length Bias
Hot-pluggable Federated Learning: Bridging General and Personalized FL via Dynamic Selection
Animate-X: Universal Character Image Animation with Enhanced Motion Representation
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
When Graph Neural Networks Meet Dynamic Mode Decomposition
Attention with Markov: A Curious Case of Single-layer Transformers
Supervised and Semi-Supervised Diffusion Maps with Label-Driven Diffusion
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation
InfoGS: Efficient Structure-Aware 3D Gaussians via Lightweight Information Shaping
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training
HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation
Multi-objective Differentiable Neural Architecture Search
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Physics-Informed Diffusion Models
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
Equivariant Neural Functional Networks for Transformers
How Low Can You Go? Searching for the Intrinsic Dimensionality of Complex Networks using Metric Node Embeddings
Efficient Neuron Segmentation in Electron Microscopy by Affinity-Guided Queries
Emergent Orientation Maps —— Mechanisms, Coding Efficiency and Robustness
FormalAlign: Automated Alignment Evaluation for Autoformalization
KAA: Kolmogorov-Arnold Attention for Enhancing Attentive Graph Neural Networks
Bridging the Gap Between f-divergences and Bayes Hilbert Spaces
Probabilistic Learning to Defer: Handling Missing Expert Annotations and Controlling Workload Distribution
Do You Keep an Eye on What I Ask? Mitigating Multimodal Hallucination via Attention-Guided Ensemble Decoding
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data
LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning
Selective Task Group Updates for Multi-Task Optimization
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
Re-Thinking Inverse Graphics With Large Language Models
Shared-AE: Automatic Identification of Shared Subspaces in High-dimensional Neural and Behavioral Activity
Towards Unbiased Learning in Semi-Supervised Semantic Segmentation
Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations
On the Price of Differential Privacy for Hierarchical Clustering
HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation
Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior
Why In-Context Learning Models are Good Few-Shot Learners?
Computational Explorations of Total Variation Distance
Isometric Regularization for Manifolds of Functional Data
Advancing Graph Generation through Beta Diffusion
ReCogLab: a framework testing relational reasoning & cognitive hypotheses on LLMs
Revealing and Mitigating Over-Attention in Knowledge Editing
Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design
Tree of Attributes Prompt Learning for Vision-Language Models
Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen
Learning from negative feedback, or positive feedback or both
DEPT: Decoupled Embeddings for Pre-training Language Models
Learning Gain Map for Inverse Tone Mapping
Advancing Prompt-Based Methods for Replay-Independent General Continual Learning
Approximation algorithms for combinatorial optimization with predictions
Learning to Explore and Exploit with GNNs for Unsupervised Combinatorial Optimization
Robust Barycenter Estimation using Semi-Unbalanced Neural Optimal Transport
Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention
On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
AgentRefine: Enhancing Agent Generalization through Refinement Tuning
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence
Lift Your Molecules: Molecular Graph Generation in Latent Euclidean Space
Efficiently Parameterized Neural Metriplectic Systems
Near-Exact Privacy Amplification for Matrix Mechanisms
Federated Continual Learning Goes Online: Uncertainty-Aware Memory Management for Vision Tasks and Beyond
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting
VAE-Var: Variational Autoencoder-Enhanced Variational Methods for Data Assimilation in Meteorology
PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling
Improving Convergence Guarantees of Random Subspace Second-order Algorithm for Nonconvex Optimization
BAMDP Shaping: a Unified Theoretical Framework for Intrinsic Motivation and Reward Shaping
Can Large Language Models Understand Symbolic Graphics Programs?
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Flow: Modularized Agentic Workflow Automation
Identifiability for Gaussian Processes with Holomorphic Kernels
IDInit: A Universal and Stable Initialization Method for Neural Network Training
Addressing Label Shift in Distributed Learning via Entropy Regularization
SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Context-aware Dynamic Pruning for Speech Foundation Models
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
The Superposition of Diffusion Models Using the Itô Density Estimator
RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
Near-optimal Active Regression of Single-Index Models
SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
A Simple Approach to Unifying Diffusion-based Conditional Generation
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
Preference Elicitation for Offline Reinforcement Learning
Language Models Need Inductive Biases to Count Inductively
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
Learning system dynamics without forgetting
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information
MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations
econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians
Large Language Models Often Say One Thing and Do Another
DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?
Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-sample $V$-Ensemble
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
How new data permeates LLM knowledge and how to dilute it
Hierarchical Uncertainty Estimation for Learning-based Registration in Neuroimaging
Diffusing to the Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Enhancing Learning with Label Differential Privacy by Vector Approximation
Accelerating neural network training: An analysis of the AlgoPerf competition
NetMoE: Accelerating MoE Training through Dynamic Sample Placement
Text-to-Image Rectified Flow as Plug-and-Play Priors
NextBestPath: Efficient 3D Mapping of Unseen Environments
AdaFisher: Adaptive Second Order Optimization via Fisher Information
On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning
Evidential Learning-based Certainty Estimation for Robust Dense Feature Matching
RocketEval: Efficient automated LLM evaluation via grading checklist
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
Can Reinforcement Learning Solve Asymmetric Combinatorial-Continuous Zero-Sum Games?
Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks
Cauchy-Schwarz Regularizers
Perturbation-Restrained Sequential Model Editing
Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection
A new framework for evaluating model out-of-distribution generalisation for the biochemical domain
ContextGNN: Beyond Two-Tower Recommendation Systems
Provable Convergence Bounds for Hybrid Dynamical Sampling and Optimization
BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
Language-Assisted Feature Transformation for Anomaly Detection
HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting
Wavelet Diffusion Neural Operator
Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression
Mentored Learning: Improving Generalization and Convergence of Student Learner
SysBench: Can LLMs Follow System Message?
OMG: Opacity Matters in Material Modeling with Gaussian Splatting
Training-Free Activation Sparsity in Large Language Models
Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences
Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
A Benchmark for Semantic Sensitive Information in LLMs Outputs
Diverse Preference Learning for Capabilities and Alignment
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Predicate Hierarchies Improve Few-Shot State Classification
Differential Transformer
Injective flows for star-like manifolds
Efficient Off-Policy Learning for High-Dimensional Action Spaces
Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation
Transformer Learns Optimal Variable Selection in Group-Sparse Classification
On the Feature Learning in Diffusion Models
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
DarkBench: Benchmarking Dark Patterns in Large Language Models
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
On the Role of Attention Heads in Large Language Model Safety
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning
RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks
Non-myopic Generation of Language Models for Reasoning and Planning
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Efficient Inference for Large Language Model-based Generative Recommendation
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models
Transformers Struggle to Learn to Search
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Fast Training of Sinusoidal Neural Fields via Scaling Initialization
ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models
Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
TIPS: Text-Image Pretraining with Spatial awareness
Generalized Behavior Learning from Diverse Demonstrations
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Revisiting Convolution Architecture in the Realm of DNA Foundation Models
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers
A transfer learning framework for weak to strong generalization
Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Model Using Implicit Feedback from Pre-training Demonstrations
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
VCR: Pixel-Level Complex Reasoning by Restoring Occluded Text
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
On Linear Representations and Pretraining Data Frequency in Language Models
CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion
Point-based Instance Completion with Scene Constraints
One Hundred Neural Networks and Brains Watching Videos: Lessons from Alignment
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval
Halton Scheduler for Masked Generative Image Transformer
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Is uniform expressivity too restrictive? Towards efficient expressivity of GNNs
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Backdooring Vision-Language Models with Out-Of-Distribution Data
ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences
Jamba: Hybrid Transformer-Mamba Language Models
Geometry of Long-Tailed Representation Learning: Rebalancing Features for Skewed Distributions
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
MaestroMotif: Skill Design from Artificial Intelligence Feedback
A Causal Lens for Learning Long-term Fair Policies
DINOv2: Learning Robust Visual Features without Supervision
Watermark Anything With Localized Messages
Root Cause Analysis of Anomalies in Multivariate Time Series through Granger Causal Discovery
Human-inspired Episodic Memory for Infinite Context LLMs
E-Valuating Classifier Two-Sample Tests
See What You Are Told: Visual Attention Sink in Large Multimodal Models
SPA-BENCH: A COMPREHENSIVE BENCHMARK FOR SMARTPHONE AGENT EVALUATION
Revisiting Large-Scale Non-convex Distributionally Robust Optimization
Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
Metalic: Meta-Learning In-Context with Protein Language Models
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
Learning Dynamics of LLM Finetuning
Towards counterfactual fairness through auxiliary variables
ToolGen: Unified Tool Retrieval and Calling via Generation
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
On the Optimization and Generalization of Multi-head Attention
Consistency Checks for Language Model Forecasters
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data
The Foundations of Tokenization: Statistical and Computational Concerns
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Faster, More Efficient RLHF through Off-Policy Asynchronous Learning
On the Benefits of Attribute-Driven Graph Domain Adaptation
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
Language Guided Skill Discovery
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
MaskBit: Embedding-free Image Generation via Bit Tokens
Mechanistic Permutability: Match Features Across Layers
Learn Your Reference Model for Real Good Alignment
RuAG: Learned-rule-augmented Generation for Large Language Models
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
RegMix: Data Mixture as Regression for Language Model Pre-training
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Scaling up Masked Diffusion Models on Text
Bootstrapping Language Models with DPO Implicit Rewards
DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector
Atomas: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation
Data Unlearning in Diffusion Models
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
SELF-EVOLVED REWARD LEARNING FOR LLMS
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agent
Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling
MMTEB: Massive Multilingual Text Embedding Benchmark
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Shedding Light on Time Series Classification using Interpretability Gated Networks
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
HQGS: High-Quality Novel View Synthesis with Gaussian Splatting in Degraded Scenes
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Optimal Flow Transport and its Entropic Regularization: a GPU-friendly Matrix Iterative Algorithm for Flow Balance Satisfaction
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
BodyGen: Advancing Towards Efficient Embodiment Co-Design
PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations
ScImage: How good are multimodal large language models at scientific text-to-image generation?
ParaSolver: A Hierarchical Parallel Integral Solver for Diffusion Models
NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
PhiNets: Brain-inspired Non-contrastive Learning Based on Temporal Prediction Hypothesis
DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
Towards Unified Human Motion-Language Understanding via Sparse Interpretable Characterization
AgentSquare: Automatic LLM Agent Search in Modular Design Space
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors
From an LLM Swarm to a PDDL-empowered Hive: Planning Self-executed Instructions in a Multi-modal Jungle
Accelerating Diffusion Transformers with Token-wise Feature Caching
SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Near, far: Patch-ordering enhances vision foundation models' scene understanding
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Metamizer: A Versatile Neural Optimizer for Fast and Accurate Physics Simulations
RRM: Robust Reward Model Training Mitigates Reward Hacking
PWM: Policy Learning with Multi-Task World Models
EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
Causal Information Prioritization for Efficient Reinforcement Learning
PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
Probabilistic Geometric Principal Component Analysis with application to neural data
A General Framework for Off-Policy Learning with Partially-Observed Reward
Bayesian Image Regression with Soft-thresholded Conditional Autoregressive Prior
GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning
Towards a Complete Logical Framework for GNN Expressiveness
DCT-CryptoNets: Scaling Private Inference in the Frequency Domain
Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems
FlashRNN: I/O-Aware Optimization of Traditional RNNs on modern hardware
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
PaPaGei: Open Foundation Models for Optical Physiological Signals
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
Conformal Prediction Sets Can Cause Disparate Impact
GSBA$^K$: $top$-$K$ Geometric Score-based Black-box Attack
Minimalistic Predictions for Online Class Constraint Scheduling
Energy-based Backdoor Defense Against Federated Graph Learning
Rethinking the generalization of drug target affinity prediction algorithms via similarity aware evaluation
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion
A Truncated Newton Method for Optimal Transport
A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention
Learning Partial Graph Matching via Optimal Partial Transport
GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
Mixture-of-Agents Enhances Large Language Model Capabilities
Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation
Enhancing End-to-End Autonomous Driving with Latent World Model
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations
Generating Physical Dynamics under Priors
Emergence of meta-stable clustering in mean-field transformer models
Efficient Reinforcement Learning with Large Language Model Priors
MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks
Steering LLMs' Behavior with Concept Activation Vectors
Open-Source vs Close-Source: The Context Utilization Challenge
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Scaling Speech-Text Pre-training with Synthetic Interleaved Data
Operator Deep Smoothing for Implied Volatility
Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
TAU-106K: A New Dataset for Comprehensive Understanding of Traffic Accident
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
Language Model Alignment in Multilingual Trolley Problems
Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings
Learning Structured Universe Graph with Outlier OOD Detection for Partial Matching
Not All Language Model Features Are One-Dimensionally Linear
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Strong Model Collapse
Depth Any Video with Scalable Synthetic Data
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Diversity-Rewarded CFG Distillation
Real-time design of architectural structures with differentiable mechanics and neural networks
Image Watermarks are Removable using Controllable Regeneration from Clean Noise
FairDen: Fair Density-Based Clustering
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Personalized Visual Instruction Tuning
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
Fine-tuning with Reserved Majority for Noise Reduction
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Robust System Identification: Finite-sample Guarantees and Connection to Regularization
InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling
Finding and Only Finding Differential Nash Equilibria by Both Pretending to be a Follower
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance Constraint
Enhancing Vision-Language Model with Unmasked Token Alignment
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code
Improved Sampling Algorithms for Lévy-Itô Diffusion Models
LeanVec: Searching vectors faster by making them fit
New Algorithms for the Learning-Augmented k-means Problem
Understanding Fairness Surrogate Functions in Algorithmic Fairness
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
Soft Merging of Experts with Adaptive Routing
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Agree to Disagree: Demystifying Homogeneous Deep Ensembles through Distributional Equivalence
What Has Been Overlooked in Contrastive Source-Free Domain Adaptation: Leveraging Source-Informed Latent Augmentation within Neighborhood Context
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
A Statistical Approach for Controlled Training Data Detection
Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator
Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers
Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition
From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford Algebra and Convexity
Sensitivity-Aware Amortized Bayesian Inference
Exploiting Hankel-Toeplitz Structures for Fast Computation of Kernel Precision Matrices
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Differentially Private Federated Learning with Time-Adaptive Privacy Spending
LoRA Learns Less and Forgets Less
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Hessian Free Efficient Single Loop Iterative Differentiation Methods for Bi-Level Optimization Problems
Transformer Encoder Satisfiability: Complexity and Impact on Formal Reasoning
Plug, Play, and Generalize: Length Extrapolation with Pointer-Augmented Neural Memory
Fair Clustering in the Sliding Window Model
Risk-Controlling Model Selection via Guided Bayesian Optimization
Equivariant Symmetry Breaking Sets
Robustness Auditing for Linear Regression: To Singularity and Beyond
Reward Guided Latent Consistency Distillation
Linear Mode Connectivity in Differentiable Tree Ensembles
Strong Preferences Affect the Robustness of Preference Models and Value Alignment
Manifold Learning by Mixture Models of VAEs for Inverse Problems
Information Theoretic Text-to-Image Alignment
Efficient Cross-Episode Meta-RL
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Learning Regularized Graphon Mean-Field Games with Unknown Graphons
Boundary constrained Gaussian processes for robust physics-informed machine learning of linear partial differential equations
HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
MGCFNN: A Neural MultiGrid Solver with Novel Fourier Neural Network for High Wave Number Helmholtz Equations
Measuring memorization in RLHF for code completion
HyperPLR: Hypergraph Generation through Projection, Learning, and Reconstruction
Factual Context Validation and Simplification: A Scalable Method to Enhance GPT Trustworthiness and Efficiency
A Curious Case of the Missing Measure: Better Scores and Worse Generation
PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization
Flow With What You Know
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Provable Uncertainty Decomposition via Higher-Order Calibration
Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)
“I Am the One and Only, Your Cyber BFF”: Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI
Restating the Proof of Linear Convergence for Linear GNNs
A Visual Dive into Conditional Flow Matching
Multi-modal Learning: A Look Back and the Road Ahead
TopoNets: High performing vision and language models with brain-like topography
Intricacies of Feature Geometry in Large Language Models
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Positional Embeddings in Transformer Models: Evolution from Text to Vision Domains
Analysing The Spectral Biases in Generative Models
How to visualize training dynamics in neural networks
Flaws of ImageNet, Computer Vision's Favourite Dataset
Efficient Model Editing with Task-Localized Sparse Fine-tuning
Building Blocks of Differentially Private Training
Provence: efficient and robust context pruning for retrieval-augmented generation
A primer on analytical learning dynamics of nonlinear neural networks
Robustness Reprogramming for Representation Learning
Rethinking Graph Prompts: Unraveling the Power of Data Manipulation in Graph Neural Networks
Vision-LSTM: xLSTM as Generic Vision Backbone
Models trained with unnormalized density functions: A need for a course correction
Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn’t and What's Next
Test-time Adaptation for Regression by Subspace Alignment
Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
3D Vision-Language Gaussian Splatting
Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test
Lost in Prediction: Why Social Media Narratives Don't Help Macroeconomic Forecasting?
Dynamic Neural Fortresses: An Adaptive Shield for Model Extraction Defense
Adaptive Gradient Clipping for Robust Federated Learning
Holistically Evaluating the Environmental Impact of Creating Language Models
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
HOPE for a Robust Parameterization of Long-memory State Space Models
GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting
Neural Eulerian Scene Flow Fields
Improved Sampling Of Diffusion Models In Fluid Dynamics With Tweedie's Formula
Probing the Latent Hierarchical Structure of Data via Diffusion Models
Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming
Variational Bayesian Pseudo-Coreset
The Computational Complexity of Circuit Discovery for Inner Interpretability
Do Deep Neural Network Solutions Form a Star Domain?
Rethinking Artistic Copyright Infringements In the Era Of Text-to-Image Generative Models
Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation
Edge-aware Image Smoothing with Relative Wavelet Domain Representation
Toward Exploratory Inverse Constraint Inference with Generative Diffusion Verifiers
Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Uni-Sign: Toward Unified Sign Language Understanding at Scale
Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model
ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids
Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
ThunderKittens: Simple, Fast, and $\textit{Adorable}$ Kernels
Incremental Causal Effect for Time to Treatment Initialization
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
BOND: Aligning LLMs with Best-of-N Distillation
ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
Influence-Guided Diffusion for Dataset Distillation
Monitoring Latent World States in Language Models with Propositional Probes
Factor Graph-based Interpretable Neural Networks
OmniRe: Omni Urban Scene Reconstruction
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
Transformer Meets Twicing: Harnessing Unattended Residual Information
Continuous Exposure Learning for Low-light Image Enhancement using Neural ODEs
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
On the Fourier analysis in the SO(3) space : the EquiLoPO Network
Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement
Improved Diffusion-based Generative Model with Better Adversarial Robustness
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Generalized Principal-Agent Problem with a Learning Agent
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Towards Calibrated Deep Clustering Network
Private Mechanism Design via Quantile Estimation
Rapid Selection and Ordering of In-Context Demonstrations via Prompt Embedding Clustering
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
Revisiting Mode Connectivity in Neural Networks with Bezier Surface
TexTailor: Customized Text-aligned Texturing via Effective Resampling
Elliptic Loss Regularization
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
SAM 2: Segment Anything in Images and Videos
Linear Recurrences Accessible to Everyone
Aligned Better, Listen Better For Audio-Visual Large Language Models
LASER: A Neuro-Symbolic Framework for Learning Spatio-Temporal Scene Graphs with Weak Supervision
GenXD: Generating Any 3D and 4D Scenes
Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control
Open-Vocabulary Customization from CLIP via Data-Free Knowledge Distillation
Long Context Compression with Activation Beacon
Singular Subspace Perturbation Bounds via Rectangular Random Matrix Diffusions
Natural Language Inference Improves Compositionality in Vision-Language Models
Bayesian Optimization via Continual Variational Last Layer Training
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Federated $Q$-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
Efficient and Robust Neural Combinatorial Optimization via Wasserstein-Based Coresets
Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters
PIN: Prolate Spheroidal Wave Function-based Implicit Neural Representations
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
How much of my dataset did you use? Quantitative Data Usage Inference in Machine Learning
MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction
Variational Search Distributions
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Differentially private learners for heterogeneous treatment effects
Neuroplastic Expansion in Deep Reinforcement Learning
Balanced Ranking with Relative Centrality: A multi-core periphery perspective
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
A General Framework for Producing Interpretable Semantic Text Embeddings
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning
What should a neuron aim for? Designing local objective functions based on information theory
EqNIO: Subequivariant Neural Inertial Odometry
A deep inverse-mapping model for a flapping robotic wing
Intermediate Layer Classifiers for OOD generalization
How Gradient descent balances features: A dynamical analysis for two-layer neural networks
Leave-One-Out Stable Conformal Prediction
Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics
PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph
Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
Differentiable Causal Discovery for Latent Hierarchical Causal Models
Fourier Sliced-Wasserstein Embedding for Multisets and Measures
Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
Generative Representational Instruction Tuning
Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms
Few-Class Arena: A Benchmark for Efficient Selection of Vision Models and Dataset Difficulty Measurement
Linear Partial Gromov-Wasserstein Embedding
Algorithmic Stability Based Generalization Bounds for Adversarial Training
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
Lossy Compression with Pretrained Diffusion Models
Spectro-Riemannian Graph Neural Networks
Flow matching achieves almost minimax optimal convergence
Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling
Global Convergence of Policy Gradient in Average Reward MDPs
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
Gradient correlation is a key ingredient to accelerate SGD with momentum
Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs
On a Connection Between Imitation Learning and RLHF
Boosting Multiple Views for pretrained-based Continual Learning
Long-Context Linear System Identification
Has the Deep Neural Network learned the Stochastic Process? An Evaluation Viewpoint
AtomSurf: Surface Representation for Learning on Protein Structures
SelectFormer: Private and Practical Data Selection for Transformers
Conformal Language Model Reasoning with Coherent Factuality
Generating Graphs via Spectral Diffusion
HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts
Advancing LLM Reasoning Generalists with Preference Trees
CheapNet: Cross-attention on Hierarchical representations for Efficient protein-ligand binding Affinity Prediction
Mixture of In-Context Prompters for Tabular PFNs
Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models
General Scene Adaptation for Vision-and-Language Navigation
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs
ParetoFlow: Guided Flows in Multi-Objective Optimization
ELBOing Stein: Variational Bayes with Stein Mixture Inference
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning
Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning
Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation
Trajectory attention for fine-grained video motion control
Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
Federated Domain Generalization with Data-free On-server Matching Gradient
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Interpretable Causal Representation Learning for Biological Data in the Pathway Space
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models
Group Distributionally Robust Dataset Distillation with Risk Minimization
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Procedural Synthesis of Synthesizable Molecules
IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
An Intelligent Agentic System for Complex Image Restoration Problems
Better Instruction-Following Through Minimum Bayes Risk
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Selective Label Enhancement Learning for Test-Time Adaptation
LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
LiFT: Learning to Fine-Tune via Bayesian Parameter Efficient Meta Fine-Tuning
LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models
Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs with Semantic Space
On Speeding Up Language Model Evaluation
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
DiffPuter: An EM-Driven Diffusion Model for Missing Data Imputation
AnoLLM: Large Language Models for Tabular Anomaly Detection
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Iterative Substructure Extraction for Molecular Relational Learning with Interactive Graph Information Bottleneck
Visually Consistent Hierarchical Image Classification
Fast unsupervised ground metric learning with tree-Wasserstein distance
CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning
ConMix: Contrastive Mixup at Representation Level for Long-tailed Deep Clustering
Realistic Evaluation of Deep Partial-Label Learning Algorithms
Simulating Human-like Daily Activities with Desire-driven Autonomy
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Extendable and Iterative Structure Learning Strategy for Bayesian Networks
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Input Space Mode Connectivity in Deep Neural Networks
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Machine Unlearning via Simulated Oracle Matching
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Tailoring Mixup to Data for Calibration
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Durable Quantization Conditioned Misalignment Attack on Large Language Models
Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
On Quantizing Neural Representation for Variable-Rate Video Coding
FedTMOS: Efficient One-Shot Federated Learning with Tsetlin Machine
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Planning in Natural Language Improves LLM Search for Code Generation
FreDF: Learning to Forecast in the Frequency Domain
Training-Free Message Passing for Learning on Hypergraphs
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Tamper-Resistant Safeguards for Open-Weight LLMs
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
AdaGrad under Anisotropic Smoothness
Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling, and Zero-Shot Transfer
ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
On the self-verification limitations of large language models on reasoning and planning tasks
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming
kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
RankSHAP: Shapley Value Based Feature Attributions for Learning to Rank
Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling
Self-Evolving Multi-Agent Collaboration Networks for Software Development
Unlocking Point Processes through Point Set Diffusion
Chemistry-Inspired Diffusion with Non-Differentiable Guidance
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment
Transformers Learn Low Sensitivity Functions: Investigations and Implications
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Global Identifiability of Overcomplete Dictionary Learning via L1 and Volume Minimization
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
What Do You See in Common? Learning Hierarchical Prototypes over Tree-of-Life to Discover Evolutionary Traits
DUALFormer: Dual Graph Transformer
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
Provable weak-to-strong generalization via benign overfitting
Exploring The Loss Landscape Of Regularized Neural Networks Via Convex Duality
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
COME: Test-time Adaption by Conservatively Minimizing Entropy
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception
Binary Losses for Density Ratio Estimation
Discovering Temporally Compositional Neural Manifolds with Switching Infinite GPFA
How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations
Neural Fluid Simulation on Geometric Surfaces
Looped Transformers for Length Generalization
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
PnP-Flow: Plug-and-Play Image Restoration with Flow Matching
InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma
Descent with Misaligned Gradients and Applications to Hidden Convexity
Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
Can One Modality Model Synergize Training of Other Modality Models?
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
A Stochastic Approach to the Subset Selection Problem via Mirror Descent
Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering
Breaking Neural Network Scaling Laws with Modularity
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
NeuralPlane: Structured 3D Reconstruction in Planar Primitives with Neural Fields
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Steering Large Language Models between Code Execution and Textual Reasoning
Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning
COPER: Correlation-based Permutations for Multi-View Clustering
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Conformal Structured Prediction
Affine Steerable Equivariant Layer for Canonicalization of Neural Networks
Learning under Temporal Label Noise
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
KooNPro: A Variance-Aware Koopman Probabilistic Model Enhanced by Neural Process for Time Series Forecasting
Charting the Design Space of Neural Graph Representations for Subgraph Matching
TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram
Convex Formulations for Training Two-Layer ReLU Neural Networks
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems
Manifold Constraint Reduces Exposure Bias in Accelerated Diffusion Sampling
Expressivity of Neural Networks with Random Weights and Learned Biases
Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
DPLM-2: A Multimodal Diffusion Protein Language Model
Cross-Entropy Is All You Need To Invert the Data Generating Process
Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds
Privacy Auditing of Large Language Models
Physics-Informed Deep Inverse Operator Networks for Solving PDE Inverse Problems
On the Identification of Temporal Causal Representation with Instantaneous Dependence
Complementary Label Learning with Positive Label Guessing and Negative Label Enhancement
Not-So-Optimal Transport Flows for 3D Point Cloud Generation
Synergy Between Sufficient Changes and Sparse Mixing Procedure for Disentangled Representation Learning
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
In Search of the Engram in LLMs: A Neuroscience Perspective on the Memory Functions in AI Models
Pyramidal Flow Matching for Efficient Video Generative Modeling
Model merging with SVD to tie the Knots
Repurposing in AI: A Distinct Approach or an Extension of Creative Problem Solving?
Scaling up the Banded Matrix Factorization Mechanism for Large Scale Differentially Private ML
Do vision models perceive objects like toddlers?
Curriculum-aware Training for Discriminating Molecular Property Prediction Models
Variational Diffusion Posterior Sampling with Midpoint Guidance
Preference Diffusion for Recommendation
MotherNet: Fast Training and Inference via Hyper-Network Transformers
NutriBench: A Dataset for Evaluating Large Language Models in Nutrition Estimation from Meal Descriptions
Safety Alignment Should be Made More Than Just a Few Tokens Deep
MTSAM: Multi-Task Fine-Tuning for Segment Anything Model
STAR: Stability-Inducing Weight Perturbation for Continual Learning
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Efficient Biological Data Acquisition through Inference Set Design
Pitfalls of Evidence-Based AI Policy
The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Improving Text-to-Image Consistency via Automatic Prompt Optimization
FedLWS: Federated Learning with Adaptive Layer-wise Weight Shrinking
High-dimension Prototype is a Better Incremental Object Detection Learner
Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver
Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis
LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications
Mastering Task Arithmetic: $\tau$Jp as a Key Indicator for Weight Disentanglement
Benign Overfitting in Out-of-Distribution Generalization of Linear Models
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It
Dynamic Negative Guidance of Diffusion Models
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning
A Multiscale Frequency Domain Causal Framework for Enhanced Pathological Analysis
Towards Foundation Models for Mixed Integer Linear Programming
LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion
Multi-Label Test-Time Adaptation with Bound Entropy Minimization
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
Bad-PFL: Exploiting Backdoor Attacks against Personalized Federated Learning
Graph Sparsification via Mixture of Graphs
Regretful Decisions under Label Noise
Bandit Learning in Matching Markets with Indifference
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
WardropNet: Traffic Flow Predictions via Equilibrium-Augmented Learning
Prompting Fairness: Integrating Causality to Debias Large Language Models
Simplifying Deep Temporal Difference Learning
Examining Alignment of Large Language Models through Representative Heuristics: the case of political stereotypes
Alchemy: Amplifying Theorem-Proving Capability Through Symbolic Mutation
Logically Consistent Language Models via Neuro-Symbolic Integration
Lie Algebra Canonicalization: Equivariant Neural Operators under arbitrary Lie Groups
Entropy-based Activation Function Optimization: A Method on Searching Better Activation Functions
Forte: Finding Outliers with Representation Typicality Estimation
Graph Neural Ricci Flow: Evolving Feature from a Curvature Perspective
Object-Centric Pretraining via Target Encoder Bootstrapping
Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images
CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety
Generalization through variance: how noise shapes inductive biases in diffusion models
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark
Proximal Mapping Loss: Understanding Loss Functions in Crowd Counting & Localization
On the Importance of Language-driven Representation Learning for Heterogeneous Federated Learning
Self-Boosting Large Language Models with Synthetic Preference Data
EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation
Stiefel Flow Matching for Moment-Constrained Structure Elucidation
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
Optimal Brain Apoptosis
ToolACE: Winning the Points of LLM Function Calling
GDrag: Towards General-Purpose Interactive Editing with Anti-ambiguity Point Diffusion
STAMP: Scalable Task- And Model-agnostic Collaborative Perception
Forking Paths in Neural Text Generation
Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses
GameGen-X: Interactive Open-world Game Video Generation
LoLCATs: On Low-Rank Linearizing of Large Language Models
Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation
ViSAGe: Video-to-Spatial Audio Generation
Quality Measures for Dynamic Graph Generative Models
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics
Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization
TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
Controllable Generation via Locally Constrained Resampling
Longhorn: State Space Models are Amortized Online Learners
Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
Reconstructive Visual Instruction Tuning
RouteLLM: Learning to Route LLMs from Preference Data
CONDA: Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts
Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Power
ParFam -- (Neural Guided) Symbolic Regression via Continuous Global Optimization
Amortized Control of Continuous State Space Feynman-Kac Model for Irregular Time Series
Do LLMs have Consistent Values?
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Data Pruning by Information Maximization
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Causal Identification for Complex Functional Longitudinal Studies
Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
VideoPhy: Evaluating Physical Commonsense for Video Generation
An Efficient Framework for Crediting Data Contributors of Diffusion Models
Inverse Constitutional AI: Compressing Preferences into Principles
PRDP: Progressively Refined Differentiable Physics
When do GFlowNets learn the right distribution?
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
Iterative Dual-RL: An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition
Towards Optimal Multi-draft Speculative Decoding
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Controllable Unlearning for Image-to-Image Generative Models via $\epsilon$-Constrained Optimization
LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Multi-Reward as Condition for Instruction-based Image Editing
Conditional Diffusion with Ordinal Regression: Longitudinal Data Generation for Neurodegenerative Disease Studies
The Complexity of Two-Team Polymatrix Games with Independent Adversaries
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
Mixture of Parrots: Experts improve memorization more than reasoning
OSDA Agent: Leveraging Large Language Models for De Novo Design of Organic Structure Directing Agents
Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports
Vision and Language Synergy for Rehearsal Free Continual Learning
Adaptive Retention & Correction: Test-Time Training for Continual Learning
A CLIP-Powered Framework for Robust and Generalizable Data Selection
Verifying Properties of Binary Neural Networks Using Sparse Polynomial Optimization
Sparse Autoencoders Do Not Find Canonical Units of Analysis
Debiasing Federated Learning with Correlated Client Participation
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference
SIMPL: Scalable and hassle-free optimisation of neural representations from behaviour
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
LocoVR: Multiuser Indoor Locomotion Dataset in Virtual Reality
Integrating Protein Dynamics into Structure-Based Drug Design via Full-Atom Stochastic Flows
MetaOOD: Automatic Selection of OOD Detection Models
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models
Doubly robust identification of treatment effects from multiple environments
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Open-CK: A Large Multi-Physics Fields Coupling benchmarks in Combustion Kinetics
Grounding Continuous Representations in Geometry: Equivariant Neural Fields
Consistent Flow Distillation for Text-to-3D Generation
Grounding Multimodal Large Language Model in GUI World
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Implicit Search via Discrete Diffusion: A Study on Chess
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
XAIguiFormer: explainable artificial intelligence guided transformer for brain disorder identification
The Illustrated AlphaFold
HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters
Decentralized Optimization with Coupled Constraints
Beyond Random Augmentations: Pretraining with Hard Views
MIRACLE 3D: Memory-efficient Integrated Robust Approach for Continual Learning on 3D Point Clouds via Shape Model Construction
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
Learning Efficient Positional Encodings with Graph Neural Networks
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
DeLLMa: Decision Making Under Uncertainty with Large Language Models
Repetition Improves Language Model Embeddings
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
A Theoretical Analysis of Self-Supervised Learning for Vision Transformers
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Toward Generalizing Visual Brain Decoding to Unseen Subjects
Transformers Provably Learn Two-Mixture of Linear Classification via Gradient Flow
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs
Scalable Decentralized Learning with Teleportation
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
Towards Neural Scaling Laws for Time Series Foundation Models
FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model
Understanding Constraint Inference in Safety-Critical Inverse Reinforcement Learning
TopoGaussian: Inferring Internal Topology Structures from Visual Clues
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination
PRISM: Privacy-Preserving Improved Stochastic Masking for Federated Generative Models
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
MuHBoost: Multi-Label Boosting For Practical Longitudinal Human Behavior Modeling
Kolmogorov-Arnold Transformer
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets
Lightweight Neural App Control
Generating CAD Code with Vision-Language Models for 3D Designs
Towards Bridging Generalization and Expressivity of Graph Neural Networks
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
qNBO: quasi-Newton Meets Bilevel Optimization
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
Efficient and Trustworthy Causal Discovery with Latent Variables and Complex Relations
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Learning Molecular Representation in a Cell
Adaptive teachers for amortized samplers
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations
Autoregressive Pretraining with Mamba in Vision
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting
Reinforcement Learning from Imperfect Corrective Actions and Proxy Rewards
S4M: S4 for multivariate time series forecasting with Missing values
ProteinBench: A Holistic Evaluation of Protein Foundation Models
Leveraging Variable Sparsity to Refine Pareto Stationarity in Multi-Objective Optimization
Non-Equilibrium Dynamics of Hybrid Continuous-Discrete Ground-State Sampling
Backtracking Improves Generation Safety
RGB-Event ISP: The Dataset and Benchmark
Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
Hessian-Free Online Certified Unlearning
The KoLMogorov Test: Compression by Code Generation
Does Training with Synthetic Data Truly Protect Privacy?
Can LLMs Solve Longer Math Word Problems Better?
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Real2Code: Reconstruct Articulated Objects via Code Generation
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
ELICIT: LLM Augmentation Via External In-context Capability
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
DRL: Decomposed Representation Learning for Tabular Anomaly Detection
A Deep Generative Learning Approach for Two-stage Adaptive Robust Optimization
Towards Faster Decentralized Stochastic Optimization with Communication Compression
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents
Training Language Models to Self-Correct via Reinforcement Learning
High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
When narrower is better: the narrow width limit of Bayesian parallel branching neural networks
OPTAMI: Global Superlinear Convergence of High-order Methods
CFD: Learning Generalized Molecular Representation via Concept-Enhanced Feedback Disentanglement
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Understanding Methods for Scalable MCTS
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model
Imputation for prediction: beware of diminishing returns.
QA-Calibration of Language Model Confidence Scores
Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation
Transformer Block Coupling and its Correlation with Generalization in LLMs
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment
Fast Feedforward 3D Gaussian Splatting Compression
Faster Algorithms for Structured Linear and Kernel Support Vector Machines
Clique Number Estimation via Differentiable Functions of Adjacency Matrix Permutations
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sensitivity-Constrained Fourier Neural Operators for Forward and Inverse Problems in Parametric Differential Equations
Dynamic Low-Rank Sparse Adaptation for Large Language Models
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups
Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics
ADMM for Structured Fractional Minimization
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Reinforcement learning with combinatorial actions for coupled restless bandits
Decoupling Layout from Glyph in Online Chinese Handwriting Generation
Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Continual Slow-and-Fast Adaptation of Latent Neural Dynamics (CoSFan): Meta-Learning What-How & When to Adapt
DEPfold: RNA Secondary Structure Prediction as Dependency Parsing.
Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection
MADGEN: Mass-Spec attends to De Novo Molecular generation
Controlled LLM Decoding via Discrete Auto-regressive Biasing
Student-Informed Teacher Training
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study
Black-Box Detection of Language Model Watermarks
Large Convolutional Model Tuning via Filter Subspace
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
DynaPrompt: Dynamic Test-Time Prompt Tuning
A Graph Enhanced Symbolic Discovery Framework For Efficient Logic Optimization
ToVE: Efficient Vision-Language Learning via Knowledge Transfer from Vision Experts
SymmetricDiffusers: Learning Discrete Diffusion Models over Finite Symmetric Groups
Data Center Cooling System Optimization Using Offline Reinforcement Learning
Improved Approximation Algorithms for $k$-Submodular Maximization via Multilinear Extension
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS
Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
Reveal Object in Lensless Photography via Region Gaze and Amplification
ReGen: Generative Robot Simulation via Inverse Design
The adaptive complexity of parallelized log-concave sampling
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation
Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
Agent-Oriented Planning in Multi-Agent Systems
Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models
AutoUAD: Hyper-parameter Optimization for Unsupervised Anomaly Detection
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
On Large Language Model Continual Unlearning
Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection
Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
INFER: A Neural-symbolic Model For Extrapolation Reasoning on Temporal Knowledge Graph
Poison-splat: Computation Cost Attack on 3D Gaussian Splatting
The Ramanujan Library - Automated Discovery on the Hypergraph of Integer Relations
Retrieval Head Mechanistically Explains Long-Context Factuality
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Bisimulation Metric for Model Predictive Control
Regulatory DNA Sequence Design with Reinforcement Learning
Differentially private optimization for non-decomposable objective functions
DataGen: Unified Synthetic Dataset Generation via Large Language Models
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
STAFF: Speculative Coreset Selection for Task-Specific Fine-tuning
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling
MAESTRO: Masked Encoding Set Transformer with Self-Distillation
Proxy Denoising for Source-Free Domain Adaptation
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
When does compositional structure yield compositional generalization? A kernel theory.
Spherical Tree-Sliced Wasserstein Distance
Differentiable Integer Linear Programming
Discrete Copula Diffusion
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Do Large Language Models Truly Understand Geometric Structures?
In Search of Forgotten Domain Generalization
GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
EmbedLLM: Learning Compact Representations of Large Language Models
Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
DELIFT: Data Efficient Language model Instruction Fine-Tuning
Graph-based Document Structure Analysis
UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP
GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Global Well-posedness and Convergence Analysis of Score-based Generative Models via Sharp Lipschitz Estimates
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
AutoCGP: Closed-Loop Concept-Guided Policies from Unlabeled Demonstrations
InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
Rethinking the role of frames for SE(3)-invariant crystal structure modeling
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
Bridging the Data Provenance Gap Across Text, Speech, and Video
Grounding Video Models to Actions through Goal Conditioned Exploration
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement
R2Det: Exploring Relaxed Rotation Equivariance in 2D Object Detection
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
Neural Functions for Learning Periodic Signal
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Rational Decision-Making Agent with Learning Internal Utility Judgment
Efficient Active Imitation Learning with Random Network Distillation
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
TFG-Flow: Training-free Guidance in Multimodal Generative Flow
ADMM for Nonconvex Optimization under Minimal Continuity Assumption
Skill Expansion and Composition in Parameter Space
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Interpreting the Second-Order Effects of Neurons in CLIP
Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision
MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science
Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning
Discrete Latent Plans via Semantic Skill Abstractions
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
SMT: Fine-Tuning Large Language Models with Sparse Matrices
RaSA: Rank-Sharing Low-Rank Adaptation
NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions
Chain-of-Focus Prompting: Leveraging Sequential Visual Cues to Prompt Large Autoregressive Vision Models
Centrality-guided Pre-training for Graph
Taming Transformer Without Using Learning Rate Warmup
Unhackable Temporal Reward for Scalable Video MLLMs
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
Learning Graph Invariance by Harnessing Spuriosity
INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning
Divergence-Regularized Discounted Aggregation: Equilibrium Finding in Multiplayer Partially Observable Stochastic Games
Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Round and Round We Go! What makes Rotary Positional Encodings useful?
Remove Symmetries to Control Model Expressivity and Improve Optimization
ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention
Atlas Gaussians Diffusion for 3D Generation
Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior
Data Shapley in One Training Run
ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation
Systems with Switching Causal Relations: A Meta-Causal Perspective
Towards Improving Exploration through Sibling Augmented GFlowNets
Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
World Model on Million-Length Video And Language With Blockwise RingAttention
Certifying Counterfactual Bias in LLMs
Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation
Improving Uncertainty Estimation through Semantically Diverse Language Generation
Learning LLM-as-a-Judge for Preference Alignment
An Illustrated Guide to Automatic Sparse Differentiation
Machine Unlearning Fails to Remove Data Poisoning Attacks
RandLoRA: Full rank parameter-efficient fine-tuning of large models
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Union-over-Intersections: Object Detection beyond Winner-Takes-All
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers
STAR: Synthesis of Tailored Architectures
AlphaEdit: Null-Space Constrained Model Editing for Language Models
Glauber Generative Model: Discrete Diffusion Models via Binary Classification
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory
Let Your Features Tell The Differences: Understanding Graph Convolution By Feature Splitting
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
HGM³: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
Do LLMs estimate uncertainty well in instruction-following?
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Harnessing Webpage UIs for Text-Rich Visual Understanding
Can In-context Learning Really Generalize to Out-of-distribution Tasks?
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering
On Designing General and Expressive Quantum Graph Neural Networks with Applications to MILP Instance Representation
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Neuron Platonic Intrinsic Representation From Dynamics Using Contrastive Learning
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs
DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation
Controllable Context Sensitivity and the Knob Behind It
Second-Order Min-Max Optimization with Lazy Hessians
Start Smart: Leveraging Gradients For Enhancing Mask-based XAI Methods
Tell me about yourself: LLMs are aware of their learned behaviors
Towards Marginal Fairness Sliced Wasserstein Barycenter
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
From Attention to Activation: Unraveling the Enigmas of Large Language Models
IV-mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals
Improving Reasoning Performance in Large Language Models via Representation Engineering
PseDet: Revisiting the Power of Pseudo Label in Incremental Object Detection
Multi-session, multi-task neural decoding from distinct cell-types and brain regions
LOIRE: LifelOng learning on Incremental data via pre-trained language model gRowth Efficiently
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
A Simple yet Effective $\Delta\Delta G$ Predictor is An Unsupervised Antibody Optimizer and Explainer
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Inverse decision-making using neural amortized Bayesian actors
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
RetroInText: A Multimodal Large Language Model Enhanced Framework for Retrosynthetic Planning via In-Context Representation Learning
PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
Generalizing Reasoning Problems to Longer Lengths
Ultra-Sparse Memory Network
Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators
System 1.x: Learning to Balance Fast and Slow Planning with Language Models
FreSh: Frequency Shifting for Accelerated Neural Representation Learning
Zero-cost Proxy for Adversarial Robustness Evaluation
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Active Task Disambiguation with LLMs
Differentiable Rule Induction from Raw Sequence Inputs
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
How Does Critical Batch Size Scale in Pre-training?
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
On the expressiveness and spectral bias of KANs
Scaling Wearable Foundation Models
Disentangling Representations through Multi-task Learning
Accelerated training through iterative gradient propagation along the residual path
Quantitative Approximation for Neural Operators in Nonlinear Parabolic Equations
Autoregressive Video Generation without Vector Quantization
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
From Decoupling to Adaptive Transformation: a Wider Optimization Space for PTQ
$\sigma$-zero: Gradient-based Optimization of $\ell_0$-norm Adversarial Examples
Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach
Sort-free Gaussian Splatting via Weighted Sum Rendering
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Advantage-Guided Distillation for Preference Alignment in Small Language Models
A Generic Framework for Conformal Fairness
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Specialized Foundation Models Struggle to Beat Supervised Baselines
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
Storybooth: Training-Free Multi-Subject Consistency for Improved Visual Storytelling
Tuning Frequency Bias of State Space Models
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
The Computational Complexity of Positive Non-Clashing Teaching in Graphs
Aligning Visual Contrastive learning models via Preference Optimization
Credit-based self organizing maps: training deep topographic networks with minimal performance degradation
Self-Attention-Based Contextual Modulation Improves Neural System Identification
VLMaterial: Procedural Material Generation with Large Vision-Language Models
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
Intrinsic User-Centric Interpretability through Global Mixture of Experts
Is Large-scale Pretraining the Secret to Good Domain Generalization?
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation
Boosting Perturbed Gradient Ascent for Last-Iterate Convergence in Games
ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Open-World Reinforcement Learning over Long Short-Term Imagination
Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation
Benchmarking Agentic Workflow Generation
MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later
Difference-of-submodular Bregman Divergence
To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts
Faster Cascades via Speculative Decoding
LLMs Can Plan Only If We Tell Them
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
Towards Empowerment Gain through Causal Structure Learning in Model-Based Reinforcement Learning
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
ReAttention: Training-Free Infinite Context with Finite Attention Scope
Inverse Rendering using Multi-Bounce Path Tracing and Reservoir Sampling
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
EVA: Geometric Inverse Design for Fast Protein Motif-Scaffolding with Coupled Flow
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Stochastic Bandits Robust to Adversarial Attacks
miniCTX: Neural Theorem Proving with (Long-)Contexts
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting
Can Watermarks be Used to Detect LLM IP Infringement For Free?
CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation
Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing
AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction
FlowDec: A flow-based full-band general audio codec with high perceptual quality
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
Computing Circuits Optimization via Model-Based Circuit Genetic Evolution
Zero-shot Model-based Reinforcement Learning using Large Language Models
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
Distilling Structural Representations into Protein Sequence Models
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation
Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks
Distribution-Specific Agnostic Conditional Classification With Halfspaces
Benchmarking LLMs' Judgments with No Gold Standard
Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
A Coefficient Makes SVRG Effective
Language Imbalance Driven Rewarding for Multilingual Self-improving
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Composable Interventions for Language Models
Accelerating Neural ODEs: A Variational Formulation-based Approach
A Non-Contrastive Learning Framework for Sequential Recommendation with Preference-Preserving Profile Generation
Improving Large Language Model Planning with Action Sequence Similarity
Accelerating Task Generalisation with Multi-Level Skill Hierarchies
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
A Differentiable Rank-Based Objective for Better Feature Learning
Rethinking Multiple-Instance Learning From Feature Space to Probability Space
Enhancing Prediction Performance through Influence Measure
Data-centric Prediction Explanation via Kernelized Stein Discrepancy
Shallow diffusion networks provably learn hidden low-dimensional structure
Detecting Backdoor Samples in Contrastive Language Image Pretraining
Training-Free Diffusion Model Alignment with Sampling Demons
JPEG Inspired Deep Learning
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Streaming Algorithms For $\ell_p$ Flows and $\ell_p$ Regression
MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
Asymptotic Analysis of Two-Layer Neural Networks after One Gradient Step under Gaussian Mixtures Data with Structure
Diffusion Feedback Helps CLIP See Better
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
MiniPLM: Knowledge Distillation for Pre-training Language Models
Selective Unlearning via Representation Erasure Using Domain Adversarial Training
Projection Head is Secretly an Information Bottleneck
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks
Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning
Classic but Everlasting: Traditional Gradient-Based Algorithms Converges Fast Even in Time-Varying Multi-Player Games
How efficient is LLM-generated code? A rigorous & high-standard benchmark
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Do as We Do, Not as You Think: the Conformity of Large Language Models
Towards Learning High-Precision Least Squares Algorithms with Sequence Models
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
QERA: an Analytical Framework for Quantization Error Reconstruction
Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering
Accessing Vision Foundation Models via ImageNet-1K
pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
Aioli: A Unified Optimization Framework for Language Model Data Mixing
Implicit Neural Surface Deformation with Explicit Velocity Fields
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
Can LLMs Understand Time Series Anomalies?
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models
Pursuing Better Decision Boundaries for Long-Tailed Object Detection via Category Information Amount
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
Temporal Reasoning Transfer from Text to Video
Neuron based Personality Trait Induction in Large Language Models
Partial Gromov-Wasserstein Metric
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
What Are Good Positional Encodings for Directed Graphs?
An Evolved Universal Transformer Memory
UniGEM: A Unified Approach to Generation and Property Prediction for Molecules
On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
Training-Free Dataset Pruning for Instance Segmentation
Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models
Diff-Prompt: Diffusion-driven Prompt Generator with Mask Supervision
DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction
Efficient and Accurate Explanation Estimation with Distribution Compression
From Probability to Counterfactuals: the Increasing Complexity of Satisfiability in Pearl's Causal Hierarchy
Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling
Learning Chaos In A Linear Way
What Matters in Learning from Large-Scale Datasets for Robot Manipulation
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design
Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Generating Likely Counterfactuals Using Sum-Product Networks
Enhanced Diffusion Sampling via Extrapolation with Multiple ODE Solutions
What's the Move? Hybrid Imitation Learning via Salient Points
Towards Out-of-Modal Generalization without Instance-level Modal Correspondence
GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks
Eliciting Human Preferences with Language Models
AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
Laplace Sample Information: Data Informativeness Through a Bayesian Lens
A Periodic Bayesian Flow for Material Generation
SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
Trusted Multi-View Classification via Evolutionary Multi-View Fusion
Systematic Relational Reasoning With Epistemic Graph Neural Networks
Improving Generalization and Robustness in SNNs Through Signed Rate Encoding and Sparse Encoding Attacks
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
Chain-of-region: Visual Language Models Need Details for Diagram Analysis
CoMotion: Concurrent Multi-person 3D Motion
Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry
ESE: Espresso Sentence Embeddings
Random-Set Neural Networks
OGBench: Benchmarking Offline Goal-Conditioned RL
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
Explanations of GNN on Evolving Graphs via Axiomatic Layer edges
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Boltzmann priors for Implicit Transfer Operators
On Scaling Up 3D Gaussian Splatting Training
Bayesian WeakS-to-Strong from Text Classification to Generation
Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models
Compute-Optimal LLMs Provably Generalize Better with Scale
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement
Rethinking Fair Representation Learning for Performance-Sensitive Tasks
Asymmetric Factorized Bilinear Operation for Vision Transformer
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Generative Flows on Synthetic Pathway for Drug Design
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Spreading Out-of-Distribution Detection on Graphs
GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Commit0: Library Generation from Scratch
Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escape, and Network Embedding
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification
FlickerFusion: Intra-trajectory Domain Generalizing Multi-agent Reinforcement Learning
Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval
Knowledge Graph Finetuning Enhances Knowledge Manipulation in Large Language Models
TVNet: A Novel Time Series Analysis Method Based on Dynamic Convolution and 3D-Variation
Variance-Reducing Couplings for Random Features
Looking into User’s Long-term Interests through the Lens of Conservative Evidential Learning
EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation
Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data
ANaGRAM: A Natural Gradient Relative to Adapted Model for efficient PINNs learning
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Neuron-based Multifractal Analysis of Neuron Interaction Dynamics in Large Models
Multi-Scale Fusion for Object Representation
QP-SNN: Quantized and Pruned Spiking Neural Networks
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Visual Agents as Fast and Slow Thinkers
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
HelpSteer2-Preference: Complementing Ratings with Preferences
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers
Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
Discovering Clone Negatives via Adaptive Contrastive Learning for Image-Text Matching
Active Learning for Continual Learning: Keeping the Past Alive in the Present
On the Adversarial Vulnerability of Label-Free Test-Time Adaptation
Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
Weak to Strong Generalization for Large Language Models with Multi-capabilities
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
CipherPrune: Efficient and Scalable Private Transformer Inference
Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Chain-of-Thought Provably Enables Learning the (Otherwise) Unlearnable
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
Learning to Help in Multi-Class Settings
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
An Engorgio Prompt Makes Large Language Model Babble on
Tracking objects that change in appearance with phase synchrony
AdaWM: Adaptive World Model based Planning for Autonomous Driving
Robust Conformal Prediction with a Single Binary Certificate
Long-Short Decision Transformer: Bridging Global and Local Dependencies for Generalized Decision-Making
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
RESuM: A Rare Event Surrogate Model for Physics Detector Design
It Helps to Take a Second Opinion: Teaching Smaller LLMs To Deliberate Mutually via Selective Rationale Optimisation
Plastic Learning with Deep Fourier Features
Dynamic Modeling of Patients, Modalities and Tasks via Multi-modal Multi-task Mixture of Experts
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Zero-shot Imputation with Foundation Inference Models for Dynamical Systems
Multi-Dimensional Conformal Prediction
From Promise to Practice: Realizing High-performance Decentralized Training
Unifying Causal Representation Learning with the Invariance Principle
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Discrete Codebook World Models for Continuous Control
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning
Towards Automated Knowledge Integration From Human-Interpretable Representations
RAG-SR: Retrieval-Augmented Generation for Neural Symbolic Regression
Many-Objective Multi-Solution Transport
Measuring And Improving Persuasiveness Of Large Language Models
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Differentially Private Steering for Large Language Model Alignment
Probabilistic Conformal Prediction with Approximate Conditional Validity
Rethinking Graph Neural Networks From A Geometric Perspective Of Node Features
Formation of Representations in Neural Networks
A Watermark for Order-Agnostic Language Models
Online Reinforcement Learning in Non-Stationary Context-Driven Environments
Conditional Diffusion Models are Minimax-Optimal and Manifold-Adaptive for Conditional Distribution Estimation
Enhancing Graph Of Thought: Enhancing Prompts with LLM Rationales and Dynamic Temperature Control
Understanding Virtual Nodes: Oversquashing and Node Heterogeneity
Robust Root Cause Diagnosis using In-Distribution Interventions
Learning Mask Invariant Mutual Information for Masked Image Modeling
One for all and all for one: Efficient computation of partial Wasserstein distances on the line
Provably Safeguarding a Classifier from OOD and Adversarial Samples
From Search to Sampling: Generative Models for Robust Algorithmic Recourse
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
Learning local equivariant representations for quantum operators
Decoupled Graph Energy-based Model for Node Out-of-Distribution Detection on Heterophilic Graphs
Equivariant Masked Position Prediction for Efficient Molecular Representation
Circuit Transformer: A Transformer That Preserves Logical Equivalence
Reconsidering Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs
Learning and aligning single-neuron invariance manifolds in visual cortex
Direct Distributional Optimization for Provable Alignment of Diffusion Models
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes
Neural Wave Equation for Irregularly Sampled Sequence Data
Diffusing States and Matching Scores: A New Framework for Imitation Learning
Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization
Diffusion State-Guided Projected Gradient for Inverse Problems
AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data
Learning Long Range Dependencies on Graphs via Random Walks
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Erasing Concept Combination from Text-to-Image Diffusion Model
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data
Efficient Dictionary Learning with Switch Sparse Autoencoders
Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods
Meta-Continual Learning of Neural Fields
A Sanity Check for AI-generated Image Detection
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Instant Policy: In-Context Imitation Learning via Graph Diffusion
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models
Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation
EFFICIENT JAILBREAK ATTACK SEQUENCES ON LARGE LANGUAGE MODELS VIA MULTI-ARMED BANDIT-BASED CONTEXT SWITCHING
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking
Inverse Attention Agents for Multi-Agent Systems
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
AutoEval: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
Adversarial Mixup Unlearning
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
Moner: Motion Correction in Undersampled Radial MRI with Unsupervised Neural Representation
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation
Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking
Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Representational Similarity via Interpretable Visual Concepts
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
Robotouille: An Asynchronous Planning Benchmark for LLM Agents
ET-SEED: EFFICIENT TRAJECTORY-LEVEL SE(3) EQUIVARIANT DIFFUSION POLICY
Uncertainty modeling for fine-tuned implicit functions
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
Distance-Based Tree-Sliced Wasserstein Distance
A Theoretical Framework for Partially-Observed Reward States in RLHF
Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Can We Talk Models Into Seeing the World Differently?
Unlocking the Potential of Model Calibration in Federated Learning
Endowing Visual Reprogramming with Adversarial Robustness
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
MuPT: A Generative Symbolic Music Pretrained Transformer
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Neural Stochastic Differential Equations for Uncertainty-Aware Offline RL
No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models
AutoG: Towards automatic graph construction from tabular data
Can Knowledge Editing Really Correct Hallucinations?
Understanding Long Videos with Multimodal Language Models
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction
Watch Less, Do More: Implicit Skill Discovery for Video-Conditioned Policy
Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection
Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Uncertainty-Aware Decoding with Minimum Bayes Risk
Instance-dependent Early Stopping
GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation
ACES: Automatic Cohort Extraction System for Event-Stream Datasets
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling
Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
Precise Parameter Localization for Textual Generation in Diffusion Models
Residual-MPPI: Online Policy Customization for Continuous Control
Diffusion Models Are Real-Time Game Engines
Rare event modeling with self-regularized normalizing flows: what can we learn from a single failure?
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Unlearning-based Neural Interpretations
Towards Auto-Regressive Next-Token Prediction: In-context Learning Emerges from Generalization
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Learning Robust Representations with Long-Term Information for Generalization in Visual Reinforcement Learning
RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs
Safety Representations for Safer Policy Learning
Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors
Generalization and Distributed Learning of GFlowNets
Composing Unbalanced Flows for Flexible Docking and Relaxation
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation
Weighted-Reward Preference Optimization for Implicit Model Fusion
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
Improved Training Technique for Latent Consistency Models
Linear Spherical Sliced Optimal Transport: A Fast Metric for Comparing Spherical Data
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
Steering Protein Family Design through Profile Bayesian Flow
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
Interpretable Unsupervised Joint Denoising and Enhancement for Real-World low-light Scenarios
Improving Probabilistic Diffusion Models With Optimal Diagonal Covariance Matching
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Tight Lower Bounds under Asymmetric High-Order Hölder Smoothness and Uniform Convexity
Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning
Connectome Mapping: Shape-Memory Network via Interpretation of Contextual Semantic Information
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
Lasso Bandit with Compatibility Condition on Optimal Arm
Universal Image Restoration Pre-training via Degradation Classification
Lightweight Predictive 3D Gaussian Splats
Towards Continuous Reuse of Graph Models via Holistic Memory Diversification
GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs
Safety-Prioritizing Curricula for Constrained Reinforcement Learning
What to align in multimodal contrastive learning?
Value-aligned Behavior Cloning for Offline Reinforcement Learning via Bi-level Optimization
Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees
Filtered not Mixed: Filtering-Based Online Gating for Mixture of Large Language Models
Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood
SegLLM: Multi-round Reasoning Segmentation with Large Language Models
Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher
Layerwise Recurrent Router for Mixture-of-Experts
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy
Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Provable unlearning in topic modeling and downstream tasks
Gaussian-Based Instance-Adaptive Intensity Modeling for Point-Supervised Facial Expression Spotting
Training Free Exponential Context Extension via Cascading KV Cache
Sequential Controlled Langevin Diffusions
PooDLe🐩: Pooled and dense self-supervised learning from naturalistic videos
Exact Certification of (Graph) Neural Networks Against Label Poisoning
Policy Design in Long-run Welfare Dynamics
Tight Clusters Make Specialized Experts
Predicting the Energy Landscape of Stochastic Dynamical System via Physics-informed Self-supervised Learning
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
A Robust Method to Discover Causal or Anticausal Relation
Spectral Compressive Imaging via Unmixing-driven Subspace Diffusion Refinement
Hybrid Regularization Improves Diffusion-based Inverse Problem Solving
A Closer Look at Machine Unlearning for Large Language Models
Underdamped Diffusion Bridges with Applications to Sampling
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
GSE: Group-wise Sparse and Explainable Adversarial Attacks
Efficient Training of Neural Stochastic Differential Equations by Matching Finite Dimensional Distributions
Model Equality Testing: Which Model is this API Serving?
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
Scale-aware Recognition in Satellite Images under Resource Constraints
Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Adversarial Latent Feature Augmentation for Fairness
REvolve: Reward Evolution with Large Language Models using Human Feedback
Aligning Human Motion Generation with Human Perceptions
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Subgraph Federated Learning for Local Generalization
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
A New Perspective on Shampoo's Preconditioner
GeoLoRA: Geometric integration for parameter efficient fine-tuning
Unsupervised Model Tree Heritage Recovery
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Diffusion Transformers for Tabular Data Time Series Generation
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Size-Generalizable RNA Structure Evaluation by Exploring Hierarchical Geometries
Transformers Handle Endogeneity in In-Context Linear Regression
No Preference Left Behind: Group Distributional Preference Optimization
InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems
NetFormer: An interpretable model for recovering dynamical connectivity in neuronal population dynamics
Prototype antithesis for biological few-shot class-incremental learning
$\text{I}^2\text{AM}$: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Risk-Sensitive Diffusion: Robustly Optimizing Diffusion Models with Noisy Samples
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
HADAMRNN: BINARY AND SPARSE TERNARY ORTHOGONAL RNNS
A Geometric Framework for Understanding Memorization in Generative Models
Standardizing Structural Causal Models
Recovering Manifold Structure Using Ollivier Ricci Curvature
TopoLM: brain-like spatio-functional organization in a topographic language model
Towards General-Purpose Model-Free Reinforcement Learning
PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection
QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
Confidence Elicitation: A New Attack Vector for Large Language Models
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Mitigate the Gap: Improving Cross-Modal Alignment in CLIP
Separation Power of Equivariant Neural Networks
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
Toward Understanding In-context vs. In-weight Learning
Error-quantified Conformal Inference for Time Series
Enhancing Language Model Agents using Diversity of Thoughts
InstantPortrait: One-Step Portrait Editing via Diffusion Multi-Objective Distillation
A Distributional Approach to Uncertainty-Aware Preference Alignment Using Offline Demonstrations
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
DiffPC: Diffusion-based High Perceptual Fidelity Image Compression with Semantic Refinement
ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks
ARB-LLM: Alternating Refined Binarizations for Large Language Models
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference
Kronecker Mask and Interpretive Prompts are Language-Action Video Learners
Brain Bandit: A Biologically Grounded Neural Network for Efficient Control of Exploration
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models
Latent Bayesian Optimization via Autoregressive Normalizing Flows
Easing Training Process of Rectified Flow Models Via Lengthening Inter-Path Distance
Shape as Line Segments: Accurate and Flexible Implicit Surface Representation
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models
Diffusion Models as Cartoonists: The Curious Case of High Density Regions
Concept Bottleneck Language Models For Protein Design
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Sensor-Invariant Tactile Representation
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Learning 3D Perception from Others' Predictions
Multilevel Generative Samplers for Investigating Critical Phenomena
Generator Matching: Generative modeling with arbitrary Markov processes
Reframing Structure-Based Drug Design Model Evaluation via Metrics Correlated to Practical Needs
Mixture of Attentions For Speculative Decoding
Medium-Difficulty Samples Constitute Smoothed Decision Boundary for Knowledge Distillation on Pruned Datasets
GOttack: Universal Adversarial Attacks on Graph Neural Networks via Graph Orbits Learning
ODE-based Smoothing Neural Network for Reinforcement Learning Tasks
Enhancing Multilingual Reasoning in LLMs: Insights from Cross-Linguistic Correlations and Optimal Data Proportions
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Progressive Compositionality in Text-to-Image Generative Models
Redefining the task of Bioactivity Prediction
GROOT-2: Weakly Supervised Multimodal Instruction Following Agents
ECD: A Machine Learning Benchmark for Predicting Enhanced-Precision Electronic Charge Density in Crystalline Inorganic Materials
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
On the Crucial Role of Initialization for Matrix Factorization
Handling Delay in Real-Time Reinforcement Learning
NExUME: Adaptive Training and Inference for DNNs under Intermittent Power Environments
Revisiting Random Walks for Learning on Graphs
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Geometry of Lightning Self-Attention: Identifiability and Dimension
Attributing Culture-Conditioned Generations to Pretraining Corpora
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
Query-based Knowledge Transfer for Heterogeneous Learning Environments
Long-time asymptotics of noisy SVGD outside the population limit
High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders
Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Semantic Aware Representation Learning for Lifelong Learning
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes
Meta-Dynamical State Space Models for Integrative Neural Data Analysis
Optimal Learning of Kernel Logistic Regression for Complex Classification Scenarios
Scaling Optimal LR Across Token Horizons
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Discovering Influential Neuron Path in Vision Transformers
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
REBIND: Enhancing Ground-state Molecular Conformation Prediction via Force-Based Graph Rewiring
Locality Sensitive Avatars From Video
Does Spatial Cognition Emerge in Frontier Models?
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data
NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks in Open Domains
A Decade's Battle on Dataset Bias: Are We There Yet?
TabM: Advancing tabular deep learning with parameter-efficient ensembling
Learning to Solve Differential Equation Constrained Optimization Problems
GameArena: Evaluating LLM Reasoning through Live Computer Games
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
CTSyn: A Foundation Model for Cross Tabular Data Generation
Bayesian Regularization of Latent Representation
MCNC: Manifold-Constrained Reparameterization for Neural Compression
Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport
Bridging the Gap between Database Search and De Novo Peptide Sequencing with SearchNovo
REEF: Representation Encoding Fingerprints for Large Language Models
Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning
Adaptive backtracking for faster optimization
Privacy-Aware Lifelong Learning
Mitigating Spurious Correlations in Zero-Shot Multimodal Models
LeanAgent: Lifelong Learning for Formal Theorem Proving
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
ZooProbe: A Data Engine for Evaluating, Exploring, and Evolving Large-scale Training Data for Multimodal LLMs
CryoFM: A Flow-based Foundation Model for Cryo-EM Densities
PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks
Adversarial Training for Defense Against Label Poisoning Attacks
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
Learning Harmonized Representations for Speculative Sampling
Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
Self-Supervised Diffusion Models for Electron-Aware Molecular Representation Learning
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
L3Ms — Lagrange Large Language Models
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Nonlinear Sequence Embedding by Monotone Variational Inequality
A Formal Framework for Understanding Length Generalization in Transformers
Surprising Effectiveness of pretraining Ternary Language Model at Scale
Poisson-Dirac Neural Networks for Modeling Coupled Dynamical Systems across Domains
FIRING-Net: A filtered feature recycling network for speech enhancement
Multi-Task Dense Predictions via Unleashing the Power of Diffusion
Noise Separation guided Candidate Label Reconstruction for Noisy Partial Label Learning
Physics-informed Temporal Difference Metric Learning for Robot Motion Planning
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
Grokking at the Edge of Numerical Stability
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
Solving Video Inverse Problems Using Image Diffusion Models
First-Person Fairness in Chatbots
Gumbel Counterfactual Generation From Language Models
Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry
Proteina: Scaling Flow-based Protein Structure Generative Models
DECO: Unleashing the Potential of ConvNets for Query-based Detection and Segmentation
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Deep Learning Alternatives Of The Kolmogorov Superposition Theorem
Logical Consistency of Large Language Models in Fact-Checking
HShare: Fast LLM Decoding by Hierarchical Key-Value Sharing
Exploring the Camera Bias of Person Re-identification
P-SPIKESSM: HARNESSING PROBABILISTIC SPIKING STATE SPACE MODELS FOR LONG-RANGE DEPENDENCY TASKS
Epistemic Monte Carlo Tree Search
Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems
On the Relation between Trainability and Dequantization of Variational Quantum Learning Models
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
The Belief State Transformer
Herald: A Natural Language Annotated Lean 4 Dataset
UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate
Spurious Forgetting in Continual Learning of Language Models
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing
State Space Model Meets Transformer: A New Paradigm for 3D Object Detection
Discriminator-Guided Embodied Planning for LLM Agent
UTILITY: Utilizing Explainable Reinforcement Learning to Improve Reinforcement Learning
Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion
Language Models Are Implicitly Continuous
MLPs Learn In-Context on Regression and Classification Tasks
TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining
Law of the Weakest Link: Cross Capabilities of Large Language Models
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Learning Video-Conditioned Policy on Unlabelled Data with Joint Embedding Predictive Transformer
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
A Riemannian Framework for Learning Reduced-order Lagrangian Dynamics
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
A Theory of Initialisation's Impact on Specialisation
Closed-Form Merging of Parameter-Efficient Modules for Federated Continual Learning
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
Concept Bottleneck Large Language Models
Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding
MELODI: Exploring Memory Compression for Long Contexts
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
Neural Interactive Proofs
Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing
A Differentiable Metric for Discovering Groups and Unitary Representations
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
GPS: A Probabilistic Distributional Similarity with Gumbel Priors for Set-to-Set Matching
Model-Free Offline Reinforcement Learning with Enhanced Robustness
Is Your Multimodal Language Model Oversensitive to Safe Queries?
On Disentangled Training for Nonlinear Transform in Learned Image Compression
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Continuous Diffusion for Mixed-Type Tabular Data
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Fair Submodular Cover
Exploring the Design Space of Visual Context Representation in Video MLLMs
Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Advantage Alignment Algorithms
Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Efficient Causal Decision Making with One-sided Feedback
ImageFolder: Autoregressive Image Generation with Folded Tokens
Learning Hierarchical Polynomials of Multiple Nonlinear Features
Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing
A Transfer Attack to Image Watermarks
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Uncertainty Herding: One Active Learning Method for All Label Budgets
On Rollouts in Model-Based Reinforcement Learning
Matryoshka Multimodal Models
Combining Induction and Transduction for Abstract Reasoning
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
Topograph: An Efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation
MorphoDiff: Cellular Morphology Painting with Diffusion Models
Accurate and Scalable Graph Neural Networks via Message Invariance
The Breakdown of Gaussian Universality in Classification of High-dimensional Linear Factor Mixtures
LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression
Protein Language Model Fitness is a Matter of Preference
Biologically Constrained Barrel Cortex Model Integrates Whisker Inputs and Replicates Key Brain Network Dynamics
Triples as the Key: Structuring Makes Decomposition and Verification Easier in LLM-based TableQA
SoftCVI: Contrastive variational inference with self-generated soft labels
Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation
CL-DiffPhyCon: Closed-loop Diffusion Control of Complex Physical Systems
Execution-guided within-prompt search for programming-by-example
RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
Attention as a Hypernetwork
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional, Black-box Systems
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Select before Act: Spatially Decoupled Action Repetition for Continuous Control
Expected Sliced Transport Plans
Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment
Divergence of Neural Tangent Kernel in Classification Problems
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Structure Language Models for Protein Conformation Generation
Matérn Kernels for Tunable Implicit Surface Reconstruction
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
Designing Mechanical Meta-Materials by Learning Equivariant Flows
Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
Convergence of Distributed Adaptive Optimization with Local Updates
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
Learning vector fields of differential equations on manifolds with geometrically constrained operator-valued kernels
CARTS: Advancing Neural Theorem Proving with Diversified Tactic Calibration and Bias-Resistant Tree Search
An Online Learning Theory of Trading-Volume Maximization
Learning High-Degree Parities: The Crucial Role of the Initialization
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation
Reasoning Elicitation in Language Models via Counterfactual Feedback
Latent Action Pretraining from Videos
Programming Refusal with Conditional Activation Steering
Scalable Mechanistic Neural Networks
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
Vertical Federated Learning with Missing Features During Training and Inference
Co$^{\mathbf{3}}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Correlation and Navigation in the Vocabulary Key Representation Space of Language Models
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
Self-supervised contrastive learning performs non-linear system identification
Fengbo: a Clifford Neural Operator pipeline for 3D PDEs in Computational Fluid Dynamics
A Theoretically-Principled Sparse, Connected, and Rigid Graph Representation of Molecules
Deep Random Features for Scalable Interpolation of Spatiotemporal Data
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
SparsyFed: Sparse Adaptive Federated Learning
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
TRENDy: Temporal Regression of Effective Nonlinear Dynamics
Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models
Adaptive Transformer Programs: Bridging the Gap Between Performance and Interpretability in Transformers
Variational Best-of-N Alignment
Aligned LLMs Are Not Aligned Browser Agents
Adaptive Pruning of Pretrained Transformer via Differential Inclusions
PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Training Robust Ensembles Requires Rethinking Lipschitz Continuity
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation
CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
Re-Evaluating the Impact of Unseen-Class Unlabeled Data on Semi-Supervised Learning Model
MAP: Multi-Human-Value Alignment Palette
Towards Synergistic Path-based Explanations for Knowledge Graph Completion: Exploration and Evaluation
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization
Generalization in VAE and Diffusion Models: A Unified Information-Theoretic Analysis
Learning-Augmented Frequent Directions
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
A Large-scale Dataset and Benchmark for Commuting Origin-Destination Flow Generation
Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
A Unifying Framework for Representation Learning
Three Mechanisms of Feature Learning in a Linear Network
DyCAST: Learning Dynamic Causal Structure from Time Series
Online Clustering with Nearly Optimal Consistency
Building Math Agents with Multi-Turn Iterative Preference Learning
GMValuator: Similarity-based Data Valuation for Generative Models
GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data
Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection
Small Models are LLM Knowledge Triggers for Medical Tabular Prediction
Learning-Augmented Search Data Structures
Contextual Document Embeddings
Continuity-Preserving Convolutional Autoencoders for Learning Continuous Latent Dynamical Models from Images
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
ALLaM: Large Language Models for Arabic and English
Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
Learning to Discover Regulatory Elements for Gene Expression Prediction
Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation
GANDALF: Generative AttentioN based Data Augmentation and predictive modeLing Framework for personalized cancer treatment
Topological Schrödinger Bridge Matching
Moral Alignment for LLM Agents
Reassessing How to Compare and Improve the Calibration of Machine Learning Models
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
Motion Control of High-Dimensional Musculoskeletal Systems with Hierarchical Model-Based Planning
HELM: Hierarchical Encoding for mRNA Language Modeling
Think while You Generate: Discrete Diffusion with Planned Denoising
Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
Mitigating Memorization in Language Models
Zero-Shot Natural Language Explanations
Fine-tuning can Help Detect Pretraining Data from Large Language Models
Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off
Following the Human Thread in Social Navigation
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
Residual Stream Analysis with Multi-Layer SAEs
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints
AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning
ContraDiff: Planning Towards High Return States via Contrastive Learning
Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning
Framer: Interactive Frame Interpolation
DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
MambaExtend: A Training-Free Approach to Improve Long Context Extension of Mamba
StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces
Going Beyond Static: Understanding Shifts with Time-Series Attribution
Contrastive Learning from Synthetic Audio Doppelgängers
Nonlinear multiregion neural dynamics with parametric impulse response communication channels
Graph Neural Networks for Edge Signals: Orientation Equivariance and Invariance
Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On
POTEC: Off-Policy Contextual Bandits for Large Action Spaces via Policy Decomposition
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
3D-Spatial Multimodal Memory
Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
Competition Dynamics Shape Algorithmic Phases of In-Context Learning
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Learning to engineer protein flexibility
TSC-Net: Prediction of Pedestrian Trajectories by Trajectory-Scene-Cell Classification
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Beyond Circuit Connections: A Non-Message Passing Graph Transformer Approach for Quantum Error Mitigation
On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding
Deep Linear Probe Generators for Weight Space Learning
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning
Finding Shared Decodable Concepts and their Negations in the Brain
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation
An Asynchronous Bundle Method for Distributed Learning Problems
Agent Skill Acquisition for Large Language Models via CycleQD
GraphArena: Evaluating and Exploring Large Language Models on Graph Computation
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Efficient stagewise pretraining via progressive subnetworks
CtD: Composition through Decomposition in Emergent Communication
Locally Connected Echo State Networks for Time Series Forecasting
Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
LaMPlace: Learning to Optimize Cross-Stage Metrics in Macro Placement
Progressive Parameter Efficient Transfer Learning for Semantic Segmentation
Ranking-aware adapter for text-driven image ordering with CLIP
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
To Tackle Adversarial Transferability: A Novel Ensemble Training Method with Fourier Transformation
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
Dissecting Adversarial Robustness of Multimodal LM Agents
Calibrating LLMs with Information-Theoretic Evidential Deep Learning
SMITE: Segment Me In TimE
PABBO: Preferential Amortized Black-Box Optimization
Federated Granger Causality Learning For Interdependent Clients With State Space Representation
ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design
Toward Efficient Multi-Agent Exploration With Trajectory Entropy Maximization
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Scaling Laws for Adversarial Attacks on Language Model Activations and Tokens
CameraCtrl: Enabling Camera Control for Video Diffusion Models
NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders
Training-free Camera Control for Video Generation
From GNNs to Trees: Multi-Granular Interpretability for Graph Neural Networks
Prevalence of Negative Transfer in Continual Reinforcement Learning: Analyses and a Simple Baseline
Improving Equivariant Networks with Probabilistic Symmetry Breaking
Glad: A Streaming Scene Generator for Autonomous Driving
Scaling Large Language Model-based Multi-Agent Collaboration
Gaussian Differentially Private Human Faces Under a Face Radial Curve Representation
Simulating Training Dynamics to Reconstruct Training Data from Deep Neural Networks
Unsupervised Meta-Learning via In-Context Learning
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
An Exploration with Entropy Constrained 3D Gaussians for 2D Video Compression
The Directionality of Optimization Trajectories in Neural Networks
TopoDiffusionNet: A Topology-aware Diffusion Model
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate
Oracle efficient truncated statistics
Self-Improving Robust Preference Optimization
BaB-ND: Long-Horizon Motion Planning with Branch-and-Bound and Neural Dynamics
Radar: Fast Long-Context Decoding for Any Transformer
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence
Conformalized Survival Analysis for General Right-Censored Data
Truncated Consistency Models
Exploiting Hidden Symmetry to Improve Objective Perturbation for DP linear learners with a nonsmooth L1-norm
Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability
Learning to Steer Markovian Agents under Model Uncertainty
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Federated Few-Shot Class-Incremental Learning
What Makes a Maze Look Like a Maze?
Sparse components distinguish visual pathways & their alignment to neural networks
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Conditional Testing based on Localized Conformal $p$-values
Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
Problem-Parameter-Free Federated Learning
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Rethinking Spiking Neural Networks from an Ensemble Learning Perspective
One-for-All Few-Shot Anomaly Detection via Instance-Induced Prompt Learning
Self-Play Preference Optimization for Language Model Alignment
Training Large Language Models for Retrieval-Augmented Question Answering through Backtracking Correction
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment
Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization
Implicit Bias of Mirror Flow for Shallow Neural Networks in Univariate Regression
Investigating Pattern Neurons in Urban Time Series Forecasting
SaMer: A Scenario-aware Multi-dimensional Evaluator for Large Language Models
SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling
Self-Updatable Large Language Models by Integrating Context into Model Parameters
Streamlining Redundant Layers to Compress Large Language Models
Multimodal Situational Safety
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
TASAR: Transfer-based Attack on Skeletal Action Recognition
Wasserstein-Regularized Conformal Prediction under General Distribution Shift
Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage
$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Does Refusal Training in LLMs Generalize to the Past Tense?
Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping
ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning
Robust Feature Learning for Multi-Index Models in High Dimensions
Stabilized Neural Prediction of Potential Outcomes in Continuous Time
High-Quality Joint Image and Video Tokenization with Causal VAE
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
Learning Continually by Spectral Regularization
Searching for Optimal Solutions with LLMs via Bayesian Optimization
The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning
Training Neural Networks as Recognizers of Formal Languages
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
QPM: Discrete Optimization for Globally Interpretable Image Classification
Targeted Attack Improves Protection against Unauthorized Diffusion Customization
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Multi-agent cooperation through learning-aware policy gradients
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings
Learning Geometric Reasoning Networks For Robot Task And Motion Planning
Rethinking Neural Multi-Objective Combinatorial Optimization via Neat Weight Embedding
Denoising Autoregressive Transformers for Scalable Text-to-Image Generation
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Is Factuality Enhancement a Free Lunch For LLMs? Better Factuality Can Lead to Worse Context-Faithfulness
Implicit In-context Learning
Understanding and Enhancing the Transferability of Jailbreaking Attacks
RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation
3D StreetUnveiler with Semantic-aware 2DGS - a simple baseline
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Inner Information Analysis Algorithm for Deep Neural Network based on Community
AgentStudio: A Toolkit for Building General Virtual Agents
Node Similarities under Random Projections: Limits and Pathological Cases
Towards Scalable Topological Regularizers
DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale
Population Transformer: Learning Population-level Representations of Neural Activity
Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
Rethinking Shapley Value for Negative Interactions in Non-convex Games
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
Teaching LLMs How to Learn with Contextual Fine-Tuning
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
The Case for Cleaner Biosignals: High-fidelity Neural Compressor Enables Transfer from Cleaner iEEG to Noisier EEG
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
MeshMask: Physics-Based Simulations with Masked Graph Neural Networks
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Efficient Sparse PCA via Block-Diagonalization
The Crucial Role of Samplers in Online Direct Preference Optimization
Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations
Bridging the Gap between Variational Inference and Stochastic Gradient MCMC in Function Space
Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity
The Effectiveness of Curvature-Based Rewiring and the Role of Hyperparameters in GNNs Revisited
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
CoMRes: Semi-Supervised Time Series Forecasting Utilizing Consensus Promotion of Multi-Resolution
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Human-Aligned Chess With a Bit of Search
Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels
Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
Making Transformer Decoders Better Differentiable Indexers
Cut Your Losses in Large-Vocabulary Language Models
Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences
KinPFN: Bayesian Approximation of RNA Folding Kinetics using Prior-Data Fitted Networks
Scaling FP8 training to trillion-token LLMs
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Selective induction Heads: How Transformers Select Causal Structures in Context
SOO-Bench: Benchmarks for Evaluating the Stability of Offline Black-Box Optimization
Attention layers provably solve single-location regression
Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap
Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification
ImDy: Human Inverse Dynamics from Imitated Observations
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Towards a learning theory of representation alignment
Dynamic Assortment Selection and Pricing with Censored Preference Feedback
ReMatching Dynamic Reconstruction Flow
Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence
A Large-scale Training Paradigm for Graph Generative Models
Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization
Interactive Adjustment for Human Trajectory Prediction with Individual Feedback
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-training of Deep Networks
Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality
Estimating the Probabilities of Rare Outputs in Language Models
Probabilistic Language-Image Pre-Training
CoInD: Enabling Logical Compositions in Diffusion Models
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
Physics-aligned field reconstruction with diffusion bridge
Find A Winning Sign: Sign Is All We Need to Win the Lottery
Comparing noisy neural population dynamics using optimal transport distances
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Can Textual Gradient Work in Federated Learning?
Progressive Compression with Universally Quantized Diffusion Models
Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
Strength Estimation and Human-Like Strength Adjustment in Games
eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels
SPDIM: Source-Free Unsupervised Conditional and Label Shift Adaptation in EEG
CViT: Continuous Vision Transformer for Operator Learning
Generative Verifiers: Reward Modeling as Next-Token Prediction
Class Distribution-induced Attention Map for Open-vocabulary Semantic Segmentations
CR-CTC: Consistency regularization on CTC for improved speech recognition
Fitting Networks with a Cancellation Trick
Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
How to Evaluate Reward Models for RLHF
GeoILP: A Synthetic Dataset to Guide Large-Scale Rule Induction
Number Cookbook: Number Understanding of Language Models and How to Improve It
Online Preference Alignment for Language Models via Count-based Exploration
On-the-fly Preference Alignment via Principle-Guided Decoding
Towards Interpreting Visual Information Processing in Vision-Language Models
LucidPPN: Unambiguous Prototypical Parts Network for User-centric Interpretable Computer Vision
Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy
Feedback Favors the Generalization of Neural ODEs
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Wasserstein Distances, Neuronal Entanglement, and Sparsity
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions
Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees
Identifying latent state transitions in non-linear dynamical systems
Learning Clustering-based Prototypes for Compositional Zero-Shot Learning
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Denoising with a Joint-Embedding Predictive Architecture
DELTA: DENSE EFFICIENT LONG-RANGE 3D TRACKING FOR ANY VIDEO
Large Language Models Assume People are More Rational than We Really are
CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
In-context Time Series Predictor
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Locality-aware Gaussian Compression for Fast and High-quality Rendering
MixMax: Distributional Robustness in Function Space via Optimal Data Mixtures
Calibrating Expressions of Certainty
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Generalizable Human Gaussians from Single-View Image
Tool-Planner: Task Planning with Clusters across Multiple Tools
FreeVS: Generative View Synthesis on Free Driving Trajectory
Offline Hierarchical Reinforcement Learning via Inverse Optimization
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation
ImProver: Agent-Based Automated Proof Optimization
FOSP: Fine-tuning Offline Safe Policy through World Models
Quantum-PEFT: Ultra parameter-efficient fine-tuning
Data Selection via Optimal Control for Language Models
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Improving Deep Regression with Tightness
Gaussian Splatting Lucas-Kanade
FACTS: A Factored State-Space Framework for World Modelling
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Learning Diagrams: A Graphical Language for Compositional Training Regimes
DeepTAGE: Deep Temporal-Aligned Gradient Enhancement for Optimizing Spiking Neural Networks
TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
Federated Residual Low-Rank Adaption of Large Language Models
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Ensembling Diffusion Models via Adaptive Feature Aggregation
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
Dreamweaver: Learning Compositional World Models from Pixels
PAD: Personalized Alignment at Decoding-time
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Matrix Product Sketching via Coordinated Sampling
Look Before You Leap: Universal Emergent Mechanism for Retrieval in Language Models
Language Representations Can be What Recommenders Need: Findings and Potentials
UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation
Preserving Deep Representations in One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework
DataMan: Data Manager for Pre-training Large Language Models
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
GraphRouter: A Graph-based Router for LLM Selections
CBQ: Cross-Block Quantization for Large Language Models
Multi-Resolution Decomposable Diffusion Model for Non-Stationary Time Series Anomaly Detection
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
Looking Inward: Language Models Can Learn About Themselves by Introspection
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
A Meta-Learning Approach to Bayesian Causal Discovery
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Diffusion Bridge Implicit Models
Disentangled Representation Learning with the Gromov-Monge Gap
Persistent Pre-training Poisoning of LLMs
Why Does the Effective Context Length of LLMs Fall Short?
Influence Functions for Scalable Data Attribution in Diffusion Models
Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo
Action Sequence Augmentation for Action Anticipation
Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Controllable Satellite-to-Street-View Synthesis with Precise Pose Alignment and Zero-Shot Environmental Control
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Efficient Discovery of Pareto Front for Multi-Objective Reinforcement Learning
OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Recovery of Causal Graph Involving Latent Variables via Homologous Surrogates
What is Wrong with Perplexity for Long-context Language Modeling?
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement
Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
GReaTer: Gradients Over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Can We Ignore Labels in Out of Distribution Detection?
Optimality of Matrix Mechanism on $\ell_p^p$-metric
Teaching Human Behavior Improves Content Understanding Abilities Of VLMs
$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee
Scalable Bayesian Learning with posteriors
Efficient Imitation under Misspecification
Causal Graph Transformer for Treatment Effect Estimation Under Unknown Interference
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
Simple ReFlow: Improved Techniques for Fast Flow Models
Vision Language Models are In-Context Value Learners
FIG: Flow with Interpolant Guidance for Linear Inverse Problems
ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
A Unified Theory of Quantum Neural Network Loss Landscapes
CL-MFAP: A Contrastive Learning-Based Multimodal Foundation Model for Molecular Property Prediction and Antibiotic Screening
Eliminating Position Bias of Language Models: A Mechanistic Approach
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
Multi-domain Distribution Learning for De Novo Drug Design
A Computational Framework for Modeling Emergence of Color Vision in the Human Brain
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark
Exploring channel distinguishability in local neighborhoods of the model space in quantum neural networks
CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes
Integrative Decoding: Improving Factuality via Implicit Self-consistency
Bilinear MLPs enable weight-based mechanistic interpretability
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models
Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
Representative Guidance: Diffusion Model Sampling with Coherence
MAI: A Multi-turn Aggregation-Iteration Model for Composed Image Retrieval
Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency
Unlocking Global Optimality in Bilevel Optimization: A Pilot Study
GraphBridge: Towards Arbitrary Transfer Learning in GNNs
Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models
ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
A Conditional Independence Test in the Presence of Discretization
Rethinking Visual Counterfactual Explanations Through Region Constraint
Minimal Variance Model Aggregation: A principled, non-intrusive, and versatile integration of black box models
ML4TSPBench: Drawing Methodological Principles for TSP and Beyond from Streamlined Design Space of Learning and Search
Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability
An Auditing Test to Detect Behavioral Shift in Language Models
Sensitivity Verification for Additive Decision Tree Ensembles
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval
BANGS: Game-theoretic Node Selection for Graph Self-Training
Second Order Bounds for Contextual Bandits with Function Approximation
Sharpness-Aware Black-Box Optimization
Improving Complex Reasoning with Dynamic Prompt Corruption: A Soft Prompt Optimization Approach
Bayesian Experimental Design Via Contrastive Diffusions
Diffusion Bridge AutoEncoders for Unsupervised Representation Learning
Probabilistic Neural Pruning via Sparsity Evolutionary Fokker-Planck-Kolmogorov Equation
GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
SAGEPhos: Sage Bio-Coupled and Augmented Fusion for Phosphorylation Site Detection
MamKO: Mamba-based Koopman operator for modeling and predictive control
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
Large Language Models are Interpretable Learners
Rethinking and Improving Autoformalization: Towards a Faithful Metric and a Dependency Retrieval-based Approach
Privately Counting Partially Ordered Data
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Causal Graphical Models for Vision-Language Compositional Understanding
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization
Causal Representation Learning from Multimodal Biomedical Observations
Adversarial Search Engine Optimization for Large Language Models
OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data
TDDBench: A Benchmark for Training data detection
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
DPaI: Differentiable Pruning at Initialization with Node-Path Balance Principle
Modeling Complex System Dynamics with Flow Matching Across Time and Conditions
Deep Distributed Optimization for Large-Scale Quadratic Programming
Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization
SymDiff: Equivariant Diffusion via Stochastic Symmetrisation
ECHOPulse: ECG Controlled Echocardio-gram Video Generation
MM-EMBED: UNIVERSAL MULTIMODAL RETRIEVAL WITH MULTIMODAL LLMS
Simple Guidance Mechanisms for Discrete Diffusion Models
Bootstrapped Model Predictive Control
Lipschitz Bandits in Optimal Space
Interpreting Language Reward Models via Contrastive Explanations
Minimax Optimal Reinforcement Learning with Quasi-Optimism
Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency
Improving Graph Neural Networks by Learning Continuous Edge Directions
Adam-mini: Use Fewer Learning Rates To Gain More
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
The Optimization Landscape of SGD Across the Feature Learning Strength
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold
Fast Summation of Radial Kernels via QMC Slicing
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Selective Aggregation for Low-Rank Adaptation in Federated Learning
REMEDY: Recipe Merging Dynamics in Large Vision-Language Models
Language models scale reliably with over-training and on downstream tasks
THE ROBUSTNESS OF DIFFERENTIABLE CAUSAL DISCOVERY IN MISSPECIFIED SCENARIOS
Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Fast and Accurate Blind Flexible Docking
Adaptive Batch Size for Privately Finding Second-Order Stationary Points
The Value of Sensory Information to a Robot
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
M^3PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model
Risk-Sensitive Variational Actor-Critic: A Model-Based Approach
Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model
Action abstractions for amortized sampling
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Noisy Test-Time Adaptation in Vision-Language Models
Temporal Difference Learning: Why It Can Be Fast and How It Will Be Faster
Optimizing importance weighting in the presence of sub-population shifts
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Diffusion-based Neural Network Weights Generation
Attribute-based Visual Reprogramming for Vision-Language Models
Competitive Fair Scheduling with Predictions
AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
ComLoRA: A Competitive Learning Approach for Enhancing LoRA
On the Convergence of No-Regret Dynamics in Information Retrieval Games with Proportional Ranking Functions
Training on the Test Task Confounds Evaluation and Emergence
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
HeadMap: Locating and Enhancing Knowledge Circuits in LLMs
UniDrive: Towards Universal Driving Perception Across Camera Configurations
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Large Language Models can Become Strong Self-Detoxifiers
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
Extending Mercer's expansion to indefinite and asymmetric kernels
AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly
The AdEMAMix Optimizer: Better, Faster, Older
Preference Optimization for Reasoning with Pseudo Feedback
The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling
To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
Quantifying Generalization Complexity for Large Language Models
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
Indirect Gradient Matching for Adversarial Robust Distillation
Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
DUET: Decentralized Bilevel Optimization without Lower-Level Strong Convexity
Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning
Feedback Schrödinger Bridge Matching
Predictive Uncertainty Quantification for Bird's Eye View Segmentation: A Benchmark and Novel Loss Function
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Unearthing Skill-level Insights for Understanding Trade-offs of Foundation Models
Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
DeeperForward: Enhanced Forward-Forward Training for Deeper and Better Performance
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Copyright-Protected Language Generation via Adaptive Model Fusion
Beyond single neurons: population response geometry in digital twins of mouse visual cortex
Open-Set Graph Anomaly Detection via Normal Structure Regularisation
Safety Layers in Aligned Large Language Models: The Key to LLM Security
Node-Time Conditional Prompt Learning in Dynamic Graphs
Ward: Provable RAG Dataset Inference via LLM Watermarks
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Boosting the visual interpretability of CLIP via adversarial fine-tuning
Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness
MA$^2$E: Addressing Partial Observability in Multi-Agent Reinforcement Learning with Masked Auto-Encoder
RMB: Comprehensively benchmarking reward models in LLM alignment
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model
Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
Finally Rank-Breaking Conquers MNL Bandits: Optimal and Efficient Algorithms for MNL Assortment
Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model
InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly
Collapsed Language Models Promote Fairness
Semi-Parametric Retrieval via Binary Bag-of-Tokens Index
Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Controlling Language and Diffusion Models by Transporting Activations
HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging
ADIFF: Explaining audio difference using natural language
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
Gaussian Mixture Counterfactual Generator
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
RFMamba: Frequency-Aware State Space Model for RF-Based Human-Centric Perception
Catastrophic Failure of LLM Unlearning via Quantization
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
Agent S: An Open Agentic Framework that Uses Computers Like a Human
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Efficient Top-m Data Values Identification for Data Selection
Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step
HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Enhancing Robust Fairness via Confusional Spectral Regularization
Do as I do (Safely): Mitigating Task-Specific Fine-tuning Risks in Large Language Models
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length
Hidden in the Noise: Two-Stage Robust Watermarking for Images
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
gRNAde: Geometric Deep Learning for 3D RNA inverse design
Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions
CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View
AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption
Is Your Video Language Model a Reliable Judge?
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations
Coreset Selection via Reducible Loss in Continual Learning
Diffusion Policy Policy Optimization
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Last Iterate Convergence of Incremental Methods as a Model of Forgetting
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
E(3)-equivariant models cannot learn chirality: Field-based molecular generation
Adaptive Length Image Tokenization via Recurrent Allocation
From Tokens to Lattices: Emergent Lattice Structures in Language Models
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
Towards Federated RLHF with Aggregated Client Preference for LLMs
From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
ThinK: Thinner Key Cache by Query-Driven Pruning
Generalization Bounds for Canonicalization: A Comparative Study with Group Averaging
Transformers Provably Solve Parity Efficiently with Chain of Thought
A Tight Convergence Analysis of Inexact Stochastic Proximal Point Algorithm for Stochastic Composite Optimization Problems
Linear combinations of latents in generative models: subspaces and beyond
Towards Semantic Equivalence of Tokenization in Multimodal LLM
cryoSPHERE: Single-Particle HEterogeneous REconstruction from cryo EM
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
Balanced Neural ODEs: nonlinear model order reduction and Koopman operator approximations
Distilling Dataset into Neural Field
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
Scale-Free Graph-Language Models
Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
CAMEx: Curvature-aware Merging of Experts
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective
InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization
Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
The Unreasonable Ineffectiveness of the Deeper Layers
KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics
Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning
Neural Causal Graph for Interpretable and Intervenable Classification
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning
Chunk-Distilled Language Modeling
DynFrs: An Efficient Framework for Machine Unlearning in Random Forest
Artificial Kuramoto Oscillatory Neurons
An Effective Manifold-based Optimization Method for Distributionally Robust Classification
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
CAX: Cellular Automata Accelerated in JAX
ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
Generative Classifiers Avoid Shortcut Solutions
Physics of Language Models: Part 3.2, Knowledge Manipulation
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation
FaceShot: Bring Any Character into Life
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Track-On: Transformer-based Online Point Tracking with Memory
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Learning Graph Quantized Tokenizers
Do Mice Grok? Glimpses of Hidden Progress in Sensory Cortex
Support is All You Need for Certified VAE Training
Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
On the Byzantine-Resilience of Distillation-Based Federated Learning
Large (Vision) Language Models are Unsupervised In-Context Learners
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Stable Segment Anything Model
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
Learning Causal Alignment for Reliable Disease Diagnosis
Solving New Tasks by Adapting Internet Video Knowledge
Stealthy Shield Defense: A Conditional Mutual Information-Based Approach against Black-Box Model Inversion Attacks
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
A Generalist Hanabi Agent
Distribution-Free Data Uncertainty for Neural Network Regression
Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine
No Need to Talk: Asynchronous Mixture of Language Models
Data Scaling Laws in Imitation Learning for Robotic Manipulation
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
Control-oriented Clustering of Visual Latent Representation
Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD
TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Streamlining Prediction in Bayesian Deep Learning
ICLR: In-Context Learning of Representations
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
SOREL: A Stochastic Algorithm for Spectral Risks Minimization
Adaptive Shrinkage Estimation for Personalized Deep Kernel Regression in Modeling Brain Trajectories
Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping
On the Optimization Landscape of Low Rank Adaptation Methods for Large Language Models
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
Rethinking Invariance in In-context Learning
CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark
Forgetting Transformer: Softmax Attention with a Forget Gate
Retri3D: 3D Neural Graphics Representation Retrieval
$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
Generalization Bounds and Model Complexity for Kolmogorov–Arnold Networks
Homomorphism Counts as Structural Encodings for Graph Learning
Multi-Perspective Data Augmentation for Few-shot Object Detection
Do LLMs "know" internally when they follow instructions?
Generalization, Expressivity, and Universality of Graph Neural Networks on Attributed Graphs
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Decoupled Finetuning for Domain Generalizable Semantic Segmentation
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling
How Much is a Noisy Image Worth? Data Scaling Laws for Ambient Diffusion.
Generalized Video Moment Retrieval
Causal Effect Estimation with Mixed Latent Confounders and Post-treatment Variables
Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data
PIORF: Physics-Informed Ollivier-Ricci Flow for Long-Range Interactions in Mesh Graph Neural Networks
Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL
Prioritized Generative Replay
ILLUSION: Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset
Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
Locality Alignment Improves Vision-Language Models
Demystifying the Token Dynamics of Deep Selective State Space Models
DEEM: Diffusion models serve as the eyes of large language models for image perception
Balancing Bias in Two-sided Markets for Fair Stable Matchings
Spiking Vision Transformer with Saccadic Attention
Breaking the Reclustering Barrier in Centroid-based Deep Clustering
Balancing Act: Diversity and Consistency in Large Language Model Ensembles
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Neuralized Markov Random Field for Interaction-Aware Stochastic Human Trajectory Prediction
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
TULIP: Token-length Upgraded CLIP
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
Exposure Bracketing Is All You Need For A High-Quality Image
TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration
Metric-Driven Attributions for Vision Transformers
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
Systematic Outliers in Large Language Models
N-ForGOT: Towards Not-forgetting and Generalization of Open Temporal Graph Learning
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Biologically Plausible Brain Graph Transformer
Reasoning of Large Language Models over Knowledge Graphs with Super-Relations
PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration
PaLD: Detection of Text Partially Written by Large Language Models
MoLEx: Mixture of Layer Experts for Fine-tuning with Sparse Upcycling
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
Homomorphism Expressivity of Spectral Invariant Graph Neural Networks
High-quality Text-to-3D Character Generation with SparseCubes and Sparse Transformers.
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
UIFace: Unleashing Inherent Model Capabilities to Enhance Intra-Class Diversity in Synthetic Face Recognition
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Online-to-Offline RL for Agent Alignment
Robust Transfer of Safety-Constrained Reinforcement Learning Agents
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
Minimal Impact ControlNet: Advancing Multi-ControlNet Integration
Dynamics of Concept Learning and Compositional Generalization
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Robust LLM safeguarding via refusal feature adversarial training
DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints
Adaptive Energy Alignment for Accelerating Test-Time Adaptation
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Your Weak LLM is Secretly a Strong Teacher for Alignment
Understanding Optimization in Deep Learning with Central Flows
Energy-Based Diffusion Language Models for Text Generation
Group Downsampling with Equivariant Anti-aliasing
MAST: model-agnostic sparsified training
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models
Graph Neural Networks Can (Often) Count Substructures
Offline Model-Based Optimization by Learning to Rank
C-CLIP: Multimodal Continual Learning for Vision-Language Model
Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
Bundle Neural Network for message diffusion on graphs
Collaborative Discrete-Continuous Black-Box Prompt Learning for Language Models
PiCO: Peer Review in LLMs based on Consistency Optimization
FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
Semantix: An Energy-guided Sampler for Semantic Style Transfer
BTBS-LNS: Binarized-Tightening, Branch and Search on Learning LNS Policies for MIP
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Nesterov acceleration in benignly non-convex landscapes
Neural Spacetimes for DAG Representation Learning
MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences
Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization
Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Tractable Multi-Agent Reinforcement Learning through Behavioral Economics
Adding Conditional Control to Diffusion Models with Reinforcement Learning
TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series
Effective post-training embedding compression via temperature control in contrastive training
Enhance Multi-View Classification Through Multi-Scale Alignment and Expanded Boundary
Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Sparse Learning for State Space Models on Mobile
Uncovering Overfitting in Large Language Model Editing
Automated Design of Agentic Systems
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
Quantum (Inspired) $D^2$-sampling with Applications
Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
ElasticTok: Adaptive Tokenization for Image and Video
Holographic Node Representations: Pre-training Task-Agnostic Node Embeddings
Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors
GLOMA: Global Video Text Spotting with Morphological Association
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
Precedence-Constrained Winter Value for Effective Graph Data Valuation
COFlowNet: Conservative Constraints on Flows Enable High-Quality Candidate Generation
Dynamic Diffusion Transformer
A Theory for Token-Level Harmonization in Retrieval-Augmented Generation
Scaling and evaluating sparse autoencoders
Scaling Long Context Training Data by Long-Distance Referrals
Knowledge Localization: Mission Not Accomplished? Enter Query Localization!
Learning the Complexity of Weakly Noisy Quantum States
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Heavy-Tailed Diffusion Models
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
PEARL: Towards Permutation-Resilient LLMs
Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport
Jailbreaking as a Reward Misspecification Problem
Compositional simulation-based inference for time series
Dimension Agnostic Neural Processes
Capturing the Temporal Dependence of Training Data Influence
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Score-based Self-supervised MRI Denoising
CryoGEN: Generative Energy-based Models for Cryogenic Electron Tomography Reconstruction
Learning Splitting Heuristics in Divide-and-Conquer SAT Solvers with Reinforcement Learning
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Data Distillation for extrapolative protein design through exact preference optimization
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
Discriminating image representations with principal distortions
Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Weak-to-Strong Generalization Through the Data-Centric Lens
The impact of allocation strategies in subset learning on the expressive power of neural networks
EG4D: Explicit Generation of 4D Object without Score Distillation
How Much is Unseen Depends Chiefly on Information About the Seen
Deep Kernel Posterior Learning under Infinite Variance Prior Weights
TODO: Enhancing LLM Alignment with Ternary Preferences
Towards Certification of Uncertainty Calibration under Adversarial Attacks
Counterfactual Realizability
SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
Unbounded: A Generative Infinite Game of Character Life Simulation
Reconciling Model Multiplicity for Downstream Decision Making
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Selective Attention Improves Transformer
Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure
Decoupled Subgraph Federated Learning
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
Does SGD really happen in tiny subspaces?
Multimodal Quantitative Language for Generative Recommendation
UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery
PINP: Physics-Informed Neural Predictor with latent estimation of fluid flows
OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework
Logic-Logit: A Logic-Based Approach to Choice Modeling
Memory Efficient Transformer Adapter for Dense Predictions
Learning View-invariant World Models for Visual Robotic Manipulation
Long-tailed Adversarial Training with Self-Distillation
Scaling Laws for Downstream Task Performance in Machine Translation
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
Decomposition Polyhedra of Piecewise Linear Functions
RecFlow: An Industrial Full Flow Recommendation Dataset
The Pitfalls of Memorization: When Memorization Hurts Generalization
Differentiable Optimization of Similarity Scores Between Models and Brains
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
Simplifying, Stabilizing and Scaling Continuous-time Consistency Models
Neural Approximate Mirror Maps for Constrained Diffusion Models
MindSimulator: Exploring Brain Concept Localization via Synthetic fMRI
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation
Scalable Extraction of Training Data from Aligned, Production Language Models
Linear Multistep Solver Distillation for Fast Sampling of Diffusion Models
Text4Seg: Reimagining Image Segmentation as Text Generation
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
Revolutionizing EMCCD Denoising through a Novel Physics-Based Learning Framework for Noise Modeling
Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Param$\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models
DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks
Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control
Growth Inhibitors for Suppressing Inappropriate Image Concepts in Diffusion Models
Field-DiT: Diffusion Transformer on Unified Video, 3D, and Game Field Generation
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
In-Context Editing: Learning Knowledge from Self-Induced Distributions
PIED: Physics-Informed Experimental Design for Inverse Problems
Counterfactual Concept Bottleneck Models
Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training
Modeling dynamic social vision highlights gaps between deep learning and humans
Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Effective Interplay between Sparsity and Quantization: From Theory to Practice
LancBiO: Dynamic Lanczos-aided Bilevel Optimization via Krylov Subspace
LLM-based Typed Hyperresolution for Commonsense Reasoning with Knowledge Bases
Process Reward Model with Q-value Rankings
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods
FlashMask: Efficient and Rich Mask Extension of FlashAttention
When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision
Learning Successor Features with Distributed Hebbian Temporal Memory
ST-GCond: Self-supervised and Transferable Graph Dataset Condensation
A Solvable Attention for Neural Scaling Laws
Making Text Embedders Few-Shot Learners
Scaling Laws for Precision
Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Learning to Communicate Through Implicit Communication Channels
CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series
PFDiff: Training-Free Acceleration of Diffusion Models Combining Past and Future Scores
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Improving Instruction-Following in Language Models through Activation Steering
GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians
Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse
Learning Evolving Tools for Large Language Models
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement
ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning
Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires
Active Learning for Neural PDE Solvers
Beware of Calibration Data for Pruning Large Language Models
When Selection Meets Intervention: Additional Complexities in Causal Discovery
Learning to Discretize Denoising Diffusion ODEs
Better autoregressive regression with LLMs via regression-aware fine-tuning
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
Enhancing Uncertainty Estimation and Interpretability with Bayesian Non-negative Decision Layer
Language Models Learn to Mislead Humans via RLHF
Discrete Distribution Networks
Learning General-purpose Biomedical Volume Representations using Randomized Synthesis
Endless Jailbreaks with Bijection Learning
UniRestore3D: A Scalable Framework For General Shape Restoration
Optimal Transport for Time Series Imputation
Reflective Gaussian Splatting
Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
Context Steering: Controllable Personalization at Inference Time
Diffusion Models are Evolutionary Algorithms
OLMoE: Open Mixture-of-Experts Language Models
Deep Signature: Characterization of Large-Scale Molecular Dynamics
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Vector-ICL: In-context Learning with Continuous Vector Representations
MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks
Feature-Based Online Bilateral Trade
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Boosting Latent Diffusion with Perceptual Objectives
GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation
Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos
Language Agents Meet Causality -- Bridging LLMs and Causal World Models
Quality over Quantity in Attention Layers: When Adding More Heads Hurts
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
GPromptShield: Elevating Resilience in Graph Prompt Tuning Against Adversarial Attacks
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
Image and Video Tokenization with Binary Spherical Quantization
A Unified Framework for Forward and Inverse Problems in Subsurface Imaging using Latent Space Translations
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
Pacmann: Efficient Private Approximate Nearest Neighbor Search
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
Causally Motivated Sycophancy Mitigation for Large Language Models
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
Point-SAM: Promptable 3D Segmentation Model for Point Clouds
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
Language-Image Models with 3D Understanding
ThermalGaussian: Thermal 3D Gaussian Splatting
AIMS.au: A Dataset for the Analysis of Modern Slavery Countermeasures in Corporate Statements
Federated Class-Incremental Learning: A Hybrid Approach Using Latent Exemplars and Data-Free Techniques to Address Local and Global Forgetting
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Effective and Efficient Time-Varying Counterfactual Prediction with State-Space Models
Forewarned is Forearmed: Harnessing LLMs for Data Synthesis via Failure-induced Exploration
ELFS: Label-Free Coreset Selection with Proxy Training Dynamics
DenoiseVAE: Learning Molecule-Adaptive Noise Distributions for Denoising-based 3D Molecular Pre-training
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning
LICO: Large Language Models for In-Context Molecular Optimization
Integral Performance Approximation for Continuous-Time Reinforcement Learning Control
GOAL: A Generalist Combinatorial Optimization Agent Learner
AFlow: Automating Agentic Workflow Generation
Random Is All You Need: Random Noise Injection on Feature Statistics for Generalizable Deep Image Denoising
Deep Kernel Relative Test for Machine-generated Text Detection
Joint Graph Rewiring and Feature Denoising via Spectral Resonance
SSOLE: Rethinking Orthogonal Low-rank Embedding for Self-Supervised Learning
Group Ligands Docking to Protein Pockets
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery
EvA: Erasing Spurious Correlations with Activations
Unified Parameter-Efficient Unlearning for LLMs
Enhancing Clustered Federated Learning: Integration of Strategies and Improved Methodologies
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
To Code or Not To Code? Exploring Impact of Code in Pre-training
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Synthesizing Realistic fMRI: A Physiological Dynamics-Driven Hierarchical Diffusion Model for Efficient fMRI Acquisition
h4rm3l: A Language for Composable Jailbreak Attack Synthesis
Dataset Ownership Verification in Contrastive Pre-trained Models
Lines of Thought in Large Language Models
IgGM: A Generative Model for Functional Antibody and Nanobody Design
Understanding the Stability-based Generalization of Personalized Federated Learning
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
Mask in the Mirror: Implicit Sparsification
STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes
On Stochastic Contextual Bandits with Knapsacks in Small Budget Regime
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
PALMBENCH: A COMPREHENSIVE BENCHMARK OF COMPRESSED LARGE LANGUAGE MODELS ON MOBILE PLATFORMS
Gradient descent with generalized Newton’s method
Task Descriptors Help Transformers Learn Linear Models In-Context
Generative Monoculture in Large Language Models
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
SpinQuant: LLM Quantization with Learned Rotations
U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning
Local Patterns Generalize Better for Novel Anomalies
SFESS: Score Function Estimators for $k$-Subset Sampling
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
Learn hybrid prototypes for multivariate time series anomaly detection
SGD with memory: fundamental properties and stochastic acceleration
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Model Risk-sensitive Offline Reinforcement Learning
Multi-Accurate CATE is Robust to Unknown Covariate Shifts
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
DLEFT-MKC: Dynamic Late Fusion Multiple Kernel Clustering with Robust Tensor Learning via Min-Max Optimization
Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
A Second-Order Perspective on Model Compositionality and Incremental Learning
Structuring Benchmark into Knowledge Graphs to Assist Large Language Models in Retrieving and Designing Models
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
On the Computation of the Fisher Information in Continual Learning
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
Robust Function-Calling for On-Device Language Model via Function Masking
More Experts Than Galaxies: Conditionally-Overlapping Experts with Biologically-Inspired Fixed Routing
Energy-Weighted Flow Matching for Offline Reinforcement Learning
Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
Utility-Directed Conformal Prediction: A Decision-Aware Framework for Actionable Uncertainty Quantification
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Inference Scaling for Long-Context Retrieval Augmented Generation
A3D: Does Diffusion Dream about 3D Alignment?
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
From Commands to Prompts: LLM-based Semantic File System for AIOS
Improving Data Efficiency via Curating LLM-Driven Rating Systems
Shh, don't say that! Domain Certification in LLMs
Memory Mosaics
Denoising Levy Probabilistic Models
SFS: Smarter Code Space Search improves LLM Inference Scaling
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
T2V-Turbo-v2: Enhancing Video Model Post-Training through Data, Reward, and Conditional Guidance Design
CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Does Editing Provide Evidence for Localization?
NRGBoost: Energy-Based Generative Boosted Trees
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
Should VLMs be Pre-trained with Image Data?
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards
Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
Generalizing Weisfeiler-Lehman Kernels to Subgraphs
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
On the Hölder Stability of Multiset and Graph Neural Networks
EIA: ENVIRONMENTAL INJECTION ATTACK ON GENERALIST WEB AGENTS FOR PRIVACY LEAKAGE
Free Hunch: Denoiser Covariance Estimation for Diffusion Models Without Extra Costs
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Palu: KV-Cache Compression with Low-Rank Projection
Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning
Higher-Order Graphon Neural Networks: Approximation and Cut Distance
TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning
MixEval-X: Any-to-any Evaluations from Real-world Data Mixture
Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features
Normed Spaces for Graph Embedding
GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks
An Undetectable Watermark for Generative Image Models
PT-T2I/V: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image/Video-Task
Counterfactual Generative Modeling with Variational Causal Inference
Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
JetFormer: An autoregressive generative model of raw images and text
One Step Diffusion via Shortcut Models
Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
Towards Explaining the Power of Constant-depth Graph Neural Networks for Structured Linear Programming
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
MANTRA: The Manifold Triangulations Assemblage
In vivo cell-type and brain region classification via multimodal contrastive learning
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
Re-Imagining Multimodal Instruction Tuning: A Representation View
MuseGNN: Forming Scalable, Convergent GNN Layers that Minimize a Sampling-Based Energy
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Adversarially Robust Anomaly Detection through Spurious Negative Pair Mitigation
Improving Neural Optimal Transport via Displacement Interpolation
Restructuring Vector Quantization with the Rotation Trick
TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention
Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise
Edge Prompt Tuning for Graph Neural Networks
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Episodic Novelty Through Temporal Distance
Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Test-time Alignment of Diffusion Models without Reward Over-optimization
Gated Delta Networks: Improving Mamba2 with Delta Rule
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Graph Transformers Dream of Electric Flow
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Linear Transformer Topological Masking with Graph Random Features
Fugatto 1: Foundational Generative Audio Transformer Opus 1
Reconstruction-Guided Policy: Enhancing Decision-Making through Agent-Wise State Consistency
ControlAR: Controllable Image Generation with Autoregressive Models
DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning
Graph Neural Networks Gone Hogwild
BrainOOD: Out-of-distribution Generalizable Brain Network Analysis
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality
Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
MIND over Body: Adaptive Thinking using Dynamic Computation
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model
MagicPIG: LSH Sampling for Efficient LLM Generation
On the Expressive Power of Sparse Geometric MPNNs
On LLM Knowledge Distillation - A Comparison between Forward KL and Reverse KL
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Presto! Distilling Steps and Layers for Accelerating Music Generation
Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models
Human Simulacra: Benchmarking the Personification of Large Language Models
Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Towards Hierarchical Rectified Flow
Fundamental Limitations on Subquadratic Alternatives to Transformers
Why RoPE Struggles to Maintain Long-Term Decay in Long Sequences?
Efficient Evolutionary Search Over Chemical Space with Large Language Models
ReNovo: Retrieval-Based De Novo Mass Spectrometry Peptide Sequencing
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems
MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses
Learning Equivariant Non-Local Electron Density Functionals
ProtPainter: Draw or Drag Protein via Topology-guided Diffusion
Contextualizing biological perturbation experiments through language
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments
Hyperbolic Genome Embeddings
Estimation of single-cell and tissue perturbation effect in spatial transcriptomics via Spatial Causal Disentanglement
Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery
BoneMet: An Open Large-Scale Multi-Modal Murine Dataset for Breast Cancer Bone Metastasis Diagnosis and Prognosis
CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning
Time-to-Event Pretraining for 3D Medical Imaging
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
Continuous Ensemble Weather Forecasting with Diffusion models
SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark
Can Transformers Do Enumerative Geometry?
Solving Differential Equations with Constrained Learning
Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models
No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Generating Freeform Endoskeletal Robots
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Physiome-ODE: A Benchmark for Irregularly Sampled Multivariate Time-Series Forecasting Based on Biological ODEs
Diffusion-based Decoupled Deterministic and Uncertain Framework for Probabilistic Multivariate Time Series Forecasting
Zero-shot forecasting of chaotic systems
SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting
TimeInf: Time Series Data Contribution via Influence Functions
Infinite-Resolution Integral Noise Warping for Diffusion Models
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Automated Proof Generation for Rust Code via Self-Evolution
Improving Language Model Distillation through Hidden State Matching
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
Agents' Room: Narrative Generation through Multi-step Collaboration
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models
PersonalLLM: Tailoring LLMs to Individual Preferences
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
Competing Large Language Models in Multi-Agent Gaming Environments
SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents
Multi-modal brain encoding models for multi-modal stimuli
Animate Your Thoughts: Reconstruction of Dynamic Natural Vision from Human Brain Activity
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
Quantized Spike-driven Transformer
Range, not Independence, Drives Modularity in Biologically Inspired Representations
BrainACTIV: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation
Associative memory and dead neurons
SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments
As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Masked Image Modeling Representations
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs
Segment Any 3D Object with Language
Bridging Compressed Image Latents and Multimodal Large Language Models
Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models
TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation
Order-aware Interactive Segmentation
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Adaptive Camera Sensor for Vision Models
Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images
Rethinking Classifier Re-Training in Long-Tailed Recognition: Label Over-Smooth Can Balance
Pedestrian Motion Reconstruction: A Large-scale Benchmark via Mixed Reality Rendering with Multiple Perspectives and Modalities
Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models
Robust-PIFu: Robust Pixel-aligned Implicit Function for 3D Human Digitalization from a Single Image
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
Generation and Comprehension Hand-in-Hand: Vision-guided Expression Diffusion for Boosting Referring Expression Generation and Comprehension
3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing
AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution
Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios
Sketch2Diagram: Generating Vector Diagrams from Hand-Drawn Sketches
Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Latent Radiance Fields with 3D-aware 2D Representations
Which Tasks Should Be Compressed Together? A Causal Discovery Approach for Efficient Multi-Task Representation Compression
Measuring And Improving Engagement of Text-to-Image Generation Models
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes
Learning Color Equivariant Representations
Towards Realistic Data Generation for Real-World Super-Resolution
Re-Aligning Language to Visual Objects with an Agentic Workflow
EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
ProtoSnap: Prototype Alignment For Cuneiform Signs
On the Transfer of Object-Centric Representation Learning
CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
RESfM: Robust Deep Equivariant Structure from Motion
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises
BP-Modified Local Loss for Efficient Training of Deep Neural Networks
Regularizing Energy among Training Samples for Out-of-Distribution Generalization
Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
Progress or Regress? Self-Improvement Reversal in Post-training
Are Large Vision Language Models Good Game Players?
Neural Phylogeny: Fine-Tuning Relationship Detection among Neural Networks
Ensembles of Low-Rank Expert Adapters
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Uncovering Latent Memories in Large Language Models
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
Differential learning kinetics govern the transition from memorization to generalization during in-context learning
DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models
Pre-training of Foundation Adapters for LLM Fine-tuning
What's New in My Data? Novelty Exploration via Contrastive Generation
Wavelet-based Positional Representation for Long Context
Self-Improvement in Language Models: The Sharpening Mechanism
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
A Statistical Framework for Ranking LLM-based Chatbots
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
HaDeMiF: Hallucination Detection and Mitigation in Large Language Models
BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge
Improving Pretraining Data Using Perplexity Correlations
Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information
Gramian Multimodal Representation Learning and Alignment
SelKD: Selective Knowledge Distillation via Optimal Transport Perspective
Improving Neural Network Accuracy by Concurrently Training with a Twin Network
Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
Robust Representation Consistency Model via Contrastive Denoising
SEBRA: Debiasing through Self-Guided Bias Ranking
DRoP: Distributionally Robust Data Pruning
Democratic Training Against Universal Adversarial Perturbations
Severing Spurious Correlations with Data Pruning
Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness
Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Morphing Tokens Draw Strong Masked Image Models
The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities
DebGCD: Debiased Learning with Distribution Guidance for Generalized Category Discovery
Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks
A Rainbow in Deep Network Black Boxes
Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors
Oscillatory State-Space Models
Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
Designing Concise ConvNets with Columnar Stages
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
Rethinking Light Decoder-based Solvers for Vehicle Routing Problems
Solving hidden monotone variational inequalities with surrogate losses
Sharpness-Aware Minimization: General Analysis and Improved Rates
Local convergence of simultaneous min-max algorithms to differential equilibrium on Riemannian manifold
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Utilitarian Algorithm Configuration for Infinite Parameter Spaces
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding
On the Almost Sure Convergence of the Stochastic Three Points Algorithm
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
Joint Gradient Balancing for Data Ordering in Finite-Sum Multi-Objective Optimization
Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning
Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint
Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies
Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning
Policy Optimization under Imperfect Human Interactions with Agent-Gated Shared Autonomy
Learning to Search from Demonstration Sequences
Policy Gradient with Kernel Quadrature
On Generalization Across Environments In Multi-Objective Reinforcement Learning
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding
Expected Return Symmetries
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Learning mirror maps in policy mirror descent
What Makes a Good Diffusion Planner for Decision Making?
Simple, Good, Fast: Self-Supervised World Models Free of Baggage
How to Find the Exact Pareto Front for Multi-Objective MDPs?
Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment
$q$-exponential family for policy optimization
CBMA: Improving Conformal Prediction through Bayesian Model Averaging
Robust Simulation-Based Inference under Missing Data via Neural Processes
InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences
Residual Deep Gaussian Processes on Manifolds
Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization
End-to-end Learning of Gaussian Mixture Priors for Diffusion Sampler
Diffusion On Syntax Trees For Program Synthesis
Flow-based Variational Mutual Information: Fast and Flexible Approximations
Benchmarking Predictive Coding Networks -- Made Simple
Kernel-based Optimally Weighted Conformal Time-Series Prediction
Connecting Federated ADMM to Bayes
Training One-Dimensional Graph Neural Networks is NP-Hard
On the Optimal Memorization Capacity of Transformers
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
Single-agent Poisoning Attacks Suffice to Ruin Multi-Agent Learning
Efficient Online Pruning and Abstraction for Imperfect Information Extensive-Form Games
Strategic Classification With Externalities
Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees
ONLINE EPSILON NET & PIERCING SET FOR GEOMETRIC CONCEPTS
Bounds on $L_p$ Errors in Density Ratio Estimation via $f$-Divergence Loss Functions
Conservative Contextual Bandits: Beyond Linear Representations
Learning from Imperfect Human Feedback: A Tale from Corruption-Robust Dueling
Satisficing Regret Minimization in Bandits
Linear Bandits with Memory
ADAM Optimization with Adaptive Batch Selection
Do Stochastic, Feel Noiseless: Stable Stochastic Optimization via a Double Momentum Mechanism
Accelerated Over-Relaxation Heavy-Ball Method: Achieving Global Accelerated Convergence with Broad Generalization
Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
Revisiting a Design Choice in Gradient Temporal Difference Learning
Generalizable Motion Planning via Operator Learning
On Minimizing Adversarial Counterfactual Error in Adversarial Reinforcement Learning
Topological Zigzag Spaghetti for Diffusion-based Generation and Prediction on Graphs
ADAM: An Embodied Causal Agent in Open-World Environments
Causal Discovery via Bayesian Optimization
Euler Characteristic Tools for Topological Data Analysis
KAN: Kolmogorov–Arnold Networks
Advancing Out-of-Distribution Detection via Local Neuroplasticity
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
An Information Criterion for Controlled Disentanglement of Multimodal Data
Test-time Adaptation for Cross-modal Retrieval with Query Shift
Neural networks on Symmetric Spaces of Noncompact Type
ColPali: Efficient Document Retrieval with Vision Language Models
Boosting Methods for Interval-censored Data with Regression and Classification
Simple yet Effective Incomplete Multi-view Clustering: Similarity-level Imputation and Intra-view Hybrid-group Prototype Construction
Exact Community Recovery under Side Information: Optimality of Spectral Algorithms
Content-Style Learning from Unaligned Domains: Identifiability under Unknown Latent Dimensions
Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection
Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions
Single Teacher, Multiple Perspectives: Teacher Knowledge Augmentation for Enhanced Knowledge Distillation
Fine-tuning can cripple your foundation model; preserving features may be the solution
Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources
Do not write that jailbreak paper
Encryption-Friendly LLM Architecture
Image-level Memorization Detection via Inversion-based Inference Perturbation
Towards hyperparameter-free optimization with differential privacy
The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD
Learning from End User Data with Shuffled Differential Privacy over Kernel Densities
How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions
Adversarial Machine Unlearning
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
When Prompt Engineering Meets Software Engineering: CNL-P as Natural and Robust "APIs" for Human-AI Interaction
Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks
Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
DICE: Data Influence Cascade in Decentralized Learning
Salvage: Shapley-distribution Approximation Learning Via Attribution Guided Exploration for Explainable Image Classification
Mechanism and emergence of stacked attention heads in multi-layer transformers
Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images
Linear Representations of Political Perspective Emerge in Large Language Models
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
How Far Are We from True Unlearnability?
BBCaL: Black-box Backdoor Detection under the Causality Lens
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
An Effective Theory of Bias Amplification
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Language Models are Advanced Anonymizers
Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
Towards Domain Adaptive Neural Contextual Bandits
Lean-STaR: Learning to Interleave Thinking and Proving
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Protecting against simultaneous data poisoning attacks
Robustness Inspired Graph Backdoor Defense
Towards Understanding the Universality of Transformers for Next-Token Prediction
SAVA: Scalable Learning-Agnostic Data Valuation
Global Convergence in Neural ODEs: Impact of Activation Functions
$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
Deep MMD Gradient Flow without adversarial training
Gyrogroup Batch Normalization
How Feature Learning Can Improve Neural Scaling Laws
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Understanding Factual Recall in Transformers via Associative Memories
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
On the Benefits of Memory for Modeling Time-Dependent PDEs
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
Consistency Models Made Easy
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
Let the Code LLM Edit Itself When You Edit the Code
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
Diffusion Models and Gaussian Flow Matching: Two Sides of the Same Coin
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Elucidating the Preconditioning in Consistency Distillation
MatExpert: Decomposing Materials Discovery By Mimicking Human Experts
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
How many samples are needed to train a deep neural network?
Rationalizing and Augmenting Dynamic Graph Neural Networks
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Multi-LLM-Agents Debate - Performance, Efficiency, and Scaling Challenges
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
RTDiff: Reverse Trajectory Synthesis via Diffusion for Offline Reinforcement Learning
Multi-Field Adaptive Retrieval
Wayward Concepts In Large Multimodal Models
Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale
Hymba: A Hybrid-head Architecture for Small Language Models
Anti-Exposure Bias in Diffusion Models
Decision Tree Induction Through LLMs via Semantically-Aware Evolution
Decoupling Angles and Strength in Low-rank Adaptation
Exploring The Forgetting in Adversarial Training: A Novel Method for Enhancing Robustness
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
Real-Time Video Generation with Pyramid Attention Broadcast
BenTo: Benchmark Reduction with In-Context Transferability
Do LLM Agents Have Regret? A Case Study in Online Learning and Games
4K4DGen: Panoramic 4D Generation at 4K Resolution
Monte Carlo Planning with Large Language Model for Text-Based Game Agents
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
Can LLM Simulations Truly Reflect Humanity? A Deep Dive
Compute-Constrained Data Selection
Multi-objective antibody design with constrained preference optimization
Generative World Explorer
Learning Randomized Algorithms with Transformers
KBLaM: Knowledge Base augmented Language Model
Linear SCM Identification in the Presence of Confounders and Gaussian Noise
Transformer-Squared: Self-adaptive LLMs
No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
StringLLM: Understanding the String Processing Capability of Large Language Models
Progressive distillation induces an implicit curriculum
AutoBencher: Towards Declarative Benchmark Construction
LLMs' Potential Influences on Our Democracy: Challenges and Opportunities
Reassessing EMNLP 2024’s Best Paper: Does Divergence-Based Calibration for MIAs Hold Up?
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Large Scale Knowledge Washing
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
MGDA Converges under Generalized Smoothness, Provably