ICLR 2025 Papers

Horizon Generalization in Reinforcement Learning

ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL

Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion

E(n) Equivariant Topological Neural Networks

Revealing the 3D Cosmic Web through Gravitationally Constrained Neural Fields

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

LR0.FM: LOW-RESOLUTION ZERO-SHOT CLASSIFICATION BENCHMARK FOR FOUNDATION MODELS

ComPC: Completing a 3D Point Cloud with 2D Diffusion Priors

Controlling Space and Time with Diffusion Models

Towards Generalization Bounds of GCNs for Adversarially Robust Node Classification

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture

L-WISE: Boosting human visual category learning through model-based image selection and enhancement

Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport

Periodic Materials Generation using Text-Guided Joint Diffusion Model

Multi-Robot Motion Planning with Diffusion Models

Ask, and it shall be given: On the Turing completeness of prompting

The Hidden Cost of Waiting for Accurate Predictions

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs

Adversarial Attacks on Data Attribution

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference

Differentiable and Learnable Wireless Simulation with Geometric Transformers

SplatFormer: Point Transformer for Robust 3D Gaussian Splatting

Interpreting Emergent Planning in Model-Free Reinforcement Learning

Reward Learning from Multiple Feedback Types

Temporal Flexibility in Spiking Neural Networks: Towards Generalization Across Time Steps and Deployment Friendliness

Hyper-Connections

Denoising Task Difficulty-based Curriculum for Training Diffusion Models

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity

Towards a Unified and Verified Understanding of Group-Operation Networks

Neural Context Flows for Meta-Learning of Dynamical Systems

Guaranteed Generation from Large Language Models

PFGuard: A Generative Framework with Privacy and Fairness Safeguards

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors

X-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos

Learned Reference-based Diffusion Sampler for multi-modal distributions

The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling

Certified Robustness Under Bounded Levenshtein Distance

SEPARATE: A Simple Low-rank Projection for Gradient Compression in Modern Large-scale Model Training Process

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

PICASO: Permutation-Invariant Context Composition with State Space Models

PPT: Patch Order Do Matters In Time Series Pretext Task

Block-Attention for Efficient Prefilling

Hierarchical World Models as Visual Whole-Body Humanoid Controllers

Exploring a Principled Framework for Deep Subspace Clustering

GRAIN: Exact Graph Reconstruction from Gradients

Do WGANs succeed because they minimize the Wasserstein Distance? Lessons from Discrete Generators

Event-Driven Online Vertical Federated Learning

Theory on Mixture-of-Experts in Continual Learning

Modality-Specialized Synergizers for Interleaved Vision-Language Generalists

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Revisiting Source-Free Domain Adaptation: a New Perspective via Uncertainty Control

DON’T STOP ME NOW: EMBEDDING BASED SCHEDULING FOR LLMS

Lawma: The Power of Specialization for Legal Annotation

Semialgebraic Neural Networks: From roots to representations

PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify

On Calibration of LLM-based Guard Models for Reliable Content Moderation

When Attention Sink Emerges in Language Models: An Empirical View

Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid

Unsupervised Multiple Kernel Learning for Graphs via Ordinality Preservation

How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Transformers are Universal In-context Learners

CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features

LLM Unlearning via Loss Adjustment with Only Forget Data

BlendRL: A Framework for Merging Symbolic and Neural Policy Learning

Doubly Optimal Policy Evaluation for Reinforcement Learning

Bonsai: Gradient-free Graph Condensation for Node Classification

FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"

TabWak: A Watermark for Tabular Diffusion Models

Score-based free-form architectures for high-dimensional Fokker-Planck equations

dEBORA: Efficient Bilevel Optimization-based low-Rank Adaptation

Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data

Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos

MAGNet: Motif-Agnostic Generation of Molecules from Scaffolds

HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents

O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting

On the Completeness of Invariant Geometric Deep Learning Models

Deep Networks Learn Features From Local Discontinuities in the Label Function

Bias Mitigation in Graph Diffusion Models

Bayesian Analysis of Combinatorial Gaussian Process Bandits

Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning

IFORMER: INTEGRATING CONVNET AND TRANSFORMER FOR MOBILE APPLICATION

Test-Time Ensemble via Linear Mode Connectivity: A Path to Better Adaptation

REVISITING MULTI-PERMUTATION EQUIVARIANCE THROUGH THE LENS OF IRREDUCIBLE REPRESENTATIONS

Neural Multi-Objective Combinatorial Optimization via Graph-Image Multimodal Fusion

Cached Multi-Lora Composition for Multi-Concept Image Generation

Fourier Head: Helping Large Language Models Learn Complex Probability Distributions

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

metabench - A Sparse Benchmark of Reasoning and Knowledge in Large Language Models

Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning

CONGO: Compressive Online Gradient Optimization

AI2TALE: An Innovative Information Theory-based Approach for Learning to Localize Phishing Attacks

Enhancing Federated Domain Adaptation with Multi-Domain Prototype-Based Federated Fine-Tuning

BRAID: Input-driven Nonlinear Dynamical Modeling of Neural-Behavioral Data

Fast training and sampling of Restricted Boltzmann Machines

Model-Agnostic Knowledge Guided Correction for Improved Neural Surrogate Rollout

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Video Action Differencing

SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement

UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation

EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents

Multi-Label Node Classification with Label Influence Propagation

Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

MoDeGPT: Modular Decomposition for Large Language Model Compression

OptionZero: Planning with Learned Options

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Learning Spatiotemporal Dynamical Systems from Point Process Observations

Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation

From Tokens to Words: On the Inner Lexicon of LLMs

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

NL-Eye: Abductive NLI For Images

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

Residual Kernel Policy Network: Enhancing Stability and Robustness in RKHS-Based Reinforcement Learning

Revisit the Open Nature of Open Vocabulary Semantic Segmentation

DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback

DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Learning from weak labelers as constraints

HELMET: How to Evaluate Long-context Models Effectively and Thoroughly

CREIMBO: Cross-Regional Ensemble Interactions in Multi-view Brain Observations

CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing

PharmacoMatch: Efficient 3D Pharmacophore Screening via Neural Subgraph Matching

Lambda-Skip Connections: the architectural component that prevents Rank Collapse

Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models

MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Coreset Spectral Clustering

From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians

BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks

Fully-inductive Node Classification on Arbitrary Graphs

Bridging Jensen Gap for Max-Min Group Fairness Optimization in Recommendation

Improved Convergence Rate for Diffusion Probabilistic Models

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs

Monet: Mixture of Monosemantic Experts for Transformers

Test-Time Adaptation for Combating Missing Modalities in Egocentric Videos

Optimizing Neural Network Representations of Boolean Networks

6D Object Pose Tracking in Internet Videos for Robotic Manipulation

CO-MOT: Boosting End-to-end Transformer-based Multi-Object Tracking via Coopetition Label Assignment and Shadow Sets

On Bits and Bandits: Quantifying the Regret-Information Trade-off

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis

Universal generalization guarantees for Wasserstein distributionally robust models

Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors

PETRA: Parallel End-to-end Training with Reversible Architectures

SPD Attack - Prevention of AI Powered Image Editing by Image Immunization

Brain-inspired $L_p$-Convolution benefits large kernels and aligns better with visual cortex

Personality Alignment of Large Language Models

Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning

Synthetic continued pretraining

Port-Hamiltonian Architectural Bias for Long-Range Propagation in Deep Graph Networks

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

Avoid Overclaims: Summary of Complexity Bounds for Algorithms in Minimization and Minimax Optimization

Towards more rigorous evaluations of language models

How do we interpret the outputs of a neural network trained on classification?

Generative Adversarial Ranking Nets

Generating Less Certain Adversarial Examples Improves Robust Generalization

Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm

Training LLMs over Neurally Compressed Text

PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans

Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models

Revisiting Feature Prediction for Learning Visual Representations from Video

Revisiting In-context Learning Inference Circuit in Large Language Models

Regularized Proportional Fairness Mechanism for Resource Allocation Without Money

Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement

Towards Unbiased Calibration using Meta-Regularization

Optimization with Access to Auxiliary Information

MMD-Regularized Unbalanced Optimal Transport

DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models

$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL

SWEb: A Large Web Dataset for the Scandinavian Languages

Do Contemporary Causal Inference Models Capture Real-World Heterogeneity? Findings from a Large-Scale Benchmark

Personalized Representation from Personalized Generation

Disentangling 3D Animal Pose Dynamics with Scrubbed Conditional Latent Variables

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

CycleResearcher: Improving Automated Research via Automated Review

Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations

Test-time Adaptation for Image Compression with Distribution Regularization

Interpretable Compressed Descriptions For Image Generation

Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks

SigDiffusions: Score-Based Diffusion Models for Time Series via Log-Signature Embeddings

K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models

FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs

Uncertainty Modeling in Graph Neural Networks via Stochastic Differential Equations

BingoGuard: LLM Content Moderation Tools with Risk Levels

Can a Large Language Model be a Gaslighter?

SONICS: Synthetic Or Not - Identifying Counterfeit Songs

ConcreTizer: Model Inversion Attack via Occupancy Classification and Dispersion Control for 3D Point Cloud Restoration

TeaserGen: Generating Teasers for Long Documentaries

Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency

Identification of Intermittent Temporal Latent Process

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

SINGER: Stochastic Network Graph Evolving Operator for High Dimensional PDEs

Fast Uncovering of Protein Sequence Diversity from Structure

Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Provably Robust Explainable Graph Neural Networks against Graph Perturbation Attacks

Pareto Prompt Optimization

Spectral-Refiner: Accurate Fine-Tuning of Spatiotemporal Fourier Neural Operator for Turbulent Flows

Dense Video Object Captioning from Disjoint Supervision

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

ReSi: A Comprehensive Benchmark for Representational Similarity Measures

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Learning to Select Nodes in Branch and Bound with Sufficient Tree Representation

Aligning Language Models with Demonstrated Feedback

SimulPL: Aligning Human Preferences in Simultaneous Machine Translation

Efficient Learning with Sine-Activated Low-Rank Matrices

Accelerating Goal-Conditioned Reinforcement Learning Algorithms and Research

Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning

Learning Transformer-based World Models with Contrastive Predictive Coding

Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction

PhysPDE: Rethinking PDE Discovery and a Physical HYpothesis Selection Benchmark

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography

Trajectory-Class-Aware Multi-Agent Reinforcement Learning

nGPT: Normalized Transformer with Representation Learning on the Hypersphere

From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation

SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation

Valid Conformal Prediction for Dynamic GNNs

Analytic DAG Constraints for Differentiable DAG Learning

Self-Normalized Resets for Plasticity in Continual Learning

Robustness of Quantum Algorithms for Nonconvex Optimization

Intelligence at the Edge of Chaos

GLoRa: A Benchmark to Evaluate the Ability to Learn Long-Range Dependencies in Graphs

xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks

Generalized Consistency Trajectory Models for Image Manipulation

Gradient-Free Generation for Hard-Constrained Systems

Training Free Guided Flow-Matching with Optimal Control

Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization

Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension

ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning

TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models

KLay: Accelerating Arithmetic Circuits for Neurosymbolic AI

CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors

Beyond Mere Token Analysis: A Hypergraph Metric Space Framework for Defending Against Socially Engineered LLM Attacks

Interaction Asymmetry: A General Principle for Learning Composable Abstractions

FLOPS: Forward Learning with OPtimal Sampling

Non-Stationary Dueling Bandits Under a Weighted Borda Criterion

Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

Regularization by Texts for Latent Diffusion Inverse Solvers

PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

Block Verification Accelerates Speculative Decoding

Accelerating Training with Neuron Interaction and Nowcasting Networks

Repulsive Latent Score Distillation for Solving Inverse Problems

Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries

Breaking the $\log(1/\Delta_2)$ Barrier: Better Batched Best Arm Identification with Adaptive Grids

McEval: Massively Multilingual Code Evaluation

MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers

Inverse Scaling: When Bigger Isn't Better

RB-Modulation: Training-Free Stylization using Reference-Based Modulation

CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models

CREAM: Consistency Regularized Self-Rewarding Language Models

BLEND: Behavior-guided Neural Population Dynamics Modeling via Privileged Knowledge Distillation

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning

SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning

Anyprefer: An Agentic Framework for Preference Data Synthesis

Long-Sequence Recommendation Models Need Decoupled Embeddings

Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures

Beyond Random Masking: When Dropout meets Graph Convolutional Networks

Re-evaluating Open-ended Evaluation of Large Language Models

MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory

PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning

Exploring Learning Complexity for Efficient Downstream Dataset Pruning

Post-hoc Reward Calibration: A Case Study on Length Bias

Hot-pluggable Federated Learning: Bridging General and Personalized FL via Dynamic Selection

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization

When Graph Neural Networks Meet Dynamic Mode Decomposition

Attention with Markov: A Curious Case of Single-layer Transformers

Supervised and Semi-Supervised Diffusion Maps with Label-Driven Diffusion

Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation

InfoGS: Efficient Structure-Aware 3D Gaussians via Lightweight Information Shaping

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training

HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation

Multi-objective Differentiable Neural Architecture Search

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

Physics-Informed Diffusion Models

OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling

Equivariant Neural Functional Networks for Transformers

How Low Can You Go? Searching for the Intrinsic Dimensionality of Complex Networks using Metric Node Embeddings

Efficient Neuron Segmentation in Electron Microscopy by Affinity-Guided Queries

Emergent Orientation Maps —— Mechanisms, Coding Efficiency and Robustness

FormalAlign: Automated Alignment Evaluation for Autoformalization

KAA: Kolmogorov-Arnold Attention for Enhancing Attentive Graph Neural Networks

Bridging the Gap Between f-divergences and Bayes Hilbert Spaces

Probabilistic Learning to Defer: Handling Missing Expert Annotations and Controlling Workload Distribution

Do You Keep an Eye on What I Ask? Mitigating Multimodal Hallucination via Attention-Guided Ensemble Decoding

Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data

LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning

Selective Task Group Updates for Multi-Task Optimization

Deconstructing What Makes a Good Optimizer for Autoregressive Language Models

Re-Thinking Inverse Graphics With Large Language Models

Shared-AE: Automatic Identification of Shared Subspaces in High-dimensional Neural and Behavioral Activity

Towards Unbiased Learning in Semi-Supervised Semantic Segmentation

Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations

On the Price of Differential Privacy for Hierarchical Clustering

HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation

Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning

Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior

Why In-Context Learning Models are Good Few-Shot Learners?

Computational Explorations of Total Variation Distance

Isometric Regularization for Manifolds of Functional Data

Advancing Graph Generation through Beta Diffusion

ReCogLab: a framework testing relational reasoning & cognitive hypotheses on LLMs

Revealing and Mitigating Over-Attention in Knowledge Editing

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment

Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?

Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design

Tree of Attributes Prompt Learning for Vision-Language Models

Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen

Learning from negative feedback, or positive feedback or both

DEPT: Decoupled Embeddings for Pre-training Language Models

Learning Gain Map for Inverse Tone Mapping

Advancing Prompt-Based Methods for Replay-Independent General Continual Learning

Approximation algorithms for combinatorial optimization with predictions

Learning to Explore and Exploit with GNNs for Unsupervised Combinatorial Optimization

Robust Barycenter Estimation using Semi-Unbalanced Neural Optimal Transport

Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention

On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth

AgentRefine: Enhancing Agent Generalization through Refinement Tuning

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence

Lift Your Molecules: Molecular Graph Generation in Latent Euclidean Space

Efficiently Parameterized Neural Metriplectic Systems

Near-Exact Privacy Amplification for Matrix Mechanisms

Federated Continual Learning Goes Online: Uncertainty-Aware Memory Management for Vision Tasks and Beyond

TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting

VAE-Var: Variational Autoencoder-Enhanced Variational Methods for Data Assimilation in Meteorology

PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling

Improving Convergence Guarantees of Random Subspace Second-order Algorithm for Nonconvex Optimization

BAMDP Shaping: a Unified Theoretical Framework for Intrinsic Motivation and Reward Shaping

Can Large Language Models Understand Symbolic Graphics Programs?

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets

Flow: Modularized Agentic Workflow Automation

Identifiability for Gaussian Processes with Holomorphic Kernels

IDInit: A Universal and Stable Initialization Method for Neural Network Training

Addressing Label Shift in Distributed Learning via Entropy Regularization

SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics

Context-aware Dynamic Pruning for Speech Foundation Models

Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

The Superposition of Diffusion Models Using the Itô Density Estimator

RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

Near-optimal Active Regression of Single-Index Models

SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction

A Simple Approach to Unifying Diffusion-based Conditional Generation

Density estimation with LLMs: a geometric investigation of in-context learning trajectories

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Preference Elicitation for Offline Reinforcement Learning

Language Models Need Inductive Biases to Count Inductively

TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment

Learning system dynamics without forgetting

Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach

Improving Unsupervised Constituency Parsing via Maximizing Semantic Information

MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations

econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians

Large Language Models Often Say One Thing and Do Another

DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?

Beyond Worst-Case Dimensionality Reduction for Sparse Vectors

NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection

ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-sample $V$-Ensemble

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

How new data permeates LLM knowledge and how to dilute it

Hierarchical Uncertainty Estimation for Learning-based Registration in Neuroimaging

Diffusing to the Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

Enhancing Learning with Label Differential Privacy by Vector Approximation

Accelerating neural network training: An analysis of the AlgoPerf competition

NetMoE: Accelerating MoE Training through Dynamic Sample Placement

Text-to-Image Rectified Flow as Plug-and-Play Priors

NextBestPath: Efficient 3D Mapping of Unseen Environments

AdaFisher: Adaptive Second Order Optimization via Fisher Information

On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning

Evidential Learning-based Certainty Estimation for Robust Dense Feature Matching

RocketEval: Efficient automated LLM evaluation via grading checklist

IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

Can Reinforcement Learning Solve Asymmetric Combinatorial-Continuous Zero-Sum Games?

Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks

Cauchy-Schwarz Regularizers

Perturbation-Restrained Sequential Model Editing

Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection

A new framework for evaluating model out-of-distribution generalisation for the biochemical domain

ContextGNN: Beyond Two-Tower Recommendation Systems

Provable Convergence Bounds for Hybrid Dynamical Sampling and Optimization

BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models

Language-Assisted Feature Transformation for Anomaly Detection

HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting

Wavelet Diffusion Neural Operator

Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression

Mentored Learning: Improving Generalization and Convergence of Student Learner

SysBench: Can LLMs Follow System Message?

OMG: Opacity Matters in Material Modeling with Gaussian Splatting

Training-Free Activation Sparsity in Large Language Models

Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects

Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences

Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

A Benchmark for Semantic Sensitive Information in LLMs Outputs

Diverse Preference Learning for Capabilities and Alignment

CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

Predicate Hierarchies Improve Few-Shot State Classification

Differential Transformer

Injective flows for star-like manifolds

Efficient Off-Policy Learning for High-Dimensional Action Spaces

Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting

Episodic Memories Generation and Evaluation Benchmark for Large Language Models

GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Transformer Learns Optimal Variable Selection in Group-Sparse Classification

On the Feature Learning in Diffusion Models

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

DarkBench: Benchmarking Dark Patterns in Large Language Models

VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning

Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts

On the Role of Attention Heads in Large Language Model Safety

Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning

RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks

Non-myopic Generation of Language Models for Reasoning and Planning

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language

Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Efficient Inference for Large Language Model-based Generative Recommendation

RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code

TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models

Transformers Struggle to Learn to Search

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

Fast Training of Sinusoidal Neural Fields via Scaling Initialization

ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models

Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

TIPS: Text-Image Pretraining with Spatial awareness

Generalized Behavior Learning from Diverse Demonstrations

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Revisiting Convolution Architecture in the Realm of DNA Foundation Models

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training

Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation

SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers

A transfer learning framework for weak to strong generalization

Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Model Using Implicit Feedback from Pre-training Demonstrations

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh

Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling

VCR: Pixel-Level Complex Reasoning by Restoring Occluded Text

Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data

On Linear Representations and Pretraining Data Frequency in Language Models

CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion

Point-based Instance Completion with Scene Constraints

One Hundred Neural Networks and Brains Watching Videos: Lessons from Alignment

Frame-Voyager: Learning to Query Frames for Video Large Language Models

Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?

Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval

Halton Scheduler for Masked Generative Image Transformer

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Is uniform expressivity too restrictive? Towards efficient expressivity of GNNs

ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments

Backdooring Vision-Language Models with Out-Of-Distribution Data

ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences

Jamba: Hybrid Transformer-Mamba Language Models

Geometry of Long-Tailed Representation Learning: Rebalancing Features for Skewed Distributions

On the Modeling Capabilities of Large Language Models for Sequential Decision Making

MaestroMotif: Skill Design from Artificial Intelligence Feedback

A Causal Lens for Learning Long-term Fair Policies

DINOv2: Learning Robust Visual Features without Supervision

Watermark Anything With Localized Messages

Root Cause Analysis of Anomalies in Multivariate Time Series through Granger Causal Discovery

Human-inspired Episodic Memory for Infinite Context LLMs

E-Valuating Classifier Two-Sample Tests

See What You Are Told: Visual Attention Sink in Large Multimodal Models

SPA-BENCH: A COMPREHENSIVE BENCHMARK FOR SMARTPHONE AGENT EVALUATION

Revisiting Large-Scale Non-convex Distributionally Robust Optimization

Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models

DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models

Metalic: Meta-Learning In-Context with Protein Language Models

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment

ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks

Learning Dynamics of LLM Finetuning

Towards counterfactual fairness through auxiliary variables

ToolGen: Unified Tool Retrieval and Calling via Generation

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods

On the Optimization and Generalization of Multi-head Attention

Consistency Checks for Language Model Forecasters

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data

The Foundations of Tokenization: Statistical and Computational Concerns

Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation

Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only

Faster, More Efficient RLHF through Off-Policy Asynchronous Learning

On the Benefits of Attribute-Driven Graph Domain Adaptation

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching

Language Guided Skill Discovery

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

MaskBit: Embedding-free Image Generation via Bit Tokens

Mechanistic Permutability: Match Features Across Layers

Learn Your Reference Model for Real Good Alignment

RuAG: Learned-rule-augmented Generation for Large Language Models

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

RegMix: Data Mixture as Regression for Language Model Pre-training

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Scaling up Masked Diffusion Models on Text

Bootstrapping Language Models with DPO Implicit Rewards

DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector

Atomas: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation

Data Unlearning in Diffusion Models

What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models

SELF-EVOLVED REWARD LEARNING FOR LLMS

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agent

Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling

MMTEB: Massive Multilingual Text Embedding Benchmark

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Shedding Light on Time Series Classification using Interpretability Gated Networks

OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

HQGS: High-Quality Novel View Synthesis with Gaussian Splatting in Degraded Scenes

MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

Optimal Flow Transport and its Entropic Regularization: a GPU-friendly Matrix Iterative Algorithm for Flow Balance Satisfaction

TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees

BodyGen: Advancing Towards Efficient Embodiment Co-Design

PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations

ScImage: How good are multimodal large language models at scientific text-to-image generation?

ParaSolver: A Hierarchical Parallel Integral Solver for Diffusion Models

NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

Deconstructing Denoising Diffusion Models for Self-Supervised Learning

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation

PhiNets: Brain-inspired Non-contrastive Learning Based on Temporal Prediction Hypothesis

DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models

Towards Unified Human Motion-Language Understanding via Sparse Interpretable Characterization

AgentSquare: Automatic LLM Agent Search in Modular Design Space

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors

From an LLM Swarm to a PDDL-empowered Hive: Planning Self-executed Instructions in a Multi-modal Jungle

Accelerating Diffusion Transformers with Token-wise Feature Caching

SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels

VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation

Near, far: Patch-ordering enhances vision foundation models' scene understanding

Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning

CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models

Metamizer: A Versatile Neural Optimizer for Fast and Accurate Physics Simulations

RRM: Robust Reward Model Training Mitigates Reward Hacking

PWM: Policy Learning with Multi-Task World Models

EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models

Causal Information Prioritization for Efficient Reinforcement Learning

PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation

Quamba: A Post-Training Quantization Recipe for Selective State Space Models

Probabilistic Geometric Principal Component Analysis with application to neural data

A General Framework for Off-Policy Learning with Partially-Observed Reward

Bayesian Image Regression with Soft-thresholded Conditional Autoregressive Prior

GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning

Towards a Complete Logical Framework for GNN Expressiveness

DCT-CryptoNets: Scaling Private Inference in the Frequency Domain

Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems

FlashRNN: I/O-Aware Optimization of Traditional RNNs on modern hardware

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

PaPaGei: Open Foundation Models for Optical Physiological Signals

U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Conformal Prediction Sets Can Cause Disparate Impact

GSBA$^K$: $top$-$K$ Geometric Score-based Black-box Attack

Minimalistic Predictions for Online Class Constraint Scheduling

Energy-based Backdoor Defense Against Federated Graph Learning

Rethinking the generalization of drug target affinity prediction algorithms via similarity aware evaluation

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization

Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion

A Truncated Newton Method for Optimal Transport

A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention

Learning Partial Graph Matching via Optimal Partial Transport

GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

Mixture-of-Agents Enhances Large Language Model Capabilities

Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation

Enhancing End-to-End Autonomous Driving with Latent World Model

API Pack: A Massive Multi-Programming Language Dataset for API Call Generation

Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations

Generating Physical Dynamics under Priors

Emergence of meta-stable clustering in mean-field transformer models

Efficient Reinforcement Learning with Large Language Model Priors

MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks

Steering LLMs' Behavior with Concept Activation Vectors

Open-Source vs Close-Source: The Context Utilization Challenge

I Can Hear You: Selective Robust Training for Deepfake Audio Detection

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

Operator Deep Smoothing for Implied Volatility

Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding

TAU-106K: A New Dataset for Comprehensive Understanding of Traffic Accident

TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types

OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes

Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)

Language Model Alignment in Multilingual Trolley Problems

Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling

UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation

Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection

Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings

Learning Structured Universe Graph with Outlier OOD Detection for Partial Matching

Not All Language Model Features Are One-Dimensionally Linear

BadRobot: Jailbreaking Embodied LLMs in the Physical World

Strong Model Collapse

Depth Any Video with Scalable Synthetic Data

Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models

Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection

VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Diversity-Rewarded CFG Distillation

Real-time design of architectural structures with differentiable mechanics and neural networks

Image Watermarks are Removable using Controllable Regeneration from Clean Noise

FairDen: Fair Density-Based Clustering

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Personalized Visual Instruction Tuning

KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA

CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness

SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

DSPO: Direct Score Preference Optimization for Diffusion Model Alignment

Fine-tuning with Reserved Majority for Noise Reduction

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

Robust System Identification: Finite-sample Guarantees and Connection to Regularization

InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling

Finding and Only Finding Differential Nash Equilibria by Both Pretending to be a Follower

Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning

CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer

Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance Constraint

Enhancing Vision-Language Model with Unmasked Token Alignment

OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code

Improved Sampling Algorithms for Lévy-Itô Diffusion Models

LeanVec: Searching vectors faster by making them fit

New Algorithms for the Learning-Augmented k-means Problem

Understanding Fairness Surrogate Functions in Algorithmic Fairness

On the Inherent Privacy Properties of Discrete Denoising Diffusion Models

Soft Merging of Experts with Adaptive Routing

Unlocking Guidance for Discrete State-Space Diffusion and Flow Models

See It from My Perspective: How Language Affects Cultural Bias in Image Understanding

Agree to Disagree: Demystifying Homogeneous Deep Ensembles through Distributional Equivalence

What Has Been Overlooked in Contrastive Source-Free Domain Adaptation: Leveraging Source-Informed Latent Augmentation within Neighborhood Context

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models

Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

A Statistical Approach for Controlled Training Data Detection

Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models

AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers

Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition

From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford Algebra and Convexity

Sensitivity-Aware Amortized Bayesian Inference

Exploiting Hankel-Toeplitz Structures for Fast Computation of Kernel Precision Matrices

Forget the Data and Fine-Tuning! Just Fold the Network to Compress

Differentially Private Federated Learning with Time-Adaptive Privacy Spending

LoRA Learns Less and Forgets Less

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning

Hessian Free Efficient Single Loop Iterative Differentiation Methods for Bi-Level Optimization Problems

Transformer Encoder Satisfiability: Complexity and Impact on Formal Reasoning

Plug, Play, and Generalize: Length Extrapolation with Pointer-Augmented Neural Memory

Fair Clustering in the Sliding Window Model

Risk-Controlling Model Selection via Guided Bayesian Optimization

Equivariant Symmetry Breaking Sets

Robustness Auditing for Linear Regression: To Singularity and Beyond

Reward Guided Latent Consistency Distillation

Linear Mode Connectivity in Differentiable Tree Ensembles

Strong Preferences Affect the Robustness of Preference Models and Value Alignment

Manifold Learning by Mixture Models of VAEs for Inverse Problems

Information Theoretic Text-to-Image Alignment

Efficient Cross-Episode Meta-RL

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

Learning Regularized Graphon Mean-Field Games with Unknown Graphons

Boundary constrained Gaussian processes for robust physics-informed machine learning of linear partial differential equations

HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models

Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems

MGCFNN: A Neural MultiGrid Solver with Novel Fourier Neural Network for High Wave Number Helmholtz Equations

Measuring memorization in RLHF for code completion

HyperPLR: Hypergraph Generation through Projection, Learning, and Reconstruction

Factual Context Validation and Simplification: A Scalable Method to Enhance GPT Trustworthiness and Efficiency

A Curious Case of the Missing Measure: Better Scores and Worse Generation

PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization

Flow With What You Know

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Provable Uncertainty Decomposition via Higher-Order Calibration

Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)

“I Am the One and Only, Your Cyber BFF”: Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI

Restating the Proof of Linear Convergence for Linear GNNs

A Visual Dive into Conditional Flow Matching

Multi-modal Learning: A Look Back and the Road Ahead

TopoNets: High performing vision and language models with brain-like topography

Intricacies of Feature Geometry in Large Language Models

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Positional Embeddings in Transformer Models: Evolution from Text to Vision Domains

Analysing The Spectral Biases in Generative Models

How to visualize training dynamics in neural networks

Flaws of ImageNet, Computer Vision's Favourite Dataset

Efficient Model Editing with Task-Localized Sparse Fine-tuning

Building Blocks of Differentially Private Training

Provence: efficient and robust context pruning for retrieval-augmented generation

A primer on analytical learning dynamics of nonlinear neural networks

Robustness Reprogramming for Representation Learning

Rethinking Graph Prompts: Unraveling the Power of Data Manipulation in Graph Neural Networks

Vision-LSTM: xLSTM as Generic Vision Backbone

Models trained with unnormalized density functions: A need for a course correction

Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn’t and What's Next

Test-time Adaptation for Regression by Subspace Alignment

Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

3D Vision-Language Gaussian Splatting

Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test

Lost in Prediction: Why Social Media Narratives Don't Help Macroeconomic Forecasting?

Dynamic Neural Fortresses: An Adaptive Shield for Model Extraction Defense

Adaptive Gradient Clipping for Robust Federated Learning

Holistically Evaluating the Environmental Impact of Creating Language Models

Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias

HOPE for a Robust Parameterization of Long-memory State Space Models

GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting

Neural Eulerian Scene Flow Fields

Improved Sampling Of Diffusion Models In Fluid Dynamics With Tweedie's Formula

Probing the Latent Hierarchical Structure of Data via Diffusion Models

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Variational Bayesian Pseudo-Coreset

The Computational Complexity of Circuit Discovery for Inner Interpretability

Do Deep Neural Network Solutions Form a Star Domain?

Rethinking Artistic Copyright Infringements In the Era Of Text-to-Image Generative Models

Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation

Edge-aware Image Smoothing with Relative Wavelet Domain Representation

Toward Exploratory Inverse Constraint Inference with Generative Diffusion Verifiers

Debiasing Mini-Batch Quadratics for Applications in Deep Learning

Uni-Sign: Toward Unified Sign Language Understanding at Scale

Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model

ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids

Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs

ThunderKittens: Simple, Fast, and $\textit{Adorable}$ Kernels

Incremental Causal Effect for Time to Treatment Initialization

A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

BOND: Aligning LLMs with Best-of-N Distillation

ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization

Influence-Guided Diffusion for Dataset Distillation

Monitoring Latent World States in Language Models with Propositional Probes

Factor Graph-based Interpretable Neural Networks

OmniRe: Omni Urban Scene Reconstruction

TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances

Transformer Meets Twicing: Harnessing Unattended Residual Information

Continuous Exposure Learning for Low-light Image Enhancement using Neural ODEs

Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models

On the Fourier analysis in the SO(3) space: the EquiLoPO Network

Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement

Improved Diffusion-based Generative Model with Better Adversarial Robustness

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models

Generalized Principal-Agent Problem with a Learning Agent

Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

Towards Calibrated Deep Clustering Network

Private Mechanism Design via Quantile Estimation

Rapid Selection and Ordering of In-Context Demonstrations via Prompt Embedding Clustering

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid

Revisiting Mode Connectivity in Neural Networks with Bezier Surface

TexTailor: Customized Text-aligned Texturing via Effective Resampling

Elliptic Loss Regularization

SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios

SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems

SAM 2: Segment Anything in Images and Videos

Linear Recurrences Accessible to Everyone

Aligned Better, Listen Better For Audio-Visual Large Language Models

LASER: A Neuro-Symbolic Framework for Learning Spatio-Temporal Scene Graphs with Weak Supervision

GenXD: Generating Any 3D and 4D Scenes

Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control

Open-Vocabulary Customization from CLIP via Data-Free Knowledge Distillation

Long Context Compression with Activation Beacon

Singular Subspace Perturbation Bounds via Rectangular Random Matrix Diffusions

Natural Language Inference Improves Compositionality in Vision-Language Models

Bayesian Optimization via Continual Variational Last Layer Training

Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning

Federated $Q$-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost

RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything

Efficient and Robust Neural Combinatorial Optimization via Wasserstein-Based Coresets

Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters

PIN: Prolate Spheroidal Wave Function-based Implicit Neural Representations

RevisEval: Improving LLM-as-a-Judge via Response-Adapted References

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

How much of my dataset did you use? Quantitative Data Usage Inference in Machine Learning

MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction

Variational Search Distributions

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws

Differentially private learners for heterogeneous treatment effects

Neuroplastic Expansion in Deep Reinforcement Learning

Balanced Ranking with Relative Centrality: A multi-core periphery perspective

Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation

A General Framework for Producing Interpretable Semantic Text Embeddings

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning

Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning

What should a neuron aim for? Designing local objective functions based on information theory

EqNIO: Subequivariant Neural Inertial Odometry

A deep inverse-mapping model for a flapping robotic wing

Intermediate Layer Classifiers for OOD generalization

How Gradient descent balances features: A dynamical analysis for two-layer neural networks

Leave-One-Out Stable Conformal Prediction

Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics

PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph

Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning

Differentiable Causal Discovery for Latent Hierarchical Causal Models

Fourier Sliced-Wasserstein Embedding for Multisets and Measures

Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?

Generative Representational Instruction Tuning

Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms

Few-Class Arena: A Benchmark for Efficient Selection of Vision Models and Dataset Difficulty Measurement

Linear Partial Gromov-Wasserstein Embedding

Algorithmic Stability Based Generalization Bounds for Adversarial Training

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

Lossy Compression with Pretrained Diffusion Models

Spectro-Riemannian Graph Neural Networks

Flow matching achieves almost minimax optimal convergence

Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling

Global Convergence of Policy Gradient in Average Reward MDPs

CollabEdit: Towards Non-destructive Collaborative Knowledge Editing

Gradient correlation is a key ingredient to accelerate SGD with momentum

Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs

On a Connection Between Imitation Learning and RLHF

Boosting Multiple Views for pretrained-based Continual Learning

Long-Context Linear System Identification

Has the Deep Neural Network learned the Stochastic Process? An Evaluation Viewpoint

AtomSurf: Surface Representation for Learning on Protein Structures

SelectFormer: Private and Practical Data Selection for Transformers

Conformal Language Model Reasoning with Coherent Factuality

Generating Graphs via Spectral Diffusion

HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts

Advancing LLM Reasoning Generalists with Preference Trees

CheapNet: Cross-attention on Hierarchical representations for Efficient protein-ligand binding Affinity Prediction

Mixture of In-Context Prompters for Tabular PFNs

Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models

General Scene Adaptation for Vision-and-Language Navigation

uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs

ParetoFlow: Guided Flows in Multi-Objective Optimization

ELBOing Stein: Variational Bayes with Stein Mixture Inference

3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models

Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning

Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation

Trajectory attention for fine-grained video motion control

Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning

Federated Domain Generalization with Data-free On-server Matching Gradient

LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs

Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

Interpretable Causal Representation Learning for Biological Data in the Pathway Space

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation

Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents

WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models

Group Distributionally Robust Dataset Distillation with Risk Minimization

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Procedural Synthesis of Synthesizable Molecules

IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts

An Intelligent Agentic System for Complex Image Restoration Problems

Better Instruction-Following Through Minimum Bayes Risk

Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion

Selective Label Enhancement Learning for Test-Time Adaptation

LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models

LiFT: Learning to Fine-Tune via Bayesian Parameter Efficient Meta Fine-Tuning

LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models

Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs with Semantic Space

On Speeding Up Language Model Evaluation

Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient

DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes

DiffPuter: An EM-Driven Diffusion Model for Missing Data Imputation

AnoLLM: Large Language Models for Tabular Anomaly Detection

Compositional Entailment Learning for Hyperbolic Vision-Language Models

Iterative Substructure Extraction for Molecular Relational Learning with Interactive Graph Information Bottleneck

Visually Consistent Hierarchical Image Classification

Fast unsupervised ground metric learning with tree-Wasserstein distance

CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning

ConMix: Contrastive Mixup at Representation Level for Long-tailed Deep Clustering

Realistic Evaluation of Deep Partial-Label Learning Algorithms

Simulating Human-like Daily Activities with Desire-driven Autonomy

Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition

Extendable and Iterative Structure Learning Strategy for Bayesian Networks

Automatic Curriculum Expert Iteration for Reliable LLM Reasoning

Input Space Mode Connectivity in Deep Neural Networks

Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model

Machine Unlearning via Simulated Oracle Matching

You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning

Tailoring Mixup to Data for Calibration

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations

Durable Quantization Conditioned Misalignment Attack on Large Language Models

Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts

HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

On Quantizing Neural Representation for Variable-Rate Video Coding

FedTMOS: Efficient One-Shot Federated Learning with Tsetlin Machine

Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models

Planning in Natural Language Improves LLM Search for Code Generation

FreDF: Learning to Forecast in the Frequency Domain

Training-Free Message Passing for Learning on Hypergraphs

ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

Tamper-Resistant Safeguards for Open-Weight LLMs

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

AdaGrad under Anisotropic Smoothness

Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling, and Zero-Shot Transfer

ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction

On the self-verification limitations of large language models on reasoning and planning tasks

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers

Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

RankSHAP: Shapley Value Based Feature Attributions for Learning to Rank

Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling

Self-Evolving Multi-Agent Collaboration Networks for Software Development

Unlocking Point Processes through Point Set Diffusion

Chemistry-Inspired Diffusion with Non-Differentiable Guidance

GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment

Transformers Learn Low Sensitivity Functions: Investigations and Implications

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

Global Identifiability of Overcomplete Dictionary Learning via L1 and Volume Minimization

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model

What Do You See in Common? Learning Hierarchical Prototypes over Tree-of-Life to Discover Evolutionary Traits

DUALFormer: Dual Graph Transformer

Collab: Controlled Decoding using Mixture of Agents for LLM Alignment

Provable weak-to-strong generalization via benign overfitting

Exploring The Loss Landscape Of Regularized Neural Networks Via Convex Duality

Refining CLIP's Spatial Awareness: A Visual-Centric Perspective

COME: Test-time Adaption by Conservatively Minimizing Entropy

A Probabilistic Perspective on Unlearning and Alignment for Large Language Models

Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception

Binary Losses for Density Ratio Estimation

Discovering Temporally Compositional Neural Manifolds with Switching Infinite GPFA

How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations

Neural Fluid Simulation on Geometric Surfaces

Looped Transformers for Length Generalization

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

PnP-Flow: Plug-and-Play Image Restoration with Flow Matching

InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma

Descent with Misaligned Gradients and Applications to Hidden Convexity

Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds

Can One Modality Model Synergize Training of Other Modality Models?

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks

Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention

Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?

A Stochastic Approach to the Subset Selection Problem via Mirror Descent

Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering

Breaking Neural Network Scaling Laws with Modularity

GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation

NeuralPlane: Structured 3D Reconstruction in Planar Primitives with Neural Fields

Captured by Captions: On Memorization and its Mitigation in CLIP Models

Steering Large Language Models between Code Execution and Textual Reasoning

Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning

COPER: Correlation-based Permutations for Multi-View Clustering

LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation

MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs

Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

Conformal Structured Prediction

Affine Steerable Equivariant Layer for Canonicalization of Neural Networks

Learning under Temporal Label Noise

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity

ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

KooNPro: A Variance-Aware Koopman Probabilistic Model Enhanced by Neural Process for Time Series Forecasting

Charting the Design Space of Neural Graph Representations for Subgraph Matching

TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram

Convex Formulations for Training Two-Layer ReLU Neural Networks

Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs

VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems

Manifold Constraint Reduces Exposure Bias in Accelerated Diffusion Sampling

Expressivity of Neural Networks with Random Weights and Learned Biases

Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy

Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do

DPLM-2: A Multimodal Diffusion Protein Language Model

Cross-Entropy Is All You Need To Invert the Data Generating Process

Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds

Privacy Auditing of Large Language Models

Physics-Informed Deep Inverse Operator Networks for Solving PDE Inverse Problems

On the Identification of Temporal Causal Representation with Instantaneous Dependence

Complementary Label Learning with Positive Label Guessing and Negative Label Enhancement

Not-So-Optimal Transport Flows for 3D Point Cloud Generation

Synergy Between Sufficient Changes and Sparse Mixing Procedure for Disentangled Representation Learning

Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning

In Search of the Engram in LLMs: A Neuroscience Perspective on the Memory Functions in AI Models

Pyramidal Flow Matching for Efficient Video Generative Modeling

Model merging with SVD to tie the Knots

Repurposing in AI: A Distinct Approach or an Extension of Creative Problem Solving?

Scaling up the Banded Matrix Factorization Mechanism for Large Scale Differentially Private ML

Do vision models perceive objects like toddlers?

Curriculum-aware Training for Discriminating Molecular Property Prediction Models

Variational Diffusion Posterior Sampling with Midpoint Guidance

Preference Diffusion for Recommendation

MotherNet: Fast Training and Inference via Hyper-Network Transformers

NutriBench: A Dataset for Evaluating Large Language Models in Nutrition Estimation from Meal Descriptions

Safety Alignment Should be Made More Than Just a Few Tokens Deep

MTSAM: Multi-Task Fine-Tuning for Segment Anything Model

STAR: Stability-Inducing Weight Perturbation for Continual Learning

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Efficient Biological Data Acquisition through Inference Set Design

Pitfalls of Evidence-Based AI Policy

The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Improving Text-to-Image Consistency via Automatic Prompt Optimization

FedLWS: Federated Learning with Adaptive Layer-wise Weight Shrinking

High-dimension Prototype is a Better Incremental Object Detection Learner

Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives

SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects

Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver

Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis

LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation

HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks

BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications

Mastering Task Arithmetic: $\tau$Jp as a Key Indicator for Weight Disentanglement

Benign Overfitting in Out-of-Distribution Generalization of Linear Models

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)

Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning

Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It

Dynamic Negative Guidance of Diffusion Models

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning

A Multiscale Frequency Domain Causal Framework for Enhanced Pathological Analysis

Towards Foundation Models for Mixed Integer Linear Programming

LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

Multi-Label Test-Time Adaptation with Bound Entropy Minimization

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Bad-PFL: Exploiting Backdoor Attacks against Personalized Federated Learning

Graph Sparsification via Mixture of Graphs

Regretful Decisions under Label Noise

Bandit Learning in Matching Markets with Indifference

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

WardropNet: Traffic Flow Predictions via Equilibrium-Augmented Learning

Prompting Fairness: Integrating Causality to Debias Large Language Models

Simplifying Deep Temporal Difference Learning

Examining Alignment of Large Language Models through Representative Heuristics: the case of political stereotypes

Alchemy: Amplifying Theorem-Proving Capability Through Symbolic Mutation

Logically Consistent Language Models via Neuro-Symbolic Integration

Lie Algebra Canonicalization: Equivariant Neural Operators under arbitrary Lie Groups

Entropy-based Activation Function Optimization: A Method on Searching Better Activation Functions

Forte: Finding Outliers with Representation Typicality Estimation

Graph Neural Ricci Flow: Evolving Feature from a Curvature Perspective

Object-Centric Pretraining via Target Encoder Bootstrapping

Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images

CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference

A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety

Generalization through variance: how noise shapes inductive biases in diffusion models

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Proximal Mapping Loss: Understanding Loss Functions in Crowd Counting & Localization

On the Importance of Language-driven Representation Learning for Heterogeneous Federated Learning

Self-Boosting Large Language Models with Synthetic Preference Data

EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation

Stiefel Flow Matching for Moment-Constrained Structure Elucidation

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis

Optimal Brain Apoptosis

ToolACE: Winning the Points of LLM Function Calling

GDrag: Towards General-Purpose Interactive Editing with Anti-ambiguity Point Diffusion

STAMP: Scalable Task- And Model-agnostic Collaborative Perception

Forking Paths in Neural Text Generation

Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses

GameGen-X: Interactive Open-world Game Video Generation

LoLCATs: On Low-Rank Linearizing of Large Language Models

Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation

ViSAGe: Video-to-Spatial Audio Generation

Quality Measures for Dynamic Graph Generative Models

TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics

Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization

TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes

Controllable Generation via Locally Constrained Resampling

Longhorn: State Space Models are Amortized Online Learners

Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons

DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation

Reconstructive Visual Instruction Tuning

RouteLLM: Learning to Route LLMs from Preference Data

CONDA: Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts

Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Power

ParFam -- (Neural Guided) Symbolic Regression via Continuous Global Optimization

Amortized Control of Continuous State Space Feynman-Kac Model for Irregular Time Series

Do LLMs have Consistent Values?

LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions

Data Pruning by Information Maximization

Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations

Causal Identification for Complex Functional Longitudinal Studies

Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

VideoPhy: Evaluating Physical Commonsense for Video Generation

An Efficient Framework for Crediting Data Contributors of Diffusion Models

Inverse Constitutional AI: Compressing Preferences into Principles

PRDP: Progressively Refined Differentiable Physics

When do GFlowNets learn the right distribution?

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

Iterative Dual-RL: An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition

Towards Optimal Multi-draft Speculative Decoding

IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities

Controllable Unlearning for Image-to-Image Generative Models via $\epsilon$-Constrained Optimization

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing

Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models

Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Multi-Reward as Condition for Instruction-based Image Editing

Conditional Diffusion with Ordinal Regression: Longitudinal Data Generation for Neurodegenerative Disease Studies

The Complexity of Two-Team Polymatrix Games with Independent Adversaries

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Mixture of Parrots: Experts improve memorization more than reasoning

OSDA Agent: Leveraging Large Language Models for De Novo Design of Organic Structure Directing Agents

Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports

Vision and Language Synergy for Rehearsal Free Continual Learning

Adaptive Retention & Correction: Test-Time Training for Continual Learning

A CLIP-Powered Framework for Robust and Generalizable Data Selection

Verifying Properties of Binary Neural Networks Using Sparse Polynomial Optimization

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Debiasing Federated Learning with Correlated Client Participation

Beyond Sequence: Impact of Geometric Context for RNA Property Prediction

Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference

SIMPL: Scalable and hassle-free optimisation of neural representations from behaviour

AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

LocoVR: Multiuser Indoor Locomotion Dataset in Virtual Reality

Integrating Protein Dynamics into Structure-Based Drug Design via Full-Atom Stochastic Flows

MetaOOD: Automatic Selection of OOD Detection Models

Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models

Doubly robust identification of treatment effects from multiple environments

RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Open-CK: A Large Multi-Physics Fields Coupling benchmarks in Combustion Kinetics

Grounding Continuous Representations in Geometry: Equivariant Neural Fields

Consistent Flow Distillation for Text-to-3D Generation

Grounding Multimodal Large Language Model in GUI World

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Implicit Search via Discrete Diffusion: A Study on Chess

DeciMamba: Exploring the Length Extrapolation Potential of Mamba

XAIguiFormer: explainable artificial intelligence guided transformer for brain disorder identification

The Illustrated AlphaFold

HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters

Decentralized Optimization with Coupled Constraints

Beyond Random Augmentations: Pretraining with Hard Views

MIRACLE 3D: Memory-efficient Integrated Robust Approach for Continual Learning on 3D Point Clouds via Shape Model Construction

Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank

Learning Efficient Positional Encodings with Graph Neural Networks

I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength

DeLLMa: Decision Making Under Uncertainty with Large Language Models

Repetition Improves Language Model Embeddings

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

A Theoretical Analysis of Self-Supervised Learning for Vision Transformers

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Toward Generalizing Visual Brain Decoding to Unseen Subjects

Transformers Provably Learn Two-Mixture of Linear Classification via Gradient Flow

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs

Scalable Decentralized Learning with Teleportation

LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing

Towards Neural Scaling Laws for Time Series Foundation Models

FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model

Understanding Constraint Inference in Safety-Critical Inverse Reinforcement Learning

TopoGaussian: Inferring Internal Topology Structures from Visual Clues

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation

SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination

PRISM: Privacy-Preserving Improved Stochastic Masking for Federated Generative Models

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

MuHBoost: Multi-Label Boosting For Practical Longitudinal Human Behavior Modeling

Kolmogorov-Arnold Transformer

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks

Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets

Lightweight Neural App Control

Generating CAD Code with Vision-Language Models for 3D Designs

Towards Bridging Generalization and Expressivity of Graph Neural Networks

Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

qNBO: quasi-Newton Meets Bilevel Optimization

Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping

LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation

Efficient and Trustworthy Causal Discovery with Latent Variables and Complex Relations

LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content

Learning Molecular Representation in a Cell

Adaptive teachers for amortized samplers

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations

Autoregressive Pretraining with Mamba in Vision

What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance

SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting

Reinforcement Learning from Imperfect Corrective Actions and Proxy Rewards

S4M: S4 for multivariate time series forecasting with Missing values

ProteinBench: A Holistic Evaluation of Protein Foundation Models

Leveraging Variable Sparsity to Refine Pareto Stationarity in Multi-Objective Optimization

Non-Equilibrium Dynamics of Hybrid Continuous-Discrete Ground-State Sampling

Backtracking Improves Generation Safety

RGB-Event ISP: The Dataset and Benchmark

Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface

Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation

Hessian-Free Online Certified Unlearning

The KoLMogorov Test: Compression by Code Generation

Does Training with Synthetic Data Truly Protect Privacy?

Can LLMs Solve Longer Math Word Problems Better?

Improving Long-Text Alignment for Text-to-Image Diffusion Models

Real2Code: Reconstruct Articulated Objects via Code Generation

Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps

ELICIT: LLM Augmentation Via External In-context Capability

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

DRL: Decomposed Representation Learning for Tabular Anomaly Detection

A Deep Generative Learning Approach for Two-stage Adaptive Robust Optimization

Towards Faster Decentralized Stochastic Optimization with Communication Compression

Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling

Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Restyling Unsupervised Concept Based Interpretable Networks with Generative Models

Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents

Training Language Models to Self-Correct via Reinforcement Learning

High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation

How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension

When narrower is better: the narrow width limit of Bayesian parallel branching neural networks

OPTAMI: Global Superlinear Convergence of High-order Methods

CFD: Learning Generalized Molecular Representation via Concept-Enhanced Feedback Disentanglement

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

Understanding Methods for Scalable MCTS

Speech Robust Bench: A Robustness Benchmark For Speech Recognition

Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model

Imputation for prediction: beware of diminishing returns.

QA-Calibration of Language Model Confidence Scores

Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation

Transformer Block Coupling and its Correlation with Generalization in LLMs

MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment

Fast Feedforward 3D Gaussian Splatting Compression

Faster Algorithms for Structured Linear and Kernel Support Vector Machines

Clique Number Estimation via Differentiable Functions of Adjacency Matrix Permutations

Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Sensitivity-Constrained Fourier Neural Operators for Forward and Inverse Problems in Parametric Differential Equations

Dynamic Low-Rank Sparse Adaptation for Large Language Models

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads

Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups

Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics

ADMM for Structured Fractional Minimization

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

Reinforcement learning with combinatorial actions for coupled restless bandits

Decoupling Layout from Glyph in Online Chinese Handwriting Generation

Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

Continual Slow-and-Fast Adaptation of Latent Neural Dynamics (CoSFan): Meta-Learning What-How & When to Adapt

DEPfold: RNA Secondary Structure Prediction as Dependency Parsing.

Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection

MADGEN: Mass-Spec attends to De Novo Molecular generation

Controlled LLM Decoding via Discrete Auto-regressive Biasing

Student-Informed Teacher Training

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study

Black-Box Detection of Language Model Watermarks

Large Convolutional Model Tuning via Filter Subspace

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

DynaPrompt: Dynamic Test-Time Prompt Tuning

A Graph Enhanced Symbolic Discovery Framework For Efficient Logic Optimization

ToVE: Efficient Vision-Language Learning via Knowledge Transfer from Vision Experts

SymmetricDiffusers: Learning Discrete Diffusion Models over Finite Symmetric Groups

Data Center Cooling System Optimization Using Offline Reinforcement Learning

Improved Approximation Algorithms for $k$-Submodular Maximization via Multilinear Extension

CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences

GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS

Diffusion-Based Planning for Autonomous Driving with Flexible Guidance

Reveal Object in Lensless Photography via Region Gaze and Amplification

ReGen: Generative Robot Simulation via Inverse Design

The adaptive complexity of parallelized log-concave sampling

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation

Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

Agent-Oriented Planning in Multi-Agent Systems

Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models

AutoUAD: Hyper-parameter Optimization for Unsupervised Anomaly Detection

Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis

On Large Language Model Continual Unlearning

Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection

Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection

INFER: A Neural-symbolic Model For Extrapolation Reasoning on Temporal Knowledge Graph

Poison-splat: Computation Cost Attack on 3D Gaussian Splatting

The Ramanujan Library - Automated Discovery on the Hypergraph of Integer Relations

Retrieval Head Mechanistically Explains Long-Context Factuality

BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts

Bisimulation Metric for Model Predictive Control

Regulatory DNA Sequence Design with Reinforcement Learning

Differentially private optimization for non-decomposable objective functions

DataGen: Unified Synthetic Dataset Generation via Large Language Models

DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

STAFF: Speculative Coreset Selection for Task-Specific Fine-tuning

Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling

MAESTRO: Masked Encoding Set Transformer with Self-Distillation

Proxy Denoising for Source-Free Domain Adaptation

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning

When does compositional structure yield compositional generalization? A kernel theory.

Spherical Tree-Sliced Wasserstein Distance

Differentiable Integer Linear Programming

Discrete Copula Diffusion

Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning

Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning

Do Large Language Models Truly Understand Geometric Structures?

In Search of Forgotten Domain Generalization

GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost

The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities

EmbedLLM: Learning Compact Representations of Large Language Models

Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images

DELIFT: Data Efficient Language model Instruction Fine-Tuning

Graph-based Document Structure Analysis

UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP

GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling

Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning

Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

Global Well-posedness and Convergence Analysis of Score-based Generative Models via Sharp Lipschitz Estimates

Sylber: Syllabic Embedding Representation of Speech from Raw Audio

JudgeBench: A Benchmark for Evaluating LLM-Based Judges

AutoCGP: Closed-Loop Concept-Guided Policies from Unlabeled Demonstrations

InstaRevive: One-Step Image Enhancement via Dynamic Score Matching

Rethinking the role of frames for SE(3)-invariant crystal structure modeling

SLMRec: Distilling Large Language Models into Small for Sequential Recommendation

Bridging the Data Provenance Gap Across Text, Speech, and Video

Grounding Video Models to Actions through Goal Conditioned Exploration

SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement

R2Det: Exploring Relaxed Rotation Equivariance in 2D Object Detection

Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions

ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

Neural Functions for Learning Periodic Signal

OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer

Rational Decision-Making Agent with Learning Internal Utility Judgment

Efficient Active Imitation Learning with Random Network Distillation

MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions

TFG-Flow: Training-free Guidance in Multimodal Generative Flow

ADMM for Nonconvex Optimization under Minimal Continuity Assumption

Skill Expansion and Composition in Parameter Space

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

Interpreting the Second-Order Effects of Neurons in CLIP

Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods

Weakly Supervised Video Scene Graph Generation via Natural Language Supervision

MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science

Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning

Discrete Latent Plans via Semantic Skill Abstractions

3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds

SMT: Fine-Tuning Large Language Models with Sparse Matrices

RaSA: Rank-Sharing Low-Rank Adaptation

NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions

Chain-of-Focus Prompting: Leveraging Sequential Visual Cues to Prompt Large Autoregressive Vision Models

Centrality-guided Pre-training for Graph

Taming Transformer Without Using Learning Rate Warmup

Unhackable Temporal Reward for Scalable Video MLLMs

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

Learning Graph Invariance by Harnessing Spuriosity

INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning

Divergence-Regularized Discounted Aggregation: Equilibrium Finding in Multiplayer Partially Observable Stochastic Games

Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning

Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

Round and Round We Go! What makes Rotary Positional Encodings useful?

Remove Symmetries to Control Model Expressivity and Improve Optimization

ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention

Atlas Gaussians Diffusion for 3D Generation

Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior

Data Shapley in One Training Run

ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation

Systems with Switching Causal Relations: A Meta-Causal Perspective

Towards Improving Exploration through Sibling Augmented GFlowNets

Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation

VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?

World Model on Million-Length Video And Language With Blockwise RingAttention

Certifying Counterfactual Bias in LLMs

Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation

Improving Uncertainty Estimation through Semantically Diverse Language Generation

Learning LLM-as-a-Judge for Preference Alignment

An Illustrated Guide to Automatic Sparse Differentiation

Machine Unlearning Fails to Remove Data Poisoning Attacks

RandLoRA: Full rank parameter-efficient fine-tuning of large models

Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words

Union-over-Intersections: Object Detection beyond Winner-Takes-All

Refine Knowledge of Large Language Models via Adaptive Contrastive Learning

Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers

STAR: Synthesis of Tailored Architectures

AlphaEdit: Null-Space Constrained Model Editing for Language Models

Glauber Generative Model: Discrete Diffusion Models via Binary Classification

SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory

Let Your Features Tell The Differences: Understanding Graph Convolution By Feature Splitting

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

HGM³: Hierarchical Generative Masked Motion Modeling with Hard Token Mining

Do LLMs estimate uncertainty well in instruction-following?

Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions

Harnessing Webpage UIs for Text-Rich Visual Understanding

Can In-context Learning Really Generalize to Out-of-distribution Tasks?

Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability

From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering

On Designing General and Expressive Quantum Graph Neural Networks with Applications to MILP Instance Representation

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Neuron Platonic Intrinsic Representation From Dynamics Using Contrastive Learning

CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs

DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation

Controllable Context Sensitivity and the Knob Behind It

Second-Order Min-Max Optimization with Lazy Hessians

Start Smart: Leveraging Gradients For Enhancing Mask-based XAI Methods

Tell me about yourself: LLMs are aware of their learned behaviors

Towards Marginal Fairness Sliced Wasserstein Barycenter

Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences

From Attention to Activation: Unraveling the Enigmas of Large Language Models

IV-mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis

NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals

Improving Reasoning Performance in Large Language Models via Representation Engineering

PseDet: Revisiting the Power of Pseudo Label in Incremental Object Detection

Multi-session, multi-task neural decoding from distinct cell-types and brain regions

LOIRE: LifelOng learning on Incremental data via pre-trained language model gRowth Efficiently

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

A Simple yet Effective $\Delta\Delta G$ Predictor is An Unsupervised Antibody Optimizer and Explainer

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models

Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification

An Empirical Analysis of Uncertainty in Large Language Model Evaluations

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

Inverse decision-making using neural amortized Bayesian actors

LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

RetroInText: A Multimodal Large Language Model Enhanced Framework for Retrosynthetic Planning via In-Context Representation Learning

PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing

Generalizing Reasoning Problems to Longer Lengths

Ultra-Sparse Memory Network

Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators

System 1.x: Learning to Balance Fast and Slow Planning with Language Models

FreSh: Frequency Shifting for Accelerated Neural Representation Learning

Zero-cost Proxy for Adversarial Robustness Evaluation

Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers

Active Task Disambiguation with LLMs

Differentiable Rule Induction from Raw Sequence Inputs

Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

How Does Critical Batch Size Scale in Pre-training?

MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

On the expressiveness and spectral bias of KANs

Scaling Wearable Foundation Models

Disentangling Representations through Multi-task Learning

Accelerated training through iterative gradient propagation along the residual path

Quantitative Approximation for Neural Operators in Nonlinear Parabolic Equations

Autoregressive Video Generation without Vector Quantization

Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding

From Decoupling to Adaptive Transformation: a Wider Optimization Space for PTQ

$\sigma$-zero: Gradient-based Optimization of $\ell_0$-norm Adversarial Examples

Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach

Sort-free Gaussian Splatting via Weighted Sum Rendering

PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations

Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model

Advantage-Guided Distillation for Preference Alignment in Small Language Models

A Generic Framework for Conformal Fairness

Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models

Structural-Entropy-Based Sample Selection for Efficient and Effective Learning

Specialized Foundation Models Struggle to Beat Supervised Baselines

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions

Storybooth: Training-Free Multi-Subject Consistency for Improved Visual Storytelling

Tuning Frequency Bias of State Space Models

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

The Computational Complexity of Positive Non-Clashing Teaching in Graphs

Aligning Visual Contrastive learning models via Preference Optimization

Credit-based self organizing maps: training deep topographic networks with minimal performance degradation

Self-Attention-Based Contextual Modulation Improves Neural System Identification

VLMaterial: Procedural Material Generation with Large Vision-Language Models

Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?

Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra

Intrinsic User-Centric Interpretability through Global Mixture of Experts

Is Large-scale Pretraining the Secret to Good Domain Generalization?

Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation

Boosting Perturbed Gradient Ascent for Last-Iterate Convergence in Games

ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge

Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Open-World Reinforcement Learning over Long Short-Term Imagination

Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation

Benchmarking Agentic Workflow Generation

MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation

Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later

Difference-of-submodular Bregman Divergence

To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts

Faster Cascades via Speculative Decoding

LLMs Can Plan Only If We Tell Them

High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity

Towards Empowerment Gain through Causal Structure Learning in Model-Based Reinforcement Learning

MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents

Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images

ReAttention: Training-Free Infinite Context with Finite Attention Scope

Inverse Rendering using Multi-Bounce Path Tracing and Reservoir Sampling

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

EVA: Geometric Inverse Design for Fast Protein Motif-Scaffolding with Coupled Flow

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Stochastic Bandits Robust to Adversarial Attacks

miniCTX: Neural Theorem Proving with (Long-)Contexts

Timer-XL: Long-Context Transformers for Unified Time Series Forecasting

Can Watermarks be Used to Detect LLM IP Infringement For Free?

CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation

Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing

AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction

FlowDec: A flow-based full-band general audio codec with high perceptual quality

IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

Computing Circuits Optimization via Model-Based Circuit Genetic Evolution

Zero-shot Model-based Reinforcement Learning using Large Language Models

YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary

Distilling Structural Representations into Protein Sequence Models

Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric

GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation

Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks

Distribution-Specific Agnostic Conditional Classification With Halfspaces

Benchmarking LLMs' Judgments with No Gold Standard

Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

On the Performance Analysis of Momentum Method: A Frequency Domain Perspective

A Coefficient Makes SVRG Effective

Language Imbalance Driven Rewarding for Multilingual Self-improving

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Composable Interventions for Language Models

Accelerating Neural ODEs: A Variational Formulation-based Approach

A Non-Contrastive Learning Framework for Sequential Recommendation with Preference-Preserving Profile Generation

Improving Large Language Model Planning with Action Sequence Similarity

Accelerating Task Generalisation with Multi-Level Skill Hierarchies

Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation

A Differentiable Rank-Based Objective for Better Feature Learning

Rethinking Multiple-Instance Learning From Feature Space to Probability Space

Enhancing Prediction Performance through Influence Measure

Data-centric Prediction Explanation via Kernelized Stein Discrepancy

Shallow diffusion networks provably learn hidden low-dimensional structure

Detecting Backdoor Samples in Contrastive Language Image Pretraining

Training-Free Diffusion Model Alignment with Sampling Demons

JPEG Inspired Deep Learning

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

Streaming Algorithms For $\ell_p$ Flows and $\ell_p$ Regression

MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models

Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Asymptotic Analysis of Two-Layer Neural Networks after One Gradient Step under Gaussian Mixtures Data with Structure

Diffusion Feedback Helps CLIP See Better

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

MiniPLM: Knowledge Distillation for Pre-training Language Models

Selective Unlearning via Representation Erasure Using Domain Adversarial Training

Projection Head is Secretly an Information Bottleneck

TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks

Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning

Classic but Everlasting: Traditional Gradient-Based Algorithms Converges Fast Even in Time-Varying Multi-Player Games

How efficient is LLM-generated code? A rigorous & high-standard benchmark

DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Do as We Do, Not as You Think: the Conformity of Large Language Models

Towards Learning High-Precision Least Squares Algorithms with Sequence Models

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

QERA: an Analytical Framework for Quantization Error Reconstruction

Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering

Accessing Vision Foundation Models via ImageNet-1K

pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation

For Better or For Worse? Learning Minimum Variance Features With Label Augmentation

CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation

Aioli: A Unified Optimization Framework for Language Model Data Mixing

Implicit Neural Surface Deformation with Explicit Velocity Fields

6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering

Can LLMs Understand Time Series Anomalies?

LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models

Pursuing Better Decision Boundaries for Long-Tailed Object Detection via Category Information Amount

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games

Temporal Reasoning Transfer from Text to Video

Neuron based Personality Trait Induction in Large Language Models

Partial Gromov-Wasserstein Metric

Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning

What Are Good Positional Encodings for Directed Graphs?

An Evolved Universal Transformer Memory

UniGEM: A Unified Approach to Generation and Property Prediction for Molecules

On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning

Training-Free Dataset Pruning for Instance Segmentation

Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models

Diff-Prompt: Diffusion-driven Prompt Generator with Mask Supervision

DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction

Efficient and Accurate Explanation Estimation with Distribution Compression

From Probability to Counterfactuals: the Increasing Complexity of Satisfiability in Pearl's Causal Hierarchy

Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling

Learning Chaos In A Linear Way

What Matters in Learning from Large-Scale Datasets for Robot Manipulation

Rethinking Reward Modeling in Preference-based Large Language Model Alignment

CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design

Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR

Generating Likely Counterfactuals Using Sum-Product Networks

Enhanced Diffusion Sampling via Extrapolation with Multiple ODE Solutions

What's the Move? Hybrid Imitation Learning via Salient Points

Towards Out-of-Modal Generalization without Instance-level Modal Correspondence

GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks

Eliciting Human Preferences with Language Models

AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning

Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

Laplace Sample Information: Data Informativeness Through a Bayesian Lens

A Periodic Bayesian Flow for Material Generation

SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision

CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Trusted Multi-View Classification via Evolutionary Multi-View Fusion

Systematic Relational Reasoning With Epistemic Graph Neural Networks

Improving Generalization and Robustness in SNNs Through Signed Rate Encoding and Sparse Encoding Attacks

AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation

Chain-of-region: Visual Language Models Need Details for Diagram Analysis

CoMotion: Concurrent Multi-person 3D Motion

Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry

ESE: Espresso Sentence Embeddings

Random-Set Neural Networks

OGBench: Benchmarking Offline Goal-Conditioned RL

TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data

Explanations of GNN on Evolving Graphs via Axiomatic Layer edges

URLOST: Unsupervised Representation Learning without Stationarity or Topology

Boltzmann priors for Implicit Transfer Operators

On Scaling Up 3D Gaussian Splatting Training

Bayesian WeakS-to-Strong from Text Classification to Generation

Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models

Compute-Optimal LLMs Provably Generalize Better with Scale

D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement

Rethinking Fair Representation Learning for Performance-Sensitive Tasks

Asymmetric Factorized Bilinear Operation for Vision Transformer

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Generative Flows on Synthetic Pathway for Drug Design

XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning

Spreading Out-of-Distribution Detection on Graphs

GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning

Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection

X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention

Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models

Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning

Commit0: Library Generation from Scratch

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escape, and Network Embedding

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

FlickerFusion: Intra-trajectory Domain Generalizing Multi-agent Reinforcement Learning

Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel

SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval

Knowledge Graph Finetuning Enhances Knowledge Manipulation in Large Language Models

TVNet: A Novel Time Series Analysis Method Based on Dynamic Convolution and 3D-Variation

Variance-Reducing Couplings for Random Features

Looking into User’s Long-term Interests through the Lens of Conservative Evidential Learning

EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation

3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation

Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data

ANaGRAM: A Natural Gradient Relative to Adapted Model for efficient PINNs learning

Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance

Neuron-based Multifractal Analysis of Neuron Interaction Dynamics in Large Models

Multi-Scale Fusion for Object Representation

QP-SNN: Quantized and Pruned Spiking Neural Networks

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Visual Agents as Fast and Slow Thinkers

MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

HelpSteer2-Preference: Complementing Ratings with Preferences

Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers

Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

Discovering Clone Negatives via Adaptive Contrastive Learning for Image-Text Matching

Active Learning for Continual Learning: Keeping the Past Alive in the Present

On the Adversarial Vulnerability of Label-Free Test-Time Adaptation

Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits

Weak to Strong Generalization for Large Language Models with Multi-capabilities

Preble: Efficient Distributed Prompt Scheduling for LLM Serving

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

CipherPrune: Efficient and Scalable Private Transformer Inference

Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models

GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Chain-of-Thought Provably Enables Learning the (Otherwise) Unlearnable

Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax

Learning to Help in Multi-Class Settings

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

An Engorgio Prompt Makes Large Language Model Babble on

Tracking objects that change in appearance with phase synchrony

AdaWM: Adaptive World Model based Planning for Autonomous Driving

Robust Conformal Prediction with a Single Binary Certificate

Long-Short Decision Transformer: Bridging Global and Local Dependencies for Generalized Decision-Making

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery

RESuM: A Rare Event Surrogate Model for Physics Detector Design

It Helps to Take a Second Opinion: Teaching Smaller LLMs To Deliberate Mutually via Selective Rationale Optimisation

Plastic Learning with Deep Fourier Features

Dynamic Modeling of Patients, Modalities and Tasks via Multi-modal Multi-task Mixture of Experts

Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping

SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data

Zero-shot Imputation with Foundation Inference Models for Dynamical Systems

Multi-Dimensional Conformal Prediction

From Promise to Practice: Realizing High-performance Decentralized Training

Unifying Causal Representation Learning with the Invariance Principle

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

Discrete Codebook World Models for Continuous Control

Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

Towards Automated Knowledge Integration From Human-Interpretable Representations

RAG-SR: Retrieval-Augmented Generation for Neural Symbolic Regression

Many-Objective Multi-Solution Transport

Measuring And Improving Persuasiveness Of Large Language Models

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding

Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity

Differentially Private Steering for Large Language Model Alignment

Probabilistic Conformal Prediction with Approximate Conditional Validity

Rethinking Graph Neural Networks From A Geometric Perspective Of Node Features

Formation of Representations in Neural Networks

A Watermark for Order-Agnostic Language Models

Online Reinforcement Learning in Non-Stationary Context-Driven Environments

Conditional Diffusion Models are Minimax-Optimal and Manifold-Adaptive for Conditional Distribution Estimation

Enhancing Graph Of Thought: Enhancing Prompts with LLM Rationales and Dynamic Temperature Control

Understanding Virtual Nodes: Oversquashing and Node Heterogeneity

Robust Root Cause Diagnosis using In-Distribution Interventions

Learning Mask Invariant Mutual Information for Masked Image Modeling

One for all and all for one: Efficient computation of partial Wasserstein distances on the line

Provably Safeguarding a Classifier from OOD and Adversarial Samples

From Search to Sampling: Generative Models for Robust Algorithmic Recourse

Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction

Learning local equivariant representations for quantum operators

Decoupled Graph Energy-based Model for Node Out-of-Distribution Detection on Heterophilic Graphs

Equivariant Masked Position Prediction for Efficient Molecular Representation

Circuit Transformer: A Transformer That Preserves Logical Equivalence

Reconsidering Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs

Learning and aligning single-neuron invariance manifolds in visual cortex

Direct Distributional Optimization for Provable Alignment of Diffusion Models

Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes

Neural Wave Equation for Irregularly Sampled Sequence Data

Diffusing States and Matching Scores: A New Framework for Imitation Learning

Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization

Diffusion State-Guided Projected Gradient for Inverse Problems

AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data

Learning Long Range Dependencies on Graphs via Random Walks

Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

Erasing Concept Combination from Text-to-Image Diffusion Model

RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data

Efficient Dictionary Learning with Switch Sparse Autoencoders

Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods

Meta-Continual Learning of Neural Fields

A Sanity Check for AI-generated Image Detection

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance

Instant Policy: In-Context Imitation Learning via Graph Diffusion

OpenHands: An Open Platform for AI Software Developers as Generalist Agents

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models

Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation

Efficient Jailbreak Attack Sequences on Large Language Models via Multi-Armed Bandit-Based Context Switching

CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking

Inverse Attention Agents for Multi-Agent Systems

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

AutoEval: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks

Adversarial Mixup Unlearning

Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs

Moner: Motion Correction in Undersampled Radial MRI with Unsupervised Neural Representation

Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation

Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking

Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits

Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility

Representational Similarity via Interpretable Visual Concepts

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

SafeDiffuser: Safe Planning with Diffusion Probabilistic Models

Robotouille: An Asynchronous Planning Benchmark for LLM Agents

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy

Uncertainty modeling for fine-tuned implicit functions

PaCA: Partial Connection Adaptation for Efficient Fine-Tuning

Distance-Based Tree-Sliced Wasserstein Distance

A Theoretical Framework for Partially-Observed Reward States in RLHF

Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning

Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Can We Talk Models Into Seeing the World Differently?

Unlocking the Potential of Model Calibration in Federated Learning

Endowing Visual Reprogramming with Adversarial Robustness

Duoduo CLIP: Efficient 3D Understanding with Multi-View Images

MuPT: A Generative Symbolic Music Pretrained Transformer

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Neural Stochastic Differential Equations for Uncertainty-Aware Offline RL

No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models

AutoG: Towards automatic graph construction from tabular data

Can Knowledge Editing Really Correct Hallucinations?

Understanding Long Videos with Multimodal Language Models

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction

Watch Less, Do More: Implicit Skill Discovery for Video-Conditioned Policy

Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection

Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models

InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales

Uncertainty-Aware Decoding with Minimum Bayes Risk

Instance-dependent Early Stopping

GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation

ACES: Automatic Cohort Extraction System for Event-Stream Datasets

GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation

No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?

Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling

Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment

T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Directional Gradient Projection for Robust Fine-Tuning of Foundation Models

Precise Parameter Localization for Textual Generation in Diffusion Models

Residual-MPPI: Online Policy Customization for Continuous Control

Diffusion Models Are Real-Time Game Engines

Rare event modeling with self-regularized normalizing flows: what can we learn from a single failure?

Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models

Scalable Influence and Fact Tracing for Large Language Model Pretraining

Unlearning-based Neural Interpretations

Towards Auto-Regressive Next-Token Prediction: In-context Learning Emerges from Generalization

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment

Learning Robust Representations with Long-Term Information for Generalization in Visual Reinforcement Learning

RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs

Safety Representations for Safer Policy Learning

Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors

Generalization and Distributed Learning of GFlowNets

Composing Unbalanced Flows for Flexible Docking and Relaxation

MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation

Weighted-Reward Preference Optimization for Implicit Model Fusion

Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization

Improved Training Technique for Latent Consistency Models

Linear Spherical Sliced Optimal Transport: A Fast Metric for Comparing Spherical Data

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Statistical Advantages of Perturbing Cosine Router in Mixture of Experts

Steering Protein Family Design through Profile Bayesian Flow

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference

Interpretable Unsupervised Joint Denoising and Enhancement for Real-World low-light Scenarios

Improving Probabilistic Diffusion Models With Optimal Diagonal Covariance Matching

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Tight Lower Bounds under Asymmetric High-Order Hölder Smoothness and Uniform Convexity

Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning

Connectome Mapping: Shape-Memory Network via Interpretation of Contextual Semantic Information

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Capability Localization: Capabilities Can be Localized rather than Individual Knowledge

Lasso Bandit with Compatibility Condition on Optimal Arm

Universal Image Restoration Pre-training via Degradation Classification

Lightweight Predictive 3D Gaussian Splats

Towards Continuous Reuse of Graph Models via Holistic Memory Diversification

GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs

Safety-Prioritizing Curricula for Constrained Reinforcement Learning

What to align in multimodal contrastive learning?

Value-aligned Behavior Cloning for Offline Reinforcement Learning via Bi-level Optimization

Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees

Filtered not Mixed: Filtering-Based Online Gating for Mixture of Large Language Models

Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning

Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

SegLLM: Multi-round Reasoning Segmentation with Large Language Models

Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher

Layerwise Recurrent Router for Mixture-of-Experts

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count

Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy

Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

Provable unlearning in topic modeling and downstream tasks

Gaussian-Based Instance-Adaptive Intensity Modeling for Point-Supervised Facial Expression Spotting

Training Free Exponential Context Extension via Cascading KV Cache

Sequential Controlled Langevin Diffusions

PooDLe🐩: Pooled and dense self-supervised learning from naturalistic videos

Exact Certification of (Graph) Neural Networks Against Label Poisoning

Policy Design in Long-run Welfare Dynamics

Tight Clusters Make Specialized Experts

Predicting the Energy Landscape of Stochastic Dynamical System via Physics-informed Self-supervised Learning

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

A Robust Method to Discover Causal or Anticausal Relation

Spectral Compressive Imaging via Unmixing-driven Subspace Diffusion Refinement

Hybrid Regularization Improves Diffusion-based Inverse Problem Solving

A Closer Look at Machine Unlearning for Large Language Models

Underdamped Diffusion Bridges with Applications to Sampling

CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

GSE: Group-wise Sparse and Explainable Adversarial Attacks

Efficient Training of Neural Stochastic Differential Equations by Matching Finite Dimensional Distributions

Model Equality Testing: Which Model is this API Serving?

RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style

Model-agnostic meta-learners for estimating heterogeneous treatment effects over time

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences

Scale-aware Recognition in Satellite Images under Resource Constraints

Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models

Adversarial Latent Feature Augmentation for Fairness

REvolve: Reward Evolution with Large Language Models using Human Feedback

Aligning Human Motion Generation with Human Perceptions

PEARL: Parallel Speculative Decoding with Adaptive Draft Length

Subgraph Federated Learning for Local Generalization

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

A New Perspective on Shampoo's Preconditioner

GeoLoRA: Geometric integration for parameter efficient fine-tuning

Unsupervised Model Tree Heritage Recovery

MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation

Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs

Diffusion Transformers for Tabular Data Time Series Generation

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

Size-Generalizable RNA Structure Evaluation by Exploring Hierarchical Geometries

Transformers Handle Endogeneity in In-Context Linear Regression

No Preference Left Behind: Group Distributional Preference Optimization

InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems

NetFormer: An interpretable model for recovering dynamical connectivity in neuronal population dynamics

Prototype antithesis for biological few-shot class-incremental learning

$\text{I}^2\text{AM}$: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps

Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models

A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning

BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations

Risk-Sensitive Diffusion: Robustly Optimizing Diffusion Models with Noisy Samples

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

HadamRNN: Binary and Sparse Ternary Orthogonal RNNs

A Geometric Framework for Understanding Memorization in Generative Models

Standardizing Structural Causal Models

Recovering Manifold Structure Using Ollivier Ricci Curvature

TopoLM: brain-like spatio-functional organization in a topographic language model

Towards General-Purpose Model-Free Reinforcement Learning

PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection

QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing

Confidence Elicitation: A New Attack Vector for Large Language Models

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Separation Power of Equivariant Neural Networks

Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?

Toward Understanding In-context vs. In-weight Learning

Error-quantified Conformal Inference for Time Series

Enhancing Language Model Agents using Diversity of Thoughts

InstantPortrait: One-Step Portrait Editing via Diffusion Multi-Objective Distillation

A Distributional Approach to Uncertainty-Aware Preference Alignment Using Offline Demonstrations

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

DiffPC: Diffusion-based High Perceptual Fidelity Image Compression with Semantic Refinement

ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks

ARB-LLM: Alternating Refined Binarizations for Large Language Models

Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference

Kronecker Mask and Interpretive Prompts are Language-Action Video Learners

Brain Bandit: A Biologically Grounded Neural Network for Efficient Control of Exploration

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models

Latent Bayesian Optimization via Autoregressive Normalizing Flows

Easing Training Process of Rectified Flow Models Via Lengthening Inter-Path Distance

Shape as Line Segments: Accurate and Flexible Implicit Surface Representation

Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits

FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models

Diffusion Models as Cartoonists: The Curious Case of High Density Regions

Concept Bottleneck Language Models For Protein Design

MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model

Sensor-Invariant Tactile Representation

Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback

Learning 3D Perception from Others' Predictions

Multilevel Generative Samplers for Investigating Critical Phenomena

Generator Matching: Generative modeling with arbitrary Markov processes

Reframing Structure-Based Drug Design Model Evaluation via Metrics Correlated to Practical Needs

Mixture of Attentions For Speculative Decoding

Medium-Difficulty Samples Constitute Smoothed Decision Boundary for Knowledge Distillation on Pruned Datasets

GOttack: Universal Adversarial Attacks on Graph Neural Networks via Graph Orbits Learning

ODE-based Smoothing Neural Network for Reinforcement Learning Tasks

Enhancing Multilingual Reasoning in LLMs: Insights from Cross-Linguistic Correlations and Optimal Data Proportions

ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

Progressive Compositionality in Text-to-Image Generative Models

Redefining the task of Bioactivity Prediction

GROOT-2: Weakly Supervised Multimodal Instruction Following Agents

ECD: A Machine Learning Benchmark for Predicting Enhanced-Precision Electronic Charge Density in Crystalline Inorganic Materials

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

On the Crucial Role of Initialization for Matrix Factorization

Handling Delay in Real-Time Reinforcement Learning

NExUME: Adaptive Training and Inference for DNNs under Intermittent Power Environments

Revisiting Random Walks for Learning on Graphs

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Geometry of Lightning Self-Attention: Identifiability and Dimension

Attributing Culture-Conditioned Generations to Pretraining Corpora

Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

Query-based Knowledge Transfer for Heterogeneous Learning Environments

Long-time asymptotics of noisy SVGD outside the population limit

High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders

Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

Semantic Aware Representation Learning for Lifelong Learning

Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA

Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes

Meta-Dynamical State Space Models for Integrative Neural Data Analysis

Optimal Learning of Kernel Logistic Regression for Complex Classification Scenarios

Scaling Optimal LR Across Token Horizons

Is In-Context Learning Sufficient for Instruction Following in LLMs?

Discovering Influential Neuron Path in Vision Transformers

Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing

REBIND: Enhancing Ground-state Molecular Conformation Prediction via Force-Based Graph Rewiring

Locality Sensitive Avatars From Video

Does Spatial Cognition Emerge in Frontier Models?

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization

Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data

NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks in Open Domains

A Decade's Battle on Dataset Bias: Are We There Yet?

TabM: Advancing tabular deep learning with parameter-efficient ensembling

Learning to Solve Differential Equation Constrained Optimization Problems

GameArena: Evaluating LLM Reasoning through Live Computer Games

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion

CTSyn: A Foundation Model for Cross Tabular Data Generation

Bayesian Regularization of Latent Representation

MCNC: Manifold-Constrained Reparameterization for Neural Compression

Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport

Bridging the Gap between Database Search and De Novo Peptide Sequencing with SearchNovo

REEF: Representation Encoding Fingerprints for Large Language Models

Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning

Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective

How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning

Adaptive backtracking for faster optimization

Privacy-Aware Lifelong Learning

Mitigating Spurious Correlations in Zero-Shot Multimodal Models

LeanAgent: Lifelong Learning for Formal Theorem Proving

BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL

Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining

ZooProbe: A Data Engine for Evaluating, Exploring, and Evolving Large-scale Training Data for Multimodal LLMs

CryoFM: A Flow-based Foundation Model for Cryo-EM Densities

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

Adversarial Training for Defense Against Label Poisoning Attacks

You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs

Learning Harmonized Representations for Speculative Sampling

Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving

Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation

Self-Supervised Diffusion Models for Electron-Aware Molecular Representation Learning

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation

L3Ms — Lagrange Large Language Models

MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba

Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Nonlinear Sequence Embedding by Monotone Variational Inequality

A Formal Framework for Understanding Length Generalization in Transformers

Surprising Effectiveness of pretraining Ternary Language Model at Scale

Poisson-Dirac Neural Networks for Modeling Coupled Dynamical Systems across Domains

FIRING-Net: A filtered feature recycling network for speech enhancement

Multi-Task Dense Predictions via Unleashing the Power of Diffusion

Noise Separation guided Candidate Label Reconstruction for Noisy Partial Label Learning

Physics-informed Temporal Difference Metric Learning for Robot Motion Planning

ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension

Grokking at the Edge of Numerical Stability

CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification

Solving Video Inverse Problems Using Image Diffusion Models

First-Person Fairness in Chatbots

Gumbel Counterfactual Generation From Language Models

Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry

Proteina: Scaling Flow-based Protein Structure Generative Models

DECO: Unleashing the Potential of ConvNets for Query-based Detection and Segmentation

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Deep Learning Alternatives Of The Kolmogorov Superposition Theorem

Logical Consistency of Large Language Models in Fact-Checking

HShare: Fast LLM Decoding by Hierarchical Key-Value Sharing

Exploring the Camera Bias of Person Re-identification

P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks

Epistemic Monte Carlo Tree Search

Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems

On the Relation between Trainability and Dequantization of Variational Quantum Learning Models

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

The Belief State Transformer

Herald: A Natural Language Annotated Lean 4 Dataset

UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate

Spurious Forgetting in Continual Learning of Language Models

KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks

VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

State Space Model Meets Transformer: A New Paradigm for 3D Object Detection

Discriminator-Guided Embodied Planning for LLM Agent

UTILITY: Utilizing Explainable Reinforcement Learning to Improve Reinforcement Learning

Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion

Language Models Are Implicitly Continuous

MLPs Learn In-Context on Regression and Classification Tasks

TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining

Law of the Weakest Link: Cross Capabilities of Large Language Models

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach

Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems

Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice

Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing

Learning Video-Conditioned Policy on Unlabelled Data with Joint Embedding Predictive Transformer

Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors

Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts

A Riemannian Framework for Learning Reduced-order Lagrangian Dynamics

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

A Theory of Initialisation's Impact on Specialisation

Closed-Form Merging of Parameter-Efficient Modules for Federated Continual Learning

FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning

SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks

Concept Bottleneck Large Language Models

Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding

MELODI: Exploring Memory Compression for Long Contexts

Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation

Neural Interactive Proofs

Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing

A Differentiable Metric for Discovering Groups and Unitary Representations

A Simple Framework for Open-Vocabulary Zero-Shot Segmentation

GPS: A Probabilistic Distributional Similarity with Gumbel Priors for Set-to-Set Matching

Model-Free Offline Reinforcement Learning with Enhanced Robustness

Is Your Multimodal Language Model Oversensitive to Safe Queries?

On Disentangled Training for Nonlinear Transform in Learned Image Compression

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences

The 3D-PC: a benchmark for visual perspective taking in humans and machines

Continuous Diffusion for Mixed-Type Tabular Data

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

Fair Submodular Cover

Exploring the Design Space of Visual Context Representation in Video MLLMs

Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Advantage Alignment Algorithms

Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment

AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories

Efficient Causal Decision Making with One-sided Feedback

ImageFolder: Autoregressive Image Generation with Folded Tokens

Learning Hierarchical Polynomials of Multiple Nonlinear Features

Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing

A Transfer Attack to Image Watermarks

Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence

Uncertainty Herding: One Active Learning Method for All Label Budgets

On Rollouts in Model-Based Reinforcement Learning

Matryoshka Multimodal Models

Combining Induction and Transduction for Abstract Reasoning

SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

Topograph: An Efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation

MorphoDiff: Cellular Morphology Painting with Diffusion Models

Accurate and Scalable Graph Neural Networks via Message Invariance

The Breakdown of Gaussian Universality in Classification of High-dimensional Linear Factor Mixtures

LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression

Protein Language Model Fitness is a Matter of Preference

Biologically Constrained Barrel Cortex Model Integrates Whisker Inputs and Replicates Key Brain Network Dynamics

Triples as the Key: Structuring Makes Decomposition and Verification Easier in LLM-based TableQA

SoftCVI: Contrastive variational inference with self-generated soft labels

Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation

CL-DiffPhyCon: Closed-loop Diffusion Control of Complex Physical Systems

Execution-guided within-prompt search for programming-by-example

RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models

Attention as a Hypernetwork

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity

Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional, Black-box Systems

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

Select before Act: Spatially Decoupled Action Repetition for Continuous Control

Expected Sliced Transport Plans

Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment

Divergence of Neural Tangent Kernel in Classification Problems

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback

Structure Language Models for Protein Conformation Generation

Matérn Kernels for Tunable Implicit Surface Reconstruction

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Designing Mechanical Meta-Materials by Learning Equivariant Flows

Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving

Convergence of Distributed Adaptive Optimization with Local Updates

DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models

Learning vector fields of differential equations on manifolds with geometrically constrained operator-valued kernels

CARTS: Advancing Neural Theorem Proving with Diversified Tactic Calibration and Bias-Resistant Tree Search

An Online Learning Theory of Trading-Volume Maximization

Learning High-Degree Parities: The Crucial Role of the Initialization

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation

Reasoning Elicitation in Language Models via Counterfactual Feedback

Latent Action Pretraining from Videos

Programming Refusal with Conditional Activation Steering

Scalable Mechanistic Neural Networks

ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding

Vertical Federated Learning with Missing Features During Training and Inference

Co$^{\mathbf{3}}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion

Correlation and Navigation in the Vocabulary Key Representation Space of Language Models

KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models

Self-supervised contrastive learning performs non-linear system identification

Fengbo: a Clifford Neural Operator pipeline for 3D PDEs in Computational Fluid Dynamics

A Theoretically-Principled Sparse, Connected, and Rigid Graph Representation of Molecules

Deep Random Features for Scalable Interpolation of Spatiotemporal Data

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback

SparsyFed: Sparse Adaptive Federated Learning

REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments

Decision Information Meets Large Language Models: The Future of Explainable Operations Research

TRENDy: Temporal Regression of Effective Nonlinear Dynamics

Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models

Adaptive Transformer Programs: Bridging the Gap Between Performance and Interpretability in Transformers

Variational Best-of-N Alignment

Aligned LLMs Are Not Aligned Browser Agents

Adaptive Pruning of Pretrained Transformer via Differential Inclusions

PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers

Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation

Text2PDE: Latent Diffusion Models for Accessible Physics Simulation

Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

Training Robust Ensembles Requires Rethinking Lipschitz Continuity

Perm: A Parametric Representation for Multi-Style 3D Hair Modeling

ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints

Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation

CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding

Re-Evaluating the Impact of Unseen-Class Unlabeled Data on Semi-Supervised Learning Model

MAP: Multi-Human-Value Alignment Palette

Towards Synergistic Path-based Explanations for Knowledge Graph Completion: Exploration and Evaluation

MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions

Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization

Generalization in VAE and Diffusion Models: A Unified Information-Theoretic Analysis

Learning-Augmented Frequent Directions

Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning

A Large-scale Dataset and Benchmark for Commuting Origin-Destination Flow Generation

Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

A Unifying Framework for Representation Learning

Three Mechanisms of Feature Learning in a Linear Network

DyCAST: Learning Dynamic Causal Structure from Time Series

Online Clustering with Nearly Optimal Consistency

Building Math Agents with Multi-Turn Iterative Preference Learning

GMValuator: Similarity-based Data Valuation for Generative Models

GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data

Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection

Small Models are LLM Knowledge Triggers for Medical Tabular Prediction

Learning-Augmented Search Data Structures

Contextual Document Embeddings

Continuity-Preserving Convolutional Autoencoders for Learning Continuous Latent Dynamical Models from Images

LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior

ALLaM: Large Language Models for Arabic and English

Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models

A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops

Learning to Discover Regulatory Elements for Gene Expression Prediction

Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation

GANDALF: Generative AttentioN based Data Augmentation and predictive modeLing Framework for personalized cancer treatment

Topological Schrödinger Bridge Matching

Moral Alignment for LLM Agents

Reassessing How to Compare and Improve the Calibration of Machine Learning Models

F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI

Motion Control of High-Dimensional Musculoskeletal Systems with Hierarchical Model-Based Planning

HELM: Hierarchical Encoding for mRNA Language Modeling

Think while You Generate: Discrete Diffusion with Planned Denoising

Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models

Mitigating Memorization in Language Models

Zero-Shot Natural Language Explanations

Fine-tuning can Help Detect Pretraining Data from Large Language Models

Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

Following the Human Thread in Social Navigation

DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving

Residual Stream Analysis with Multi-Layer SAEs

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints

AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning

ContraDiff: Planning Towards High Return States via Contrastive Learning

Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes

Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning

Framer: Interactive Frame Interpolation

DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

MambaExtend: A Training-Free Approach to Improve Long Context Extension of Mamba

StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces

Going Beyond Static: Understanding Shifts with Time-Series Attribution

Contrastive Learning from Synthetic Audio Doppelgängers

Nonlinear multiregion neural dynamics with parametric impulse response communication channels

Graph Neural Networks for Edge Signals: Orientation Equivariance and Invariance

Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On

POTEC: Off-Policy Contextual Bandits for Large Action Spaces via Policy Decomposition

Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?

3D-Spatial Multimodal Memory

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction

Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization

Learning to engineer protein flexibility

TSC-Net: Prediction of Pedestrian Trajectories by Trajectory-Scene-Cell Classification

Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning

Beyond Circuit Connections: A Non-Message Passing Graph Transformer Approach for Quantum Error Mitigation

On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding

Deep Linear Probe Generators for Weight Space Learning

IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning

Finding Shared Decodable Concepts and their Negations in the Brain

Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?

MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation

An Asynchronous Bundle Method for Distributed Learning Problems

Agent Skill Acquisition for Large Language Models via CycleQD

GraphArena: Evaluating and Exploring Large Language Models on Graph Computation

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Efficient stagewise pretraining via progressive subnetworks

CtD: Composition through Decomposition in Emergent Communication

Locally Connected Echo State Networks for Time Series Forecasting

Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs

PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer

LaMPlace: Learning to Optimize Cross-Stage Metrics in Macro Placement

Progressive Parameter Efficient Transfer Learning for Semantic Segmentation

Ranking-aware adapter for text-driven image ordering with CLIP

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

To Tackle Adversarial Transferability: A Novel Ensemble Training Method with Fourier Transformation

A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement

BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models

Dissecting Adversarial Robustness of Multimodal LM Agents

Calibrating LLMs with Information-Theoretic Evidential Deep Learning

SMITE: Segment Me In TimE

PABBO: Preferential Amortized Black-Box Optimization

Federated Granger Causality Learning For Interdependent Clients With State Space Representation

ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design

Toward Efficient Multi-Agent Exploration With Trajectory Entropy Maximization

Improving Semantic Understanding in Speech Language Models via Brain-tuning

Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs

Scaling Laws for Adversarial Attacks on Language Model Activations and Tokens

CameraCtrl: Enabling Camera Control for Video Diffusion Models

NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance

PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders

Training-free Camera Control for Video Generation

From GNNs to Trees: Multi-Granular Interpretability for Graph Neural Networks

Prevalence of Negative Transfer in Continual Reinforcement Learning: Analyses and a Simple Baseline

Improving Equivariant Networks with Probabilistic Symmetry Breaking

Glad: A Streaming Scene Generator for Autonomous Driving

Scaling Large Language Model-based Multi-Agent Collaboration

Gaussian Differentially Private Human Faces Under a Face Radial Curve Representation

Simulating Training Dynamics to Reconstruct Training Data from Deep Neural Networks

Unsupervised Meta-Learning via In-Context Learning

Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs

An Exploration with Entropy Constrained 3D Gaussians for 2D Video Compression

The Directionality of Optimization Trajectories in Neural Networks

TopoDiffusionNet: A Topology-aware Diffusion Model

Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate

Oracle efficient truncated statistics

Self-Improving Robust Preference Optimization

BaB-ND: Long-Horizon Motion Planning with Branch-and-Bound and Neural Dynamics

Radar: Fast Long-Context Decoding for Any Transformer

AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence

Conformalized Survival Analysis for General Right-Censored Data

Truncated Consistency Models

Exploiting Hidden Symmetry to Improve Objective Perturbation for DP linear learners with a nonsmooth L1-norm

Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes

Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Learning to Steer Markovian Agents under Model Uncertainty

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Federated Few-Shot Class-Incremental Learning

What Makes a Maze Look Like a Maze?

Sparse components distinguish visual pathways & their alignment to neural networks

LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

Conditional Testing based on Localized Conformal $p$-values

Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating

Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition

The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation

Problem-Parameter-Free Federated Learning

TLDR: Token-Level Detective Reward Model for Large Vision Language Models

Rethinking Spiking Neural Networks from an Ensemble Learning Perspective

One-for-All Few-Shot Anomaly Detection via Instance-Induced Prompt Learning

Self-Play Preference Optimization for Language Model Alignment

Training Large Language Models for Retrieval-Augmented Question Answering through Backtracking Correction

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment

Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization

Implicit Bias of Mirror Flow for Shallow Neural Networks in Univariate Regression

Investigating Pattern Neurons in Urban Time Series Forecasting

SaMer: A Scenario-aware Multi-dimensional Evaluator for Large Language Models

SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling

Self-Updatable Large Language Models by Integrating Context into Model Parameters

Streamlining Redundant Layers to Compress Large Language Models

Multimodal Situational Safety

Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering

TASAR: Transfer-based Attack on Skeletal Action Recognition

Wasserstein-Regularized Conformal Prediction under General Distribution Shift

Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage

$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

Does Refusal Training in LLMs Generalize to the Past Tense?

Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping

ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning

Robust Feature Learning for Multi-Index Models in High Dimensions

Stabilized Neural Prediction of Potential Outcomes in Continuous Time

High-Quality Joint Image and Video Tokenization with Causal VAE

Injecting Universal Jailbreak Backdoors into LLMs in Minutes

Learning Continually by Spectral Regularization

Searching for Optimal Solutions with LLMs via Bayesian Optimization

The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Training Neural Networks as Recognizers of Formal Languages

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings

Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation

IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking

QPM: Discrete Optimization for Globally Interpretable Image Classification

Targeted Attack Improves Protection against Unauthorized Diffusion Customization

MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

Multi-agent cooperation through learning-aware policy gradients

DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings

Learning Geometric Reasoning Networks For Robot Task And Motion Planning

Rethinking Neural Multi-Objective Combinatorial Optimization via Neat Weight Embedding

Denoising Autoregressive Transformers for Scalable Text-to-Image Generation

Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Is Factuality Enhancement a Free Lunch For LLMs? Better Factuality Can Lead to Worse Context-Faithfulness

Implicit In-context Learning

Understanding and Enhancing the Transferability of Jailbreaking Attacks

RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation

3D StreetUnveiler with Semantic-aware 2DGS - a simple baseline

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

Inner Information Analysis Algorithm for Deep Neural Network based on Community

AgentStudio: A Toolkit for Building General Virtual Agents

Node Similarities under Random Projections: Limits and Pathological Cases

Towards Scalable Topological Regularizers

DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale

Population Transformer: Learning Population-level Representations of Neural Activity

Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards

Rethinking Shapley Value for Negative Interactions in Non-convex Games

No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Teaching LLMs How to Learn with Contextual Fine-Tuning

X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing

The Case for Cleaner Biosignals: High-fidelity Neural Compressor Enables Transfer from Cleaner iEEG to Noisier EEG

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model

Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning

Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

MeshMask: Physics-Based Simulations with Masked Graph Neural Networks

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Efficient Sparse PCA via Block-Diagonalization

The Crucial Role of Samplers in Online Direct Preference Optimization

Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations

Bridging the Gap between Variational Inference and Stochastic Gradient MCMC in Function Space

Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity

The Effectiveness of Curvature-Based Rewiring and the Role of Hyperparameters in GNNs Revisited

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

CoMRes: Semi-Supervised Time Series Forecasting Utilizing Consensus Promotion of Multi-Resolution

The Geometry of Categorical and Hierarchical Concepts in Large Language Models

Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics

MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

Human-Aligned Chess With a Bit of Search

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels

Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization

Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective

Making Transformer Decoders Better Differentiable Indexers

Cut Your Losses in Large-Vocabulary Language Models

Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences

KinPFN: Bayesian Approximation of RNA Folding Kinetics using Prior-Data Fitted Networks

Scaling FP8 training to trillion-token LLMs

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs

Selective induction Heads: How Transformers Select Causal Structures in Context

SOO-Bench: Benchmarks for Evaluating the Stability of Offline Black-Box Optimization

Attention layers provably solve single-location regression

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification

ImDy: Human Inverse Dynamics from Imitated Observations

Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

Towards a learning theory of representation alignment

Dynamic Assortment Selection and Pricing with Censored Preference Feedback

ReMatching Dynamic Reconstruction Flow

Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence

A Large-scale Training Paradigm for Graph Generative Models

Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene

Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization

Interactive Adjustment for Human Trajectory Prediction with Individual Feedback

Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-training of Deep Networks

Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality

Estimating the Probabilities of Rare Outputs in Language Models

Probabilistic Language-Image Pre-Training

CoInD: Enabling Logical Compositions in Diffusion Models

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

Physics-aligned field reconstruction with diffusion bridge

Find A Winning Sign: Sign Is All We Need to Win the Lottery

Comparing noisy neural population dynamics using optimal transport distances

ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities

Can Textual Gradient Work in Federated Learning?

Progressive Compression with Universally Quantized Diffusion Models

Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding

Strength Estimation and Human-Like Strength Adjustment in Games

eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels

SPDIM: Source-Free Unsupervised Conditional and Label Shift Adaptation in EEG

CViT: Continuous Vision Transformer for Operator Learning

Generative Verifiers: Reward Modeling as Next-Token Prediction

Class Distribution-induced Attention Map for Open-vocabulary Semantic Segmentations

CR-CTC: Consistency regularization on CTC for improved speech recognition

Fitting Networks with a Cancellation Trick

Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF

Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning

Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond

How to Evaluate Reward Models for RLHF

GeoILP: A Synthetic Dataset to Guide Large-Scale Rule Induction

Number Cookbook: Number Understanding of Language Models and How to Improve It

Online Preference Alignment for Language Models via Count-based Exploration

On-the-fly Preference Alignment via Principle-Guided Decoding

Towards Interpreting Visual Information Processing in Vision-Language Models

LucidPPN: Unambiguous Prototypical Parts Network for User-centric Interpretable Computer Vision

Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy

Feedback Favors the Generalization of Neural ODEs

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

Wasserstein Distances, Neuronal Entanglement, and Sparsity

VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale

Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions

Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

Identifying latent state transitions in non-linear dynamical systems

Learning Clustering-based Prototypes for Compositional Zero-Shot Learning

MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

Denoising with a Joint-Embedding Predictive Architecture

DELTA: DENSE EFFICIENT LONG-RANGE 3D TRACKING FOR ANY VIDEO

Large Language Models Assume People are More Rational than We Really are

CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning

In-context Time Series Predictor

Beyond Next Token Prediction: Patch-Level Training for Large Language Models

Locality-aware Gaussian Compression for Fast and High-quality Rendering

MixMax: Distributional Robustness in Function Space via Optimal Data Mixtures

Calibrating Expressions of Certainty

Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling

Generalizable Human Gaussians from Single-View Image

Tool-Planner: Task Planning with Clusters across Multiple Tools

FreeVS: Generative View Synthesis on Free Driving Trajectory

Offline Hierarchical Reinforcement Learning via Inverse Optimization

SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation

ImProver: Agent-Based Automated Proof Optimization

FOSP: Fine-tuning Offline Safe Policy through World Models

Quantum-PEFT: Ultra parameter-efficient fine-tuning

Data Selection via Optimal Control for Language Models

Reasoning with Latent Thoughts: On the Power of Looped Transformers

Improving Deep Regression with Tightness

Gaussian Splatting Lucas-Kanade

FACTS: A Factored State-Space Framework for World Modelling

Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation

Aligned Datasets Improve Detection of Latent Diffusion-Generated Images

Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace

Learning Diagrams: A Graphical Language for Compositional Training Regimes

DeepTAGE: Deep Temporal-Aligned Gradient Enhancement for Optimizing Spiking Neural Networks

TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice

RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph

3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling

Federated Residual Low-Rank Adaption of Large Language Models

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Ensembling Diffusion Models via Adaptive Feature Aggregation

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Dreamweaver: Learning Compositional World Models from Pixels

PAD: Personalized Alignment at Decoding-time

Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition

Matrix Product Sketching via Coordinated Sampling

Look Before You Leap: Universal Emergent Mechanism for Retrieval in Language Models

Language Representations Can be What Recommenders Need: Findings and Potentials

UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation

Preserving Deep Representations in One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework

DataMan: Data Manager for Pre-training Large Language Models

Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives

GraphRouter: A Graph-based Router for LLM Selections

CBQ: Cross-Block Quantization for Large Language Models

Multi-Resolution Decomposable Diffusion Model for Non-Stationary Time Series Anomaly Detection

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation

Looking Inward: Language Models Can Learn About Themselves by Introspection

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

A Meta-Learning Approach to Bayesian Causal Discovery

Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning

Diffusion Bridge Implicit Models

Disentangled Representation Learning with the Gromov-Monge Gap

Persistent Pre-training Poisoning of LLMs

Why Does the Effective Context Length of LLMs Fall Short?

Influence Functions for Scalable Data Attribution in Diffusion Models

Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo

Action Sequence Augmentation for Action Anticipation

Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

Controllable Satellite-to-Street-View Synthesis with Precise Pose Alignment and Zero-Shot Environmental Control

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Efficient Discovery of Pareto Front for Multi-Objective Reinforcement Learning

OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees

Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors

Recovery of Causal Graph Involving Latent Variables via Homologous Surrogates

What is Wrong with Perplexity for Long-context Language Modeling?

LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement

Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets

PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems

GReaTer: Gradients Over Reasoning Makes Smaller Language Models Strong Prompt Optimizers

On Evaluating the Durability of Safeguards for Open-Weight LLMs

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

Can We Ignore Labels in Out of Distribution Detection?

Optimality of Matrix Mechanism on $\ell_p^p$-metric

Teaching Human Behavior Improves Content Understanding Abilities Of VLMs

$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee

Scalable Bayesian Learning with posteriors

Efficient Imitation under Misspecification

Causal Graph Transformer for Treatment Effect Estimation Under Unknown Interference

UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models

Simple ReFlow: Improved Techniques for Fast Flow Models

Vision Language Models are In-Context Value Learners

FIG: Flow with Interpolant Guidance for Linear Inverse Problems

ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

A Unified Theory of Quantum Neural Network Loss Landscapes

CL-MFAP: A Contrastive Learning-Based Multimodal Foundation Model for Molecular Property Prediction and Antibiotic Screening

Eliminating Position Bias of Language Models: A Mechanistic Approach

ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification

Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition

Multi-domain Distribution Learning for De Novo Drug Design

A Computational Framework for Modeling Emergence of Color Vision in the Human Brain

Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness

Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

Exploring channel distinguishability in local neighborhoods of the model space in quantum neural networks

CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes

Integrative Decoding: Improving Factuality via Implicit Self-consistency

Bilinear MLPs enable weight-based mechanistic interpretability

DocMIA: Document-Level Membership Inference Attacks against DocVQA Models

Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining

Representative Guidance: Diffusion Model Sampling with Coherence

MAI: A Multi-turn Aggregation-Iteration Model for Composed Image Retrieval

Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency

Unlocking Global Optimality in Bilevel Optimization: A Pilot Study

GraphBridge: Towards Arbitrary Transfer Learning in GNNs

Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models

ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models

A Conditional Independence Test in the Presence of Discretization

Rethinking Visual Counterfactual Explanations Through Region Constraint

Minimal Variance Model Aggregation: A principled, non-intrusive, and versatile integration of black box models

ML4TSPBench: Drawing Methodological Principles for TSP and Beyond from Streamlined Design Space of Learning and Search

Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability

An Auditing Test to Detect Behavioral Shift in Language Models

Sensitivity Verification for Additive Decision Tree Ensembles

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval

BANGS: Game-theoretic Node Selection for Graph Self-Training

Second Order Bounds for Contextual Bandits with Function Approximation

Sharpness-Aware Black-Box Optimization

Improving Complex Reasoning with Dynamic Prompt Corruption: A Soft Prompt Optimization Approach

Bayesian Experimental Design Via Contrastive Diffusions

Diffusion Bridge AutoEncoders for Unsupervised Representation Learning

Probabilistic Neural Pruning via Sparsity Evolutionary Fokker-Planck-Kolmogorov Equation

GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering

SAGEPhos: Sage Bio-Coupled and Augmented Fusion for Phosphorylation Site Detection

MamKO: Mamba-based Koopman operator for modeling and predictive control

Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

Large Language Models are Interpretable Learners

Rethinking and Improving Autoformalization: Towards a Faithful Metric and a Dependency Retrieval-based Approach

Privately Counting Partially Ordered Data

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization

Causal Graphical Models for Vision-Language Compositional Understanding

SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation

Mixture Compressor for Mixture-of-Experts LLMs Gains More

Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization

Causal Representation Learning from Multimodal Biomedical Observations

Adversarial Search Engine Optimization for Large Language Models

OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination

Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data

TDDBench: A Benchmark for Training data detection

Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs

Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning

Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond

DPaI: Differentiable Pruning at Initialization with Node-Path Balance Principle

Modeling Complex System Dynamics with Flow Matching Across Time and Conditions

Deep Distributed Optimization for Large-Scale Quadratic Programming

Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization

SymDiff: Equivariant Diffusion via Stochastic Symmetrisation

ECHOPulse: ECG Controlled Echocardio-gram Video Generation

MM-EMBED: UNIVERSAL MULTIMODAL RETRIEVAL WITH MULTIMODAL LLMS

Simple Guidance Mechanisms for Discrete Diffusion Models

Bootstrapped Model Predictive Control

Lipschitz Bandits in Optimal Space

Interpreting Language Reward Models via Contrastive Explanations

Minimax Optimal Reinforcement Learning with Quasi-Optimism

Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency

Improving Graph Neural Networks by Learning Continuous Edge Directions

Adam-mini: Use Fewer Learning Rates To Gain More

EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

The Optimization Landscape of SGD Across the Feature Learning Strength

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold

Fast Summation of Radial Kernels via QMC Slicing

ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

Selective Aggregation for Low-Rank Adaptation in Federated Learning

REMEDY: Recipe Merging Dynamics in Large Vision-Language Models

Language models scale reliably with over-training and on downstream tasks

THE ROBUSTNESS OF DIFFERENTIABLE CAUSAL DISCOVERY IN MISSPECIFIED SCENARIOS

Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback

Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences

Fast and Accurate Blind Flexible Docking

Adaptive Batch Size for Privately Finding Second-Order Stationary Points

The Value of Sensory Information to a Robot

AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

M^3PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model

Risk-Sensitive Variational Actor-Critic: A Model-Based Approach

Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

Action abstractions for amortized sampling

SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction

Noisy Test-Time Adaptation in Vision-Language Models

Temporal Difference Learning: Why It Can Be Fast and How It Will Be Faster

Optimizing importance weighting in the presence of sub-population shifts

Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Diffusion-based Neural Network Weights Generation

Attribute-based Visual Reprogramming for Vision-Language Models

Competitive Fair Scheduling with Predictions

AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

ComLoRA: A Competitive Learning Approach for Enhancing LoRA

On the Convergence of No-Regret Dynamics in Information Retrieval Games with Proportional Ranking Functions

Training on the Test Task Confounds Evaluation and Emergence

AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models

HeadMap: Locating and Enhancing Knowledge Circuits in LLMs

UniDrive: Towards Universal Driving Perception Across Camera Configurations

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning

Large Language Models can Become Strong Self-Detoxifiers

Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Extending Mercer's expansion to indefinite and asymmetric kernels

AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly

The AdEMAMix Optimizer: Better, Faster, Older

Preference Optimization for Reasoning with Pseudo Feedback

The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling

To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions

Quantifying Generalization Complexity for Large Language Models

Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

Indirect Gradient Matching for Adversarial Robust Distillation

Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

DUET: Decentralized Bilevel Optimization without Lower-Level Strong Convexity

Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

Feedback Schrödinger Bridge Matching

Predictive Uncertainty Quantification for Bird's Eye View Segmentation: A Benchmark and Novel Loss Function

MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility

Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Unearthing Skill-level Insights for Understanding Trade-offs of Foundation Models

Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs

DeeperForward: Enhanced Forward-Forward Training for Deeper and Better Performance

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

Beyond single neurons: population response geometry in digital twins of mouse visual cortex

Open-Set Graph Anomaly Detection via Normal Structure Regularisation

Safety Layers in Aligned Large Language Models: The Key to LLM Security

Node-Time Conditional Prompt Learning in Dynamic Graphs

Ward: Provable RAG Dataset Inference via LLM Watermarks

Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

Boosting the visual interpretability of CLIP via adversarial fine-tuning

Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness

MA$^2$E: Addressing Partial Observability in Multi-Agent Reinforcement Learning with Masked Auto-Encoder

RMB: Comprehensively benchmarking reward models in LLM alignment

Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model

Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives

Finally Rank-Breaking Conquers MNL Bandits: Optimal and Efficient Algorithms for MNL Assortment

Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model

InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly

Collapsed Language Models Promote Fairness

Semi-Parametric Retrieval via Binary Bag-of-Tokens Index

Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Controlling Language and Diffusion Models by Transporting Activations

HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging

ADIFF: Explaining audio difference using natural language

CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

Gaussian Mixture Counterfactual Generator

BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments

Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?

RFMamba: Frequency-Aware State Space Model for RF-Based Human-Centric Perception

Catastrophic Failure of LLM Unlearning via Quantization

Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data

Agent S: An Open Agentic Framework that Uses Computers Like a Human

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

Efficient Top-m Data Values Identification for Data Selection

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step

HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts

TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval

Enhancing Robust Fairness via Confusional Spectral Regularization

Do as I do (Safely): Mitigating Task-Specific Fine-tuning Risks in Large Language Models

Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods

Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning

ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length

Hidden in the Noise: Two-Stage Robust Watermarking for Images

Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning

gRNAde: Geometric Deep Learning for 3D RNA inverse design

Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions

CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View

AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption

Is Your Video Language Model a Reliable Judge?

Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model

MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations

Coreset Selection via Reducible Loss in Continual Learning

Diffusion Policy Policy Optimization

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models

Last Iterate Convergence of Incremental Methods as a Model of Forgetting

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

E(3)-equivariant models cannot learn chirality: Field-based molecular generation

Adaptive Length Image Tokenization via Recurrent Allocation

From Tokens to Lattices: Emergent Lattice Structures in Language Models

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Subtask-Aware Visual Reward Learning from Segmented Demonstrations

Towards Federated RLHF with Aggregated Client Preference for LLMs

From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics

Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment

Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity

Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint

MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation

ThinK: Thinner Key Cache by Query-Driven Pruning

Generalization Bounds for Canonicalization: A Comparative Study with Group Averaging

Transformers Provably Solve Parity Efficiently with Chain of Thought

A Tight Convergence Analysis of Inexact Stochastic Proximal Point Algorithm for Stochastic Composite Optimization Problems

Linear combinations of latents in generative models: subspaces and beyond

Towards Semantic Equivalence of Tokenization in Multimodal LLM

cryoSPHERE: Single-Particle HEterogeneous REconstruction from cryo EM

OS-ATLAS: Foundation Action Model for Generalist GUI Agents

Balanced Neural ODEs: nonlinear model order reduction and Koopman operator approximations

Distilling Dataset into Neural Field

HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics

Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus

Scale-Free Graph-Language Models

Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

CAMEx: Curvature-aware Merging of Experts

Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective

InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization

Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy

Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding

ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration

The Unreasonable Ineffectiveness of the Deeper Layers

KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics

Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning

Neural Causal Graph for Interpretable and Intervenable Classification

MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning

Chunk-Distilled Language Modeling

DynFrs: An Efficient Framework for Machine Unlearning in Random Forest

Artificial Kuramoto Oscillatory Neurons

An Effective Manifold-based Optimization Method for Distributionally Robust Classification

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

CAX: Cellular Automata Accelerated in JAX

ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding

ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish

Generative Classifiers Avoid Shortcut Solutions

Physics of Language Models: Part 3.2, Knowledge Manipulation

TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights

Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation

FaceShot: Bring Any Character into Life

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation

Track-On: Transformer-based Online Point Tracking with Memory

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Learning Graph Quantized Tokenizers

Do Mice Grok? Glimpses of Hidden Progress in Sensory Cortex

Support is All You Need for Certified VAE Training

Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

On the Byzantine-Resilience of Distillation-Based Federated Learning

Large (Vision) Language Models are Unsupervised In-Context Learners

Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization

Stable Segment Anything Model

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning

Learning Causal Alignment for Reliable Disease Diagnosis

Solving New Tasks by Adapting Internet Video Knowledge

Stealthy Shield Defense: A Conditional Mutual Information-Based Approach against Black-Box Model Inversion Attacks

Exploring Local Memorization in Diffusion Models via Bright Ending Attention

A Generalist Hanabi Agent

Distribution-Free Data Uncertainty for Neural Network Regression

Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine

No Need to Talk: Asynchronous Mixture of Language Models

Data Scaling Laws in Imitation Learning for Robotic Manipulation

CONTRA: Conformal Prediction Region via Normalizing Flow Transformation

Control-oriented Clustering of Visual Latent Representation

Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier

Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD

TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction

Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models

Streamlining Prediction in Bayesian Deep Learning

ICLR: In-Context Learning of Representations

CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control

SOREL: A Stochastic Algorithm for Spectral Risks Minimization

Adaptive Shrinkage Estimation for Personalized Deep Kernel Regression in Modeling Brain Trajectories

Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling

Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference

UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping

On the Optimization Landscape of Low Rank Adaptation Methods for Large Language Models

Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing

Rethinking Invariance in In-context Learning

CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark

Forgetting Transformer: Softmax Attention with a Forget Gate

Retri3D: 3D Neural Graphics Representation Retrieval

$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision

Generalization Bounds and Model Complexity for Kolmogorov–Arnold Networks

Homomorphism Counts as Structural Encodings for Graph Learning

Multi-Perspective Data Augmentation for Few-shot Object Detection

Do LLMs "know" internally when they follow instructions?

Generalization, Expressivity, and Universality of Graph Neural Networks on Attributed Graphs

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Decoupled Finetuning for Domain Generalizable Semantic Segmentation

Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling

How Much is a Noisy Image Worth? Data Scaling Laws for Ambient Diffusion

Generalized Video Moment Retrieval

Causal Effect Estimation with Mixed Latent Confounders and Post-treatment Variables

Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data

PIORF: Physics-Informed Ollivier-Ricci Flow for Long-Range Interactions in Mesh Graph Neural Networks

Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL

Prioritized Generative Replay

ILLUSION: Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset

Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization

Locality Alignment Improves Vision-Language Models

Demystifying the Token Dynamics of Deep Selective State Space Models

DEEM: Diffusion models serve as the eyes of large language models for image perception

Balancing Bias in Two-sided Markets for Fair Stable Matchings

Spiking Vision Transformer with Saccadic Attention

Breaking the Reclustering Barrier in Centroid-based Deep Clustering

Balancing Act: Diversity and Consistency in Large Language Model Ensembles

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Neuralized Markov Random Field for Interaction-Aware Stochastic Human Trajectory Prediction

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study

TULIP: Token-length Upgraded CLIP

Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving

Exposure Bracketing Is All You Need For A High-Quality Image

TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting

PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Metric-Driven Attributions for Vision Transformers

SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation

Systematic Outliers in Large Language Models

N-ForGOT: Towards Not-forgetting and Generalization of Open Temporal Graph Learning

Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

Biologically Plausible Brain Graph Transformer

Reasoning of Large Language Models over Knowledge Graphs with Super-Relations

PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration

PaLD: Detection of Text Partially Written by Large Language Models

MoLEx: Mixture of Layer Experts for Fine-tuning with Sparse Upcycling

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

Homomorphism Expressivity of Spectral Invariant Graph Neural Networks

High-quality Text-to-3D Character Generation with SparseCubes and Sparse Transformers

DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

Optimal Protocols for Continual Learning via Statistical Physics and Control Theory

SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

UIFace: Unleashing Inherent Model Capabilities to Enhance Intra-Class Diversity in Synthetic Face Recognition

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models

$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization

Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models

Online-to-Offline RL for Agent Alignment

Robust Transfer of Safety-Constrained Reinforcement Learning Agents

Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation

Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints

PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

Minimal Impact ControlNet: Advancing Multi-ControlNet Integration

Dynamics of Concept Learning and Compositional Generalization

Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Robust LLM safeguarding via refusal feature adversarial training

DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints

Adaptive Energy Alignment for Accelerating Test-Time Adaptation

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

Your Weak LLM is Secretly a Strong Teacher for Alignment

Understanding Optimization in Deep Learning with Central Flows

Energy-Based Diffusion Language Models for Text Generation

Group Downsampling with Equivariant Anti-aliasing

MAST: model-agnostic sparsified training

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs

Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning

Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models

Graph Neural Networks Can (Often) Count Substructures

Offline Model-Based Optimization by Learning to Rank

C-CLIP: Multimodal Continual Learning for Vision-Language Model

Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent

Bundle Neural Network for message diffusion on graphs

Collaborative Discrete-Continuous Black-Box Prompt Learning for Language Models

PiCO: Peer Review in LLMs based on Consistency Optimization

FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields

Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective

Semantix: An Energy-guided Sampler for Semantic Style Transfer

BTBS-LNS: Binarized-Tightening, Branch and Search on Learning LNS Policies for MIP

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Nesterov acceleration in benignly non-convex landscapes

Neural Spacetimes for DAG Representation Learning

MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences

Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization

Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

Adding Conditional Control to Diffusion Models with Reinforcement Learning

TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation

SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix

Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series

Effective post-training embedding compression via temperature control in contrastive training

Enhance Multi-View Classification Through Multi-Scale Alignment and Expanded Boundary

Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

Sparse Learning for State Space Models on Mobile

Uncovering Overfitting in Large Language Model Editing

Automated Design of Agentic Systems

Adversarial Generative Flow Network for Solving Vehicle Routing Problems

Quantum (Inspired) $D^2$-sampling with Applications

Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning

ElasticTok: Adaptive Tokenization for Image and Video

Holographic Node Representations: Pre-training Task-Agnostic Node Embeddings

Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors

GLOMA: Global Video Text Spotting with Morphological Association

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models

Precedence-Constrained Winter Value for Effective Graph Data Valuation

COFlowNet: Conservative Constraints on Flows Enable High-Quality Candidate Generation

Dynamic Diffusion Transformer

A Theory for Token-Level Harmonization in Retrieval-Augmented Generation

Scaling and evaluating sparse autoencoders

Scaling Long Context Training Data by Long-Distance Referrals

Knowledge Localization: Mission Not Accomplished? Enter Query Localization!

Learning the Complexity of Weakly Noisy Quantum States

DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Heavy-Tailed Diffusion Models

ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs

Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology

Cross-Embodiment Dexterous Grasping with Reinforcement Learning

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

PEARL: Towards Permutation-Resilient LLMs

Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport

Jailbreaking as a Reward Misspecification Problem

Compositional simulation-based inference for time series

Dimension Agnostic Neural Processes

Capturing the Temporal Dependence of Training Data Influence

SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins

UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

Score-based Self-supervised MRI Denoising

CryoGEN: Generative Energy-based Models for Cryogenic Electron Tomography Reconstruction

Learning Splitting Heuristics in Divide-and-Conquer SAT Solvers with Reinforcement Learning

LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time

Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Data Distillation for extrapolative protein design through exact preference optimization

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

Discriminating image representations with principal distortions

Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks

Can Watermarked LLMs be Identified by Users via Crafted Prompts?

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Weak-to-Strong Generalization Through the Data-Centric Lens

The impact of allocation strategies in subset learning on the expressive power of neural networks

EG4D: Explicit Generation of 4D Object without Score Distillation

How Much is Unseen Depends Chiefly on Information About the Seen

Deep Kernel Posterior Learning under Infinite Variance Prior Weights

TODO: Enhancing LLM Alignment with Ternary Preferences

Towards Certification of Uncertainty Calibration under Adversarial Attacks

Counterfactual Realizability

SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints

Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting

Unbounded: A Generative Infinite Game of Character Life Simulation

Reconciling Model Multiplicity for Downstream Decision Making

VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking

Selective Attention Improves Transformer

Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure

Decoupled Subgraph Federated Learning

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Does SGD really happen in tiny subspaces?

Multimodal Quantitative Language for Generative Recommendation

UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery

PINP: Physics-Informed Neural Predictor with latent estimation of fluid flows

OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework

Logic-Logit: A Logic-Based Approach to Choice Modeling

Memory Efficient Transformer Adapter for Dense Predictions

Learning View-invariant World Models for Visual Robotic Manipulation

Long-tailed Adversarial Training with Self-Distillation

Scaling Laws for Downstream Task Performance in Machine Translation

ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation

Decomposition Polyhedra of Piecewise Linear Functions

RecFlow: An Industrial Full Flow Recommendation Dataset

The Pitfalls of Memorization: When Memorization Hurts Generalization

Differentiable Optimization of Similarity Scores Between Models and Brains

VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?

Simplifying, Stabilizing and Scaling Continuous-time Consistency Models

Neural Approximate Mirror Maps for Constrained Diffusion Models

MindSimulator: Exploring Brain Concept Localization via Synthetic fMRI

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation

Scalable Extraction of Training Data from Aligned, Production Language Models

Linear Multistep Solver Distillation for Fast Sampling of Diffusion Models

Text4Seg: Reimagining Image Segmentation as Text Generation

IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning

Revolutionizing EMCCD Denoising through a Novel Physics-Based Learning Framework for Noise Modeling

Training-free LLM-generated Text Detection by Mining Token Probability Sequences

Param$\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost

Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models

DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks

Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control

Growth Inhibitors for Suppressing Inappropriate Image Concepts in Diffusion Models

Field-DiT: Diffusion Transformer on Unified Video, 3D, and Game Field Generation

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

In-Context Editing: Learning Knowledge from Self-Induced Distributions

PIED: Physics-Informed Experimental Design for Inverse Problems

Counterfactual Concept Bottleneck Models

Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training

Modeling dynamic social vision highlights gaps between deep learning and humans

Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Effective Interplay between Sparsity and Quantization: From Theory to Practice

LancBiO: Dynamic Lanczos-aided Bilevel Optimization via Krylov Subspace

LLM-based Typed Hyperresolution for Commonsense Reasoning with Knowledge Bases

Process Reward Model with Q-value Rankings

Towards Effective Evaluations and Comparisons for LLM Unlearning Methods

FlashMask: Efficient and Rich Mask Extension of FlashAttention

When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach

Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct

GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision

Learning Successor Features with Distributed Hebbian Temporal Memory

ST-GCond: Self-supervised and Transferable Graph Dataset Condensation

A Solvable Attention for Neural Scaling Laws

Making Text Embedders Few-Shot Learners

Scaling Laws for Precision

Provably Accurate Shapley Value Estimation via Leverage Score Sampling

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

Learning to Communicate Through Implicit Communication Channels

CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series

PFDiff: Training-Free Acceleration of Diffusion Models Combining Past and Future Scores

Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation

Scaling Autonomous Agents via Automatic Reward Modeling And Planning

Improving Instruction-Following in Language Models through Activation Steering

GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians

Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse

Learning Evolving Tools for Large Language Models

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement

ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning

Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires

Active Learning for Neural PDE Solvers

Beware of Calibration Data for Pruning Large Language Models

When Selection Meets Intervention: Additional Complexities in Causal Discovery

Learning to Discretize Denoising Diffusion ODEs

Better autoregressive regression with LLMs via regression-aware fine-tuning

(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning

MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra

Enhancing Uncertainty Estimation and Interpretability with Bayesian Non-negative Decision Layer

Language Models Learn to Mislead Humans via RLHF

Discrete Distribution Networks

Learning General-purpose Biomedical Volume Representations using Randomized Synthesis

Endless Jailbreaks with Bijection Learning

UniRestore3D: A Scalable Framework For General Shape Restoration

Optimal Transport for Time Series Imputation

Reflective Gaussian Splatting

Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control

Context Steering: Controllable Personalization at Inference Time

Diffusion Models are Evolutionary Algorithms

OLMoE: Open Mixture-of-Experts Language Models

Deep Signature: Characterization of Large-Scale Molecular Dynamics

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

Vector-ICL: In-context Learning with Continuous Vector Representations

MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks

Feature-Based Online Bilateral Trade

SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models

Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models

InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Boosting Latent Diffusion with Perceptual Objectives

GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation

Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos

Language Agents Meet Causality – Bridging LLMs and Causal World Models

Quality over Quantity in Attention Layers: When Adding More Heads Hurts

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

GPromptShield: Elevating Resilience in Graph Prompt Tuning Against Adversarial Attacks

MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation

Image and Video Tokenization with Binary Spherical Quantization

A Unified Framework for Forward and Inverse Problems in Subsurface Imaging using Latent Space Translations

MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models

Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study

Pacmann: Efficient Private Approximate Nearest Neighbor Search

Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron

Causally Motivated Sycophancy Mitigation for Large Language Models

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models

Language-Image Models with 3D Understanding

ThermalGaussian: Thermal 3D Gaussian Splatting

AIMS.au: A Dataset for the Analysis of Modern Slavery Countermeasures in Corporate Statements

Federated Class-Incremental Learning: A Hybrid Approach Using Latent Exemplars and Data-Free Techniques to Address Local and Global Forgetting

Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

Effective and Efficient Time-Varying Counterfactual Prediction with State-Space Models

Forewarned is Forearmed: Harnessing LLMs for Data Synthesis via Failure-induced Exploration

ELFS: Label-Free Coreset Selection with Proxy Training Dynamics

DenoiseVAE: Learning Molecule-Adaptive Noise Distributions for Denoising-based 3D Molecular Pre-training

SiReRAG: Indexing Similar and Related Information for Multihop Reasoning

LICO: Large Language Models for In-Context Molecular Optimization

Integral Performance Approximation for Continuous-Time Reinforcement Learning Control

GOAL: A Generalist Combinatorial Optimization Agent Learner

AFlow: Automating Agentic Workflow Generation

Random Is All You Need: Random Noise Injection on Feature Statistics for Generalizable Deep Image Denoising

Deep Kernel Relative Test for Machine-generated Text Detection

Joint Graph Rewiring and Feature Denoising via Spectral Resonance

SSOLE: Rethinking Orthogonal Low-rank Embedding for Self-Supervised Learning

Group Ligands Docking to Protein Pockets

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs

Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles

A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery

EvA: Erasing Spurious Correlations with Activations

Unified Parameter-Efficient Unlearning for LLMs

Enhancing Clustered Federated Learning: Integration of Strategies and Improved Methodologies

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks

To Code or Not To Code? Exploring Impact of Code in Pre-training

Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning

Synthesizing Realistic fMRI: A Physiological Dynamics-Driven Hierarchical Diffusion Model for Efficient fMRI Acquisition

h4rm3l: A Language for Composable Jailbreak Attack Synthesis

Dataset Ownership Verification in Contrastive Pre-trained Models

Lines of Thought in Large Language Models

IgGM: A Generative Model for Functional Antibody and Nanobody Design

Understanding the Stability-based Generalization of Personalized Federated Learning

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow

Mask in the Mirror: Implicit Sparsification

STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes

On Stochastic Contextual Bandits with Knapsacks in Small Budget Regime

PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches

PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms

Gradient descent with generalized Newton’s method

Task Descriptors Help Transformers Learn Linear Models In-Context

Generative Monoculture in Large Language Models

Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

SpinQuant: LLM Quantization with Learned Rotations

U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models

Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design

Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

Local Patterns Generalize Better for Novel Anomalies

SFESS: Score Function Estimators for $k$-Subset Sampling

Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics

Learn hybrid prototypes for multivariate time series anomaly detection

SGD with memory: fundamental properties and stochastic acceleration

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Model Risk-sensitive Offline Reinforcement Learning

Multi-Accurate CATE is Robust to Unknown Covariate Shifts

Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

DLEFT-MKC: Dynamic Late Fusion Multiple Kernel Clustering with Robust Tensor Learning via Min-Max Optimization

Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming

Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models

Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection

LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning

A Second-Order Perspective on Model Compositionality and Incremental Learning

Structuring Benchmark into Knowledge Graphs to Assist Large Language Models in Retrieving and Designing Models

Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation

On the Computation of the Fisher Information in Continual Learning

Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference

Robust Function-Calling for On-Device Language Model via Function Masking

More Experts Than Galaxies: Conditionally-Overlapping Experts with Biologically-Inspired Fixed Routing

Energy-Weighted Flow Matching for Offline Reinforcement Learning

Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability

Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents

Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning

Utility-Directed Conformal Prediction: A Decision-Aware Framework for Actionable Uncertainty Quantification

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs

Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning

Inference Scaling for Long-Context Retrieval Augmented Generation

A3D: Does Diffusion Dream about 3D Alignment?

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair

The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs

MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization

MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

From Commands to Prompts: LLM-based Semantic File System for AIOS

Improving Data Efficiency via Curating LLM-Driven Rating Systems

Shh, don't say that! Domain Certification in LLMs

Memory Mosaics

Denoising Levy Probabilistic Models

SFS: Smarter Code Space Search improves LLM Inference Scaling

Progressive Mixed-Precision Decoding for Efficient LLM Inference

Preserving Diversity in Supervised Fine-Tuning of Large Language Models

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

T2V-Turbo-v2: Enhancing Video Model Post-Training through Data, Reward, and Conditional Guidance Design

CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning

Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

u-$\mu$P: The Unit-Scaled Maximal Update Parametrization

Does Editing Provide Evidence for Localization?

NRGBoost: Energy-Based Generative Boosted Trees

MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search

A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning

Should VLMs be Pre-trained with Image Data?

Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers

Generalizing Weisfeiler-Lehman Kernels to Subgraphs

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

On the Hölder Stability of Multiset and Graph Neural Networks

EIA: ENVIRONMENTAL INJECTION ATTACK ON GENERALIST WEB AGENTS FOR PRIVACY LEAKAGE

Free Hunch: Denoiser Covariance Estimation for Diffusion Models Without Extra Costs

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Palu: KV-Cache Compression with Low-Rank Projection

Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning

Higher-Order Graphon Neural Networks: Approximation and Cut Distance

TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning

MixEval-X: Any-to-any Evaluations from Real-world Data Mixture

Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features

Normed Spaces for Graph Embedding

GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks

An Undetectable Watermark for Generative Image Models

PT-T2I/V: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image/Video-Task

Counterfactual Generative Modeling with Variational Causal Inference

Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems

JetFormer: An autoregressive generative model of raw images and text

One Step Diffusion via Shortcut Models

Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective

Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking

Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning

Towards Explaining the Power of Constant-depth Graph Neural Networks for Structured Linear Programming

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders

MANTRA: The Manifold Triangulations Assemblage

In vivo cell-type and brain region classification via multimodal contrastive learning

MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA

Re-Imagining Multimodal Instruction Tuning: A Representation View

MuseGNN: Forming Scalable, Convergent GNN Layers that Minimize a Sampling-Based Energy

SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

Adversarially Robust Anomaly Detection through Spurious Negative Pair Mitigation

Improving Neural Optimal Transport via Displacement Interpolation

Restructuring Vector Quantization with the Rotation Trick

TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis

When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings

As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss

SCBench: A KV Cache-Centric Analysis of Long-Context Methods

Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise

Edge Prompt Tuning for Graph Neural Networks

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Episodic Novelty Through Temporal Distance

Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents

Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models

Test-time Alignment of Diffusion Models without Reward Over-optimization

Gated Delta Networks: Improving Mamba2 with Delta Rule

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Graph Transformers Dream of Electric Flow

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning

Linear Transformer Topological Masking with Graph Random Features

Fugatto 1: Foundational Generative Audio Transformer Opus 1

Reconstruction-Guided Policy: Enhancing Decision-Making through Agent-Wise State Consistency

ControlAR: Controllable Image Generation with Autoregressive Models

DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning

Graph Neural Networks Gone Hogwild

BrainOOD: Out-of-distribution Generalizable Brain Network Analysis

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality

Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks

Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning

MIND over Body: Adaptive Thinking using Dynamic Computation

The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model

MagicPIG: LSH Sampling for Efficient LLM Generation

On the Expressive Power of Sparse Geometric MPNNs

On LLM Knowledge Distillation - A Comparison between Forward KL and Reverse KL

Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Presto! Distilling Steps and Layers for Accelerating Music Generation

Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models

Human Simulacra: Benchmarking the Personification of Large Language Models

Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs

Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset

Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

Towards Hierarchical Rectified Flow

Fundamental Limitations on Subquadratic Alternatives to Transformers

Why RoPE Struggles to Maintain Long-Term Decay in Long Sequences?

Efficient Evolutionary Search Over Chemical Space with Large Language Models

ReNovo: Retrieval-Based De Novo Mass Spectrometry Peptide Sequencing

NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation

Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems

MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

Learning Equivariant Non-Local Electron Density Functionals

ProtPainter: Draw or Drag Protein via Topology-guided Diffusion

Contextualizing biological perturbation experiments through language

BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

Hyperbolic Genome Embeddings

Estimation of single-cell and tissue perturbation effect in spatial transcriptomics via Spatial Causal Disentanglement

Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery

BoneMet: An Open Large-Scale Multi-Modal Murine Dataset for Breast Cancer Bone Metastasis Diagnosis and Prognosis

CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning

Time-to-Event Pretraining for 3D Medical Imaging

Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation

Continuous Ensemble Weather Forecasting with Diffusion models

SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark

Can Transformers Do Enumerative Geometry?

Solving Differential Equations with Constrained Learning

Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models

No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs

Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning

Generating Freeform Endoskeletal Robots

RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation

Physiome-ODE: A Benchmark for Irregularly Sampled Multivariate Time-Series Forecasting Based on Biological ODEs

Diffusion-based Decoupled Deterministic and Uncertain Framework for Probabilistic Multivariate Time Series Forecasting

Zero-shot forecasting of chaotic systems

SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting

TimeInf: Time Series Data Contribution via Influence Functions

Infinite-Resolution Integral Noise Warping for Diffusion Models

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Automated Proof Generation for Rust Code via Self-Evolution

Improving Language Model Distillation through Hidden State Matching

Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement

Agents' Room: Narrative Generation through Multi-step Collaboration

Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis

OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models

PersonalLLM: Tailoring LLMs to Individual Preferences

DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory

Competing Large Language Models in Multi-Agent Gaming Environments

SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents

Multi-modal brain encoding models for multi-modal stimuli

Animate Your Thoughts: Reconstruction of Dynamic Natural Vision from Human Brain Activity

Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers

Quantized Spike-driven Transformer

Range, not Independence, Drives Modularity in Biologically Inspired Representations

BrainACTIV: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation

Associative memory and dead neurons

SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments

As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters

MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Masked Image Modeling Representations

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs

Segment Any 3D Object with Language

Bridging Compressed Image Latents and Multimodal Large Language Models

Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation

Order-aware Interactive Segmentation

Learning Spatial-Semantic Features for Robust Video Object Segmentation

Adaptive Camera Sensor for Vision Models

Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images

Rethinking Classifier Re-Training in Long-Tailed Recognition: Label Over-Smooth Can Balance

Pedestrian Motion Reconstruction: A Large-scale Benchmark via Mixed Reality Rendering with Multiple Perspectives and Modalities

Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models

Robust-PIFu: Robust Pixel-aligned Implicit Function for 3D Human Digitalization from a Single Image

Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment

Generation and Comprehension Hand-in-Hand: Vision-guided Expression Diffusion for Boosting Referring Expression Generation and Comprehension

3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing

AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution

Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Streaming Video Question-Answering with In-context Video KV-Cache Retrieval

X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios

Sketch2Diagram: Generating Vector Diagrams from Hand-Drawn Sketches

Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder

Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control

Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding

Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Latent Radiance Fields with 3D-aware 2D Representations

Which Tasks Should Be Compressed Together? A Causal Discovery Approach for Efficient Multi-Task Representation Compression

Measuring And Improving Engagement of Text-to-Image Generation Models

SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars

SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training

Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes

Learning Color Equivariant Representations

Towards Realistic Data Generation for Real-World Super-Resolution

Re-Aligning Language to Visual Objects with an Agentic Workflow

EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition

ProtoSnap: Prototype Alignment For Cuneiform Signs

On the Transfer of Object-Centric Representation Learning

CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data

CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators

RESfM: Robust Deep Equivariant Structure from Motion

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension

OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation

OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models

Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents

BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises

BP-Modified Local Loss for Efficient Training of Deep Neural Networks

Regularizing Energy among Training Samples for Out-of-Distribution Generalization

Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions

TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking

Progress or Regress? Self-Improvement Reversal in Post-training

Are Large Vision Language Models Good Game Players?

Neural Phylogeny: Fine-Tuning Relationship Detection among Neural Networks

Ensembles of Low-Rank Expert Adapters

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

Uncovering Latent Memories in Large Language Models

Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free

Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers

Differential learning kinetics govern the transition from memorization to generalization during in-context learning

DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models

Pre-training of Foundation Adapters for LLM Fine-tuning

What's New in My Data? Novelty Exploration via Contrastive Generation

Wavelet-based Positional Representation for Long Context

Self-Improvement in Language Models: The Sharpening Mechanism

Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling

A Statistical Framework for Ranking LLM-based Chatbots

Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking

HaDeMiF: Hallucination Detection and Mitigation in Large Language Models

BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge

Improving Pretraining Data Using Perplexity Correlations

Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information

Gramian Multimodal Representation Learning and Alignment

SelKD: Selective Knowledge Distillation via Optimal Transport Perspective

Improving Neural Network Accuracy by Concurrently Training with a Twin Network

Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge

Robust Representation Consistency Model via Contrastive Denoising

SEBRA: Debiasing through Self-Guided Bias Ranking

DRoP: Distributionally Robust Data Pruning

Democratic Training Against Universal Adversarial Perturbations

Severing Spurious Correlations with Data Pruning

Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness

Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

Morphing Tokens Draw Strong Masked Image Models

The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities

DebGCD: Debiased Learning with Distribution Guidance for Generalized Category Discovery

Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks

A Rainbow in Deep Network Black Boxes

Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors

Oscillatory State-Space Models

Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization

Designing Concise ConvNets with Columnar Stages

MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines

Rethinking Light Decoder-based Solvers for Vehicle Routing Problems

Solving hidden monotone variational inequalities with surrogate losses

Sharpness-Aware Minimization: General Analysis and Improved Rates

Local convergence of simultaneous min-max algorithms to differential equilibrium on Riemannian manifold

OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference

Utilitarian Algorithm Configuration for Infinite Parameter Spaces

Optimizing Posterior Samples for Bayesian Optimization via Rootfinding

On the Almost Sure Convergence of the Stochastic Three Points Algorithm

Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression

Joint Gradient Balancing for Data Ordering in Finite-Sum Multi-Objective Optimization

Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning

Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint

Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies

Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning

Policy Optimization under Imperfect Human Interactions with Agent-Gated Shared Autonomy

Learning to Search from Demonstration Sequences

Policy Gradient with Kernel Quadrature

On Generalization Across Environments In Multi-Objective Reinforcement Learning

POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding

Expected Return Symmetries

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning

Learning mirror maps in policy mirror descent

What Makes a Good Diffusion Planner for Decision Making?

Simple, Good, Fast: Self-Supervised World Models Free of Baggage

How to Find the Exact Pareto Front for Multi-Objective MDPs?

Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment

$q$-exponential family for policy optimization

CBMA: Improving Conformal Prediction through Bayesian Model Averaging

Robust Simulation-Based Inference under Missing Data via Neural Processes

InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences

Residual Deep Gaussian Processes on Manifolds

Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

End-to-end Learning of Gaussian Mixture Priors for Diffusion Sampler

Diffusion On Syntax Trees For Program Synthesis

Flow-based Variational Mutual Information: Fast and Flexible Approximations

Benchmarking Predictive Coding Networks -- Made Simple

Kernel-based Optimally Weighted Conformal Time-Series Prediction

Connecting Federated ADMM to Bayes

Training One-Dimensional Graph Neural Networks is NP-Hard

On the Optimal Memorization Capacity of Transformers

State Space Models are Provably Comparable to Transformers in Dynamic Token Selection

Single-agent Poisoning Attacks Suffice to Ruin Multi-Agent Learning

Efficient Online Pruning and Abstraction for Imperfect Information Extensive-Form Games

Strategic Classification With Externalities

Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees

ONLINE EPSILON NET & PIERCING SET FOR GEOMETRIC CONCEPTS

Bounds on $L_p$ Errors in Density Ratio Estimation via $f$-Divergence Loss Functions

Conservative Contextual Bandits: Beyond Linear Representations

Learning from Imperfect Human Feedback: A Tale from Corruption-Robust Dueling

Satisficing Regret Minimization in Bandits

Linear Bandits with Memory

ADAM Optimization with Adaptive Batch Selection

Do Stochastic, Feel Noiseless: Stable Stochastic Optimization via a Double Momentum Mechanism

Accelerated Over-Relaxation Heavy-Ball Method: Achieving Global Accelerated Convergence with Broad Generalization

Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy

Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis

Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory

Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs

Revisiting a Design Choice in Gradient Temporal Difference Learning

Generalizable Motion Planning via Operator Learning

On Minimizing Adversarial Counterfactual Error in Adversarial Reinforcement Learning

Topological Zigzag Spaghetti for Diffusion-based Generation and Prediction on Graphs

ADAM: An Embodied Causal Agent in Open-World Environments

Causal Discovery via Bayesian Optimization

Euler Characteristic Tools for Topological Data Analysis

KAN: Kolmogorov–Arnold Networks

Advancing Out-of-Distribution Detection via Local Neuroplasticity

VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning

SyllableLM: Learning Coarse Semantic Units for Speech Language Models

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

An Information Criterion for Controlled Disentanglement of Multimodal Data

Test-time Adaptation for Cross-modal Retrieval with Query Shift

Neural networks on Symmetric Spaces of Noncompact Type

ColPali: Efficient Document Retrieval with Vision Language Models

Boosting Methods for Interval-censored Data with Regression and Classification

Simple yet Effective Incomplete Multi-view Clustering: Similarity-level Imputation and Intra-view Hybrid-group Prototype Construction

Exact Community Recovery under Side Information: Optimality of Spectral Algorithms

Content-Style Learning from Unaligned Domains: Identifiability under Unknown Latent Dimensions

Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection

Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions

Single Teacher, Multiple Perspectives: Teacher Knowledge Augmentation for Enhanced Knowledge Distillation

Fine-tuning can cripple your foundation model; preserving features may be the solution

Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources

Do not write that jailbreak paper

Encryption-Friendly LLM Architecture

Image-level Memorization Detection via Inversion-based Inference Perturbation

Towards hyperparameter-free optimization with differential privacy

The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD

Learning from End User Data with Shuffled Differential Privacy over Kernel Densities

How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions

Adversarial Machine Unlearning

More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models

When Prompt Engineering Meets Software Engineering: CNL-P as Natural and Robust "APIs" for Human-AI Interaction

Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks

Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition

DICE: Data Influence Cascade in Decentralized Learning

Salvage: Shapley-distribution Approximation Learning Via Attribution Guided Exploration for Explainable Image Classification

Mechanism and emergence of stacked attention heads in multi-layer transformers

Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images

Linear Representations of Political Perspective Emerge in Large Language Models

Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

How Far Are We from True Unlearnability?

BBCaL: Black-box Backdoor Detection under the Causality Lens

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness

An Effective Theory of Bias Amplification

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Language Models are Advanced Anonymizers

Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces

Towards Domain Adaptive Neural Contextual Bandits

Lean-STaR: Learning to Interleave Thinking and Proving

ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning

Protecting against simultaneous data poisoning attacks

Robustness Inspired Graph Backdoor Defense

Towards Understanding the Universality of Transformers for Next-Token Prediction

SAVA: Scalable Learning-Agnostic Data Valuation

Global Convergence in Neural ODEs: Impact of Activation Functions

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

Deep MMD Gradient Flow without adversarial training

Gyrogroup Batch Normalization

How Feature Learning Can Improve Neural Scaling Laws

Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Understanding Factual Recall in Transformers via Associative Memories

WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning

On the Benefits of Memory for Modeling Time-Dependent PDEs

Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception

Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity

Consistency Models Made Easy

APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding

Let the Code LLM Edit Itself When You Edit the Code

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

Diffusion Models and Gaussian Flow Matching: Two Sides of the Same Coin

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Elucidating the Preconditioning in Consistency Distillation

MatExpert: Decomposing Materials Discovery By Mimicking Human Experts

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models

Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

How many samples are needed to train a deep neural network?

Rationalizing and Augmenting Dynamic Graph Neural Networks

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Multi-LLM-Agents Debate - Performance, Efficiency, and Scaling Challenges

AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models

RTDiff: Reverse Trajectory Synthesis via Diffusion for Offline Reinforcement Learning

Multi-Field Adaptive Retrieval

Wayward Concepts In Large Multimodal Models

Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale

Hymba: A Hybrid-head Architecture for Small Language Models

Anti-Exposure Bias in Diffusion Models

Decision Tree Induction Through LLMs via Semantically-Aware Evolution

Decoupling Angles and Strength in Low-rank Adaptation

Exploring The Forgetting in Adversarial Training: A Novel Method for Enhancing Robustness

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

Real-Time Video Generation with Pyramid Attention Broadcast

BenTo: Benchmark Reduction with In-Context Transferability

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

4K4DGen: Panoramic 4D Generation at 4K Resolution

Monte Carlo Planning with Large Language Model for Text-Based Game Agents

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

Can LLM Simulations Truly Reflect Humanity? A Deep Dive

Compute-Constrained Data Selection

Multi-objective antibody design with constrained preference optimization

Generative World Explorer

Learning Randomized Algorithms with Transformers

KBLaM: Knowledge Base augmented Language Model

Linear SCM Identification in the Presence of Confounders and Gaussian Noise

Transformer-Squared: Self-adaptive LLMs

No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

StringLLM: Understanding the String Processing Capability of Large Language Models

Progressive distillation induces an implicit curriculum

AutoBencher: Towards Declarative Benchmark Construction

LLMs' Potential Influences on Our Democracy: Challenges and Opportunities

Reassessing EMNLP 2024’s Best Paper: Does Divergence-Based Calibration for MIAs Hold Up?

Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval

Large Scale Knowledge Washing

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment

MGDA Converges under Generalized Smoothness, Provably