ICLR 2025
Universal Image Restoration Pre-training via Degradation Classification
On the Benefits of Attribute-Driven Graph Domain Adaptation
Port-Hamiltonian Architectural Bias for Long-Range Propagation in Deep Graph Networks
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Framer: Interactive Frame Interpolation
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
CTSyn: A Foundation Model for Cross Tabular Data Generation
Aligned Better, Listen Better for Audio-Visual Large Language Models
MiniPLM: Knowledge Distillation for Pre-training Language Models
From an LLM Swarm to a PDDL-empowered Hive: Planning Self-executed Instructions in a Multi-modal Jungle
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
Discrete Codebook World Models for Continuous Control
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
Faster Cascades via Speculative Decoding
miniCTX: Neural Theorem Proving with (Long-)Contexts
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
Towards a Unified and Verified Understanding of Group-Operation Networks
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Multi-domain Distribution Learning for De Novo Drug Design
NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks in Open Domains
Toward Exploratory Inverse Constraint Inference with Generative Diffusion Verifiers
Federated Domain Generalization with Data-free On-server Matching Gradient
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance
Understanding Optimization in Deep Learning with Central Flows
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces
Data Unlearning in Diffusion Models
Repurposing in AI: A Distinct Approach or an Extension of Creative Problem Solving?
Consistency Models Made Easy
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Fast Uncovering of Protein Sequence Diversity from Structure
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Compute-Optimal LLMs Provably Generalize Better with Scale
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
Learning mirror maps in policy mirror descent
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs
Bridging the Data Provenance Gap Across Text, Speech, and Video
Decoupled Subgraph Federated Learning
Improved Diffusion-based Generative Model with Better Adversarial Robustness
Manifold Learning by Mixture Models of VAEs for Inverse Problems
Generative Classifiers Avoid Shortcut Solutions
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling
Diffusion Bridge Implicit Models
LiFT: Learning to Fine-Tune via Bayesian Parameter Efficient Meta Fine-Tuning
Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
Elucidating the Preconditioning in Consistency Distillation
Improving Data Efficiency via Curating LLM-Driven Rating Systems
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
Retrieval Head Mechanistically Explains Long-Context Factuality
Do LLMs estimate uncertainty well in instruction-following?
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Large (Vision) Language Models are Unsupervised In-Context Learners
ThunderKittens: Simple, Fast, and $\textit{Adorable}$ Kernels
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Chain-of-Thought Provably Enables Learning the (Otherwise) Unlearnable
Towards Learning High-Precision Least Squares Algorithms with Sequence Models
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
A Coefficient Makes SVRG Effective
Homomorphism Counts as Structural Encodings for Graph Learning
Reasoning with Latent Thoughts: On the Power of Looped Transformers
PhysPDE: Rethinking PDE Discovery and a Physical HYpothesis Selection Benchmark
Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
SINGER: Stochastic Network Graph Evolving Operator for High Dimensional PDEs
InstantPortrait: One-Step Portrait Editing via Diffusion Multi-Objective Distillation
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
MAP: Multi-Human-Value Alignment Palette
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Diffusion Transformers for Tabular Data Time Series Generation
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
Revisiting Large-Scale Non-convex Distributionally Robust Optimization
Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM
ImageFolder: Autoregressive Image Generation with Folded Tokens
High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation
Aligning Human Motion Generation with Human Perceptions
SEPARATE: A Simple Low-rank Projection for Gradient Compression in Modern Large-scale Model Training Process
Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Affine Steerable Equivariant Layer for Canonicalization of Neural Networks
Three Mechanisms of Feature Learning in a Linear Network
Number Cookbook: Number Understanding of Language Models and How to Improve It
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice
Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Remove Symmetries to Control Model Expressivity and Improve Optimization
Attention with Markov: A Curious Case of Single-layer Transformers
The AdEMAMix Optimizer: Better, Faster, Older
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Pyramidal Flow Matching for Efficient Video Generative Modeling
DeLLMa: Decision Making Under Uncertainty with Large Language Models
JPEG Inspired Deep Learning
SpinQuant: LLM Quantization with Learned Rotations
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Param$\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Image-level Memorization Detection via Inversion-based Inference Perturbation
gRNAde: Geometric Deep Learning for 3D RNA inverse design
A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence
Boltzmann priors for Implicit Transfer Operators
Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
Neuron-based Multifractal Analysis of Neuron Interaction Dynamics in Large Models
Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
Hidden in the Noise: Two-Stage Robust Watermarking for Images
Biologically Constrained Barrel Cortex Model Integrates Whisker Inputs and Replicates Key Brain Network Dynamics
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
Proximal Mapping Loss: Understanding Loss Functions in Crowd Counting & Localization
Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
In-Context Editing: Learning Knowledge from Self-Induced Distributions
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Towards Understanding the Universality of Transformers for Next-Token Prediction
Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
CryoGEN: Generative Energy-based Models for Cryogenic Electron Tomography Reconstruction
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
KAN: Kolmogorov–Arnold Networks
On the expressiveness and spectral bias of KANs
Online Clustering with Nearly Optimal Consistency
Simplifying, Stabilizing and Scaling Continuous-time Consistency Models
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Accelerating neural network training: An analysis of the AlgoPerf competition
An Auditing Test to Detect Behavioral Shift in Language Models
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
TRENDy: Temporal Regression of Effective Nonlinear Dynamics
HQGS: High-Quality Novel View Synthesis with Gaussian Splatting in Degraded Scenes
Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment
Efficient Causal Decision Making with One-sided Feedback
Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Regularized Proportional Fairness Mechanism for Resource Allocation Without Money
Dynamic Neural Fortresses: An Adaptive Shield for Model Extraction Defense
Agent S: An Open Agentic Framework that Uses Computers Like a Human
Simple ReFlow: Improved Techniques for Fast Flow Models
Protein Language Model Fitness is a Matter of Preference
Forte: Finding Outliers with Representation Typicality Estimation
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses
Automated Design of Agentic Systems
Watch Less, Do More: Implicit Skill Discovery for Video-Conditioned Policy
Learning and aligning single-neuron invariance manifolds in visual cortex
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Learning Video-Conditioned Policy on Unlabelled Data with Joint Embedding Predictive Transformer
Discrete Latent Plans via Semantic Skill Abstractions
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping
Adaptive Energy Alignment for Accelerating Test-Time Adaptation
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint
Revisiting Source-Free Domain Adaptation: a New Perspective via Uncertainty Control
ESE: Espresso Sentence Embeddings
Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Catastrophic Failure of LLM Unlearning via Quantization
Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping
Robustness Inspired Graph Backdoor Defense
Specialized Foundation Models Struggle to Beat Supervised Baselines
R2Det: Exploring Relaxed Rotation Equivariance in 2D Object Detection
Do You Keep an Eye on What I Ask? Mitigating Multimodal Hallucination via Attention-Guided Ensemble Decoding
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
NL-Eye: Abductive NLI For Images
Lost in Prediction: Why Social Media Narratives Don't Help Macroeconomic Forecasting?
Exploring the Camera Bias of Person Re-identification
Improving Probabilistic Diffusion Models With Optimal Diagonal Covariance Matching
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Towards Foundation Models for Mixed Integer Linear Programming
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Second Order Bounds for Contextual Bandits with Function Approximation
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
Wavelet-based Positional Representation for Long Context
Inverse Scaling: When Bigger Isn't Better
Procedural Synthesis of Synthesizable Molecules
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
LOIRE: LifelOng learning on Incremental data via pre-trained language model gRowth Efficiently
Conditional Testing based on Localized Conformal $p$-values
Error-quantified Conformal Inference for Time Series
Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy
VLMaterial: Procedural Material Generation with Large Vision-Language Models
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
Flow-based Variational Mutual Information: Fast and Flexible Approximations
SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Risk-Sensitive Variational Actor-Critic: A Model-Based Approach
Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
On Disentangled Training for Nonlinear Transform in Learned Image Compression
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Competition Dynamics Shape Algorithmic Phases of In-Context Learning
ADAM Optimization with Adaptive Batch Selection
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
Unsupervised Model Tree Heritage Recovery
Optimality of Matrix Mechanism on $\ell_p^p$-metric
PINP: Physics-Informed Neural Predictor with latent estimation of fluid flows
Adversarial Mixup Unlearning
Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling, and Zero-Shot Transfer
Time-to-Event Pretraining for 3D Medical Imaging
Real-Time Video Generation with Pyramid Attention Broadcast
SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance
Denoising Autoregressive Transformers for Scalable Text-to-Image Generation
Few-Class Arena: A Benchmark for Efficient Selection of Vision Models and Dataset Difficulty Measurement
GSBA$^K$: $top$-$K$ Geometric Score-based Black-box Attack
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Bad-PFL: Exploiting Backdoor Attacks against Personalized Federated Learning
Deep Linear Probe Generators for Weight Space Learning
UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate
Preserving Deep Representations in One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework
Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
DataGen: Unified Synthetic Dataset Generation via Large Language Models
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
CR-CTC: Consistency regularization on CTC for improved speech recognition
DEEM: Diffusion models serve as the eyes of large language models for image perception
TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting
FLOPS: Forward Learning with OPtimal Sampling
AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption
An Illustrated Guide to Automatic Sparse Differentiation
RGB-Event ISP: The Dataset and Benchmark
Animate-X: Universal Character Image Animation with Enhanced Motion Representation
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-training of Deep Networks
OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs
Inverse Constitutional AI: Compressing Preferences into Principles
Neural Functions for Learning Periodic Signal
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models
Measuring And Improving Persuasiveness Of Large Language Models
Teaching Human Behavior Improves Content Understanding Abilities Of VLMs
Diff-Prompt: Diffusion-driven Prompt Generator with Mask Supervision
Competing Large Language Models in Multi-Agent Gaming Environments
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
From Commands to Prompts: LLM-based Semantic File System for AIOS
AutoCGP: Closed-Loop Concept-Guided Policies from Unlabeled Demonstrations
Robust System Identification: Finite-sample Guarantees and Connection to Regularization
Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport
Spurious Forgetting in Continual Learning of Language Models
Training Large Language Models for Retrieval-Augmented Question Answering through Backtracking Correction
Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers
Tight Lower Bounds under Asymmetric High-Order Hölder Smoothness and Uniform Convexity
Generalization Bounds and Model Complexity for Kolmogorov–Arnold Networks
Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation
CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation
A Statistical Framework for Ranking LLM-based Chatbots
KooNPro: A Variance-Aware Koopman Probabilistic Model Enhanced by Neural Process for Time Series Forecasting
Lie Algebra Canonicalization: Equivariant Neural Operators under arbitrary Lie Groups
Block Verification Accelerates Speculative Decoding
Continuous Diffusion for Mixed-Type Tabular Data
Intricacies of Feature Geometry in Large Language Models
What to align in multimodal contrastive learning?
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Masked Image Modeling Representations
Shape as Line Segments: Accurate and Flexible Implicit Surface Representation
Generative Flows on Synthetic Pathway for Drug Design
DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
NExUME: Adaptive Training and Inference for DNNs under Intermittent Power Environments
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision
Mixture of Attentions For Speculative Decoding
Data Shapley in One Training Run
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms
Provably Robust Explainable Graph Neural Networks against Graph Perturbation Attacks
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Can Reinforcement Learning Solve Asymmetric Combinatorial-Continuous Zero-Sum Games?
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
Standardizing Structural Causal Models
Capturing the Temporal Dependence of Training Data Influence
Reinforcement learning with combinatorial actions for coupled restless bandits
Diffusion Bridge AutoEncoders for Unsupervised Representation Learning
CL-MFAP: A Contrastive Learning-Based Multimodal Foundation Model for Molecular Property Prediction and Antibiotic Screening
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Accelerating Goal-Conditioned Reinforcement Learning Algorithms and Research
A Decade's Battle on Dataset Bias: Are We There Yet?
An Effective Theory of Bias Amplification
Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Reconciling Model Multiplicity for Downstream Decision Making
Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Calibrating Expressions of Certainty
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness
GLOMA: Global Video Text Spotting with Morphological Association
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
Explanations of GNN on Evolving Graphs via Axiomatic Layer edges
ARB-LLM: Alternating Refined Binarizations for Large Language Models
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Advantage-Guided Distillation for Preference Alignment in Small Language Models
Dynamic Modeling of Patients, Modalities and Tasks via Multi-modal Multi-task Mixture of Experts
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models
ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks
MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models
OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
GOttack: Universal Adversarial Attacks on Graph Neural Networks via Graph Orbits Learning
Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory
Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds
Efficient Imitation under Misspecification
Enhancing Robust Fairness via Confusional Spectral Regularization
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
ThermalGaussian: Thermal 3D Gaussian Splatting
FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields
From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics
From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation
Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Optimization with Access to Auxiliary Information
Bandit Learning in Matching Markets with Indifference
Grounding Video Models to Actions through Goal Conditioned Exploration
Proteina: Scaling Flow-based Protein Structure Generative Models
On Speeding Up Language Model Evaluation
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
PAD: Personalized Alignment of LLMs at Decoding-time
Federated $Q$-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
What Matters in Learning from Large-Scale Datasets for Robot Manipulation
Accurate and Scalable Graph Neural Networks via Message Invariance
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ECHOPulse: ECG Controlled Echocardio-gram Video Generation
Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
SparsyFed: Sparse Adaptive Federated Learning
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
Robust-PIFu: Robust Pixel-aligned Implicit Function for 3D Human Digitalization from a Single Image
Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
Discrete Distribution Networks
The Unreasonable Ineffectiveness of the Deeper Layers
Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning
Physics-Informed Diffusion Models
Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise
Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks
Predictive Uncertainty Quantification for Bird's Eye View Segmentation: A Benchmark and Novel Loss Function
Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
RaSA: Rank-Sharing Low-Rank Adaptation
MLPs Learn In-Context on Regression and Classification Tasks
Re-Evaluating the Impact of Unseen-Class Unlabeled Data on Semi-Supervised Learning Model
GOAL: A Generalist Combinatorial Optimization Agent Learner
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Towards Federated RLHF with Aggregated Client Preference for LLMs
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Do Large Language Models Truly Understand Geometric Structures?
FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise
Endowing Visual Reprogramming with Adversarial Robustness
ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
On the Crucial Role of Initialization for Matrix Factorization
Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building
Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms
ReGen: Generative Robot Simulation via Inverse Design
Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement
How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences
Lightweight Predictive 3D Gaussian Splats
IgGM: A Generative Model for Functional Antibody and Nanobody Design
Federated Granger Causality Learning For Interdependent Clients With State Space Representation
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
No Need to Talk: Asynchronous Mixture of Language Models
BingoGuard: LLM Content Moderation Tools with Risk Levels
HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation
Learning Gain Map for Inverse Tone Mapping
Causal Graph Transformer for Treatment Effect Estimation Under Unknown Interference
Does SGD really happen in tiny subspaces?
The Superposition of Diffusion Models Using the Itô Density Estimator
CBQ: Cross-Block Quantization for Large Language Models
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Weak-to-Strong Generalization Through the Data-Centric Lens
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Implicit In-context Learning
Understanding the Stability-based Generalization of Personalized Federated Learning
Data-centric Prediction Explanation via Kernelized Stein Discrepancy
GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization
Scaling Speech-Text Pre-training with Synthetic Interleaved Data
Global Convergence of Policy Gradient in Average Reward MDPs
Learning-Augmented Frequent Directions
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Decoupling Layout from Glyph in Online Chinese Handwriting Generation
Structuring Benchmark into Knowledge Graphs to Assist Large Language Models in Retrieving and Designing Models
A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning
Causal Information Prioritization for Efficient Reinforcement Learning
SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
Robotouille: An Asynchronous Planning Benchmark for LLM Agents
Learning local equivariant representations for quantum operators
Towards Auto-Regressive Next-Token Prediction: In-context Learning Emerges from Generalization
TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram
Towards Empowerment Gain through Causal Structure Learning in Model-Based Reinforcement Learning
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
Accelerating Training with Neuron Interaction and Nowcasting Networks
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning
DynaPrompt: Dynamic Test-Time Prompt Tuning
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting
Forgetting Transformer: Softmax Attention with a Forget Gate
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Test-time Alignment of Diffusion Models without Reward Over-optimization
Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
Adaptive Retention & Correction: Test-Time Training for Continual Learning
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
Preference Optimization for Reasoning with Pseudo Feedback
Theory on Mixture-of-Experts in Continual Learning
Self-Boosting Large Language Models with Synthetic Preference Data
STAFF: Speculative Coreset Selection for Task-Specific Fine-tuning
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Dynamic Assortment Selection and Pricing with Censored Preference Feedback
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning
High-dimension Prototype is a Better Incremental Object Detection Learner
Diverse Preference Learning for Capabilities and Alignment
Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity
Language Representations Can be What Recommenders Need: Findings and Potentials
Causal Effect Estimation with Mixed Latent Confounders and Post-treatment Variables
Fast and Accurate Blind Flexible Docking
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
A Simple yet Effective $\Delta\Delta G$ Predictor is An Unsupervised Antibody Optimizer and Explainer
Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement
Reconstructive Visual Instruction Tuning
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Sequential Stochastic Combinatorial Optimization Using Hierarchical Reinforcement Learning
Optimizing Neural Network Representations of Boolean Networks
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model
Learning Regularized Graphon Mean-Field Games with Unknown Graphons
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation
Online Preference Alignment for Language Models via Count-based Exploration
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications
Pursuing Better Decision Boundaries for Long-Tailed Object Detection via Category Information Amount
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Valid Conformal Prediction for Dynamic GNNs
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark
LR0.FM: Low-Resolution Zero-Shot Classification Benchmark for Foundation Models
CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
Robust Simulation-Based Inference under Missing Data via Neural Processes
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
Temporal Difference Learning: Why It Can Be Fast and How It Will Be Faster
High-quality Text-to-3D Character Generation with SparseCubes and Sparse Transformers
E(3)-equivariant models cannot learn chirality: Field-based molecular generation
ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations
Ensembling Diffusion Models via Adaptive Feature Aggregation
Scalable Bayesian Learning with posteriors
EVA: Geometric Inverse Design for Fast Protein Motif-Scaffolding with Coupled Flow
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Point-SAM: Promptable 3D Segmentation Model for Point Clouds
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches
LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
COFlowNet: Conservative Constraints on Flows Enable High-Quality Candidate Generation
Training-free Camera Control for Video Generation
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement
Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees
Adaptive backtracking for faster optimization
CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
Accelerated training through iterative gradient propagation along the residual path
Continuous Exposure Learning for Low-light Image Enhancement using Neural ODEs
JetFormer: An autoregressive generative model of raw images and text
Boosting Perturbed Gradient Ascent for Last-Iterate Convergence in Games
AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
Revisiting Multi-Permutation Equivariance through the Lens of Irreducible Representations
Targeted Attack Improves Protection against Unauthorized Diffusion Customization
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
On the Expressive Power of Sparse Geometric MPNNs
Radar: Fast Long-Context Decoding for Any Transformer
Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification
An Asynchronous Bundle Method for Distributed Learning Problems
Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
MIND over Body: Adaptive Thinking using Dynamic Computation
Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
Random-Set Neural Networks
Active Learning for Continual Learning: Keeping the Past Alive in the Present
DUALFormer: Dual Graph Transformer
Steering Large Language Models between Code Execution and Textual Reasoning
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization
Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Leave-One-Out Stable Conformal Prediction
DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale
GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics
Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
Control-oriented Clustering of Visual Latent Representation
Implicit Search via Discrete Diffusion: A Study on Chess
Ensembles of Low-Rank Expert Adapters
Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Classic but Everlasting: Traditional Gradient-Based Algorithms Converge Fast Even in Time-Varying Multi-Player Games
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
Generating CAD Code with Vision-Language Models for 3D Designs
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning
Generating Physical Dynamics under Priors
Matryoshka Multimodal Models
TDDBench: A Benchmark for Training data detection
GPS: A Probabilistic Distributional Similarity with Gumbel Priors for Set-to-Set Matching
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
$\text{I}^2\text{AM}$: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
Monet: Mixture of Monosemantic Experts for Transformers
Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment
Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark
MuPT: A Generative Symbolic Music Pretrained Transformer
Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation
A Solvable Attention for Neural Scaling Laws
DyCAST: Learning Dynamic Causal Structure from Time Series
CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
3D Vision-Language Gaussian Splatting
Order-aware Interactive Segmentation
New Algorithms for the Learning-Augmented k-means Problem
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards
Discovering Temporally Compositional Neural Manifolds with Switching Infinite GPFA
ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models
Unlearning-based Neural Interpretations
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
A Truncated Newton Method for Optimal Transport
MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training
Do Deep Neural Network Solutions Form a Star Domain?
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Personalized Representation from Personalized Generation
Beyond Circuit Connections: A Non-Message Passing Graph Transformer Approach for Quantum Error Mitigation
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
ToolACE: Winning the Points of LLM Function Calling
Differentially private optimization for non-decomposable objective functions
Multi-Robot Motion Planning with Diffusion Models
Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
ImDy: Human Inverse Dynamics from Imitated Observations
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
ReMatching Dynamic Reconstruction Flow
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction
HOPE for a Robust Parameterization of Long-memory State Space Models
CFD: Learning Generalized Molecular Representation via Concept-Enhanced Feedback Disentanglement
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods
Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning
The Robustness of Differentiable Causal Discovery in Misspecified Scenarios
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Building Math Agents with Multi-Turn Iterative Preference Learning
Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization
Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation
Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics
ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design
Simplifying Deep Temporal Difference Learning
From Search to Sampling: Generative Models for Robust Algorithmic Recourse
Hot-pluggable Federated Learning: Bridging General and Personalized FL via Dynamic Selection
Machine Unlearning via Simulated Oracle Matching
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
Linear Partial Gromov-Wasserstein Embedding
Mitigate the Gap: Improving Cross-Modal Alignment in CLIP
Horizon Generalization in Reinforcement Learning
Robust Function-Calling for On-Device Language Model via Function Masking
Improved Sampling Of Diffusion Models In Fluid Dynamics With Tweedie's Formula
Temporal Flexibility in Spiking Neural Networks: Towards Generalization Across Time Steps and Deployment Friendliness
SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents
Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling
Finding Shared Decodable Concepts and their Negations in the Brain
State Space Model Meets Transformer: A New Paradigm for 3D Object Detection
Scaling Optimal LR Across Token Horizons
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
A primer on analytical learning dynamics of nonlinear neural networks
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
Enhancing Clustered Federated Learning: Integration of Strategies and Improved Methodologies
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Can LLMs Understand Time Series Anomalies?
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting
ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
Neural Fluid Simulation on Geometric Surfaces
Recovering Manifold Structure Using Ollivier Ricci Curvature
Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning
Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression
The Complexity of Two-Team Polymatrix Games with Independent Adversaries
EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning
PWM: Policy Learning with Multi-Task World Models
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning
Breaking the $\log(1/\Delta_2)$ Barrier: Better Batched Best Arm Identification with Adaptive Grids
Learning from Imperfect Human Feedback: A Tale from Corruption-Robust Dueling
On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
Transformer Learns Optimal Variable Selection in Group-Sparse Classification
Graph Neural Networks Gone Hogwild
Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning
Swing-by Dynamics in Concept Learning and Compositional Generalization
Single-agent Poisoning Attacks Suffice to Ruin Multi-Agent Learning
Do Contemporary Causal Inference Models Capture Real-World Heterogeneity? Findings from a Large-Scale Benchmark
A Differentiable Rank-Based Objective for Better Feature Learning
Mixture of In-Context Prompters for Tabular PFNs
Learning Long Range Dependencies on Graphs via Random Walks
Point-based Instance Completion with Scene Constraints
Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
Graph Neural Networks Can (Often) Count Substructures
ICLR: In-Context Learning of Representations
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference
MAGNet: Motif-Agnostic Generation of Molecules from Scaffolds
Spectral Compressive Imaging via Unmixing-driven Subspace Diffusion Refinement
How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations
As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Near, far: Patch-ordering enhances vision foundation models' scene understanding
Unifying Causal Representation Learning with the Invariance Principle
Scalable Mechanistic Neural Networks
Deep Signature: Characterization of Large-Scale Molecular Dynamics
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
Quality over Quantity in Attention Layers: When Adding More Heads Hurts
PICASO: Permutation-Invariant Context Composition with State Space Models
InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly
A Large-scale Dataset and Benchmark for Commuting Origin-Destination Flow Generation
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
A Multiscale Frequency Domain Causal Framework for Enhanced Pathological Analysis
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling
Training-Free Activation Sparsity in Large Language Models
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
LLMs' Potential Influences on Our Democracy: Challenges and Opportunities
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Active Learning for Neural PDE Solvers
Utilitarian Algorithm Configuration for Infinite Parameter Spaces
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Durable Quantization Conditioned Misalignment Attack on Large Language Models
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Epistemic Monte Carlo Tree Search
Advantage Alignment Algorithms
Solving hidden monotone variational inequalities with surrogate losses
HELMET: How to Evaluate Long-context Models Effectively and Thoroughly
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Diffusion-based Neural Network Weights Generation
Fitting Networks with a Cancellation Trick
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Poisson-Dirac Neural Networks for Modeling Coupled Dynamical Systems across Domains
AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly
Sparse Autoencoders Do Not Find Canonical Units of Analysis
Selective Unlearning via Representation Erasure Using Domain Adversarial Training
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
Re-Thinking Inverse Graphics With Large Language Models
Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels
Identifiability for Gaussian Processes with Holomorphic Kernels
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Residual Deep Gaussian Processes on Manifolds
HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models
Probabilistic Language-Image Pre-Training
$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee
NetFormer: An interpretable model for recovering dynamical connectivity in neuronal population dynamics
PaPaGei: Open Foundation Models for Optical Physiological Signals
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Towards Improving Exploration through Sibling Augmented GFlowNets
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
On the Transfer of Object-Centric Representation Learning
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning
Structure Language Models for Protein Conformation Generation
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
How efficient is LLM-generated code? A rigorous & high-standard benchmark
Bootstrapping Language Models with DPO Implicit Rewards
Hessian Free Efficient Single Loop Iterative Differentiation Methods for Bi-Level Optimization Problems
Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment
Is Your Video Language Model a Reliable Judge?
A Riemannian Framework for Learning Reduced-order Lagrangian Dynamics
Commit0: Library Generation from Scratch
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair
Latent Bayesian Optimization via Autoregressive Normalizing Flows
Locality Sensitive Avatars From Video
Conditional Diffusion with Ordinal Regression: Longitudinal Data Generation for Neurodegenerative Disease Studies
Action abstractions for amortized sampling
Adaptive teachers for amortized samplers
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
Is uniform expressivity too restrictive? Towards efficient expressivity of GNNs
Tuning Frequency Bias of State Space Models
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Interpreting the Second-Order Effects of Neurons in CLIP
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Selective Attention Improves Transformer
Locally Connected Echo State Networks for Time Series Forecasting
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Range, not Independence, Drives Modularity in Biologically Inspired Representations
HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
Stabilized Neural Prediction of Potential Outcomes in Continuous Time
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes
Interpretable Causal Representation Learning for Biological Data in the Pathway Space
Probabilistic Geometric Principal Component Analysis with application to neural data
Data Scaling Laws in Imitation Learning for Robotic Manipulation
TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation
MCNC: Manifold-Constrained Reparameterization for Neural Compression
Robust Conformal Prediction with a Single Binary Certificate
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters
Prevalence of Negative Transfer in Continual Reinforcement Learning: Analyses and a Simple Baseline
SPDIM: Source-Free Unsupervised Conditional and Label Shift Adaptation in EEG
Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory
Clique Number Estimation via Differentiable Functions of Adjacency Matrix Permutations
UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation
Charting the Design Space of Neural Graph Representations for Subgraph Matching
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Isometric Regularization for Manifolds of Functional Data
A Deep Generative Learning Approach for Two-stage Adaptive Robust Optimization
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Decomposition Polyhedra of Piecewise Linear Functions
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments
Understanding Virtual Nodes: Oversquashing and Node Heterogeneity
Language Guided Skill Discovery
Privacy-Aware Lifelong Learning
Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning
Approximation algorithms for combinatorial optimization with predictions
Transformers Learn Low Sensitivity Functions: Investigations and Implications
Visually Consistent Hierarchical Image Classification
TSC-Net: Prediction of Pedestrian Trajectories by Trajectory-Scene-Cell Classification
SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments
Balanced Ranking with Relative Centrality: A multi-core periphery perspective
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation
Test-time Adaptation for Image Compression with Distribution Regularization
Learning Mask Invariant Mutual Information for Masked Image Modeling
Last Iterate Convergence of Incremental Methods as a Model of Forgetting
Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo
The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning
Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
Adaptive Gradient Clipping for Robust Federated Learning
Neuroplastic Expansion in Deep Reinforcement Learning
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Zero-shot Model-based Reinforcement Learning using Large Language Models
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
Semantic Aware Representation Learning for Lifelong Learning
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models
An Effective Manifold-based Optimization Method for Distributionally Robust Classification
MambaExtend: A Training-Free Approach to Improve Long Context Extension of Mamba
Detecting Backdoor Samples in Contrastive Language Image Pretraining
Learning to Solve Differential Equation Constrained Optimization Problems
Mitigating Spurious Correlations in Zero-Shot Multimodal Models
Representational Similarity via Interpretable Visual Concepts
Separation Power of Equivariant Neural Networks
Test-Time Adaptation for Combating Missing Modalities in Egocentric Videos
Bundle Neural Network for message diffusion on graphs
Neural Spacetimes for DAG Representation Learning
Training-Free Message Passing for Learning on Hypergraphs
The Value of Sensory Information to a Robot
Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion
QPM: Discrete Optimization for Globally Interpretable Image Classification
SegLLM: Multi-round Reasoning Segmentation with Large Language Models
BrainOOD: Out-of-distribution Generalizable Brain Network Analysis
DebGCD: Debiased Learning with Distribution Guidance for Generalized Category Discovery
Cached Multi-Lora Composition for Multi-Concept Image Generation
Robust Feature Learning for Multi-Index Models in High Dimensions
Improving Graph Neural Networks by Learning Continuous Edge Directions
Edge Prompt Tuning for Graph Neural Networks
T2V-Turbo-v2: Enhancing Video Model Post-Training through Data, Reward, and Conditional Guidance Design
Rethinking Spiking Neural Networks from an Ensemble Learning Perspective
Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
metabench - A Sparse Benchmark of Reasoning and Knowledge in Large Language Models
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Bootstrapped Model Predictive Control
Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Learning vector fields of differential equations on manifolds with geometrically constrained operator-valued kernels
Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
Steering LLMs' Behavior with Concept Activation Vectors
Distribution-Free Data Uncertainty for Neural Network Regression
Revolutionizing EMCCD Denoising through a Novel Physics-Based Learning Framework for Noise Modeling
Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency
Memory Efficient Transformer Adapter for Dense Predictions
Learning Robust Representations with Long-Term Information for Generalization in Visual Reinforcement Learning
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Unlocking the Potential of Model Calibration in Federated Learning
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
DiffPuter: Empowering Diffusion Models for Missing Data Imputation
TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation
Towards Faster Decentralized Stochastic Optimization with Communication Compression
MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences
Holistically Evaluating the Environmental Impact of Creating Language Models
Generating Likely Counterfactuals Using Sum-Product Networks
A Sanity Check for AI-generated Image Detection
Conformalized Survival Analysis for General Right-Censored Data
Sensitivity-Constrained Fourier Neural Operators for Forward and Inverse Problems in Parametric Differential Equations
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
See What You Are Told: Visual Attention Sink in Large Multimodal Models
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
The Foundations of Tokenization: Statistical and Computational Concerns
To Code or Not To Code? Exploring Impact of Code in Pre-training
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
POTEC: Off-Policy Contextual Bandits for Large Action Spaces via Policy Decomposition
A General Framework for Off-Policy Learning with Partially-Observed Reward
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization
Streamlining Prediction in Bayesian Deep Learning
Learning 3D Perception from Others' Predictions
Graph Sparsification via Mixture of Graphs
Rethinking Graph Prompts: Unraveling the Power of Data Manipulation in Graph Neural Networks
Training Language Models to Self-Correct via Reinforcement Learning
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
How many samples are needed to train a deep neural network?
Revisiting In-context Learning Inference Circuit in Large Language Models
Neural Wave Equation for Irregularly Sampled Sequence Data
Deep Learning Alternatives Of The Kolmogorov Superposition Theorem
Understanding Long Videos with Multimodal Language Models
Residual Stream Analysis with Multi-Layer SAEs
When Attention Sink Emerges in Language Models: An Empirical View
Protecting against simultaneous data poisoning attacks
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Rethinking Artistic Copyright Infringements In the Era Of Text-to-Image Generative Models
On Calibration of LLM-based Guard Models for Reliable Content Moderation
Exploiting Hankel-Toeplitz Structures for Fast Computation of Kernel Precision Matrices
MuHBoost: Multi-Label Boosting For Practical Longitudinal Human Behavior Modeling
PETRA: Parallel End-to-end Training with Reversible Architectures
Diffusion On Syntax Trees For Program Synthesis
What Makes a Good Diffusion Planner for Decision Making?
Test-time Adaptation for Regression by Subspace Alignment
Attributing Culture-Conditioned Generations to Pretraining Corpora
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
Semantix: An Energy-guided Sampler for Semantic Style Transfer
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
ConcreTizer: Model Inversion Attack via Occupancy Classification and Dispersion Control for 3D Point Cloud Restoration
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
Efficient Off-Policy Learning for High-Dimensional Action Spaces
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Searching for Optimal Solutions with LLMs via Bayesian Optimization
RandLoRA: Full rank parameter-efficient fine-tuning of large models
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
Precise Parameter Localization for Textual Generation in Diffusion Models
Decoupled Graph Energy-based Model for Node Out-of-Distribution Detection on Heterophilic Graphs
Toward Generalizing Visual Brain Decoding to Unseen Subjects
Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
A Theory of Initialisation's Impact on Specialisation
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
Non-Equilibrium Dynamics of Hybrid Continuous-Discrete Ground-State Sampling
Do as I do (Safely): Mitigating Task-Specific Fine-tuning Risks in Large Language Models
Disentangling 3D Animal Pose Dynamics with Scrubbed Conditional Latent Variables
Towards Interpreting Visual Information Processing in Vision-Language Models
L-WISE: Boosting Human Visual Category Learning Through Model-Based Image Selection and Enhancement
In-context Time Series Predictor
The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
Feedback Favors the Generalization of Neural ODEs
Zero-cost Proxy for Adversarial Robustness Evaluation
Safety Alignment Should be Made More Than Just a Few Tokens Deep
MaestroMotif: Skill Design from Artificial Intelligence Feedback
Optimal Learning of Kernel Logistic Regression for Complex Classification Scenarios
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Privately Counting Partially Ordered Data
CONGO: Compressive Online Gradient Optimization
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Graph Neural Networks for Edge Signals: Orientation Equivariance and Invariance
SAM 2: Segment Anything in Images and Videos
Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory
Systems with Switching Causal Relations: A Meta-Causal Perspective
Towards Scalable Topological Regularizers
ToVE: Efficient Vision-Language Learning via Knowledge Transfer from Vision Experts
ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
LeanAgent: Lifelong Learning for Formal Theorem Proving
CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning
Enhancing Graph Of Thought: Enhancing Prompts with LLM Rationales and Dynamic Temperature Control
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Copyright-Protected Language Generation via Adaptive Model Fusion
Edge-aware Image Smoothing with Relative Wavelet Domain Representation
When does compositional structure yield compositional generalization? A kernel theory.
When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
Programming Refusal with Conditional Activation Steering
Mentored Learning: Improving Generalization and Convergence of Student Learner
Towards General-Purpose Model-Free Reinforcement Learning
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
Soft Merging of Experts with Adaptive Routing
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Scaling Laws for Adversarial Attacks on Language Model Activations and Tokens
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
TVNet: A Novel Time Series Analysis Method Based on Dynamic Convolution and 3D-Variation
DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
Generative Adversarial Ranking Nets
ADMM for Nonconvex Optimization under Minimal Continuity Assumption
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Root Cause Analysis of Anomalies in Multivariate Time Series through Granger Causal Discovery
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Provable unlearning in topic modeling and downstream tasks
Object-Centric Pretraining via Target Encoder Bootstrapping
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
LLaVA-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
Strength Estimation and Human-Like Strength Adjustment in Games
LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models
OptionZero: Planning with Learned Options
Towards Automated Knowledge Integration From Human-Interpretable Representations
Divergence-Regularized Discounted Aggregation: Equilibrium Finding in Multiplayer Partially Observable Stochastic Games
Risk-Sensitive Diffusion: Robustly Optimizing Diffusion Models with Noisy Samples
Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure
Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning
Discovering Group Structures via Unitary Representation Learning
What's New in My Data? Novelty Exploration via Contrastive Generation
Language Agents Meet Causality -- Bridging LLMs and Causal World Models
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Rethinking Visual Counterfactual Explanations Through Region Constraint
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
Beyond Random Masking: When Dropout meets Graph Convolutional Networks
Not-So-Optimal Transport Flows for 3D Point Cloud Generation
Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Pacmann: Efficient Private Approximate Nearest Neighbor Search
PiCO: Peer Review in LLMs based on Consistency Optimization
RRM: Robust Reward Model Training Mitigates Reward Hacking
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
Can We Ignore Labels in Out of Distribution Detection?
MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors
A Non-Contrastive Learning Framework for Sequential Recommendation with Preference-Preserving Profile Generation
The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling
Training-Free Dataset Pruning for Instance Segmentation
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels
ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems
Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
Task Descriptors Help Transformers Learn Linear Models In-Context
Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation
Boosting the visual interpretability of CLIP via adversarial fine-tuning
LLMs Can Plan Only If We Tell Them
Topological Schrödinger Bridge Matching
Extending Mercer's expansion to indefinite and asymmetric kernels
FreDF: Learning to Forecast in the Frequency Domain
Differentially Private Federated Learning with Time-Adaptive Privacy Spending
Solving Video Inverse Problems Using Image Diffusion Models
Breaking the Reclustering Barrier in Centroid-based Deep Clustering
Should VLMs be Pre-trained with Image Data?
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
On a Connection Between Imitation Learning and RLHF
Competitive Fair Scheduling with Predictions
Bayesian Experimental Design Via Contrastive Diffusions
Estimation of single-cell and tissue perturbation effect in spatial transcriptomics via Spatial Causal Disentanglement
Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods
Second-Order Min-Max Optimization with Lazy Hessians
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
Online-to-Offline RL for Agent Alignment
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Decision Tree Induction Through LLMs via Semantically-Aware Evolution
Active Task Disambiguation with LLMs
Examining Alignment of Large Language Models through Representative Heuristics: the case of political stereotypes
Knowledge Localization: Mission Not Accomplished? Enter Query Localization!
MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
Controlled LLM Decoding via Discrete Auto-regressive Biasing
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior
Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences
Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning
Advancing Graph Generation through Beta Diffusion
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
Certifying Counterfactual Bias in LLMs
A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
AIMS.au: A Dataset for the Analysis of Modern Slavery Countermeasures in Corporate Statements
MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations
Laplace Sample Information: Data Informativeness Through a Bayesian Lens
S4M: S4 for multivariate time series forecasting with Missing values
Weak to Strong Generalization for Large Language Models with Multi-capabilities
On the Benefits of Memory for Modeling Time-Dependent PDEs
Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning
The Case for Cleaner Biosignals: High-fidelity Neural Compressor Enables Transfer from Cleaner iEEG to Noisier EEG
Monitoring Latent World States in Language Models with Propositional Probes
BAMDP Shaping: a Unified Framework for Intrinsic Motivation and Reward Shaping
Do LLMs have Consistent Values?
Multilevel Generative Samplers for Investigating Critical Phenomena
Sensor-Invariant Tactile Representation
Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional, Black-box Systems
Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off
3D-Spatial Multimodal Memory
Adaptive Transformer Programs: Bridging the Gap Between Performance and Interpretability in Transformers
Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation
A Causal Lens for Learning Long-term Fair Policies
Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
Global Identifiability of Overcomplete Dictionary Learning via L1 and Volume Minimization
ControlAR: Controllable Image Generation with Autoregressive Models
Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies
MaskBit: Embedding-free Image Generation via Bit Tokens
Connectome Mapping: Shape-Memory Network via Interpretation of Contextual Semantic Information
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings
Probabilistic Learning to Defer: Handling Missing Expert Annotations and Controlling Workload Distribution
CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion
STAR: Synthesis of Tailored Architectures
RetroInText: A Multimodal Large Language Model Enhanced Framework for Retrosynthetic Planning via In-Context Representation Learning
PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks
Logically Consistent Language Models via Neuro-Symbolic Integration
Improving Text-to-Image Consistency via Automatic Prompt Optimization
OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination
SFESS: Score Function Estimators for $k$-Subset Sampling
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
Generating Less Certain Adversarial Examples Improves Robust Generalization
Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators
Select before Act: Spatially Decoupled Action Repetition for Continuous Control
Score-based free-form architectures for high-dimensional Fokker-Planck equations
ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-sample $V$-Ensemble
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood
Revisiting Mode Connectivity in Neural Networks with Bezier Surface
Inverse decision-making using neural amortized Bayesian actors
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence
SWEb: A Large Web Dataset for the Scandinavian Languages
Reveal Object in Lensless Photography via Region Gaze and Amplification
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Operator Deep Smoothing for Implied Volatility
Learning from negative feedback, or positive feedback or both
Vertical Federated Learning with Missing Features During Training and Inference
Exploiting Hidden Symmetry to Improve Objective Perturbation for DP Linear Learners with a Nonsmooth L1-Norm
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
A Second-Order Perspective on Model Compositionality and Incremental Learning
Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information
Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings
Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Self-supervised contrastive learning performs non-linear system identification
Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning
Continual Slow-and-Fast Adaptation of Latent Neural Dynamics (CoSFan): Meta-Learning What-How & When to Adapt
Reflective Gaussian Splatting
Proxy Denoising for Source-Free Domain Adaptation
GeoLoRA: Geometric integration for parameter efficient fine-tuning
Concept Bottleneck Large Language Models
A transfer learning framework for weak to strong generalization
Higher-Order Graphon Neural Networks: Approximation and Cut Distance
Generalization, Expressivity, and Universality of Graph Neural Networks on Attributed Graphs
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Generalization Bounds for Canonicalization: A Comparative Study with Group Averaging
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Open-Set Graph Anomaly Detection via Normal Structure Regularisation
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
Enhance Multi-View Classification Through Multi-Scale Alignment and Expanded Boundary
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Variational Best-of-N Alignment
Learning Efficient Positional Encodings with Graph Neural Networks
Training Neural Networks as Recognizers of Formal Languages
Compositional simulation-based inference for time series
Controllable Context Sensitivity and the Knob Behind It
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
Towards more rigorous evaluations of language models
Continuous Ensemble Weather Forecasting with Diffusion models
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Balancing Bias in Two-sided Markets for Fair Stable Matchings
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
KBLaM: Knowledge Base augmented Language Model
Fantastic Copyrighted Beasts and How (Not) to Generate Them
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
The Belief State Transformer
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
DRoP: Distributionally Robust Data Pruning
Improving Reasoning Performance in Large Language Models via Representation Engineering
Interpreting Language Reward Models via Contrastive Explanations
Neural Context Flows for Meta-Learning of Dynamical Systems
From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford Algebra and Convexity
PnP-Flow: Plug-and-Play Image Restoration with Flow Matching
Minimal Variance Model Aggregation: A principled, non-intrusive, and versatile integration of black box models
Scaling and evaluating sparse autoencoders
MIRACLE 3D: Memory-efficient Integrated Robust Approach for Continual Learning on 3D Point Clouds via Shape Model Construction
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
Controllable Generation via Locally Constrained Resampling
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
Deep Networks Learn Features From Local Discontinuities in the Label Function
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Linear Mode Connectivity in Differentiable Tree Ensembles
Text4Seg: Reimagining Image Segmentation as Text Generation
SIMPL: Scalable and hassle-free optimisation of neural representations from behaviour
Building Blocks of Differentially Private Training
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Formation of Representations in Neural Networks
Federated Few-Shot Class-Incremental Learning
Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery
Inference Scaling for Long-Context Retrieval Augmented Generation
Vision and Language Synergy for Rehearsal Free Continual Learning
Large Language Models Assume People are More Rational than We Really are
Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
RESuM: A Rare Event Surrogate Model for Physics Detector Design
Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
Composable Interventions for Language Models
PRISM: Privacy-Preserving Improved Stochastic Masking for Federated Generative Models
Graph-based Document Structure Analysis
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
LICO: Large Language Models for In-Context Molecular Optimization
Bisimulation Metric for Model Predictive Control
Causal Identification for Complex Functional Longitudinal Studies
Geometry of Lightning Self-Attention: Identifiability and Dimension
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Efficient Model Editing with Task-Localized Sparse Fine-tuning
Minimalistic Predictions for Online Class Constraint Scheduling
Linear combinations of latents in generative models: subspaces and beyond
XAIguiFormer: explainable artificial intelligence guided transformer for brain disorder identification
Benchmarking Predictive Coding Networks -- Made Simple
Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning
Data Distillation for extrapolative protein design through exact preference optimization
Risk-Controlling Model Selection via Guided Bayesian Optimization
On Scaling Up 3D Gaussian Splatting Training
Coreset Selection via Reducible Loss in Continual Learning
Generator Matching: Generative modeling with arbitrary Markov processes
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders
Underdamped Diffusion Bridges with Applications to Sampling
Simulating Training Dynamics to Reconstruct Training Data from Deep Neural Networks
Sequential Controlled Langevin Diffusions
Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition
Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures
GraphArena: Evaluating and Exploring Large Language Models on Graph Computation
GDrag: Towards General-Purpose Interactive Editing with Anti-ambiguity Point Diffusion
Find A Winning Sign: Sign Is All We Need to Win the Lottery
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement
Handling Delay in Real-Time Reinforcement Learning
Watermark Anything With Localized Messages
How Low Can You Go? Searching for the Intrinsic Dimensionality of Complex Networks using Metric Node Embeddings
CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
Lift Your Molecules: Molecular Graph Generation in Latent Euclidean Space
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
Efficient and Trustworthy Causal Discovery with Latent Variables and Complex Relations
Video Action Differencing
Hierarchical Uncertainty Estimation for Learning-based Registration in Neuroimaging
Recovery of Causal Graph Involving Latent Variables via Homologous Surrogates
GenEx: Generating an Explorable World
Understanding and Enhancing the Transferability of Jailbreaking Attacks
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Training LLMs over Neurally Compressed Text
Autoregressive Pretraining with Mamba in Vision
Mastering Task Arithmetic: $\tau$Jp as a Key Indicator for Weight Disentanglement
Instance-dependent Early Stopping
SFS: Smarter Code Space Search improves LLM Inference Scaling
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
Rational Decision-Making Agent with Learning Internal Utility Judgment
iFormer: Integrating ConvNet and Transformer for Mobile Application
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
Identification of Intermittent Temporal Latent Process
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Synergy Between Sufficient Changes and Sparse Mixing Procedure for Disentangled Representation Learning
Causal Representation Learning from Multimodal Biomedical Observations
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
How much of my dataset did you use? Quantitative Data Usage Inference in Machine Learning
Discovering Influential Neuron Path in Vision Transformers
SGD with memory: fundamental properties and stochastic acceleration
Reconsidering Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs
Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
StringLLM: Understanding the String Processing Capability of Large Language Models
Transformer-Squared: Self-adaptive LLMs
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Agent Skill Acquisition for Large Language Models via CycleQD
Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density
Personalized Visual Instruction Tuning
PFGuard: A Generative Framework with Privacy and Fairness Safeguards
Hessian-Free Online Certified Unlearning
Integrative Decoding: Improving Factuality via Implicit Self-consistency
Kolmogorov-Arnold Transformer
GraphBridge: Towards Arbitrary Transfer Learning in GNNs
Factual Context Validation and Simplification: A Scalable Method to Enhance GPT Trustworthiness and Efficiency
HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters
A General Framework for Producing Interpretable Semantic Text Embeddings
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
ColPali: Efficient Document Retrieval with Vision Language Models
NRGBoost: Energy-Based Generative Boosted Trees
Neuron based Personality Trait Induction in Large Language Models
Support is All You Need for Certified VAE Training
Robust Barycenter Estimation using Semi-Unbalanced Neural Optimal Transport
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
Universal generalization guarantees for Wasserstein distributionally robust models
Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport
Generating Graphs via Spectral Diffusion
Sparse components distinguish visual pathways & their alignment to neural networks
GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Eliciting Human Preferences with Language Models
What Makes a Maze Look Like a Maze?
Identifying latent state transitions in non-linear dynamical systems
Selective Task Group Updates for Multi-Task Optimization
TULIP: Token-length Upgraded CLIP
From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation
Real-time design of architectural structures with differentiable mechanics and neural networks
Scalable Decentralized Learning with Teleportation
PhiNets: Brain-inspired Non-contrastive Learning Based on Temporal Prediction Hypothesis
Self-Supervised Diffusion Models for Electron-Aware Molecular Representation Learning
Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks
Palu: KV-Cache Compression with Low-Rank Projection
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming
To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts
Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
AI2TALE: An Innovative Information Theory-based Approach for Learning to Localize Phishing Attacks
E-Valuating Classifier Two-Sample Tests
Bayesian Image Regression with Soft-thresholded Conditional Autoregressive Prior
Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Linear Recurrences Accessible to Everyone
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
The Ramanujan Library - Automated Discovery on the Hypergraph of Integer Relations
PersonalLLM: Tailoring LLMs to Individual Preferences
Language Imbalance Driven Rewarding for Multilingual Self-improving
Balanced Neural ODEs: nonlinear model order reduction and Koopman operator approximations
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks
HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts
More Experts Than Galaxies: Conditionally-Overlapping Experts with Biologically-Inspired Fixed Routing
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models
Provable weak-to-strong generalization via benign overfitting
Non-Stationary Dueling Bandits Under a Weighted Borda Criterion
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
Has the Deep Neural Network learned the Stochastic Process? An Evaluation Viewpoint
Beyond Mere Token Analysis: A Hypergraph Metric Space Framework for Defending Against Socially Engineered LLM Attacks
Adversarial Search Engine Optimization for Large Language Models
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
OLMoE: Open Mixture-of-Experts Language Models
Gramian Multimodal Representation Learning and Alignment
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Scalable Extraction of Training Data from Aligned, Production Language Models
Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher
GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data
CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression
Causal Graphical Models for Vision-Language Compositional Understanding
A Unified Theory of Quantum Neural Network Loss Landscapes
Emergence of meta-stable clustering in mean-field transformer models
Counterfactual Generative Modeling with Variational Causal Inference
Bias Mitigation in Graph Diffusion Models
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Improved Sampling Algorithms for Lévy-Itô Diffusion Models
Aioli: A Unified Optimization Framework for Language Model Data Mixing
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
Counterfactual Realizability
MMD-Regularized Unbalanced Optimal Transport
Variance-Reducing Couplings for Random Features
Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
Differentiable and Learnable Wireless Simulation with Geometric Transformers
Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization
Do vision models perceive objects like toddlers?
Open-World Reinforcement Learning over Long Short-Term Imagination
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
Expected Return Symmetries
Fast training and sampling of Restricted Boltzmann Machines
Spreading Out-of-Distribution Detection on Graphs
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
Bayesian Optimization via Continual Variational Last Layer Training
Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features
On Rollouts in Model-Based Reinforcement Learning
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks
Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports
Scale-Free Graph-Language Models
Accessing Vision Foundation Models via ImageNet-1K
Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking
Subgraph Federated Learning for Local Generalization
Does Refusal Training in LLMs Generalize to the Past Tense?
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks
Selective induction Heads: How Transformers Select Causal Structures in Context
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
Is In-Context Learning Sufficient for Instruction Following in LLMs?
InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Learning to Discover Regulatory Elements for Gene Expression Prediction
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
Improved Training Technique for Latent Consistency Models
Does Training with Synthetic Data Truly Protect Privacy?
InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
MA$^2$E: Addressing Partial Observability in Multi-Agent Reinforcement Learning with Masked Auto-Encoder
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Multi-Perspective Data Augmentation for Few-shot Object Detection
Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Rapid Selection and Ordering of In-Context Demonstrations via Prompt Embedding Clustering
Plug, Play, and Generalize: Length Extrapolation with Pointer-Augmented Neural Memory
FlickerFusion: Intra-trajectory Domain Generalizing Multi-agent Reinforcement Learning
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Student-Informed Teacher Training
Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold
Generalization through variance: how noise shapes inductive biases in diffusion models
A deep inverse-mapping model for a flapping robotic wing
Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
Learning Successor Features with Distributed Hebbian Temporal Memory
Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance Constraint
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Robust Transfer of Safety-Constrained Reinforcement Learning Agents
On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
Accelerated Over-Relaxation Heavy-Ball Method: Achieving Global Accelerated Convergence with Broad Generalization
BaB-ND: Long-Horizon Motion Planning with Branch-and-Bound and Neural Dynamics
Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations
PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph
Gaussian Splatting Lucas-Kanade
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning
On the self-verification limitations of large language models on reasoning and planning tasks
Multi-modal brain encoding models for multi-modal stimuli
Brain Bandit: A Biologically Grounded Neural Network for Efficient Control of Exploration
Robustness Auditing for Linear Regression: To Singularity and Beyond
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Joint Graph Rewiring and Feature Denoising via Spectral Resonance
Expected Sliced Transport Plans
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation
Regularization by Texts for Latent Diffusion Inverse Solvers
Partial Gromov-Wasserstein Metric
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Improving Large Language Model Planning with Action Sequence Similarity
Differential Transformer
Bayesian Analysis of Combinatorial Gaussian Process Bandits
Atomas: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation
Demystifying the Token Dynamics of Deep Selective State Space Models
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
Language-Image Models with 3D Understanding
CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series
DenoiseVAE: Learning Molecule-Adaptive Noise Distributions for Denoising-based 3D Molecular Pre-training
DarkBench: Benchmarking Dark Patterns in Large Language Models
Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization
MAST: model-agnostic sparsified training
Generating Freeform Endoskeletal Robots
Wasserstein-Regularized Conformal Prediction under General Distribution Shift
Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation
Rethinking the role of frames for SE(3)-invariant crystal structure modeling
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Periodic Materials Generation using Text-Guided Joint Diffusion Model
SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
Towards Unbiased Calibration using Meta-Regularization
Fast Training of Sinusoidal Neural Fields via Scaling Initialization
Learning the Complexity of Weakly Noisy Quantum States
Fast unsupervised ground metric learning with tree-Wasserstein distance
Learning system dynamics without forgetting
QERA: an Analytical Framework for Quantization Error Reconstruction
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Improving Pretraining Data Using Perplexity Correlations
Retri3D: 3D Neural Graphics Representation Retrieval
On the Price of Differential Privacy for Hierarchical Clustering
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
Learning-Augmented Search Data Structures
MixMax: Distributional Robustness in Function Space via Optimal Data Mixtures
AdaWM: Adaptive World Model based Planning for Autonomous Driving
HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents
Toward Understanding In-context vs. In-weight Learning
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information
GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
CoMRes: Semi-Supervised Time Series Forecasting Utilizing Consensus Promotion of Multi-Resolution
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Optimal Transport for Time Series Imputation
Is Large-scale Pretraining the Secret to Good Domain Generalization?
Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits
Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap
ANaGRAM: A Natural Gradient Relative to Adapted Model for efficient PINNs learning
Systematic Outliers in Large Language Models
Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
Gaussian Mixture Counterfactual Generator
Efficient Top-m Data Values Identification for Data Selection
Gradient correlation is a key ingredient to accelerate SGD with momentum
Learning to Search from Demonstration Sequences
An Information Criterion for Controlled Disentanglement of Multimodal Data
Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
Language Models Are Implicitly Continuous
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
PABBO: Preferential Amortized Black-Box Optimization
Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models
PRDP: Progressively Refined Differentiable Physics
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Generalized Consistency Trajectory Models for Image Manipulation
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions
HyperPLR: Hypergraph Generation through Projection, Learning, and Reconstruction
Semi-Parametric Retrieval via Binary Bag-of-Tokens Index
M$^3$PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model
Imputation for prediction: beware of diminishing returns.
Forking Paths in Neural Text Generation
RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks
Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs with Semantic Space
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning
Model Equality Testing: Which Model is this API Serving?
Decoupled Finetuning for Domain Generalizable Semantic Segmentation
Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data
Enhancing Uncertainty Estimation and Interpretability with Bayesian Non-negative Decision Layer
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation
Equivariant Neural Functional Networks for Transformers
Intermediate Layer Classifiers for OOD generalization
Jamba: Hybrid Transformer-Mamba Language Models
From Tokens to Words: On the Inner Lexicon of LLMs
A Curious Case of the Missing Measure: Better Scores and Worse Generation
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
Self-Updatable Large Language Models by Integrating Context into Model Parameters
Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval
Dimension Agnostic Neural Processes
HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting
ViSAGe: Video-to-Spatial Audio Generation
Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models
Fine-tuning with Reserved Majority for Noise Reduction
Shallow diffusion networks provably learn hidden low-dimensional structure
Deep Kernel Posterior Learning under Infinite Variance Prior Weights
Credit-based self organizing maps: training deep topographic networks with minimal performance degradation
Privacy Auditing of Large Language Models
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
An Efficient Framework for Crediting Data Contributors of Diffusion Models
TexTailor: Customized Text-aligned Texturing via Effective Resampling
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
A Large-scale Training Paradigm for Graph Generative Models
PooDLe🐩: Pooled and dense self-supervised learning from naturalistic videos
From GNNs to Trees: Multi-Granular Interpretability for Graph Neural Networks
Stable Segment Anything Model
Attention as a Hypernetwork
Node Similarities under Random Projections: Limits and Pathological Cases
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator
STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes
UniGEM: A Unified Approach to Generation and Property Prediction for Molecules
Harnessing Webpage UIs for Text-Rich Visual Understanding
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection
How do we interpret the outputs of a neural network trained on classification?
Online Epsilon Net & Piercing Set for Geometric Concepts
Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations
Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization
Unsupervised Meta-Learning via In-Context Learning
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
AdaGrad under Anisotropic Smoothness
Re-evaluating Open-ended Evaluation of Large Language Models
A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
PPT: Patch Order Do Matters In Time Series Pretext Task
Bridging Jensen Gap for Max-Min Group Fairness Optimization in Recommendation
Efficient Discovery of Pareto Front for Multi-Objective Reinforcement Learning
Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets
Efficient Training of Neural Stochastic Differential Equations by Matching Finite Dimensional Distributions
The Computational Complexity of Circuit Discovery for Inner Interpretability
ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences
Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Counterfactual Concept Bottleneck Models
“I Am the One and Only, Your Cyber BFF”: Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI
Predicate Hierarchies Improve Few-Shot State Classification
Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting
Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems
Track-On: Transformer-based Online Point Tracking with Memory
Towards Neural Scaling Laws for Time Series Foundation Models
Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information
First-Person Fairness in Chatbots
Interactive Adjustment for Human Trajectory Prediction with Individual Feedback
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations
Transformer Encoder Satisfiability: Complexity and Impact on Formal Reasoning
Learning Causal Alignment for Reliable Disease Diagnosis
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models
Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
QA-Calibration of Language Model Confidence Scores
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
Segment Any 3D Object with Language
ComPC: Completing a 3D Point Cloud with 2D Diffusion Priors
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Beyond single neurons: population response geometry in digital twins of mouse visual cortex
A Conditional Independence Test in the Presence of Discretization
Towards Unbiased Learning in Semi-Supervised Semantic Segmentation
Chain-of-Focus Prompting: Leveraging Sequential Visual Cues to Prompt Large Autoregressive Vision Models
Flow: Modularized Agentic Workflow Automation
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Prototype antithesis for biological few-shot class-incremental learning
PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection
Towards Explaining the Power of Constant-depth Graph Neural Networks for Structured Linear Programming
On the Optimization Landscape of Low Rank Adaptation Methods for Large Language Models
Multi-modal Learning: A Look Back and the Road Ahead
Execution-guided within-prompt search for programming-by-example
Metric-Driven Attributions for Vision Transformers
Correlation and Navigation in the Vocabulary Key Representation Space of Language Models
LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch
OPTAMI: Global Superlinear Convergence of High-order Methods
SOO-Bench: Benchmarks for Evaluating the Stability of Offline Black-Box Optimization
Centrality-guided Pre-training for Graph
Emergent Orientation Maps: Mechanisms, Coding Efficiency and Robustness
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment
Do not write that jailbreak paper
Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes
It Helps to Take a Second Opinion: Teaching Smaller LLMs To Deliberate Mutually via Selective Rationale Optimisation
Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length
Learning Partial Graph Matching via Optimal Partial Transport
ComLoRA: A Competitive Learning Approach for Enhancing LoRA
Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion
MTSAM: Multi-Task Fine-Tuning for Segment Anything Model
HeadMap: Locating and Enhancing Knowledge Circuits in LLMs
Learning to Explore and Exploit with GNNs for Unsupervised Combinatorial Optimization
Trajectory-Class-Aware Multi-Agent Reinforcement Learning
Generalized Video Moment Retrieval
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning
HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models
Sharpness-Aware Black-Box Optimization
Better Instruction-Following Through Minimum Bayes Risk
AFlow: Automating Agentic Workflow Generation
Long-Short Decision Transformer: Bridging Global and Local Dependencies for Generalized Decision-Making
Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
Joint Gradient Balancing for Data Ordering in Finite-Sum Multi-Objective Optimization
Advancing Out-of-Distribution Detection via Local Neuroplasticity
Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs
RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Manifold Constraint Reduces Exposure Bias in Accelerated Diffusion Sampling
Scaling Wearable Foundation Models
Topological Zigzag Spaghetti for Diffusion-based Generation and Prediction on Graphs
Divergence of Neural Tangent Kernel in Classification Problems
SplatFormer: Point Transformer for Robust 3D Gaussian Splatting
The Optimization Landscape of SGD Across the Feature Learning Strength
Depth Any Video with Scalable Synthetic Data
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning
Sensitivity-Aware Amortized Bayesian Inference
Combining Induction and Transduction for Abstract Reasoning
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning
LLM Unlearning via Loss Adjustment with Only Forget Data
Topograph: An Efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation
Scale-aware Recognition in Satellite Images under Resource Constraints
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
Measuring memorization in RLHF for code completion
Denoising Levy Probabilistic Models
Why In-Context Learning Models are Good Few-Shot Learners?
Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions
Co$^{\mathbf{3}}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Latent Action Pretraining from Videos
Improving Generalization and Robustness in SNNs Through Signed Rate Encoding and Sparse Encoding Attacks
Event-Driven Online Vertical Federated Learning
Do WGANs succeed because they minimize the Wasserstein Distance? Lessons from Discrete Generators
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Training Free Exponential Context Extension via Cascading KV Cache
Mixture Compressor for Mixture-of-Experts LLMs Gains More
A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention
Transformers Handle Endogeneity in In-Context Linear Regression
Language-Assisted Feature Transformation for Anomaly Detection
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
The Illustrated AlphaFold
Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation
Improved Convergence Rate for Diffusion Probabilistic Models
Probing the Latent Hierarchical Structure of Data via Diffusion Models
SymDiff: Equivariant Diffusion via Stochastic Symmetrisation
Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors
L3Ms — Lagrange Large Language Models
Learning High-Degree Parities: The Crucial Role of the Initialization
Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents
Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
A Watermark for Order-Agnostic Language Models
Efficient Neuron Segmentation in Electron Microscopy by Affinity-Guided Queries
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
LoLCATs: On Low-Rank Linearizing of Large Language Models
DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation
On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding
Amortized Control of Continuous State Space Feynman-Kac Model for Irregular Time Series
DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models
Effective post-training embedding compression via temperature control in contrastive training
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
In vivo cell-type and brain region classification via multimodal contrastive learning
Spectral-Refiner: Accurate Fine-Tuning of Spatiotemporal Fourier Neural Operator for Turbulent Flows
LucidPPN: Unambiguous Prototypical Parts Network for User-centric Interpretable Computer Vision
DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL
FlashRNN: I/O-Aware Optimization of Traditional RNNs on modern hardware
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Vision-LSTM: xLSTM as Generic Vision Backbone
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Efficiently Parameterized Neural Metriplectic Systems
MatExpert: Decomposing Materials Discovery By Mimicking Human Experts
VAE-Var: Variational Autoencoder-Enhanced Variational Methods for Data Assimilation in Meteorology
Neural Stochastic Differential Equations for Uncertainty-Aware Offline RL
Stiefel Flow Matching for Moment-Constrained Structure Elucidation
Robust Root Cause Diagnosis using In-Distribution Interventions
CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features
Efficient and Robust Neural Combinatorial Optimization via Wasserstein-Based Coresets
Safety-Prioritizing Curricula for Constrained Reinforcement Learning
Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
Benchmarking LLMs' Judgments with No Gold Standard
UIFace: Unleashing Inherent Model Capabilities to Enhance Intra-Class Diversity in Synthetic Face Recognition
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
Transformers Provably Learn Two-Mixture of Linear Classification via Gradient Flow
Forewarned is Forearmed: Harnessing LLMs for Data Synthesis via Failure-induced Exploration
CheapNet: Cross-attention on Hierarchical representations for Efficient protein-ligand binding Affinity Prediction
Provable Uncertainty Decomposition via Higher-Order Calibration
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
A Generic Framework for Conformal Fairness
Representative Guidance: Diffusion Model Sampling with Coherence
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Optimizing importance weighting in the presence of sub-population shifts
Synthesizing Realistic fMRI: A Physiological Dynamics-Driven Hierarchical Diffusion Model for Efficient fMRI Acquisition
Logical Consistency of Large Language Models in Fact-Checking
ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids
TAU-106K: A New Dataset for Comprehensive Understanding of Traffic Accident
Conditional Diffusion Models are Minimax-Optimal and Manifold-Adaptive for Conditional Distribution Estimation
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Elliptic Loss Regularization
Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis
How to visualize training dynamics in neural networks
Progressive Compression with Universally Quantized Diffusion Models
AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data
Private Mechanism Design via Quantile Estimation
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Implicit Bias of Mirror Flow for Shallow Neural Networks in Univariate Regression
Multi-Label Test-Time Adaptation with Bound Entropy Minimization
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
Trusted Multi-View Classification via Evolutionary Multi-View Fusion
Federated Residual Low-Rank Adaption of Large Language Models
Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
On the Importance of Language-driven Representation Learning for Heterogeneous Federated Learning
Pareto Prompt Optimization
Revisiting Random Walks for Learning on Graphs
Algorithmic Stability Based Generalization Bounds for Adversarial Training
A Generalist Hanabi Agent
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Following the Human Thread in Social Navigation
Population Transformer: Learning Population-level Representations of Neural Activity
Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
LancBiO: Dynamic Lanczos-aided Bilevel Optimization via Krylov Subspace
ReAttention: Training-Free Infinite Context with Finite Attention Scope
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans
Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder
Matrix Product Sketching via Coordinated Sampling
Uncovering Latent Memories in Large Language Models
PIORF: Physics-Informed Ollivier-Ricci Flow for Long–Range Interactions in Mesh Graph Neural Networks
Learning Spatiotemporal Dynamical Systems from Point Process Observations
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
TimeInf: Time Series Data Contribution via Influence Functions
Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups
Multi-Field Adaptive Retrieval
Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Learning General-purpose Biomedical Volume Representations using Randomized Synthesis
Context Steering: Controllable Personalization at Inference Time
Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
ST-GCond: Self-supervised and Transferable Graph Dataset Condensation
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
A new framework for evaluating model out-of-distribution generalisation for the biochemical domain
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment
Improving Equivariant Networks with Probabilistic Symmetry Breaking
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Conformal Structured Prediction
Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence
Rare event modeling with self-regularized normalizing flows: what can we learn from a single failure?
Adaptive Pruning of Pretrained Transformer via Differential Inclusions
Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
SOREL: A Stochastic Algorithm for Spectral Risks Minimization
Controlling Language and Diffusion Models by Transporting Activations
TopoDiffusionNet: A Topology-aware Diffusion Model
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
FairDen: Fair Density-Based Clustering
Diffusion Models are Evolutionary Algorithms
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Discrete Copula Diffusion
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Learn hybrid prototypes for multivariate time series anomaly detection
Efficient Biological Data Acquisition through Inference Set Design
MetaOOD: Automatic Selection of OOD Detection Models
Transformer Block Coupling and its Correlation with Generalization in LLMs
Collapsed Language Models Promote Fairness
Glauber Generative Model: Discrete Diffusion Models via Binary Classification
SMITE: Segment Me In TimE
Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
Generative Monoculture in Large Language Models
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Image and Video Tokenization with Binary Spherical Quantization
Training Robust Ensembles Requires Rethinking Lipschitz Continuity
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Distilling Structural Representations into Protein Sequence Models
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs
COPER: Correlation-based Permutations for Multi-View Clustering
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
KinPFN: Bayesian Approximation of RNA Folding Kinetics using Prior-Data Fitted Networks
SMT: Fine-Tuning Large Language Models with Sparse Matrices
Provence: efficient and robust context pruning for retrieval-augmented generation
Long-Context Linear System Identification
Spectro-Riemannian Graph Neural Networks
Efficient Evolutionary Search Over Chemical Space with Large Language Models
Efficient Sparse PCA via Block-Diagonalization
Chain-of-region: Visual Language Models Need Details for Diagram Analysis
Reasoning of Large Language Models over Knowledge Graphs with Super-Relations
Aligned LLMs Are Not Aligned Browser Agents
SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
Minimal Impact ControlNet: Advancing Multi-ControlNet Integration
Unbounded: A Generative Infinite Game of Character Life Simulation
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
A Unified Framework for Forward and Inverse Problems in Subsurface Imaging using Latent Space Translations
Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis
MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation
Transformers Struggle to Learn to Search
Adversarial Latent Feature Augmentation for Fairness
Simple Guidance Mechanisms for Discrete Diffusion Models
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
Learning Randomized Algorithms with Transformers
Encryption-Friendly LLM Architecture
Bonsai: Gradient-free Graph Condensation for Node Classification
Sensitivity Verification for Additive Decision Tree Ensembles
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
An Undetectable Watermark for Generative Image Models
Diffusion Models and Gaussian Flow Matching: Two Sides of the Same Coin
Differentiable Causal Discovery for Latent Hierarchical Causal Models
On the Relation between Trainability and Dequantization of Variational Quantum Learning Models
Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees
Lawma: The Power of Specialization for Legal Annotation
Fengbo: a Clifford Neural Operator pipeline for 3D PDEs in Computational Fluid Dynamics
Problem-Parameter-Free Federated Learning
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation
Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Differentiable Optimization of Similarity Scores Between Models and Brains
Progressive distillation induces an implicit curriculum
Enhanced Diffusion Sampling via Extrapolation with Multiple ODE Solutions
McEval: Massively Multilingual Code Evaluation
E(n) Equivariant Topological Neural Networks
Tell me about yourself: LLMs are aware of their learned behaviors
Conformal Language Model Reasoning with Coherent Factuality
Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback
Finding and Only Finding Differential Nash Equilibria by Both Pretending to be a Follower
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It
Self-Improving Robust Preference Optimization
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
Fugatto 1: Foundational Generative Audio Transformer Opus 1
Gumbel Counterfactual Generation From Language Models
Linear Representations of Political Perspective Emerge in Large Language Models
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Why Does the Effective Context Length of LLMs Fall Short?
Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
Pitfalls of Evidence-Based AI Policy
Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
Model-Agnostic Knowledge Guided Correction for Improved Neural Surrogate Rollout
Learning Splitting Heuristics in Divide-and-Conquer SAT Solvers with Reinforcement Learning
Singular Subspace Perturbation Bounds via Rectangular Random Matrix Diffusions
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Leveraging Variable Sparsity to Refine Pareto Stationarity in Multi-Objective Optimization
Stochastic Bandits Robust to Adversarial Attacks
Model-Free Offline Reinforcement Learning with Enhanced Robustness
Efficient stagewise pretraining via progressive subnetworks
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
SONICS: Synthetic Or Not - Identifying Counterfeit Songs
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning
Learned Reference-based Diffusion Sampler for multi-modal distributions
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments
Effective Interplay between Sparsity and Quantization: From Theory to Practice
On the Convergence of No-Regret Dynamics in Information Retrieval Games with Proportional Ranking Functions
Diffusion Models as Cartoonists: The Curious Case of High Density Regions
Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Generalization and Distributed Learning of GFlowNets
ToolGen: Unified Tool Retrieval and Calling via Generation
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
WardropNet: Traffic Flow Predictions via Equilibrium-Augmented Learning
When do GFlowNets learn the right distribution?
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking
Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models
Towards a Complete Logical Framework for GNN Expressiveness
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL
Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model
Scaling Laws for Precision
Human-inspired Episodic Memory for Infinite Context LLMs
Rethinking Shapley Value for Negative Interactions in Non-convex Games
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Matérn Kernels for Tunable Implicit Surface Reconstruction
Brain-inspired $L_p$-Convolution benefits large kernels and aligns better with visual cortex
In Search of the Engram in LLMs: A Neuroscience Perspective on the Memory Functions in AI Models
GLoRa: A Benchmark to Evaluate the Ability to Learn Long-Range Dependencies in Graphs
Progress or Regress? Self-Improvement Reversal in Post-training
MAESTRO: Masked Encoding Set Transformer with Self-Distillation
dEBORA: Efficient Bilevel Optimization-based low-Rank Adaptation
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Endless Jailbreaks with Bijection Learning
Multi-session, multi-task neural decoding from distinct cell-types and brain regions
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
Offline Model-Based Optimization by Learning to Rank
Open-CK: A Large Multi-Physics Fields Coupling benchmarks in Combustion Kinetics
Self-Improvement in Language Models: The Sharpening Mechanism
RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation
CameraCtrl: Enabling Camera Control for Video Diffusion Models
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Feature-Based Online Bilateral Trade
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Systematic Relational Reasoning With Epistemic Graph Neural Networks
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
System 1.x: Learning to Balance Fast and Slow Planning with Language Models
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
Halton Scheduler for Masked Generative Image Transformer
Continuity-Preserving Convolutional Autoencoders for Learning Continuous Latent Dynamical Models from Images
$\sigma$-zero: Gradient-based Optimization of $\ell_0$-norm Adversarial Examples
pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
Comparing noisy neural population dynamics using optimal transport distances
Physics-Informed Deep Inverse Operator Networks for Solving PDE Inverse Problems
ALLaM: Large Language Models for Arabic and English
ParFam -- (Neural Guided) Symbolic Regression via Continuous Global Optimization
OSDA Agent: Leveraging Large Language Models for De Novo Design of Organic Structure Directing Agents
Progressive Compositionality in Text-to-Image Generative Models
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
Learning to engineer protein flexibility
OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
ACES: Automatic Cohort Extraction System for Event-Stream Datasets
Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
GRAIN: Exact Graph Reconstruction from Gradients
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
Gradient descent with generalized Newton’s method
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Nesterov acceleration in benignly non-convex landscapes
Towards hyperparameter-free optimization with differential privacy
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
From Attention to Activation: Unraveling the Enigmas of Large Language Models
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
Long-time asymptotics of noisy SVGD outside the population limit
MamKO: Mamba-based Koopman operator for modeling and predictive control
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Streaming Algorithms For $\ell_p$ Flows and $\ell_p$ Regression
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency
A Theory for Token-Level Harmonization in Retrieval-Augmented Generation
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Model merging with SVD to tie the Knots
Temporal Reasoning Transfer from Text to Video
Revealing the 3D Cosmic Web through Gravitationally Constrained Neural Fields
Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators
Reassessing EMNLP 2024’s Best Paper: Does Divergence-Based Calibration for MIAs Hold Up?
Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
Herald: A Natural Language Annotated Lean 4 Dataset
SoftCVI: Contrastive variational inference with self-generated soft labels
ContraDiff: Planning Towards High Return States via Contrastive Learning
Class Distribution-induced Attention Map for Open-vocabulary Semantic Segmentations
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate
Rethinking Fair Representation Learning for Performance-Sensitive Tasks
Reconstruction-Guided Policy: Enhancing Decision-Making through Agent-Wise State Consistency
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
From Probability to Counterfactuals: the Increasing Complexity of Satisfiability in Pearl's Causal Hierarchy
Exploring The Loss Landscape Of Regularized Neural Networks Via Convex Duality
Reasoning Elicitation in Language Models via Counterfactual Feedback
TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models
InCoDe: Interpretable Compressed Descriptions For Image Generation
Coreset Spectral Clustering
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Models
Efficient and Accurate Explanation Estimation with Distribution Compression
Fundamental Limitations on Subquadratic Alternatives to Transformers
Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data
A Meta-Learning Approach to Bayesian Causal Discovery
Start Smart: Leveraging Gradients For Enhancing Mask-based XAI Methods
Learning Chaos In A Linear Way
Improving Instruction-Following in Language Models through Activation Steering
Unearthing Skill-level Insights for Understanding Trade-offs of Foundation Models
Union-over-Intersections: Object Detection beyond Winner-Takes-All
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Implicit Neural Surface Deformation with Explicit Velocity Fields
Intelligence at the Edge of Chaos
Multimodal Situational Safety
Analysing The Spectral Biases in Generative Models
Learning to Communicate Through Implicit Communication Channels
Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control
Learning Continually by Spectral Regularization
MGDA Converges under Generalized Smoothness, Provably
Boundary constrained Gaussian processes for robust physics-informed machine learning of linear partial differential equations
Multi-Task Dense Predictions via Unleashing the Power of Diffusion
Law of the Weakest Link: Cross Capabilities of Large Language Models
InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma
Bayesian Regularization of Latent Representation
Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors
BOND: Aligning LLMs with Best-of-N Distillation
Can LLMs Solve Longer Math Word Problems Better?
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
Can Knowledge Editing Really Correct Hallucinations?
Improving Uncertainty Estimation through Semantically Diverse Language Generation
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation
Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study
Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation
Solving Differential Equations with Constrained Learning
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
On Generalization Across Environments In Multi-Objective Reinforcement Learning
TASAR: Transfer-based Attack on Skeletal Action Recognition
Diffusion Policy Policy Optimization
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding
DEPfold: RNA Secondary Structure Prediction as Dependency Parsing
Open-Source vs Close-Source: The Context Utilization Challenge
Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
Enhancing Language Model Agents using Diversity of Thoughts
Variational Bayesian Pseudo-Coreset
ThinK: Thinner Key Cache by Query-Driven Pruning
Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes
Efficient Inference for Large Language Model-based Generative Recommendation
Scaling up the Banded Matrix Factorization Mechanism for Large Scale Differentially Private ML
NetMoE: Accelerating MoE Training through Dynamic Sample Placement
Adversarial Machine Unlearning
Storybooth: Training-Free Multi-Subject Consistency for Improved Visual Storytelling
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Morphing Tokens Draw Strong Masked Image Models
BoneMet: An Open Large-Scale Multi-Modal Murine Dataset for Breast Cancer Bone Metastasis Diagnosis and Prognosis
Adaptive Camera Sensor for Vision Models
Nonlinear multiregion neural dynamics with parametric impulse response communication channels
DELIFT: Data Efficient Language model Instruction Fine-Tuning
On the Almost Sure Convergence of the Stochastic Three Points Algorithm
On-the-fly Preference Alignment via Principle-Guided Decoding
Normed Spaces for Graph Embedding
Asymptotic Analysis of Two-Layer Neural Networks after One Gradient Step under Gaussian Mixtures Data with Structure
Dataset Ownership Verification in Contrastive Pre-trained Models
Equivariant Symmetry Breaking Sets
Size-Generalizable RNA Structure Evaluation by Exploring Hierarchical Geometries
Semialgebraic Neural Networks: From roots to representations
Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning
Towards Synergistic Path-based Explanations for Knowledge Graph Completion: Exploration and Evaluation
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
Flow matching achieves almost minimax optimal convergence
LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion
Direct Distributional Optimization for Provable Alignment of Diffusion Models
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation
Learning to Steer Markovian Agents under Model Uncertainty
PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
Flow With What You Know
KLay: Accelerating Arithmetic Circuits for Neurosymbolic AI
Difference-of-submodular Bregman Divergence
Transformers are Universal In-context Learners
Differential learning kinetics govern the transition from memorization to generalization during in-context learning
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Improving Convergence Guarantees of Random Subspace Second-order Algorithm for Nonconvex Optimization
GameGen-X: Interactive Open-world Game Video Generation
Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
ContextGNN: Beyond Two-Tower Recommendation Systems
Machine Unlearning Fails to Remove Data Poisoning Attacks
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Vector-ICL: In-context Learning with Continuous Vector Representations
TabM: Advancing tabular deep learning with parameter-efficient ensembling
Fourier Sliced-Wasserstein Embedding for Multisets and Measures
Minimax Optimal Reinforcement Learning with Quasi-Optimism
EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
RFMamba: Frequency-Aware State Space Model for RF-Based Human-Centric Perception
EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing
Revealing and Mitigating Over-Attention in Knowledge Editing
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
When narrower is better: the narrow width limit of Bayesian parallel branching neural networks
MeshMask: Physics-Based Simulations with Masked Graph Neural Networks
Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires
Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions
Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization
NextBestPath: Efficient 3D Mapping of Unseen Environments
Safety Representations for Safer Policy Learning
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
MotherNet: Fast Training and Inference via Hyper-Network Transformers
No Preference Left Behind: Group Distributional Preference Optimization
RankSHAP: Shapley Value Based Feature Attributions for Learning to Rank
Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding
Physiome-ODE: A Benchmark for Irregularly Sampled Multivariate Time-Series Forecasting Based on Biological ODEs
Agent-Oriented Planning in Multi-Agent Systems
Differentially private learners for heterogeneous treatment effects
Probabilistic Conformal Prediction with Approximate Conditional Validity
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
Meta-Dynamical State Space Models for Integrative Neural Data Analysis
Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Provable Convergence Bounds for Hybrid Dynamical Sampling and Optimization
Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control
A Rainbow in Deep Network Black Boxes
Large Scale Knowledge Washing
The Effectiveness of Curvature-Based Rewiring and the Role of Hyperparameters in GNNs Revisited
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
Multi-Dimensional Conformal Prediction
MorphoDiff: Cellular Morphology Painting with Diffusion Models
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
On Linear Representations and Pretraining Data Frequency in Language Models
PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations
Exploring channel distinguishability in local neighborhoods of the model space in quantum neural networks
Rethinking Graph Neural Networks From A Geometric Perspective Of Node Features
InfoGS: Efficient Structure-Aware 3D Gaussians via Lightweight Information Shaping
GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
IV-mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
Animate Your Thoughts: Reconstruction of Dynamic Natural Vision from Human Brain Activity
Unified Parameter-Efficient Unlearning for LLMs
Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate
Consistency Checks for Language Model Forecasters
Multi-objective Differentiable Neural Architecture Search
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs
Quantitative Approximation for Neural Operators in Nonlinear Parabolic Equations
Neural Interactive Proofs
RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code
CAX: Cellular Automata Accelerated in JAX
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Modeling dynamic social vision highlights gaps between deep learning and humans
LLM-based Typed Hyperresolution for Commonsense Reasoning with Knowledge Bases
MoLEx: Mixture of Layer Experts for Fine-tuning with Sparse Upcycling
Generalization in VAE and Diffusion Models: A Unified Information-Theoretic Analysis
Transformer Meets Twicing: Harnessing Unattended Residual Information
RAG-SR: Retrieval-Augmented Generation for Neural Symbolic Regression
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Tight Clusters Make Specialized Experts
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
CAMEx: Curvature-aware Merging of Experts
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
Distribution-Specific Agnostic Conditional Classification With Halfspaces
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment
Learning to Discretize Denoising Diffusion ODEs
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning
REBIND: Enhancing Ground-state Molecular Conformation Prediction via Force-Based Graph Rewiring
Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models
Test-Time Ensemble via Linear Mode Connectivity: A Path to Better Adaptation
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
TopoGaussian: Inferring Internal Topology Structures from Visual Clues
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Inverse Attention Agents for Multi-Agent Systems
ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
SelectFormer in Data Markets: Privacy-Preserving and Efficient Data Selection for Transformers with Multi-Party Computation
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Differentiable Rule Induction from Raw Sequence Inputs
Atlas Gaussians Diffusion for 3D Generation
Generalizing Reasoning Problems to Longer Lengths
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments
Intrinsic User-Centric Interpretability through Global Mixture of Experts
RegMix: Data Mixture as Regression for Language Model Pre-training
Inner Information Analysis Algorithm for Deep Neural Network based on Community
Verifying Properties of Binary Neural Networks Using Sparse Polynomial Optimization
Scaling up Masked Diffusion Models on Text
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning
Optimal Flow Transport and its Entropic Regularization: a GPU-friendly Matrix Iterative Algorithm for Flow Balance Satisfaction
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
IDInit: A Universal and Stable Initialization Method for Neural Network Training
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation
Action Sequence Augmentation for Action Anticipation
IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention
AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies
The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD
Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers
Near-Exact Privacy Amplification for Matrix Mechanisms
Selective Aggregation for Low-Rank Adaptation in Federated Learning
Adaptive Shrinkage Estimation for Personalized Deep Kernel Regression in Modeling Brain Trajectories
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
GameArena: Evaluating LLM Reasoning through Live Computer Games
Learning to Select Nodes in Branch and Bound with Sufficient Tree Representation
Looking into User’s Long-term Interests through the Lens of Conservative Evidential Learning
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Computational Explorations of Total Variation Distance
Presto! Distilling Steps and Layers for Accelerating Music Generation
TopoLM: brain-like spatio-functional organization in a topographic language model
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Convex Formulations for Training Two-Layer ReLU Neural Networks
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Grokking at the Edge of Numerical Stability
Accelerating Task Generalisation with Multi-Level Skill Hierarchies
SSOLE: Rethinking Orthogonal Low-rank Embedding for Self-Supervised Learning
Regretful Decisions under Label Noise
Large Convolutional Model Tuning via Filter Subspace
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
Improving Language Model Distillation through Hidden State Matching
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Lightweight Neural App Control
Plastic Learning with Deep Fourier Features
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
DPaI: Differentiable Pruning at Initialization with Node-Path Balance Principle
Erasing Concept Combination from Text-to-Image Diffusion Model
Test-time Adaptation for Cross-modal Retrieval with Query Shift
What's the Move? Hybrid Imitation Learning via Salient Points
Metalic: Meta-Learning In-Context with Protein Language Models
Unlocking Global Optimality in Bilevel Optimization: A Pilot Study
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
Balancing Act: Diversity and Consistency in Large Language Model Ensembles
What should a neuron aim for? Designing local objective functions based on information theory
Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
Fast Summation of Radial Kernels via QMC Slicing
DELTA: Dense Efficient Long-Range 3D Tracking for Any Video
Tailoring Mixup to Data for Calibration
Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
BlendRL: A Framework for Merging Symbolic and Neural Policy Learning
MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
GMValuator: Similarity-based Data Valuation for Generative Models
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
LeanVec: Searching vectors faster by making them fit
HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging
Gated Delta Networks: Improving Mamba2 with Delta Rule
No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models
Efficient Dictionary Learning with Switch Sparse Autoencoders
Curriculum-aware Training for Discriminating Molecular Property Prediction Models
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Trajectory attention for fine-grained video motion control
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Surprising Effectiveness of pretraining Ternary Language Model at Scale
PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks
Rationalizing and Augmenting Dynamic Graph Neural Networks
Large Language Models can Become Strong Self-Detoxifiers
From Promise to Practice: Realizing High-performance Decentralized Training
Which Tasks Should Be Compressed Together? A Causal Discovery Approach for Efficient Multi-Task Representation Compression
High-Quality Joint Image and Video Tokenization with Causal VAE
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Attribute-based Visual Reprogramming for Vision-Language Models
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
VideoPhy: Evaluating Physical Commonsense for Video Generation
Text-to-Image Rectified Flow as Plug-and-Play Priors
On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning
Evidential Learning-based Certainty Estimation for Robust Dense Feature Matching
Long-Sequence Recommendation Models Need Decoupled Embeddings
Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
AtomSurf: Surface Representation for Learning on Protein Structures
Policy Design in Long-run Welfare Dynamics
Controllable Satellite-to-Street-View Synthesis with Precise Pose Alignment and Zero-Shot Environmental Control
ImProver: Agent-Based Automated Proof Optimization
KAA: Kolmogorov-Arnold Attention for Enhancing Attentive Graph Neural Networks
Policy Gradient with Kernel Quadrature
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
DECO: Unleashing the Potential of ConvNets for Query-based Detection and Segmentation
A Theoretical Framework for Partially-Observed Reward States in RLHF
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
Global Convergence in Neural ODEs: Impact of Activation Functions
InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems
Repetition Improves Language Model Embeddings
SPA-Bench: A Comprehensive Benchmark for Smartphone Agent Evaluation
h4rm3l: A Language for Composable Jailbreak Attack Synthesis
Black-Box Detection of Language Model Watermarks
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Regulatory DNA Sequence Design with Reinforcement Learning
Learning Transformer-based World Models with Contrastive Predictive Coding
Decoupling Angles and Strength in Low-rank Adaptation
Variational Search Distributions
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences
DeeperForward: Enhanced Forward-Forward Training for Deeper and Better Performance
ELBOing Stein: Variational Bayes with Stein Mixture Inference
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
Spherical Tree-Sliced Wasserstein Distance
Distance-Based Tree-Sliced Wasserstein Distance
MAI: A Multi-turn Aggregation-Iteration Model for Composed Image Retrieval
Disentangled Representation Learning with the Gromov-Monge Gap
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Chunk-Distilled Language Modeling
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Diffusion Models Are Real-Time Game Engines
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping
Towards Unified Human Motion-Language Understanding via Sparse Interpretable Characterization
Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training
SelKD: Selective Knowledge Distillation via Optimal Transport Perspective
Regularizing Energy among Training Samples for Out-of-Distribution Generalization
Rethinking and Improving Autoformalization: Towards a Faithful Metric and a Dependency Retrieval-based Approach
Learning Structured Universe Graph with Outlier OOD Detection for Partial Matching
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
Locality Alignment Improves Vision-Language Models
Robust LLM safeguarding via refusal feature adversarial training
UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP
Lasso Bandit with Compatibility Condition on Optimal Arm
Preference Elicitation for Offline Reinforcement Learning
Unify ML4TSP: Drawing Methodological Principles for TSP and Beyond from Streamlined Design Space of Learning and Search
Guaranteed Generation from Large Language Models
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
Zero-Shot Natural Language Explanations
Learning Geometric Reasoning Networks For Robot Task And Motion Planning
Learning to Help in Multi-Class Settings
Variational Diffusion Posterior Sampling with Midpoint Guidance
Disentangling Representations through Multi-task Learning
Prompting Fairness: Integrating Causality to Debias Large Language Models
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
Causal Discovery via Bayesian Optimization
Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs
Contrastive Learning from Synthetic Audio Doppelgängers
When Selection Meets Intervention: Additional Complexities in Causal Discovery
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Dynamic Negative Guidance of Diffusion Models
Bilinear MLPs enable weight-based mechanistic interpretability
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Model Using Implicit Feedback from Pre-training Demonstrations
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
AdaFisher: Adaptive Second Order Optimization via Fisher Information
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models
The Directionality of Optimization Trajectories in Neural Networks
Expressivity of Neural Networks with Random Weights and Learned Biases
Beyond Random Augmentations: Pretraining with Hard Views
Attention layers provably solve single-location regression
Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn’t and What's Next
Lines of Thought in Large Language Models
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
Training-Free Diffusion Model Alignment with Sampling Demons
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Towards Certification of Uncertainty Calibration under Adversarial Attacks
Shh, don't say that! Domain Certification in LLMs
Revisiting Feature Prediction for Learning Visual Representations from Video
Simulating Human-like Daily Activities with Desire-driven Autonomy
Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Uncertainty-Aware Decoding with Minimum Bayes Risk
TopoNets: High performing vision and language models with brain-like topography
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
FormalAlign: Automated Alignment Evaluation for Autoformalization
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
PharmacoMatch: Efficient 3D Pharmacophore Screening via Neural Subgraph Matching
Tracking objects that change in appearance with phase synchrony
InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
CoInD: Enabling Logical Compositions in Diffusion Models
Descent with Misaligned Gradients and Applications to Hidden Convexity
OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes
Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Diffusion State-Guided Projected Gradient for Inverse Problems
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Filtered not Mixed: Filtering-Based Online Gating for Mixture of Large Language Models
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
Learning from weak labelers as constraints
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment
Neuron Platonic Intrinsic Representation From Dynamics Using Contrastive Learning
Instant Policy: In-Context Imitation Learning via Graph Diffusion
Group Downsampling with Equivariant Anti-aliasing
LoRA Learns Less and Forgets Less
A Distributional Approach to Uncertainty-Aware Preference Alignment Using Offline Demonstrations
Estimating the Probabilities of Rare Outputs in Language Models
Self-Normalized Resets for Plasticity in Continual Learning
DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
On Minimizing Adversarial Counterfactual Error in Adversarial Reinforcement Learning
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization
Reward Learning from Multiple Feedback Types
Probabilistic Neural Pruning via Sparsity Evolutionary Fokker-Planck-Kolmogorov Equation
Training on the Test Task Confounds Evaluation and Emergence
Homomorphism Expressivity of Spectral Invariant Graph Neural Networks
Towards Marginal Fairness Sliced Wasserstein Barycenter
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
CO-MOT: Boosting End-to-end Transformer-based Multi-Object Tracking via Coopetition Label Assignment and Shadow Sets
Does Spatial Cognition Emerge in Frontier Models?
Chemistry-Inspired Diffusion with Non-Differentiable Guidance
Cut Your Losses in Large-Vocabulary Language Models
COME: Test-time Adaption by Conservatively Minimizing Entropy
CoMotion: Concurrent Multi-person 3D Motion
GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints
Oracle efficient truncated statistics
Training Free Guided Flow-Matching with Optimal Control
6D Object Pose Tracking in Internet Videos for Robotic Manipulation
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
MGCFNN: A Neural MultiGrid Solver with Novel Fourier Neural Network for High Wave Number Helmholtz Equations
BTBS-LNS: Binarized-Tightening, Branch and Search on Learning LNS Policies for MIP
How new data permeates LLM knowledge and how to dilute it
Pre-training of Foundation Adapters for LLM Fine-tuning
Spiking Vision Transformer with Saccadic Attention
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals
Diffusion Feedback Helps CLIP See Better
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Agents' Room: Narrative Generation through Multi-step Collaboration
ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
Efficient Active Imitation Learning with Random Network Distillation
A Computational Framework for Modeling Emergence of Color Vision in the Human Brain
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Do Mice Grok? Glimpses of Hidden Progress in Sensory Cortex
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Unsupervised Multiple Kernel Learning for Graphs via Ordinality Preservation
Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating
Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
Salvage: Shapley-distribution Approximation Learning Via Attribution Guided Exploration for Explainable Image Classification
Collaborative Discrete-Continuous Black-Box Prompt Learning for Language Models
Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions
Adversarial Training for Defense Against Label Poisoning Attacks
qNBO: quasi-Newton Meets Bilevel Optimization
Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling
Generalizable Human Gaussians from Single-View Image
Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD
Certified Robustness Under Bounded Levenshtein Distance
Block-Attention for Efficient Prefilling
Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs
Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference
TODO: Enhancing LLM Alignment with Ternary Preferences
Learning Dynamics of LLM Finetuning
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Injective flows for star-like manifolds
Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View
Causally Motivated Sycophancy Mitigation for Large Language Models
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Flaws of ImageNet, Computer Vision's Favourite Dataset
Influence Functions for Scalable Data Attribution in Diffusion Models
HShare: Fast LLM Decoding by Hierarchical Key-Value Sharing
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
Indirect Gradient Matching for Adversarial Robust Distillation
Streamlining Redundant Layers to Compress Large Language Models
Lossy Compression with Pretrained Diffusion Models
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
Finally Rank-Breaking Conquers MNL Bandits: Optimal and Efficient Algorithms for MNL Assortment
Multi-agent cooperation through learning-aware policy gradients
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation
The Pitfalls of Memorization: When Memorization Hurts Generalization
IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Feedback Schrödinger Bridge Matching
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
Deep Distributed Optimization for Large-Scale Quadratic Programming
Field-DiT: Diffusion Transformer on Unified Video, 3D, and Game Field Generation
Safety Layers in Aligned Large Language Models: The Key to LLM Security
PIN: Prolate Spheroidal Wave Function-based Implicit Neural Representations
Learning Harmonized Representations for Speculative Sampling
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning
REEF: Representation Encoding Fingerprints for Large Language Models
From Tokens to Lattices: Emergent Lattice Structures in Language Models
The Computational Complexity of Positive Non-Clashing Teaching in Graphs
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Accelerating Diffusion Transformers with Token-wise Feature Caching
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Extendable and Iterative Structure Learning Strategy for Bayesian Networks
Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
Efficient Reinforcement Learning with Large Language Model Priors
KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics
Transformers Provably Solve Parity Efficiently with Chain of Thought
Rethinking Classifier Re-Training in Long-Tailed Recognition: Label Over-Smooth Can Balance
Pedestrian Motion Reconstruction: A Large-scale Benchmark via Mixed Reality Rendering with Multiple Perspectives and Modalities
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
Models trained with unnormalized density functions: A need for a course correction
Composing Unbalanced Flows for Flexible Docking and Relaxation
REMEDY: Recipe Merging Dynamics in Large Vision-Language Models
Generalized Behavior Learning from Diverse Demonstrations
Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency
Holographic Node Representations: Pre-training Task-Agnostic Node Embeddings
Realistic Evaluation of Deep Partial-Label Learning Algorithms
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction
Closed-Form Merging of Parameter-Efficient Modules for Federated Continual Learning
Noise Separation guided Candidate Label Reconstruction for Noisy Partial Label Learning
ILLUSION: Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
Boosting Multiple Views for pretrained-based Continual Learning
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
PaLD: Detection of Text Partially Written by Large Language Models
Benchmarking Agentic Workflow Generation
SAVA: Scalable Learning-Agnostic Data Valuation
ADMM for Structured Fractional Minimization
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
Projection Head is Secretly an Information Bottleneck
Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models
Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation
LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
MindSimulator: Exploring Brain Concept Localization via Synthetic fMRI
Self-Evolved Reward Learning for LLMs
RuAG: Learned-rule-augmented Generation for Large Language Models
On the Adversarial Vulnerability of Label-Free Test-Time Adaptation
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models
ReCogLab: a framework testing relational reasoning & cognitive hypotheses on LLMs
GraphRouter: A Graph-based Router for LLM Selections
SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Anti-Exposure Bias in Diffusion Models
Easing Training Process of Rectified Flow Models Via Lengthening Inter-Path Distance
How Does Critical Batch Size Scale in Pre-training?
A New Perspective on Shampoo's Preconditioner
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
Mixture of Parrots: Experts improve memorization more than reasoning
Revisiting a Design Choice in Gradient Temporal Difference Learning
Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
Doubly Optimal Policy Evaluation for Reinforcement Learning
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
Rethinking the generalization of drug target affinity prediction algorithms via similarity aware evaluation
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation
Improving Deep Regression with Tightness
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Generative Verifiers: Reward Modeling as Next-Token Prediction
GSE: Group-wise Sparse and Explainable Adversarial Attacks
Understanding Methods for Scalable MCTS
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry
eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels
AnoLLM: Large Language Models for Tabular Anomaly Detection
The impact of allocation strategies in subset learning on the expressive power of neural networks
Fine-tuning can Help Detect Pretraining Data from Large Language Models
Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation
Exploring Learning Complexity for Efficient Downstream Dataset Pruning
REvolve: Reward Evolution with Large Language Models using Human Feedback
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Wavelet Diffusion Neural Operator
CL-DiffPhyCon: Closed-loop Diffusion Control of Complex Physical Systems
Metamizer: A Versatile Neural Optimizer for Fast and Accurate Physics Simulations
Long-tailed Adversarial Training with Self-Distillation
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Nonlinear Sequence Embedding by Monotone Variational Inequality
Kernel-based Optimally Weighted Conformal Time-Series Prediction
Agree to Disagree: Demystifying Homogeneous Deep Ensembles through Distributional Equivalence
Quantum (Inspired) $D^2$-sampling with Applications
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
Concept Bottleneck Language Models For Protein Design
Discovering Clone Negatives via Adaptive Contrastive Learning for Image-Text Matching
PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
A Closer Look at Machine Unlearning for Large Language Models
Looking Inward: Language Models Can Learn About Themselves by Introspection
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
Influence-Guided Diffusion for Dataset Distillation
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Glad: A Streaming Scene Generator for Autonomous Driving
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks
Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion
Circuit Transformer: A Transformer That Preserves Logical Equivalence
Information Theoretic Text-to-Image Alignment
GeoILP: A Synthetic Dataset to Guide Large-Scale Rule Induction
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
CREIMBO: Cross-Regional Ensemble Interactions in Multi-view Brain Observations
EvA: Erasing Spurious Correlations with Activations
SysBench: Can LLMs Follow System Message?
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Federated Class-Incremental Learning: A Hybrid Approach Using Latent Exemplars and Data-Free Techniques to Address Local and Global Forgetting
ProtPainter: Draw or Drag Protein via Topology-guided Diffusion
Redefining the task of Bioactivity Prediction
CtD: Composition through Decomposition in Emergent Communication
Reframing Structure-Based Drug Design Model Evaluation via Metrics Correlated to Practical Needs
Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior
Mask in the Mirror: Implicit Sparsification
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model
Score-based Self-supervised MRI Denoising
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
Contextualizing biological perturbation experiments through language
PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
Local Patterns Generalize Better for Novel Anomalies
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Model Risk-sensitive Offline Reinforcement Learning
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
CViT: Continuous Vision Transformer for Operator Learning
Rethinking Multiple-Instance Learning From Feature Space to Probability Space
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
TabWak: A Watermark for Tabular Diffusion Models
Interpretable Unsupervised Joint Denoising and Enhancement for Real-World low-light Scenarios
FIRING-Net: A filtered feature recycling network for speech enhancement
Can We Talk Models Into Seeing the World Differently?
ZooProbe: A Data Engine for Evaluating, Exploring, and Evolving Large-scale Training Data for Multimodal LLMs
DLEFT-MKC: Dynamic Late Fusion Multiple Kernel Clustering with Robust Tensor Learning via Min-Max Optimization
Simple yet Effective Incomplete Multi-view Clustering: Similarity-level Imputation and Intra-view Hybrid-group Prototype Construction
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
An Online Learning Theory of Trading-Volume Maximization
UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models
Mitigating Memorization in Language Models
Predicting the Energy Landscape of Stochastic Dynamical System via Physics-informed Self-supervised Learning
Image Watermarks are Removable using Controllable Regeneration from Clean Noise
Fair Clustering in the Sliding Window Model
ELICIT: LLM Augmentation Via External In-context Capability
Personality Alignment of Large Language Models
Neural Causal Graph for Interpretable and Intervenable Classification
Language Models are Advanced Anonymizers
Ward: Provable RAG Dataset Inference via LLM Watermarks
Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings
NutriBench: A Dataset for Evaluating Large Language Models in Nutrition Estimation from Meal Descriptions
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
Linear Spherical Sliced Optimal Transport: A Fast Metric for Comparing Spherical Data
CycleResearcher: Improving Automated Research via Automated Review
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Human Simulacra: Benchmarking the Personification of Large Language Models
DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models
UniRestore3D: A Scalable Framework For General Shape Restoration
Offline Hierarchical Reinforcement Learning via Inverse Optimization
Generative Representational Instruction Tuning
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Adversarially Robust Anomaly Detection through Spurious Negative Pair Mitigation
An Exploration with Entropy Constrained 3D Gaussians for 2D Video Compression
Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval
Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
Moner: Motion Correction in Undersampled Radial MRI with Unsupervised Neural Representation
One Hundred Neural Networks and Brains Watching Videos: Lessons from Alignment
BrainACTIV: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation
Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
TD-Paint: Faster Diffusion Inpainting Through Time-Aware Pixel Conditioning
Modeling Complex System Dynamics with Flow Matching Across Time and Conditions
One for all and all for one: Efficient computation of partial Wasserstein distances on the line
Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability
Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing
Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
An Engorgio Prompt Makes Large Language Model Babble on
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension
SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers
Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Scaling FP8 training to trillion-token LLMs
Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching
Supervised and Semi-Supervised Diffusion Maps with Label-Driven Diffusion
Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy
Reassessing How to Compare and Improve the Calibration of Machine Learning Models
On Stochastic Contextual Bandits with Knapsacks in Small Budget Regime
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
Residual-MPPI: Online Policy Customization for Continuous Control
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios
Self-Attention-Based Contextual Modulation Improves Neural System Identification
Restructuring Vector Quantization with the Rotation Trick
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation
FaceShot: Bring Any Character into Life
Fully-inductive Node Classification on Arbitrary Graphs
On Large Language Model Continual Unlearning
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Non-myopic Generation of Language Models for Reasoning and Planning
On the Optimization and Generalization of Multi-head Attention
Tamper-Resistant Safeguards for Open-Weight LLMs
DCT-CryptoNets: Scaling Private Inference in the Frequency Domain
Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
FOSP: Fine-tuning Offline Safe Policy through World Models
Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
SAGEPhos: Sage Bio-Coupled and Augmented Fusion for Phosphorylation Site Detection
MaRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Progressive Mixed-Precision Decoding for Efficient LLM Inference
SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting
DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing
EmbedLLM: Learning Compact Representations of Large Language Models
AgentSquare: Automatic LLM Agent Search in Modular Design Space
ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Bridging the Gap Between f-divergences and Bayes Hilbert Spaces
Designing Mechanical Meta-Materials by Learning Equivariant Flows
DeepTAGE: Deep Temporal-Aligned Gradient Enhancement for Optimizing Spiking Neural Networks
Round and Round We Go! What makes Rotary Positional Encodings useful?
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
Sparse Learning for State Space Models on Mobile
Revisit the Open Nature of Open Vocabulary Semantic Segmentation
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Multi-Scale Fusion for Object Representation
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Fine-tuning can cripple your foundation model; preserving features may be the solution
GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Hyperbolic Genome Embeddings
Triples as the Key: Structuring Makes Decomposition and Verification Easier in LLM-based TableQA
Data Pruning by Information Maximization
An Evolved Universal Transformer Memory
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Memory Mosaics
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
Gradient-Free Generation for Hard-Constrained Systems
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
FreeVS: Generative View Synthesis on Free Driving Trajectory
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Enhancing End-to-End Autonomous Driving with Latent World Model
On the Computation of the Fisher Information in Continual Learning
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation
CREAM: Consistency Regularized Self-Rewarding Language Models
Beware of Calibration Data for Pruning Large Language Models
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
Does Editing Provide Evidence for Localization?
A Geometric Framework for Understanding Memorization in Generative Models
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Towards Domain Adaptive Neural Contextual Bandits
Investigating Pattern Neurons in Urban Time Series Forecasting
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Distilling Dataset into Neural Field
Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
Dissecting Adversarial Robustness of Multimodal LM Agents
Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On
Wayward Concepts In Multimodal Models
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Can Watermarks be Used to Detect LLM IP Infringement For Free?
Learning Diagrams: A Graphical Language for Compositional Training Regimes
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
Neural Approximate Mirror Maps for Constrained Diffusion Models
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Satisficing Regret Minimization in Bandits
Controlling Space and Time with Diffusion Models
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
Exploring the Design Space of Visual Context Representation in Video MLLMs
GANDALF: Generative AttentioN based Data Augmentation and predictive modeLing Framework for personalized cancer treatment
Towards Realistic Data Generation for Real-World Super-Resolution
Provably Safeguarding a Classifier from OOD and Adversarial Samples
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
Uncertainty modeling for fine-tuned implicit functions
On the Fourier analysis in the SO(3) space: the EquiLoPO Network
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
HaDeMiF: Hallucination Detection and Mitigation in Large Language Models
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation
Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step
Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods
cryoSPHERE: Single-Particle HEterogeneous REconstruction from cryo EM
Bridging the Gap between Variational Inference and Stochastic Gradient MCMC in Function Space
Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach
DICE: Data Influence Cascade in Decentralized Learning
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer
HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Going Beyond Static: Understanding Shifts with Time-Series Attribution
Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning
Confidence Elicitation: A New Attack Vector for Large Language Models
PIED: Physics-Informed Experimental Design for Inverse Problems
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
Decentralized Optimization with Coupled Constraints
Repulsive Latent Score Distillation for Solving Inverse Problems
A Visual Dive into Conditional Flow Matching
Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Zero-shot Imputation with Foundation Inference Models for Dynamical Systems
Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Large Language Models Often Say One Thing and Do Another
Can a Large Language Model be a Gaslighter?
Deep Random Features for Scalable Interpolation of Spatiotemporal Data
Enhancing Vision-Language Model with Unmasked Token Alignment
A3D: Does Diffusion Dream about 3D Alignment?
Input Space Mode Connectivity in Deep Neural Networks
Making Transformer Decoders Better Differentiable Indexers
What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks
The KoLMogorov Test: Compression by Code Generation
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
General Scene Adaptation for Vision-and-Language Navigation
How to Evaluate Reward Models for RLHF
Long Context Compression with Activation Beacon
RouteLLM: Learning to Route LLMs from Preference Data
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Making Text Embedders Few-Shot Learners
K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models
MMTEB: Massive Multilingual Text Embedding Benchmark
AutoUAD: Hyper-parameter Optimization for Unsupervised Anomaly Detection
FlashMask: Efficient and Rich Mask Extension of FlashAttention
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
CipherPrune: Efficient and Scalable Private Transformer Inference
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Don’t Stop Me Now: Embedding Based Scheduling for LLMs
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
How Much is Unseen Depends Chiefly on Information About the Seen
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escape, and Network Embedding
Quantum-PEFT: Ultra parameter-efficient fine-tuning
Data Selection via Optimal Control for Language Models
TeaserGen: Generating Teasers for Long Documentaries
Aligning Visual Contrastive learning models via Preference Optimization
VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems
Exploring a Principled Framework for Deep Subspace Clustering
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
Scaling Laws for Downstream Task Performance in Machine Translation
Ranking-aware adapter for text-driven image ordering with CLIP
Towards Optimal Multi-draft Speculative Decoding
On the Feature Learning in Diffusion Models
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
One Step Diffusion via Shortcut Models
OGBench: Benchmarking Offline Goal-Conditioned RL
Random Is All You Need: Random Noise Injection on Feature Statistics for Generalizable Deep Image Denoising
Prioritized Generative Replay
CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning
Multimodal Quantitative Language for Generative Recommendation
FreSh: Frequency Shifting for Accelerated Neural Representation Learning
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Precedence-Constrained Winter Value for Effective Graph Data Valuation
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Uncertainty Modeling in Graph Neural Networks via Stochastic Differential Equations
LASER: A Neuro-Symbolic Framework for Learning Spatio-Temporal Scene Graphs with Weak Supervision
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
Federated Continual Learning Goes Online: Uncertainty-Aware Memory Management for Vision Tasks and Beyond
Diversity-Rewarded CFG Distillation
Gaussian Differentially Private Human Faces Under a Face Radial Curve Representation
MANTRA: The Manifold Triangulations Assemblage
Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
Skill Expansion and Composition in Parameter Space
What Do You See in Common? Learning Hierarchical Prototypes over Tree-of-Life to Discover Evolutionary Traits
Backdooring Vision-Language Models with Out-Of-Distribution Data
Geometry of Long-Tailed Representation Learning: Rebalancing Features for Skewed Distributions
Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection
A Transfer Attack to Image Watermarks
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
GenXD: Generating Any 3D and 4D Scenes
Locality-aware Gaussian Compression for Fast and High-quality Rendering
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Meta-Continual Learning of Neural Fields
Faster Algorithms for Structured Linear and Kernel Support Vector Machines
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
Adversarial Attacks on Data Attribution
Energy-Weighted Flow Matching for Offline Reinforcement Learning
DPLM-2: A Multimodal Diffusion Protein Language Model
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
UTILITY: Utilizing Explainable Reinforcement Learning to Improve Reinforcement Learning
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Can Large Language Models Understand Symbolic Graphics Programs?
DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks
Deep Kernel Relative Test for Machine-generated Text Detection
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis
Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Visual Agents as Fast and Slow Thinkers
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
Grounding Multimodal Large Language Model in GUI World
Denoising with a Joint-Embedding Predictive Architecture
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark
Learning Molecular Representation in a Cell
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Exact Certification of (Graph) Neural Networks Against Label Poisoning
Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport
Unlocking Point Processes through Point Set Diffusion
Learning Spatial-Semantic Features for Robust Video Object Segmentation
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
PseDet: Revisiting the Power of Pseudo Label in Incremental Object Detection
The Breakdown of Gaussian Universality in Classification of High-dimensional Linear Factor Mixtures
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
FedTMOS: Efficient One-Shot Federated Learning with Tsetlin Machine
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
Learning View-invariant World Models for Visual Robotic Manipulation
How Gradient descent balances features: A dynamical analysis for two-layer neural networks
A Unifying Framework for Representation Learning
Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting
From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition
Analytic DAG Constraints for Differentiable DAG Learning
Artificial Kuramoto Oscillatory Neurons
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
Towards Generalization Bounds of GCNs for Adversarially Robust Node Classification
Physics-informed Temporal Difference Metric Learning for Robot Motion Planning
Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier
Restating the Proof of Linear Convergence for Linear GNNs
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics
Energy-based Backdoor Defense Against Federated Graph Learning
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Mechanistic Permutability: Match Features Across Layers
Moral Alignment for LLM Agents
Ask, and it shall be given: On the Turing completeness of prompting
One-for-All Few-Shot Anomaly Detection via Instance-Induced Prompt Learning
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Process Reward Model with Q-value Rankings
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
Efficient Cross-Episode Meta-RL
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
Learn Your Reference Model for Real Good Alignment
Steering Protein Family Design through Profile Bayesian Flow
C-CLIP: Multimodal Continual Learning for Vision-Language Model
Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection
Adaptive Batch Size for Privately Finding Second-Order Stationary Points
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
BRAID: Input-driven Nonlinear Dynamical Modeling of Neural-Behavioral Data
Learning Clustering-based Prototypes for Compositional Zero-Shot Learning
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning
Benign Overfitting in Out-of-Distribution Generalization of Linear Models
Captured by Captions: On Memorization and its Mitigation in CLIP Models
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
Understanding Fairness Surrogate Functions in Algorithmic Fairness
Neural Eulerian Scene Flow Fields
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Calibrating LLMs with Information-Theoretic Evidential Deep Learning
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
Effective and Efficient Time-Varying Counterfactual Prediction with State-Space Models
Rethinking Neural Multi-Objective Combinatorial Optimization via Neat Weight Embedding
Neural Multi-Objective Combinatorial Optimization via Graph-Image Multimodal Fusion
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Language Models Learn to Mislead Humans via RLHF
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
DataMan: Data Manager for Pre-training Large Language Models
Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval
Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs
QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors
UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agent
$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Can Textual Gradient Work in Federated Learning?
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Differentially Private Steering for Large Language Model Alignment
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Aligning Language Models with Demonstrated Feedback
AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection
Enhancing Learning with Label Differential Privacy by Vector Approximation
RB-Modulation: Training-Free Stylization using Reference-Based Modulation
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
$InterLCM$: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration
HelpSteer2-Preference: Complementing Ratings with Preferences
Single Teacher, Multiple Perspectives: Teacher Knowledge Augmentation for Enhanced Knowledge Distillation
Contextual Document Embeddings
Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography
Your Weak LLM is Secretly a Strong Teacher for Alignment
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
CONDA: Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Learning Graph Invariance by Harnessing Spuriosity
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
Grounding Continuous Representations in Geometry: Equivariant Neural Fields
Shedding Light on Time Series Classification using Interpretability Gated Networks
Measuring And Improving Engagement of Text-to-Image Generation Models
Discriminating image representations with principal distortions
MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Advancing Prompt-Based Methods for Replay-Independent General Continual Learning
FACTS: A Factored State-Space Framework for World Modelling
Wasserstein Distances, Neuronal Entanglement, and Sparsity
Lean-STaR: Learning to Interleave Thinking and Proving
Self-Play Preference Optimization for Language Model Alignment
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation
Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
SaMer: A Scenario-aware Multi-dimensional Evaluator for Large Language Models
Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining
InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting
PFDiff: Training-Free Acceleration of Diffusion Models Combining Past and Future Scores
4K4DGen: Panoramic 4D Generation at 4K Resolution
Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage
FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model
ScImage: How good are multimodal large language models at scientific text-to-image generation?
Binary Losses for Density Ratio Estimation
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
BANGS: Game-theoretic Node Selection for Graph Self-Training
CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs
Not All Language Model Features Are One-Dimensionally Linear
MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
AgentStudio: A Toolkit for Building General Virtual Agents
Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning
Diffusing to the Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning
When Graph Neural Networks Meet Dynamic Mode Decomposition
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Linear Transformer Topological Masking with Graph Random Features
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
DINOv2: Learning Robust Visual Features without Supervision
Learning under Temporal Label Noise
Robustness of Quantum Algorithms for Nonconvex Optimization
Uncertainty Herding: One Active Learning Method for All Label Budgets
What Has Been Overlooked in Contrastive Source-Free Domain Adaptation: Leveraging Source-Informed Latent Augmentation within Neighborhood Context
Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering
TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining
HELM: Hierarchical Encoding for mRNA Language Modeling
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
$q$-exponential family for policy optimization
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
Do as We Do, Not as You Think: the Conformity of Large Language Models
Addressing Label Shift in Distributed Learning via Entropy Regularization
LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
ParaSolver: A Hierarchical Parallel Integral Solver for Diffusion Models
NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
Interaction Asymmetry: A General Principle for Learning Composable Abstractions
Cross-Entropy Is All You Need To Invert the Data Generating Process
Let the Code LLM Edit Itself When You Edit the Code
Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels
In Search of Forgotten Domain Generalization
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
FIG: Flow with Interpolant Guidance for Linear Inverse Problems
Towards Hierarchical Rectified Flow
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning
Large Language Models are Interpretable Learners
AutoG: Towards automatic graph construction from tabular data
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
Perturbation-Restrained Sequential Model Editing
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Quality Measures for Dynamic Graph Generative Models
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Reward Guided Latent Consistency Distillation
On Bits and Bandits: Quantifying the Regret-Information Trade-off
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
To Tackle Adversarial Transferability: A Novel Ensemble Training Method with Fourier Transformation
Boosting Methods for Interval-censored Data with Regression and Classification
Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning
GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling
Breaking Neural Network Scaling Laws with Modularity
Tractable Multi-Agent Reinforcement Learning through Behavioral Economics
ElasticTok: Adaptive Tokenization for Image and Video
World Model on Million-Length Video And Language With Blockwise RingAttention
RocketEval: Efficient automated LLM evaluation via grading checklist
Look Before You Leap: Universal Emergent Mechanism for Retrieval in Language Models
The Hidden Cost of Waiting for Accurate Predictions
Boosting Latent Diffusion with Perceptual Objectives
Do LLMs "know" internally when they follow instructions?
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data
Efficient Jailbreak Attack Sequences on Large Language Models via Multi-Armed Bandit-Based Context Switching
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
Adaptive Length Image Tokenization via Recurrent Allocation
Diffusing States and Matching Scores: A New Framework for Imitation Learning
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Positional Embeddings in Transformer Models: Evolution from Text to Vision Domains
Natural Language Inference Improves Compositionality in Vision-Language Models
Language models scale reliably with over-training and on downstream tasks
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Diffusion-based Decoupled Deterministic and Uncertain Framework for Probabilistic Multivariate Time Series Forecasting
Episodic Novelty Through Temporal Distance
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series
ECD: A Machine Learning Benchmark for Predicting Enhanced-Precision Electronic Charge Density in Crystalline Inorganic Materials
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
Global Well-posedness and Convergence Analysis of Score-based Generative Models via Sharp Lipschitz Estimates
Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Linear Multistep Solver Distillation for Fast Sampling of Diffusion Models
BodyGen: Advancing Towards Efficient Embodiment Co-Design
ProtoSnap: Prototype Alignment For Cuneiform Signs
Weighted-Reward Preference Optimization for Implicit Model Fusion
BBCaL: Black-box Backdoor Detection under the Causality Lens
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data
Equivariant Masked Position Prediction for Efficient Molecular Representation
Tree of Attributes Prompt Learning for Vision-Language Models
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes
Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment
Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization
SPD Attack - Prevention of AI Powered Image Editing by Image Immunization
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Towards Bridging Generalization and Expressivity of Graph Neural Networks
Shared-AE: Automatic Identification of Shared Subspaces in High-dimensional Neural and Behavioral Activity
Medium-Difficulty Samples Constitute Smoothed Decision Boundary for Knowledge Distillation on Pruned Datasets
Value-aligned Behavior Cloning for Offline Reinforcement Learning via Bi-level Optimization
Physics-aligned field reconstruction with diffusion bridge
How to Find the Exact Pareto Front for Multi-Objective MDPs?
Hybrid Regularization Improves Diffusion-based Inverse Problem Solving
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers
PEARL: Towards Permutation-Resilient LLMs
SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups
CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
HGM³: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Logic-Logit: A Logic-Based Approach to Choice Modeling
Learning Evolving Tools for Large Language Models
Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
Robust Representation Consistency Model via Contrastive Denoising
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
Robustness Reprogramming for Representation Learning
On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations
Understanding Constraint Inference in Safety-Critical Inverse Reinforcement Learning
Alchemy: Amplifying Theorem-Proving Capability Through Symbolic Mutation
Multi-Accurate CATE is Robust to Unknown Covariate Shifts
OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees
Advancing LLM Reasoning Generalists with Preference Trees
NeuralPlane: Structured 3D Reconstruction in Planar Primitives with Neural Fields
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
ADIFF: Explaining audio difference using natural language
Taming Transformer Without Using Learning Rate Warmup
Towards a learning theory of representation alignment
Preference Diffusion for Recommendation
Fair Submodular Cover
Biologically Plausible Brain Graph Transformer
Real2Code: Reconstruct Articulated Objects via Code Generation
CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
BLEND: Behavior-guided Neural Population Dynamics Modeling via Privileged Knowledge Distillation
AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
RecFlow: An Industrial Full Flow Recommendation Dataset
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
How Far Are We from True Unlearnability?
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
MoDeGPT: Modular Decomposition for Large Language Model Compression
Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
A CLIP-Powered Framework for Robust and Generalizable Data Selection
Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning
Revisiting Convolution Architecture in the Realm of DNA Foundation Models
Can One Modality Model Synergize Training of Other Modality Models?
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Integral Performance Approximation for Continuous-Time Reinforcement Learning Control
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
A Formal Framework for Understanding Length Generalization in Transformers
How Much is a Noisy Image Worth? Data Scaling Laws for Ambient Diffusion
Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
A Theoretically-Principled Sparse, Connected, and Rigid Graph Representation of Molecules
Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation
Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models
Iterative Substructure Extraction for Molecular Relational Learning with Interactive Graph Information Bottleneck
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Utility-Directed Conformal Prediction: A Decision-Aware Framework for Actionable Uncertainty Quantification
Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL
TIPS: Text-Image Pretraining with Spatial awareness
GPromptShield: Elevating Resilience in Graph Prompt Tuning Against Adversarial Attacks
Uncovering Overfitting in Large Language Model Editing
UniDrive: Towards Universal Driving Perception Across Camera Configurations
Context-aware Dynamic Pruning for Speech Foundation Models
ParetoFlow: Guided Flows in Multi-Objective Optimization
High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis
Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification
Online Reinforcement Learning in Non-Stationary Context-Driven Environments
Strong Model Collapse
Optimal Brain Apoptosis
Node-Time Conditional Prompt Learning in Dynamic Graphs
Autoregressive Video Generation without Vector Quantization
A Stochastic Approach to the Subset Selection Problem via Mirror Descent
GReaTer: Gradients Over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos
DEPT: Decoupled Embeddings for Pre-training Language Models
Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos
MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science
Automated Proof Generation for Rust Code via Self-Evolution
Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images
Bridging Compressed Image Latents and Multimodal Large Language Models
Fast Feedforward 3D Gaussian Splatting Compression
Selective Label Enhancement Learning for Test-Time Adaptation
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Towards Calibrated Deep Clustering Network
ConMix: Contrastive Mixup at Representation Level for Long-tailed Deep Clustering
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
Growth Inhibitors for Suppressing Inappropriate Image Concepts in Diffusion Models
The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling
CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching
Exposure Bracketing Is All You Need For A High-Quality Image
GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Scaling Long Context Training Data by Long-Distance Referrals
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
FedLWS: Federated Learning with Adaptive Layer-wise Weight Shrinking
DRL: Decomposed Representation Learning for Tabular Anomaly Detection
Gaussian-Based Instance-Adaptive Intensity Modeling for Point-Supervised Facial Expression Spotting
FlowDec: A flow-based full-band general audio codec with high perceptual quality
MuseGNN: Forming Scalable, Convergent GNN Layers that Minimize a Sampling-Based Energy
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
AutoBencher: Towards Declarative Benchmark Construction
Avoid Overclaims: Summary of Complexity Bounds for Algorithms in Minimization and Minimax Optimization
On the Identification of Temporal Causal Representation with Instantaneous Dependence
Poison-splat: Computation Cost Attack on 3D Gaussian Splatting
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
Multi-Label Node Classification with Label Influence Propagation
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Generalizing Weisfeiler-Lehman Kernels to Subgraphs
Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents
CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
ELFS: Label-Free Coreset Selection with Proxy Training Dynamics
Dreamweaver: Learning Compositional World Models from Pixels
On LLM Knowledge Distillation - A Comparison between Forward KL and Reverse KL
Longhorn: State Space Models are Amortized Online Learners
On the Hölder Stability of Multiset and Graph Neural Networks
Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback
3D StreetUnveiler with Semantic-aware 2DGS - a simple baseline
PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Free Hunch: Denoiser Covariance Estimation for Diffusion Models Without Extra Costs
Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection
Generalized Principal-Agent Problem with a Learning Agent
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
Near-optimal Active Regression of Single-Index Models
Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization
Let Your Features Tell The Differences: Understanding Graph Convolution By Feature Splitting
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming
Truncated Consistency Models
Heavy-Tailed Diffusion Models
Energy-Based Diffusion Language Models for Text Generation
Think while You Generate: Discrete Diffusion with Planned Denoising
LocoVR: Multiuser Indoor Locomotion Dataset in Virtual Reality
Quantized Spike-driven Transformer
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
Persistent Pre-training Poisoning of LLMs
Backtracking Improves Generation Safety
Human-Aligned Chess With a Bit of Search
What Are Good Positional Encodings for Directed Graphs?
Consistent Flow Distillation for Text-to-3D Generation
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
Looped Transformers for Length Generalization
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning
Language Models Need Inductive Biases to Count Inductively
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
MADGEN: Mass-Spec attends to De Novo Molecular generation
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
MELODI: Exploring Memory Compression for Long Contexts
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
Better autoregressive regression with LLMs via regression-aware fine-tuning
A Simple Approach to Unifying Diffusion-based Conditional Generation
PT-T2I/V: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image/Video-Task
ODE-based Smoothing Neural Network for Reinforcement Learning Tasks
Efficient Learning with Sine-Activated Low-Rank Matrices
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Group Distributionally Robust Dataset Distillation with Risk Minimization
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Can In-context Learning Really Generalize to Out-of-distribution Tasks?
Rethinking Invariance in In-context Learning
What is Wrong with Perplexity for Long-context Language Modeling?
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
Conformal Prediction Sets Can Cause Disparate Impact
TFG-Flow: Training-free Guidance in Multimodal Generative Flow
Infinite-Resolution Integral Noise Warping for Diffusion Models
Is Factuality Enhancement a Free Lunch For LLMs? Better Factuality Can Lead to Worse Context-Faithfulness
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Residual Kernel Policy Network: Enhancing Stability and Robustness in RKHS-Based Reinforcement Learning
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark
Doubly robust identification of treatment effects from multiple environments
Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems
Enhancing Federated Domain Adaptation with Multi-Domain Prototype-Based Federated Fine-Tuning
Re-Imagining Multimodal Instruction Tuning: A Representation View
Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting
Dynamic Diffusion Transformer
Bridging the Gap between Database Search and De Novo Peptide Sequencing with SearchNovo
ReNovo: Retrieval-Based De Novo Mass Spectrometry Peptide Sequencing
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Factor Graph-based Interpretable Neural Networks
The adaptive complexity of parallelized log-concave sampling
Improved Approximation Algorithms for $k$-Submodular Maximization via Multilinear Extension
Progressive Parameter Efficient Transfer Learning for Semantic Segmentation
The Crucial Role of Samplers in Online Direct Preference Optimization
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
Reinforcement Learning from Imperfect Corrective Actions and Proxy Rewards
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
BenTo: Benchmark Reduction with In-Context Transferability
Is Your Multimodal Language Model Oversensitive to Safe Queries?
Many-Objective Multi-Solution Transport
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
Integrating Protein Dynamics into Structure-Based Drug Design via Full-Atom Stochastic Flows
Group Ligands Docking to Protein Pockets
ProteinBench: A Holistic Evaluation of Protein Foundation Models
Enhancing Prediction Performance through Influence Measure
Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception
Dense Video Object Captioning from Disjoint Supervision
A Robust Method to Discover Causal or Anticausal Relation
N-ForGOT: Towards Not-forgetting and Generalization of Open Temporal Graph Learning
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
CryoFM: A Flow-based Foundation Model for Cryo-EM Densities
Anyprefer: An Agentic Framework for Preference Data Synthesis
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models
$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
DiffPC: Diffusion-based High Perceptual Fidelity Image Compression with Semantic Refinement
UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery
Learning LLM-as-a-Judge for Preference Alignment
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery
On the Role of Attention Heads in Large Language Model Safety
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
CARTS: Advancing Neural Theorem Proving with Diversified Tactic Calibration and Bias-Resistant Tree Search
Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Learning Graph Quantized Tokenizers
Motion Control of High-Dimensional Musculoskeletal Systems with Hierarchical Model-Based Planning
Stealthy Shield Defense: A Conditional Mutual Information-Based Approach against Black-Box Model Inversion Attacks
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
Hyper-Connections
Ultra-Sparse Memory Network
Re-Aligning Language to Visual Objects with an Agentic Workflow
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
On Quantizing Neural Representation for Variable-Rate Video Coding
From Decoupling to Adaptive Transformation: a Wider Optimization Space for PTQ
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods
Inverse Rendering using Multi-Bounce Path Tracing and Reservoir Sampling
An Intelligent Agentic System for Complex Image Restoration Problems
Scaling Large Language Model-based Multi-Agent Collaboration
Toward Efficient Multi-Agent Exploration With Trajectory Entropy Maximization
VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models
Noisy Test-Time Adaptation in Vision-Language Models
When Prompt Engineering Meets Software Engineering: CNL-P as Natural and Robust "APIs" for Human-AI Interaction
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Towards Out-of-Modal Generalization without Instance-level Modal Correspondence
Quantifying Generalization Complexity for Large Language Models
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
AgentRefine: Enhancing Agent Generalization through Refinement Tuning
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Vision Language Models are In-Context Value Learners
DynFrs: An Efficient Framework for Machine Unlearning in Random Forest
SigDiffusions: Score-Based Diffusion Models for Time Series via Log-Signature Embeddings
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
Asymmetric Factorized Bilinear Operation for Vision Transformer
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
DUET: Decentralized Bilevel Optimization without Lower-Level Strong Convexity
Multi-Reward as Condition for Instruction-based Image Editing
A Tight Convergence Analysis of Inexact Stochastic Proximal Point Algorithm for Stochastic Composite Optimization Problems
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
A Benchmark for Semantic Sensitive Information in LLMs Outputs
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Data Center Cooling System Optimization Using Offline Reinforcement Learning
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Lipschitz Bandits in Optimal Space
TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
INFER: A Neural-symbolic Model For Extrapolation Reasoning on Temporal Knowledge Graph
Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Power
Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization
Complementary Label Learning with Positive Label Guessing and Negative Label Enhancement
InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
Physics of Language Models: Part 3.2, Knowledge Manipulation
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Unhackable Temporal Reward for Scalable Video MLLMs
GROOT-2: Weakly Supervised Multimodal Instruction Following Agents
Knowledge Graph Finetuning Enhances Knowledge Manipulation in Large Language Models
Computing Circuits Optimization via Model-Based Circuit Genetic Evolution
A Graph Enhanced Symbolic Discovery Framework For Efficient Logic Optimization
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models
Uni-Sign: Toward Unified Sign Language Understanding at Scale
RMB: Comprehensively benchmarking reward models in LLM alignment
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Jailbreaking as a Reward Misspecification Problem
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
MagicPIG: LSH Sampling for Efficient LLM Generation
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement
Incremental Causal Effect for Time to Treatment Initialization
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Graph Neural Ricci Flow: Evolving Feature from a Curvature Perspective
On Designing General and Expressive Quantum Graph Neural Networks with Applications to MILP Instance Representation
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
On the Completeness of Invariant Geometric Deep Learning Models
MixEval-X: Any-to-any Evaluations from Real-world Data Mixture
RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
Entropy-based Activation Function Optimization: A Method on Searching Better Activation Functions
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization
Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Sort-free Gaussian Splatting via Weighted Sum Rendering
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
STAR: Stability-Inducing Weight Perturbation for Continual Learning
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
OMG: Opacity Matters in Material Modeling with Gaussian Splatting
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Learning Hierarchical Polynomials of Multiple Nonlinear Features
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry
Gyrogroup Batch Normalization
Convergence of Distributed Adaptive Optimization with Local Updates
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision
Hymba: A Hybrid-head Architecture for Small Language Models
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
QP-SNN: Quantized and Pruned Spiking Neural Networks
LaMPlace: Learning to Optimize Cross-Stage Metrics in Macro Placement
Differentiable Integer Linear Programming
Accelerating Neural ODEs: A Variational Formulation-based Approach
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Self-Evolving Multi-Agent Collaboration Networks for Software Development
Monte Carlo Planning with Large Language Model for Text-Based Game Agents
Why RoPE Struggles to Maintain Long-Term Decay in Long Sequences?
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Solving New Tasks by Adapting Internet Video Knowledge
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Neuralized Markov Random Field for Interaction-Aware Stochastic Human Trajectory Prediction
Post-hoc Reward Calibration: A Case Study on Length Bias
Layerwise Recurrent Router for Mixture-of-Experts
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
On the Byzantine-Resilience of Distillation-Based Federated Learning
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
Understanding and Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention
Adam-mini: Use Fewer Learning Rates To Gain More
Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models
Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Eliminating Position Bias of Language Models: A Mechanistic Approach
Query-based Knowledge Transfer for Heterogeneous Learning Environments
SEBRA: Debiasing through Self-Guided Bias Ranking
STAMP: Scalable Task- And Model-agnostic Collaborative Perception
A Statistical Approach for Controlled Training Data Detection
GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Synthetic continued pretraining
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Kronecker Mask and Interpretive Prompts are Language-Action Video Learners
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Rethinking Light Decoder-based Solvers for Vehicle Routing Problems
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Strong Preferences Affect the Robustness of Preference Models and Value Alignment
Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses
Teaching LLMs How to Learn with Contextual Fine-Tuning
Planning in Natural Language Improves LLM Search for Code Generation
BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
A Theoretical Analysis of Self-Supervised Learning for Vision Transformers
Open-Vocabulary Customization from CLIP via Data-Free Knowledge Distillation
Bayesian WeakS-to-Strong from Text Classification to Generation
Debiasing Federated Learning with Correlated Client Participation
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
Towards counterfactual fairness through auxiliary variables
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Improving Neural Optimal Transport via Displacement Interpolation
Towards Continuous Reuse of Graph Models via Holistic Memory Diversification
SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination
EqNIO: Subequivariant Neural Inertial Odometry
OmniRe: Omni Urban Scene Reconstruction
EG4D: Explicit Generation of 4D Object without Score Distillation
Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Cauchy-Schwarz Regularizers
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning
A Periodic Bayesian Flow for Material Generation
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models
Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models
Improving Complex Reasoning with Dynamic Prompt Corruption: A Soft Prompt Optimization Approach
Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning
Language Model Alignment in Multilingual Trolley Problems
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models
Discriminator-Guided Embodied Planning for LLM Agent
Controllable Unlearning for Image-to-Image Generative Models via $\epsilon$-Constrained Optimization
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Multi-Resolution Decomposable Diffusion Model for Non-Stationary Time Series Anomaly Detection
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
Mixture-of-Agents Enhances Large Language Model Capabilities
Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
Tool-Planner: Task Planning with Clusters across Multiple Tools
Small Models are LLM Knowledge Triggers for Medical Tabular Prediction
Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention
Graph Transformers Dream of Electric Flow
On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality
Learning Equivariant Non-Local Electron Density Functionals
Can Transformers Do Enumerative Geometry?
No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs
Zero-shot forecasting of chaotic systems
Associative memory and dead neurons
Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models
Generation and Comprehension Hand-in-Hand: Vision-guided Expression Diffusion for Boosting Referring Expression Generation and Comprehension
3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing
Sketch2Diagram: Generating Vector Diagrams from Hand-Drawn Sketches
Latent Radiance Fields with 3D-aware 2D Representations
Learning Color Equivariant Representations
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
RESfM: Robust Deep Equivariant Structure from Motion
Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents
Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises
BP-Modified Local Loss for Efficient Training of Deep Neural Networks
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Are Large Vision Language Models Good Game Players?
Neural Phylogeny: Fine-Tuning Relationship Detection among Neural Networks
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models
BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge
Improving Neural Network Accuracy by Concurrently Training with a Twin Network
Democratic Training Against Universal Adversarial Perturbations
Severing Spurious Correlations with Data Pruning
Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning
The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities
Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks
Oscillatory State-Space Models
Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
Designing Concise ConvNets with Columnar Stages
Sharpness-Aware Minimization: General Analysis and Improved Rates
Local convergence of simultaneous min-max algorithms to differential equilibrium on Riemannian manifold
Policy Optimization under Imperfect Human Interactions with Agent-Gated Shared Autonomy
Simple, Good, Fast: Self-Supervised World Models Free of Baggage
CBMA: Improving Conformal Prediction through Bayesian Model Averaging
End-to-end Learning of Gaussian Mixture Priors for Diffusion Sampler
Connecting Federated ADMM to Bayes
Training One-Dimensional Graph Neural Networks is NP-Hard
On the Optimal Memorization Capacity of Transformers
Efficient Online Pruning and Abstraction for Imperfect Information Extensive-Form Games
Strategic Classification With Externalities
Bounds on $L_p$ Errors in Density Ratio Estimation via $f$-Divergence Loss Functions
Conservative Contextual Bandits: Beyond Linear Representations
Linear Bandits with Memory
Do Stochastic, Feel Noiseless: Stable Stochastic Optimization via a Double Momentum Mechanism
Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy
Generalizable Motion Planning via Operator Learning
ADAM: An Embodied Causal Agent in Open-World Environments
Euler Characteristic Tools for Topological Data Analysis
Neural networks on Symmetric Spaces of Noncompact Type
Exact Community Recovery under Side Information: Optimality of Spectral Algorithms
Content-Style Learning from Unaligned Domains: Identifiability under Unknown Latent Dimensions
Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions
Learning from End User Data with Shuffled Differential Privacy over Kernel Densities
How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Deep MMD Gradient Flow without adversarial training
How Feature Learning Can Improve Neural Scaling Laws
Understanding Factual Recall in Transformers via Associative Memories
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
Multi-LLM-Agents Debate - Performance, Efficiency, and Scaling Challenges
RTDiff: Reverse Trajectory Synthesis via Diffusion for Offline Reinforcement Learning
Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale
Exploring The Forgetting in Adversarial Training: A Novel Method for Enhancing Robustness
Do LLM Agents Have Regret? A Case Study in Online Learning and Games
Can LLM Simulations Truly Reflect Humanity? A Deep Dive
Compute-Constrained Data Selection
Multi-objective antibody design with constrained preference optimization
Linear SCM Identification in the Presence of Confounders and Gaussian Noise
No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data