ICLR
2025
Papers
Vision Language Models are In-Context Value Learners
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Denoising with a Joint-Embedding Predictive Architecture
Do Stochastic, Feel Noiseless: Stable Stochastic Optimization via a Double Momentum Mechanism
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Generalizing Weisfeiler-Lehman Kernels to Subgraphs
Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators
From an LLM Swarm to a PDDL-empowered Hive: Planning Self-executed Instructions in a Multi-modal Jungle
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications for Multi-Task RL
ComLoRA: A Competitive Learning Approach for Enhancing LoRA
No Free Lunch: Fundamental Limits of Learning Non-Hallucinating Generative Models
Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
Efficient Imitation under Misspecification
Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions
Learned Reference-based Diffusion Sampler for multi-modal distributions
Neural Stochastic Differential Equations for Uncertainty-Aware Offline RL
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Safety-Prioritizing Curricula for Constrained Reinforcement Learning
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Wasserstein Distances, Neuronal Entanglement, and Sparsity
Feedback Favors the Generalization of Neural ODEs
Exploring The Forgetting in Adversarial Training: A Novel Method for Enhancing Robustness
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
One for all and all for one: Efficient computation of partial Wasserstein distances on the line
Graph Neural Networks Gone Hogwild
Commit0: Library Generation from Scratch
Occlusion-aware Non-Rigid Point Cloud Registration via Unsupervised Neural Deformation Correntropy
Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Efficient and Robust Neural Combinatorial Optimization via Wasserstein-Based Coresets
$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee
Linear Spherical Sliced Optimal Transport: A Fast Metric for Comparing Spherical Data
Do LLMs have Consistent Values?
On-the-fly Preference Alignment via Principle-Guided Decoding
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Predicate Hierarchies Improve Few-Shot State Classification
Online Preference Alignment for Language Models via Count-based Exploration
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Combining Induction and Transduction for Abstract Reasoning
Teaching Human Behavior Improves Content Understanding Abilities Of VLMs
GeoILP: A Synthetic Dataset to Guide Large-Scale Rule Induction
Flash Inference: Near Linear Time Inference for Long Convolution Sequence Models and Beyond
Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
MMTEB: Massive Multilingual Text Embedding Benchmark
Training-Free Diffusion Model Alignment with Sampling Demons
Intricacies of Feature Geometry in Large Language Models
Semantic Aware Representation Learning for Lifelong Learning
Efficient Dictionary Learning with Switch Sparse Autoencoders
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Aligning Visual Contrastive learning models via Preference Optimization
Large Language Models are Interpretable Learners
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics
Scalable Bayesian Learning with posteriors
Accelerating Training with Neuron Interaction and Nowcasting Networks
COME: Test-time Adaption by Conservatively Minimizing Entropy
Tight Lower Bounds under Asymmetric High-Order Hölder Smoothness and Uniform Convexity
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
CViT: Continuous Vision Transformer for Operator Learning
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
TopoDiffusionNet: A Topology-aware Diffusion Model
eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels
Zero-shot forecasting of chaotic systems
Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
Improved Sampling Algorithms for Lévy-Itô Diffusion Models
SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
gRNAde: Geometric Deep Learning for 3D RNA inverse design
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
Subgraph Federated Learning for Local Generalization
Interaction Asymmetry: A General Principle for Learning Composable Abstractions
TC-MoE: Augmenting Mixture of Experts with Ternary Expert Choice
CoMotion: Concurrent Multi-person 3D Motion
Diff-PIC: Revolutionizing Particle-In-Cell Nuclear Fusion Simulation with Diffusion Models
Homomorphism Counts as Structural Encodings for Graph Learning
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Adam-mini: Use Fewer Learning Rates To Gain More
Elliptic Loss Regularization
Can We Ignore Labels in Out of Distribution Detection?
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-training of Deep Networks
Minimal Variance Model Aggregation: A principled, non-intrusive, and versatile integration of black box models
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models
Generalized Video Moment Retrieval
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Neural networks on Symmetric Spaces of Noncompact Type
Bayesian Regularization of Latent Representation
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation
Repulsive Latent Score Distillation for Solving Inverse Problems
TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation
In vivo cell-type and brain region classification via multimodal contrastive learning
Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations
Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
Contrastive Learning from Synthetic Audio Doppelgängers
ImDy: Human Inverse Dynamics from Imitated Observations
Bridging the Data Provenance Gap Across Text, Speech, and Video
LR0.FM: Low-Resolution Zero-Shot Classification Benchmark for Foundation Models
Language Models Need Inductive Biases to Count Inductively
SOO-Bench: Benchmarks for Evaluating the Stability of Offline Black-Box Optimization
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
Transformers Learn Low Sensitivity Functions: Investigations and Implications
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
Conformal Prediction Sets Can Cause Disparate Impact
Diffusion Transformers for Tabular Data Time Series Generation
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Memory Mosaics
No Preference Left Behind: Group Distributional Preference Optimization
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
MagicPIG: LSH Sampling for Efficient LLM Generation
Ensembling Diffusion Models via Adaptive Feature Aggregation
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning
VideoPhy: Evaluating Physical Commonsense for Video Generation
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
A Distributional Approach to Uncertainty-Aware Preference Alignment Using Offline Demonstrations
Toward Exploratory Inverse Constraint Inference with Generative Diffusion Verifiers
LancBiO: Dynamic Lanczos-aided Bilevel Optimization via Krylov Subspace
Making Transformer Decoders Better Differentiable Indexers
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
LocoVR: Multiuser Indoor Locomotion Dataset in Virtual Reality
DSBench: How Far Are Data Science Agents from Becoming Data Science Experts?
GReaTer: Gradients Over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
Deep Networks Learn Features From Local Discontinuities in the Label Function
Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Anyprefer: An Agentic Framework for Preference Data Synthesis
Improving Probabilistic Diffusion Models With Optimal Diagonal Covariance Matching
MuseGNN: Forming Scalable, Convergent GNN Layers that Minimize a Sampling-Based Energy
Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs
From GNNs to Trees: Multi-Granular Interpretability for Graph Neural Networks
Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
PWM: Policy Learning with Multi-Task World Models
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
On the Transfer of Object-Centric Representation Learning
DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction
A Theoretical Analysis of Self-Supervised Learning for Vision Transformers
CoMRes: Semi-Supervised Time Series Forecasting Utilizing Consensus Promotion of Multi-Resolution
Fine-tuning with Reserved Majority for Noise Reduction
DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models
Rethinking the role of frames for SE(3)-invariant crystal structure modeling
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
A Unifying Framework for Representation Learning
Restructuring Vector Quantization with the Rotation Trick
Bridging the Gap between Variational Inference and Stochastic Gradient MCMC in Function Space
Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations
MeshMask: Physics-Based Simulations with Masked Graph Neural Networks
Lightweight Predictive 3D Gaussian Splats
OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees
Tell me about yourself: LLMs are aware of their learned behaviors
Efficient Discovery of Pareto Front for Multi-Objective Reinforcement Learning
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
SAGEPhos: Sage Bio-Coupled and Augmented Fusion for Phosphorylation Site Detection
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
MuPT: A Generative Symbolic Music Pretrained Transformer
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
McEval: Massively Multilingual Code Evaluation
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Simple yet Effective Incomplete Multi-view Clustering: Similarity-level Imputation and Intra-view Hybrid-group Prototype Construction
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
AgentStudio: A Toolkit for Building General Virtual Agents
Inner Information Analysis Algorithm for Deep Neural Network based on Community
Optimal Transport for Time Series Imputation
Autoregressive Video Generation without Vector Quantization
Efficient Evolutionary Search Over Chemical Space with Large Language Models
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Scaling Speech-Text Pre-training with Synthetic Interleaved Data
Interpretable Unsupervised Joint Denoising and Enhancement for Real-World low-light Scenarios
Action Sequence Augmentation for Action Anticipation
Influence Functions for Scalable Data Attribution in Diffusion Models
RecDreamer: Consistent Text-to-3D Generation via Uniform Score Distillation
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Decoupling Layout from Glyph in Online Chinese Handwriting Generation
Reframing Structure-Based Drug Design Model Evaluation via Metrics Correlated to Practical Needs
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
KAA: Kolmogorov-Arnold Attention for Enhancing Attentive Graph Neural Networks
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Robust Representation Consistency Model via Contrastive Denoising
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
Find A Winning Sign: Sign Is All We Need to Win the Lottery
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Why Does the Effective Context Length of LLMs Fall Short?
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Bonsai: Gradient-free Graph Condensation for Node Classification
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
Fugatto 1: Foundational Generative Audio Transformer Opus 1
HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Filtered not Mixed: Filtering-Based Online Gating for Mixture of Large Language Models
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
CAX: Cellular Automata Accelerated in JAX
Learning Geometric Reasoning Networks For Robot Task And Motion Planning
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code
Benchmarking Agentic Workflow Generation
LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning
Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier
Training Neural Networks as Recognizers of Formal Languages
Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos
Searching for Optimal Solutions with LLMs via Bayesian Optimization
Multi-Resolution Decomposable Diffusion Model for Non-Stationary Time Series Anomaly Detection
Self-Boosting Large Language Models with Synthetic Preference Data
CBQ: Cross-Block Quantization for Large Language Models
Rethinking Audio-Visual Adversarial Vulnerability from Temporal and Modality Perspectives
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
High-Quality Joint Image and Video Tokenization with Causal VAE
Controlling Space and Time with Diffusion Models
Group-robust Sample Reweighting for Subpopulation Shifts via Influence Functions
DPaI: Differentiable Pruning at Initialization with Node-Path Balance Principle
Glad: A Streaming Scene Generator for Autonomous Driving
Generating Physical Dynamics under Priors
Preserving Deep Representations in One-Shot Pruning: A Hessian-Free Second-Order Optimization Framework
Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation
Diffusion Policy Policy Optimization
Mitigate the Gap: Improving Cross-Modal Alignment in CLIP
KBLaM: Knowledge Base augmented Language Model
HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents
Look Before You Leap: Universal Emergent Mechanism for Retrieval in Language Models
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective
SEBRA: Debiasing through Self-Guided Bias Ranking
Matrix Product Sketching via Coordinated Sampling
Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings
Robust Feature Learning for Multi-Index Models in High Dimensions
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Multi-Label Test-Time Adaptation with Bound Entropy Minimization
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
TopoNets: High performing vision and language models with brain-like topography
Adversarially Robust Anomaly Detection through Spurious Negative Pair Mitigation
BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge
Wasserstein-Regularized Conformal Prediction under General Distribution Shift
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
Emergence of meta-stable clustering in mean-field transformer models
Learning LLM-as-a-Judge for Preference Alignment
Transformer Learns Optimal Variable Selection in Group-Sparse Classification
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training
Self-Updatable Large Language Models by Integrating Context into Model Parameters
Dreamweaver: Learning Compositional World Models from Pixels
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
Investigating Pattern Neurons in Urban Time Series Forecasting
SplineGS: Learning Smooth Trajectories in Gaussian Splatting for Dynamic Scene Reconstruction
Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Forgetting Transformer: Softmax Attention with a Forget Gate
Faster, More Efficient RLHF through Off-Policy Asynchronous Learning
PIN: Prolate Spheroidal Wave Function-based Implicit Neural Representations
Controllable Blur Data Augmentation Using 3D-Aware Motion Estimation
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Zero-cost Proxy for Adversarial Robustness Evaluation
Rethinking Spiking Neural Networks from an Ensemble Learning Perspective
Iterative Dual-RL: An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
The Belief State Transformer
Advantage Alignment Algorithms
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
Self-supervised contrastive learning performs non-linear system identification
SINGER: Stochastic Network Graph Evolving Operator for High Dimensional PDEs
BodyGen: Advancing Towards Efficient Embodiment Co-Design
What Makes a Good Diffusion Planner for Decision Making?
Hidden in the Noise: Two-Stage Robust Watermarking for Images
Controllable Satellite-to-Street-View Synthesis with Precise Pose Alignment and Zero-Shot Environmental Control
A Decade's Battle on Dataset Bias: Are We There Yet?
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
SymmetricDiffusers: Learning Discrete Diffusion Models over Finite Symmetric Groups
Towards Unified Human Motion-Language Understanding via Sparse Interpretable Characterization
Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL
BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
Re-Imagining Multimodal Instruction Tuning: A Representation View
Inverse decision-making using neural amortized Bayesian actors
Designing Concise ConvNets with Columnar Stages
Towards Scalable Topological Regularizers
MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
Integrative Decoding: Improving Factuality via Implicit Self-consistency
Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Understanding the Stability-based Generalization of Personalized Federated Learning
IgGM: A Generative Model for Functional Antibody and Nanobody Design
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
Ultra-Sparse Memory Network
Do You Keep an Eye on What I Ask? Mitigating Multimodal Hallucination via Attention-Guided Ensemble Decoding
Exact Community Recovery under Side Information: Optimality of Spectral Algorithms
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
System 1.x: Learning to Balance Fast and Slow Planning with Language Models
ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Shared-AE: Automatic Identification of Shared Subspaces in High-dimensional Neural and Behavioral Activity
To Code or Not To Code? Exploring Impact of Code in Pre-training
Unified Parameter-Efficient Unlearning for LLMs
FreSh: Frequency Shifting for Accelerated Neural Representation Learning
Co$^{\mathbf{3}}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
RocketEval: Efficient automated LLM evaluation via grading checklist
A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery
Entropy-based Activation Function Optimization: A Method on Searching Better Activation Functions
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Gradient correlation is a key ingredient to accelerate SGD with momentum
PhysPDE: Rethinking PDE Discovery and a Physical HYpothesis Selection Benchmark
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
Deep Kernel Relative Test for Machine-generated Text Detection
AFlow: Automating Agentic Workflow Generation
Integral Performance Approximation for Continuous-Time Reinforcement Learning Control
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
AutoBencher: Towards Declarative Benchmark Construction
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Forewarned is Forearmed: Harnessing LLMs for Data Synthesis via Failure-induced Exploration
Effective and Efficient Time-Varying Counterfactual Prediction with State-Space Models
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
On the expressiveness and spectral bias of KANs
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
Generative Monoculture in Large Language Models
Weighted-Reward Preference Optimization for Implicit Model Fusion
Disentangling Representations through Multi-task Learning
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
Causally Motivated Sycophancy Mitigation for Large Language Models
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
Pacmann: Efficient Private Approximate Nearest Neighbor Search
Decentralized Optimization with Coupled Constraints
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
A Unified Framework for Forward and Inverse Problems in Subsurface Imaging using Latent Space Translations
Quantifying Generalization Complexity for Large Language Models
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
UniCO: On Unified Combinatorial Optimization via Problem Reduction to Matrix-Encoded General TSP
Sort-free Gaussian Splatting via Weighted Sum Rendering
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Advantage-Guided Distillation for Preference Alignment in Small Language Models
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
Feature-Based Online Bilateral Trade
MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Actions Speak Louder Than Words: Rate-Reward Trade-off in Markov Decision Processes
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Stem-OB: Generalizable Visual Imitation Learning with Stem-Like Convergent Observation through Diffusion Inversion
OLMoE: Open Mixture-of-Experts Language Models
Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Consistency Models Made Easy
Learning General-purpose Biomedical Volume Representations using Randomized Synthesis
Discrete Distribution Networks
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
When Selection Meets Intervention: Additional Complexities in Causal Discovery
Advancing Graph Generation through Beta Diffusion
Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
Learning Evolving Tools for Large Language Models
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians
Towards Marginal Fairness Sliced Wasserstein Barycenter
Erasing Concept Combination from Text-to-Image Diffusion Model
Improving Instruction-Following in Language Models through Activation Steering
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
Beware of Calibration Data for Pruning Large Language Models
CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series
Learning to Communicate Through Implicit Communication Channels
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Provably Accurate Shapley Value Estimation via Leverage Score Sampling
A Solvable Attention for Neural Scaling Laws
ST-GCond: Self-supervised and Transferable Graph Dataset Condensation
Learning Successor Features with Distributed Hebbian Temporal Memory
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting
SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance
NRGBoost: Energy-Based Generative Boosted Trees
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Progressive distillation induces an implicit curriculum
UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation
Second Order Bounds for Contextual Bandits with Function Approximation
Language Agents Meet Causality -- Bridging LLMs and Causal World Models
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation
Intrinsic User-Centric Interpretability through Global Mixture of Experts
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Is Large-scale Pretraining the Secret to Good Domain Generalization?
Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training
Hierarchical Uncertainty Estimation for Learning-based Registration in Neuroimaging
Optimistic Games for Combinatorial Bayesian Optimization with Application to Protein Design
Adversarial Training for Defense Against Label Poisoning Attacks
Enhancing Language Model Agents using Diversity of Thoughts
Self-Evolved Reward Learning for LLMs
CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes
Growth Inhibitors for Suppressing Inappropriate Image Concepts in Diffusion Models
CAMEx: Curvature-aware Merging of Experts
DistillHGNN: A Knowledge Distillation Approach for High-Speed Hypergraph Neural Networks
Open-World Reinforcement Learning over Long Short-Term Imagination
MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation
Faster Cascades via Speculative Decoding
Text4Seg: Reimagining Image Segmentation as Text Generation
Towards Empowerment Gain through Causal Structure Learning in Model-Based Reinforcement Learning
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
RecFlow: An Industrial Full Flow Recommendation Dataset
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Scaling Laws for Downstream Task Performance in Machine Translation
Stochastic Bandits Robust to Adversarial Attacks
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Long-tailed Adversarial Training with Self-Distillation
Learning View-invariant World Models for Visual Robotic Manipulation
Logic-Logit: A Logic-Based Approach to Choice Modeling
CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning
Neuron Platonic Intrinsic Representation From Dynamics Using Contrastive Learning
Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing
OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data
Clique Number Estimation via Differentiable Functions of Adjacency Matrix Permutations
The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning
UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery
Charting the Design Space of Neural Graph Representations for Subgraph Matching
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling
Selective Attention Improves Transformer
Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness
Context Steering: Controllable Personalization at Inference Time
MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction
Robust-PIFu: Robust Pixel-aligned Implicit Function for 3D Human Digitalization from a Single Image
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Unbounded: A Generative Infinite Game of Character Life Simulation
FlowDec: A flow-based full-band general audio codec with high perceptual quality
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
TODO: Enhancing LLM Alignment with Ternary Preferences
Frame-Voyager: Learning to Query Frames for Video Large Language Models
The impact of allocation strategies in subset learning on the expressive power of neural networks
Learning on One Mode: Addressing Multi-modality in Offline Reinforcement Learning
Towards Automated Knowledge Integration From Human-Interpretable Representations
OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs
ScImage: How good are multimodal large language models at scientific text-to-image generation?
Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Zero-shot Model-based Reinforcement Learning using Large Language Models
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Active Task Disambiguation with LLMs
Learning Splitting Heuristics in Divide-and-Conquer SAT Solvers with Reinforcement Learning
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
CryoGEN: Generative Energy-based Models for Cryogenic Electron Tomography Reconstruction
Lift Your Molecules: Molecular Graph Generation in Latent Euclidean Space
Score-based Self-supervised MRI Denoising
Efficiently Parameterized Neural Metriplectic Systems
Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks
Capturing the Temporal Dependence of Training Data Influence
Benchmarking LLMs' Judgments with No Gold Standard
Towards Neural Scaling Laws for Time Series Foundation Models
Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning
Jailbreaking as a Reward Misspecification Problem
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View
KLay: Accelerating Arithmetic Circuits for Neurosymbolic AI
Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm
Scaling In-the-Wild Training for Diffusion-based Illumination Harmonization and Editing by Imposing Consistent Light Transport
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
Differentiable Rule Induction from Raw Sequence Inputs
GROOT-2: Weakly Supervised Multimodal Instruction Following Agents
On Generalization Across Environments In Multi-Objective Reinforcement Learning
Adaptive backtracking for faster optimization
Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
Robustness Inspired Graph Backdoor Defense
Improving Large Language Model Planning with Action Sequence Similarity
Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning
Rethinking Multiple-Instance Learning From Feature Space to Probability Space
Cross-Entropy Is All You Need To Invert the Data Generating Process
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Learning the Complexity of Weakly Noisy Quantum States
Bridging Jensen Gap for Max-Min Group Fairness Optimization in Recommendation
Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency
Debiasing Mini-Batch Quadratics for Applications in Deep Learning
BAMDP Shaping: a Unified Theoretical Framework for Intrinsic Motivation and Reward Shaping
A Formal Framework for Understanding Length Generalization in Transformers
The Effectiveness of Curvature-Based Rewiring and the Role of Hyperparameters in GNNs Revisited
Can LLMs Solve Longer Math Word Problems Better?
Knowledge Localization: Mission Not Accomplished? Enter Query Localization!
Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation
Scaling and evaluating sparse autoencoders
$\sigma$-zero: Gradient-based Optimization of $\ell_0$-norm Adversarial Examples
Certifying Counterfactual Bias in LLMs
Microcanonical Langevin Ensembles: Advancing the Sampling of Bayesian Neural Networks
Improving Deep Regression with Tightness
Provable unlearning in topic modeling and downstream tasks
Dynamic Diffusion Transformer
Quantum-PEFT: Ultra parameter-efficient fine-tuning
MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation
Glauber Generative Model: Discrete Diffusion Models via Binary Classification
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Metamizer: A Versatile Neural Optimizer for Fast and Accurate Physics Simulations
Diffusion Feedback Helps CLIP See Better
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
MiniPLM: Knowledge Distillation for Pre-training Language Models
Revealing the 3D Cosmic Web through Gravitationally Constrained Neural Fields
ElasticTok: Adaptive Tokenization for Image and Video
Memory Efficient Transformer Adapter for Dense Predictions
Scale-Aware Contrastive Reverse Distillation for Unsupervised Medical Anomaly Detection
Valid Conformal Prediction for Dynamic GNNs
World Model on Million-Length Video And Language With Blockwise RingAttention
Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling
Fast and Accurate Blind Flexible Docking
Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks
Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Automated Design of Agentic Systems
Uncovering Overfitting in Large Language Model Editing
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
Understanding the Impacts of GenAI Requires Understanding the Impact of Anthropomorphic AI
On the Benefits of Attribute-Driven Graph Domain Adaptation
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
Enhance Multi-View Classification Through Multi-Scale Alignment and Expanded Boundary
Sharpness-Aware Black-Box Optimization
Improving Data Efficiency via Curating LLM-Driven Rating Systems
Context-Alignment: Activating and Enhancing LLMs Capabilities in Time Series
LLM Unlearning via Loss Adjustment with Only Forget Data
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
Pareto Prompt Optimization
TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation
Adversarial Machine Unlearning
Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection
How efficient is LLM-generated code? A rigorous & high-standard benchmark
Towards Learning High-Precision Least Squares Algorithms with Sequence Models
Neural Spacetimes for DAG Representation Learning
Examining Alignment of Large Language Models through Representative Heuristics: the case of political stereotypes
BTBS-LNS: Binarized-Tightening, Branch and Search on Learning LNS Policies for MIP
Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix
FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields
Bundle Neural Network for message diffusion on graphs
Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
Offline Model-Based Optimization by Learning to Rank
Benchmarking Predictive Coding Networks -- Made Simple
Physics-Informed Diffusion Models
Graph Neural Networks Can (Often) Count Substructures
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Language Models Are Implicitly Continuous
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Understanding Optimization in Deep Learning with Central Flows
Temporal Reasoning Transfer from Text to Video
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability
Adaptive Energy Alignment for Accelerating Test-Time Adaptation
$InterLCM$: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration
Accelerating Neural ODEs: A Variational Formulation-based Approach
Partial Gromov-Wasserstein Metric
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Root Cause Analysis of Anomalies in Multivariate Time Series through Granger Causal Discovery
DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Deep Incomplete Multi-view Learning via Cyclic Permutation of VAEs
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Latent Action Pretraining from Videos
VCR: Pixel-Level Complex Reasoning by Restoring Occluded Text
Minimal Impact ControlNet: Advancing Multi-ControlNet Integration
Robotouille: An Asynchronous Planning Benchmark for LLM Agents
From Probability to Counterfactuals: the Increasing Complexity of Satisfiability in Pearl's Causal Hierarchy
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Bayesian Image Regression with Soft-thresholded Conditional Autoregressive Prior
Multi-Perspective Data Augmentation for Few-shot Object Detection
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Beyond Mere Token Analysis: A Hypergraph Metric Space Framework for Defending Against Socially Engineered LLM Attacks
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Homomorphism Expressivity of Spectral Invariant Graph Neural Networks
Joint Gradient Balancing for Data Ordering in Finite-Sum Multi-Objective Optimization
CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design
Lossy Compression with Pretrained Diffusion Models
Mechanism and emergence of stacked attention heads in multi-layer transformers
Near-Exact Privacy Amplification for Matrix Mechanisms
PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks
Reasoning of Large Language Models over Knowledge Graphs with Super-Relations
Privacy Auditing of Large Language Models
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
The Last Iterate Advantage: Empirical Auditing and Principled Heuristic Analysis of Differentially Private SGD
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
Scalable Extraction of Training Data from Aligned, Production Language Models
N-ForGOT: Towards Not-forgetting and Generalization of Open Temporal Graph Learning
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration
TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
EC-Diffuser: Multi-Object Manipulation via Entity-Centric Behavior Generation
A Statistical Framework for Ranking LLM-based Chatbots
TULIP: Token-length Upgraded CLIP
Modeling dynamic social vision highlights gaps between deep learning and humans
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Neuralized Markov Random Field for Interaction-Aware Stochastic Human Trajectory Prediction
Monte Carlo Planning with Large Language Model for Text-Based Game Agents
Breaking the Reclustering Barrier in Centroid-based Deep Clustering
TetSphere Splatting: Representing High-Quality Geometry with Lagrangian Volumetric Meshes
Balancing Bias in Two-sided Markets for Fair Stable Matchings
Demystifying the Token Dynamics of Deep Selective State Space Models
ILLUSION: Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset
PIORF: Physics-Informed Ollivier-Ricci Flow for Long–Range Interactions in Mesh Graph Neural Networks
GETS: Ensemble Temperature Scaling for Calibration in Graph Neural Networks
Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling
Procedural Synthesis of Synthesizable Molecules
Diffusion On Syntax Trees For Program Synthesis
CREIMBO: Cross-Regional Ensemble Interactions in Multi-view Brain Observations
GravMAD: Grounded Spatial Value Maps Guided Action Diffusion for Generalized 3D Manipulation
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
Improving Generalization and Robustness in SNNs Through Signed Rate Encoding and Sparse Encoding Attacks
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
How many samples are needed to train a deep neural network?
A Truncated Newton Method for Optimal Transport
Generalization Bounds and Model Complexity for Kolmogorov–Arnold Networks
$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark
Learning from Imperfect Human Feedback: A Tale from Corruption-Robust Dueling
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
Encryption-Friendly LLM Architecture
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
ICLR: In-Context Learning of Representations
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
Explanations of GNN on Evolving Graphs via Axiomatic Layer edges
Boltzmann priors for Implicit Transfer Operators
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models
Diverse Preference Learning for Capabilities and Alignment
Data Scaling Laws in Imitation Learning for Robotic Manipulation
Bayesian WeakS-to-Strong from Text Classification to Generation
Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models
Topological Zigzag Spaghetti for Diffusion-based Generation and Prediction on Graphs
A Generalist Hanabi Agent
Rethinking Fair Representation Learning for Performance-Sensitive Tasks
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
A Benchmark for Semantic Sensitive Information in LLMs Outputs
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data
Improving Convergence Guarantees of Random Subspace Second-order Algorithm for Nonconvex Optimization
Stealthy Shield Defense: A Conditional Mutual Information-Based Approach against Black-Box Model Inversion Attacks
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning
GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning
Towards Realistic Data Generation for Real-World Super-Resolution
Large (Vision) Language Models are Unsupervised In-Context Learners
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escape, and Network Embedding
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning
On the Byzantine-Resilience of Distillation-Based Federated Learning
Counterfactual Generative Modeling with Variational Causal Inference
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher
Improving Language Model Distillation through Hidden State Matching
Dynamic Low-Rank Sparse Adaptation for Large Language Models
A Generic Framework for Conformal Fairness
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Track-On: Transformer-based Online Point Tracking with Memory
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation
Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning
Adversarial Attacks on Data Attribution
Beyond Random Masking: When Dropout meets Graph Convolutional Networks
Generative Adversarial Ranking Nets
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
Physics of Language Models: Part 3.2, Knowledge Manipulation
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
ProAdvPrompter: A Two-Stage Journey to Effective Adversarial Prompting for LLMs
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention
ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
An Effective Manifold-based Optimization Method for Distributionally Robust Classification
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
ContextGNN: Beyond Two-Tower Recommendation Systems
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
Revisiting Feature Prediction for Learning Visual Representations from Video
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Revisiting Source-Free Domain Adaptation: a New Perspective via Uncertainty Control
Artificial Kuramoto Oscillatory Neurons
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Diffusing to the Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning
Chunk-Distilled Language Modeling
When Graph Neural Networks Meet Dynamic Mode Decomposition
Global Well-posedness and Convergence Analysis of Score-based Generative Models via Sharp Lipschitz Estimates
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Neural Causal Graph for Interpretable and Intervenable Classification
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy
ToolGen: Unified Tool Retrieval and Calling via Generation
Graph Neural Networks Are More Than Filters: Revisiting and Benchmarking from A Spectral Perspective
General Scene Adaptation for Vision-and-Language Navigation
Scale-Free Graph-Language Models
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
PEARL: Towards Permutation-Resilient LLMs
ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks
Open-Vocabulary Customization from CLIP via Data-Free Knowledge Distillation
Towards Semantic Equivalence of Tokenization in Multimodal LLM
A Tight Convergence Analysis of Inexact Stochastic Proximal Point Algorithm for Stochastic Composite Optimization Problems
Shedding Light on Time Series Classification using Interpretability Gated Networks
Transformers Provably Solve Parity Efficiently with Chain of Thought
ThinK: Thinner Key Cache by Query-Driven Pruning
Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint
Atomas: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation
TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models
Towards Federated RLHF with Aggregated Client Preference for LLMs
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
Associative memory and dead neurons
Probabilistic Geometric Principal Component Analysis with application to neural data
Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen
E(3)-equivariant models cannot learn chirality: Field-based molecular generation
A General Framework for Off-Policy Learning with Partially-Observed Reward
HiBug2: Efficient and Interpretable Error Slice Discovery for Comprehensive Model Debugging
CipherPrune: Efficient and Scalable Private Transformer Inference
DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models
ParetoFlow: Guided Flows in Multi-Objective Optimization
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
SPA-Bench: A Comprehensive Benchmark for Smartphone Agent Evaluation
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations
Is Your Video Language Model a Reliable Judge?
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption
$\text{D}_{2}\text{O}$: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
An Engorgio Prompt Makes Large Language Model Babble on
CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression
Local Steps Speed Up Local GD for Heterogeneous Distributed Logistic Regression
Is uniform expressivity too restrictive? Towards efficient expressivity of GNNs
Image-level Memorization Detection via Inversion-based Inference Perturbation
Multi-Dimensional Conformal Prediction
Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length
Student-Informed Teacher Training
REEF: Representation Encoding Fingerprints for Large Language Models
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
How Low Can You Go? Searching for the Intrinsic Dimensionality of Complex Networks using Metric Node Embeddings
Discrete Codebook World Models for Continuous Control
Task Descriptors Help Transformers Learn Linear Models In-Context
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods
SOREL: A Stochastic Algorithm for Spectral Risks Minimization
Accelerating Diffusion Transformers with Token-wise Feature Caching
Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
PT-T2I/V: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Image/Video-Task
Post-hoc Reward Calibration: A Case Study on Length Bias
DCT-CryptoNets: Scaling Private Inference in the Frequency Domain
Layerwise Recurrent Router for Mixture-of-Experts
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Efficient Top-m Data Values Identification for Data Selection
Linear combinations of latents in generative models: subspaces and beyond
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Agent S: An Open Agentic Framework that Uses Computers Like a Human
Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
RFMamba: Frequency-Aware State Space Model for RF-Based Human-Centric Perception
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
Causal Discovery via Bayesian Optimization
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Rethinking Graph Neural Networks From A Geometric Perspective Of Node Features
Tuning Frequency Bias of State Space Models
Gaussian Mixture Counterfactual Generator
CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling
Online Reinforcement Learning in Non-Stationary Context-Driven Environments
SaMer: A Scenario-aware Multi-dimensional Evaluator for Large Language Models
ADIFF: Explaining audio difference using natural language
Enhancing Graph Of Thought: Enhancing Prompts with LLM Rationales and Dynamic Temperature Control
Controlling Language and Diffusion Models by Transporting Activations
SWEb: A Large Web Dataset for the Scandinavian Languages
Robust Root Cause Diagnosis using In-Distribution Interventions
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Emergent Orientation Maps: Mechanisms, Coding Efficiency and Robustness
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection
Continuity-Preserving Convolutional Autoencoders for Learning Continuous Latent Dynamical Models from Images
Learning local equivariant representations for quantum operators
FlashRNN: I/O-Aware Optimization of Traditional RNNs on modern hardware
Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving
Finally Rank-Breaking Conquers MNL Bandits: Optimal and Efficient Algorithms for MNL Assortment
Provably Safeguarding a Classifier from OOD and Adversarial Samples
REvolve: Reward Evolution with Large Language Models using Human Feedback
Transformer Block Coupling and its Correlation with Generalization in LLMs
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
Circuit Transformer: A Transformer That Preserves Logical Equivalence
RMB: Comprehensively benchmarking reward models in LLM alignment
Boosting the visual interpretability of CLIP via adversarial fine-tuning
Many-Objective Multi-Solution Transport
SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats
Neural Wave Equation for Irregularly Sampled Sequence Data
Re-evaluating Open-ended Evaluation of Large Language Models
Diffusing States and Matching Scores: A New Framework for Imitation Learning
Safety Layers in Aligned Large Language Models: The Key to LLM Security
Palu: KV-Cache Compression with Low-Rank Projection
Video Action Differencing
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
Open-Set Graph Anomaly Detection via Normal Structure Regularisation
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Beyond single neurons: population response geometry in digital twins of mouse visual cortex
Copyright-Protected Language Generation via Adaptive Model Fusion
OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Bandit Learning in Matching Markets with Indifference
Diffusion State-Guided Projected Gradient for Inverse Problems
Bootstrapped Model Predictive Control
AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
Single Teacher, Multiple Perspectives: Teacher Knowledge Augmentation for Enhanced Knowledge Distillation
A CLIP-Powered Framework for Robust and Generalizable Data Selection
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)
MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems
Unearthing Skill-level Insights for Understanding Trade-offs of Foundation Models
Surprising Effectiveness of pretraining Ternary Language Model at Scale
A transfer learning framework for weak to strong generalization
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
Learning Long Range Dependencies on Graphs via Random Walks
Asymptotic Analysis of Two-Layer Neural Networks after One Gradient Step under Gaussian Mixtures Data with Structure
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Predictive Uncertainty Quantification for Bird's Eye View Segmentation: A Benchmark and Novel Loss Function
Breaking Neural Network Scaling Laws with Modularity
Feedback Schrödinger Bridge Matching
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
How to Evaluate Reward Models for RLHF
DUET: Decentralized Bilevel Optimization without Lower-Level Strong Convexity
Neural Phylogeny: Fine-Tuning Relationship Detection among Neural Networks
Heavy-Tailed Diffusion Models
Indirect Gradient Matching for Adversarial Robust Distillation
An Undetectable Watermark for Generative Image Models
Long-Sequence Recommendation Models Need Decoupled Embeddings
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Preference Optimization for Reasoning with Pseudo Feedback
Noise Separation guided Candidate Label Reconstruction for Noisy Partial Label Learning
Spurious Forgetting in Continual Learning of Language Models
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Instant Policy: In-Context Imitation Learning via Graph Diffusion
GeoLoRA: Geometric integration for parameter efficient fine-tuning
GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation
On Scaling Up 3D Gaussian Splatting Training
Extending Mercer's expansion to indefinite and asymmetric kernels
Learning Interleaved Image-Text Comprehension in Vision-Language Large Models
Causal Graph Transformer for Treatment Effect Estimation Under Unknown Interference
Causal Identification for Complex Functional Longitudinal Studies
On the Linear Speedup of Personalized Federated Reinforcement Learning with Shared Representations
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback
AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction
UniDrive: Towards Universal Driving Perception Across Camera Configurations
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Training on the Test Task Confounds Evaluation and Emergence
ECHOPulse: ECG Controlled Echocardiogram Video Generation
Population Transformer: Learning Population-level Representations of Neural Activity
Training-Free Dataset Pruning for Instance Segmentation
On the Convergence of No-Regret Dynamics in Information Retrieval Games with Proportional Ranking Functions
Compute-Optimal LLMs Provably Generalize Better with Scale
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AnalogGenie: A Generative Engine for Automatic Discovery of Analog Circuit Topologies
Diffusion Generative Modeling for Spatially Resolved Gene Expression Inference from Histology Images
PiCO: Peer Review in LLMs based on Consistency Optimization
Efficient Jailbreak Attack Sequences on Large Language Models via Multi-Armed Bandit-Based Context Switching
Normed Spaces for Graph Embedding
Attribute-based Visual Reprogramming for Vision-Language Models
Diffusion-based Neural Network Weights Generation
Rethinking the generalization of drug target affinity prediction algorithms via similarity aware evaluation
InstantPortrait: One-Step Portrait Editing via Diffusion Multi-Objective Distillation
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
RegMix: Data Mixture as Regression for Language Model Pre-training
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model
Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning
When Attention Sink Emerges in Language Models: An Empirical View
M^3PC: Test-time Model Predictive Control using Pretrained Masked Trajectory Model
A Simple Approach to Unifying Diffusion-based Conditional Generation
Scaling up Masked Diffusion Models on Text
Bootstrapping Language Models with DPO Implicit Rewards
Reflective Gaussian Splatting
AI as Humanity’s Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
Rethinking Classifier Re-Training in Long-Tailed Recognition: Label Over-Smooth Can Balance
Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds
ALLaM: Large Language Models for Arabic and English
Metalic: Meta-Learning In-Context with Protein Language Models
Reconsidering Faithfulness in Regular, Self-Explainable and Domain Invariant GNNs
Guaranteed Generation from Large Language Models
Intelligence at the Edge of Chaos
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Severing Spurious Correlations with Data Pruning
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Generalizing Reasoning Problems to Longer Lengths
InversionGNN: A Dual Path Network for Multi-Property Molecular Optimization
PABBO: Preferential Amortized Black-Box Optimization
LLM-based Typed Hyperresolution for Commonsense Reasoning with Knowledge Bases
Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport
REMEDY: Recipe Merging Dynamics in Large Vision-Language Models
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
FreeVS: Generative View Synthesis on Free Driving Trajectory
Utility-Directed Conformal Prediction: A Decision-Aware Framework for Actionable Uncertainty Quantification
Fast Summation of Radial Kernels via QMC Slicing
Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
The Optimization Landscape of SGD Across the Feature Learning Strength
On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Minimax Optimal Reinforcement Learning with Quasi-Optimism
Interpreting Language Reward Models via Contrastive Explanations
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
EVA: Geometric Inverse Design for Fast Protein Motif-Scaffolding with Coupled Flow
Lipschitz Bandits in Optimal Space
Data Selection via Optimal Control for Language Models
How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
Language Guided Skill Discovery
Computational Explorations of Total Variation Distance
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
A Simple yet Effective $\Delta\Delta G$ Predictor is An Unsupervised Antibody Optimizer and Explainer
SymDiff: Equivariant Diffusion via Stochastic Symmetrisation
Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
Deep Distributed Optimization for Large-Scale Quadratic Programming
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
Modeling Complex System Dynamics with Flow Matching Across Time and Conditions
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
GSBA$^K$: $top$-$K$ Geometric Score-based Black-box Attack
Field-DiT: Diffusion Transformer on Unified Video, 3D, and Game Field Generation
Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning
MA$^2$E: Addressing Partial Observability in Multi-Agent Reinforcement Learning with Masked Auto-Encoder
TDDBench: A Benchmark for Training data detection
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
CO-MOT: Boosting End-to-end Transformer-based Multi-Object Tracking via Coopetition Label Assignment and Shadow Sets
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data
Can Knowledge Editing Really Correct Hallucinations?
Adversarial Search Engine Optimization for Large Language Models
RankSHAP: Shapley Value Based Feature Attributions for Learning to Rank
Causal Representation Learning from Multimodal Biomedical Observations
Mixture Compressor for Mixture-of-Experts LLMs Gains More
InstaTrain: Adaptive Training via Ultra-Fast Natural Annealing within Dynamical Systems
DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Solving hidden monotone variational inequalities with surrogate losses
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
Probabilistic Neural Pruning via Sparsity Evolutionary Fokker-Planck-Kolmogorov Equation
Improving Complex Reasoning with Dynamic Prompt Corruption: A Soft Prompt Optimization Approach
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Generalization in VAE and Diffusion Models: A Unified Information-Theoretic Analysis
Model Risk-sensitive Offline Reinforcement Learning
BANGS: Game-theoretic Node Selection for Graph Self-Training
On the Role of Attention Heads in Large Language Model Safety
Detecting Backdoor Samples in Contrastive Language Image Pretraining
On the Price of Differential Privacy for Hierarchical Clustering
Learning to Select Nodes in Branch and Bound with Sufficient Tree Representation
Beyond Surface Structure: A Causal Assessment of LLMs' Comprehension ability
Reti-Diff: Illumination Degradation Image Restoration with Retinex-based Latent Diffusion Model
Scaling Long Context Training Data by Long-Distance Referrals
Learning Graph Quantized Tokenizers
Synthetic continued pretraining
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Dataset Ownership Verification in Contrastive Pre-trained Models
ML4TSPBench: Drawing Methodological Principles for TSP and Beyond from Streamlined Design Space of Learning and Search
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
A Conditional Independence Test in the Presence of Discretization
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
Reliable and Diverse Evaluation of LLM Medical Knowledge Mastery
Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Precise Parameter Localization for Textual Generation in Diffusion Models
Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency
Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection
SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
ImpScore: A Learnable Metric For Quantifying The Implicitness Level of Sentences
LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh
MixEval-X: Any-to-any Evaluations from Real-world Data Mixture
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark
MAI: A Multi-turn Aggregation-Iteration Model for Composed Image Retrieval
Efficient Biological Data Acquisition through Inference Set Design
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
Towards Explaining the Power of Constant-depth Graph Neural Networks for Structured Linear Programming
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach
Rare event modeling with self-regularized normalizing flows: what can we learn from a single failure?
Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport
Towards Auto-Regressive Next-Token Prediction: In-context Learning Emerges from Generalization
BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications
Safety Representations for Safer Policy Learning
Bilinear MLPs enable weight-based mechanistic interpretability
Composing Unbalanced Flows for Flexible Docking and Relaxation
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Exploring channel distinguishability in local neighborhoods of the model space in quantum neural networks
Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building
GNNs Getting ComFy: Community and Feature Similarity Guided Rewiring
Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation
The Robustness of Differentiable Causal Discovery in Misspecified Scenarios
Block-Attention for Efficient Prefilling
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
Multi-domain Distribution Learning for De Novo Drug Design
Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
Isometric Regularization for Manifolds of Functional Data
CL-MFAP: A Contrastive Learning-Based Multimodal Foundation Model for Molecular Property Prediction and Antibiotic Screening
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Gramian Multimodal Representation Learning and Alignment
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization
Block Verification Accelerates Speculative Decoding
Semi-Parametric Retrieval via Binary Bag-of-Tokens Index
VLMaterial: Procedural Material Generation with Large Vision-Language Models
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods
Do LLMs "know" internally when they follow instructions?
EvA: Erasing Spurious Correlations with Activations
Offline Hierarchical Reinforcement Learning via Inverse Optimization
Learning Harmonized Representations for Speculative Sampling
FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise
Training Free Exponential Context Extension via Cascading KV Cache
Brain Bandit: A Biologically Grounded Neural Network for Efficient Control of Exploration
Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback
Tool-Planner: Task Planning with Clusters across Multiple Tools
OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination
HeadMap: Locating and Enhancing Knowledge Circuits in LLMs
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
MTSAM: Multi-Task Fine-Tuning for Segment Anything Model
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Generalizable Human Gaussians from Single-View Image
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn't and What's Next
Calibrating Expressions of Certainty
MixMax: Distributional Robustness in Function Space via Optimal Data Mixtures
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Unifying Causal Representation Learning with the Invariance Principle
Locality-aware Gaussian Compression for Fast and High-quality Rendering
Enhancing End-to-End Autonomous Driving with Latent World Model
Training-Free Activation Sparsity in Large Language Models
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Policy Design in Long-run Welfare Dynamics
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Denoising Autoregressive Transformers for Scalable Text-to-Image Generation
Morphing Tokens Draw Strong Masked Image Models
3D StreetUnveiler with Semantic-aware 2DGS - a simple baseline
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
HGM³: Hierarchical Generative Masked Motion Modeling with Hard Token Mining
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference
Min-K%++: Improved Baseline for Pre-Training Data Detection from Large Language Models
Distance-Based Tree-Sliced Wasserstein Distance
Latent Bayesian Optimization via Autoregressive Normalizing Flows
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders
Alchemy: Amplifying Theorem-Proving Capability Through Symbolic Mutation
CameraCtrl: Enabling Camera Control for Video Diffusion Models
NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks in Open Domains
Conditional Diffusion with Ordinal Regression: Longitudinal Data Generation for Neurodegenerative Disease Studies
SelKD: Selective Knowledge Distillation via Optimal Transport Perspective
Spherical Tree-Sliced Wasserstein Distance
FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Equivariant Neural Functional Networks for Transformers
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Nesterov acceleration in benignly non-convex landscapes
Concept Bottleneck Language Models For Protein Design
ELBOing Stein: Variational Bayes with Stein Mixture Inference
REFINE: Inversion-Free Backdoor Defense via Model Reprogramming
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Learning 3D Perception from Others' Predictions
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale
Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Multilevel Generative Samplers for Investigating Critical Phenomena
LOIRE: LifelOng learning on Incremental data via pre-trained language model gRowth Efficiently
ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Calibrating LLMs with Information-Theoretic Evidential Deep Learning
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
On the Crucial Role of Initialization for Matrix Factorization
Progressive Parameter Efficient Transfer Learning for Semantic Segmentation
LaMPlace: Learning to Optimize Cross-Stage Metrics in Macro Placement
Hybrid Regularization Improves Diffusion-based Inverse Problem Solving
Learning Transformer-based World Models with Contrastive Predictive Coding
Effective Interplay between Sparsity and Quantization: From Theory to Practice
BoneMet: An Open Large-Scale Multi-Modal Murine Dataset for Breast Cancer Bone Metastasis Diagnosis and Prognosis
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Diffusion Models Are Real-Time Game Engines
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
GraphArena: Evaluating and Exploring Large Language Models on Graph Computation
qNBO: quasi-Newton Meets Bilevel Optimization
Active Learning for Continual Learning: Keeping the Past Alive in the Present
Data Center Cooling System Optimization Using Offline Reinforcement Learning
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
UV-Attack: Physical-World Adversarial Attacks on Person Detection via Dynamic-NeRF-based UV Mapping
Geometry of Lightning Self-Attention: Identifiability and Dimension
Attributing Culture-Conditioned Generations to Pretraining Corpora
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Beyond Circuit Connections: A Non-Message Passing Graph Transformer Approach for Quantum Error Mitigation
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
Can We Talk Models Into Seeing the World Differently?
TabWak: A Watermark for Tabular Diffusion Models
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems
Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling
UIFace: Unleashing Inherent Model Capabilities to Enhance Intra-Class Diversity in Synthetic Face Recognition
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation
Value-aligned Behavior Cloning for Offline Reinforcement Learning via Bi-level Optimization
MindSimulator: Exploring Brain Concept Localization via Synthetic fMRI
NL-Eye: Abductive NLI For Images
Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining
Agree to Disagree: Demystifying Homogeneous Deep Ensembles through Distributional Equivalence
Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings
Graph Neural Networks for Edge Signals: Orientation Equivariance and Invariance
Going Beyond Static: Understanding Shifts with Time-Series Attribution
StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces
Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning
Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking
Residual Stream Analysis with Multi-Layer SAEs
Do as We Do, Not as You Think: the Conformity of Large Language Models
Bayesian Analysis of Combinatorial Gaussian Process Bandits
A Statistical Approach for Controlled Training Data Detection
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
Centrality-guided Pre-training for Graph
A Multiscale Frequency Domain Causal Framework for Enhanced Pathological Analysis
Rethinking and Improving Autoformalization: Towards a Faithful Metric and a Dependency Retrieval-based Approach
Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
PFDiff: Training-Free Acceleration of Diffusion Models Combining Past and Future Scores
ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
Selective Aggregation for Low-Rank Adaptation in Federated Learning
Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning
Decoupling Angles and Strength in Low-rank Adaptation
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate
Semantic Loss Guided Data Efficient Supervised Fine Tuning for Safe Responses in LLMs
On Minimizing Adversarial Counterfactual Error in Adversarial Reinforcement Learning
Causal Graphical Models for Vision-Language Compositional Understanding
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Training Large Language Models for Retrieval-Augmented Question Answering through Backtracking Correction
Exact Certification of (Graph) Neural Networks Against Label Poisoning
Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator
Topological Schrödinger Bridge Matching
DenoiseVAE: Learning Molecule-Adaptive Noise Distributions for Denoising-based 3D Molecular Pre-training
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection
GANDALF: Generative AttentioN based Data Augmentation and predictive modeLing Framework for personalized cancer treatment
An Intelligent Agentic System for Complex Image Restoration Problems
Trajectory-Class-Aware Multi-Agent Reinforcement Learning
Coreset Selection via Reducible Loss in Continual Learning
Analytic DAG Constraints for Differentiable DAG Learning
Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes
A Theoretical Perspective: How to Prevent Model Collapse in Self-consuming Training Loops
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Small Models are LLM Knowledge Triggers for Medical Tabular Prediction
PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
The Computational Complexity of Positive Non-Clashing Teaching in Graphs
Making Text Embedders Few-Shot Learners
GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data
GMValuator: Similarity-based Data Valuation for Generative Models
DyCAST: Learning Dynamic Causal Structure from Time Series
Three Mechanisms of Feature Learning in a Linear Network
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Transformer Encoder Satisfiability: Complexity and Impact on Formal Reasoning
A Large-scale Dataset and Benchmark for Commuting Origin-Destination Flow Generation
Multi-Reward as Condition for Instruction-based Image Editing
Linear Mode Connectivity in Differentiable Tree Ensembles
Discovering Influential Neuron Path in Vision Transformers
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Re-Evaluating the Impact of Unseen-Class Unlabeled Data on Semi-Supervised Learning Model
Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment
HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
REBIND: Enhancing Ground-state Molecular Conformation Prediction via Force-Based Graph Rewiring
MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks
Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers
Generative Flows on Synthetic Pathway for Drug Design
Adaptive teachers for amortized samplers
Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
Does Spatial Cognition Emerge in Frontier Models?
MIRACLE 3D: Memory-efficient Integrated Robust Approach for Continual Learning on 3D Point Clouds via Shape Model Construction
Self-Improvement in Language Models: The Sharpening Mechanism
CFD: Learning Generalized Molecular Representation via Concept-Enhanced Feedback Disentanglement
Salvage: Shapley-distribution Approximation Learning Via Attribution Guided Exploration for Explainable Image Classification
InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting
PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance
Variational Best-of-N Alignment
A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning
Knowledge Graph Finetuning Enhances Knowledge Manipulation in Large Language Models
Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Don't Take Things Out of Context: Attention Intervention for Enhancing Chain-of-Thought Reasoning in Large Language Models
MANTRA: The Manifold Triangulations Assemblage
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment
SigDiffusions: Score-Based Diffusion Models for Time Series via Log-Signature Embeddings
SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting
Differentially Private Federated Learning with Time-Adaptive Privacy Spending
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Generating Likely Counterfactuals Using Sum-Product Networks
Improved Sampling Of Diffusion Models In Fluid Dynamics With Tweedie's Formula
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
Fengbo: a Clifford Neural Operator pipeline for 3D PDEs in Computational Fluid Dynamics
Random-Set Neural Networks
Adaptive Camera Sensor for Vision Models
Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models
Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Support is All You Need for Certified VAE Training
Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises
CryoFM: A Flow-based Foundation Model for Cryo-EM Densities
CREAM: Consistency Regularized Self-Rewarding Language Models
ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
From Promise to Practice: Realizing High-performance Decentralized Training
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Active Learning for Neural PDE Solvers
Convergence of Distributed Adaptive Optimization with Local Updates
Prototype antithesis for biological few-shot class-incremental learning
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms
MCNC: Manifold-Constrained Reparameterization for Neural Compression
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
KooNPro: A Variance-Aware Koopman Probabilistic Model Enhanced by Neural Process for Time Series Forecasting
DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
Attention as a Hypernetwork
Biologically Constrained Barrel Cortex Model Integrates Whisker Inputs and Replicates Key Brain Network Dynamics
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Uni-Sign: Toward Unified Sign Language Understanding at Scale
Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Power
Comparing noisy neural population dynamics using optimal transport distances
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization
Language Models are Advanced Anonymizers
Mitigating Spurious Correlations in Zero-Shot Multimodal Models
The Breakdown of Gaussian Universality in Classification of High-dimensional Linear Factor Mixtures
Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning
Strong Preferences Affect the Robustness of Preference Models and Value Alignment
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
LeanAgent: Lifelong Learning for Formal Theorem Proving
Decomposition Polyhedra of Piecewise Linear Functions
GSE: Group-wise Sparse and Explainable Adversarial Attacks
Linear Bandits with Memory
Mechanistic Permutability: Match Features Across Layers
Efficient Learning with Sine-Activated Low-Rank Matrices
PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
Neural Interactive Proofs
cryoSPHERE: Single-Particle HEterogeneous REconstruction from cryo EM
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Learn Your Reference Model for Real Good Alignment
Trajectory-LLM: A Language-based Data Generator for Trajectory Prediction in Autonomous Driving
Ward: Provable RAG Dataset Inference via LLM Watermarks
Generalizable Motion Planning via Operator Learning
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Efficient Causal Decision Making with One-sided Feedback
Language-Assisted Feature Transformation for Anomaly Detection
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Learning Randomized Algorithms with Transformers
PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning
Language Representations Can be What Recommenders Need: Findings and Potentials
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Circuit Representation Learning with Masked Gate Modeling and Verilog-AIG Alignment
ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning
TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators
Exploring the Design Space of Visual Context Representation in Video MLLMs
L3Ms — Lagrange Large Language Models
On the Optimal Memorization Capacity of Transformers
Recovery of Causal Graph Involving Latent Variables via Homologous Surrogates
Learning Graph Invariance by Harnessing Spuriosity
Noisy Test-Time Adaptation in Vision-Language Models
Addressing Label Shift in Distributed Learning via Entropy Regularization
InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences
Understanding and Enhancing the Transferability of Jailbreaking Attacks
Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization
Instance-dependent Early Stopping
Towards Out-of-Modal Generalization without Instance-level Modal Correspondence
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
Towards Effective Evaluations and Comparisons for LLM Unlearning Methods
A Robust Method to Discover Causal or Anticausal Relation
InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences
Chain-of-Focus Prompting: Leveraging Sequential Visual Cues to Prompt Large Autoregressive Vision Models
Flow: Modularized Agentic Workflow Automation
DEEM: Diffusion models serve as the eyes of large language models for image perception
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
GPS: A Probabilistic Distributional Similarity with Gumbel Priors for Set-to-Set Matching
Structuring Benchmark into Knowledge Graphs to Assist Large Language Models in Retrieving and Designing Models
End-to-end Learning of Gaussian Mixture Priors for Diffusion Sampler
Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning
HiRA: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models
Grokking at the Edge of Numerical Stability
MELODI: Exploring Memory Compression for Long Contexts
LiFT: Learning to Fine-Tune via Bayesian Parameter Efficient Meta Fine-Tuning
Approaching Rate-Distortion Limits in Neural Compression with Lattice Transform Coding
Sequential Controlled Langevin Diffusions
Balanced Neural ODEs: nonlinear model order reduction and Koopman operator approximations
Deep Signature: Characterization of Large-Scale Molecular Dynamics
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Identifying latent state transitions in non-linear dynamical systems
A Coefficient Makes SVRG Effective
Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization
Probabilistic Learning to Defer: Handling Missing Expert Annotations and Controlling Workload Distribution
Self-Supervised Diffusion Models for Electron-Aware Molecular Representation Learning
Spreading Out-of-Distribution Detection on Graphs
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Real-time design of architectural structures with differentiable mechanics and neural networks
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
First-Person Fairness in Chatbots
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
State Space Model Meets Transformer: A New Paradigm for 3D Object Detection
MGCFNN: A Neural MultiGrid Solver with Novel Fourier Neural Network for High Wave Number Helmholtz Equations
Unsupervised Meta-Learning via In-Context Learning
STAFF: Speculative Coreset Selection for Task-Specific Fine-tuning
Measuring memorization in RLHF for code completion
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
Training Language Models to Self-Correct via Reinforcement Learning
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
On the Relation between Trainability and Dequantization of Variational Quantum Learning Models
Towards Bridging Generalization and Expressivity of Graph Neural Networks
Epistemic Monte Carlo Tree Search
HShare: Fast LLM Decoding by Hierarchical Key-Value Sharing
Metric-Driven Attributions for Vision Transformers
ContraDiff: Planning Towards High Return States via Contrastive Learning
SONICS: Synthetic Or Not - Identifying Counterfeit Songs
HyperPLR: Hypergraph Generation through Projection, Learning, and Reconstruction
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Uncertainty Modeling in Graph Neural Networks via Stochastic Differential Equations
A Stochastic Approach to the Subset Selection Problem via Mirror Descent
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
DECO: Unleashing the Potential of ConvNets for Query-based Detection and Segmentation
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes
Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
GOttack: Universal Adversarial Attacks on Graph Neural Networks via Graph Orbits Learning
ReCogLab: a framework testing relational reasoning & cognitive hypotheses on LLMs
In Search of Forgotten Domain Generalization
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
FIRING-Net: A filtered feature recycling network for speech enhancement
Provable Uncertainty Decomposition via Higher-Order Calibration
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Generalization Bounds for Canonicalization: A Comparative Study with Group Averaging
Optimality of Matrix Mechanism on $\ell_p^p$-metric
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
MUSE: Machine Unlearning Six-Way Evaluation for Language Models
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
ZooProbe: A Data Engine for Evaluating, Exploring, and Evolving Large-scale Training Data for Multimodal LLMs
Fundamental Limitations on Subquadratic Alternatives to Transformers
Gumbel Counterfactual Generation From Language Models
Gradient-Free Generation for Hard-Constrained Systems
Training Free Guided Flow-Matching with Optimal Control
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Robustness Reprogramming for Representation Learning
Fast Training of Sinusoidal Neural Fields via Scaling Initialization
When Prompt Engineering Meets Software Engineering: CNL-P as Natural and Robust "APIs" for Human-AI Interaction
Point-SAM: Promptable 3D Segmentation Model for Point Clouds
Bridging the Gap between Database Search and De Novo Peptide Sequencing with SearchNovo
Logical Consistency of Large Language Models in Fact-Checking
Vision-LSTM: xLSTM as Generic Vision Backbone
GameArena: Evaluating LLM Reasoning through Live Computer Games
Herald: A Natural Language Annotated Lean 4 Dataset
Information Theoretic Text-to-Image Alignment
UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate
Lasso Bandit with Compatibility Condition on Optimal Arm
Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer
MMD-Regularized Unbalanced Optimal Transport
Your Weak LLM is Secretly a Strong Teacher for Alignment
Fast Uncovering of Protein Sequence Diversity from Structure
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Reasoning with Latent Thoughts: On the Power of Looped Transformers
FaceShot: Bring Any Character into Life
Competitive Fair Scheduling with Predictions
3D Vision-Language Gaussian Splatting
Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry
Robust LLM safeguarding via refusal feature adversarial training
Fat-to-Thin Policy Optimization: Offline Reinforcement Learning with Sparse Policies
Learning the Optimal Stopping for Early Classification within Finite Horizons via Sequential Probability Ratio Test
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Joint Graph Rewiring and Feature Denoising via Spectral Resonance
Efficient Model Editing with Task-Localized Sparse Fine-tuning
ThermalGaussian: Thermal 3D Gaussian Splatting
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning
Consistency Checks for Language Model Forecasters
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance
ARB-LLM: Alternating Refined Binarizations for Large Language Models
To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts
ProtoSnap: Prototype Alignment For Cuneiform Signs
Causal Information Prioritization for Efficient Reinforcement Learning
Improved Convergence Rate for Diffusion Probabilistic Models
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion
Federated Granger Causality Learning For Interdependent Clients With State Space Representation
BaB-ND: Long-Horizon Motion Planning with Branch-and-Bound and Neural Dynamics
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Revisiting Random Walks for Learning on Graphs
NExUME: Adaptive Training and Inference for DNNs under Intermittent Power Environments
Safety Alignment Should be Made More Than Just a Few Tokens Deep
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
How to Find the Exact Pareto Front for Multi-Objective MDPs?
Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems
Streamlining Prediction in Bayesian Deep Learning
Medium-Difficulty Samples Constitute Smoothed Decision Boundary for Knowledge Distillation on Pruned Datasets
Mixture of Attentions For Speculative Decoding
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
A Theory of Initialisation's Impact on Specialisation
Closed-Form Merging of Parameter-Efficient Modules for Federated Continual Learning
Local convergence of simultaneous min-max algorithms to differential equilibrium on Riemannian manifold
GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting
Range, not Independence, Drives Modularity in Biologically Inspired Representations
Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
An Information Criterion for Controlled Disentanglement of Multimodal Data
Towards Synergistic Path-based Explanations for Knowledge Graph Completion: Exploration and Evaluation
DiffPC: Diffusion-based High Perceptual Fidelity Image Compression with Semantic Refinement
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Halton Scheduler for Masked Generative Image Transformer
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Concept Bottleneck Large Language Models
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Privately Counting Partially Ordered Data
Perturbation-Restrained Sequential Model Editing
Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise
PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection
DataMan: Data Manager for Pre-training Large Language Models
Cached Multi-Lora Composition for Multi-Concept Image Generation
Towards General-Purpose Model-Free Reinforcement Learning
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
Is Your Multimodal Language Model Oversensitive to Safe Queries?
Near, far: Patch-ordering enhances vision foundation models' scene understanding
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Poisson-Dirac Neural Networks for Modeling Coupled Dynamical Systems across Domains
From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford Algebra and Convexity
Newton Meets Marchenko-Pastur: Massively Parallel Second-Order Optimization with Hessian Sketching and Debiasing
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging
Exploring The Loss Landscape Of Regularized Neural Networks Via Convex Duality
Collaborative Discrete-Continuous Black-Box Prompt Learning for Language Models
DRoP: Distributionally Robust Data Pruning
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Risk-Sensitive Variational Actor-Critic: A Model-Based Approach
Learning from End User Data with Shuffled Differential Privacy over Kernel Densities
Do Deep Neural Network Solutions Form a Star Domain?
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation
Transformers Handle Endogeneity in In-Context Linear Regression
FairDen: Fair Density-Based Clustering
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation
Exploring the Camera Bias of Person Re-identification
Size-Generalizable RNA Structure Evaluation by Exploring Hierarchical Geometries
Proteina: Scaling Flow-based Protein Structure Generative Models
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
Energy-Based Diffusion Language Models for Text Generation
Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs
Truncated Consistency Models
A3D: Does Diffusion Dream about 3D Alignment?
Sensitivity Verification for Additive Decision Tree Ensembles
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids
MRS: A Fast Sampler for Mean Reverting Diffusion based on ODE and SDE Solvers
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
Mitigating Information Loss in Tree-Based Reinforcement Learning via Direct Optimization
PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Order-aware Interactive Segmentation
ODE-based Smoothing Neural Network for Reinforcement Learning Tasks
GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting
Competing Large Language Models in Multi-Agent Gaming Environments
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Aligning Human Motion Generation with Human Perceptions
Multi-modal brain encoding models for multi-modal stimuli
Scale-aware Recognition in Satellite Images under Resource Constraints
Towards Interpreting Visual Information Processing in Vision-Language Models
Learning to Explore and Exploit with GNNs for Unsupervised Combinatorial Optimization
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequences
Physiome-ODE: A Benchmark for Irregularly Sampled Multivariate Time-Series Forecasting Based on Biological ODEs
ImageFolder: Autoregressive Image Generation with Folded Tokens
Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks
PaLD: Detection of Text Partially Written by Large Language Models
Generalized Behavior Learning from Diverse Demonstrations
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches
Fully-inductive Node Classification on Arbitrary Graphs
Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems
Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
Do Contemporary Causal Inference Models Capture Real-World Heterogeneity? Findings from a Large-Scale Benchmark
Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It
Convex Formulations for Training Two-Layer ReLU Neural Networks
Underdamped Diffusion Bridges with Applications to Sampling
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
Topograph: An Efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation
Simplifying Deep Temporal Difference Learning
Predicting the Energy Landscape of Stochastic Dynamical System via Physics-informed Self-supervised Learning
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality
LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression
Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation
RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards
Divergence of Neural Tangent Kernel in Classification Problems
Locally Connected Echo State Networks for Time Series Forecasting
Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions
Aioli: A Unified Optimization Framework for Language Model Data Mixing
CL-DiffPhyCon: Closed-loop Diffusion Control of Complex Physical Systems
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
Towards Continuous Reuse of Graph Models via Holistic Memory Diversification
Universal Image Restoration Pre-training via Degradation Classification
Connectome Mapping: Shape-Memory Network via Interpretation of Contextual Semantic Information
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention
Steering Protein Family Design through Profile Bayesian Flow
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
Improved Training Technique for Latent Consistency Models
Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment
Gaussian Ensemble Belief Propagation for Efficient Inference in High-Dimensional, Black-box Systems
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
Generalization and Distributed Learning of GFlowNets
Storybooth: Training-Free Multi-Subject Consistency for Improved Visual Storytelling
Humanizing the Machine: Proxy Attacks to Mislead LLM Detectors
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
Tamper-Resistant Safeguards for Open-Weight LLMs
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Learning Robust Representations with Long-Term Information for Generalization in Visual Reinforcement Learning
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Biologically Plausible Brain Graph Transformer
Select before Act: Spatially Decoupled Action Repetition for Continuous Control
IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Denoising Levy Probabilistic Models
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Context-aware Dynamic Pruning for Speech Foundation Models
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
Expected Sliced Transport Plans
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Continuous Diffusion for Mixed-Type Tabular Data
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Law of the Weakest Link: Cross Capabilities of Large Language Models
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
ACES: Automatic Cohort Extraction System for Event-Stream Datasets
Does Refusal Training in LLMs Generalize to the Past Tense?
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach
GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
DELTA: Dense Efficient Long-Range 3D Tracking for Any Video
MuHBoost: Multi-Label Boosting For Practical Longitudinal Human Behavior Modeling
Multi-session, multi-task neural decoding from distinct cell-types and brain regions
KAN: Kolmogorov–Arnold Networks
Connecting Federated ADMM to Bayes
Differentiable Causal Discovery for Latent Hierarchical Causal Models
Structure Language Models for Protein Conformation Generation
$q$-exponential family for policy optimization
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Not-So-Optimal Transport Flows for 3D Point Cloud Generation
Matérn Kernels for Tunable Implicit Surface Reconstruction
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
Decision Tree Induction Through LLMs via Semantically-Aware Evolution
Deep MMD Gradient Flow without adversarial training
Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling, and Zero-Shot Transfer
Deep Learning Alternatives Of The Kolmogorov Superposition Theorem
Diffusion Models and Gaussian Flow Matching: Two Sides of the Same Coin
AutoG: Towards automatic graph construction from tabular data
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Differential Transformer
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Deep Linear Probe Generators for Weight Space Learning
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
ADAM: An Embodied Causal Agent in Open-World Environments
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
One Step Diffusion via Shortcut Models
Toward Understanding In-context vs. In-weight Learning
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Understanding Long Videos with Multimodal Language Models
A Theoretical Framework for Partially-Observed Reward States in RLHF
ReNovo: Retrieval-Based De Novo Mass Spectrometry Peptide Sequencing
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
A New Perspective on Shampoo's Preconditioner
LASER: A Neuro-Symbolic Framework for Learning Spatio-Temporal Scene Graphs with Weak Supervision
Test-time Adaptation for Regression by Subspace Alignment
Programming Refusal with Conditional Activation Steering
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
Rethinking Visual Counterfactual Explanations Through Region Constraint
Scalable Mechanistic Neural Networks
Inverse Attention Agents for Multi-Agent Systems
ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning
A Second-Order Perspective on Model Compositionality and Incremental Learning
SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
DoF: A Diffusion Factorization Framework for Offline Multi-Agent Reinforcement Learning
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
JPEG Inspired Deep Learning
Physics-Informed Deep Inverse Operator Networks for Solving PDE Inverse Problems
Controllable Generation via Locally Constrained Resampling
$\text{I}^2\text{AM}$: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
Deep Random Features for Scalable Interpolation of Spatiotemporal Data
Discriminating image representations with principal distortions
Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning
Laplace Sample Information: Data Informativeness Through a Bayesian Lens
PooDLe🐩: Pooled and dense self-supervised learning from naturalistic videos
MAGNet: Motif-Agnostic Generation of Molecules from Scaffolds
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Direct Distributional Optimization for Provable Alignment of Diffusion Models
TRENDy: Temporal Regression of Effective Nonlinear Dynamics
What should a neuron aim for? Designing local objective functions based on information theory
Equivariant Masked Position Prediction for Efficient Molecular Representation
Decoupled Graph Energy-based Model for Node Out-of-Distribution Detection on Heterophilic Graphs
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
TSC-Net: Prediction of Pedestrian Trajectories by Trajectory-Scene-Cell Classification
Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling
Demystifying Topological Message-Passing with Relational Structures: A Case Study on Oversquashing in Simplicial Message-Passing
From Search to Sampling: Generative Models for Robust Algorithmic Recourse
Aligned LLMs Are Not Aligned Browser Agents
Unsupervised Model Tree Heritage Recovery
Understanding Virtual Nodes: Oversquashing and Node Heterogeneity
Decoupled Finetuning for Domain Generalizable Semantic Segmentation
Federated Class-Incremental Learning: A Hybrid Approach Using Latent Exemplars and Data-Free Techniques to Address Local and Global Forgetting
Conditional Diffusion Models are Minimax-Optimal and Manifold-Adaptive for Conditional Distribution Estimation
A Watermark for Order-Agnostic Language Models
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
Measuring And Improving Persuasiveness Of Large Language Models
Singular Subspace Perturbation Bounds via Rectangular Random Matrix Diffusions
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis
On the Expressive Power of Sparse Geometric MPNNs
Should VLMs be Pre-trained with Image Data?
EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
Language models scale reliably with over-training and on downstream tasks
Zero-shot Imputation with Foundation Inference Models for Dynamical Systems
Specialized Foundation Models Struggle to Beat Supervised Baselines
Temporal Difference Learning: Why It Can Be Fast and How It Will Be Faster
CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
PaPaGei: Open Foundation Models for Optical Physiological Signals
MAST: model-agnostic sparsified training
Limits to scalable evaluation at the frontier: LLM as judge won’t beat twice the data
Fewer May Be Better: Enhancing Offline Reinforcement Learning with Reduced Dataset
Gradient descent with generalized Newton’s method
Multi-Scale Fusion for Object Representation
MAP: Multi-Human-Value Alignment Palette
Systematic Relational Reasoning With Epistemic Graph Neural Networks
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Tracking objects that change in appearance with phase synchrony
Adaptive Rank Allocation: Speeding Up Modern Transformers with RaNA Adapters
No Need to Talk: Asynchronous Mixture of Language Models
It Helps to Take a Second Opinion: Teaching Smaller LLMs To Deliberate Mutually via Selective Rationale Optimisation
Learning and aligning single-neuron invariance manifolds in visual cortex
Learning to Help in Multi-Class Settings
Online Clustering with Nearly Optimal Consistency
SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers
Chain-of-Thought Provably Enables Learning the (Otherwise) Unlearnable
JetFormer: An autoregressive generative model of raw images and text
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Flaws of ImageNet, Computer Vision's Favourite Dataset
Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Learning-Augmented Search Data Structures
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Multi-Draft Speculative Sampling: Canonical Decomposition and Theoretical Limits
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models
Generating Less Certain Adversarial Examples Improves Robust Generalization
On the Adversarial Vulnerability of Label-Free Test-Time Adaptation
Efficient Reinforcement Learning with Large Language Model Priors
MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis
BP-Modified Local Loss for Efficient Training of Deep Neural Networks
Anti-Exposure Bias in Diffusion Models
HelpSteer2-Preference: Complementing Ratings with Preferences
Continuous Exposure Learning for Low-light Image Enhancement using Neural ODEs
Learning to Discover Regulatory Elements for Gene Expression Prediction
A Theoretically-Principled Sparse, Connected, and Rigid Graph Representation of Molecules
EG4D: Explicit Generation of 4D Object without Score Distillation
Moral Alignment for LLM Agents
TVNet: A Novel Time Series Analysis Method Based on Dynamic Convolution and 3D-Variation
NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval
From Layers to States: A State Space Model Perspective to Deep Neural Network Layer Dynamics
Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees
FlickerFusion: Intra-trajectory Domain Generalizing Multi-agent Reinforcement Learning
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
Nonlinear Sequence Embedding by Monotone Variational Inequality
Re-Aligning Language to Visual Objects with an Agentic Workflow
HELM: Hierarchical Encoding for mRNA Language Modeling
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
Spectral-Refiner: Accurate Fine-Tuning of Spatiotemporal Fourier Neural Operator for Turbulent Flows
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Asymmetric Factorized Bilinear Operation for Vision Transformer
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Think while You Generate: Discrete Diffusion with Planned Denoising
Mitigating Memorization in Language Models
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Mining your own secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
PIED: Physics-Informed Experimental Design for Inverse Problems
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
SFS: Smarter Code Space Search improves LLM Inference Scaling
PAD: Personalized Alignment at Decoding-time
On the Almost Sure Convergence of the Stochastic Three Points Algorithm
OGBench: Benchmarking Offline Goal-Conditioned RL
Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?
Following the Human Thread in Social Navigation
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
Trusted Multi-View Classification via Evolutionary Multi-View Fusion
STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
A Periodic Bayesian Flow for Material Generation
On the Fourier analysis in the SO(3) space: the EquiLoPO Network
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints
AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning
Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning
Generalized Principal-Agent Problem with a Learning Agent
Framer: Interactive Frame Interpolation
Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems
Efficient and Accurate Explanation Estimation with Distribution Compression
MambaExtend: A Training-Free Approach to Improve Long Context Extension of Mamba
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
Diff-Prompt: Diffusion-driven Prompt Generator with Mask Supervision
Easing Training Process of Rectified Flow Models Via Lengthening Inter-Path Distance
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Neuron based Personality Trait Induction in Large Language Models
Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games
Counterfactual Concept Bottleneck Models
Pursuing Better Decision Boundaries for Long-Tailed Object Detection via Category Information Amount
Spectral Compressive Imaging via Unmixing-driven Subspace Diffusion Refinement
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models
Cross-Domain Offline Policy Adaptation with Optimal Transport and Dataset Constraint
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Complementary Label Learning with Positive Label Guessing and Negative Label Enhancement
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agent
HOPE for a Robust Parameterization of Long-memory State Space Models
One Hundred Neural Networks and Brains Watching Videos: Lessons from Alignment
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
Accessing Vision Foundation Models via ImageNet-1K
Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering
Certified Robustness Under Bounded Levenshtein Distance
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
Selective Unlearning via Representation Erasure Using Domain Adversarial Training
Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
An Asynchronous Bundle Method for Distributed Learning Problems
Agent Skill Acquisition for Large Language Models via CycleQD
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning
Streaming Algorithms For $\ell_p$ Flows and $\ell_p$ Regression
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
ThunderKittens: Simple, Fast, and Adorable Kernels
Shallow diffusion networks provably learn hidden low-dimensional structure
CtD: Composition through Decomposition in Emergent Communication
Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion
Scaling Laws for Precision
Higher-Order Graphon Neural Networks: Approximation and Cut Distance
Enhancing Prediction Performance through Influence Measure
LoLCATs: On Low-Rank Linearizing of Large Language Models
A Differentiable Rank-Based Objective for Better Feature Learning
A Non-Contrastive Learning Framework for Sequential Recommendation with Preference-Preserving Profile Generation
Ranking-aware adapter for text-driven image ordering with CLIP
Action abstractions for amortized sampling
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Language Imbalance Driven Rewarding for Multilingual Self-improving
Gyrogroup Batch Normalization
Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes
Machine Unlearning via Simulated Oracle Matching
SysBench: Can LLMs Follow System Message?
Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry
Distribution-Specific Agnostic Conditional Classification With Halfspaces
GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation
Distilling Structural Representations into Protein Sequence Models
YOLO-RD: Introducing Relevant and Compact Explicit Knowledge to YOLO by Retriever-Dictionary
To Tackle Adversarial Transferability: A Novel Ensemble Training Method with Fourier Transformation
DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
SMITE: Segment Me In TimE
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Can Watermarks be Used to Detect LLM IP Infringement For Free?
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting
miniCTX: Neural Theorem Proving with (Long-)Contexts
Training-free Camera Control for Video Generation
ReAttention: Training-Free Infinite Context with Finite Attention Scope
Prevalence of Negative Transfer in Continual Reinforcement Learning: Analyses and a Simple Baseline
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Scaling Large Language Model-based Multi-Agent Collaboration
Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Gaussian Differentially Private Human Faces Under a Face Radial Curve Representation
Injective flows for star-like manifolds
Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation
Towards Calibrated Deep Clustering Network
ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
An Exploration with Entropy Constrained 3D Gaussians for 2D Video Compression
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
IDInit: A Universal and Stable Initialization Method for Neural Network Training
Residual Deep Gaussian Processes on Manifolds
Reinforcement Learning from Imperfect Corrective Actions and Proxy Rewards
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Conformalized Survival Analysis for General Right-Censored Data
Private Mechanism Design via Quantile Estimation
Jamba: Hybrid Transformer-Mamba Language Models
From Decoupling to Adaptive Transformation: a Wider Optimization Space for PTQ
Efficient Off-Policy Learning for High-Dimensional Action Spaces
Accelerated training through iterative gradient propagation along the residual path
How Does Critical Batch Size Scale in Pre-training?
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Systems with Switching Causal Relations: A Meta-Causal Perspective
PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
RetroInText: A Multimodal Large Language Model Enhanced Framework for Retrosynthetic Planning via In-Context Representation Learning
SparsyFed: Sparse Adaptive Federated Learning
DEPT: Decoupled Embeddings for Pre-training Language Models
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
What Makes a Maze Look Like a Maze?
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Learning vector fields of differential equations on manifolds with geometrically constrained operator-valued kernels
Provably Robust Explainable Graph Neural Networks against Graph Perturbation Attacks
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Enhancing Learning with Label Differential Privacy by Vector Approximation
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
PseDet: Revisiting the Power of Pseudo Label in Incremental Object Detection
Improving Reasoning Performance in Large Language Models via Representation Engineering
Sparse components distinguish visual pathways & their alignment to neural networks
Conditional Testing based on Localized Conformal $p$-values
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals
IV-mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
Revisiting text-to-image evaluation with Gecko: on metrics, prompts, and human rating
Watermark Anything With Localized Messages
Start Smart: Leveraging Gradients For Enhancing Mask-based XAI Methods
Controllable Context Sensitivity and the Knob Behind It
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
On Designing General and Expressive Quantum Graph Neural Networks with Applications to MILP Instance Representation
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
Harnessing Webpage UIs for Text-Rich Visual Understanding
Implicit Bias of Mirror Flow for Shallow Neural Networks in Univariate Regression
X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios
SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling
Streamlining Redundant Layers to Compress Large Language Models
Let Your Features Tell The Differences: Understanding Graph Convolution By Feature Splitting
Multimodal Situational Safety
AutoEval: Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
Episodic Novelty Through Temporal Distance
Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding
What Are Good Positional Encodings for Directed Graphs?
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
How Far Are We from True Unlearnability?
TASAR: Transfer-based Attack on Skeletal Action Recognition
Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
AlphaEdit: Null-Space Constrained Model Editing for Language Models
Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping
Backdooring Vision-Language Models with Out-Of-Distribution Data
Masked Temporal Interpolation Diffusion for Procedure Planning in Instructional Videos
RandLoRA: Full rank parameter-efficient fine-tuning of large models
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Agents' Room: Narrative Generation through Multi-step Collaboration
Learning Continually by Spectral Regularization
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
DLEFT-MKC: Dynamic Late Fusion Multiple Kernel Clustering with Robust Tensor Learning via Min-Max Optimization
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Data Shapley in One Training Run
Energy-Weighted Flow Matching for Offline Reinforcement Learning
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography
Atlas Gaussians Diffusion for 3D Generation
The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Remove Symmetries to Control Model Expressivity and Improve Optimization
Round and Round We Go! What makes Rotary Positional Encodings useful?
RTDiff: Reverse Trajectory Synthesis via Diffusion for Offline Reinforcement Learning
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Learning Gain Map for Inverse Tone Mapping
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Towards counterfactual fairness through auxiliary variables
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
QPM: Discrete Optimization for Globally Interpretable Image Classification
Multi-agent cooperation through learning-aware policy gradients
Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Presto! Distilling Steps and Layers for Accelerating Music Generation
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Unhackable Temporal Reward for Scalable Video MLLMs
Taming Transformer Without Using Learning Rate Warmup
RaSA: Rank-Sharing Low-Rank Adaptation
SMT: Fine-Tuning Large Language Models with Sparse Matrices
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Bridging Compressed Image Latents and Multimodal Large Language Models
Oscillatory State-Space Models
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision
Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods
Rethinking Neural Multi-Objective Combinatorial Optimization via Neat Weight Embedding
ADMM for Nonconvex Optimization under Minimal Continuity Assumption
TFG-Flow: Training-free Guidance in Multimodal Generative Flow
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Rational Decision-Making Agent with Learning Internal Utility Judgment
Neural Functions for Learning Periodic Signal
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
Self-Normalized Resets for Plasticity in Continual Learning
Implicit In-context Learning
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
Natural Language Inference Improves Compositionality in Vision-Language Models
Teaching LLMs How to Learn with Contextual Fine-Tuning
Dynamic Neural Fortresses: An Adaptive Shield for Model Extraction Defense
Synergy Between Sufficient Changes and Sparse Mixing Procedure for Disentangled Representation Learning
InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes
GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling
Graph-based Document Structure Analysis
Realistic Evaluation of Deep Partial-Label Learning Algorithms
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
EmbedLLM: Learning Compact Representations of Large Language Models
Node Similarities under Random Projections: Limits and Pathological Cases
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Federated $Q$-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost
GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Do Large Language Models Truly Understand Geometric Structures?
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
Shot2Story: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
Inference Scaling for Long-Context Retrieval Augmented Generation
Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting
h4rm3l: A Language for Composable Jailbreak Attack Synthesis
Wavelet Diffusion Neural Operator
Representational Similarity via Interpretable Visual Concepts
Differentiable Integer Linear Programming
Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
Proxy Denoising for Source-Free Domain Adaptation
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
MAESTRO: Masked Encoding Set Transformer with Self-Distillation
Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling
AgentRefine: Enhancing Agent Generalization through Refinement Tuning
On Stochastic Contextual Bandits with Knapsacks in Small Budget Regime
Event-Driven Online Vertical Federated Learning
GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Shape as Line Segments: Accurate and Flexible Implicit Surface Representation
Efficient Sparse PCA via Block-Diagonalization
A Geometric Framework for Understanding Memorization in Generative Models
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
The Crucial Role of Samplers in Online Direct Preference Optimization
Shh, don't say that! Domain Certification in LLMs
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
FOSP: Fine-tuning Offline Safe Policy through World Models
DataGen: Unified Synthetic Dataset Generation via Large Language Models
Differentially private optimization for non-decomposable objective functions
Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity
GraphBridge: Towards Arbitrary Transfer Learning in GNNs
Retrieval Head Mechanistically Explains Long-Context Factuality
The Ramanujan Library - Automated Discovery on the Hypergraph of Integer Relations
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection
Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models
Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching
Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation
MIND over Body: Adaptive Thinking using Dynamic Computation
The adaptive complexity of parallelized log-concave sampling
Selective Task Group Updates for Multi-Task Optimization
Probe before You Talk: Towards Black-box Defense against Backdoor Unalignment for Large Language Models
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks
ReGen: Generative Robot Simulation via Inverse Design
Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics
Reveal Object in Lensless Photography via Region Gaze and Amplification
R2Det: Exploring Relaxed Rotation Equivariance in 2D Object Detection
How much of my dataset did you use? Quantitative Data Usage Inference in Machine Learning
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
Improved Approximation Algorithms for $k$-Submodular Maximization via Multilinear Extension
Segment Any 3D Object with Language
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
Learning Spatial-Semantic Features for Robust Video Object Segmentation
A Graph Enhanced Symbolic Discovery Framework For Efficient Logic Optimization
RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Large Convolutional Model Tuning via Filter Subspace
Black-Box Detection of Language Model Watermarks
Cut Your Losses in Large-Vocabulary Language Models
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
KinPFN: Bayesian Approximation of RNA Folding Kinetics using Prior-Data Fitted Networks
Scaling FP8 training to trillion-token LLMs
Boosting Methods for Interval-censored Data with Regression and Classification
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
DEPfold: RNA Secondary Structure Prediction as Dependency Parsing
Balancing Act: Diversity and Consistency in Large Language Model Ensembles
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Reinforcement learning with combinatorial actions for coupled restless bandits
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
ADMM for Structured Fractional Minimization
Attention layers provably solve single-location regression
Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification
Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Sensitivity-Constrained Fourier Neural Operators for Forward and Inverse Problems in Parametric Differential Equations
Dynamic Assortment Selection and Pricing with Censored Preference Feedback
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Cocoon: Robust Multi-Modal Perception with Uncertainty-Aware Sensor Fusion
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Faster Algorithms for Structured Linear and Kernel Support Vector Machines
Interactive Adjustment for Human Trajectory Prediction with Individual Feedback
Fast Feedforward 3D Gaussian Splatting Compression
Estimating the Probabilities of Rare Outputs in Language Models
Probabilistic Language-Image Pre-Training
QA-Calibration of Language Model Confidence Scores
Imputation for prediction: beware of diminishing returns
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
Physics-aligned field reconstruction with diffusion bridge
Can Textual Gradient Work in Federated Learning?
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Strength Estimation and Human-Like Strength Adjustment in Games
Accelerating neural network training: An analysis of the AlgoPerf competition
Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning
SPDIM: Source-Free Unsupervised Conditional and Label Shift Adaptation in EEG
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation
Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents
MrSteve: Instruction-Following Agents in Minecraft with What-Where-When Memory
Improving Neural Optimal Transport via Displacement Interpolation
Generative Verifiers: Reward Modeling as Next-Token Prediction
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding & Reasoning Capabilities of CodeLLMs
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling
Bias Mitigation in Graph Diffusion Models
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
Class Distribution-induced Attention Map for Open-vocabulary Semantic Segmentations
Towards Faster Decentralized Stochastic Optimization with Communication Compression
DRL: Decomposed Representation Learning for Tabular Anomaly Detection
ELICIT: LLM Augmentation Via External In-context Capability
BrainACTIV: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation
Does Training with Synthetic Data Truly Protect Privacy?
EqNIO: Subequivariant Neural Inertial Odometry
The KoLMogorov Test: Compression by Code Generation
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
Fitting Networks with a Cancellation Trick
Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors
Intermediate Layer Classifiers for OOD generalization
Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface
Leave-One-Out Stable Conformal Prediction
RGB-Event ISP: The Dataset and Benchmark
PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph
Animate Your Thoughts: Reconstruction of Dynamic Natural Vision from Human Brain Activity
RuAG: Learned-rule-augmented Generation for Large Language Models
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Test-time Adaptation for Cross-modal Retrieval with Query Shift
Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation
ProteinBench: A Holistic Evaluation of Protein Foundation Models
S4M: S4 for multivariate time series forecasting with Missing values
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning
Fourier Sliced-Wasserstein Embedding for Multisets and Measures
Learning Molecular Representation in a Cell
ADAM Optimization with Adaptive Batch Selection
Efficient and Trustworthy Causal Discovery with Latent Variables and Complex Relations
ControlAR: Controllable Image Generation with Autoregressive Models
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
LucidPPN: Unambiguous Prototypical Parts Network for User-centric Interpretable Computer Vision
Generating CAD Code with Vision-Language Models for 3D Designs
Lightweight Neural App Control
Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets
Horizon Generalization in Reinforcement Learning
Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
Human Simulacra: Benchmarking the Personification of Large Language Models
Generative Representational Instruction Tuning
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL
Linear Partial Gromov-Wasserstein Embedding
PRISM: Privacy-Preserving Improved Stochastic Masking for Federated Generative Models
Hot-pluggable Federated Learning: Bridging General and Personalized FL via Dynamic Selection
The Foundations of Tokenization: Statistical and Computational Concerns
FormalAlign: Automated Alignment Evaluation for Autoformalization
TopoGaussian: Inferring Internal Topology Structures from Visual Clues
Understanding Constraint Inference in Safety-Critical Inverse Reinforcement Learning
FLIP: Flow-Centric Generative Planning as General-Purpose Manipulation World Model
LLaMaFlex: Many-in-one LLMs via Generalized Pruning and Weight Sharing
E(n) Equivariant Topological Neural Networks
Transformers Provably Learn Two-Mixture of Linear Classification via Gradient Flow
Toward Generalizing Visual Brain Decoding to Unseen Subjects
Global Convergence in Neural ODEs: Impact of Activation Functions
L-WISE: Boosting human visual category learning through model-based image selection and enhancement
Sequential Stochastic Combinatorial Optimization Using Hierarchal Reinforcement Learning
Periodic Materials Generation using Text-Guided Joint Diffusion Model
Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs
DeLLMa: Decision Making Under Uncertainty with Large Language Models
Boosting Multiple Views for pretrained-based Continual Learning
Learning Efficient Positional Encodings with Graph Neural Networks
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Multi-Robot Motion Planning with Diffusion Models
MatExpert: Decomposing Materials Discovery By Mimicking Human Experts
Ask, and it shall be given: On the Turing completeness of prompting
AtomSurf: Surface Representation for Learning on Protein Structures
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
Beyond Random Augmentations: Pretraining with Hard Views
Conformal Language Model Reasoning with Coherent Factuality
HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters
XAIguiFormer: explainable artificial intelligence guided transformer for brain disorder identification
Efficient Inference for Large Language Model-based Generative Recommendation
AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution
Implicit Search via Discrete Diffusion: A Study on Chess
Consistent Flow Distillation for Text-to-3D Generation
Grounding Continuous Representations in Geometry: Equivariant Neural Fields
Open-CK: A Large Multi-Physics Fields Coupling benchmarks in Combustion Kinetics
CheapNet: Cross-attention on Hierarchical representations for Efficient protein-ligand binding Affinity Prediction
Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
Doubly robust identification of treatment effects from multiple environments
Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models
Integrating Protein Dynamics into Structure-Based Drug Design via Full-Atom Stochastic Flows
Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs
SIMPL: Scalable and hassle-free optimisation of neural representations from behaviour
Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
Debiasing Federated Learning with Correlated Client Participation
AutoCGP: Closed-Loop Concept-Guided Policies from Unlabeled Demonstrations
Sparse Autoencoders Do Not Find Canonical Units of Analysis
Transformers Struggle to Learn to Search
Vision and Language Synergy for Rehearsal Free Continual Learning
OSDA Agent: Leveraging Large Language Models for De Novo Design of Organic Structure Directing Agents
Mixture of Parrots: Experts improve memorization more than reasoning
Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
The Complexity of Two-Team Polymatrix Games with Independent Adversaries
Differentiable and Learnable Wireless Simulation with Geometric Transformers
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch
SplatFormer: Point Transformer for Robust 3D Gaussian Splatting
ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation
When do GFlowNets learn the right distribution?
Hyper-Connections
PRDP: Progressively Refined Differentiable Physics
An Efficient Framework for Crediting Data Contributors of Diffusion Models
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures
DebGCD: Debiased Learning with Distribution Guidance for Generalized Category Discovery
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency
SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Data Pruning by Information Maximization
Edge Prompt Tuning for Graph Neural Networks
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
ParFam -- (Neural Guided) Symbolic Regression via Continuous Global Optimization
Towards a Unified and Verified Understanding of Group-Operation Networks
RouteLLM: Learning to Route LLMs from Preference Data
Sharpness-Aware Minimization: General Analysis and Improved Rates
PFGuard: A Generative Framework with Privacy and Fairness Safeguards
Reconstructive Visual Instruction Tuning
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Dynamic Contrastive Skill Learning with State-Transition Based Skill Clustering and Dynamic Length Adjustment
Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
Quality Measures for Dynamic Graph Generative Models
GameGen-X: Interactive Open-world Game Video Generation
Federated Domain Generalization with Data-free On-server Matching Gradient
Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses
Generative World Explorer
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair
X-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
SEPARATE: A Simple Low-rank Projection for Gradient Compression in Modern Large-scale Model Training Process
GDrag: Towards General-Purpose Interactive Editing with Anti-ambiguity Point Diffusion
ToolACE: Winning the Points of LLM Function Calling
Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search
Optimal Brain Apoptosis
PICASO: Permutation-Invariant Context Composition with State Space Models
Towards Unbiased Learning in Semi-Supervised Semantic Segmentation
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation
PPT: Patch Order Do Matters In Time Series Pretext Task
Better Instruction-Following Through Minimum Bayes Risk
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
See What You Are Told: Visual Attention Sink in Large Multimodal Models
Exploring a Principled Framework for Deep Subspace Clustering
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark
LASeR: Towards Diversified and Generalizable Robot Design with Large Language Models
Sketching for Convex and Nonconvex Regularized Least Squares with Sharp Guarantees
CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images
GRAIN: Exact Graph Reconstruction from Gradients
Do WGANs succeed because they minimize the Wasserstein Distance? Lessons from Discrete Generators
Theory on Mixture-of-Experts in Continual Learning
Forte: Finding Outliers with Representation Typicality Estimation
Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Lie Algebra Canonicalization: Equivariant Neural Operators under arbitrary Lie Groups
3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing
Prompting Fairness: Integrating Causality to Debias Large Language Models
WardropNet: Traffic Flow Predictions via Equilibrium-Augmented Learning
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify
Regretful Decisions under Label Noise
Graph Sparsification via Mixture of Graphs
Bad-PFL: Exploiting Backdoor Attacks against Personalized Federated Learning
On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning
Towards Foundation Models for Mixed Integer Linear Programming
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Unsupervised Multiple Kernel Learning for Graphs via Ordinality Preservation
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment
Benign Overfitting in Out-of-Distribution Generalization of Linear Models
Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis
UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
High-dimension Prototype is a Better Incremental Object Detection Learner
FedLWS: Federated Learning with Adaptive Layer-wise Weight Shrinking
Identification of Intermittent Temporal Latent Process
STAR: Stability-Inducing Weight Perturbation for Continual Learning
CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features
Linear Transformer Topological Masking with Graph Random Features
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models
Preference Diffusion for Recommendation
Towards Hierarchical Rectified Flow
Variational Diffusion Posterior Sampling with Midpoint Guidance
Curriculum-aware Training for Discriminating Molecular Property Prediction Models
Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning
Scaling up the Banded Matrix Factorization Mechanism for Large Scale Differentially Private ML
Pyramidal Flow Matching for Efficient Video Generative Modeling
BlendRL: A Framework for Merging Symbolic and Neural Policy Learning
Doubly Optimal Policy Evaluation for Reinforcement Learning
Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
Expressivity of Neural Networks with Random Weights and Learned Biases
Manifold Constraint Reduces Exposure Bias in Accelerated Diffusion Sampling
VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
PhiNets: Brain-inspired Non-contrastive Learning Based on Temporal Prediction Hypothesis
TTVD: Towards a Geometric Framework for Test-Time Adaptation Based on Voronoi Diagram
Learning under Temporal Label Noise
Affine Steerable Equivariant Layer for Canonicalization of Neural Networks
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
COPER: Correlation-based Permutations for Multi-View Clustering
Learning Task Belief Similarity with Latent Dynamics for Meta-Reinforcement Learning
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Steering Large Language Models between Code Execution and Textual Reasoning
Contextualizing biological perturbation experiments through language
Captured by Captions: On Memorization and its Mitigation in CLIP Models
NeuralPlane: Structured 3D Reconstruction in Planar Primitives with Neural Fields
SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning
dEBORA: Efficient Bilevel Optimization-based low-Rank Adaptation
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Solving Differential Equations with Constrained Learning
Energy-based Backdoor Defense Against Federated Graph Learning
Quantized Spike-driven Transformer
Prioritized Generative Replay
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis
Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting
PnP-Flow: Plug-and-Play Image Restoration with Flow Matching
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
Neural Fluid Simulation on Geometric Surfaces
How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations
Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting
Point Cluster: A Compact Message Unit for Communication-Efficient Collaborative Perception
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
iFormer: Integrating ConvNet and Transformer for Mobile Application
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Neural Multi-Objective Combinatorial Optimization via Graph-Image Multimodal Fusion
Rethinking Light Decoder-based Solvers for Vehicle Routing Problems
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Revealing and Mitigating Over-Attention in Knowledge Editing
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
Accelerating Goal-Conditioned Reinforcement Learning Algorithms and Research
Unlocking Point Processes through Point Set Diffusion
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling
On the self-verification limitations of large language models on reasoning and planning tasks
Evidential Learning-based Certainty Estimation for Robust Dense Feature Matching
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
Democratic Training Against Universal Adversarial Perturbations
Free Hunch: Denoiser Covariance Estimation for Diffusion Models Without Extra Costs
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Learning from negative feedback, or positive feedback or both
CONGO: Compressive Online Gradient Optimization
Training-Free Message Passing for Learning on Hypergraphs
FreDF: Learning to Forecast in the Frequency Domain
Planning in Natural Language Improves LLM Search for Code Generation
Single-agent Poisoning Attacks Suffice to Ruin Multi-Agent Learning
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
FedTMOS: Efficient One-Shot Federated Learning with Tsetlin Machine
On Quantizing Neural Representation for Variable-Rate Video Coding
Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts
Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
Tailoring Mixup to Data for Calibration
BrainOOD: Out-of-distribution Generalizable Brain Network Analysis
AI2TALE: An Innovative Information Theory-based Approach for Learning to Localize Phishing Attacks
Input Space Mode Connectivity in Deep Neural Networks
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
Simulating Human-like Daily Activities with Desire-driven Autonomy
ConMix: Contrastive Mixup at Representation Level for Long-tailed Deep Clustering
CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning
Iterative Substructure Extraction for Molecular Relational Learning with Interactive Graph Information Bottleneck
DiffPuter: An EM-Driven Diffusion Model for Missing Data Imputation
Model-Agnostic Knowledge Guided Correction for Improved Neural Surrogate Rollout
On Speeding Up Language Model Evaluation
Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs with Semantic Space
LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
Selective Label Enhancement Learning for Test-Time Adaptation
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Multi-Field Adaptive Retrieval
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
Group Distributionally Robust Dataset Distillation with Risk Minimization
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models
Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Interpretable Causal Representation Learning for Biological Data in the Pathway Space
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning
Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation
Revisit the Open Nature of Open Vocabulary Semantic Segmentation
Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs
The Superposition of Diffusion Models Using the Itô Density Estimator
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Towards hyperparameter-free optimization with differential privacy
GLoRa: A Benchmark to Evaluate the Ability to Learn Long-Range Dependencies in Graphs
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
On the Identification of Temporal Causal Representation with Instantaneous Dependence
Looped Transformers for Length Generalization
Advancing LLM Reasoning Generalists with Preference Trees
HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Has the Deep Neural Network learned the Stochastic Process? An Evaluation Viewpoint
InvestESG: A multi-agent reinforcement learning benchmark for studying climate investment as a social dilemma
DICE: Data Influence Cascade in Decentralized Learning
Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation
On a Connection Between Imitation Learning and RLHF
CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
Global Convergence of Policy Gradient in Average Reward MDPs
ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models
Flow matching achieves almost minimax optimal convergence
Automated Proof Generation for Rust Code via Self-Evolution
ParaSolver: A Hierarchical Parallel Integral Solver for Diffusion Models
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
From Commands to Prompts: LLM-based Semantic File System for AIOS
Few-Class Arena: A Benchmark for Efficient Selection of Vision Models and Dataset Difficulty Measurement
Learning from weak labelers as constraints
HELMET: How to Evaluate Long-context Models Effectively and Thoroughly
PharmacoMatch: Efficient 3D Pharmacophore Screening via Neural Subgraph Matching
HQGS: High-Quality Novel View Synthesis with Gaussian Splatting in Degraded Scenes
A deep inverse-mapping model for a flapping robotic wing
Explore Theory of Mind: program-guided adversarial data generation for theory of mind reasoning
A General Framework for Producing Interpretable Semantic Text Embeddings
Neuroplastic Expansion in Deep Reinforcement Learning
Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Variational Search Distributions
Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
More Experts Than Galaxies: Conditionally-Overlapping Experts with Biologically-Inspired Fixed Routing
Aligning Language Models with Demonstrated Feedback
Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Long Context Compression with Activation Beacon
Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control
Mastering Task Arithmetic: $\tau$Jp as a Key Indicator for Weight Disentanglement
GenXD: Generating Any 3D and 4D Scenes
A Riemannian Framework for Learning Reduced-order Lagrangian Dynamics
Monet: Mixture of Monosemantic Experts for Transformers
TexTailor: Customized Text-aligned Texturing via Effective Resampling
Revisiting Mode Connectivity in Neural Networks with Bezier Surface
Optimizing Neural Network Representations of Boolean Networks
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
6D Object Pose Tracking in Internet Videos for Robotic Manipulation
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Transformer Meets Twicing: Harnessing Unattended Residual Information
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
OmniRe: Omni Urban Scene Reconstruction
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity
Influence-Guided Diffusion for Dataset Distillation
ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
On Bits and Bandits: Quantifying the Regret-Information Trade-off
Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information
Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources
Universal generalization guarantees for Wasserstein distributionally robust models
PETRA: Parallel End-to-end Training with Reversible Architectures
Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM
PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations
Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
Edge-aware Image Smoothing with Relative Wavelet Domain Representation
SAVA: Scalable Learning-Agnostic Data Valuation
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Masked Image Modeling Representations
Variational Bayesian Pseudo-Coreset
Brain-inspired $L_p$-Convolution benefits large kernels and aligns better with visual cortex
Probing the Latent Hierarchical Structure of Data via Diffusion Models
Personality Alignment of Large Language Models
Neural Eulerian Scene Flow Fields
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning
Holistically Evaluating the Environmental Impact of Creating Language Models
Adaptive Gradient Clipping for Robust Federated Learning
Port-Hamiltonian Architectural Bias for Long-Range Propagation in Deep Graph Networks
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Lost in Prediction: Why Social Media Narratives Don't Help Macroeconomic Forecasting?
Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators
Reassessing EMNLP 2024’s Best Paper: Does Divergence-Based Calibration for MIAs Hold Up?
In Search of the Engram in LLMs: A Neuroscience Perspective on the Memory Functions in AI Models
On the Computation of the Fisher Information in Continual Learning
Bootstrapped Energy Based Models: What are they good for?
On LLM Knowledge Distillation - A Comparison between Forward KL and Reverse KL
Repurposing in AI: A Distinct Approach or an Extension of Creative Problem Solving?
Rethinking Graph Prompts: Unraveling the Power of Data Manipulation in Graph Neural Networks
A primer on analytical learning dynamics of nonlinear neural networks
Think Twice Before Claiming Your Optimization Algorithm Outperformance - Review and Beyond
How to visualize training dynamics in neural networks
Analysing The Spectral Biases in Generative Models
LLMs' Potential Influences on Our Democracy: Challenges and Opportunities
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Do not write that jailbreak paper
A Visual Dive into Conditional Flow Matching
Multi-modal Learning: A Look Back and the Road Ahead
Restating the Proof of Linear Convergence for Linear GNNs
Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)
Pre-training of Foundation Adapters for LLM Fine-tuning
The Illustrated AlphaFold
Pitfalls of Evidence-Based AI Policy
Flow With What You Know
Linear Recursions for Everyone
Factual Context Validation and Simplification: A Scalable Method to Enhance GPT Trustworthiness and Efficiency
Euler Characteristic Tools for Topological Data Analysis
Boundary constrained Gaussian processes for robust physics-informed machine learning of linear partial differential equations
Manifold Learning by Mixture Models of VAEs for Inverse Problems
Non-Stationary Dueling Bandits Under a Weighted Borda Criterion
BBCaL: Black-box Backdoor Detection under the Causality Lens
Regularized Proportional Fairness Mechanism for Resource Allocation Without Money
Reward Guided Latent Consistency Distillation
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Risk-Controlling Model Selection via Guided Bayesian Optimization
Plug, Play, and Generalize: Length Extrapolation with Pointer-Augmented Neural Memory
Hessian Free Efficient Single Loop Iterative Differentiation Methods for Bi-Level Optimization Problems
Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives
Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition
Re-Thinking Inverse Graphics With Large Language Models
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans
Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models
Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement
Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures
Fine-tuning can cripple your foundation model; preserving features may be the solution
As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
What Has Been Overlooked in Contrastive Source-Free Domain Adaptation: Leveraging Source-Informed Latent Augmentation within Neighborhood Context
On the Inherent Privacy Properties of Discrete Denoising Diffusion Models
Understanding Fairness Surrogate Functions in Algorithmic Fairness
LeanVec: Searching vectors faster by making them fit
E-Valuating Classifier Two-Sample Tests
Enhancing Vision-Language Model with Unmasked Token Alignment
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
DINOv2: Learning Robust Visual Features without Supervision
Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning
Finding and Only Finding Differential Nash Equilibria by Both Pretending to be a Follower
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
Personalized Visual Instruction Tuning
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
AgentSquare: Automatic LLM Agent Search in Modular Design Space
Image Watermarks are Removable using Controllable Regeneration from Clean Noise
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Personalized Representation from Personalized Generation
3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Disentangling 3D Animal Pose Dynamics with Scrubbed Conditional Latent Variables
BadRobot: Jailbreaking Embodied LLMs in the Physical World
3D-Spatial Multimodal Memory
Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings
4K4DGen: Panoramic 4D Generation at 4K Resolution
CycleResearcher: Improving Automated Research via Automated Review
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling
K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models
A Black Swan Hypothesis: The Role of Human Irrationality in AI Safety
A Causal Lens for Learning Long-term Fair Policies
Accelerated Over-Relaxation Heavy-Ball Method: Achieving Global Accelerated Convergence with Broad Generalization
Accelerating 3D Molecule Generation via Jointly Geometric Optimal Transport
Measuring And Improving Engagement of Text-to-Image Generation Models
A Meta-Learning Approach to Bayesian Causal Discovery
Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning
Can a Large Language Model be a Gaslighter?
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
Accelerating Task Generalisation with Multi-Level Skill Hierarchies
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
Accurate and Scalable Graph Neural Networks via Message Invariance
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer
A Closer Look at Machine Unlearning for Large Language Models
A Computational Framework for Modeling Emergence of Color Vision in the Human Brain
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
ACTIVE: Offline Reinforcement Learning via Adaptive Imitation and In-sample $V$-Ensemble
Grounding Multimodal Large Language Model in GUI World
A Curious Case of the Missing Measure: Better Scores and Worse Generation
AdaGrad under Anisotropic Smoothness
OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes
Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
ConcreTizer: Model Inversion Attack via Occupancy Classification and Dispersion Control for 3D Point Cloud Restoration
ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation
Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Adaptive Batch Size for Privately Finding Second-Order Stationary Points
TeaserGen: Generating Teasers for Long Documentaries
TAU-106K: A New Dataset for Comprehensive Understanding of Traffic Accident
Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels
Operator Deep Smoothing for Implied Volatility
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Adaptive Length Image Tokenization via Recurrent Allocation
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Adaptive Pruning of Pretrained Transformer via Differential Inclusions
Adaptive Retention & Correction: Test-Time Training for Continual Learning
Adaptive Shrinkage Estimation for Personalized Deep Kernel Regression in Modeling Brain Trajectories
Adaptive Transformer Programs: Bridging the Gap Between Performance and Interpretability in Transformers
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
AdaWM: Adaptive World Model based Planning for Autonomous Driving
ADBM: Adversarial Diffusion Bridge Model for Reliable Adversarial Purification
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers
A Deep Generative Learning Approach for Two-stage Adaptive Robust Optimization
A Differentiable Metric for Discovering Groups and Unitary Representations
Advancing Out-of-Distribution Detection via Local Neuroplasticity
Robust Barycenter Estimation using Semi-Unbalanced Neural Optimal Transport
Adversarial Latent Feature Augmentation for Fairness
Adversarial Mixup Unlearning
DarkBench: Benchmarking Dark Patterns in Large Language Models
Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
Agent-Oriented Planning in Multi-Agent Systems
Skill Expansion and Composition in Parameter Space
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
AIMS.au: A Dataset for the Analysis of Modern Slavery Countermeasures in Corporate Statements
A Large-scale Training Paradigm for Graph Generative Models
Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images
Algorithmic Stability Based Generalization Bounds for Adversarial Training
Aligned Better, Listen Better For Audio-Visual Large Language Models
Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Amortized Control of Continuous State Space Feynman-Kac Model for Irregular Time Series
Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
ANaGRAM: A Natural Gradient Relative to Adapted Model for efficient PINNs learning
Positional Embeddings in Transformer Models: Evolution from Text to Vision Domains
How Much is a Noisy Image Worth? Data Scaling Laws for Ambient Diffusion
Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models Trained on Corrupted Data
Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
An Auditing Test to Detect Behavioral Shift in Language Models
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models
An Effective Theory of Bias Amplification
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
An Evolved Universal Transformer Memory
A new framework for evaluating model out-of-distribution generalisation for the biochemical domain
An Illustrated Guide to Automatic Sparse Differentiation
Animate-X: Universal Character Image Animation with Enhanced Motion Representation
AnoLLM: Large Language Models for Tabular Anomaly Detection
An Online Learning Theory of Trading-Volume Maximization
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
Handling Delay in Real-Time Reinforcement Learning
A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence
Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence
Approximation algorithms for combinatorial optimization with predictions
A Rainbow in Deep Network Black Boxes
Learning system dynamics without forgetting
Are Large Vision Language Models Good Game Players?
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Advancing Prompt-Based Methods for Replay-Independent General Continual Learning
A Sanity Check for AI-generated Image Detection
Online-to-Offline RL for Agent Alignment
AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly
A Theory for Token-Level Harmonization in Retrieval-Augmented Generation
A Transfer Attack to Image Watermarks
Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
Attention with Markov: A Curious Case of Single-layer Transformers
A Unified Theory of Quantum Neural Network Loss Landscapes
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation
Autoregressive Pretraining with Mamba in Vision
AutoUAD: Hyper-parameter Optimization for Unsupervised Anomaly Detection
Backtracking Improves Generation Safety
Balanced Ranking with Relative Centrality: A multi-core periphery perspective
Bayesian Experimental Design Via Contrastive Diffusions
Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences
Bayesian Optimization via Continual Variational Last Layer Training
Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
BenTo: Benchmark Reduction with In-Context Transferability
Better autoregressive regression with LLMs via regression-aware fine-tuning
Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Beyond correlation: The impact of human uncertainty in measuring the effectiveness of automatic evaluation and LLM-as-a-judge
Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
Binary Losses for Density Ratio Estimation
BingoGuard: LLM Content Moderation Tools with Risk Levels
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
Bisimulation Metric for Model Predictive Control
BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments
BLEND: Behavior-guided Neural Population Dynamics Modeling via Privileged Knowledge Distillation
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions
BOND: Aligning LLMs with Best-of-N Distillation
Boosting Latent Diffusion with Perceptual Objectives
Boosting Perturbed Gradient Ascent for Last-Iterate Convergence in Games
Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors
Bounds on $L_p$ Errors in Density Ratio Estimation via $f$-Divergence Loss Functions
BRAID: Input-driven Nonlinear Dynamical Modeling of Neural-Behavioral Data
Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization
Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate
Breaking the $\log(1/\Delta_2)$ Barrier: Better Batched Best Arm Identification with Adaptive Grids
Bridging the Gap Between f-divergences and Bayes Hilbert Spaces
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Building Blocks of Differentially Private Training
Building Math Agents with Multi-Turn Iterative Preference Learning
Can In-context Learning Really Generalize to Out-of-distribution Tasks?
Can Large Language Models Understand Symbolic Graphics Programs?
Can LLM Simulations Truly Reflect Humanity? A Deep Dive
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
Can LLMs Understand Time Series Anomalies?
Can One Modality Model Synergize Training of Other Modality Models?
Can Reinforcement Learning Solve Asymmetric Combinatorial-Continuous Zero-Sum Games?
Can Transformers Do Enumerative Geometry?
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
CARTS: Advancing Neural Theorem Proving with Diversified Tactic Calibration and Bias-Resistant Tree Search
Catastrophic Failure of LLM Unlearning via Quantization
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Cauchy-Schwarz Regularizers
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
Causal Effect Estimation with Mixed Latent Confounders and Post-treatment Variables
CBMA: Improving Conformal Prediction through Bayesian Model Averaging
C-CLIP: Multimodal Continual Learning for Vision-Language Model
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images
CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Chain-of-region: Visual Language Models Need Details for Diagram Analysis
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Chemistry-Inspired Diffusion with Non-Differentiable Guidance
Classic but Everlasting: Traditional Gradient-Based Algorithms Converges Fast Even in Time-Varying Multi-Player Games
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
CodePlan: Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning
COFlowNet: Conservative Constraints on Flows Enable High-Quality Candidate Generation
CoInD: Enabling Logical Compositions in Diffusion Models
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
Collapsed Language Models Promote Fairness
ColPali: Efficient Document Retrieval with Vision Language Models
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
ComPC: Completing a 3D Point Cloud with 2D Diffusion Priors
Competition Dynamics Shape Algorithmic Phases of In-Context Learning
Composable Interventions for Language Models
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Compositional simulation-based inference for time series
Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models
Compute-Constrained Data Selection
Computing Circuits Optimization via Model-Based Circuit Genetic Evolution
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
CONDA: Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts
Confidence Elicitation: A New Attack Vector for Large Language Models
Conformal Structured Prediction
Conservative Contextual Bandits: Beyond Linear Representations
Constraint-Conditioned Actor-Critic for Offline Safe Reinforcement Learning
Content-Style Learning from Unaligned Domains: Identifiability under Unknown Latent Dimensions
Contextual Document Embeddings
Continual Slow-and-Fast Adaptation of Latent Neural Dynamics (CoSFan): Meta-Learning What-How & When to Adapt
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis
Continuous Ensemble Weather Forecasting with Diffusion models
Controllable Unlearning for Image-to-Image Generative Models via $\epsilon$-Constrained Optimization
Controlled LLM Decoding via Discrete Auto-regressive Biasing
Control-oriented Clustering of Visual Latent Representation
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
Coreset Spectral Clustering
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Correlation and Navigation in the Vocabulary Key Representation Space of Language Models
Counterfactual Realizability
CPSample: Classifier Protected Sampling for Guarding Training Data During Diffusion
CR-CTC: Consistency regularization on CTC for improved speech recognition
Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification
Credit-based self organizing maps: training deep topographic networks with minimal performance degradation
Cross-Domain Off-Policy Evaluation and Learning for Contextual Bandits
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
CTSyn: A Foundation Model for Cross Tabular Data Generation
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation
DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models
Data-centric Prediction Explanation via Kernelized Stein Discrepancy
Data Distillation for extrapolative protein design through exact preference optimization
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Data Unlearning in Diffusion Models
Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Decoupled Subgraph Federated Learning
DeeperForward: Enhanced Forward-Forward Training for Deeper and Better Performance
Deep Kernel Posterior Learning under Infinite Variance Prior Weights
DeepTAGE: Deep Temporal-Aligned Gradient Enhancement for Optimizing Spiking Neural Networks
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
DELIFT: Data Efficient Language model Instruction Fine-Tuning
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
Dense Video Object Captioning from Disjoint Supervision
Density estimation with LLMs: a geometric investigation of in-context learning trajectories
Depth Any Video with Scalable Synthetic Data
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Descent with Misaligned Gradients and Applications to Hidden Convexity
Designing Mechanical Meta-Materials by Learning Equivariant Flows
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement
Difference-of-submodular Bregman Divergence
Differentiable Optimization of Similarity Scores Between Models and Brains
Differential learning kinetics govern the transition from memorization to generalization during in-context learning
Differentially private learners for heterogeneous treatment effects
Differentially Private Steering for Large Language Model Alignment
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model
Diffusion-based Decoupled Deterministic and Uncertain Framework for Probabilistic Multivariate Time Series Forecasting
Diffusion Bridge AutoEncoders for Unsupervised Representation Learning
Diffusion Bridge Implicit Models
Diffusion Models are Evolutionary Algorithms
Diffusion Models as Cartoonists: The Curious Case of High Density Regions
Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data
Dimension Agnostic Neural Processes
Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Model Using Implicit Feedback from Pre-training Demonstrations
Discovering Clone Negatives via Adaptive Contrastive Learning for Image-Text Matching
Discovering Temporally Compositional Neural Manifolds with Switching Infinite GPFA
Discrete Copula Diffusion
Discrete Latent Plans via Semantic Skill Abstractions
Discriminator-Guided Embodied Planning for LLM Agent
Disentangled Representation Learning with the Gromov-Monge Gap
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Dissecting Adversarial Robustness of Multimodal LM Agents
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
Distilling Dataset into Neural Field
Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance Constraint
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
Distribution-Free Data Uncertainty for Neural Network Regression
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
Divergence-Regularized Discounted Aggregation: Equilibrium Finding in Multiplayer Partially Observable Stochastic Games
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Diversity-Rewarded CFG Distillation
Do as I do (Safely): Mitigating Task-Specific Fine-tuning Risks in Large Language Models
Dobi-SVD: Differentiable SVD for LLM Compression and Some New Perspectives
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models
Does Editing Provide Evidence for Localization?
Does SGD really happen in tiny subspaces?
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Do LLM Agents Have Regret? A Case Study in Online Learning and Games
Do LLMs estimate uncertainty well in instruction-following?
Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model
Do Mice Grok? Glimpses of Hidden Progress in Sensory Cortex
DON’T STOP ME NOW: EMBEDDING BASED SCHEDULING FOR LLMS
Do vision models perceive objects like toddlers?
DPLM-2: A Multimodal Diffusion Protein Language Model
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
DSPO: Direct Score Preference Optimization for Diffusion Model Alignment
DUALFormer: Dual Graph Transformer
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Durable Quantization Conditioned Misalignment Attack on Large Language Models
DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation
Dynamic Modeling of Patients, Modalities and Tasks via Multi-modal Multi-task Mixture of Experts
Dynamic Negative Guidance of Diffusion Models
Dynamics of Concept Learning and Compositional Generalization
DynaPrompt: Dynamic Test-Time Prompt Tuning
DynFrs: An Efficient Framework for Machine Unlearning in Random Forest
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
ECD: A Machine Learning Benchmark for Predicting Enhanced-Precision Electronic Charge Density in Crystalline Inorganic Materials
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Effective post-training embedding compression via temperature control in contrastive training
Efficient Active Imitation Learning with Random Network Distillation
Efficient and Context-Aware Label Propagation for Zero-/Few-Shot Training-Free Adaptation of Vision-Language Model
Efficient Cross-Episode Meta-RL
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction
Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Efficient Neuron Segmentation in Electron Microscopy by Affinity-Guided Queries
Efficient Online Pruning and Abstraction for Imperfect Information Extensive-Form Games
Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models
Efficient stagewise pretraining via progressive subnetworks
Efficient Training of Neural Stochastic Differential Equations by Matching Finite Dimensional Distributions
EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning
ELFS: Label-Free Coreset Selection with Proxy Training Dynamics
Eliciting Human Preferences with Language Models
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Eliminating Position Bias of Language Models: A Mechanistic Approach
Elucidating the Preconditioning in Consistency Distillation
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents
Endless Jailbreaks with Bijection Learning
Endowing Visual Reprogramming with Adversarial Robustness
Enhanced Diffusion Sampling via Extrapolation with Multiple ODE Solutions
Enhancing Clustered Federated Learning: Integration of Strategies and Improved Methodologies
Enhancing Federated Domain Adaptation with Multi-Domain Prototype-Based Federated Fine-Tuning
Enhancing Multilingual Reasoning in LLMs: Insights from Cross-Linguistic Correlations and Optimal Data Proportions
Enhancing Robust Fairness via Confusional Spectral Regularization
Enhancing Uncertainty Estimation and Interpretability with Bayesian Non-negative Decision Layer
Ensembles of Low-Rank Expert Adapters
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
Equivariant Denoisers Cannot Copy Graphs: Align Your Graph Diffusion Models
Equivariant Symmetry Breaking Sets
Error-quantified Conformal Inference for Time Series
ESE: Espresso Sentence Embeddings
Estimation of single-cell and tissue perturbation effect in spatial transcriptomics via Spatial Causal Disentanglement
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Exact Computation of Any-Order Shapley Interactions for Graph Neural Networks
Execution-guided within-prompt search for programming-by-example
Expected Return Symmetries
Exploiting Hankel-Toeplitz Structures for Fast Computation of Kernel Precision Matrices
Exploiting Hidden Symmetry to Improve Objective Perturbation for DP linear learners with a nonsmooth L1-norm
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Exploring Learning Complexity for Efficient Downstream Dataset Pruning
Exponential Topology-enabled Scalable Communication in Multi-agent Reinforcement Learning
Exposure Bracketing Is All You Need For A High-Quality Image
Extendable and Iterative Structure Learning Strategy for Bayesian Networks
Factor Graph-based Interpretable Neural Networks
FACTS: A Factored State-Space Framework for World Modelling
Fair Clustering in the Sliding Window Model
Fair Submodular Cover
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel
Fast training and sampling of Restricted Boltzmann Machines
Fast unsupervised ground metric learning with tree-Wasserstein distance
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse
Federated Continual Learning Goes Online: Uncertainty-Aware Memory Management for Vision Tasks and Beyond
Federated Few-Shot Class-Incremental Learning
Federated Residual Low-Rank Adaption of Large Language Models
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
FIG: Flow with Interpolant Guidance for Linear Inverse Problems
Finding Shared Decodable Concepts and their Negations in the Brain
Fine-Tuning Attention Modules Only: Enhancing Weight Disentanglement in Task Arithmetic
Fine-tuning can Help Detect Pretraining Data from Large Language Models
FlashMask: Efficient and Rich Mask Extension of FlashAttention
Flat Reward in Policy Parameter Space Implies Robust Reinforcement Learning
FLOPS: Forward Learning with OPtimal Sampling
Flow-based Variational Mutual Information: Fast and Flexible Approximations
Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
Forking Paths in Neural Text Generation
Formation of Representations in Neural Networks
Foundation Models Secretly Understand Neural Network Weights: Enhancing Hypernetwork Architectures with Foundation Models
From Attention to Activation: Unraveling the Enigmas of Large Language Models
From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering
From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation
From Tokens to Lattices: Emergent Lattice Structures in Language Models
From Tokens to Words: On the Inner Lexicon of LLMs
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Gated Delta Networks: Improving Mamba2 with Delta Rule
Gaussian Splatting Lucas-Kanade
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment
Generalization, Expressivity, and Universality of Graph Neural Networks on Attributed Graphs
Generalization through variance: how noise shapes inductive biases in diffusion models
Generalized Consistency Trajectory Models for Image Manipulation
Generating Freeform Endoskeletal Robots
Generating Graphs via Spectral Diffusion
Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models
Generation and Comprehension Hand-in-Hand: Vision-guided Expression Diffusion for Boosting Referring Expression Generation and Comprehension
Generative Classifiers Avoid Shortcut Solutions
Generator Matching: Generative modeling with arbitrary Markov processes
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits
Geometry of Long-Tailed Representation Learning: Rebalancing Features for Skewed Distributions
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
Global Identifiability of Overcomplete Dictionary Learning via L1 and Volume Minimization
GLOMA: Global Video Text Spotting with Morphological Association
GOAL: A Generalist Combinatorial Optimization Agent Learner
GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks
GPromptShield: Elevating Resilience in Graph Prompt Tuning Against Adversarial Attacks
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS
Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Graph Neural Ricci Flow: Evolving Feature from a Curvature Perspective
GraphRouter: A Graph-based Router for LLM Selections
Graph Transformers Dream of Electric Flow
Grounding Video Models to Actions through Goal Conditioned Exploration
Group Downsampling with Equivariant Anti-aliasing
Group Ligands Docking to Protein Pockets
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation
HaDeMiF: Hallucination Detection and Mitigation in Large Language Models
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation
HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
Hessian-Free Online Certified Unlearning
HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
High-Dimensional Bayesian Optimisation with Gaussian Process Prior Variational Autoencoders
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
High-quality Text-to-3D Character Generation with SparseCubes and Sparse Transformers
HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts
Holographic Node Representations: Pre-training Task-Agnostic Node Embeddings
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
How do we interpret the outputs of a neural network trained on classification?
How Feature Learning Can Improve Neural Scaling Laws
How Gradient descent balances features: A dynamical analysis for two-layer neural networks
How Much is Unseen Depends Chiefly on Information About the Seen
How new data permeates LLM knowledge and how to dilute it
How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Human-Aligned Chess With a Bit of Search
Human-inspired Episodic Memory for Infinite Context LLMs
Hymba: A Hybrid-head Architecture for Small Language Models
Hyperbolic Genome Embeddings
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength
Identifiability for Gaussian Processes with Holomorphic Kernels
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
Image and Video Tokenization with Binary Spherical Quantization
Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection
Implicit Neural Surface Deformation with Explicit Velocity Fields
Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions
Improved Diffusion-based Generative Model with Better Adversarial Robustness
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
ImProver: Agent-Based Automated Proof Optimization
Improving Equivariant Networks with Probabilistic Symmetry Breaking
Improving Graph Neural Networks by Learning Continuous Edge Directions
Improving Neural Network Accuracy by Concurrently Training with a Twin Network
Improving Pretraining Data Using Perplexity Correlations
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Improving Uncertainty Estimation through Semantically Diverse Language Generation
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information
In-Context Editing: Learning Knowledge from Self-Induced Distributions
In-context Time Series Predictor
Incremental Causal Effect for Time to Treatment Initialization
INFER: A Neural-symbolic Model For Extrapolation Reasoning on Temporal Knowledge Graph
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
Infinite-Resolution Integral Noise Warping for Diffusion Models
InfoGS: Efficient Structure-Aware 3D Gaussians via Lightweight Information Shaping
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
INS: Interaction-aware Synthesis to Enhance Offline Multi-agent Reinforcement Learning
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Interpretable Compressed Descriptions For Image Generation
Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
Interpreting the Second-Order Effects of Neurons in CLIP
Inverse Constitutional AI: Compressing Preferences into Principles
Inverse Rendering using Multi-Bounce Path Tracing and Reservoir Sampling
Inverse Scaling: When Bigger Isn't Better
Is Factuality Enhancement a Free Lunch For LLMs? Better Factuality Can Lead to Worse Context-Faithfulness
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Kernel-based Optimally Weighted Conformal Time-Series Prediction
kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers
Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Kolmogorov-Arnold Transformer
Kronecker Mask and Interpretive Prompts are Language-Action Video Learners
LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Language-Image Models with 3D Understanding
Language Model Alignment in Multilingual Trolley Problems
Language Models Learn to Mislead Humans via RLHF
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Large Language Models Assume People are More Rational than We Really are
Large Language Models can Become Strong Self-Detoxifiers
Large Language Models Often Say One Thing and Do Another
Large Scale Knowledge Washing
Last Iterate Convergence of Incremental Methods as a Model of Forgetting
Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data
Latent Radiance Fields with 3D-aware 2D Representations
Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning
Lawma: The Power of Specialization for Legal Annotation
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
Lean-STaR: Learning to Interleave Thinking and Proving
Learn-by-interact: A Data-Centric Framework For Self-Adaptive Agents in Realistic Environments
Learn hybrid prototypes for multivariate time series anomaly detection
Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory
Learning-Augmented Frequent Directions
Learning Causal Alignment for Reliable Disease Diagnosis
Learning Chaos In A Linear Way
Learning Clustering-based Prototypes for Compositional Zero-Shot Learning
Learning Color Equivariant Representations
Learning Diagrams: A Graphical Language for Compositional Training Regimes
Learning Dynamics of LLM Finetuning
Learning Equivariant Non-Local Electron Density Functionals
Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling
Learning Hierarchical Polynomials of Multiple Nonlinear Features
Learning High-Degree Parities: The Crucial Role of the Initialization
Learning Mask Invariant Mutual Information for Masked Image Modeling
Learning mirror maps in policy mirror descent
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
Learning Partial Graph Matching via Optimal Partial Transport
Learning Regularized Graphon Mean-Field Games with Unknown Graphons
Learning Spatiotemporal Dynamical Systems from Point Process Observations
Learning Structured Universe Graph with Outlier OOD Detection for Partial Matching
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
Learning to Discretize Denoising Diffusion ODEs
Learning to engineer protein flexibility
Learning to Search from Demonstration Sequences
Learning to Solve Differential Equation Constrained Optimization Problems
Learning to Steer Markovian Agents under Model Uncertainty
Learning Video-Conditioned Policy on Unlabelled Data with Joint Embedding Predictive Transformer
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Let the Code LLM Edit Itself When You Edit the Code
Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD
Leveraging Variable Sparsity to Refine Pareto Stationarity in Multi-Objective Optimization
LICO: Large Language Models for In-Context Molecular Optimization
LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
Linear Multistep Solver Distillation for Fast Sampling of Diffusion Models
Linear Representations of Political Perspective Emerge in Large Language Models
Linear SCM Identification in the Presence of Confounders and Gaussian Noise
Lines of Thought in Large Language Models
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
LLMs Can Plan Only If We Tell Them
LLM-SR: Scientific Equation Discovery via Programming with Large Language Models
Locality Alignment Improves Vision-Language Models
Locality Sensitive Avatars From Video
Local Patterns Generalize Better for Novel Anomalies
Logically Consistent Language Models via Neuro-Symbolic Integration
Long-Context Linear System Identification
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Longhorn: State Space Models are Amortized Online Learners
LongMamba: Enhancing Mamba's Long-Context Capabilities via Training-Free Receptive Field Enlargement
Long-Short Decision Transformer: Bridging Global and Local Dependencies for Generalized Decision-Making
Long-time asymptotics of noisy SVGD outside the population limit
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Looking into User’s Long-term Interests through the Lens of Conservative Evidential Learning
Looking Inward: Language Models Can Learn About Themselves by Introspection
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
LoRA Learns Less and Forgets Less
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Machine Unlearning Fails to Remove Data Poisoning Attacks
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
MADGEN: Mass-Spec attends to De Novo Molecular generation
MaestroMotif: Skill Design from Artificial Intelligence Feedback
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations
MamKO: Mamba-based Koopman operator for modeling and predictive control
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks
MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science
MaskBit: Embedding-free Image Generation via Bit Tokens
Mask in the Mirror: Implicit Sparsification
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Matryoshka Multimodal Models
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
Mechanistic Interpretability Meets Vision Language Models: Insights and Limitations
Mentored Learning: Improving Generalization and Convergence of Student Learner
metabench - A Sparse Benchmark of Reasoning and Knowledge in Large Language Models
Meta-Continual Learning of Neural Fields
Meta-Dynamical State Space Models for Integrative Neural Data Analysis
MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences
MetaOOD: Automatic Selection of OOD Detection Models
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
MGDA Converges under Generalized Smoothness, Provably
Minimalistic Predictions for Online Class Constraint Scheduling
Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
Mitigating Hallucination in Large Vision-Language Models via Modular Attribution and Intervention
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Mixture-of-Agents Enhances Large Language Model Capabilities
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models
Mixture of In-Context Prompters for Tabular PFNs
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
MLPs Learn In-Context on Regression and Classification Tasks
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
MM-EMBED: UNIVERSAL MULTIMODAL RETRIEVAL WITH MULTIMODAL LLMS
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
MoDeGPT: Modular Decomposition for Large Language Model Compression
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
Model Equality Testing: Which Model is this API Serving?
Model-Free Offline Reinforcement Learning with Enhanced Robustness
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Model merging with SVD to tie the Knots
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
MoLEx: Mixture of Layer Experts for Fine-tuning with Sparse Upcycling
Moner: Motion Correction in Undersampled Radial MRI with Unsupervised Neural Representation
Monitoring Latent World States in Language Models with Propositional Probes
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses
MorphoDiff: Cellular Morphology Painting with Diffusion Models
MotherNet: Fast Training and Inference via Hyper-Network Transformers
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Motion Control of High-Dimensional Musculoskeletal Systems with Hierarchical Model-Based Planning
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
MP-Mat: A 3D-and-Instance-Aware Human Matting and Editing Framework with Multiplane Representation
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Multi-Accurate CATE is Robust to Unknown Covariate Shifts
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Multi-Label Node Classification with Label Influence Propagation
Multi-LLM-Agents Debate - Performance, Efficiency, and Scaling Challenges
Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine
Multimodal Quantitative Language for Generative Recommendation
Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap
Multi-objective antibody design with constrained preference optimization
Multi-objective Differentiable Neural Architecture Search
Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment
Multi-Task Dense Predictions via Unleashing the Power of Diffusion
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solver
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Near-optimal Active Regression of Single-Index Models
Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency
NetFormer: An interpretable model for recovering dynamical connectivity in neuronal population dynamics
NetMoE: Accelerating MoE Training through Dynamic Sample Placement
Neural Approximate Mirror Maps for Constrained Diffusion Models
Neural Context Flows for Meta-Learning of Dynamical Systems
Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
NeurFlow: Interpreting Neural Networks through Neuron Groups and Functional Interactions
Neuron-based Multifractal Analysis of Neuron Interaction Dynamics in Large Models
New Algorithms for the Learning-Augmented k-means Problem
NextBestPath: Efficient 3D Mapping of Unseen Environments
nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Node-Time Conditional Prompt Learning in Dynamic Graphs
No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs
Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
No Location Left Behind: Measuring and Improving the Fairness of Implicit Representations for Earth Data
Non-Equilibrium Dynamics of Hybrid Continuous-Discrete Ground-State Sampling
Nonlinear multiregion neural dynamics with parametric impulse response communication channels
Non-myopic Generation of Language Models for Reasoning and Planning
Not All Language Model Features Are One-Dimensionally Linear
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Number Cookbook: Number Understanding of Language Models and How to Improve It
NutriBench: A Dataset for Evaluating Large Language Models in Nutrition Estimation from Meal Descriptions
Object-Centric Pretraining via Target Encoder Bootstrapping
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood
OMG: Opacity Matters in Material Modeling with Gaussian Splatting
On Calibration of LLM-based Guard Models for Reliable Content Moderation
On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding
On Disentangled Training for Nonlinear Transform in Learned Image Compression
One-for-All Few-Shot Anomaly Detection via Instance-Induced Prompt Learning
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
On Evaluating the Durability of Safeguards for Open-Weight LLMs
On Large Language Model Continual Unlearning
On Linear Representations and Pretraining Data Frequency in Language Models
ONLINE EPSILON NET & PIERCING SET FOR GEOMETRIC CONCEPTS
On Rollouts in Model-Based Reinforcement Learning
On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality
On the Benefits of Memory for Modeling Time-Dependent PDEs
On the Completeness of Invariant Geometric Deep Learning Models
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
On the Feature Learning in Diffusion Models
On the Hölder Stability of Multiset and Graph Neural Networks
On the Importance of Language-driven Representation Learning for Heterogeneous Federated Learning
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
On the Optimization and Generalization of Multi-head Attention
On the Optimization Landscape of Low Rank Adaptation Methods for Large Language Models
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
OPTAMI: Global Superlinear Convergence of High-order Methods
Optimal Flow Transport and its Entropic Regularization: a GPU-friendly Matrix Iterative Algorithm for Flow Balance Satisfaction
Optimal Learning of Kernel Logistic Regression for Complex Classification Scenarios
Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
Optimization with Access to Auxiliary Information
Optimizing importance weighting in the presence of sub-population shifts
OptionZero: Planning with Learned Options
Oracle efficient truncated statistics
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Out-of-distribution Generalization for Total Variation based Invariant Risk Minimization
Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
Param$\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost
Parameter-Efficient and Stable Singular Value Adaptation for Pre-Trained Models
Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior
Pedestrian Motion Reconstruction: A Large-scale Benchmark via Mixed Reality Rendering with Multiple Perspectives and Modalities
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Persistent Pre-training Poisoning of LLMs
PersonalLLM: Tailoring LLMs to Individual Preferences
Physics-informed Temporal Difference Metric Learning for Robot Motion Planning
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
PINP: Physics-Informed Neural Predictor with latent estimation of fluid flows
Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming
Plastic Learning with Deep Fourier Features
pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
Point-based Instance Completion with Scene Constraints
Poison-splat: Computation Cost Attack on 3D Gaussian Splatting
Policy Gradient with Kernel Quadrature
Policy Optimization under Imperfect Human Interactions with Agent-Gated Shared Autonomy
PolyNet: Learning Diverse Solution Strategies for Neural Combinatorial Optimization
Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation
POTEC: Off-Policy Contextual Bandits for Large Action Spaces via Policy Decomposition
PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Precedence-Constrained Winter Value for Effective Graph Data Valuation
Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Preference Elicitation for Offline Reinforcement Learning
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
Privacy-Aware Lifelong Learning
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Probabilistic Conformal Prediction with Approximate Conditional Validity
Problem-Parameter-Free Federated Learning
Process Reward Model with Q-value Rankings
Progressive Compositionality in Text-to-Image Generative Models
Progressive Compression with Universally Quantized Diffusion Models
Progress or Regress? Self-Improvement Reversal in Post-training
Projection Head is Secretly an Information Bottleneck
Protecting against simultaneous data poisoning attacks
Protein Language Model Fitness is a Matter of Preference
ProtPainter: Draw or Drag Protein via Topology-guided Diffusion
Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics
Provable Convergence Bounds for Hybrid Dynamical Sampling and Optimization
Provable weak-to-strong generalization via benign overfitting
Provence: efficient and robust context pruning for retrieval-augmented generation
Proximal Mapping Loss: Understanding Loss Functions in Crowd Counting & Localization
PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
QERA: an Analytical Framework for Quantization Error Reconstruction
Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks
QP-SNN: Quantized and Pruned Spiking Neural Networks
Quality over Quantity in Attention Layers: When Adding More Heads Hurts
Quantitative Approximation for Neural Operators in Nonlinear Parabolic Equations
Quantum (Inspired) $D^2$-sampling with Applications
Query-based Knowledge Transfer for Heterogeneous Learning Environments
Radar: Fast Long-Context Decoding for Any Transformer
RAG-SR: Retrieval-Augmented Generation for Neural Symbolic Regression
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Random Is All You Need: Random Noise Injection on Feature Statistics for Generalizable Deep Image Denoising
RAPID: Retrieval Augmented Training of Differentially Private Diffusion Models
Rapid Selection and Ordering of In-Context Demonstrations via Prompt Embedding Clustering
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Rationalizing and Augmenting Dynamic Graph Neural Networks
RA-TTA: Retrieval-Augmented Test-Time Adaptation for Vision-Language Models
RB-Modulation: Training-Free Stylization using Reference-Based Modulation
Real2Code: Reconstruct Articulated Objects via Code Generation
Real-Time Video Generation with Pyramid Attention Broadcast
Reasoning Elicitation in Language Models via Counterfactual Feedback
Reassessing How to Compare and Improve the Calibration of Machine Learning Models
Reconciling Model Multiplicity for Downstream Decision Making
Reconstruction-Guided Policy: Enhancing Decision-Making through Agent-Wise State Consistency
Recovering Manifold Structure Using Ollivier Ricci Curvature
Redefining the task of Bioactivity Prediction
Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy
Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator
Regularization by Texts for Latent Diffusion Inverse Solvers
Regularizing Energy among Training Samples for Out-of-Distribution Generalization
Regulatory DNA Sequence Design with Reinforcement Learning
Reinforcement Learning for Control of Non-Markovian Cellular Population Dynamics
Relation-Aware Diffusion for Heterogeneous Graphs with Partially Observed Features
ReMatching Dynamic Reconstruction Flow
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Repetition Improves Language Model Embeddings
Representative Guidance: Diffusion Model Sampling with Coherence
RESfM: Robust Deep Equivariant Structure from Motion
Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs
Residual Kernel Policy Network: Enhancing Stability and Robustness in RKHS-Based Reinforcement Learning
Residual-MPPI: Online Policy Customization for Continuous Control
RESuM: A Rare Event Surrogate Model for Physics Detector Design
Rethinking Artistic Copyright Infringements In the Era Of Text-to-Image Generative Models
Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Rethinking Invariance in In-context Learning
Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off
Rethinking Shapley Value for Negative Interactions in Non-convex Games
Retri3D: 3D Neural Graphics Representation Retrieval
Revisiting a Design Choice in Gradient Temporal Difference Learning
Revisiting Convolution Architecture in the Realm of DNA Foundation Models
Revisiting In-context Learning Inference Circuit in Large Language Models
Revisiting Large-Scale Non-convex Distributionally Robust Optimization
REVISITING MULTI-PERMUTATION EQUIVARIANCE THROUGH THE LENS OF IRREDUCIBLE REPRESENTATIONS
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later
Revolutionizing EMCCD Denoising through a Novel Physics-Based Learning Framework for Noise Modeling
Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Reward Learning from Multiple Feedback Types
Risk-Sensitive Diffusion: Robustly Optimizing Diffusion Models with Noisy Samples
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
Robust Conformal Prediction with a Single Binary Certificate
Robust Function-Calling for On-Device Language Model via Function Masking
Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning
RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
Robustness Auditing for Linear Regression: To Singularity and Beyond
Robustness of Quantum Algorithms for Nonconvex Optimization
Robust Simulation-Based Inference under Missing Data via Neural Processes
Robust System Identification: Finite-sample Guarantees and Connection to Regularization
Robust Transfer of Safety-Constrained Reinforcement Learning Agents
Rotated Runtime Smooth: Training-Free Activation Smoother for accurate INT4 inference
Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models
RRM: Robust Reward Model Training Mitigates Reward Hacking
RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
SAM 2: Segment Anything in Images and Videos
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
Satisficing Regret Minimization in Bandits
Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video
Scalable Decentralized Learning with Teleportation
Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires
Scaling Laws for Adversarial Attacks on Language Model Activations and Tokens
Scaling LLM Test-Time Compute Optimally Can be More Effective than Scaling Parameters for Reasoning
Scaling Optimal LR Across Token Horizons
Scaling Wearable Foundation Models
Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure
Score-based free-form architectures for high-dimensional Fokker-Planck equations
Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection
Second-Order Min-Max Optimization with Lazy Hessians
SegLLM: Multi-round Reasoning Segmentation with Large Language Models
SelectFormer: Private and Practical Data Selection for Transformers
Selective induction Heads: How Transformers Select Causal Structures in Context
Self-Attention-Based Contextual Modulation Improves Neural System Identification
Self-Evolving Multi-Agent Collaboration Networks for Software Development
Self-Improving Robust Preference Optimization
Self-Play Preference Optimization for Language Model Alignment
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Semantix: An Energy-guided Sampler for Semantic Style Transfer
Semialgebraic Neural Networks: From roots to representations
Sensitivity-Aware Amortized Bayesian Inference
Sensor-Invariant Tactile Representation
Separation Power of Equivariant Neural Networks
SFESS: Score Function Estimators for $k$-Subset Sampling
SGD with memory: fundamental properties and stochastic acceleration
Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation
ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design
Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning
SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Simple, Good, Fast: Self-Supervised World Models Free of Baggage
Simple Guidance Mechanisms for Discrete Diffusion Models
Simple ReFlow: Improved Techniques for Fast Flow Models
Simplifying, Stabilizing and Scaling Continuous-time Consistency Models
Simulating Training Dynamics to Reconstruct Training Data from Deep Neural Networks
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation
Sketch2Diagram: Generating Vector Diagrams from Hand-Drawn Sketches
SleepSMC: Ubiquitous Sleep Staging via Supervised Multimodal Coordination
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale
SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision
SoftCVI: Contrastive variational inference with self-generated soft labels
Soft Merging of Experts with Adaptive Routing
Solving New Tasks by Adapting Internet Video Knowledge
Solving Video Inverse Problems Using Image Diffusion Models
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
SpaceGNN: Multi-Space Graph Neural Network for Node Anomaly Detection with Extremely Limited Labels
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
Sparse Learning for State Space Models on Mobile
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Spectro-Riemannian Graph Neural Networks
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
Spiking Vision Transformer with Saccadic Attention
SpinQuant: LLM Quantization with Learned Rotations
Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
SSOLE: Rethinking Orthogonal Low-rank Embedding for Self-Supervised Learning
Stabilized Neural Prediction of Potential Outcomes in Continuous Time
Stable Segment Anything Model
STAMP: Scalable Task- And Model-agnostic Collaborative Perception
Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization
Standardizing Structural Causal Models
STAR: Synthesis of Tailored Architectures
State Space Models are Provably Comparable to Transformers in Dynamic Token Selection
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Stiefel Flow Matching for Moment-Constrained Structure Elucidation
Straightness of Rectified Flow: A Theoretical Insight into Wasserstein Convergence
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Strategic Classification With Externalities
StringLLM: Understanding the String Processing Capability of Large Language Models
Strong Model Collapse
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Supervised and Semi-Supervised Diffusion Maps with Label-Driven Diffusion
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement
Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning
Synthesizing Realistic fMRI: A Physiological Dynamics-Driven Hierarchical Diffusion Model for Efficient fMRI Acquisition
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
Systematic Outliers in Large Language Models
T2V-Turbo-v2: Enhancing Video Model Post-Training through Data, Reward, and Conditional Guidance Design
TabM: Advancing tabular deep learning with parameter-efficient ensembling
TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Targeted Attack Improves Protection against Unauthorized Diffusion Customization
Temporal Flexibility in Spiking Neural Networks: Towards Generalization Across Time Steps and Deployment Friendliness
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
Test-Time Adaptation for Combating Missing Modalities in Egocentric Videos
Test-time Adaptation for Image Compression with Distribution Regularization
Test-time Alignment of Diffusion Models without Reward Over-optimization
Test-Time Ensemble via Linear Mode Connectivity: A Path to Better Adaptation
Text-to-Image Rectified Flow as Plug-and-Play Priors
The 3D-PC: a benchmark for visual perspective taking in humans and machines
The AdEMAMix Optimizer: Better, Faster, Older
The Case for Cleaner Biosignals: High-fidelity Neural Compressor Enables Transfer from Cleaner iEEG to Noisier EEG
The Computational Complexity of Circuit Discovery for Inner Interpretability
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
The Directionality of Optimization Trajectories in Neural Networks
The Hidden Cost of Waiting for Accurate Predictions
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
The "Law" of the Unconscious Contrastive Learner: Probabilistic Alignment of Unpaired Modalities
The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
The Pitfalls of Memorization: When Memorization Hurts Generalization
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model
The Same but Different: Structural Similarities and Differences in Multilingual Language Modeling
The Unreasonable Ineffectiveness of the Deeper Layers
The Value of Sensory Information to a Robot
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Tight Clusters Make Specialized Experts
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
TimeInf: Time Series Data Contribution via Influence Functions
Time-to-Event Pretraining for 3D Medical Imaging
TIPS: Text-Image Pretraining with Spatial awareness
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization With Estimated Weights
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Gaussian-Based Instance-Adaptive Intensity Modeling for Point-Supervised Facial Expression Spotting
TopoLM: brain-like spatio-functional organization in a topographic language model
TorchTitan: One-stop PyTorch native solution for production ready LLM pretraining
ToVE: Efficient Vision-Language Learning via Knowledge Transfer from Vision Experts
Toward Efficient Multi-Agent Exploration With Trajectory Entropy Maximization
Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
Towards a Complete Logical Framework for GNN Expressiveness
Towards a learning theory of representation alignment
Towards Certification of Uncertainty Calibration under Adversarial Attacks
Towards Domain Adaptive Neural Contextual Bandits
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Towards Generalization Bounds of GCNs for Adversarially Robust Node Classification
Towards Improving Exploration through Sibling Augmented GFlowNets
Towards more rigorous evaluations of language models
Towards Optimal Multi-draft Speculative Decoding
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning
Towards Unbiased Calibration using Meta-Regularization
Towards Understanding the Universality of Transformers for Next-Token Prediction
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
Tractable Multi-Agent Reinforcement Learning through Behavioral Economics
Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Training LLMs over Neurally Compressed Text
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Training One-Dimensional Graph Neural Networks is NP-Hard
Training Robust Ensembles Requires Rethinking Lipschitz Continuity
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
Trajectory attention for fine-grained video motion control
Transformers are Universal In-context Learners
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Transformer-Squared: Self-adaptive LLMs
Tree of Attributes Prompt Learning for Vision-Language Models
Triples as the Key: Structuring Makes Decomposition and Verification Easier in LLM-based TableQA
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement
TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models
Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis
Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
Uncertainty-Aware Decoding with Minimum Bayes Risk
Uncertainty Herding: One Active Learning Method for All Label Budgets
Uncertainty modeling for fine-tuned implicit functions
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Uncovering Latent Memories in Large Language Models
Understanding Factual Recall in Transformers via Associative Memories
Understanding Methods for Scalable MCTS
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View
U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models
UniDetox: Universal Detoxification of Large Language Models via Dataset Distillation
Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers
UniGEM: A Unified Approach to Generation and Property Prediction for Molecules
Union-over-Intersections: Object Detection beyond Winner-Takes-All
UniRestore3D: A Scalable Framework For General Shape Restoration
Unlearning-based Neural Interpretations
Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
Unlocking Global Optimality in Bilevel Optimization: A Pilot Study
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
Unlocking the Potential of Model Calibration in Federated Learning
Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Utilitarian Algorithm Configuration for Infinite Parameter Spaces
UTILITY: Utilizing Explainable Reinforcement Learning to Improve Reinforcement Learning
VAE-Var: Variational Autoencoder-Enhanced Variational Methods for Data Assimilation in Meteorology
Variance-Reducing Couplings for Random Features
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Vector-ICL: In-context Learning with Continuous Vector Representations
Verifying Properties of Binary Neural Networks Using Sparse Polynomial Optimization
Vertical Federated Learning with Missing Features During Training and Inference
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
ViSAGe: Video-to-Spatial Audio Generation
Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations
Visual Agents as Fast and Slow Thinkers
Visually Consistent Hierarchical Image Classification
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models
Watch Less, Do More: Implicit Skill Discovery for Video-Conditioned Policy
Wavelet-based Positional Representation for Long Context
Wayward Concepts In Large Multimodal Models
Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors
Weak to Strong Generalization for Large Language Models with Multi-capabilities
Weak-to-Strong Generalization Through the Data-Centric Lens
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Weighted Multi-Prompt Learning with Description-free Large Language Model Distillation
Weighted Point Set Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric
What Do You See in Common? Learning Hierarchical Prototypes over Tree-of-Life to Discover Evolutionary Traits
What is Wrong with Perplexity for Long-context Language Modeling?
What Makes Large Language Models Reason in (Multi-Turn) Code Generation?
What Matters in Learning from Large-Scale Datasets for Robot Manipulation
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
What's New in My Data? Novelty Exploration via Contrastive Generation
What's the Move? Hybrid Imitation Learning via Salient Points
What to align in multimodal contrastive learning?
When does compositional structure yield compositional generalization? A kernel theory.
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
When narrower is better: the narrow width limit of Bayesian parallel branching neural networks
Which Tasks Should Be Compressed Together? A Causal Discovery Approach for Efficient Multi-Task Representation Compression
Why In-Context Learning Models are Good Few-Shot Learners?
Why RoPE Struggles to Maintain Long-Term Decay in Long Sequences?
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus
Zero-Shot Natural Language Explanations
ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention