# Downloads 2023

Number of events: 1615

- $k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference
- $\Lambda$-DARTS: Mitigating Performance Collapse by Harmonizing Operation Selection among Cells
- $\mathcal{O}$-GNN: incorporating ring priors into molecular modeling
- $\mathrm{SE}(3)$-Equivariant Attention Networks for Shape Reconstruction in Function Space
- $\mathscr{N}$-WL: A New Hierarchy of Expressivity for Graph Neural Networks
- $O(T^{-1})$ Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games
- $\rm A^2Q$: Aggregation-Aware Quantization for Graph Neural Networks
- 3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction
- 3D generation on ImageNet
- 3D Segmenter: 3D Transformer based Semantic Segmentation via 2D Panoramic Distillation
- 3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation
- 4th Workshop on African Natural Language Processing (AfricaNLP 2023)
- AANG : Automating Auxiliary Learning
- A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification
- Accelerated Single-Call Methods for Constrained Min-Max Optimization
- Accelerating Guided Diffusion Sampling with Splitting Numerical Methods
- Accelerating Hamiltonian Monte Carlo via Chebyshev Integration Time
- Accurate Bayesian Meta-Learning by Accurate Task Posterior Inference
- Accurate Image Restoration with Attention Retractable Transformer
- Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
- Achieve the Minimum Width of Neural Networks for Universal Approximation
- Achieving Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits
- Achieving Sub-linear Regret in Infinite Horizon Average Reward Constrained MDP with Linear Function Approximation
- A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias
- A CMDP-within-online framework for Meta-Safe Reinforcement Learning
- ACMP: Allen-Cahn Message Passing with Attractive and Repulsive Forces for Graph Neural Networks
- A Control-Centric Benchmark for Video Prediction
- A Convergent Single-Loop Algorithm for Relaxation of Gromov-Wasserstein in Graph Data
- A critical look at the evaluation of GNNs under heterophily: Are we really making progress?
- Actionable Neural Representations: Grid Cells from Minimal Constraints
- Active Image Indexing
- Active Learning for Object Detection with Evidential Deep Learning and Hierarchical Uncertainty Aggregation
- Active Learning in Bayesian Neural Networks with Balanced Entropy Learning Principle
- Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
- Adaptive Optimization in the $\infty$-Width Limit
- Adaptive Robust Evidential Optimization For Open Set Detection from Imbalanced Data
- Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation
- A Differential Geometric View and Explainability of GNN on Evolving Graphs
- Advancing Radiograph Representation Learning with Masked Record Modeling
- Adversarial Attacks on Adversarial Bandits
- Adversarial Diversity in Hanabi
- Adversarial Imitation Learning with Preferences
- Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks
- AE-FLOW: Autoencoders with Normalizing Flows for Medical Images Anomaly Detection
- A framework for benchmarking Class-out-of-distribution detection and its application to ImageNet
- A General Framework For Proving The Equivariant Strong Lottery Ticket Hypothesis
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning
- A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning
- A General Rank Preserving Framework for Asymmetric Image Retrieval
- Agent-based Graph Neural Networks
- A GNN-Guided Predict-and-Search Framework for Mixed-Integer Linear Programming
- Agnostic Learning of General ReLU Activation Using Gradient Descent
- A Graph Neural Network Approach to Automated Model Building in Cryo-EM Maps
- Agree to Disagree: Diversity through Disagreement for Better Transferability
- AGRO: Adversarial discovery of error-prone Groups for Robust Optimization
- A Higher Precision Algorithm for Computing the $1$-Wasserstein Distance
- A Holistic View of Label Noise Transition Matrix in Deep Learning and Beyond
- AI for Agent-Based Modelling (AI4ABM)
- AI, History and Equity
- AIM: Adapting Image Models for Efficient Video Action Recognition
- A Kernel Perspective of Skip Connections in Convolutional Networks
- A Laplace-inspired Distribution on SO(3) for Probabilistic Rotation Estimation
- A law of adversarial risk, interpolation, and label noise
- A Learning Based Hypothesis Test for Harmful Covariate Shift
- Aligning Model and Macaque Inferior Temporal Cortex Representations Improves Model-to-Human Behavioral Alignment and Adversarial Robustness
- Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression
- Alternating Differentiation for Optimization Layers
- A Message Passing Perspective on Learning Dynamics of Contrastive Learning
- A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics
- A Mixture-of-Expert Approach to RL-based Dialogue Management
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning
- Amortised Invariance Learning for Contrastive Self-Supervision
- A Multi-Grained Self-Interpretable Symbolic-Neural Model For Single/Multi-Labeled Text Classification
- An Adaptive Policy to Employ Sharpness-Aware Minimization
- An Additive Instance-Wise Approach to Multi-class Model Interpretation
- Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning
- Analogy-Forming Transformers for Few-Shot 3D Parsing
- Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel
- Anamnesic Neural Differential Equations with Orthogonal Polynomial Projections
- An efficient encoder-decoder architecture with top-down attention for speech separation
- An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation
- A Neural Mean Embedding Approach for Back-door and Front-door Adjustment
- A new characterization of the edge of stability based on a sharpness measure aware of batch gradient distribution
- An Exact Poly-Time Membership-Queries Algorithm for Extracting a Three-Layer ReLU Network
- An Extensible Multi-modal Multi-task Object Dataset with Materials
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
- Anisotropic Message Passing: Graph Neural Networks with Directional and Long-Range Interactions
- A Non-Asymptotic Analysis of Oversmoothing in Graph Neural Networks
- A Non-monotonic Self-terminating Language Model
- Anti-Symmetric DGN: a stable architecture for Deep Graph Networks
- AnyDA: Anytime Domain Adaptation
- Any-scale Balanced Samplers for Discrete Space
- Approximate Bayesian Inference with Stein Functional Variational Gradient Descent
- Approximate Nearest Neighbor Search through Modern Error-Correcting Codes
- Approximate Vanishing Ideal Computations at Scale
- Approximation and non-parametric estimation of functions over high-dimensional spheres via deep ReLU networks
- A Primal-Dual Framework for Transformers and Neural Networks
- A probabilistic framework for task-aligned intra- and inter-area neural manifold estimation
- Arbitrary Virtual Try-on Network: Characteristics Representation and Trade-off between Body and Clothing
- ArCL: Enhancing Contrastive Learning with Augmentation-Robust Representations
- Are More Layers Beneficial to Graph Transformers?
- Artificial Neuronal Ensembles with Learned Context Dependent Gating
- A Self-Attention Ansatz for Ab-initio Quantum Chemistry
- A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search
- A Simple Yet Powerful Deep Active Learning With Snapshots Ensembles
- Ask Me Anything: A simple strategy for prompting language models
- Associative Memory Augmented Asynchronous Spatiotemporal Representation Learning for Event-based Perception
- A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks
- A Statistical Framework for Personalized Federated Learning and Estimation: Theory, Algorithms, and Privacy
- Asymptotic Instance-Optimal Algorithms for Interactive Decision Making
- Asynchronous Distributed Bilevel Optimization
- Asynchronous Gradient Play in Zero-Sum Multi-agent Games
- A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation
- A Theoretical Framework for Inference and Learning in Predictive Coding Networks
- A theoretical study of inductive biases in contrastive learning
- A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
- A Theory of Dynamic Benchmarks
- A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
- AudioGen: Textually Guided Audio Generation
- Augmentation Component Analysis: Modeling Similarity via the Augmentation Overlaps
- Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
- A Unified Algebraic Perspective on Lipschitz Neural Networks
- A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games
- A Unified Framework for Soft Threshold Pruning
- Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?
- Auto-Encoding Goodness of Fit
- AutoGT: Automated Graph Transformer Architecture Search
- Automated Data Augmentations for Graph Classification
- Automatic Chain of Thought Prompting in Large Language Models
- Automating Nearest Neighbor Search Configuration with Constrained Optimization
- Autoregressive Conditional Neural Processes
- AutoTransfer: AutoML with Knowledge Transfer - An Application to Graph Neural Networks
- A VAE for Transformers with Nonparametric Variational Information Bottleneck
- Average Sensitivity of Decision Tree Learning
- A View From Somewhere: Human-Centric Face Representations
- A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta
- Avoiding spurious correlations via logit correction
- Backdoor Attacks and Defenses in Machine Learning
- Backpropagation at the Infinitesimal Inference Limit of Energy-Based Models: Unifying Predictive Coding, Equilibrium Propagation, and Contrastive Hebbian Learning
- Backpropagation through Combinatorial Algorithms: Identity with Projection Works
- Backstepping Temporal Difference Learning
- Bag of Tricks for Unsupervised Text-to-Speech
- BALTO: fast tensor program optimization with diversity-based active learning
- Basic Binary Convolution Unit for Binarized Image Restoration Network
- Batch Multivalid Conformal Prediction
- Bayesian Oracle for bounding information gain in neural encoding models
- Bayes-MIL: A New Probabilistic Perspective on Attention-based Multiple Instance Learning for Whole Slide Images
- Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks
- BC-IRL: Learning Generalizable Reward Functions from Demonstrations
- Become a Proficient Player with Limited Data through Watching Pure Videos
- BEEF: Bi-Compatible Class-Incremental Learning via Energy-Based Expansion and Fusion
- Behavior Prior Representation learning for Offline Reinforcement Learning
- Behavior Proximal Policy Optimization
- Behind the Scenes of Gradient Descent: A Trajectory Analysis via Basis Function Decomposition
- Benchmarking Constraint Inference in Inverse Reinforcement Learning
- Benchmarking Offline Reinforcement Learning on Real-Robot Hardware
- Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models
- Better Generative Replay for Continual Federated Learning
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation
- Betty: An Automatic Differentiation Library for Multilevel Optimization
- BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection
- Beyond calibration: estimating the grouping loss of modern neural networks
- Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD
- Bias Propagation in Federated Learning
- Bidirectional Language Models Are Also Few-shot Learners
- Bidirectional Propagation for Cross-Modal 3D Object Detection
- BigVGAN: A Universal Neural Vocoder with Large-Scale Training
- Bi-level Physics-Informed Neural Networks for PDE Constrained Optimization using Broyden's Hypergradients
- Binding Language Models in Symbolic Languages
- Bispectral Neural Networks
- Bit-Pruning: A Sparse Multiplication-Less Dot-Product
- Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts
- Block and Subword-Scaling Floating-Point (BSFP) : An Efficient Non-Uniform Quantization For Low Precision Inference
- Blog Track Poster Session
- Blurring Diffusion Models
- Boosting Adversarial Transferability using Dynamic Cues
- Boosting Causal Discovery via Adaptive Sample Reweighting
- Boosting Multiagent Reinforcement Learning via Permutation Invariant and Permutation Equivariant Networks
- Boosting the Cycle Counting Power of Graph Neural Networks with I$^2$-GNNs
- Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint
- BrainBERT: Self-supervised representation learning for intracranial recordings
- Brain-like representational straightening of natural movies in robust feedforward neural networks
- Breaking Correlation Shift via Conditional Invariant Regularizer
- Bridge the Inference Gaps of Neural Processes via Expectation Maximization
- Bridging the Gap between ANNs and SNNs by Calibrating Offset Spikes
- Bridging the Gap to Real-World Object-Centric Learning
- Broken Neural Scaling Laws
- BSTT: A Bayesian Spatial-Temporal Transformer for Sleep Staging
- Budgeted Training for Vision Transformer
- Building a Subspace of Policies for Scalable Continual Learning
- Building Normalizing Flows with Stochastic Interpolants
- Calibrating Sequence likelihood Improves Conditional Language Generation
- Calibrating the Rigged Lottery: Making All Tickets Reliable
- Calibrating Transformers via Sparse Gaussian Processes
- Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
- Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories
- Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study
- Can CNNs Be More Robust Than Transformers?
- Can discrete information extraction prompts generalize across language models?
- CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning
- Can Neural Networks Learn Implicit Logic from Physical Reasoning?
- Can We Faithfully Represent Absence States to Compute Shapley Values on a DNN?
- Can We Find Nash Equilibria at a Linear Rate in Markov Games?
- Capturing the Motion of Every Joint: 3D Human Pose and Shape Estimation with Independent Tokens
- CASR: Generating Complex Sequences with Autoregressive Self-Boost Refinement
- Causal Balancing for Domain Generalization
- Causal Confusion and Reward Misidentification in Preference-Based Reward Learning
- Causal Estimation for Text Data with (Apparent) Overlap Violations
- Causal Imitation Learning via Inverse Reinforcement Learning
- Causality Compensated Attention for Contextual Biased Visual Recognition
- Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning
- Causal Representation Learning for Instantaneous and Temporal Effects in Interactive Systems
- Certifiably Robust Policy Learning against Adversarial Multi-Agent Communication
- (Certified!!) Adversarial Robustness for Free!
- Certified Defences Against Adversarial Patch Attacks on Semantic Segmentation
- Certified Training: Small Boxes are All You Need
- CFlowNets: Continuous Control with Generative Flow Networks
- Characteristic Neural Ordinary Differential Equation
- Characterizing intrinsic compositionality in transformers with Tree Projections
- Characterizing the Influence of Graph Elements
- Characterizing the spectrum of the NTK via a power series expansion
- Chasing All-Round Graph Representation Robustness: Model, Training, and Optimization
- Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning
- ChiroDiff: Modelling chirographic data with Diffusion Models
- ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length
- Choreographer: Learning and Adapting Skills in Imagination
- CircNet: Meshing 3D Point Clouds with Circumcenter Detection
- CktGNN: Circuit Graph Neural Network for Electronic Design Automation
- CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning
- Classically Approximating Variational Quantum Machine Learning with Random Fourier Features
- Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only
- Clifford Neural Layers for PDE Modeling
- CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
- CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
- CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment
- CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving
- CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
- CodeT: Code Generation with Generated Tests
- Code Translation with Compiler Representations
- CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
- Collaborative Pure Exploration in Kernel Bandit
- Combating Exacerbated Heterogeneity for Robust Models in Federated Learning
- Combinatorial-Probabilistic Trade-Off: P-Values of Community Properties Test in the Stochastic Block Models
- Combinatorial Pure Exploration of Causal Bandits
- Competitive Physics Informed Networks
- Complexity-Based Prompting for Multi-step Reasoning
- Composing Ensembles of Pre-trained Models via Iterative Consensus
- Composing Task Knowledge With Modular Successor Feature Approximators
- Composite Slice Transformer: An Efficient Transformer with Composition of Multi-Scale Multi-Range Attentions
- Compositionality with Variation Reliably Emerges in Neural Networks
- Compositional Law Parsing with Latent Random Functions
- Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
- Compositional Semantic Parsing with Large Language Models
- Compositional Task Representations for Large Language Models
- Compressing multidimensional weather and climate data into neural networks
- Computational Language Acquisition with Theory of Mind
- Computing all Optimal Partial Transports
- Concept Gradient: Concept-based Interpretation Without Linear Assumption
- Concept-level Debugging of Part-Prototype Networks
- Conditional Antibody Design as 3D Equivariant Graph Translation
- Conditional Positional Encodings for Vision Transformers
- Confidence-Based Feature Imputation for Graphs with Partially Known Features
- Confidence-Conditioned Value Functions for Offline Reinforcement Learning
- Confidence Estimation Using Unlabeled Data
- Confidential-PROFITT: Confidential PROof of FaIr Training of Trees
- Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
- Consolidator: Mergable Adapter with Group Connections for Visual Adaptation
- Constraining Representations Yields Models That Know What They Don't Know
- Constructive TT-representation of the tensors given as index interaction functions with applications
- Context-enriched molecule representations improve few-shot drug discovery
- Contextual bandits with concave rewards, and an application to fair ranking
- Contextual Convolutional Networks
- Contextual Image Masking Modeling via Synergized Contrasting without View Augmentation for Faster and Better Visual Pretraining
- Continual evaluation for lifelong learning: Identifying the stability gap
- Continual Pre-training of Language Models
- Continual Transformers: Redundancy-Free Attention for Online Inference
- Continual Unsupervised Disentangling of Self-Organizing Representations
- Continuized Acceleration for Quasar Convex Functions in Non-Convex Optimization
- Continuous-Discrete Convolution for Geometry-Sequence Modeling in Proteins
- Continuous PDE Dynamics Forecasting with Implicit Neural Representations
- Continuous pseudo-labeling from the start
- Continuous-time identification of dynamic state-space models by deep subspace encoding
- ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond
- Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
- Contrastive Audio-Visual Masked Autoencoder
- Contrastive Corpus Attribution for Explaining Representations
- Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions
- Contrastive Learning for Unsupervised Domain Adaptation of Time Series
- Contrastive Meta-Learning for Partially Observable Few-Shot Learning
- Copy is All You Need
- Correlative Information Maximization Based Biologically Plausible Neural Networks for Correlated Source Separation
- Corrupted Image Modeling for Self-Supervised Visual Pre-Training
- CoRTX: Contrastive Framework for Real-time Explanation
- Coupled Multiwavelet Operator Learning for Coupled Differential Equations
- Coverage-centric Coreset Selection for High Pruning Rates
- CrAM: A Compression-Aware Minimizer
- Critic Sequential Monte Carlo
- CROM: Continuous Reduced-Order Modeling of PDEs Using Implicit Neural Representations
- Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting
- Cross-Layer Retrospective Retrieving via Layer Attention
- Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification
- CUDA: Curriculum of Data Augmentation for Long-tailed Recognition
- Curriculum-based Co-design of Morphology and Control of Voxel-based Soft Robots
- CUTS: Neural Causal Discovery from Irregular Time-Series Data
- Cycle-consistent Masked AutoEncoder for Unsupervised Domain Generalization
- Cycle to Clique (Cy2C) Graph Neural Network: A Sight to See beyond Neighborhood Aggregation
- D4AM: A General Denoising Framework for Downstream Acoustic Models
- D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory
- DAG Learning on the Permutahedron
- DAG Matters! GFlowNets Enhanced Explainer for Graph Neural Networks
- DamoFD: Digging into Backbone Design on Face Detection
- DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity
- Data augmentation alone can improve adversarial training
- Data Continuity Matters: Improving Sequence Modeling with Lipschitz Regularizer
- Data-Free One-Shot Federated Learning Under Very High Statistical Heterogeneity
- Dataless Knowledge Fusion by Merging Weights of Language Models
- Dataset Pruning: Reducing Training Data by Examining Generalization Influence
- Data Valuation Without Training of a Model
- DAVA: Disentangling Adversarial Variational Autoencoder
- DaxBench: Benchmarking Deformable Object Manipulation with Differentiable Physics
- DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection
- DCI-ES: An Extended Disentanglement Framework with Connections to Identifiability
- DDM$^2$: Self-Supervised Diffusion MRI Denoising with Generative Diffusion Models
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
- DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases
- DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games
- Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models
- Decision S4: Efficient Sequence-Based RL via State Spaces Layers
- Decision Transformer under Random Frame Dropping
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks
- Decompose to Generalize: Species-Generalized Animal Pose Estimation
- Decompositional Generation Process for Instance-Dependent Partial Label Learning
- Deconstructing Distributions: A Pointwise Framework of Learning
- Decoupled Training for Long-Tailed Classification With Stochastic Representations
- Deep Declarative Dynamic Time Warping for End-to-End Learning of Alignment Paths
- Deep Ensembles for Graphs with Higher-order Dependencies
- Deep Generative Modeling on Limited Data with Regularization by Nontransferable Pre-trained Models
- Deep Generative Symbolic Regression
- Deep Learning for Code (DL4C)
- Deep Learning From Crowdsourced Labels: Coupled Cross-Entropy Minimization, Identifiability, and Regularization
- Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?
- Deep Learning on Implicit Neural Representations of Shapes
- Deep Ranking Ensembles for Hyperparameter Optimization
- Deep Reinforcement Learning for Cost-Effective Medical Diagnosis
- Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
- Deep Variational Implicit Processes
- Defending against Adversarial Audio via Diffusion Model
- Deja Vu: Continual Model Generalization for Unseen Domains
- DELTA: Degradation-Free Fully Test-Time Adaptation
- Delving into Semantic Scale Imbalance
- Denoising Diffusion Error Correction Codes
- Denoising Diffusion Samplers
- Denoising Masked Autoencoders Help Robust Classification
- De Novo Molecular Generation via Connection-aware Motif Mining
- DensePure: Understanding Diffusion Models for Adversarial Robustness
- Dense RGB SLAM with Neural Implicit Maps
- DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems
- DepthFL : Depthwise Federated Learning for Heterogeneous Clients
- Depth Separation with Multilayer Mean-Field Networks
- Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
- Deterministic training of generative autoencoders using invertible layers
- DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics
- DFlow: Learning to Synthesize Better Optical Flow Datasets via a Differentiable Pipeline
- DFPC: Data flow driven pruning of coupled channels without data
- Diagnosing and Rectifying Vision Models using Language
- Dialogue Research in the Era of LLMs
- Dichotomy of Control: Separating What You Can Control from What You Cannot
- DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
- DiffEdit: Diffusion-based semantic image editing with mask guidance
- Differentiable Gaussianization Layers for Inverse Problems Regularized by Deep Generative Models
- Differentiable Mathematical Programming for Object-Centric Representation Learning
- Differentially Private $L_2$-Heavy Hitters in the Sliding Window Model
- Differentially Private Adaptive Optimization with Delayed Preconditioners
- DiffMimic: Efficient Motion Mimicking with Differentiable Physics
- DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion
- DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
- DiffusER: Diffusion via Edit-based Reconstruction
- Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation
- Diffusion-based Image Translation using disentangled style and content representation
- Diffusion-GAN: Training GANs with Diffusion
- Diffusion Models Already Have A Semantic Latent Space
- Diffusion Models for Causal Discovery via Topological Ordering
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
- Diffusion Posterior Sampling for General Noisy Inverse Problems
- Diffusion Probabilistic Fields
- Diffusion Probabilistic Modeling of Protein Backbones in 3D for the motif-scaffolding problem
- DiGress: Discrete Denoising diffusion for graph generation
- Dilated convolution with learnable spacings
- Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning
- DINO as a von Mises-Fisher mixture model
- DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
- Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs
- Dirichlet-based Uncertainty Calibration for Active Domain Adaptation
- Discovering Evolution Strategies via Meta-Black-Box Optimization
- Discovering Generalizable Multi-agent Coordination Skills from Multi-task Offline Data
- Discovering Informative and Robust Positives for Video Domain Adaptation
- Discovering Latent Knowledge in Language Models Without Supervision
- Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality
- Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
- Discrete Predictor-Corrector Diffusion Models for Image Synthesis
- Disentanglement of Correlated Factors via Hausdorff Factorized Support
- Disentanglement with Biological Constraints: A Theory of Functional Cell Types
- Disentangling Learning Representations with Density Estimation
- Disentangling the Mechanisms Behind Implicit Regularization in SGD
- Disparate Impact in Differential Privacy from Gradient Misalignment
- Distilling Cognitive Backdoor Patterns within an Image
- Distilling Model Failures as Directions in Latent Space
- Distributed Differential Privacy in Multi-Armed Bandits
- Distributed Extra-gradient with Optimal Complexity and Communication Guarantees
- Distributionally Robust Post-hoc Classifiers under Prior Shifts
- Distributionally Robust Recourse Action
- Distributional Meta-Gradient Reinforcement Learning
- Diversify and Disambiguate: Out-of-Distribution Robustness via Disagreement
- Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors
- DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images
- DocPrompting: Generating Code by Retrieving the Docs
- Does Deep Learning Learn to Abstract? A Systematic Probing Framework
- Does Learning from Decentralized Non-IID Unlabeled Data Benefit from Self Supervision?
- Does Zero-Shot Reinforcement Learning Exist?
- Domain Generalisation via Domain Adaptation: An Adversarial Fourier Amplitude Approach
- Domain Generalization via Heckman-type Selection Models
- Domain-Indexing Variational Bayes: Interpretable Domain Index for Domain Adaptation
- Don’t fear the unlabelled: safe semi-supervised learning via debiasing
- Don’t forget the nullspace! Nullspace occupancy as a mechanism for out of distribution failure
- Do We Really Need Complicated Model Architectures For Temporal Networks?
- Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
- DreamFusion: Text-to-3D using 2D Diffusion
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
- Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness
- DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Manipulation
- Dual Algorithmic Reasoning
- Dual Diffusion Implicit Bridges for Image-to-Image Translation
- Dual Student Networks for Data-Free Model Stealing
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
- Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting
- DynaMS: Dynamic Margin Selection for Efficient Deep Learning
- DySR: Adaptive Super-Resolution via Algorithm and System Co-design
- E3Bind: An End-to-End Equivariant Network for Protein-Ligand Docking
- EAGLE: Large-scale Learning of Turbulent Fluid Dynamics with Mesh Transformers
- EA-HAS-Bench: Energy-aware Hyperparameter and Architecture Search Benchmark
- Easy Differentially Private Linear Regression
- E-CRF: Embedded Conditional Random Field for Boundary-caused Class Weights Confusion in Semantic Segmentation
- Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks
- Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis
- Editing models with task arithmetic
- Effectively Modeling Time Series with Simple Discrete State Spaces
- Effective passive membership inference attacks in federated learning against overparameterized models
- Effective Self-supervised Pre-training on Low-compute Networks without Distillation
- Effects of Graph Convolutions in Multi-layer Networks
- Efficient approximation of neural population structure and correlations with probabilistic circuits
- Efficient Attention via Control Variates
- Efficient Certified Training and Robustness Verification of Neural ODEs
- Efficient Conditionally Invariant Representation Learning
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting
- Efficient Discrete Multi Marginal Optimal Transport Regularization
- Efficient Edge Inference by Selective Query
- Efficient Federated Domain Translation
- Efficiently Computing Nash Equilibria in Adversarial Team Markov Games
- Efficiently Controlling Multiple Risks with Pareto Testing
- Efficient Model Updates for Approximate Unlearning of Graph-Structured Data
- Efficient Offline Policy Optimization with a Learned Model
- Efficient Planning in a Compact Latent Action Space
- Efficient recurrent architectures through activity sparsity and sparse back-propagation through time
- Embedding Fourier for Ultra-High-Definition Low-Light Image Enhancement
- Emergence of Maps in the Memories of Blind Navigation Agents
- Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
- Empowering Graph Representation Learning with Test-Time Graph Transformation
- Empowering Networks With Scale and Rotation Equivariance Using A Similarity Convolution
- Encoding Recurrence into Transformers
- Energy-based Out-of-Distribution Detection for Graph Neural Networks
- Energy-Based Test Sample Adaptation for Domain Generalization
- Energy-Inspired Self-Supervised Pretraining for Vision Models
- Enhancing Meta Learning via Multi-Objective Soft Improvement Functions
- Enhancing the Inductive Biases of Graph Neural ODE for Modeling Physical Systems
- Ensuring DNN Solution Feasibility for Optimization Problems with Linear Constraints
- Entanglements, Exploring Artificial Biodiversity
- EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data
- Equal Improvability: A New Fairness Notion Considering the Long-term Impact
- Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs
- EquiMod: An Equivariance Module to Improve Visual Instance Discrimination
- Equivariance-aware Architectural Optimization of Neural Networks
- Equivariant Descriptor Fields: SE(3)-Equivariant Energy-Based Models for End-to-End Visual Robotic Manipulation Learning
- Equivariant Energy-Guided SDE for Inverse Molecular Design
- Equivariant Hypergraph Diffusion Neural Operators
- Equivariant Shape-Conditioned Generation of 3D Molecules for Ligand-Based Drug Design
- ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation
- Error Sensitivity Modulation based Experience Replay: Mitigating Abrupt Representation Drift in Continual Learning
- ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret
- ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure
- Estimating individual treatment effects under unobserved confounding using binary instruments
- EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model
- EVA3D: Compositional 3D Human Generation from 2D Image Collections
- Evaluating Long-Term Memory in 3D Mazes
- Evaluating Representations with Readout Model Switching
- Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation
- EVC: Towards Real-Time Neural Image Compression with Mask Decay
- Everybody Needs Good Neighbours: An Unsupervised Locality-based Method for Bias Mitigation
- EV-GAN: Simulation of extreme events with ReLU neural networks
- Evidential Uncertainty and Diversity Guided Active Learning for Scene Graph Generation
- Evolve Smoothly, Fit Consistently: Learning Smooth Latent Dynamics For Advection-Dominated Systems
- Evolving Populations of Diverse RL Agents with MAP-Elites
- Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods
- Explaining RL Decisions with Trajectories
- Explaining Temporal Graph Models through an Explorer-Navigator Framework
- Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation
- Explicitly Minimizing the Blur Error of Variational Autoencoders
- Exploring Active 3D Object Detection from a Generalization Perspective
- Exploring and Exploiting Decision Boundary Dynamics for Adversarial Robustness
- Exploring Low-Rank Property in Multiple Instance Learning for Whole Slide Image Classification
- Exploring perceptual straightness in learned visual representations
- Exploring Temporally Dynamic Data Augmentation for Video Recognition
- Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping
- Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders
- Exponential Generalization Bounds with Near-Optimal Rates for $L_q$-Stable Algorithms
- ExpressivE: A Spatio-Functional Embedding For Knowledge Graph Completion
- Expressive Monotonic Neural Networks
- Extracting Robust Models with Uncertain Examples
- Extremely Simple Activation Shaping for Out-of-Distribution Detection
- Extreme Q-Learning: MaxEnt RL without Entropy
- Factorized Fourier Neural Operators
- Fair Attribute Completion on Graph with Missing Attributes
- FaiREE: fair classification with finite-sample and distribution-free guarantee
- FairGBM: Gradient Boosting with Fairness Constraints
- Fairness and Accuracy under Domain Generalization
- Fairness-aware Contrastive Learning with Partially Annotated Sensitive Attributes
- Fake It Until You Make It: Towards Accurate Near-Distribution Novelty Detection
- Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems
- Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search
- Faster federated optimization under second-order similarity
- Faster Gradient-Free Methods for Escaping Saddle Points
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games
- FastFill: Efficient Compatible Model Update
- Fast Nonlinear Vector Quantile Regression
- Fast Sampling of Diffusion Models with Exponential Integrator
- f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation
- Feature Reconstruction From Outputs Can Mitigate Simplicity Bias in Neural Networks
- Feature selection and low test error in shallow low-rotation ReLU networks
- FedDAR: Federated Domain-Aware Representation Learning
- Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach
- Federated Learning from Small Datasets
- Federated Nearest Neighbor Machine Translation
- Federated Neural Bandits
- FedExP: Speeding Up Federated Averaging via Extrapolation
- FedFA: Federated Feature Augmentation
- FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy
- Few-shot Backdoor Attacks via Neural Tangent Kernels
- Few-shot Cross-domain Image Generation via Inference-time Latent-code Learning
- Few-Shot Domain Adaptation For End-to-End Communication
- FIFA: Making Fairness More Generalizable in Classifiers Trained on Imbalanced Data
- FIGARO: Controllable Music Generation using Learned and Expert Features
- Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation
- FINDE: Neural Differential Equations for Finding and Preserving Invariant Quantities
- Finding Actual Descent Directions for Adversarial Training
- Finding the Global Semantic Representation in GAN through Fréchet Mean
- First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains
- First workshop on "Machine Learning & Global Health".
- Fisher-Legendre (FishLeg) optimization of deep neural networks
- FIT: A Metric for Model Sensitivity
- FiT: Parameter Efficient Few-shot Transfer Learning for Personalized and Federated Image Classification
- FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning
- Flow Annealed Importance Sampling Bootstrap
- Flow Matching for Generative Modeling
- Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
- FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation
- Fooling SHAP with Stealthily Biased Sampling
- Formal Mathematics Statement Curriculum Learning
- Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions
- FoSR: First-order spectral rewiring for addressing oversquashing in GNNs
- Free Lunch for Domain Adversarial Training: Environment Label Smoothing
- FreeMatch: Self-adaptive Thresholding for Semi-supervised Learning
- From $t$-SNE to UMAP with contrastive learning
- From Molecules to Materials: ICLR 2023 Workshop on Machine learning for materials (ML4Materials)
- From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data
- Function-Consistent Feature Distillation
- Function-space regularized Rényi divergences
- Fundamental Limits in Formal Verification of Message-Passing Neural Networks
- Fundamental limits on the robustness of image classifiers
- FunkNN: Neural Interpolation for Functional Generation
- Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive Machine Translation
- GAIN: On the Generalization of Instructional Action Understanding
- GAMR: A Guided Attention Model for (visual) Reasoning
- gDDIM: Generalized denoising diffusion implicit models
- GEASS: Neural causal feature selection for high-dimensional biological data
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis
- Generalization and Estimation Error Bounds for Model-based Neural Networks
- Generalization Bounds for Federated Learning: Fast Rates, Unparticipating Clients and Unbounded Losses
- Generalized Precision Matrix for Scalable Estimation of Nonparametric Markov Networks
- Generalize Learned Heuristics to Solve Large-scale Vehicle Routing Problems in Real-time
- Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap
- General Neural Gauge Fields
- Generate rather than Retrieve: Large Language Models are Strong Context Generators
- Generating Diverse Cooperative Agents by Learning Incompatible Policies
- Generating Sequences by Learning to Self-Correct
- Generative Augmented Flow Networks
- Generative Modeling Helps Weak Supervision (and Vice Versa)
- Generative Modelling with Inverse Heat Dissipation
- Geometrically regularized autoencoders for non-Euclidean data
- GFlowNets and variational inference
- Git Re-Basin: Merging Models modulo Permutation Symmetries
- GLM-130B: An Open Bilingual Pre-trained Model
- Global Explainability of GNNs via Logic Combination of Learned Concepts
- Globally Injective ReLU Networks
- Globally Optimal Training of Neural Networks with Threshold Activation Functions
- GNNDelete: A General Strategy for Unlearning in Graph Neural Networks
- GNNInterpreter: A Probabilistic Generative Model-Level Explanation for Graph Neural Networks
- GoBigger: A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation
- GOGGLE: Generative Modelling for Tabular Data by Learning Relational Structure
- GOOD: Exploring geometric cues for detecting objects in an open world
- GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
- GRACE-C: Generalized Rate Agnostic Causal Estimation via Constraints
- Gradient Boosting Performs Gaussian Process Inference
- Gradient Gating for Deep Multi-Rate Learning on Graphs
- Gradient-Guided Importance Sampling for Learning Binary Energy-Based Models
- Graph-based Deterministic Policy Gradient for Repetitive Combinatorial Optimization Problems
- Graph Contrastive Learning for Skeleton-based Action Recognition
- Graph Domain Adaptation via Theory-Grounded Spectral Regularization
- Graph Neural Network-Inspired Kernels for Gaussian Processes in Semi-Supervised Learning
- Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs
- Graph Neural Networks for Link Prediction with Subgraph Sketching
- Graph Signal Sampling for Inductive One-Bit Matrix Completion: a Closed-form Solution
- Gray-Box Gaussian Processes for Automated Reinforcement Learning
- Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
- Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement
- GReTo: Remedying dynamic graph topology-task discordance via target homophily
- Gromov-Wasserstein Autoencoders
- Grounding Graph Network Simulators using Physical Sensor Observations
- Guarded Policy Optimization with Imperfect Online Demonstrations
- Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
- Guiding continuous operator learning through Physics-based boundary constraints
- Guiding Energy-based Models via Contrastive Latent Variables
- Guiding Safe Exploration with Weakest Preconditions
- H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection
- Hard-Meta-Dataset++: Towards Understanding Few-Shot Performance on Difficult Tasks
- Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Weighting
- Harnessing Out-Of-Distribution Examples via Augmenting Content and Style
- Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs
- Hebbian Deep Learning Without Feedback
- Heterogeneous Neuronal and Synaptic Dynamics for Spike-Efficient Unsupervised Learning: Theory and Design Principles
- HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
- Hidden Markov Transformer for Simultaneous Machine Translation
- Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement
- Hierarchical Relational Learning for Few-Shot Knowledge Graph Completion
- Hierarchical Sliced Wasserstein Distance
- HiT-MDP: Learning the SMDP option framework on MDPs with Hidden Temporal Embeddings
- HiViT: A Simpler and More Efficient Design of Hierarchical Vision Transformer
- Holistic Adversarially Robust Pruning
- HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
- HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing
- How Does Semi-supervised Learning with Pseudo-labelers Work? A Case Study
- How gradient estimator variance and bias impact learning in neural networks
- How I Learned to Stop Worrying and Love Retraining
- How Informative is the Approximation Error from Tensor Decomposition for Neural Network Compression?
- How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
- How Much Space Has Been Explored? Measuring the Chemical Space Covered by Databases and Machine-Generated Molecules
- How robust is unsupervised representation learning to distribution shift?
- How Sharpness-Aware Minimization Minimizes Sharpness?
- How to Exploit Hyperspherical Embeddings for Out-of-Distribution Detection?
- How to prepare your task head for finetuning
- How to Train your HIPPO: State Space Models with Generalized Orthogonal Basis Projections
- Human alignment of neural network representations
- Human-Guided Fair Classification for Natural Language Processing
- Human-level Atari 200x faster
- Humanly Certifying Superhuman Classifiers
- Human Motion Diffusion Model
- Human MotionFormer: Transferring Human Motions with Vision Transformers
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- Hybrid RL: Using both offline and online data can make RL efficient
- Hyperbolic Deep Reinforcement Learning
- Hyperbolic Self-paced Learning for Self-supervised Skeleton-based Action Representations
- Hyper-Decision Transformer for Efficient Online Policy Adaptation
- HyperDeepONet: learning operator with complex target function space using the limited resources via hypernetwork
- HypeR: Multitask Hyper-Prompted Training Enables Large-Scale Retrieval Generalization
- Hyperparameter Optimization through Neural Network Partitioning
- ICLR 2023 Workshop on Machine Learning for Remote Sensing
- ICLR 2023 Workshop on Sparsity in Neural Networks: On practical limitations and tradeoffs between sustainability and efficiency
- IDEAL: Query-Efficient Data-Free Learning from Black-Box Models
- Identifiability Results for Multimodal Contrastive Learning
- ILA-DA: Improving Transferability of Intermediate Level Attack with Data Augmentation
- Image as Set of Points
- ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations
- Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules
- Image to Sphere: Learning Equivariant Features for Efficient Pose Prediction
- ImaginaryNet: Learning Object Detectors without Real Images and Annotations
- Imbalanced Semi-supervised Learning with Bias Adaptive Classifier
- Imitating Graph-Based Planning with Goal-Conditioned Policies
- Imitating Human Behaviour with Diffusion Models
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data
- Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions
- Implicit Regularization for Group Sparsity
- Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent
- Importance-Weighting Approach to Distribution Shift Adaptation
- Impossibly Good Experts and How to Follow Them
- Improved Convergence of Differential Private SGD with Gradient Clipping
- Improved Learning-augmented Algorithms for k-means and k-medians Clustering
- Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs
- Improved Training of Physics-Informed Neural Networks Using Energy-Based Priors: a Study on Electrical Impedance Tomography
- Improving Deep Policy Gradients with Value Function Search
- Improving Deep Regression with Ordinal Entropy
- Improving Differentiable Neural Architecture Search by Encouraging Transferability
- Improving Object-centric Learning with Query Optimization
- Improving Out-of-distribution Generalization with Indirection Representations
- Improving the imputation of missing data with Markov Blanket discovery
- InCoder: A Generative Model for Code Infilling and Synthesis
- Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks
- In-context Reinforcement Learning with Algorithm Distillation
- Incremental Learning of Structured Memory via Closed-Loop Transcription
- Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning
- Individual Privacy Accounting with Gaussian Differential Privacy
- Inequality phenomenon in $l_{\infty}$-adversarial training, and its unrealized threats
- Information Plane Analysis for Dropout Neural Networks
- Information-Theoretic Analysis of Unsupervised Domain Adaptation
- Information-Theoretic Characterization of the Generalization Error for Iterative Semi-Supervised Learning
- Information-Theoretic Diffusion
- InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning
- In-sample Actor Critic for Offline Reinforcement Learning
- In-Situ Text-Only Adaptation of Speech Models with Low-Overhead Speech Imputations
- Instance-wise Batch Label Restoration via Gradients in Federated Learning
- Integrating Symmetry into Differentiable Planning with Steerable Convolutions
- Interaction-Based Disentanglement of Entities for Object-Centric World Models
- Interactive Portrait Harmonization
- Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small
- Interpretability with full complexity by constraining feature information
- Interpretable Debiasing of Vectorized Language Representations with Iterative Orthogonalization
- Interpretable Geometric Deep Learning via Learnable Randomness Injection
- Interpretations of Domain Adaptations via Layer Variational Analysis
- Investigating Multi-task Pretraining and Generalization in Reinforcement Learning
- ISAAC Newton: Input-based Approximate Curvature for Newton's Method
- Is a Caption Worth a Thousand Images? A Study on Representation Learning
- Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning?
- Is Attention All That NeRF Needs?
- Is Conditional Generative Modeling all you need for Decision Making?
- Is Forgetting Less a Good Inductive Bias for Forward Transfer?
- Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
- Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
- ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation
- Is Synthetic Data from Generative Models Ready for Image Recognition?
- Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification
- Iterative Circuit Repair Against Formal Specifications
- Iterative Patch Selection for High-Resolution Image Recognition
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks
- Jointly Learning Visual and Auditory Speech Representations from Raw Data
- Kernel Neural Optimal Transport
- kNN-Diffusion: Image Generation via Large-Scale Retrieval
- KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Low-Resource NLP
- Knowledge Distillation based Degradation Estimation for Blind Super-Resolution
- Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models
- Koopman Neural Operator Forecaster for Time-series with Temporal Distributional Shifts
- KwikBucks: Correlation Clustering with Cheap-Weak and Expensive-Strong Signals
- Label-free Concept Bottleneck Models
- Label Propagation with Weak Supervision
- Language Modelling with Pixels
- Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
- Language models are multilingual chain-of-thought reasoners
- Language Models are Realistic Tabular Data Generators
- Language Models Can Teach Themselves to Program Better
- Large Language Models are Human-Level Prompt Engineers
- Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
- Latent Bottlenecked Attentive Neural Processes
- Latent Graph Inference using Product Manifolds
- Latent Neural ODEs with Sparse Bayesian Multiple Shooting
- Latent State Marginalization as a Low-cost Approach for Improving Exploration
- Latent Variable Representation for Reinforcement Learning
- LAVA: Data Valuation without Pre-Specified Learning Algorithms
- Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations
- LDMIC: Learning-based Distributed Multi-view Image Coding
- Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection
- Learnable Graph Convolutional Attention Networks
- Learnable Topological Features For Phylogenetic Inference via Graph Neural Networks
- Learned Index with Dynamic $\epsilon$
- Learned optimizers: why they're the future, why they're hard, and what they can do now
- Learning About Progress From Experts
- Learning Achievement Structure for Structured Exploration in Domains with Sparse Reward
- Learning a Data-Driven Policy Network for Pre-Training Automated Feature Engineering
- Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition
- Learning Continuous Normalizing Flows For Faster Convergence To Target Distribution via Ascent Regularizations
- Learning Controllable Adaptive Simulation for Multi-resolution Physics
- Learning Cut Selection for Mixed-Integer Linear Programming via Hierarchical Sequence Model
- Learning differentiable solvers for systems with hard constraints
- Learning Diffusion Bridges on Constrained Domains
- Learning Domain-Agnostic Representation for Disease Diagnosis
- Learning Fair Graph Representations via Automated Data Augmentations
- Learning Fast and Slow for Online Time Series Forecasting
- Learning Group Importance using the Differentiable Hypergeometric Distribution
- Learning Harmonic Molecular Representations on Riemannian Manifold
- Learning Heterogeneous Interaction Strengths by Trajectory Prediction with Graph Neural Network
- Learning Hierarchical Protein Representations via Complete 3D Graph Networks
- Learning Human-Compatible Representations for Case-Based Decision Support
- Learning Hyper Label Model for Programmatic Weak Supervision
- Learning Input-agnostic Manipulation Directions in StyleGAN with Text Guidance
- Learning in temporally structured environments
- Learning Iterative Neural Optimizers for Image Steganography
- Learning Kernelized Contextual Bandits in a Distributed and Asynchronous Environment
- Learning Label Encodings for Deep Regression
- Learning Language Representations with Logical Inductive Bias
- Learning Locality and Isotropy in Dialogue Modeling
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets
- Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions
- Learning MLPs on Graphs: A Unified View of Effectiveness, Robustness, and Efficiency
- Learning Multimodal Data Augmentation in Feature Space
- Learning multi-scale local conditional probability models of images
- Learning Object-Language Alignments for Open-Vocabulary Object Detection
- Learning on Large-scale Text-attributed Graphs via Variational Inference
- Learning Probabilistic Topological Representations Using Discrete Morse Theory
- Learning Proximal Operators to Discover Multiple Optima
- Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
- Learning Rationalizable Equilibria in Multiplayer Games
- Learning ReLU networks to high uniform accuracy is intractable
- Learning rigid dynamics with face interaction graph networks
- Learning Simultaneous Navigation and Construction in Grid Worlds
- Learning Soft Constraints From Constrained Expert Demonstrations
- Learning Sparse and Low-Rank Priors for Image Recovery via Iterative Reweighted Least Squares Minimization
- Learning Sparse Group Models Through Boolean Relaxation
- Learning Structured Representations by Embedding Class Hierarchy
- Learning Symbolic Models for Graph-structured Physical Mechanism
- Learning the Positions in CountSketch
- Learning to Compose Soft Prompts for Compositional Zero-Shot Learning
- Learning to CROSS exchange to solve min-max vehicle routing problems
- Learning to Decompose Visual Features with Latent Textual Prompts
- Learning to Estimate Shapley Values with Vision Transformers
- Learning to Estimate Single-View Volumetric Flow Motions without 3D Supervision
- Learning to Extrapolate: A Transductive Approach
- Learning to Generate Columns with Application to Vertex Coloring
- Learning to Grow Pretrained Models for Efficient Transformer Training
- Learning to Induce Causal Structure
- Learning to Jointly Share and Prune Weights for Grounding Based Vision and Language Models
- Learning to Linearize Deep Neural Networks for Secure and Efficient Private Inference
- Learning topology-preserving data representations
- Learning to reason over visual objects
- Learning to Segment from Noisy Annotations: A Spatial Correction Approach
- Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer
- Learning Uncertainty for Unknown Domains with Zero-Target-Assumption
- Learning Vortex Dynamics for Fluid Inference and Prediction
- Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
- Learning where and when to reason in neuro-symbolic inference
- Learning with Auxiliary Activation for Memory-Efficient Training
- Learning with Logical Constraints but without Shortcut Satisfaction
- Learning without Prejudices: Continual Unbiased Learning via Benign and Malignant Forgetting
- Learning with Stochastic Orders
- Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Leveraging Future Relationship Reasoning for Vehicle Trajectory Prediction
- Leveraging Importance Weights in Subset Selection
- Leveraging Large Language Models for Multiple Choice Question Answering
- Leveraging Unlabeled Data to Track Memorization
- LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
- LiftedCL: Lifting Contrastive Learning for Human-Centric Perception
- LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation
- Light Sampling Field and BRDF Representation for Physically-based Neural Rendering
- LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification
- Limitless Stability for Graph Convolutional Networks
- Linear Connectivity Reveals Generalization Strategies
- Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
- Linearly Mapping from Image to Text Space
- Link Prediction with Non-Contrastive Learning
- LipsFormer: Introducing Lipschitz Continuity to Vision Transformers
- Liquid Structural State-Space Models
- LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence
- LMSeg: Language-guided Multi-dataset Segmentation
- Localized Randomized Smoothing for Collective Robustness Certification
- Logical Entity Representation in Knowledge-Graphs for Differentiable Rule Learning
- Logical Message Passing Networks with One-hop Inference on Atomic Formulas
- LogicDP: Creating Labels for Graph Data via Inductive Logic Programming
- Long Range Language Modeling via Gated State Spaces
- Long-Tailed Learning Requires Feature Learning
- Long-Tailed Partial Label Learning via Dynamic Rebalancing
- Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent
- Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
- Lower Bounds on the Depth of Integral ReLU Neural Networks via Lattice Polytopes
- LPT: Long-tailed Prompt Tuning for Image Classification
- LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning
- MA-BERT: Towards Matrix Arithmetic-only BERT Inference by Eliminating Complex Non-Linear Functions
- Machine Learning for Drug Discovery (MLDD)
- Machine Learning for IoT: Datasets, Perception, and Understanding
- Machine Unlearning of Federated Clusters
- MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection
- MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning
- Make-A-Video: Text-to-Video Generation without Text-Video Data
- Making Better Decision by Directly Planning in Continuous Control
- Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples
- Malign Overfitting: Interpolation and Invariance are Fundamentally at Odds
- ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills
- ManyDG: Many-domain Generalization for Healthcare Applications
- MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction
- Markup-to-Image Diffusion Models with Scheduled Sampling
- MARS: Meta-learning as Score Matching in the Function Space
- Martingale Posterior Neural Processes
- Masked Distillation with Receptive Tokens
- Masked Frequency Modeling for Self-Supervised Visual Pre-Training
- Masked Image Modeling with Denoising Contrast
- Masked Unsupervised Self-training for Label-free Image Classification
- Masked Vision and Language Modeling for Multi-modal Representation Learning
- MaskFusion: Feature Augmentation for Click-Through Rate Prediction via Input-adaptive Mask Fusion
- MaskViT: Masked Visual Pre-Training for Video Prediction
- Mass-Editing Memory in a Transformer
- Massively Scaling Heteroscedastic Classifiers
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning
- MAST: Masked Augmentation Subspace Training for Generalizable Self-Supervised Priors
- Matching receptor to odorant with protein language and graph neural networks
- Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)
- Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
- Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition
- Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence
- MCAL: Minimum Cost Human-Machine Active Labeling
- Measure the Predictive Heterogeneity
- Measuring axiomatic soundness of counterfactual image models
- Measuring Forgetting of Memorized Training Examples
- MECTA: Memory-Economic Continual Test-Time Model Adaptation
- MEDFAIR: Benchmarking Fairness for Medical Imaging
- Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study
- Mega: Moving Average Equipped Gated Attention
- Memorization Capacity of Neural Networks with Conditional Computation
- Memorization-Dilation: Modeling Neural Collapse Under Noise
- Memory Gym: Partially Observable Challenges to Memory-Based Agents
- MeshDiffusion: Score-based Generative 3D Mesh Modeling
- Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics
- MetaGL: Evaluation-Free Selection of Graph Learning Models via Meta-Learning
- Meta Knowledge Condensation for Federated Learning
- Meta-learning Adaptive Deep Kernel Gaussian Processes for Molecular Property Prediction
- Meta-Learning in Games
- Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning
- Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets
- Meta Temporal Point Processes
- MICN: Multi-scale Local and Global Context Modeling for Long-term Series Forecasting
- Mid-Vision Feedback
- MIMT: Masked Image Modeling Transformer for Video Compression
- Mind's Eye: Grounded Language Model Reasoning through Simulation
- Mind the Gap: Offline Policy Optimization for Imperfect Rewards
- Mind the Pool: Convolutional Neural Networks Can Overfit Input Size
- Mini-batch $k$-means terminates within $O(d/\epsilon)$ iterations
- Minimalistic Unsupervised Representation Learning with the Sparse Manifold Transform
- Minimax Optimal Kernel Operator Learning via Multilevel Training
- Minimum Description Length Control
- Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
- Min-Max Multi-objective Bilevel Optimization with Applications in Robust Machine Learning
- Mitigating Dataset Bias by Using Per-Sample Gradient
- Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Approach
- Mitigating Memorization of Noisy Labels via Regularization between Representations
- MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer
- M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation
- MLPInit: Embarrassingly Simple GNN Training Acceleration with MLP Initialization
- MMVAE+: Enhancing the Generative Quality of Multimodal VAEs without Compromises
- MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models
- MocoSFL: enabling cross-client collaborative self-supervised learning
- Model-based Causal Bayesian Optimization
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning
- Modeling content creator incentives on algorithm-curated platforms
- Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Experts
- Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval
- Modeling the Data-Generating Process is Necessary for Out-of-Distribution Generalization
- Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN
- MoDem: Accelerating Visual Model-Based Reinforcement Learning with Demonstrations
- Moderate Coreset: A Universal Method of Data Selection for Real-world Data-efficient Deep Learning
- Mole-BERT: Rethinking Pre-training Graph Neural Networks for Molecules
- Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching
- Molecule Generation For Target Protein Binding with Structural Motifs
- Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport
- Monocular Scene Reconstruction with 3D SDF Transformers
- More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization
- More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity
- Mosaic Representation Learning for Self-supervised Visual Pre-training
- Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics
- MPCFormer: Fast, Performant and Private Transformer Inference with MPC
- Multi-domain image generation and translation with identifiability guarantees
- Multifactor Sequential Disentanglement via Structured Koopman Autoencoders
- Multi-level Protein Structure Pre-training via Prompt Learning
- Multi-lingual Evaluation of Code Generation Models
- Multimodal Analogical Reasoning over Knowledge Graphs
- Multimodal Federated Learning via Contrastive Representation Ensemble
- Multimodal Representation Learning (MRL): Perks and Pitfalls
- Multi-Objective Online Learning
- Multi-objective optimization via equivariant deep hypervolume approximation
- Multi-Objective Reinforcement Learning: Convexity, Stationarity and Pareto Optimality
- Multiple sequence alignment as a sequence-to-sequence learning problem
- Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve
- Multi-skill Mobile Manipulation for Object Rearrangement
- Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning
- Multi-task Self-supervised Graph Neural Networks Enable Stronger Task Generalization
- Multivariate Time-series Imputation with Disentangled Temporal Representations
- MultiViz: Towards Visualizing and Understanding Multimodal Models
- Mutual Partial Label Learning with Competitive Label Noise
- NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs
- NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
- Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
- Near-Optimal Adversarial Reinforcement Learning with Switching Costs
- Near-optimal Coresets for Robust Clustering
- Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation
- Near-optimal Policy Identification in Active Reinforcement Learning
- NERDS: A General Framework to Train Camera Denoisers from Raw-RGB Noisy Image Pairs
- NeRF-SOS: Any-View Self-supervised Object Segmentation on Complex Scenes
- NeRN: Learning Neural Representations for Neural Networks
- Neural Agents Struggle to Take Turns in Bidirectional Emergent Communication
- Neural Architecture Design and Robustness: A Dataset
- Neural-based classification rule learning for sequential data
- Neural Bregman Divergences for Distance Learning
- Neural Causal Models for Counterfactual Identification and Estimation
- Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class-Incremental Learning
- Neural Compositional Rule Learning for Knowledge Graph Reasoning
- Neural DAG Scheduling via One-Shot Priority Sampling
- Neural Design for Genetic Perturbation Experiments
- Neural ePDOs: Spatially Adaptive Equivariant Partial Differential Operator Based Networks
- Neural Episodic Control with State Abstraction
- Neural Fields across Fields: Methods and Applications of Implicit Neural Representations
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image
- Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling
- Neural Implicit Shape Editing using Boundary Sensitivity
- Neural Lagrangian Schrödinger Bridge: Diffusion Modeling for Population Dynamics
- Neural Networks and the Chomsky Hierarchy
- Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
- Neural Optimal Transport
- Neural Radiance Field Codebooks
- Neural Systematic Binder
- Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery
- Neuromechanical Autoencoders: Learning to Couple Elastic and Neural Network Nonlinearity
- Neurosymbolic Generative Models (NeSy-GeMs)
- Neuro-Symbolic Procedural Planning with Commonsense Prompting
- New Insights for the Stability-Plasticity Dilemma in Online Continual Learning
- Noise Injection Node Regularization for Robust Learning
- Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, But Sign Descent Might Be
- Noise-Robust De-Duplication at Scale
- Nonlinear Reconstruction for Operator Learning of PDEs with Discontinuities
- Non-parametric Outlier Synthesis
- No Reason for No Supervision: Improved Generalization in Supervised Models
- NORM: Knowledge Distillation via N-to-One Representation Matching
- Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization
- Novel View Synthesis with Diffusion Models
- NTFields: Neural Time Fields for Physics-Informed Robot Motion Planning
- NTK-SAP: Improving neural network pruning by aligning training dynamics
- ODAM: Gradient-based Instance-Specific Visual Explanations for Object Detection
- Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement
- Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes
- Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
- Offline RL for Natural Language Generation with Implicit Language Q Learning
- Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization
- Ollivier-Ricci Curvature for Hypergraphs: A Unified Framework
- Omnigrok: Grokking Beyond Algorithmic Data
- On Accelerated Perceptrons and Beyond
- On Achieving Optimal Adversarial Test Error
- On amortizing convex conjugates for optimal transport
- On Compositional Uncertainty Quantification for Seq2seq Graph Parsing
- One-Pixel Shortcut: On the Learning Preference of Deep Neural Networks
- One Transformer Can Understand Both 2D & 3D Molecular Data
- On Explaining Neural Network Robustness with Activation Path
- Online Bias Correction for Task-Free Continual Learning
- Online Boundary-Free Continual Learning by Scheduled Data Prior
- Online Low Rank Matrix Completion
- On Pre-training Language Model for Antibody
- On Representing Linear Programs by Graph Neural Networks
- On Representing Mixed-Integer Linear Programs by Graph Neural Networks
- On the complexity of nonsmooth automatic differentiation
- On the Convergence of AdaGrad(Norm) on $\mathbb{R}^d$: Beyond Convexity, Non-Asymptotic Rate and Acceleration
- On the Data-Efficiency with Contrastive Image Transformation in Reinforcement Learning
- On the duality between contrastive and non-contrastive self-supervised learning
- On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning
- On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning
- On the Importance and Applicability of Pre-Training for Federated Learning
- On The Inadequacy of Optimizing Alignment and Uniformity in Contrastive Learning of Sentence Representations
- On the Performance of Temporal Difference Learning With Neural Networks
- On the Perils of Cascading Robust Classifiers
- On The Relative Error of Random Fourier Features for Preserving Kernel Distance
- On the Robustness of Safe Reinforcement Learning under Observational Perturbations
- On the Robustness to Misspecification of α-posteriors and Their Variational Approximations
- On the Saturation Effect of Kernel Ridge Regression
- On the Sensitivity of Reward Inference to Misspecified Human Models
- On the Soft-Subnetwork for Few-Shot Class Incremental Learning
- On The Specialization of Neural Modules
- On the Trade-Off between Actionable Explanations and the Right to be Forgotten
- On the Usefulness of Embeddings, Clusters and Strings for Text Generation Evaluation
- On the Word Boundaries of Emergent Languages Based on Harris's Articulation Scheme
- Open-Vocabulary Object Detection upon Frozen Vision and Language Models
- Optimal Activation Functions for the Random Features Regression Model
- Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
- Optimal Transport for Offline Imitation Learning
- Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics
- Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning
- Optimizing SPCA-based Continual Learning: A Theoretical Approach
- OPTQ: Accurate Quantization for Generative Pre-trained Transformers
- Ordered GNN: Ordering Message Passing to Deal with Heterophily and Over-smoothing
- Order Matters: Agent-by-agent Policy Optimization
- OTOv2: Automatic, Generic, User-Friendly
- Outcome-directed Reinforcement Learning by Uncertainty & Temporal Distance-Aware Curriculum Goal Generation
- Out-of-Distribution Detection and Selective Generation for Conditional Language Models
- Out-of-Distribution Detection based on In-Distribution Data Patterns Memorization with Modern Hopfield Energy
- Out-of-distribution Detection with Implicit Outlier Transformation
- Out-of-distribution Representation Learning for Time Series Classification
- Over-parameterized Model Optimization with Polyak-Łojasiewicz Condition
- Over-Training with Mixup May Hurt Generalization
- Packed Ensembles for efficient uncertainty estimation
- PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification
- PAC Reinforcement Learning for Predictive State Representations
- PaLI: A Jointly-Scaled Multilingual Language-Image Model
- PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs
- Panning for Gold in Federated Learning: Targeted Text Extraction under Arbitrarily Large-Scale Aggregation
- Parallel Deep Neural Networks Have Zero Duality Gap
- Parameter-Efficient Fine-Tuning Design Spaces
- Parametrizing Product Shape Manifolds by Composite Networks
- Pareto Invariant Risk Minimization: Towards Mitigating the Optimization Dilemma in Out-of-Distribution Generalization
- Part-Based Models Improve Adversarial Robustness
- Partial Label Unsupervised Domain Adaptation with Class-Prototype Alignment
- Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms
- Particle-based Variational Inference with Preconditioned Functional Gradient Flow
- PASHA: Efficient HPO and NAS with Progressive Resource Allocation
- PatchDCT: Patch Refinement for High Quality Instance Segmentation
- Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning
- PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm
- PEER: A Collaborative Language Model
- Perfectly Secure Steganography Using Minimum Entropy Coupling
- PerFedMask: Personalized Federated Learning with Optimized Masking Vectors
- Performance Bounds for Model and Policy Transfer in Hidden-parameter MDPs
- Personalized Federated Learning with Feature Alignment and Classifier Collaboration
- Personalized Reward Learning with Interaction-Grounded Learning (IGL)
- Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
- PGrad: Learning Principal Gradients For Domain Generalization
- Phase2vec: dynamical systems embedding with a physics-informed convolutional network
- Phase transition for detecting a small community in a large network
- Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions
- Physics for Machine Learning
- PiFold: Toward effective and efficient protein inverse folding
- Pink Noise Is All You Need: Colored Noise Exploration in Deep Reinforcement Learning
- PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales
- Pitfalls of Gaussians as a noise distribution in NCE
- Pitfalls of limited data and computation for Trustworthy ML
- Planckian Jitter: countering the color-crippling effects of color jitter on self-supervised training
- Planning Goals for Exploration
- Planning with Large Language Models for Code Generation
- Planning with Sequence Models through Iterative Energy Minimization
- Plateau in Monotonic Linear Interpolation --- A "Biased" View of Loss Landscape for Deep Networks
- PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
- Policy-Based Self-Competition for Planning Problems
- Policy Expansion for Bridging Offline-to-Online Reinforcement Learning
- Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling
- POPGym: Benchmarking Partially Observable Reinforcement Learning
- Population-size-Aware Policy Optimization for Mean-Field Games
- Post-hoc Concept Bottleneck Models
- Powderworld: A Platform for Understanding Generalization via Rich Task Distributions
- PowerQuant: Automorphism Search for Non-Uniform Quantization
- Predicting Cellular Responses with Variational Causal Inference and Refined Relational Information
- Predictive Inference with Feature Conformal Prediction
- Predictor-corrector algorithms for stochastic optimization under gradual distribution shift
- Preference Transformer: Modeling Human Preferences using Transformers for RL
- Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models
- Pre-training via Denoising for Molecular Property Prediction
- Principal Components Bias in Over-parameterized Linear Models, and its Manifestation in Deep Neural Networks
- Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning
- Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses
- Proactive Multi-Camera Collaboration for 3D Human Pose Estimation
- Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse
- Programmatically Grounded, Compositionally Generalizable Robotic Manipulation
- Progressively Compressed Auto-Encoder for Self-supervised Representation Learning
- Progressive Mix-Up for Few-Shot Supervised Multi-Source Domain Transfer
- Progressive Prompts: Continual Learning for Language Models
- Progressive Voronoi Diagram Subdivision Enables Accurate Data-free Class-Incremental Learning
- Progress measures for grokking via mechanistic interpretability
- Projective Proximal Gradient Descent for Nonconvex Nonsmooth Optimization: Fast Convergence Without Kurdyka-Lojasiewicz (KL) Property
- Promptagator: Few-shot Dense Retrieval From 8 Examples
- Prompting GPT-3 To Be Reliable
- Prompt-to-Prompt Image Editing with Cross-Attention Control
- Proposal-Contrastive Pretraining for Object Detection from Fewer Data
- Protein Representation Learning by Geometric Structure Pretraining
- Protein Representation Learning via Knowledge Enhanced Primary Structure Reasoning
- Protein Sequence and Structure Co-Design with Equivariant Translation
- Prototypical Calibration for Few-shot Learning of Language Models
- Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
- Provable Defense Against Geometric Transformations
- Provable Memorization Capacity of Transformers
- Provable Robustness against Wasserstein Distribution Shifts via Input Randomization
- Provable Sim-to-real Transfer in Continuous Domain with Partial Observations
- Provably Auditing Ordinary Least Squares in Low Dimensions
- Provably Efficient Lifelong Reinforcement Learning with Linear Representation
- Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path
- Pruning Deep Neural Networks from a Sparsity Perspective
- Pseudoinverse-Guided Diffusion Models for Inverse Problems
- Pseudo-label Training and Model Inertia in Neural Machine Translation
- Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play
- Pushing the Limits of Fewshot Anomaly Detection in Industry Vision: Graphcore
- PV3D: A 3D Generative Model for Portrait Video Generation
- QAID: Question Answering Inspired Few-shot Intent Detection
- Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots
- Quality-Similar Diversity via Population Based Reinforcement Learning
- Quantifying and Mitigating the Impact of Label Errors on Model Disparity Metrics
- Quantifying Memorization Across Neural Language Models
- Quantile Risk Control: A Flexible Framework for Bounding the Probability of High-Loss Predictions
- Quantized Compressed Sensing with Score-Based Generative Models
- QuAnt: Quantum Annealing with Learnt Couplings
- Quasi-optimal Reinforcement Learning with Continuous Actions
- Random Laplacian Features for Learning with Hyperbolic Space
- RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates
- Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images
- ReAct: Synergizing Reasoning and Acting in Language Models
- Real-Time Image Demoiréing on Mobile Devices
- Real-time variational method for learning neural trajectory and its dynamics
- Re-calibrating Feature Attributions for Model Interpretation
- Recitation-Augmented Language Models
- Recon: Reducing Conflicting Gradients From the Root For Multi-Task Learning
- Recursive Time Series Data Augmentation
- Red PANDA: Disambiguating Image Anomaly Detection by Removing Nuisance Factors
- Regression with Label Differential Privacy
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator
- Reincarnating Reinforcement Learning
- Relational Attention: Generalizing Transformers for Graph-Structured Tasks
- Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences
- Relative representations enable zero-shot latent space communication
- Reliability of CKA as a Similarity Measure in Deep Learning
- REPAIR: REnormalizing Permuted Activations for Interpolation Repair
- Reparameterization through Spatial Gradient Scaling
- Re-parameterizing Your Optimizers rather than Architectures
- Replay Memory as An Empirical MDP: Combining Conservative Estimation with Experience Replay
- Replicable Bandits
- Representational Dissimilarity Metric Spaces for Stochastic Neural Networks
- Representation Learning for Low-rank General-sum Markov Games
- Represent to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency
- ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor
- Restricted Strong Convexity of Deep Learning Models with Smooth Activations
- Rethinking Graph Lottery Tickets: Graph Sparsity Matters
- Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation
- Rethinking skip connection model as a learnable Markov chain
- Rethinking Symbolic Regression: Morphology and Adaptability in the Context of Evolutionary Algorithms
- Rethinking the Effect of Data Augmentation in Adversarial Contrastive Learning
- Rethinking the Expressive Power of GNNs via Graph Biconnectivity
- Retrieval-based Controllable Molecule Generation
- Reversible Column Networks
- Revisit Finetuning strategy for Few-Shot Learning to Transfer the Embeddings
- Revisiting adapters with adversarial training
- Revisiting Graph Adversarial Attack and Defense From a Data Distribution Perspective
- Revisiting Intrinsic Reward for Exploration in Procedurally Generated Environments
- Revisiting Populations in multi-agent Communication
- Revisiting Pruning at Initialization Through the Lens of Ramanujan Graph
- Revisiting Robustness in Graph Machine Learning
- Revisiting the Assumption of Latent Separability for Backdoor Defenses
- Revisiting the Entropy Semiring for Neural Speech Recognition
- Revocable Deep Reinforcement Learning with Affinity Regularization for Outlier-Robust Graph Matching
- Reward Design with Language Models
- Re-weighting Based Group Fairness Regularization via Classwise Robust Optimization
- RGI: robust GAN-inversion for mask-free image inpainting and unsupervised pixel-wise anomaly detection
- Rhino: Deep Causal Temporal Relationship Learning with History-dependent Noise
- Riemannian Metric Learning via Optimal Transport
- Risk-Aware Reinforcement Learning with Coherent Risk Measures and Non-linear Function Approximation
- RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch
- Robust Active Distillation
- Robust Algorithms on Adaptive Inputs from Bounded Adversaries
- Robust and Controllable Object-Centric Learning through Energy-based Models
- Robust Explanation Constraints for Neural Networks
- Robust Fair Clustering: A Novel Fairness Attack and Defense Framework
- Robust Graph Dictionary Learning
- Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms
- Robustness to corruption in pre-trained Bayesian neural networks
- Robust Scheduling with GFlowNets
- ROCO: A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs
- RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data
- ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
- Rotamer Density Estimator is an Unsupervised Learner of the Effect of Mutations on Protein-Protein Interaction
- RPM: Generalizable Multi-Agent Policies for Multi-Agent Reinforcement Learning
- Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-Free RL
- Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation
- SAM as an Optimal Relaxation of Bayes
- Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks
- Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
- Sampling-based inference for large linear models, with application to linearised Laplace
- Sampling-free Inference for Ab-Initio Potential Energy Surface Networks
- Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
- Sampling with Mollified Interaction Energy Descent
- Scaffolding a Student to Instill Knowledge
- Scalable and Equivariant Spherical CNNs by Discrete-Continuous (DISCO) Convolutions
- Scalable Batch-Mode Deep Bayesian Active Learning via Equivalence Class Annealing
- Scalable Subset Sampling with Neural Conditional Poisson Networks
- Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting
- Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel
- SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency
- Scaling Forward Gradient With Local Losses
- Scaling Laws for a Multi-Agent Reinforcement Learning Model
- Scaling Laws For Deep Learning Based Image Reconstruction
- Scaling Pareto-Efficient Decision Making via Offline Multi-Objective RL
- Scaling up and Stabilizing Differentiable Planning with Implicit Differentiation
- Scaling Up Probabilistic Circuits by Latent Variable Distillation
- Scenario-based Question Answering with Interacting Contextual Properties
- Scene Representations for Autonomous Driving
- Schema Inference for Interpretable Image Classification
- SCoMoE: Efficient Mixtures of Experts with Structured Communication
- Score-based Continuous-time Discrete Diffusion Models
- SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
- Searching Lottery Tickets in Graph Neural Networks: A Dual Perspective
- Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning
- Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
- Selective Annotation Makes Language Models Better Few-Shot Learners
- Selective Frequency Network for Image Restoration
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Self-Distillation for Further Pre-training of Transformers
- Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors
- Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning
- Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability
- Self-Supervised Category-Level Articulated Object Pose Estimation with Part-Level SE(3) Equivariance
- Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild
- Self-supervised learning with rotation-invariant kernels
- Self-Supervised Set Representation Learning for Unsupervised Meta-Learning
- Self-supervision through Random Segments with Autoregressive Coding (RandSAC)
- Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
- Semi-Implicit Variational Inference via Score Matching
- Semi-Parametric Inducing Point Networks and Neural Processes
- Semi-supervised Community Detection via Structural Similarity Metrics
- Semi-supervised learning with a principled likelihood from a generative model of data curation
- SemPPL: Predicting Pseudo-Labels for Better Contrastive Representations
- Sequential Attention for Feature Selection
- Sequential Gradient Coding For Straggler Mitigation
- Sequential Latent Variable Models for Few-Shot High-Dimensional Time-Series Forecasting
- Sequential Learning of Neural Networks for Prequential MDL
- Serving Graph Compression for Graph Neural Networks
- SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization
- Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning
- Sharper Bounds for Uniformly Stable Algorithms with Stationary Mixing Process
- Short-Term Memory Convolutions
- Sign and Basis Invariant Networks for Spectral Graph Representation Learning
- SimPer: Simple Self-Supervised Learning of Periodic Targets
- SIMPLE: A Gradient Estimator for k-Subset Sampling
- Simple and Scalable Nearest Neighbor Machine Translation
- Simple Emergent Action Representations from Multi-Task Policy Training
- Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth
- simpleKT: A Simple But Tough-to-Beat Baseline for Knowledge Tracing
- SIMPLE: Specialized Model-Sample Matching for Domain Generalization
- Simplicial Embeddings in Self-Supervised Learning and Downstream Classification
- Simplicial Hopfield networks
- Simplified State Space Layers for Sequence Modeling
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
- Single-shot General Hyper-parameter Optimization for Federated Learning
- SketchKnitter: Vectorized Sketch Generation with Diffusion Models
- SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models
- SLTUNET: A Simple Unified Model for Sign Language Translation
- SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing
- SMART: Self-supervised Multi-task pretrAining with contRol Transformers
- SMART: Sentences as Basic Units for Text Evaluation
- S-NeRF: Neural Radiance Fields for Street Views
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
- Softened Symbol Grounding for Neuro-symbolic Systems
- SoftMatch: Addressing the Quantity-Quality Tradeoff in Semi-supervised Learning
- Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
- SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments
- Solving Constrained Variational Inequalities via a First-order Interior Point-based Method
- Solving Continuous Control via Q-learning
- Solving stochastic weak Minty variational inequalities without increasing batch size
- Sound Randomized Smoothing in Floating-Point Arithmetic
- SP2: A Second Order Stochastic Polyak Method
- Spacetime Representation Learning
- Sparse Distributed Memory is a Continual Learner
- Sparse Mixture-of-Experts are Domain Generalizable Learners
- Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers
- Sparse Random Networks for Communication-Efficient Federated Learning
- Sparse Token Transformer with Attention Back Tracking
- Sparse tree-based Initialization for Neural Networks
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
- Sparsity-Constrained Optimal Transport
- Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
- Spatial Attention Kinetic Networks with E(n)-Equivariance
- Spatio-temporal point processes with deep non-stationary kernels
- Specformer: Spectral Graph Neural Networks Meet Transformers
- Spectral Augmentation for Self-Supervised Learning on Graphs
- Spectral Decomposition Representation for Reinforcement Learning
- SpeedyZero: Mastering Atari with Limited Data and Time
- Spherical Sliced-Wasserstein
- Spikformer: When Spiking Neural Network Meets Transformer
- Spiking Convolutional Neural Networks for Text Classification
- Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
- SQA3D: Situated Question Answering in 3D Scenes
- Squeeze Training for Adversarial Robustness
- StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random
- Stable Target Field for Reduced Variance Score Estimation in Diffusion Models
- STaSy: Score-based Tabular data Synthesis
- Stateful Active Facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning
- Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions
- Statistical Efficiency of Score Matching: The View from Isoperimetry
- Statistical Guarantees for Consensus Clustering
- Statistical Inference for Fisher Market Equilibrium
- Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms
- Stay Moral and Explore: Learn to Behave Morally in Text-based Games
- Stochastic Differentially Private and Fair Learning
- Stochastic Multi-Person 3D Motion Forecasting
- Stochastic No-regret Learning for General Games with Variance Reduction
- Strategic Classification with Graph Neural Networks
- STREET: A Multi-Task Structured Reasoning and Explanation Benchmark
- Strong inductive biases provably prevent harmless interpolation
- StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
- Structure by Architecture: Structured Representations without Regularization
- STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables
- StyleMorph: Disentangled 3D-Aware Image Synthesis with a 3D Morphable StyleGAN
- Subquadratic Algorithms for Kernel Matrices via Kernel Density Estimation
- Subsampling in Large Graphs Using Ricci Curvature
- Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks
- Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
- Supervision Complexity and its Role in Knowledge Distillation
- Suppressing the Heterogeneity: A Strong Feature Extractor for Few-shot Segmentation
- Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
- SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication
- Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields
- Symbolic Physics Learner: Discovering governing equations via Monte Carlo tree search
- Symmetric Pruning in Quantum Neural Networks
- Symmetries, Flat Minima, and the Conserved Quantities of Gradient Flow
- SYNC: Safety-Aware Neural Control for Stabilizing Stochastic Delay-Differential Equations
- Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation
- Systematic Rectification of Language Models via Dead-end Analysis
- TabCaps: A Capsule Neural Network for Tabular Data Classification with BoW Routing
- TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
- Tackling Climate Change with Machine Learning: Global Perspectives and Local Challenges
- Tailoring Language Generation Models under Total Variation Distance
- Taking a Step Back with KCal: Multi-Class Kernel-Based Calibration for Deep Neural Networks
- TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization
- Targeted Hyperparameter Optimization with Lexicographic Preferences Over Multiple Objectives
- Task Ambiguity in Humans and Language Models
- Task-Aware Information Routing from Common Representation Space in Lifelong Learning
- Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts
- TaskPrompter: Spatial-Channel Multi-Task Prompting for Dense Scene Understanding
- TDR-CL: Targeted Doubly Robust Collaborative Learning for Debiased Recommendations
- Teacher Guided Training: An Efficient Framework for Knowledge Transfer
- TempCLR: Temporal Alignment Representation with Contrastive Learning
- TEMPERA: Test-Time Prompt Editing via Reinforcement Learning
- Temperature Schedules for self-supervised contrastive methods on long-tail data
- Temporal Coherent Test Time Optimization for Robust Video Classification
- Temporal Dependencies in Feature Importance for Time Series Prediction
- Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning
- Temporal Domain Generalization with Drift-Aware Dynamic Neural Networks
- Tensor-Based Sketching Method for the Low-Rank Approximation of Data Streams
- Test-Time Adaptation via Self-Training with Nearest Neighbor Information
- Test-Time Robust Personalization for Federated Learning
- TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization
- TextShield: Beyond Successfully Detecting Adversarial Sentences in text classification
- Text Summarization with Oracle Expectation
- Thalamus: a brain-inspired algorithm for biologically-plausible continual learning and disentangled representations
- That Label's got Style: Handling Label Style Bias for Uncertain Image Segmentation
- The 4th Workshop on practical ML for Developing Countries: learning under limited/low resource settings
- The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks
- The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
- The Best of Both Worlds: Accurate Global and Personalized Models through Federated Learning with Data-Free Hyper-Knowledge Distillation
- The Curious Case of Benign Memorization
- The Dark Side of AutoML: Towards Architectural Backdoor Search
- The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition
- The hidden uniform cluster prior in self-supervised learning
- The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks
- The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks
- The In-Sample Softmax for Offline Reinforcement Learning
- The KFIoU Loss for Rotated Object Detection
- The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers
- The Lie Derivative for Measuring Learned Equivariance
- The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning
- The Power of Regularization in Solving Extensive-Form Games
- The Provable Benefit of Unsupervised Data Sharing for Offline Reinforcement Learning
- The Role of Coverage in Online Reinforcement Learning
- The Role of ImageNet Classes in Fréchet Inception Distance
- The Surprising Computational Power of Nondeterministic Stack RNNs
- The Surprising Effectiveness of Equivariant Models in Domains with Latent Symmetry
- The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium
- The Tilted Variational Autoencoder: Improving Out-of-Distribution Detection
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
- This Looks Like It Rather Than That: ProtoKNN For Similarity-Based Classifiers
- TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization
- Tier Balancing: Towards Dynamic Fairness over Underlying Causal Factors
- TILP: Differentiable Learning of Temporal Logical Rules on Knowledge Graphs
- Time Series Representation Learning for Health
- TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis
- Time to augment self-supervised visual representation learning
- Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection
- Timing is Everything: Learning to Act Selectively with Costly Actions and Budgetary Constraints
- Toeplitz Neural Network for Sequence Modeling
- Token Merging: Your ViT But Faster
- Topologically penalized regression on manifolds
- Topology-aware Robust Optimization for Out-of-Distribution Generalization
- Toward Adversarial Training on Contextualized Language Representation
- Towards Addressing Label Skews in One-Shot Federated Learning
- Towards a Unified Theoretical Understanding of Non-contrastive Learning via Rank Differential Mechanism
- Towards Better Selective Classification
- Towards convergence to Nash equilibria in two-team zero-sum games
- Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective
- Towards Inferential Reproducibility of Machine Learning Research
- Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes
- Towards Lightweight, Model-Agnostic and Diversity-Aware Active Anomaly Detection
- Towards Minimax Optimal Reward-free Reinforcement Learning in Linear MDPs
- Towards One-shot Neural Combinatorial Solvers: Theoretical and Empirical Notes on the Cardinality-Constrained Case
- Towards Open Temporal Graph Neural Networks
- Towards Robustness Certification Against Universal Perturbations
- Towards Robust Object Detection Invariant to Real-World Domain Shifts
- Towards Smooth Video Composition
- Towards Stable Test-time Adaptation in Dynamic Wild World
- Towards the Generalization of Contrastive Self-Supervised Learning
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning
- Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
- Towards Understanding GD with Hard and Conjugate Pseudo-labels for Test-Time Adaptation
- Towards Understanding Why Mask Reconstruction Pretraining Helps in Downstream Tasks
- Trading Information between Latents in Hierarchical Variational Autoencoders
- Trainability Preserving Neural Pruning
- Trainable Weight Averaging: Efficient Training by Optimizing Historical Solutions
- Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
- Training language models to summarize narratives improves brain alignment
- Transferable Unlearnable Examples
- Transfer Learning with Deep Tabular Models
- Transfer NAS with Meta-learned Bayesian Surrogates
- Transformer-based model for symbolic regression via joint supervised learning
- Transformer-based World Models Are Happy With 100k Interactions
- Transformer Meets Boundary Value Inverse Problems
- Transformer-Patcher: One Mistake Worth One Neuron
- Transformers are Sample-Efficient World Models
- Transformers Learn Shortcuts to Automata
- TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
- Treeformer: Dense Gradient Trees for Efficient Attention Computation
- TrojText: Test-time Invisible Textual Trojan Insertion
- Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders
- Trustworthy and Reliable Large-Scale Machine Learning Models
- Trustworthy Machine Learning for Healthcare
- Truthful Self-Play
- TTN: A Domain-Shift Aware Batch Normalization in Test-Time Adaptation
- Tuning Frequency Bias in Neural Network Training with Nonuniform Data
- Turning the Curse of Heterogeneity in Federated Learning into a Blessing for Out-of-Distribution Detection
- TVSPrune - Pruning Non-discriminative filters via Total Variation separability of intermediate representations without fine tuning
- TypeT5: Seq2seq Type Inference using Static Analysis
- UL2: Unifying Language Learning Paradigms
- Unbiased Stochastic Proximal Solver for Graph Neural Networks with Equilibrium States
- Unbiased Supervised Contrastive Learning
- Understanding and Adopting Rational Behavior by Bellman Score Estimation
- Understanding DDPM Latent Codes Through Optimal Transport
- Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
- Understanding Embodied Reference with Touch-Line Transformer
- Understanding Influence Functions and Datamodels via Harmonic Analysis
- Understanding Neural Coding on Latent Manifolds by Sharing Features and Dividing Ensembles
- Understanding new tasks through the lens of training data via exponential tilting
- Understanding Systematic Deviations in Data for Trustworthy AI
- Understanding the Covariance Structure of Convolutional Filters
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
- Understanding The Robustness of Self-supervised Learning Through Topic Modeling
- Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning
- Understanding Train-Validation Split in Meta-Learning with Neural Networks
- Understanding weight-magnitude hyperparameters in training binary networks
- Understanding Why Generalized Reweighting Does Not Improve Over ERM
- Understanding Zero-shot Adversarial Robustness for Large-Scale Models
- Unicom: Universal and Compact Representation Learning for Image Retrieval
- UNICORN: A Unified Backdoor Trigger Inversion Framework
- Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization
- Unified Discrete Diffusion for Simultaneous Vision-Language Generation
- UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks
- Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph
- UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining
- Uni-Mol: A Universal 3D Molecular Representation Learning Framework
- Universal Approximation Theorems for Differentiable Geometric Deep Learning
- Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching
- Universal Vision-Language Dense Retrieval: Learning A Unified Representation Space for Multi-Modal Retrieval
- Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
- Unsupervised 3D Object Learning through Neuron Activity aware Plasticity
- Unsupervised Learning for Combinatorial Optimization Needs Meta Learning
- Unsupervised Manifold Alignment with Joint Multidimensional Scaling
- Unsupervised Meta-learning via Few-shot Pseudo-supervised Contrastive Learning
- Unsupervised Model Selection for Time Series Anomaly Detection
- Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations
- Unsupervised visualization of image datasets using contrastive learning
- Unveiling the sampling density in non-uniform geometric graphs
- User-Interactive Offline Reinforcement Learning
- Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks
- Using Language to Extend to Unseen Domains
- VA-DepthNet: A Variational Approach to Single Image Depth Prediction
- Valid P-Value for Deep Learning-driven Salient Region
- Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
- Variance-Aware Sparse Linear Bandits
- Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top
- Variational Information Pursuit for Interpretable Predictions
- Variational Latent Branching Model for Off-Policy Evaluation
- Verifying the Union of Manifolds Hypothesis for Image Data
- Versatile Neural Processes for Learning Implicit Neural Representations
- Video Scene Graph Generation from Single-Frame Weak Supervision
- ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
- View Synthesis with Sculpted Neural Points
- VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation
- VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
- Vision Transformer Adapter for Dense Predictions
- Visual Classification via Description from Large Language Models
- Visual Imitation Learning with Patch Rewards
- Visually-Augmented Language Modeling
- Visual Recognition with Deep Nearest Centroids
- VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis
- Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding
- Volumetric Optimal Transportation by Fast Fourier Transform
- Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction
- Warping the Space: Weight Space Rotation for Class-Incremental Few-Shot Learning
- Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees
- wav2tok: Deep Sequence Tokenizer for Audio Retrieval
- Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic
- Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning
- Weakly Supervised Knowledge Transfer with Probabilistic Logical Reasoning for Object Detection
- Weighted Clock Logic Point Process
- Weighted Ensemble Self-Supervised Learning
- What Can we Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers?
- What Do Self-Supervised Vision Transformers Learn?
- What do we need for successful domain generalization?
- What Is Missing in IRM Training and Evaluation? Challenges and Solutions
- What learning algorithm is in-context learning? Investigations with linear models
- What Makes Convolutional Models Great on Long Sequence Modeling?
- What shapes the loss landscape of self supervised learning?
- When and Why Vision-Language Models Behave like Bags-Of-Words, and What to Do About It?
- When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
- When Source-Free Domain Adaptation Meets Learning with Noisy Labels
- When to Make and Break Commitments?
- Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning
- Where to Diffuse, How to Diffuse, and How to Get Back: Automated Learning for Multivariate Diffusions
- Which Layer is Learning Faster? A Systematic Exploration of Layer-wise Convergence Rate for Deep Neural Networks
- Why adversarial training can hurt robust accuracy
- Why (and When) does Local SGD Generalize Better than SGD?
- WikiWhy: Answering and Explaining Cause-and-Effect Questions
- WiNeRT: Towards Neural Ray Tracing for Wireless Channel Modelling and Differentiable Simulations
- Winning Both the Accuracy of Floating Point Activation and the Simplicity of Integer Arithmetic
- Win: Weight-Decay-Integrated Nesterov Acceleration for Adaptive Gradient Algorithms
- Words are all you need? Language as an approximation for human similarity judgments
- Write and Paint: Generative Vision-Language Models are Unified Modal Learners
- Your Contrastive Learning Is Secretly Doing Stochastic Neighbor Embedding
- Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model
- Zeroth-Order Optimization with Trajectory-Informed Derivative Estimation
- ZiCo: Zero-shot NAS via inverse Coefficient of Variation on Gradients