# Downloads 2021

Number of events: 896

- $i$-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
- 2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning
- A Block Minifloat Representation for Training Deep Neural Networks
- Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction
- Accurate Learning of Graph Representations with Graph Multiset Pooling
- Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning
- A Critique of Self-Expressive Deep Subspace Clustering
- Acting in Delayed Environments with Non-Stationary Markov Policies
- Activation-level uncertainty in deep neural networks
- Active Contrastive Learning of Audio-Visual Video Representations
- AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
- AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models
- AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights
- Adapting to Reward Progressivity via Spectral Reinforcement Learning
- Adaptive and Generative Zero-Shot Learning
- Adaptive Extra-Gradient Methods for Min-Max Optimization and Games
- Adaptive Federated Optimization
- Adaptive Procedural Task Generation for Hard-Exploration Problems
- Adaptive Universal Generalized PageRank Graph Neural Network
- AdaSpeech: Adaptive Text to Speech for Custom Voice
- A Design Space Study for LISTA and Beyond
- A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
- A Discriminative Gaussian Mixture Model with Sparsity
- A Distributional Approach to Controlled Text Generation
- Adversarially Guided Actor-Critic
- Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification
- Adversarial score matching and improved sampling for image generation
- A Geometric Analysis of Deep Generative Image Models and Its Applications
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis
- A Gradient Flow Framework For Analyzing Network Pruning
- A Hypergradient Approach to Robust Regression without Correspondence
- AI for Public Health
- AI in Finance: Scope and Examples
- AIMOCC -- AI: Modeling Oceans and Climate Change
- A Learning Theoretic Perspective on Local Explainability
- ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
- Aligning AI With Shared Human Values
- A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
- Analyzing the Expressive Power of Graph Neural Networks in a Spectral Perspective
- Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
- Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- ANOCE: Analysis of Causal Effects with Multiple Mediators via Constrained Structural Learning
- Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval
- An Unsupervised Deep Learning Approach for Real-World Image Denoising
- Anytime Sampling for Autoregressive Models via Ordered Autoencoding
- A PAC-Bayesian Approach to Generalization Bounds for Graph Neural Networks
- A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
- Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks
- Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees?
- Are wider nets better given the same number of parameters?
- ARMOURED: Adversarially Robust MOdels using Unlabeled data by REgularizing Diversity
- A Roadmap to Never-Ending RL
- Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning
- A statistical theory of cold posteriors in deep neural networks
- Async-RED: A Provably Convergent Asynchronous Block Parallel Stochastic Method using Deep Denoising Priors
- A teacher-student framework to distill future trajectories
- A Temporal Kernel Approach for Deep Learning with Continuous-time Information
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention
- Attentional Constellation Nets for Few-Shot Learning
- Auction Learning as a Two-Player Game
- Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting
- A Unified Approach to Interpreting and Boosting Adversarial Transferability
- A unifying view on implicit bias in training linear neural networks
- A Universal Representation Transformer Layer for Few-Shot Image Classification
- AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
- Autoregressive Entity Retrieval
- Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation
- Auxiliary Learning by Implicit Differentiation
- Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral
- Average-case Acceleration for Bilinear Games and Normal Matrices
- A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels
- Bag of Tricks for Adversarial Training
- Balancing Constraints and Rewards with Meta-Gradient D4PG
- Batch Reinforcement Learning Through Continuation Method
- Bayesian Context Aggregation for Neural Processes
- Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes
- Behavioral Cloning from Noisy Demonstrations
- Benchmarks for Deep Off-Policy Evaluation
- Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods
- BERTology Meets Biology: Interpreting Attention in Protein Language Models
- Better Fine-Tuning by Reducing Representational Collapse
- Beyond Categorical Label Representations for Image Classification
- Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters
- Beyond Static Papers: Rethinking How We Share Scientific Understanding in ML
- Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech
- BiPointNet: Binary Neural Network for Point Clouds
- Blending MPC & Value Function Approximation for Efficient Reinforcement Learning
- BOIL: Towards Representation Change for Few-shot Learning
- Boost then Convolve: Gradient Boosting Meets Graph Neural Networks
- Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis
- BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction
- BREEDS: Benchmarks for Subpopulation Shift
- BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization
- BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
- Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent
- Calibration of Neural Networks using Splines
- Calibration tests beyond classification
- Can a Fruit Fly Learn Word Embeddings?
- CaPC Learning: Confidential and Private Collaborative Learning
- Capturing Label Characteristics in VAEs
- Categorical Normalizing Flows via Continuous Transformations
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning
- CcGAN: Continuous Conditional Generative Adversarial Networks for Image Generation
- Certify or Predict: Boosting Certified Robustness with Compositional Architectures
- Chaos of Learning Beyond Zero-sum and Coordination via Game Decompositions
- Characterizing signal propagation to close the performance gap in unnormalized ResNets
- ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations
- Clairvoyance: A Pipeline Toolkit for Medical Time Series
- Class Normalization for (Continual)? Generalized Zero-Shot Learning
- C-Learning: Horizon-Aware Cumulative Accessibility Estimation
- C-Learning: Learning to Achieve Goals via Recursive Classification
- Clustering-friendly Representation Learning via Instance Discrimination and Feature Decorrelation
- CO2: Consistent Contrast for Unsupervised Visual Representation Learning
- CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers
- CoCon: A Self-Supervised Approach for Controlled Text Generation
- CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding
- Collective Robustness Certificates: Exploiting Interdependence in Graph Neural Networks
- Colorization Transformer
- Combining Ensembles and Data Augmentation Can Harm Your Calibration
- Combining Label Propagation and Simple Models out-performs Graph Neural Networks
- Combining Physics and Machine Learning for Network Flow Estimation
- Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity
- Commonsense AI: Myth and Truth
- Communication in Multi-Agent Reinforcement Learning: Intention Sharing
- Complex Query Answering with Neural Link Predictors
- CompOFA – Compound Once-For-All Networks for Faster Multi-Platform Deployment
- Computational Separation Between Convolutional and Fully-Connected Networks
- Concept Learners for Few-Shot Learning
- Conditional Generative Modeling via Learning the Latent Space
- Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
- Conditional Negative Sampling for Contrastive Learning of Visual Representations
- Conformation-Guided Molecular Representation with Hamiltonian Neural Networks
- Conservative Safety Critics for Exploration
- Contemplating Real-World Object Classification
- Contextual Dropout: An Efficient Sample-Dependent Dropout Module
- Contextual Transformation Networks for Online Continual Learning
- Continual learning in recurrent neural networks
- Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization
- Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
- Contrastive Divergence Learning is a Time Reversal Adversarial Game
- Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions
- Contrastive Learning with Adversarial Perturbations for Conditional Text Generation
- Contrastive Learning with Hard Negative Samples
- Contrastive Syn-to-Real Generalization
- Control-Aware Representations for Model-based Reinforcement Learning
- Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization
- Convex Regularization behind Neural Reconstruction
- Coping with Label Shift via Distributionally Robust Optimisation
- CopulaGNN: Towards Integrating Representational and Correlational Roles of Graphs in Graph Neural Networks
- Correcting experience replay for multi-agent communication
- Counterfactual Generative Networks
- Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies
- CPR: Classifier-Projection Regularization for Continual Learning
- CPT: Efficient Deep Neural Network Training via Cyclic Precision
- Creative Sketch Generation
- Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization
- CT-Net: Channel Tensorization Network for Video Classification
- Cut out the annotator, keep the cutout: better segmentation with weak supervision
- Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning
- DARTS-: Robustly Stepping out of Performance Collapse Without Indicators
- Data-Efficient Reinforcement Learning with Self-Predictive Representations
- Dataset Condensation with Gradient Matching
- Dataset Inference: Ownership Resolution in Machine Learning
- Dataset Meta-Learning from Kernel Ridge-Regression
- DC3: A learning method for optimization with hard constraints
- DDPNOpt: Differential Dynamic Programming Neural Optimizer
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
- Debiasing Concept-based Explanations with Causal Analysis
- Decentralized Attribution of Generative Models
- Deciphering and Optimizing Multi-Task Learning: a Random Matrix Approach
- Deconstructing the Regularization of BatchNorm
- Decoupling Global and Local Representations via Invertible Generative Flows
- DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs
- Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
- Deep Equals Shallow for ReLU Networks in Kernel Regimes
- Deep Learning for Simulation
- Deep Learning meets Projective Clustering
- Deep Networks and the Multiple Manifold Problem
- Deep Neural Network Fingerprinting by Conferrable Adversarial Examples
- Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS
- Deep Partition Aggregation: Provable Defenses against General Poisoning Attacks
- Deep Repulsive Clustering of Ordered Data Based on Order-Identity Decomposition
- Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients
- Deformable DETR: Deformable Transformers for End-to-End Object Detection
- Degree-Quant: Quantization-Aware Training for Graph Neural Networks
- DeLighT: Deep and Light-weight Transformer
- Denoising Diffusion Implicit Models
- Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues
- DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation
- Differentiable Segmentation of Sequences
- Differentiable Trust Region Layers for Deep Reinforcement Learning
- Differentially Private Learning Needs Better Features (or Much More Data)
- DiffWave: A Versatile Diffusion Model for Audio Synthesis
- DINO: A Conditional Energy-Based GAN for Domain Translation
- Directed Acyclic Graph Neural Networks
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
- Disambiguating Symbolic Expressions in Informal Documents
- Discovering a set of policies for the worst case reward
- Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization
- Discovering Non-monotonic Autoregressive Orderings with Variational Inference
- Discrete Graph Structure Learning for Forecasting Multiple Time Series
- Disentangled Recurrent Wasserstein Autoencoder
- Disentangling 3D Prototypical Networks for Few-Shot Concept Learning
- Distance-Based Regularisation of Deep Networks for Fine-Tuning
- Distilling Knowledge from Reader to Retriever for Question Answering
- Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent
- Distributional Sliced-Wasserstein and Applications to Generative Modeling
- Diverse Video Generation using a Gaussian Process Trigger
- Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs
- Does enhanced shape bias improve neural network robustness to common corruptions?
- Domain Generalization with MixStyle
- Domain-Robust Visual Imitation Learning with Mutual Information Constraints
- Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning
- DOP: Off-Policy Multi-Agent Decomposed Policy Gradients
- Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
- DrNAS: Dirichlet Neural Architecture Search
- Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration
- Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
- Dynamic Tensor Rematerialization
- DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation
- Early Stopping in Deep Networks: Double Descent and How to Eliminate it
- Economic Hyperparameter Optimization With Blended Search Strategy
- EEC: Learning to Encode and Regenerate Images for Continual Learning
- Effective Abstract Reasoning with Dual-Contrast Network
- Effective and Efficient Vote Attack on Capsule Networks
- Effective Distributed Learning with Random Features: Improved Bounds and Algorithms
- Efficient Certified Defenses Against Patch Attacks on Image Classifiers
- Efficient Conformal Prediction via Cascaded Inference with Expanded Admission
- Efficient Continual Learning with Modular Networks and Task-Driven Priors
- Efficient Empowerment Estimation for Unsupervised Stabilization
- Efficient Generalized Spherical CNNs
- Efficient Inference of Flexible Interaction in Spiking-neuron Networks
- Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL
- Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
- Efficient Wasserstein Natural Gradients for Reinforcement Learning
- EigenGame: PCA as a Nash Equilibrium
- Emergent Road Rules In Multi-Agent Driving Environments
- Emergent Symbols through Binding in External Memory
- Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition
- Empirical or Invariant Risk Minimization? A Sample Complexity Perspective
- End-to-end Adversarial Text-to-Speech
- End-to-End Egospheric Spatial Memory
- Energy-Based Models: Current Perspectives, Challenges, and Opportunities
- Enforcing robust control guarantees within neural network policies
- Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation
- Entropic gradient descent algorithms and wide flat minima
- Estimating and Evaluating Regression Predictive Uncertainty in Deep Object Detectors
- Estimating informativeness of samples with Smooth Unique Information
- Estimating Lipschitz constants of monotone deep equilibrium models
- Evaluating the Disentanglement of Deep Generative Models through Manifold Topology
- Evaluation of Neural Architectures Trained With Square Loss vs Cross-Entropy in Classification Tasks
- Evaluation of Similarity-based Explanations
- Evaluations and Methods for Explanation through Robustness Analysis
- Evolving Reinforcement Learning Algorithms
- Exemplary Natural Images Explain CNN Activations Better than State-of-the-Art Feature Visualization
- Explainable Deep One-Class Classification
- Explainable Subgraph Reasoning for Forecasting on Temporal Knowledge Graphs
- Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning
- Explaining the Efficacy of Counterfactually Augmented Data
- Exploring Balanced Feature Spaces for Representation Learning
- Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit
- Expressive Power of Invariant and Equivariant Graph Neural Networks
- Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers
- Extreme Memorization via Scale of Initialization
- Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
- FairBatch: Batch Selection for Model Fairness
- FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders
- Fair Mixup: Fairness via Interpolation
- Fantastic Four: Differentiable and Efficient Bounds on Singular Values of Convolution Layers
- Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers
- Fast And Slow Learning Of Recurrent Independent Mechanisms
- Fast convergence of stochastic subgradient method under interpolation
- Faster Binary Embeddings for Preserving Euclidean Distances
- Fast Geometric Projections for Local Robustness Certification
- FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
- FedBE: Making Bayesian Model Ensemble Applicable to Federated Learning
- FedBN: Federated Learning on Non-IID Features via Local Batch Normalization
- Federated Learning Based on Dynamic Regularization
- Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
- Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning
- FedMix: Approximation of Mixup under Mean Augmented Federated Learning
- Few-Shot Bayesian Optimization with Deep Kernel Surrogates
- Few-Shot Learning via Learning the Representation, Provably
- Fidelity-based Deep Adiabatic Scheduling
- Filtered Inner Product Projection for Crosslingual Embedding Alignment
- Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
- FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization
- Fooling a Complete Neural Network Verifier
- For self-supervised learning, Rationality implies generalization, provably
- Fourier Neural Operator for Parametric Partial Differential Equations
- Free Lunch for Few-shot Learning: Distribution Calibration
- Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders
- Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online
- GAN2GAN: Generative Noise Learning for Blind Denoising with Single Noisy Images
- GANs Can Play Lottery Tickets Too
- GAN "Steerability" without optimization
- Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs
- Generalization beyond the training distribution in brains and machines
- Generalization bounds via distillation
- Generalization in data-driven models of primary visual cortex
- Generalized Energy Based Models
- Generalized Multimodal ELBO
- Generalized Variational Continual Learning
- Generating Adversarial Computer Programs using Optimized Obfuscations
- Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains
- Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule
- Generative Scene Graph Networks
- Generative Time-series Modeling with Fourier Flows
- Genetic Soft Updates for Policy Evolution in Deep Reinforcement Learning
- Geometric and Topological Representation Learning
- Geometric Deep Learning: the Erlangen Programme of ML
- Geometry-Aware Gradient Algorithms for Neural Architecture Search
- Geometry-aware Instance-reweighted Adversarial Training
- Getting a CLUE: A Method for Explaining Uncertainty Estimates
- Global Convergence of Three-layer Neural Networks in the Mean Field Regime
- Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
- Go with the flow: Adaptive control for Neural ODEs
- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
- Gradient Origin Networks
- Gradient Projection Memory for Continual Learning
- Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
- gradSim: Differentiable simulation for system identification and visuomotor control
- Graph-Based Continual Learning
- Graph Coarsening with Neural Networks
- GraphCodeBERT: Pre-training Code Representations with Data Flow
- Graph Convolution with Low-rank Learnable Local Filters
- Graph Edit Networks
- Graph Information Bottleneck for Subgraph Recognition
- Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing
- Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity
- Grounded Language Learning Fast and Slow
- Grounding Language to Autonomously-Acquired Skills via Goal Generation
- Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
- Group Equivariant Conditional Neural Processes
- Group Equivariant Generative Adversarial Networks
- Group Equivariant Stand-Alone Self-Attention For Vision
- Growing Efficient Deep Networks by Structured Continuous Sparsification
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents
- Hardware-Aware Efficient Training of Deep Learning Models
- Heating up decision boundaries: isocapacitory saturation, adversarial scenarios and generalization bounds
- HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients
- Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization
- Hierarchical Autoregressive Modeling for Neural Video Compression
- Hierarchical Reinforcement Learning by Discovering Intrinsic Options
- High-Capacity Expert Binary Networks
- Hopfield Networks is All You Need
- Hopper: Multi-hop Transformer for Spatiotemporal Reasoning
- How Benign is Benign Overfitting ?
- How Can Findings About The Brain Improve AI Systems?
- How Does Mixup Help With Robustness and Generalization?
- How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
- How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision
- Human-Level Performance in No-Press Diplomacy via Equilibrium Search
- HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark
- Hyperbolic Neural Networks++
- HyperDynamics: Meta-Learning Object and Agent Dynamics with Hypernetworks
- HyperGrid Transformers: Towards A Single Model for Multiple Tasks
- ICLR 2021 Workshop on Embodied Multimodal Learning (EML)
- Identifying nonlinear dynamical systems with multiple time scales and long-range dependencies
- Identifying Physical Law of Hamiltonian Systems via Meta-Learning
- IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
- IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning
- Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels
- Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering
- Impact of Representation Learning in Linear Bandits
- Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time
- Implicit Gradient Regularization
- Implicit Normalizing Flows
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
- Improved Autoregressive Modeling with Distribution Smoothing
- Improved Estimation of Concentration Under $\ell_p$-Norm Distance Metrics Using Half Spaces
- Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors
- Improving Adversarial Robustness via Channel-wise Activation Suppressing
- Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein
- Improving Transformation Invariance in Contrastive Representation Learning
- Improving VAEs' Robustness to Adversarial Attack
- Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning
- Incorporating Symmetry into Deep Dynamics Models for Improved Generalization
- Incremental few-shot learning via vector quantization in deep embedded space
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning
- Individually Fair Gradient Boosting
- Individually Fair Rankings
- Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks
- Influence Estimation for Generative Adversarial Networks
- Influence Functions in Deep Learning Are Fragile
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
- Information Laundering for Model Privacy
- Initialization and Regularization of Factorized Neural Layers
- In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness
- In Search of Lost Domain Generalization
- INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving
- Integrating Categorical Semantics into Unsupervised Domain Translation
- Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling
- Interpretable Models for Granger Causality Using Self-explaining Neural Networks
- Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels
- Interpreting and Boosting Dropout from a Game-Theoretic View
- Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking
- Interpreting Knowledge Graph Relation Representation from Word Embeddings
- Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
- Intraclass clustering: an implicit learning ability that regularizes DNNs
- Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures
- IOT: Instance-wise Layer Reordering for Transformer Structures
- IsarStep: a Benchmark for High-level Mathematical Reasoning
- Is Attention Better Than Matrix Decomposition?
- Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
- Is My Dataset Biased?
- Isometric Propagation Network for Generalized Zero-shot Learning
- Isometric Transformation Invariant and Equivariant Graph Convolutional Networks
- Isotropy in the Contextual Embedding Space: Clusters and Manifolds
- Iterated learning for emergent systematicity in VQA
- Iterative Empirical Game Solving via Single Policy Best Response
- Kanerva++: Extending the Kanerva Machine With Differentiable, Locally Block Allocated Latent Memory
- Knowledge Distillation as Semiparametric Inference
- Knowledge distillation via softmax regression representation learning
- LambdaNetworks: Modeling long-range Interactions without Attention
- Language-Agnostic Representation Learning of Source Code from Structure and Context
- Large Associative Memory Problem in Neurobiology and Machine Learning
- Large Batch Simulation for Deep Reinforcement Learning
- Large Scale Image Completion via Co-Modulated Generative Adversarial Networks
- Large-width functional asymptotics for deep Gaussian neural networks
- Latent Convergent Cross Mapping
- Latent Skill Planning for Exploration and Transfer
- Layer-adaptive Sparsity for the Magnitude-based Pruning
- LEAF: A Learnable Frontend for Audio Classification
- Learnable Embedding sizes for Recommender Systems
- Learning Accurate Entropy Model with Global Reference for Image Compression
- Learning advanced mathematical computations from examples
- Learning a Latent Search Space for Routing Problems using Variational Autoencoders
- Learning a Latent Simplex in Input Sparsity Time
- Learning A Minimax Optimizer: A Pilot Study
- Learning and Evaluating Representations for Deep One-Class Classification
- Learning Associative Inference Using Fast Weight Memory
- Learning-based Support Estimation in Sublinear Time
- Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing
- Learning continuous-time PDEs from sparse data with graph neural networks
- Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency
- Learning Deep Features in Instrumental Variable Regression
- Learning Energy-Based Generative Models via Coarse-to-Fine Expanding and Sampling
- Learning Energy-Based Models by Diffusion Recovery Likelihood
- Learning explanations that are hard to vary
- Learning from Demonstration with Weakly Supervised Disentanglement
- Learning from others' mistakes: Avoiding dataset biases without modeling them
- Learning from Protein Structure with Geometric Vector Perceptrons
- Learning Generalizable Visual Representations via Interactive Gameplay
- Learning Hyperbolic Representations of Topological Features
- Learning Incompressible Fluid Dynamics from Scratch - Towards Fast, Differentiable Fluid Models that Generalize
- Learning Invariant Representations for Reinforcement Learning without Reconstruction
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks
- Learning Manifold Patch-Based Representations of Man-Made Shapes
- Learning Mesh-Based Simulation with Graph Networks
- Learning Neural Event Functions for Ordinary Differential Equations
- Learning Neural Generative Dynamics for Molecular Conformation Generation
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
- Learning Parametrised Graph Shift Operators
- Learning perturbation sets for robust machine learning
- Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
- Learning Robust State Abstractions for Hidden-Parameter Block MDPs
- Learning Safe Multi-agent Control with Decentralized Neural Barrier Certificates
- Learning Structural Edits via Incremental Tree Transformations
- Learning Subgoal Representations with Slow Dynamics
- Learning Task Decomposition with Ordered Memory Policy Network
- Learning Task-General Representations with Generative Neuro-Symbolic Modeling
- Learning the Pareto Front with Hypernetworks
- Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation
- Learning to Generate 3D Shapes with Generative Cellular Automata
- Learning to live with Dale's principle: ANNs with separate excitatory and inhibitory units
- Learning to Make Decisions via Submodular Regularization
- Learning to Reach Goals via Iterated Supervised Learning
- Learning to Recombine and Resample Data For Compositional Generalization
- Learning to Represent Action Values as a Hypergraph on the Action Vertices
- Learning to Sample with Local and Global Contexts in Experience Replay Buffer
- Learning to Set Waypoints for Audio-Visual Navigation
- Learning Value Functions in Deep Policy Gradients using Residual Variance
- Learning "What-if" Explanations for Sequential Decision-Making
- Learning What To Do by Simulating the Past
- Learning with AMIGo: Adversarially Motivated Intrinsic Goals
- Learning with Feature-Dependent Label Noise: A Progressive Approach
- Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
- Lifelong Learning of Compositional Structures
- LiftPool: Bidirectional ConvNet Pooling
- Linear Convergent Decentralized Optimization with Compression
- Linear Last-iterate Convergence in Constrained Saddle-point Optimization
- Linear Mode Connectivity in Multitask and Continual Learning
- Lipschitz Recurrent Neural Networks
- Local Convergence Analysis of Gradient Descent Ascent with Finite Timescale Separation
- Locally Free Weight Sharing for Network Width Search
- Local Search Algorithms for Rank-Constrained Convex Optimization
- Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning
- Long Range Arena: A Benchmark for Efficient Transformers
- Long-tailed Recognition by Routing Diverse Distribution-Aware Experts
- Long-tail learning via logit adjustment
- Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search
- Lossless Compression of Structured Convolutional Models via Lifting
- LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition
- Machine Learning for Preventing and Combating Pandemics
- MALI: A memory efficient and reverse accurate integrator for Neural ODEs
- Mapping the Timescale Organization of Neural Language Models
- MARS: Markov Molecular Sampling for Multi-objective Drug Discovery
- Mastering Atari with Discrete World Models
- Mathematical Reasoning via Self-supervised Skip-tree Training
- Measuring Massive Multitask Language Understanding
- MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning
- Memory Optimization for Deep Networks
- Meta Back-Translation
- Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
- Meta-Learning of Structured Task Distributions in Humans and Machines
- Meta-learning Symmetries by Reparameterization
- Meta-learning with negative learning rates
- Meta-Learning with Neural Tangent Kernels
- MetaNorm: Learning to Normalize Few-Shot Batches Across Domains
- MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering
- Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable Models
- Mind the Pad -- CNNs Can Develop Blind Spots
- Minimum Width for Universal Approximation
- Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity
- Mixed-Features Vectors and Subspace Splitting
- MixKD: Towards Efficient Distillation of Large-scale Language Models
- MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space
- Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?
- Model-Based Offline Planning
- Model-Based Visual Planning with Self-Supervised Functional Distances
- Modeling the Second Player in Distributionally Robust Optimization
- Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System
- Model Patching: Closing the Subgroup Performance Gap with Data Augmentation
- Molecule Optimization by Explainable Evolution
- MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training
- Monotonic Kronecker-Factored Lattice
- Monte-Carlo Planning and Learning with Language Action Value Estimates
- MoPro: Webly Supervised Learning with Momentum Prototypes
- More or Less: When and How to Build Convolutional Neural Network Ensembles
- MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
- Moving beyond the fairness rhetoric in machine learning
- Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning
- Multi-Level Local SGD: Distributed SGD for Heterogeneous Hierarchical Networks
- MultiModalQA: complex question answering over text, tables and images
- Multiplicative Filter Networks
- Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network
- Multi-resolution modeling of a discrete stochastic process identifies causes of cancer
- Multiscale Score Matching for Out-of-Distribution Detection
- Multi-Time Attention Networks for Irregularly Sampled Time Series
- Multi-timescale Representation Learning in LSTM Language Models
- Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows
- Mutual Information State Intrinsic Control
- My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control
- NAS-Bench-ASR: Reproducible Neural Architecture Search for Speech Recognition
- NBDT: Neural-Backed Decision Tree
- Nearest Neighbor Machine Translation
- Negative Data Augmentation
- NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation
- Net-DNF: Effective Deep Modeling of Tabular Data
- Network Pruning That Matters: A Case Study on Retraining Variants
- Neural Approximate Sufficient Statistics for Implicit Models
- Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective
- Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks
- Neural Compression: From Information Theory to Applications
- Neural Conversational AI: Bridging the Gap Between Research and Real World (NeuCAIR)
- Neural Delay Differential Equations
- Neural gradients are near-lognormal: improved quantized and sparse training
- Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering
- Neural Learning of One-of-Many Solutions for Combinatorial Problems in Structured Output Spaces
- Neurally Augmented ALISTA
- Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
- Neural Networks for Learning Counterfactual G-Invariances from Single Environments
- Neural networks with late-phase weights
- Neural ODE Processes
- Neural Pruning via Growing Regularization
- Neural representation and generation for RNA secondary structures
- Neural Spatio-Temporal Point Processes
- Neural Synthesis of Binaural Speech From Mono Audio
- Neural Thompson Sampling
- Neural Topic Model via Optimal Transport
- New Bounds For Distributed Mean Estimation and Variance Reduction
- No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks
- Noise against noise: stochastic label noise helps combat inherent label noise
- Noise or Signal: The Role of Image Backgrounds in Object Recognition
- No MCMC for me: Amortized sampling for fast and stable training of energy-based models
- Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds
- Nonseparable Symplectic Neural Networks
- not-MIWAE: Deep Generative Modelling with Missing not at Random Data
- NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-end Learning and Control
- Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
- Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation
- On Data-Augmentation and Consistency-Based Semi-Supervised Learning
- On Dyadic Fairness: Exploring and Mitigating Bias in Graph Connections
- One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
- On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning
- On Graph Neural Networks versus Graph-Augmented MLPs
- On InstaHide, Phase Retrieval, and Sparse Matrix Factorization
- On Learning Universal Representations Across Languages
- Online Adversarial Purification based on Self-supervised Learning
- On Position Embeddings in BERT
- On Self-Supervised Image Representations for GAN Evaluation
- On Statistical Bias In Active Learning: How and When to Fix It
- On the Bottleneck of Graph Neural Networks and its Practical Implications
- On the Critical Role of Conventions in Adaptive Human-AI Collaboration
- On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis
- On the Dynamics of Training Attention Models
- On the geometry of generalization and memorization in deep neural networks
- On the Impossibility of Global Convergence in Multi-Loss Optimization
- On the mapping between Hopfield networks and Restricted Boltzmann Machines
- On the Origin of Implicit Regularization in Stochastic Gradient Descent
- On the role of planning in model-based deep reinforcement learning
- On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
- On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers
- On the Transfer of Disentangled Representations in Realistic Settings
- On the Universality of Rotation Equivariant Point Cloud Networks
- On the Universality of the Double Descent Peak in Ridgeless Regression
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
- Open Question Answering over Tables and Text
- Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime
- Optimal Regularization can Mitigate Double Descent
- Optimism in Reinforcement Learning with Generalized Linear Function Approximation
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
- Orthogonalizing Convolutional Layers with the Cayley Transform
- Overfitting for Fun and Profit: Instance-Adaptive Data Compression
- Overparameterisation and worst-case generalisation: friend or foe?
- PAC Confidence Predictions for Deep Neural Network Classifiers
- Parameter-Based Value Functions
- Parameter Efficient Multimodal Transformers for Video Representation Learning
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning
- Partitioned Learned Bloom Filters
- PC2WF: 3D Wireframe Reconstruction from Raw Point Clouds
- PDE-Driven Spatiotemporal Disentanglement
- Perceiving the 3D World from Images and Video
- Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
- Personalized Federated Learning with First Order Model Optimization
- Physics-aware, probabilistic model order reduction with guaranteed stability
- Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks
- Planning from Pixels using Inverse Dynamics Models
- PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics
- PMI-Masking: Principled masking of correlated spans
- PolarNet: Learning to Optimize Polar Keypoints for Keypoint Based Object Detection
- Policy-Driven Attack: Learning to Query for Hard-label Black-box Adversarial Examples
- Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design
- Practical Real Time Recurrent Learning with a Sparse Approximation
- Predicting Classification Accuracy When Adding New Unobserved Classes
- Predicting Inductive Biases of Pre-Trained Models
- Predicting Infectiousness for Proactive Contact Tracing
- Prediction and generalisation over directed actions by grid cells
- Pre-training Text-to-Text Transformers for Concept-centric Common Sense
- Primal Wasserstein Imitation Learning
- Private Image Reconstruction from System Side Channels Using Generative Models
- Private Post-GAN Boosting
- Probabilistic Numeric Convolutional Neural Networks
- Probing BERT in Hyperbolic Spaces
- Progressive Skeletonization: Trimming more fat from a network at initialization
- Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows
- Property Controllable Variational Autoencoder via Invertible Mutual Dependence
- Protecting DNNs from Theft using an Ensemble of Diverse Models
- Prototypical Contrastive Learning of Unsupervised Representations
- Prototypical Representation Learning for Relation Extraction
- Provable Rich Observation Reinforcement Learning with Combinatorial Latent States
- Provably robust classification of adversarial examples with detection
- Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry
- Pruning Neural Networks at Initialization: Why Are We Missing the Mark?
- PseudoSeg: Designing Pseudo Labels for Semantic Segmentation
- PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences
- QPLEX: Duplex Dueling Multi-Agent Q-Learning
- Quantifying Differences in Reward Functions
- Random Feature Attention
- Randomized Automatic Differentiation
- Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
- Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator
- Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets
- Rapid Task-Solving in Novel Environments
- Recurrent Independent Mechanisms
- Reducing the Computational Cost of Deep Generative Models with Binary Neural Networks
- Refining Deep Generative Models via Discriminator Gradient Flow
- Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control
- Regularized Inverse Reinforcement Learning
- Reinforcement Learning with Random Delays
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models
- Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting
- Removing Undesirable Feature Contributions Using Out-of-Distribution Data
- Representation Balancing Offline Model-based Reinforcement Learning
- Representation learning for improved interpretability and classification accuracy of clinical factors from EEG
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
- Representation Learning via Invariant Causal Mechanisms
- Representing Partial Programs with Blended Abstract Semantics
- Repurposing Pretrained Models for Robust Out-of-domain Few-Shot Learning
- Reset-Free Lifelong Learning with Skill-Space Planning
- ResNet After All: Neural ODEs and Their Numerical Solution
- Responsible AI (RAI)
- Rethinking Architecture Selection in Differentiable NAS
- Rethinking Attention with Performers
- Rethinking Embedding Coupling in Pre-trained Language Models
- Rethinking Positional Encoding in Language Pre-training
- Rethinking Soft Labels for Knowledge Distillation: A Bias–Variance Tradeoff Perspective
- Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability
- Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
- Return-Based Contrastive Representation Learning for Reinforcement Learning
- Revisiting Dynamic Convolution via Matrix Decomposition
- Revisiting Few-sample BERT Fine-tuning
- Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
- Revisiting Locally Supervised Learning: an Alternative to End-to-end Training
- Reweighting Augmented Samples by Minimizing the Maximal Expected Loss
- R-GAP: Recursive Gradient Attack on Privacy
- Ringing ReLUs: Harmonic Distortion Analysis of Nonlinear Feedforward Networks
- Risk-Averse Offline Reinforcement Learning
- RMSprop converges with proper hyper-parameter
- RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs
- Robust and Generalizable Visual Representation Learning via Random Convolutions
- Robust and reliable machine learning in the real world
- Robust Curriculum Learning: from clean label detection to noisy label self-correction
- Robust early-learning: Hindering the memorization of noisy labels
- Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time
- Robust Overfitting may be mitigated by properly learned smoothening
- Robust Pruning at Initialization
- Robust Reinforcement Learning on State Observations with Learned Optimal Adversary
- RODE: Learning Roles to Decompose Multi-Agent Tasks
- S2D-OLAD: From shallow to deep, overcoming limited and adverse data
- SAFENet: A Secure, Accurate and Fast Neural Network Inference
- SALD: Sign Agnostic Learning with Derivatives
- Saliency is a Possible Red Herring When Diagnosing Poor Generalization
- SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization
- Sample-Efficient Automated Deep Reinforcement Learning
- Scalable Bayesian Inverse Reinforcement Learning
- Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
- Scalable Transfer Learning with Expert Models
- Scaling Symbolic Methods using Gradients for Neural Model Explanation
- Scaling the Convex Barrier with Active Sets
- Science and Engineering of Deep Learning
- Score-Based Generative Modeling through Stochastic Differential Equations
- SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing
- Security and Safety in Machine Learning Systems
- SEDONA: Search for Decoupled Neural Networks toward Greedy Block-wise Learning
- SEED: Self-supervised Distillation For Visual Representation
- Selective Classification Can Magnify Disparities Across Groups
- Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs
- Self-supervised Adversarial Robustness for the Low-label, High-data Regime
- Self-supervised Learning from a Multi-view Perspective
- Self-Supervised Learning of Compressed Video Representations
- Self-Supervised Policy Adaptation during Deployment
- Self-supervised Representation Learning with Relative Predictive Coding
- Self-supervised Visual Reinforcement Learning with Object-centric Representations
- Self-Supervision for Learning from the Bottom Up
- Self-Supervision for Reinforcement Learning
- Self-training For Few-shot Transfer Across Extreme Task Differences
- Semantic Re-tuning with Contrastive Tension
- Semi-supervised Keypoint Localization
- SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness
- Separation and Concentration in Deep Networks
- Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections
- Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy
- Set Prediction without Imposing Structure as Conditional Density Estimation
- Shape or Texture: Understanding Discriminative Features in CNNs
- Shape-Texture Debiased Neural Network Training
- Shapley explainability on the data manifold
- Shapley Explanation Networks
- Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
- Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions
- Sharpness-aware Minimization for Efficiently Improving Generalization
- Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU
- Simple Augmentation Goes a Long Way: ADRL for DNN Quantization
- Simple Spectral Graph Convolution
- Single-Photon Image Classification
- Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
- SkipW: Resource Adaptable RNN with Strict Upper Computational Limit
- Sliced Kernelized Stein Discrepancy
- SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments
- Soft bodied robots for human centered design of robots for everyday life
- SOLAR: Sparse Orthogonal Learned and Random Embeddings
- Solving Compositional Reinforcement Learning Problems via Task Reduction
- Sparse encoding for more-interpretable feature-selecting representations in probabilistic matrix factorization
- Sparse Quantized Spectral Clustering
- Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling
- Spatially Structured Recurrent Modules
- Spatio-Temporal Graph Scattering Transform
- SSD: A Unified Framework for Self-Supervised Outlier Detection
- Stabilized Medical Image Attacks
- Statistical inference for individual fairness
- Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models
- Structured Prediction as Translation between Augmented Natural Languages
- Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning
- Support-set bottlenecks for video-text representation learning
- Symmetry-Aware Actor-Critic for 3D Molecular Design
- Synthetic Data Generation: Quality, Privacy, Bias
- Systematic generalisation with group invariant predictions
- Taking Notes on the Fly Helps Language Pre-Training
- Taming GANs with Lookahead-Minmax
- Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits
- Task-Agnostic Morphology Evolution
- Teaching Temporal Logics to Neural Networks
- Teaching with Commentaries
- Temporally-Extended ε-Greedy Exploration
- Tent: Fully Test-Time Adaptation by Entropy Minimization
- Text Generation by Learning from Demonstrations
- The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
- The geometry of integration in text classification RNNs
- The Importance of Pessimism in Fixed-Dataset Policy Optimization
- The inductive bias of ReLU networks on orthogonally separable data
- The Intrinsic Dimension of Images and Its Impact on Learning
- Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data
- Theoretical bounds on estimation error for meta-learning
- The Recurrent Neural Tangent Kernel
- The Risks of Invariant Risk Minimization
- The role of Disentanglement in Generalisation
- The Role of Mathematical Reasoning in General Artificial Intelligence
- The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods
- The Traveling Observer Model: Multi-task Learning Through Spatial Variable Embeddings
- The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods
- Tilted Empirical Risk Minimization
- Tomographic Auto-Encoder: Unsupervised Bayesian Recovery of Corrupted Data
- Topology-Aware Segmentation Using Discrete Morse Theory
- Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
- Towards Impartial Multi-task Learning
- Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding
- Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning
- Towards Robustness Against Natural Language Word Substitutions
- Towards Robust Neural Networks via Close-loop Control
- Tradeoffs in Data Augmentation: An Empirical Study
- Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs
- Training GANs with Stronger Augmentations via Contrastive Discriminator
- Training independent subnetworks for robust prediction
- Training with Quantization Noise for Extreme Model Compression
- Trajectory Prediction using Equivariant Continuous Convolution
- Transformer protein language models are unsupervised structure learners
- Transient Non-stationarity and Generalisation in Deep Reinforcement Learning
- TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks
- Trusted Multi-View Classification
- UMEC: Unified model and embedding compression for efficient recommendation systems
- Unbiased Teacher for Semi-Supervised Object Detection
- Uncertainty-aware Active Learning for Optimal Bayesian Classifier
- Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs
- Uncertainty Estimation in Autoregressive Structured Prediction
- Uncertainty in Gradient Boosting via Ensembles
- Uncertainty Sets for Image Classifiers using Conformal Prediction
- Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation
- Understanding Over-parameterization in Generative Adversarial Networks
- Understanding the effects of data parallelism and sparsity on neural network training
- Understanding the failure modes of out-of-distribution generalization
- Understanding the role of importance weighting for deep learning
- Undistillable: Making A Nasty Teacher That CANNOT teach students
- Universal approximation power of deep residual neural networks via nonlinear control theory
- Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning
- Unlearnable Examples: Making Personal Data Unexploitable
- Unsupervised Audiovisual Synthesis via Exemplar Autoencoders
- Unsupervised Discovery of 3D Physical Objects
- Unsupervised Meta-Learning through Latent-Space Interpolation in Generative Models
- Unsupervised Object Keypoint Learning using Local Spatial Predictability
- Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding
- UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers
- Usable Information and Evolution of Optimal Representations During Training
- Using latent space regression to analyze and leverage compositionality in GANs
- VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models
- VA-RED$^2$: Video Adaptive Redundancy Reduction
- Variational Information Bottleneck for Effective Low-Resource Fine-Tuning
- Variational Intrinsic Control Revisited
- Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoF
- VCNet and Functional Targeted Regularization For Learning Causal Effects of Continuous Treatments
- Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms
- Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
- Viewmaker Networks: Learning Views for Unsupervised Representation Learning
- VTNet: Visual Transformer Network for Object Goal Navigation
- Vulnerability-Aware Poisoning Mechanism for Online RL with Unknown Dynamics
- Wandering within a world: Online contextualized few-shot learning
- WaNet - Imperceptible Warping-based Backdoor Attack
- Wasserstein-2 Generative Networks
- Wasserstein Embedding for Graph Learning
- Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration
- WaveGrad: Estimating Gradients for Waveform Generation
- What are the Statistical Limits of Offline RL with Linear Function Approximation?
- What Can You Learn From Your Muscles? Learning Visual Representation from Human Interactions
- What Makes Instance Discrimination Good for Transfer Learning?
- What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
- What Should Not Be Contrastive in Contrastive Learning
- What they do when in doubt: a study of inductive biases in seq2seq learners
- When Do Curricula Work?
- When does preconditioning help or hurt generalization?
- When Optimizing $f$-Divergence is Robust with Label Noise
- Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
- Why resampling outperforms reweighting for correcting sampling bias with stochastic gradients
- Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching
- Workshop on Distributed and Private Machine Learning
- Workshop on Enormous Language Models: Perspectives and Benchmarks
- Workshop on Learning to Learn
- Workshop on Neural Architecture Search
- Workshop on Weakly Supervised Learning
- WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic
- X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback
- You Only Need Adversarial Supervision for Semantic Image Synthesis
- Zero-Cost Proxies for Lightweight NAS
- Zero-shot Synthesis with Group-Supervised Learning