ICLR 2020 Papers

Computation Reallocation for Object Detection

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?

A Closer Look at the Optimization Landscapes of Generative Adversarial Networks

Hamiltonian Generative Networks

Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well

Convergence of Gradient Methods on Bilinear Zero-Sum Games

Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies

Global Relational Models of Source Code

Continual learning with hypernetworks

Environmental drivers of systematicity and generalization in a situated agent

An Exponential Learning Rate Schedule for Deep Learning

Understanding the Limitations of Conditional Generative Models

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Adversarial AutoAugment

Neural Machine Translation with Universal Visual Representation

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Once for All: Train One Network and Specialize it for Efficient Deployment

Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery

Differentiation of Blackbox Combinatorial Solvers

Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning

Counterfactuals uncover the modular structure of deep generative models

Fast Neural Network Adaptation via Parameter Remapping and Architecture Search

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories

Action Semantics Network: Considering the Effects of Actions in Multiagent Systems

Kernelized Wasserstein Natural Gradient

Learning from Explanations with Neural Execution Tree

Variance Reduction With Sparse Gradients

Batch-shaping for learning conditional channel gated networks

Self-Supervised Learning of Appliance Usage

CAQL: Continuous Action Q-Learning

Domain Adaptive Multibranch Networks

Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning

Mirror-Generative Neural Machine Translation

FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary

Explain Your Move: Understanding Agent Actions Using Salient and Relevant Feature Attribution

Convolutional Conditional Neural Processes

Regularizing activations in neural networks via distribution matching with the Wasserstein metric

VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation

Deep Orientation Uncertainty Learning based on a Bingham Loss

Scale-Equivariant Steerable Networks

The intriguing role of module criticality in the generalization of deep networks

A Theoretical Analysis of the Number of Shots in Few-Shot Learning

Graph Neural Networks Exponentially Lose Expressive Power for Node Classification

Provable Filter Pruning for Efficient Neural Networks

Option Discovery using Deep Skill Chaining

Deep Symbolic Superoptimization Without Human Knowledge

State Alignment-based Imitation Learning

Mogrifier LSTM

Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control

Target-Embedding Autoencoders for Supervised Representation Learning

Fair Resource Allocation in Federated Learning

Causal Discovery with Reinforcement Learning

Geom-GCN: Geometric Graph Convolutional Networks

Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling

Sampling-Free Learning of Bayesian Quantized Neural Networks

On the Relationship between Self-Attention and Convolutional Layers

A Generalized Training Approach for Multiagent Learning

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee

Towards Verified Robustness under Text Deletion Interventions

Mixed Precision DNNs: All you need is a good parametrization

On Computation and Generalization of Generative Adversarial Imitation Learning

Demystifying Inter-Class Disentanglement

Progressive Learning and Disentanglement of Hierarchical Representations

Transferable Perturbations of Deep Feature Distributions

Hypermodels for Exploration

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Semi-Supervised Generative Modeling for Controllable Speech Synthesis

You Only Train Once: Loss-Conditional Training of Deep Networks

Ranking Policy Gradient

Understanding and Robustifying Differentiable Architecture Search

On the interaction between supervision and self-play in emergent communication

Knowledge Consistency between Neural Networks and Beyond

Capsules with Inverted Dot-Product Attention Routing

Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities

Towards Fast Adaptation of Neural Architectures with Meta Learning

Stochastic Conditional Generative Networks with Basis Decomposition

Guiding Program Synthesis by Learning to Generate Examples

HiLLoC: lossless image compression with hierarchical latent variable models

Estimating counterfactual treatment outcomes over time through adversarially balanced representations

Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators

Identifying through Flows for Recovering Latent Representations

Learning Efficient Parameter Server Synchronization Policies for Distributed SGD

Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards

Strategies for Pre-training Graph Neural Networks

Decoupling Representation and Classifier for Long-Tailed Recognition

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

Accelerating SGD with momentum for over-parameterized learning

PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction

Inductive and Unsupervised Representation Learning on Graph Structured Objects

GraphSAINT: Graph Sampling Based Inductive Learning Method

Non-Autoregressive Dialog State Tracking

Disentangling neural mechanisms for perceptual grouping

A Probabilistic Formulation of Unsupervised Text Style Transfer

MEMO: A Deep Network for Flexible Combination of Episodic Memories

Neural Stored-program Memory

Asymptotics of Wide Networks from Feynman Diagrams

Optimistic Exploration even with a Pessimistic Initialisation

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Duration-of-Stay Storage Assignment under Uncertainty

Continual Learning with Bayesian Neural Networks for Non-Stationary Data

Language GANs Falling Short

Neural Tangents: Fast and Easy Infinite Neural Networks in Python

Fooling Detection Alone is Not Enough: Adversarial Attack against Multiple Object Tracking

Gap-Aware Mitigation of Gradient Staleness

Finite Depth and Width Corrections to the Neural Tangent Kernel

Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

SCALOR: Generative World Models with Scalable Object Representations

ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring

Graph Constrained Reinforcement Learning for Natural Language Action Spaces

Learning Robust Representations via Multi-View Information Bottleneck

Dynamics-Aware Unsupervised Skill Discovery

Ridge Regression: Structure, Cross-Validation, and Sketching

Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks

Feature Interaction Interpretability: A Case for Explaining Ad-Recommendation Systems via Neural Interaction Detection

End to End Trainable Active Contours via Differentiable Rendering

Learning Disentangled Representations for CounterFactual Regression

Symplectic Recurrent Neural Networks

RNA Secondary Structure Prediction By Learning Unrolled Algorithms

BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations

Training individually fair ML models with sensitive subspace robustness

Mixed-curvature Variational Autoencoders

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

Meta-Learning with Warped Gradient Descent

Towards a Deep Network Architecture for Structured Smoothness

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

Locally Constant Networks

Phase Transitions for the Information Bottleneck in Representation Learning

Lookahead: A Far-sighted Alternative of Magnitude-based Pruning

Decentralized Deep Learning with Arbitrary Communication Compression

Fast is better than free: Revisiting adversarial training

Structured Object-Aware Physics Prediction for Video Modeling and Planning

Fast Task Inference with Variational Intrinsic Successor Features

AE-OT: A New Generative Model Based on Extended Semi-Discrete Optimal Transport

Generative Ratio Matching Networks

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

The Early Phase of Neural Network Training

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples

Pay Attention to Features, Transfer Learn Faster CNNs

Few-shot Text Classification with Distributional Signatures

Memory-Based Graph Networks

On the Equivalence between Positional Node Embeddings and Structural Graph Representations

Sub-policy Adaptation for Hierarchical Reinforcement Learning

DBA: Distributed Backdoor Attacks against Federated Learning

Gradient-Based Neural DAG Learning

Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base

A closer look at the approximation capabilities of neural networks

Black-Box Adversarial Attack with Transferable Model-based Embedding

Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach

Self-labelling via simultaneous clustering and representation learning

Classification-Based Anomaly Detection for General Data

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

RNNs Incrementally Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?

Implementing Inductive bias for different navigation tasks through diverse RNN attractors

Neural Text Generation With Unlikelihood Training

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps

Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness

Spike-based causal inference for weight alignment

Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

Empirical Studies on the Properties of Linear Regions in Deep Neural Networks

Expected Information Maximization: Using the I-Projection for Mixture Density Estimation

Meta-learning curiosity algorithms

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games

Learning Execution Through Neural Code Fusion

Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering

Learning from Rules Generalizing Labeled Exemplars

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

Automated Relational Meta-learning

Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs

RaPP: Novelty Detection with Reconstruction along Projection Pathway

Neural Execution of Graph Algorithms

Compositional Language Continual Learning

Observational Overfitting in Reinforcement Learning

Understanding Knowledge Distillation in Non-autoregressive Machine Translation

Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees

TabFact: A Large-scale Dataset for Table-based Fact Verification

Neural Arithmetic Units

Reconstructing continuous distributions of 3D protein structure from cryo-EM images

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

SELF: Learning to Filter Noisy Labels with Self-Ensembling

Robust Reinforcement Learning for Continuous Control with Model Misspecification

GenDICE: Generalized Offline Estimation of Stationary Values

Unsupervised Model Selection for Variational Disentangled Representation Learning

Robust training with ensemble consensus

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin

Functional vs. parametric equivalence of ReLU networks

Robust Subspace Recovery Layer for Unsupervised Anomaly Detection

A Constructive Prediction of the Generalization Error Across Scales

Learning deep graph matching with channel-independent embedding and Hungarian attention

Learning To Explore Using Active Neural SLAM

BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds

Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks

Rényi Fair Inference

SAdam: A Variant of Adam for Strongly Convex Functions

SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs

Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension

Adversarial Training and Provable Defenses: Bridging the Gap

Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

Composing Task-Agnostic Policies with Deep Reinforcement Learning

NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension

Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information

Semantically-Guided Representation Learning for Self-Supervised Monocular Depth

Vid2Game: Controllable Characters Extracted from Real-World Videos

Network Deconvolution

Pure and Spurious Critical Points: a Geometric Study of Linear Networks

Improving Generalization in Meta Reinforcement Learning using Learned Objectives

Meta Dropout: Learning to Perturb Latent Features for Generalization

A Theory of Usable Information under Computational Constraints

Gradientless Descent: High-Dimensional Zeroth-Order Optimization

Measuring and Improving the Use of Graph Information in Graph Neural Networks

Few-Shot Learning on Graphs via Super-Classes Based on Graph Spectral Measures

Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning

A Learning-based Iterative Method for Solving Vehicle Routing Problems

Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness

Understanding the Limitations of Variational Mutual Information Estimators

Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning

Relational State-Space Model for Stochastic Multi-Object Systems

Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient

Unrestricted Adversarial Examples via Semantic Manipulation

Data-Independent Neural Pruning via Coresets

Your classifier is secretly an energy based model and you should treat it like one

Generalization through Memorization: Nearest Neighbor Language Models

Piecewise linear activations substantially shape the loss surfaces of neural networks

Contrastive Learning of Structured World Models

On the Variance of the Adaptive Learning Rate and Beyond

Scalable Model Compression by Entropy Penalized Reparameterization

Ensemble Distribution Distillation

Low-Resource Knowledge-Grounded Dialogue Generation

Novelty Detection Via Blurring

A Signal Propagation Perspective for Pruning Neural Networks at Initialization

GLAD: Learning Sparse Graph Recovery

Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning

Rethinking the Hyperparameters for Fine-tuning

Dynamics-Aware Embeddings

Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models

Robustness Verification for Transformers

Improved memory in recurrent neural networks with sequential non-normal dynamics

Locality and Compositionality in Zero-Shot Learning

Extreme Classification via Adversarial Softmax Approximation

Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality

The Gambler's Problem and Beyond

DropEdge: Towards Deep Graph Convolutional Networks on Node Classification

Interpretable Complex-Valued Neural Networks for Privacy Protection

Geometric Insights into the Convergence of Nonlinear TD Learning

Learning Compositional Koopman Operators for Model-Based Control

On the Need for Topology-Aware Generative Models for Manifold-Based Defenses

Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks

Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

Robust anomaly detection and backdoor attack detection via differential privacy

Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents

Conditional Learning of Fair Representations

Infinite-Horizon Differentiable Model Predictive Control

Measuring the Reliability of Reinforcement Learning Algorithms

Disagreement-Regularized Imitation Learning

Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP

NAS evaluation is frustratingly hard

Curvature Graph Network

Compositional languages emerge in a neural iterated learning model

Depth-Adaptive Transformer

Plug and Play Language Models: A Simple Approach to Controlled Text Generation

NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search

Self-Adversarial Learning with Comparative Discrimination for Text Generation

Model Based Reinforcement Learning for Atari

Stochastic AUC Maximization with Deep Neural Networks

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network

Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control

GAT: Generative Adversarial Training for Adversarial Example Detection and Classification

DeepSphere: a graph-based spherical CNN

Learning-Augmented Data Stream Algorithms

Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN)

Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks

The Shape of Data: Intrinsic Distance for Data Distributions

Implementation Matters in Deep RL: A Case Study on PPO and TRPO

Skip Connections Matter: On the Transferability of Adversarial Examples Generated with ResNets

Universal Approximation with Certified Networks

Deep Semi-Supervised Anomaly Detection

BayesOpt Adversarial Attack

Encoding word order in complex embeddings

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks

To Relieve Your Headache of Training an MRF, Take AdVIL

Learning to Link

Federated Learning with Matched Averaging

BERTScore: Evaluating Text Generation with BERT

Dream to Control: Learning Behaviors by Latent Imagination

What graph neural networks cannot learn: depth vs width

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

Efficient Probabilistic Logic Reasoning with Graph Neural Networks

Breaking Certified Defenses: Semantic Adversarial Examples With Spoofed Robustness Certificates

Biologically inspired sleep algorithm for increased generalization and adversarial robustness in deep neural networks

Deep neuroethology of a virtual rodent

DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving

Dynamic Time Lag Regression: Predicting What & When

On Mutual Information Maximization for Representation Learning

Lite Transformer with Long-Short Range Attention

Adversarial Policies: Attacking Deep Reinforcement Learning

A critical analysis of self-supervision, or what we can learn from a single image

Discovering Motor Programs by Recomposing Demonstrations

PairNorm: Tackling Oversmoothing in GNNs

On the Global Convergence of Training Deep Linear ResNets

Defending Against Physically Realizable Attacks on Image Classification

Learning to Coordinate Manipulation Skills via Skill Behavior Diversification

Learning the Arrow of Time for Problems in Reinforcement Learning

Deep probabilistic subsampling for task-adaptive compressed sensing

Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks

Multiplicative Interactions and Where to Find Them

And the Bit Goes Down: Revisiting the Quantization of Neural Networks

Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning

Curriculum Loss: Robust Learning and Generalization against Label Corruption

PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search

On the Weaknesses of Reinforcement Learning for Neural Machine Translation

Contrastive Representation Distillation

Dynamic Model Pruning with Feedback

Sign Bits Are All You Need for Black-Box Attacks

Building Deep Equivariant Capsule Networks

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

B-Spline CNNs on Lie groups

Span Recovery for Deep Neural Networks with Applications to Input Obfuscation

Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model

Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations

Critical initialisation in continuous approximations of binary neural networks

Learn to Explain Efficiently via Neural Logic Inductive Learning

Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention

SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes

Massively Multilingual Sparse Word Representations

Deep Audio Priors Emerge From Harmonic Convolutional Networks

The Local Elasticity of Neural Networks

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

Uncertainty-guided Continual Learning with Bayesian Neural Networks

Differentiable Reasoning over a Virtual Knowledge Base

Understanding and Improving Information Transfer in Multi-Task Learning

StructPool: Structured Graph Pooling via Conditional Random Fields

Generative Models for Effective ML on Private, Decentralized Datasets

Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

Energy-based models for atomic-resolution protein conformations

Abductive Commonsense Reasoning

Training binary neural networks with real-to-binary convolutions

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Discriminative Particle Filter Reinforcement Learning for Complex Partial observations

Thieves on Sesame Street! Model Extraction of BERT-based APIs

High Fidelity Speech Synthesis with Adversarial Networks

Program Guided Agent

Sharing Knowledge in Multi-Task Deep Reinforcement Learning

Controlling generative models with continuous factors of variations

Federated Adversarial Domain Adaptation

Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control

AdvectiveNet: An Eulerian-Lagrangian Fluidic Reservoir for Point Cloud Processing

State-only Imitation with Transition Dynamics Mismatch

Lipschitz constant estimation of Neural Networks via sparse polynomial optimization

GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

A Fair Comparison of Graph Neural Networks for Graph Classification

Deep Imitative Models for Flexible Inference, Planning, and Control

Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History

Distance-Based Learning from Errors for Confidence Calibration

Training Recurrent Neural Networks Online by Learning Explicit State Variables

Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings

CoPhy: Counterfactual Learning of Physical Dynamics

ES-MAML: Simple Hessian-Free Meta Learning

Combining Q-Learning and Search with Amortized Value Estimates

Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction

Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks

Towards Stable and Efficient Training of Verifiably Robust Neural Networks

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence

The Break-Even Point on Optimization Trajectories of Deep Neural Networks

Adjustable Real-time Style Transfer

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

Learned Step Size Quantization

Low-dimensional statistical manifold embedding of directed graphs

Query-efficient Meta Attack to Deep Neural Networks

Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform

Learning Expensive Coordination: An Event-Based Deep RL Approach

Adversarial Lipschitz Regularization

Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$

Effect of Activation Functions on the Training of Overparametrized Neural Nets

Transferring Optimality Across Data Distributions via Homotopy Methods

Population-Guided Parallel Policy Search for Reinforcement Learning

Explanation by Progressive Exaggeration

Enhancing Adversarial Defense by k-Winners-Take-All

Enhancing Transformation-Based Defenses Against Adversarial Attacks with a Distribution Classifier

Making Sense of Reinforcement Learning and Probabilistic Inference

Tree-Structured Attention with Hierarchical Accumulation

Optimal Strategies Against Generative Attacks

Deep Network Classification by Scattering and Homotopy Dictionary Learning

Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics

Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

On Universal Equivariant Set Networks

Neural tangent kernels, transportation mappings, and universal approximation

CLN2INV: Learning Loop Invariants with Continuous Logic Networks

Neural Epitome Search for Architecture-Agnostic Network Compression

Episodic Reinforcement Learning with Associative Memory

Improving Neural Language Generation with Spectrum Control

In Search for a SAT-friendly Binarized Neural Network Architecture

CLEVRER: Collision Events for Video Representation and Reasoning

Understanding Generalization in Recurrent Neural Networks

Harnessing Structures for Value-Based Planning and Reinforcement Learning

Multilingual Alignment of Contextual Word Representations

Mathematical Reasoning in Latent Space

Smooth markets: A basic mechanism for organizing gradient-based learners

Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework

Learning to solve the credit assignment problem

Permutation Equivariant Models for Compositional Generalization in Language

The Logical Expressiveness of Graph Neural Networks

Reducing Transformer Depth on Demand with Structured Dropout

On Identifiability in Transformers

Overlearning Reveals Sensitive Attributes

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Mutual Information Gradient Estimation for Representation Learning

CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning

Lagrangian Fluid Simulation with Continuous Convolutions

Pruned Graph Scattering Transforms

Influence-Based Multi-Agent Exploration

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling

DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

Four Things Everyone Should Know to Improve Batch Normalization

On Bonus Based Exploration Methods In The Arcade Learning Environment

Weakly Supervised Clustering by Exploiting Unique Class Count

Theory and Evaluation Metrics for Learning Disentangled Representations

A Target-Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning

Rotation-invariant clustering of neuronal responses in primary visual cortex

Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation

EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks

Intrinsic Motivation for Encouraging Synergistic Behavior

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models

Data-dependent Gaussian Prior Objective for Language Generation

Intensity-Free Learning of Temporal Point Processes

GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding

Masked Based Unsupervised Content Transfer

Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

Truth or backpropaganda? An empirical investigation of deep learning theory

Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks

Exploring Model-based Planning with Policy Networks

Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation

Improving Adversarial Robustness Requires Revisiting Misclassified Examples

Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning

Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization

FasterSeg: Searching for Faster Real-time Semantic Segmentation

Jacobian Adversarially Regularized Networks for Robustness

Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Monotonic Multihead Attention

Residual Energy-Based Models for Text Generation

Fantastic Generalization Measures and Where to Find Them

Adversarially robust transfer learning

Adversarially Robust Representations with Smooth Encoders

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

Graph inference learning for semi-supervised classification

Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth

Revisiting Self-Training for Neural Sequence Generation

Imitation Learning via Off-Policy Distribution Matching

Iterative energy-based projection on a normal data manifold for anomaly localization

A Closer Look at Deep Policy Gradients

Tensor Decompositions for Temporal Knowledge Base Completion

Progressive Memory Banks for Incremental Domain Adaptation

IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks

AtomNAS: Fine-Grained End-to-End Neural Architecture Search

Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning

The Ingredients of Real World Robotic Reinforcement Learning

Frequency-based Search-control in Dyna

Exploration in Reinforcement Learning with Deep Covering Options

Projection-Based Constrained Policy Optimization

On Robustness of Neural Ordinary Differential Equations

Generalization bounds for deep convolutional neural networks

Learning to Control PDEs with Differentiable Physics

Real or Not Real, that is the Question

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

MMA Training: Direct Input Space Margin Maximization through Adversarial Training

Editable Neural Networks

Learning to Move with Affordance Maps

Model-Augmented Actor-Critic: Backpropagating through Paths

Multi-Agent Interactions Modeling with Correlated Policies

Model-based reinforcement learning for biological sequence design

Intriguing Properties of Adversarial Training at Scale

Behaviour Suite for Reinforcement Learning

Meta-Q-Learning

Deep Double Descent: Where Bigger Models and More Data Hurt

Understanding Architectures Learnt by Cell-based Neural Architecture Search

Extreme Tensoring for Low-Memory Preconditioning

Differentiable learning of numerical rules in knowledge graphs

Neural Network Branching for Neural Network Verification

Learning representations for binary-classification without backpropagation

NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

Robust Local Features for Improving the Generalization of Adversarial Training

Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks

A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning

Learning Space Partitions for Nearest Neighbor Search

Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning

Principled Weight Initialization for Hypernetworks

Order Learning and Its Application to Age Estimation

Recurrent neural circuits for contour detection

Hyper-SAGNN: a self-attention based graph neural network for hypergraphs

Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions

DiffTaichi: Differentiable Programming for Physical Simulation

Efficient and Information-Preserving Future Frame Prediction and Beyond

Meta-Learning Deep Energy-Based Memory Models

Bayesian Meta Sampling for Fast Uncertainty Adaptation

Restricting the Flow: Information Bottlenecks for Attribution

Multi-agent Reinforcement Learning for Networked System Control

Composition-based Multi-Relational Graph Convolutional Networks

Gradient $\ell_1$ Regularization for Quantization Robustness

Towards neural networks that provably know when they don't know

Reanalysis of Variance Reduced Temporal Difference Learning

Quantum Algorithms for Deep Convolutional Neural Networks

Inductive representation learning on temporal graphs

Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks

Consistency Regularization for Generative Adversarial Networks

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

Learning to Guide Random Search

Emergent Tool Use From Multi-Agent Autocurricula

Can gradient clipping mitigate label noise?

From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

LAMOL: LAnguage MOdeling for Lifelong Language Learning

Sparse Coding with Gated Learned ISTA

FSPool: Learning Set Representations with Featurewise Sort Pooling

Pre-training Tasks for Embedding-based Large-scale Retrieval

Robust And Interpretable Blind Image Denoising Via Bias-Free Convolutional Neural Networks

DeepV2D: Video to Depth with Differentiable Structure from Motion

AMRL: Aggregated Memory For Reinforcement Learning

Learning to Represent Programs with Property Signatures

V4D: 4D Convolutional Neural Networks for Video-level Representation Learning

Selection via Proxy: Efficient Data Selection for Deep Learning

PCMC-Net: Feature-based Pairwise Choice Markov Chains

BackPACK: Packing more into Backprop

Kernel of CycleGAN as a principal homogeneous space

Higher-Order Function Networks for Learning Composable 3D Object Representations

Decoding As Dynamic Programming For Recurrent Autoregressive Models

Variational Recurrent Models for Solving Partially Observable Control Tasks

Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning

VL-BERT: Pre-training of Generic Visual-Linguistic Representations

On the Convergence of FedAvg on Non-IID Data

Never Give Up: Learning Directed Exploration Strategies

Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem

Deep Learning For Symbolic Mathematics

Incorporating BERT into Neural Machine Translation

Escaping Saddle Points Faster with Stochastic Momentum

Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping

Unpaired Point Cloud Completion on Real Scans using Adversarial Training

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

ProxSGD: Training Structured Neural Networks under Regularization and Constraints

Single Episode Policy Transfer in Reinforcement Learning

Don't Use Large Mini-batches, Use Local SGD

Disentangling Factors of Variations Using Few Labels

A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

Stable Rank Normalization for Improved Generalization in Neural Networks and GANs

A Framework for Robustness Certification of Smoothed Classifiers Using F-Divergences

Co-Attentive Equivariant Neural Networks: Focusing Equivariance On Transformations Co-Occurring in Data

Gradients as Features for Deep Representation Learning

Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

The asymptotic spectrum of the Hessian of DNN throughout training

Detecting Extrapolation with Local Ensembles

Spectral Embedding of Regularized Block Models

Empirical Bayes Transductive Meta-Learning with Synthetic Gradients

How much Position Information Do Convolutional Neural Networks Encode?

RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis

Weakly Supervised Disentanglement with Guarantees

Directional Message Passing for Molecular Graphs

Information Geometry of Orthogonal Initializations and Training

BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning

Minimizing FLOPs to Learn Efficient Sparse Representations

Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel

A Mutual Information Maximization Perspective of Language Representation Learning

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Diverse Trajectory Forecasting with Determinantal Point Processes

Graph Convolutional Reinforcement Learning

Picking Winning Tickets Before Training by Preserving Gradient Flow

Unsupervised Clustering using Pseudo-semi-supervised Learning

Learning transport cost from subset correspondence

Scalable and Order-robust Continual Learning with Additive Parameter Decomposition

Scaling Autoregressive Video Models

GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters

Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware

Reformer: The Efficient Transformer

Automatically Discovering and Learning New Visual Categories with Ranking Statistics

Difference-Seeking Generative Adversarial Network--Unseen Sample Generation

Comparing Rewinding and Fine-tuning in Neural Network Pruning

I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively

Are Transformers universal approximators of sequence-to-sequence functions?

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation

RTFM: Generalising to New Environment Dynamics via Reading

Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing

Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference

Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification

You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings

SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models

Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness

Inductive Matrix Completion Based on Graph Neural Networks

LambdaNet: Probabilistic Type Inference using Graph Neural Networks

Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings

Conservative Uncertainty Estimation By Fitting Prior Networks

Compressive Transformers for Long-Range Sequence Modelling

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement

Differentially Private Meta-Learning

Adaptive Structural Fingerprints for Graph Attention Networks

Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data

InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization

Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies

Learning to Learn by Zeroth-Order Oracle

RaCT: Toward Amortized Ranking-Critical Training For Collaborative Filtering

Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators

Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation

Learning The Difference That Makes A Difference With Counterfactually-Augmented Data

White Noise Analysis of Neural Networks

Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems

Neural Outlier Rejection for Self-Supervised Keypoint Learning

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation

Reinforced active learning for image segmentation

On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach

A Stochastic Derivative Free Optimization Method with Momentum

Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue

Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search

Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video

Jelly Bean World: A Testbed for Never-Ending Learning

Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

Neural Module Networks for Reasoning over Text

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum

AutoQ: Automated Kernel-Wise Neural Network Quantization

V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

Meta-Learning without Memorization

DDSP: Differentiable Digital Signal Processing

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

What Can Neural Networks Reason About?

MetaPix: Few-Shot Video Retargeting

Functional Regularisation for Continual Learning with Gaussian Processes

Identity Crisis: Memorization and Generalization Under Extreme Overparameterization

Probability Calibration for Knowledge Graph Embedding Models

Thinking While Moving: Deep Reinforcement Learning with Concurrent Control

Smoothness and Stability in GANs

From Variational to Deterministic Autoencoders

SNOW: Subscribing to Knowledge via Channel Pooling for Transfer & Lifelong Learning of Convolutional Neural Networks

Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification

CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

How to 0wn the NAS in Your Spare Time

Variational Template Machine for Data-to-Text Generation

Deep Graph Matching Consensus

Certified Defenses for Adversarial Patches

The Curious Case of Neural Text Degeneration

Learning Nearly Decomposable Value Functions Via Communication Minimization

Short and Sparse Deconvolution --- A Geometric Approach

Deep 3D Pan via Local adaptive "t-shaped" convolutions with global and local adaptive dilations

Distributionally Robust Neural Networks

DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling

Sign-OPT: A Query-Efficient Hard-label Adversarial Attack

Evaluating The Search Phase of Neural Architecture Search

A Baseline for Few-Shot Image Classification

Abstract Diagrammatic Reasoning with Multiplex Graph Networks

SNODE: Spectral Discretization of Neural ODEs for System Identification

On the "steerability" of generative adversarial networks

SVQN: Sequential Variational Soft Q-Learning Networks

Automated curriculum generation through setter-solver interactions

One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation

Synthesizing Programmatic Policies that Inductively Generalize

The Implicit Bias of Depth: How Incremental Learning Drives Generalization

Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks

Double Neural Counterfactual Regret Minimization

Continual Learning with Adaptive Weights (CLAW)

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

Logic and the 2-Simplicial Transformer

Image-guided Neural Object Rendering