Tue 2:30 a.m. - 4:30 a.m.
|
Practical Applications of Behaviour Suite for Reinforcement Learning
(
Poster
)
link »
Poster Location: MH1-2-3-4 #150 In 2019, researchers at DeepMind published a suite of reinforcement learning environments called Behaviour Suite for Reinforcement Learning, or bsuite. Each environment is designed to directly test a core capability of a general reinforcement learning agent, such as its ability to generalize from past experience or handle delayed rewards. The authors claim that bsuite can be used to benchmark agents and bridge the gap between theoretical and applied understanding of reinforcement learning. In this blog post, we extend their work by providing specific examples of how bsuite can address common challenges faced by reinforcement learning practitioners during the development process. Our work offers pragmatic guidance to researchers and highlights future research directions in reproducible reinforcement learning. |
Loren Anderson · Nathan Bittner 🔗 |
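For readers unfamiliar with bsuite, below is a minimal sketch of interacting with one of its environments through the dm_env-style interface the package exposes; the environment id, episode budget, and random-action policy are placeholders, and API details may vary across bsuite versions.

```python
# Minimal sketch of running a (random) agent on one bsuite environment.
# Assumes bsuite's dm_env-style interface; API details may vary across versions.
import numpy as np
import bsuite

env = bsuite.load_from_id('deep_sea/0')           # one configuration of one core-capability experiment
num_actions = env.action_spec().num_values

for episode in range(100):                        # placeholder episode budget
    timestep = env.reset()
    while not timestep.last():
        action = np.random.randint(num_actions)   # stand-in for a learned policy
        timestep = env.step(action)
# bsuite also provides logging wrappers whose recorded results feed its analysis notebooks.
```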
Tue 2:30 a.m. - 4:30 a.m.
|
A Hitchhiker's Guide to Momentum
(
Poster
)
link »
Poster Location: MH1-2-3-4 #149 Polyak momentum is one of the most iconic methods in optimization. Despite its simplicity, it features rich dynamics that depend on both the step-size and the momentum parameter. In this blog post we identify the different regions of the parameter space and discuss their convergence properties using the theory of Chebyshev polynomials. |
Fabian Pedregosa 🔗 |
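To make the object of study concrete, here is a small self-contained sketch of Polyak (heavy-ball) momentum on a toy quadratic, with the step-size and momentum parameter exposed; the quadratic and the parameter values are illustrative only.

```python
# Illustrative sketch of Polyak (heavy-ball) momentum on a simple quadratic.
import numpy as np

A = np.diag([1.0, 10.0])          # f(x) = 0.5 * x^T A x, eigenvalues 1 and 10
grad = lambda x: A @ x

def heavy_ball(x0, step_size, momentum, iters=50):
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        # x_{k+1} = x_k - step_size * grad(x_k) + momentum * (x_k - x_{k-1})
        x_next = x - step_size * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

x0 = np.array([1.0, 1.0])
for m in [0.0, 0.3, 0.6, 0.9]:    # sweep the momentum parameter; the convergence regime depends on (step, m)
    print(m, np.linalg.norm(heavy_ball(x0, step_size=0.1, momentum=m)))
```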
Tue 2:30 a.m. - 4:30 a.m.
|
Thinking Like Transformers
(
Poster
)
link »
Poster Location: MH1-2-3-4 #167 Thinking Like Transformers proposes a computational framework for Transformer-like calculations. The framework uses discrete computation to simulate Transformer computations. The resulting language, RASP, is a programming language in which every program compiles down to a specific Transformer. In this blog post, we reimplement a variant of RASP in Python (RASPy). The language is roughly compatible with the original version, but with some syntactic changes for simplicity. With this language, we walk through a challenging set of puzzles to understand how Transformer computation works. |
Alexander M Rush · Gail Weiss 🔗 |
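As a flavour of the select/aggregate pattern that RASP-style programs are built from, here is a short illustration in plain Python; it is not the actual RASPy API, only a sketch of the idea that an attention layer can be read as "build a selection pattern, then average the selected values".

```python
# Illustrative sketch (not the actual RASPy API) of RASP-style select/aggregate primitives.
import numpy as np

def select(keys, queries, predicate):
    # Boolean attention pattern: entry [q, k] says whether query position q attends to key position k.
    return np.array([[predicate(k, q) for k in keys] for q in queries])

def aggregate(selector, values):
    # Uniformly average the selected values for each query position.
    out = []
    for row in selector:
        picked = [v for v, s in zip(values, row) if s]
        out.append(sum(picked) / len(picked) if picked else 0)
    return out

# Example: for each position, the fraction of earlier-or-equal tokens equal to "a".
tokens = list("aabba")
indices = list(range(len(tokens)))
prefix = select(indices, indices, lambda k, q: k <= q)   # causal selection pattern
is_a = [1 if t == "a" else 0 for t in tokens]
print(aggregate(prefix, is_a))                            # running fraction of "a"s
```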
Tue 2:30 a.m. - 4:30 a.m.
|
Autoregressive Renaissance in Neural PDE Solvers
(
Poster
)
link »
Poster Location: MH1-2-3-4 #151 Recent developments in the field of neural partial differential equation (PDE) solvers have placed a strong emphasis on neural operators. However, the paper "Message Passing Neural PDE Solver" by Brandstetter et al., published at ICLR 2022, revisits autoregressive models and designs a message passing graph neural network that matches or outperforms both the state-of-the-art Fourier Neural Operator and classical PDE solvers in generalization capability and performance. This blog post delves into the key contributions of this work, exploring the strategies used to address the common problem of instability in autoregressive models and the design choices of the message passing graph neural network architecture. |
Yolanne Y R Lee 🔗 |
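For readers new to the autoregressive framing, the following generic sketch (not the message passing architecture from the paper) shows how a learned one-step solver is rolled out in time; a finite-difference heat-equation step stands in for the trained network, and the feedback of outputs into inputs is exactly where the instability discussed in the post originates.

```python
# Generic sketch of an autoregressive neural PDE rollout (not the paper's architecture):
# a one-step map is applied repeatedly, so prediction errors feed back as inputs.
import numpy as np

def one_step_solver(u, dt=0.01):
    # Stand-in for a trained network: one explicit finite-difference step of the
    # heat equation u_t = nu * u_xx on a periodic 1D grid.
    dx = 1.0 / len(u)
    laplacian = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2
    return u + dt * 0.0001 * laplacian

def rollout(u0, n_steps):
    trajectory = [u0]
    u = u0
    for _ in range(n_steps):
        u = one_step_solver(u)        # output at step t becomes input at step t+1
        trajectory.append(u)
    return np.stack(trajectory)

u0 = np.sin(2 * np.pi * np.linspace(0, 1, 64, endpoint=False))
print(rollout(u0, n_steps=100).shape)  # (101, 64)
```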
Tue 2:30 a.m. - 4:30 a.m.
|
How does the inductive bias influence the generalization capability of neural networks?
(
Poster
)
link »
Poster Location: MH1-2-3-4 #169 Deep neural networks are a commonly used machine learning technique that has proven effective for many different use cases. However, their ability to generalize from training data is not well understood. In this blog post, we explore the paper "Identity Crisis: Memorization and Generalization under Extreme Overparameterization" by Zhang et al. [2020], which aims to shed light on why neural networks are able to generalize and how inductive biases influence their generalization capabilities. |
Charlotte Barth · Thomas Goerttler · Klaus Obermayer 🔗 |
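To make the paper's setup concrete, here is a heavily simplified sketch of the kind of "identity task" probe it analyses, assuming a toy fully connected model and synthetic data rather than the exact protocol of Zhang et al.: an overparameterized network is fit to a single example of the identity map, and its behaviour on unseen inputs reveals whether the architecture's inductive bias pulls it toward the identity function or toward memorising a constant output.

```python
# Simplified sketch of an "identity task" probe (toy setup, not the exact protocol of Zhang et al.):
# fit an overparameterized MLP to a single example of the identity map x -> x,
# then inspect what function it implements on fresh inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 32
model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, d))
x_train = torch.randn(1, d)                      # a single training example

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = ((model(x_train) - x_train) ** 2).mean()
    loss.backward()
    opt.step()

x_test = torch.randn(8, d)
with torch.no_grad():
    # If the net had learned the identity, predictions track x_test;
    # if it memorised, they collapse toward the single training target.
    err_identity = ((model(x_test) - x_test) ** 2).mean()
    err_constant = ((model(x_test) - x_train) ** 2).mean()
print(float(err_identity), float(err_constant))
```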
Tue 2:30 a.m. - 4:30 a.m.
|
How much meta-learning is in image-to-image translation?
(
Poster
)
link »
Poster Location: MH1-2-3-4 #73 At the last ICLR conference, Zhou et al. [2022] presented work showing that CNNs do not transfer information between classes of a classification task (Allan Zhou, Fahim Tajwar, Alexander Robey, Tom Knowles, George J. Pappas, Hamed Hassani, Chelsea Finn, "Do Deep Networks Transfer Invariances Across Classes?", ICLR 2022). Here is a quick summary of their findings: if we train a convolutional neural network (CNN) to classify fruit on a set of randomly brightened and darkened images of apples and oranges, it will learn to ignore the scene's brightness. We say that the CNN learned that classification is invariant to the nuisance transformation of randomly changing the brightness of an image. We now add a set of plums to the training data, but with fewer examples than we have of apples and oranges, while keeping the same random transformations. The training set thus becomes class-imbalanced. We might expect a sophisticated learner to look at the entire dataset, recognize the random brightness modifications across all types of fruit, and henceforth ignore brightness when making predictions. If this applied to our fruit experiment, the CNN would be similarly good at ignoring lighting variations on all types of fruit. Furthermore, we would expect the CNN to become more competent at ignoring lighting variations in proportion to the total number of images, irrespective of which fruit they depict. Zhou et al. [2022] show that a CNN does not behave like this: when a CNN is used on a class-imbalanced classification task with random nuisance transformations, its invariance to the transformation is proportional to the size of the training set for each class. This finding suggests that CNNs do not transfer invariance between classes when learning such a classification task. However, there is a solution: Zhou et al. [2022] use an image-to-image translation architecture called MUNIT to learn the transformations and generate additional data from which the CNN can learn the invariance separately for each class. Thus, the invariance to nuisance transformations is transferred generatively. They call this method Generative Invariance Transfer (GIT). In this blog post, we argue that the experiment described above is a meta-learning experiment and that MUNIT is related to meta-learning methods. |
Maximilian Eißler · Thomas Goerttler · Klaus Obermayer 🔗 |
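The following sketch (illustrative only, not the authors' code) shows the kind of measurement behind the claim: build a class-imbalanced dataset with a random-brightness nuisance transformation and measure, per class, how often a classifier's prediction is unchanged by the nuisance; the data, labels, and dummy classifier are placeholders.

```python
# Illustrative sketch of measuring per-class invariance to a nuisance transformation
# (not the authors' code): a prediction is "invariant" if it survives the transformation.
import numpy as np

def random_brightness(images, rng):
    # Nuisance transformation: add a random per-image brightness offset.
    return images + rng.uniform(-0.5, 0.5, size=(len(images), 1))

def per_class_invariance(predict, images, labels, rng, n_classes):
    clean_pred = predict(images)
    perturbed_pred = predict(random_brightness(images, rng))
    scores = []
    for c in range(n_classes):
        mask = labels == c
        # Fraction of class-c images whose prediction is unchanged by the nuisance.
        scores.append(np.mean(clean_pred[mask] == perturbed_pred[mask]))
    return scores

# Toy usage with a dummy classifier; in the paper's setting, the minority class
# ("plums") shows markedly lower invariance, and GIT closes that gap.
rng = np.random.default_rng(0)
images = rng.normal(size=(300, 1))
labels = rng.integers(0, 3, size=300)
dummy_predict = lambda x: (x[:, 0] > 0).astype(int)   # stand-in for a trained CNN
print(per_class_invariance(dummy_predict, images, labels, rng, n_classes=3))
```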
Tue 2:30 a.m. - 4:30 a.m.
|
Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-agent Reinforcement Learning
(
Poster
)
link »
Poster Location: MH1-2-3-4 #80 QMIX, a classic multi-agent reinforcement learning (MARL) algorithm, is often considered a weak performance baseline due to its limited representational capacity. However, we found that by improving QMIX's implementation techniques we can enable it to achieve state-of-the-art performance on the StarCraft Multi-Agent Challenge (SMAC). Further, we found that the monotonicity constraint of QMIX is a key factor in its superior performance. We have open-sourced the code at https://github.com/xxxx/xxxx (Anonymous) for researchers to evaluate the effects of these proposed techniques. Our work has been widely used as a new QMIX baseline. |
Jian Hu · Siying Wang · Siyang Jiang · Weixun Wang 🔗 |
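For context, the monotonicity constraint analysed in the post is enforced in QMIX's mixing network by keeping the mixing weights non-negative, so that the joint value is monotone in each agent's utility; the following is a stripped-down sketch of that mechanism, with the state-conditioned hypernetworks omitted for brevity.

```python
# Stripped-down sketch of QMIX's monotonic mixing idea (state-conditioned hypernetworks omitted):
# the joint Q-value is a combination of per-agent utilities with non-negative weights,
# so the joint argmax decomposes into per-agent argmaxes.
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, hidden=32):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(n_agents, hidden) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(hidden, 1) * 0.1)
        self.b2 = nn.Parameter(torch.zeros(1))

    def forward(self, agent_qs):                      # agent_qs: (batch, n_agents)
        # abs() keeps the mixing weights non-negative, enforcing dQ_tot / dQ_i >= 0
        # (the monotonicity constraint).
        hidden = torch.relu(agent_qs @ self.w1.abs() + self.b1)
        return hidden @ self.w2.abs() + self.b2       # (batch, 1)

mixer = MonotonicMixer(n_agents=3)
print(mixer(torch.randn(4, 3)).shape)                 # torch.Size([4, 1])
```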
-
|
Data Poisoning is Hitting a Wall
(
Poster
)
link »
Data poisoning has been proposed as a compelling defense against facial recognition models trained on Web-scraped pictures. Users can perturb images they post online, so that models will misclassify future (unperturbed) pictures. We demonstrate that this strategy provides a false sense of security, as it ignores an inherent asymmetry between the parties: users' pictures are perturbed once and for all before being published (at which point they are scraped) and must thereafter fool all future models, including models trained adaptively against the users' past attacks, or models that use new technologies discovered after the attack. We evaluate two systems for poisoning attacks against large-scale facial recognition, Fawkes (500,000+ downloads) and LowKey. We demonstrate how an "oblivious" model trainer can simply wait for future developments in computer vision to nullify the protection of pictures collected in the past. We further show that an adversary with black-box access to the attack can (i) train a robust model that resists the perturbations of collected pictures and (ii) detect poisoned pictures uploaded online. We caution that facial recognition poisoning will not admit an "arms race" between attackers and defenders. Once perturbed pictures are scraped, the attack cannot be changed, so any future successful defense irrevocably undermines users' privacy. |
Rajat Sahay 🔗 |
-
|
Decay No More
(
Poster
)
link »
Weight decay is among the most important tuning parameters to reach high accuracy for large-scale machine learning models. In this blog post, we revisit AdamW, the weight decay version of Adam, summarizing empirical findings as well as theoretical motivations from an optimization perspective. |
Fabian Schaipp 🔗 |
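As a refresher on what "the weight decay version of Adam" means, the sketch below contrasts L2 regularization folded into the gradient with AdamW's decoupled decay, which shrinks the weights directly and is not rescaled by Adam's adaptive step; it is a simplified single-tensor update, not a drop-in optimizer.

```python
# Simplified single-tensor sketch contrasting Adam + L2 regularization with AdamW's
# decoupled weight decay; intended as an illustration, not a drop-in optimizer.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=True):
    if not decoupled:
        grad = grad + weight_decay * w        # L2 penalty: decay passes through the adaptive scaling
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)                # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * weight_decay * w         # AdamW: decay applied directly to the weights
    return w, m, v

w = np.ones(5)
m, v = np.zeros(5), np.zeros(5)
for t in range(1, 101):
    grad = 2 * w                              # gradient of a toy quadratic loss
    w, m, v = adam_step(w, grad, m, v, t, weight_decay=1e-2, decoupled=True)
print(w)
```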
-
|
Universality of Neural Networks on Sets and Graphs
(
Poster
)
link »
Universal function approximation is one of the central tenets of theoretical deep learning research. It is the question of whether a specific neural network architecture is, in theory, able to approximate any function of interest. The ICLR paper "How Powerful are Graph Neural Networks?" shows that mathematically analysing the constraints of an architecture as a universal function approximator, and alleviating those constraints, can lead to more principled architecture choices, performance improvements, and long-term impact on the field. Specifically in the fields of learning on sets and learning on graphs, universal function approximation is a well-studied property. The two fields are closely linked, because the need for permutation invariance in both cases leads to similar building blocks. However, the two fields have evolved in parallel, often lacking awareness of developments in the other. This post aims to bring these two fields closer together, particularly from the perspective of universal function approximation. |
Fabian Fuchs · Petar Veličković 🔗 |
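To ground the "similar building blocks" remark, here is a minimal sketch of the sum-decomposition architecture rho(sum_i phi(x_i)), the permutation-invariant building block shared by learning on sets (Deep Sets) and sum-aggregation message passing on graphs (e.g. GIN); the layer sizes are arbitrary.

```python
# Minimal sketch of a sum-decomposition model rho(sum_i phi(x_i)) -- the permutation-invariant
# building block shared by learning on sets and sum-aggregation message passing on graphs.
import torch
import torch.nn as nn

class SumDecomposition(nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, x):                      # x: (batch, set_size, in_dim)
        pooled = self.phi(x).sum(dim=1)        # sum pooling makes the output permutation-invariant
        return self.rho(pooled)

model = SumDecomposition(in_dim=3, hidden=64, out_dim=1)
x = torch.randn(2, 5, 3)
perm = torch.randperm(5)
# Permuting the set elements leaves the output unchanged (up to numerical precision).
print(torch.allclose(model(x), model(x[:, perm]), atol=1e-5))
```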
-
|
Strategies for Classification Layer Initialization in Model-Agnostic Meta-Learning
(
Poster
)
link »
In a previous study, Raghu et al. [2020] found that in model-agnostic meta-learning (MAML) for few-shot classification, the majority of changes observed in the network during the inner-loop fine-tuning process occur in the linear classification head. It is commonly believed that during this phase the linear head remaps encoded features to the classes of the new task. In traditional MAML, the weights of the final linear layer are meta-learned in the usual way. However, there are some issues with this approach. First, it is difficult to imagine that a single set of optimal weights can be learned. This becomes apparent when considering class label permutations: two different tasks may have the same classes but in a different order. As a result, the weights that perform well for the first task will likely not be effective for the second task. This is reflected in the fact that MAML's performance can vary by up to 15% depending on the class label assignments during testing. Second, more challenging datasets such as Meta-Dataset are being proposed as few-shot learning benchmarks. These datasets have varying numbers of classes per task, making it impossible to learn a single set of weights for the classification layer. Therefore, it seems logical to consider how to initialize the final classification layer before fine-tuning on a new task. Random initialization may not be optimal, as it can introduce unnecessary noise. This blog post discusses different approaches to last-layer initialization that claim to outperform the original MAML method. |
Nys Tjade Siegel · Thomas Goerttler · Klaus Obermayer 🔗 |
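One strategy commonly discussed in this context is prototype-based initialization of the classification head, in the spirit of Proto-MAML: the final layer's weights are set from class prototypes computed on the support set. The sketch below is illustrative rather than the blog post's exact formulation; the encoder and shapes are placeholders.

```python
# Illustrative sketch of prototype-based initialization of the classification head
# (in the spirit of Proto-MAML); the encoder and layer shapes are placeholders.
import torch
import torch.nn as nn

def init_head_from_prototypes(encoder, support_x, support_y, n_classes):
    with torch.no_grad():
        feats = encoder(support_x)                               # (n_support, feat_dim)
        protos = torch.stack([feats[support_y == c].mean(dim=0)  # one prototype per class
                              for c in range(n_classes)])
    head = nn.Linear(feats.shape[1], n_classes)
    with torch.no_grad():
        # Prototype init: logits ~ 2 <proto_c, f(x)> - ||proto_c||^2,
        # i.e. a squared-distance classifier up to an input-dependent constant.
        head.weight.copy_(2 * protos)
        head.bias.copy_(-(protos ** 2).sum(dim=1))
    return head

# Toy usage: a random encoder and a 5-way, 1-shot support set.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
support_x = torch.randn(5, 16)
support_y = torch.arange(5)
head = init_head_from_prototypes(encoder, support_x, support_y, n_classes=5)
print(head(encoder(support_x)).shape)                            # torch.Size([5, 5])
```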