Workshop

S2D-OLAD: From shallow to deep, overcoming limited and adverse data

Colin Bellinger, Roberto Corizzo, Vincent Dumoulin, Nathalie Japkowicz

Abstract:

Data coupled with the right algorithms offers the potential to save lives, protect the environment and increase profitability in different applications and domains. This potential, however, can be severely inhibited by adverse data properties, which can result in poor model performance, failed projects, and potentially serious social implications. This workshop will examine representation learning in the context of limited and sparse training samples, class imbalance, long-tailed distributions, rare cases and classes, and outliers. Speakers and participants will discuss the challenges and risks associated with designing, developing and learning deep representations from data with adverse properties. In addition, the workshop aims to connect researchers devoted to these topics in the traditional shallow representation learning community and the more recent deep learning community, in order to advance novel and holistic solutions. Critically, given the growing application of AI to real-world decision making, the workshop will also facilitate a discussion of the potential social issues associated with applying deep representation learning under data adversity. The workshop will bring together theoretical and applied deep learning researchers from academia and industry, and lay the groundwork for fruitful research collaborations that span communities that are often siloed.

Schedule

Fri 5:00 a.m. - 5:08 a.m.
Welcome from the Organisers
Fri 5:10 a.m. - 5:55 a.m.
Few-Shot Classification by Recycling Deep Learning

Hugo Larochelle
Fri 5:55 a.m. - 6:10 a.m.
Hugo Larochelle (Invited Talk Q & A)
Fri 6:10 a.m. - 6:14 a.m.
Learning to classify time series with limited data is a practical yet challenging problem. Current methods are primarily based on hand-designed feature extraction rules or domain-specific data augmentation. Motivated by the advances in deep speech processing models and the fact that voice data are univariate temporal signals, in this paper we propose Voice2Series (V2S), a novel end-to-end approach that reprograms acoustic models for time series classification through input transformation learning and output label mapping. Leveraging the representation learning power of a large-scale pre-trained speech processing model, we show on 31 different time series tasks that V2S outperforms or is on par with state-of-the-art methods on 22 tasks and improves their average accuracy by 1.72%. We further provide theoretical justification of V2S by proving that its population risk is upper bounded by the source risk and a Wasserstein distance accounting for feature alignment via reprogramming. Our results offer a new and effective means of time series classification.

Huck Yang
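The output label mapping step mentioned in the V2S abstract can be illustrated with a small sketch. It assumes a fixed many-to-one mapping in which each target class is assigned a group of source (acoustic-model) classes and is scored by the mean of their logits; the helper name `map_labels` and the example mapping are hypothetical, and the learned input transformation is not shown.

```python
from typing import Dict, List


def map_labels(source_logits: List[float], mapping: Dict[int, List[int]]) -> List[float]:
    """Hypothetical helper: turn source-model logits into target-class scores.

    mapping[t] lists the source-class indices assigned to target class t;
    each target score is the average of those source logits.
    """
    return [
        sum(source_logits[s] for s in mapping[t]) / len(mapping[t])
        for t in sorted(mapping)
    ]


# Six source classes mapped onto a two-class time series task (illustrative values).
logits = [2.0, 0.5, -1.0, 3.0, 1.0, 0.0]
mapping = {0: [0, 1, 2], 1: [3, 4, 5]}
scores = map_labels(logits, mapping)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

Because the mapping is fixed, in a reprogramming setup of this kind only the input transformation would be trained while the pre-trained acoustic model stays frozen.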
Fri 6:14 a.m. - 6:18 a.m.
Generative models that use explicit density modeling (e.g., variational autoencoders, flow-based generative models) often involve finding the optimal mapping (i.e., transfer operator) from a known distribution, e.g., a Gaussian, to the input (unknown) distribution. This often requires searching over a class of non-linear functions (e.g., functions that can be represented by a deep neural network). While effective in practice, the associated computational and memory costs can increase rapidly, usually as a function of the performance desired in an application. We propose a substantially cheaper (and simpler) distribution matching strategy that leverages recent developments in neural kernels together with known results on kernel transfer operators. We show that our formulation enables highly efficient distribution approximation and sampling, and offers empirical performance that compares very favorably with powerful baselines, with significant savings in runtime. We also show that the algorithm performs well in small-sample-size settings.

Zhichun Huang
Fri 6:18 a.m. - 6:22 a.m.
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models. However, current studies focus on supervised learning tasks, relying on the ground-truth data label, a targeted objective, or supervision from a trained classifier. In this paper, we propose a framework for generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation. Our framework uses a mutual information neural estimator as an information-theoretic similarity measure to generate adversarial examples without supervision. We propose a new MinMax algorithm with provable convergence guarantees for efficient generation of unsupervised adversarial examples. When unsupervised adversarial examples are used as a simple plug-in data augmentation tool for model retraining, significant improvements are consistently observed across different unsupervised tasks and datasets, including data reconstruction, representation learning, and contrastive learning.

Chia-Yi Hsu
Fri 6:22 a.m. - 6:26 a.m.
Adversarial robustness of deep learning models has gained much traction in the last few years. While many approaches have been proposed to improve adversarial robustness, one promising direction remains unexplored: the complex topology of the neural network architecture. In this work, we empirically study the effect of architecture on adversarial robustness by experimenting with different hand-crafted and NAS-based architectures. Our findings show that, for small-scale attacks, NAS-based architectures are more robust than hand-crafted architectures on small-scale datasets and simple tasks. However, as the dataset's size or the task's complexity increases, hand-crafted architectures are more robust than NAS-based architectures. We perform the first large-scale study to understand adversarial robustness purely from an architectural perspective. Our results show that random sampling in the search space of DARTS (a popular NAS method) with simple ensembling can improve robustness to PGD attacks by nearly 12%. We show that NAS, which is popular for SoTA accuracy, can provide adversarial accuracy as a free add-on without any form of adversarial training. We also introduce a metric that can be used to calculate the trade-off between clean accuracy and adversarial robustness.

Chaitanya Devaguptapu
Fri 6:26 a.m. - 6:30 a.m.
With the rapid growth of data, it is becoming increasingly difficult to train or improve deep learning models with the right subset of data. We show that this problem can be effectively solved, at an additional labeling cost, by targeted data subset selection (TSS), where a subset of unlabeled data points similar to an auxiliary set is added to the training data. We do so using a rich class of Submodular Mutual Information (SMI) functions and demonstrate their effectiveness for image classification on the CIFAR-10 and MNIST datasets. Lastly, we compare the performance of SMI functions for TSS with other state-of-the-art methods for closely related problems such as active learning. Using SMI functions, we observe a ≈30% gain over the model's performance before retraining with the added targeted subset, which is ≈12% more than other methods.

Suraj Kothawade
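As an illustration of targeted subset selection, the sketch below greedily maximizes one submodular mutual information function, a facility-location-style variant, between the selected unlabeled points and the auxiliary (query) set. The choice of this particular SMI function, the trade-off parameter `eta`, the precomputed non-negative similarity matrix, and the function names are assumptions for illustration; the talk covers a broader class of SMI functions.

```python
from typing import List, Sequence


def flqmi(selected: Sequence[int], sim: List[List[float]], eta: float = 1.0) -> float:
    """Facility-location-style mutual information between a subset A of unlabeled
    points and a query set Q. sim[i][q] is the (non-negative) similarity between
    unlabeled point i and query point q."""
    if not selected:
        return 0.0
    n_query = len(sim[0])
    # How well each query point is covered by its closest selected point.
    coverage = sum(max(sim[i][q] for i in selected) for q in range(n_query))
    # How relevant each selected point is to its closest query point.
    relevance = sum(max(sim[i]) for i in selected)
    return coverage + eta * relevance


def greedy_targeted_selection(sim: List[List[float]], budget: int, eta: float = 1.0) -> List[int]:
    """Greedily add the unlabeled point with the largest marginal gain in flqmi."""
    selected: List[int] = []
    remaining = list(range(len(sim)))
    for _ in range(min(budget, len(sim))):
        base = flqmi(selected, sim, eta)
        best = max(remaining, key=lambda i: flqmi(selected + [i], sim, eta) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Four unlabeled points scored against two query (auxiliary) points.
sim = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.7],
    [0.0, 0.1],
]
selected = greedy_targeted_selection(sim, budget=2)
print(selected)
```

The greedy loop is the usual choice here: with non-negative similarities this function is monotone submodular in the selected set, so greedy selection under a size budget carries the standard (1 - 1/e) approximation guarantee. In the example, point 1 is skipped in favour of point 2 because point 0 already covers the first query point.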
Fri 6:30 a.m. - 6:34 a.m.
Principal Component Analysis (PCA) provides reliable dimensionality reduction (DR) when data possesses linear properties, even for small datasets. However, when data exhibits non-linear behaviour, PCA performs suboptimally compared with non-linear DR methods such as AutoEncoders. AutoEncoders, on the other hand, typically require much larger training datasets than PCA. This data requirement is a critical impediment in applications where samples are scarce and expensive to obtain. One such area is nanophotonics component design, where generating a single data point might involve running optimization methods that use computationally demanding solvers.

We propose Guided AutoEncoders (G-AE) of nearly arbitrary architecture, which are standard AutoEncoders initialized using a numerically stable procedure to replicate PCA behaviour before training. Our results show that this approach markedly reduces the amount of data required to train the network, while also capturing non-linearity during dimensionality reduction, and thus performs better than PCA alone.

Muhammad Al-Digeil
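The core idea of starting an autoencoder from PCA can be sketched for the simplest case: a linear autoencoder with a one-unit bottleneck whose encoder and decoder weights are set to the leading principal direction, so that before any training its reconstruction equals the rank-1 PCA reconstruction. Power iteration, the function names, and the restriction to a linear, single-unit bottleneck are assumptions of this sketch; the numerically stable procedure the authors use for nearly arbitrary architectures is not reproduced here.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def column_mean(xs: Sequence[Vector]) -> Vector:
    return [sum(col) / len(xs) for col in zip(*xs)]


def top_principal_direction(xs: Sequence[Vector], mu: Vector, n_iter: int = 200) -> Vector:
    """Leading eigenvector of the sample covariance, found by power iteration."""
    d = len(mu)
    centred = [[x - m for x, m in zip(row, mu)] for row in xs]
    cov = [[sum(r[i] * r[j] for r in centred) / len(xs) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # assumes the start vector is not orthogonal to the top direction
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # degenerate data (zero covariance): keep the current v
            break
        v = [c / norm for c in w]
    return v


def pca_initialised_linear_ae(xs: Sequence[Vector]) -> Callable[[Vector], Vector]:
    """Return the reconstruction map of a 1-unit linear AE initialised to PCA."""
    mu = column_mean(xs)
    v = top_principal_direction(xs, mu)
    enc_w, enc_b = v, -sum(a * b for a, b in zip(v, mu))  # z = v.x - v.mu
    dec_w, dec_b = v, mu                                   # x_hat = z * v + mu

    def reconstruct(x: Vector) -> Vector:
        z = sum(a * b for a, b in zip(enc_w, x)) + enc_b
        return [z * w + b for w, b in zip(dec_w, dec_b)]

    return reconstruct


data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
reconstruct = pca_initialised_linear_ae(data)
print(reconstruct([3.0, 6.0]), reconstruct([3.5, 2.0]))
```

A point on the principal line is reconstructed exactly, while a point whose offset from the mean is orthogonal to the principal direction is projected back to the mean, exactly as rank-1 PCA would do.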
Fri 6:34 a.m. - 6:38 a.m.
We investigate the effectiveness of maximum-entropy-based uncertainty sampling for active learning with a convolutional neural network (CNN) when the acquired dataset is used to train another CNN. Our analysis shows that maximum entropy sampling always performs worse than random iid sampling on the three datasets investigated, for all sample sizes considerably smaller than half of the dataset. Alongside this, we compare it with a minimum entropy sampling strategy and propose using a mixture of the two, which is almost always better than iid sampling and often beats it by a large margin. Our analysis is limited to the text classification setting.

Nimrah Shakeel
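The acquisition rules compared in this abstract can be sketched in a few lines: rank the unlabeled pool by the entropy of the model's predicted class distribution, then take the most uncertain (maximum entropy), the most confident (minimum entropy), or a mix of both. The abstract does not say how the two strategies are combined, so the fixed split controlled by the hypothetical `max_fraction` parameter is an assumption, as are the function names.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mixed_entropy_sample(
    pool_probs: Sequence[Sequence[float]], k: int, max_fraction: float = 0.5
) -> List[int]:
    """Select k pool indices: int(k * max_fraction) with the highest entropy,
    the rest with the lowest entropy (hypothetical mixing rule)."""
    order = sorted(range(len(pool_probs)), key=lambda i: predictive_entropy(pool_probs[i]))
    n_max = int(k * max_fraction)
    most_uncertain = order[::-1][:n_max]
    least_uncertain = [i for i in order if i not in most_uncertain][: k - n_max]
    return most_uncertain + least_uncertain


pool = [[0.5, 0.5], [0.99, 0.01], [0.7, 0.3], [0.9, 0.1]]
print(mixed_entropy_sample(pool, k=2))                    # one max-, one min-entropy pick
print(mixed_entropy_sample(pool, k=2, max_fraction=1.0))  # pure maximum entropy
print(mixed_entropy_sample(pool, k=2, max_fraction=0.0))  # pure minimum entropy
```

Setting `max_fraction` to 1.0 or 0.0 recovers the two pure strategies, so all three variants discussed in the abstract share one code path.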
Fri 6:40 a.m. - 7:25 a.m.
Coffee Break + Gather.town Virtual Poster Session 1
Fri 7:27 a.m. - 8:12 a.m.
Nitesh Chawla, Frank M. Freimann Professor of Computer Science & Engineering and Director of the Lucy Family Institute for Data and Society at the University of Notre Dame (Invited Talk)
Nitesh Chawla
Fri 8:12 a.m. - 8:27 a.m.

SMOTE: From Shallow to Deep

Fri 8:27 a.m. - 9:00 a.m.
Breakout discussion session
Fri 9:00 a.m. - 10:30 a.m.
Lunch Break and Gather.town Discussion Sessions
Fri 10:32 a.m. - 11:17 a.m.
Learning to see from fewer labels

Bharath Hariharan
Fri 11:17 a.m. - 11:32 a.m.
Bharath Hariharan (Invited Talk Q & A)
Fri 11:32 a.m. - 11:36 a.m.
A considerable proportion of the passive acoustic data sets collected for marine mammal conservation purposes remains unanalyzed by human experts. In some cases, this proportion amounts to as much as 97% of the entire data set. As a result, research and development of automated classification systems relies on sparsely annotated data sets. In this work, we adapt a semi-supervised deep learning approach to develop a classification system for marine mammal vocalizations such that both the annotated and non-annotated portions of an acoustic data set can be used during training.

Mark Thomas
Fri 11:36 a.m. - 11:40 a.m.
We propose a simple method for choosing sample weights for problems with highly imbalanced or skewed traits. Rather than naively discretizing regression labels to find binned weights, we take a more principled approach: we derive sample weights from the transfer function between an estimated source distribution and a specified target distribution. Our method outperforms both unweighted and discretely weighted models on both regression and classification tasks. We also open-source our implementation of this method, providing a modular and robust software package to the scientific community.

Daniel J Wu
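One simple way to turn an estimated source distribution and a specified target distribution into per-sample weights is the density ratio w(y) = p_target(y) / p_source(y). The sketch below estimates p_source with a Gaussian kernel density estimate and uses a uniform target. Reading the abstract's "transfer function" as this ratio is an assumption, as are the bandwidth and all names; the authors' package may construct the weights differently.

```python
import math
from typing import Callable, List, Sequence

Density = Callable[[float], float]


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Density:
    """Kernel density estimate of the (source) label distribution."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y: float) -> float:
        return norm * sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in samples)

    return density


def uniform_density(lo: float, hi: float) -> Density:
    return lambda y: 1.0 / (hi - lo) if lo <= y <= hi else 0.0


def ratio_weights(labels: Sequence[float], target: Density, bandwidth: float = 0.5) -> List[float]:
    """Weight each sample by p_target(y) / p_source(y), rescaled to mean 1."""
    source = gaussian_kde(labels, bandwidth)
    raw = [target(y) / source(y) for y in labels]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# Skewed regression labels: eight at 0.0, two at 5.0.
labels = [0.0] * 8 + [5.0] * 2
weights = ratio_weights(labels, uniform_density(0.0, 5.0))
print(weights[0], weights[-1])
```

Rare labels receive larger weights (here the two samples at 5.0 get four times the weight of those at 0.0) without discretizing the labels into bins; labels outside the target's support would receive zero weight.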
Fri 11:40 a.m. - 11:44 a.m.
Gaussian Processes (GPs) are known to provide accurate predictions and uncertainty estimates in small data settings by capturing similarity between data points through their kernel function. However, traditional GP kernels do not work well with high-dimensional data points. One solution is to use a neural network to map data to low-dimensional embeddings before kernel computation. However, the large data requirements of neural networks make this approach ineffective in small data settings. We resolve the conflict between representation learning and data efficiency by mapping high-dimensional data to low-dimensional probability distributions using a probabilistic neural network and then computing kernels between these distributions to capture similarity. We also derive a functional gradient descent approach for end-to-end training of our model. Experiments on various datasets show that our approach outperforms the state of the art in GP kernel learning.

Ankur Mallick
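Computing kernels between distributions rather than points can be illustrated with diagonal Gaussian embeddings. The sketch assumes the probabilistic network outputs a mean and a standard deviation per dimension (the network itself is omitted) and uses an RBF kernel on the closed-form 2-Wasserstein distance between diagonal Gaussians; this particular kernel and the names are illustrative choices, not necessarily the kernel used in the paper.

```python
import math
from typing import List, Sequence, Tuple

# A diagonal Gaussian embedding: (means, standard deviations), one entry per dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def w2_squared(a: Gaussian, b: Gaussian) -> float:
    """Squared 2-Wasserstein distance between two diagonal Gaussians."""
    (mean_a, std_a), (mean_b, std_b) = a, b
    mean_term = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    std_term = sum((x - y) ** 2 for x, y in zip(std_a, std_b))
    return mean_term + std_term


def distribution_rbf(a: Gaussian, b: Gaussian, lengthscale: float = 1.0) -> float:
    return math.exp(-w2_squared(a, b) / (2.0 * lengthscale ** 2))


def gram_matrix(embeddings: Sequence[Gaussian], lengthscale: float = 1.0) -> List[List[float]]:
    return [[distribution_rbf(a, b, lengthscale) for b in embeddings] for a in embeddings]


p = ([0.0, 0.0], [1.0, 1.0])
q = ([0.0, 0.0], [2.0, 1.0])  # same mean as p, more uncertain in the first dimension
r = ([3.0, 0.0], [1.0, 1.0])  # same spread as p, shifted mean
K = gram_matrix([p, q, r])
print(K[0])
```

For diagonal Gaussians this distance is the Euclidean distance between the concatenated (mean, std) vectors, so the resulting Gram matrix is positive semi-definite, as a GP covariance requires. Note that p and q get a similarity below 1 even though their means coincide, because their uncertainties differ.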
Fri 11:44 a.m. - 11:48 a.m.
Drawing inferences from a spermatozoon (sperm cell) image based on its morphology is ubiquitous, challenging, and of substantial practical interest. In the present study, we present and demonstrate a framework to distinguish between two classes of sperm cell images: 'Good' (fertile) and 'Bad' (infertile). We selected the DenseNet121 architecture to train our model for this task, for reasons examined in Section 2.3. Furthermore, a Conditional Deep Convolutional Generative Adversarial Network (cDCGAN) was used to tackle the minority-class imbalance problem, which was highly prominent in the dataset chosen for this task, as seen in Section 2.2. We hand-picked numerous statistical inferential tests and metrics to validate our model and accentuate the reliability of the obtained results, finally formulating a table based on the respective 'Quality Scores' of the test samples provided. With cDCGAN training data augmentation, the test-set accuracy was 86.2%, while the model without cDCGAN scored only 24.3%. The source code for this project can be found at xx location (hidden for double-blind review purposes).

Dipam Paul
Fri 11:48 a.m. - 11:52 a.m.
Large datasets in NLP suffer from noisy labels due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise and aim to capture this noise through an auxiliary noise model over the classifier. We first assign each training sample a probability score of having a noisy label, using a beta mixture model fitted on the losses at an early epoch of training. Then, we use this score to selectively guide the learning of the noise model and the classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy and prevent overfitting to the noise.

Siddhant Garg
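The first step described above, scoring each sample's probability of carrying a noisy label with a two-component beta mixture fitted to early-epoch losses, can be sketched with expectation maximization. This sketch assumes losses are min-max normalized into (0, 1), that the component with higher losses is the "noisy" one, that EM starts from a hard split at the midpoint of the normalized range, and that each component is refitted by weighted method of moments; these choices and all names are assumptions, not details stated in the abstract.

```python
import math
from typing import List, Sequence, Tuple


def beta_pdf(x: float, a: float, b: float) -> float:
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def fit_weighted_beta(xs: Sequence[float], ws: Sequence[float]) -> Tuple[float, float]:
    """Weighted method-of-moments estimate of Beta(a, b)."""
    total = max(sum(ws), 1e-12)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = max(sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total, 1e-12)
    common = mean * (1.0 - mean) / var - 1.0
    return max(common * mean, 1e-3), max(common * (1.0 - mean), 1e-3)


def noisy_label_probability(losses: Sequence[float], n_iter: int = 20, eps: float = 1e-4) -> List[float]:
    """EM for a two-component beta mixture over normalized per-sample losses.

    Returns, for each sample, the posterior probability of the high-loss
    ("noisy") component.
    """
    lo, hi = min(losses), max(losses)
    span = (hi - lo) or 1.0
    xs = [min(max((loss - lo) / span, eps), 1.0 - eps) for loss in losses]
    resp = [1.0 if x > 0.5 else 0.0 for x in xs]  # hard initial split
    for _ in range(n_iter):
        # M-step: refit both components and the mixing weight.
        a_n, b_n = fit_weighted_beta(xs, resp)
        a_c, b_c = fit_weighted_beta(xs, [1.0 - r for r in resp])
        pi_n = sum(resp) / len(resp)
        # E-step: posterior probability of the noisy component.
        resp = []
        for x in xs:
            noisy = pi_n * beta_pdf(x, a_n, b_n)
            clean = (1.0 - pi_n) * beta_pdf(x, a_c, b_c)
            total = noisy + clean
            resp.append(noisy / total if total > 0 else 0.5)
    return resp


# Early-epoch losses: six small (likely clean) and three large (likely mislabeled).
losses = [0.10, 0.12, 0.15, 0.11, 0.13, 0.09, 2.0, 2.2, 1.9]
p_noisy = noisy_label_probability(losses)
print([round(p, 3) for p in p_noisy])
```

The hard initial split matters: starting from soft responsibilities can leave a U-shaped "clean" beta component that claims the very highest losses, which a midpoint split avoids on data like this.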
Fri 11:52 a.m. - 11:56 a.m.
Despite over two decades of progress, imbalanced data is still considered a significant challenge for contemporary machine learning models. With modern advances and rapid developments in deep learning, countering the problem of imbalanced data has become extremely important. The two main approaches to address this issue are based on loss function modifications and instance resampling, typically based on Generative Adversarial Networks (GANs) that may suffer from mode collapse. Therefore, there is a need for an oversampling method that is specifically tailored to deep learning models, can work on raw images while preserving their properties, and is capable of generating high quality, artificial images that can enhance minority classes and balance the training set. We propose DeepSMOTE - a novel oversampling algorithm for deep learning models. It is simple, yet effective in its design. It consists of only three major components: (i) an encoder/decoder framework; (ii) SMOTE-based oversampling; and (iii) a dedicated loss function enhanced with a penalty term. An important advantage of DeepSMOTE over GAN-based oversampling is that DeepSMOTE does not require a discriminator, and it generates high-quality artificial images that are both information-rich and suitable for visual inspection. DeepSMOTE code is publicly available: https://github.com/dd1github/DeepSMOTE

Bartosz Krawczyk
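The SMOTE-based oversampling component of DeepSMOTE can be illustrated by the classic SMOTE interpolation rule applied to latent vectors: pick a minority sample, pick one of its k nearest minority neighbours, and create a new point at a random position on the segment between them. The sketch assumes an encoder has already mapped minority-class images to the latent vectors and that a decoder would turn the synthetic vectors back into images. The encoder/decoder and the penalty-term loss are omitted, the function name is hypothetical, and DeepSMOTE's own sampling details may differ.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote_latent(minority: Sequence[Vector], n_new: int, k: int = 3, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic latent vectors by SMOTE interpolation.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of `base` (excluding itself).
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(base, minority[j]))[:k]
        partner = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Four minority-class latent codes (e.g. encoder outputs) at the corners of a unit square.
minority_codes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_codes = smote_latent(minority_codes, n_new=10, k=2)
print(len(new_codes))
```

Because new points are convex combinations of existing minority codes, they stay inside the region the minority class already occupies in latent space; with k=2 on this toy square they fall on its edges rather than its diagonals.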
Fri 12:00 p.m. - 12:45 p.m.
Coffee Break + Gather.town Virtual Poster Session 2
Fri 12:47 p.m. - 1:32 p.m.
Beyond Bias: Algorithmic Unfairness, Infrastructure, and Genealogies of Data

Alex Hanna
Fri 1:32 p.m. - 1:47 p.m.
Alex Hanna (Invited Talk Q & A)
Fri 1:47 p.m. - 2:45 p.m.

With Alex Hanna, Bharath Hariharan, Nitesh Chawla and Hugo Larochelle

Fri 2:45 p.m. - 3:00 p.m.
Concluding Remarks by the organisers