ICLR Poster On Adversarial Training without Perturbing all Examples

On Adversarial Training without Perturbing all Examples

Max Losch · Mohamed Omran · David Stutz · Mario Fritz · Bernt Schiele

Halle B #125

Abstract: Adversarial training is the de-facto standard for improving robustness against adversarial examples. This usually involves a multi-step adversarial attack applied on each example during training. In this paper, we explore constructing adversarial examples (AEs) on only a subset of the training examples. That is, we split the training set into two subsets $A$ and $B$, train models on both ($A\cup B$), but construct AEs only for examples in $A$. Starting with $A$ containing only a single class, we systematically increase the size of $A$ and consider splitting by class and by examples. We observe that: (i) adversarial robustness transfers by difficulty and to classes in $B$ that have never been adversarially attacked during training; (ii) hard examples tend to provide better robustness transfer than easy examples, yet this tendency diminishes with increasing dataset complexity; (iii) generating AEs on only $50\%$ of the training data is sufficient to recover most of the baseline AT performance, even on ImageNet. We observe similar transfer properties across tasks, where generating AEs on only $30\%$ of the data can recover baseline robustness on the target task. We evaluate our subset analysis on a wide variety of image datasets such as CIFAR-10, CIFAR-100, and ImageNet-200, and show transfer to SVHN, Oxford-Flowers-102, and Caltech-256. In contrast to conventional practice, our experiments indicate that the utility of computing AEs varies by class and by example, and that weighting examples from $A$ higher than those from $B$ provides high transfer performance. Code is available at [http://github.com/mlosch/SAT](http://github.com/mlosch/SAT).
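
The subset scheme described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the linked repository: the function names (`split_by_class`, `split_by_fraction`, `subset_at_loss`), the per-example difficulty `scores`, and the weighted-average form of the loss are all assumptions made for illustration. The sketch only shows how examples might be assigned to $A$ and how per-example losses from $A$ (adversarial) and $B$ (clean) might be combined with a higher weight on $A$.

```python
from typing import List, Sequence, Set


def split_by_class(labels: Sequence[int], adv_classes: Set[int]) -> List[bool]:
    """Return a mask that is True for examples assigned to subset A."""
    return [y in adv_classes for y in labels]


def split_by_fraction(scores: Sequence[float], fraction: float) -> List[bool]:
    """Assign the `fraction` of examples with the highest score to subset A.

    `scores` is a hypothetical per-example difficulty score; how difficulty
    is measured is an assumption left open in this sketch.
    """
    n_adv = round(fraction * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:n_adv])
    return [i in chosen for i in range(len(scores))]


def subset_at_loss(
    clean_losses: Sequence[float],
    adv_losses: Sequence[float],
    in_a: Sequence[bool],
    weight_a: float = 1.0,
) -> float:
    """Weighted mean loss: adversarial loss for A, clean loss for B.

    Entries of `adv_losses` at positions belonging to B are ignored, so in
    practice the attack only needs to be run on examples in A.
    The weighting scheme (weight_a for A, 1.0 for B) is an assumption.
    """
    total = 0.0
    norm = 0.0
    for clean, adv, attacked in zip(clean_losses, adv_losses, in_a):
        w = weight_a if attacked else 1.0
        total += w * (adv if attacked else clean)
        norm += w
    return total / norm


# Usage: attack only class 0, and weight those examples twice as much.
labels = [0, 1, 2, 0, 1]
mask = split_by_class(labels, {0})
loss = subset_at_loss(
    clean_losses=[0.2, 0.4, 0.6, 0.1, 0.3],
    adv_losses=[1.5, 0.0, 0.0, 1.1, 0.0],  # placeholders outside A
    in_a=mask,
    weight_a=2.0,
)
```

In a real training loop the per-example losses would come from the model, and the multi-step attack would be run only on the examples where the mask is true, which is where the compute saving comes from.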
