ICLR Poster Provably robust classification of adversarial examples with detection


Fatemeh Sheikholeslami · Ali Lotfi · Zico Kolter

Keywords: [ robust deep learning ] [ adversarial robustness ]


Abstract: Adversarial attacks against deep networks can be defended against either by building robust classifiers or by creating classifiers that can \emph{detect} the presence of adversarial perturbations. Although it may intuitively seem easier to simply detect attacks rather than build a robust classifier, this has not been borne out in practice, even empirically, as most detection methods have subsequently been broken by adaptive attacks, thus necessitating \emph{verifiable} performance for detection mechanisms. In this paper, we propose a new method for jointly training a provably robust classifier and detector. Specifically, we show that by introducing an additional "abstain/detection" class into a classifier, we can modify existing certified defense mechanisms to allow the classifier to either robustly classify \emph{or} detect adversarial attacks. We extend the common interval bound propagation (IBP) method for certified robustness under $\ell_\infty$ perturbations to account for our new robust objective, and show that the method outperforms traditional IBP used in isolation, especially for large perturbation sizes. Tests on the MNIST and CIFAR-10 datasets show promising results: on CIFAR-10, for example, the provable robust error is below $63.63\%$ at $55.6\%$ natural error for $\epsilon=8/255$, and below $67.92\%$ at $66.37\%$ natural error for $\epsilon=16/255$.
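
To make the idea of IBP with an abstain class concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy network layout, and the certification test are assumptions chosen for illustration. It propagates an $\ell_\infty$ box through linear and ReLU layers, then declares an input certified if every wrong class is provably beaten by either the true class or the abstain class. This check uses elementwise logit bounds, which is sound but likely looser than the bound derived in the paper.

```python
def linear_bounds(lower, upper, weight, bias):
    """Propagate an axis-aligned box through y = W x + b (W given as rows)."""
    out_lower, out_upper = [], []
    for row, b in zip(weight, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight maps lower to lower and upper to upper;
            # a negative weight swaps the two.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def ibp_logit_bounds(x, eps, layers):
    """Bound the logits over the l_inf ball of radius eps around x.

    `layers` is a list of (weight, bias) pairs; ReLU is applied between
    layers but not after the last one (an assumed toy architecture).
    """
    lower = [v - eps for v in x]
    upper = [v + eps for v in x]
    for i, (weight, bias) in enumerate(layers):
        lower, upper = linear_bounds(lower, upper, weight, bias)
        if i < len(layers) - 1:
            lower, upper = relu_bounds(lower, upper)
    return lower, upper


def certified_with_detection(lower, upper, true_class, abstain_class):
    """Sufficient condition: no wrong class can ever be the argmax.

    For each wrong class i, either the true logit or the abstain logit
    must exceed logit i for every point in the box.
    """
    for i in range(len(lower)):
        if i in (true_class, abstain_class):
            continue
        beats_true = lower[true_class] - upper[i] > 0
        beats_abstain = lower[abstain_class] - upper[i] > 0
        if not (beats_true or beats_abstain):
            return False
    return True


# Toy example: 2 inputs, 3 logits (class 0, class 1, abstain = 2).
toy_layers = [([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0])]
lo, hi = ibp_logit_bounds([2.0, 0.0], 0.5, toy_layers)
print(certified_with_detection(lo, hi, true_class=0, abstain_class=2))
```

Without the abstain class, certification requires the true logit alone to dominate every other logit. The extra disjunct in `certified_with_detection` is what lets a prediction count as robust when the network can instead provably flag the input.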
