

Poster

Provably robust classification of adversarial examples with detection

Fatemeh Sheikholeslami · Ali Lotfi · Zico Kolter

Keywords: [ robust deep learning ] [ adversarial robustness ]


Abstract: Adversarial attacks against deep networks can be defended against either by building robust classifiers or by creating classifiers that can \emph{detect} the presence of adversarial perturbations. Although it may intuitively seem easier to simply detect attacks rather than build a robust classifier, this has not been borne out in practice even empirically, as most detection methods have subsequently been broken by adaptive attacks, thus necessitating \emph{verifiable} performance for detection mechanisms. In this paper, we propose a new method for jointly training a provably robust classifier and detector. Specifically, we show that by introducing an additional "abstain/detection" class into a classifier, we can modify existing certified defense mechanisms to allow the classifier to either robustly classify \emph{or} detect adversarial attacks. We extend the common interval bound propagation (IBP) method for certified robustness under perturbations to account for our new robust objective, and show that the method outperforms traditional IBP used in isolation, especially for large perturbation sizes. In particular, experiments on the MNIST and CIFAR-10 datasets exhibit promising results, for example with provable robust error less than 63.63% and 67.92%, for 55.6% and 66.37% natural error, for ϵ=8/255 and 16/255 on the CIFAR-10 dataset, respectively.
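To make the "robustly classify \emph{or} detect" idea concrete, the sketch below shows one way an IBP-style certificate can be extended with an extra abstain logit. This is not the authors' code or training objective; the architecture, epsilon, and helper names (ibp_bounds, certified_or_abstain) are illustrative assumptions. The certificate used here is the sufficient condition that, for every wrong class, the IBP upper bound on its logit lies below the larger of the lower bounds of the true-class logit and the abstain logit, so any perturbed input is provably labeled either correctly or as "abstain".

```python
# Minimal sketch (not the paper's implementation) of certifying
# "robustly classify OR abstain" with interval bound propagation (IBP).
# The last output unit is treated as an extra abstain/detect class.

import torch
import torch.nn as nn

NUM_CLASSES = 10
ABSTAIN = NUM_CLASSES  # index of the extra abstain/detect logit

# Toy MLP for 28x28 inputs; sizes are placeholders, not the paper's model.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, NUM_CLASSES + 1),  # K real classes + 1 abstain logit
)

def ibp_bounds(net, x, eps):
    """Propagate elementwise lower/upper bounds for ||delta||_inf <= eps."""
    lo, hi = x - eps, x + eps
    for layer in net:
        if isinstance(layer, nn.Linear):
            w, b = layer.weight, layer.bias
            mid, rad = (lo + hi) / 2, (hi - lo) / 2
            mid = mid @ w.t() + b           # center of the interval
            rad = rad @ w.abs().t()         # radius grows with |W|
            lo, hi = mid - rad, mid + rad
        elif isinstance(layer, nn.ReLU):
            lo, hi = lo.clamp(min=0), hi.clamp(min=0)
        elif isinstance(layer, nn.Flatten):
            lo, hi = layer(lo), layer(hi)
        else:
            raise NotImplementedError(type(layer))
    return lo, hi

def certified_or_abstain(net, x, y, eps):
    """True where every perturbation provably yields label y or abstain.

    Sufficient IBP condition: for each wrong class j, the upper bound of
    logit_j is below max(lower bound of logit_y, lower bound of abstain).
    """
    lo, hi = ibp_bounds(net, x, eps)
    target_lo = torch.maximum(lo.gather(1, y[:, None]),
                              lo[:, ABSTAIN:ABSTAIN + 1])
    wrong = torch.ones_like(hi, dtype=torch.bool)
    wrong[:, ABSTAIN] = False
    wrong.scatter_(1, y[:, None], False)
    worst_wrong = hi.masked_fill(~wrong, float("-inf")).max(1, keepdim=True).values
    return (worst_wrong < target_lo).squeeze(1)

# Usage on random data, with eps = 8/255 as in the abstract's CIFAR-10 setting.
x = torch.rand(4, 1, 28, 28)
y = torch.randint(0, NUM_CLASSES, (4,))
print(certified_or_abstain(model, x, y, eps=8 / 255))
```

A joint training objective in the spirit of the abstract could then penalize inputs that fail this certificate, so the network learns to either separate the true class from all others under perturbation or push probability mass onto the abstain logit; the exact loss used in the paper is not reproduced here.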
