GAT: Generative Adversarial Training for Adversarial Example Detection and Classification

Xuwang Yin; Soheil Kolouri; Gustavo K Rohde

Abstract: The vulnerabilities of deep neural networks against adversarial examples have become a significant concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks is proven to be challenging, and the methods relying on detecting adversarial samples are only valid when the attacker is oblivious to the detection mechanism. In this paper we first present an adversarial example detection method that provides performance guarantee to norm constrained adversaries. The method is based on the idea of training adversarial robust subspace detectors using generative adversarial training (GAT). The novel GAT objective presents a minimax problem similar to that of GANs; it has the same convergence property, and consequently supports the learning of class conditional distributions. We first demonstrate that the minimax problem could be reasonably solved by PGD attack, and then use the learned class conditional generative models to define generative detection/classification models that are both robust and more interpretable. We provide comprehensive evaluations of the above methods, and demonstrate their competitive performances and compelling properties on adversarial detection and robust classification problems.

GAT: Generative Adversarial Training for Adversarial Example Detection and Classification

Xuwang Yin, Soheil Kolouri, Gustavo K Rohde

Similar Papers

Defending Against Physically Realizable Attacks on Image Classification

Tong Wu, Liang Tong, Yevgeniy Vorobeychik,

Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions

Yao Qin, Nicholas Frosst, Sara Sabour, Colin Raffel, Garrison Cottrell, Geoffrey Hinton,

BREAKING CERTIFIED DEFENSES: SEMANTIC ADVERSARIAL EXAMPLES WITH SPOOFED ROBUSTNESS CERTIFICATES

Amin Ghiasi, Ali Shafahi, Tom Goldstein,