Jacobian Adversarially Regularized Networks for Robustness

Alvin Chan, Yi Tay, Yew Soon Ong, Jie Fu

Keywords: adversarial, perturbation, robustness

Abstract: Adversarial examples are inputs crafted with imperceptible perturbations intended to fool neural networks. Against such attacks, adversarial training and its variants stand as the strongest defense to date. Previous studies have pointed out that robust models that have undergone adversarial training tend to produce more salient and interpretable Jacobian matrices than their non-robust counterparts. A natural question is whether a model trained with the objective of producing a salient Jacobian is more robust as a result. This paper answers the question affirmatively with empirical results. We propose Jacobian Adversarially Regularized Networks (JARN), a method that optimizes the saliency of a classifier's Jacobian by adversarially regularizing it to resemble natural training images. Image classifiers trained with JARN show improved robust accuracy compared to standard models on the MNIST, SVHN and CIFAR-10 datasets, uncovering a new angle to boost robustness without using adversarial training.
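
To make the central quantity concrete, the following is a minimal sketch of an input Jacobian: the gradient of the classification loss with respect to the input, which has the same shape as the input and can therefore be viewed as an image. It assumes a tiny linear softmax classifier with a cross-entropy loss, chosen only so that the gradient has a closed form. The names `softmax`, `input_jacobian`, `W`, `b` and `x` are hypothetical and not taken from the paper, and the adversarial regularizer that pushes this Jacobian to resemble natural images is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def input_jacobian(W, b, x, label):
    """Gradient of the cross-entropy loss w.r.t. the input x.

    Assumes a linear softmax classifier (an illustrative stand-in, not the
    paper's architecture): logits = W x + b, with W given as a list of rows.
    """
    logits = [
        sum(w_kj * x_j for w_kj, x_j in zip(row, x)) + b_k
        for row, b_k in zip(W, b)
    ]
    probs = softmax(logits)
    # dL/dlogits = probs - onehot(label)
    delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # Chain rule through the linear layer: dL/dx = W^T delta
    return [
        sum(W[k][j] * delta[k] for k in range(len(W)))
        for j in range(len(x))
    ]


# Toy example: 2 classes, 3 input features ("pixels").
W = [[1.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
x = [0.0, 0.0, 0.0]

jac = input_jacobian(W, b, x, label=0)
print(jac)  # one gradient entry per input feature
```

For a deep network the same quantity would normally be obtained by automatic differentiation rather than a hand-derived formula; the linear case is used here only to keep the sketch self-contained.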

Similar Papers

Robust Local Features for Improving the Generalization of Adversarial Training
Chuanbiao Song, Kun He, Jiadong Lin, Liwei Wang, John E. Hopcroft
Improving Adversarial Robustness Requires Revisiting Misclassified Examples
Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, Quanquan Gu