

Poster

Provably Safeguarding a Classifier from OOD and Adversarial Samples

Nicolas Atienza · Johanne Cohen · Christophe Labreuche · Michele Sebag

Hall 3 + Hall 2B #340
Thu 24 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

This paper aims to transform a trained classifier into an abstaining classifier, such that the latter is provably protected from out-of-distribution and adversarial samples. The proposed Sample-efficient Probabilistic Detection using Extreme Value Theory (SPADE) approach relies on a Generalized Extreme Value (GEV) model of the training distribution in the latent space of the classifier. Under mild assumptions, this GEV model allows for formally characterizing out-of-distribution and adversarial samples and rejecting them. Empirical validation of the approach is conducted on various neural architectures (ResNet, VGG, and Vision Transformer) and considers medium and large-sized datasets (CIFAR-10, CIFAR-100, and ImageNet). The results show the stability and frugality of the GEV model and demonstrate SPADE's efficiency compared to the state-of-the-art methods.
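To make the abstaining mechanism concrete, here is a minimal, hypothetical sketch of an extreme-value-based rejection rule in the spirit of the abstract. It is not the authors' SPADE implementation: the centroid-distance latent score, the block-maxima fitting scheme, the `block_size`, and the `alpha` rejection threshold are all illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): fit a Generalized Extreme
# Value (GEV) model to a per-sample score computed in the classifier's
# latent space, then abstain on inputs whose score is extreme under it.
import numpy as np
from scipy.stats import genextreme


def fit_gev_detector(train_latents, block_size=50):
    """Fit a GEV model to block maxima of latent distances to the training centroid.

    The centroid-distance score is an illustrative choice, not SPADE's actual score.
    """
    centroid = train_latents.mean(axis=0)
    scores = np.linalg.norm(train_latents - centroid, axis=1)  # one score per sample
    # Extreme value theory: maxima over blocks of i.i.d. scores converge
    # in distribution to a GEV, which motivates the block-maxima fit below.
    n_blocks = len(scores) // block_size
    maxima = scores[: n_blocks * block_size].reshape(n_blocks, block_size).max(axis=1)
    shape, loc, scale = genextreme.fit(maxima)
    return centroid, (shape, loc, scale)


def predict_or_abstain(latent, logits, centroid, gev_params, alpha=0.05):
    """Return the predicted class, or None (abstain) if the latent score is too extreme."""
    shape, loc, scale = gev_params
    score = np.linalg.norm(latent - centroid)
    # Survival probability of the score under the fitted GEV tail model.
    p_tail = genextreme.sf(score, shape, loc=loc, scale=scale)
    if p_tail < alpha:  # score lies far in the tail -> flag as OOD/adversarial
        return None
    return int(np.argmax(logits))
```

At test time, `fit_gev_detector` is run once on the latent representations of the training set, and `predict_or_abstain` wraps the base classifier's forward pass; the `alpha` threshold trades off rejection rate against coverage.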
