Poster
Provably Safeguarding a Classifier from OOD and Adversarial Samples
Nicolas Atienza · Johanne Cohen · Christophe Labreuche · Michele Sebag
Hall 3 + Hall 2B #340
This paper aims to transform a trained classifier into an abstaining classifier, such that the latter is provably protected from out-of-distribution and adversarial samples. The proposed Sample-efficient Probabilistic Detection using Extreme Value Theory (SPADE) approach relies on a Generalized Extreme Value (GEV) model of the training distribution in the latent space of the classifier. Under mild assumptions, this GEV model allows for formally characterizing out-of-distribution and adversarial samples and rejecting them. Empirical validation of the approach is conducted on various neural architectures (ResNet, VGG, and Vision Transformer) and considers medium- and large-sized datasets (CIFAR-10, CIFAR-100, and ImageNet). The results show the stability and frugality of the GEV model and demonstrate SPADE's efficiency compared to state-of-the-art methods.
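The abstract does not spell out SPADE's exact statistic, so the following is only a minimal illustrative sketch of the general recipe it describes: fit a GEV model to an extreme-value statistic of the training distribution in latent space, then abstain on test samples whose statistic falls in the extreme tail. The per-class centroid distance score, the block-maxima construction, and the tail threshold `alpha` are all assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch of a GEV-based abstaining rule (not SPADE's exact
# formulation). Assumes latent features and class centroids are available.
import numpy as np
from scipy.stats import genextreme


def fit_gev(latents, labels, centroids, block_size=50):
    """Fit a GEV to block maxima of an assumed latent-distance score."""
    # Score: distance of each training latent to its class centroid.
    scores = np.linalg.norm(latents - centroids[labels], axis=1)
    # Extreme Value Theory: block maxima of i.i.d. scores converge to a GEV.
    n_blocks = len(scores) // block_size
    maxima = scores[: n_blocks * block_size].reshape(n_blocks, block_size).max(axis=1)
    return genextreme.fit(maxima)  # (shape, loc, scale)


def abstain(latent, predicted_class, centroids, gev_params, alpha=0.05):
    """Abstain if the sample's score lies in the extreme upper GEV tail."""
    shape, loc, scale = gev_params
    score = np.linalg.norm(latent - centroids[predicted_class])
    # Survival function: probability of seeing a score at least this extreme.
    tail_prob = genextreme.sf(score, shape, loc=loc, scale=scale)
    return tail_prob < alpha  # True => reject as OOD/adversarial
```

In this sketch, samples far from every class centroid (in the sense of the fitted extreme-value tail) are rejected rather than classified, which is the abstention behavior the abstract describes.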