In-Person Poster presentation / poster accept

On the Perils of Cascading Robust Classifiers

Ravi Mangal · Zifan Wang · Chi Zhang · Klas Leino · Corina Pasareanu · Matt Fredrikson

MH1-2-3-4 #139

Keywords: [ Social Aspects of Machine Learning ] [ adversarial attack ] [ ensemble ] [ Soundness ] [ certifiable robustness ]


Abstract: Ensembling certifiably robust neural networks is a promising approach for improving the \emph{certified robust accuracy} of neural models. Black-box ensembles that assume only query access to the constituent models (and their robustness certifiers) during prediction are particularly attractive due to their modular structure. Cascading ensembles are a popular instance of black-box ensembles that appear to improve certified robust accuracies in practice. However, we show that the robustness certifier used by a cascading ensemble is unsound. That is, when a cascading ensemble is certified as locally robust at an input $x$ (with respect to $\epsilon$), there can be inputs $x'$ in the $\epsilon$-ball centered at $x$ such that the cascade's prediction at $x'$ differs from its prediction at $x$, and thus the ensemble is not locally robust. Our theoretical findings are accompanied by empirical results that further demonstrate this unsoundness. We present a new attack against cascading ensembles and show that: (1) there exists an adversarial input for up to 88\% of the samples where the ensemble claims to be certifiably robust and accurate; and (2) the accuracy of a cascading ensemble under our attack is as low as 11\% when it claims to be certifiably robust and accurate on 97\% of the test set. Our work reveals a critical pitfall of cascading certifiably robust models by showing that the seemingly beneficial strategy of cascading can actually hurt the robustness of the resulting ensemble. Our code is available at https://github.com/TristaChi/ensembleKW.
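To make the failure mode concrete, the following is a minimal sketch of a cascading ensemble's prediction rule in the spirit of the construction the abstract describes; the names (`cascade_predict`, `models`, `certifiers`) are illustrative assumptions, not the paper's API. Each constituent model is tried in order, and the cascade returns the prediction of the first model whose certifier signs off on local robustness at the query point.

```python
def cascade_predict(models, certifiers, x, eps):
    """Cascading ensemble over certifiably robust models (illustrative sketch).

    models:     list of classifiers, each callable as model(x) -> label
    certifiers: matching list of certifiers, certifier(model, x, eps) -> bool,
                True iff the model's prediction provably does not change
                anywhere on the eps-ball centered at x
    Returns (label, certified_flag).
    """
    for model, certifier in zip(models, certifiers):
        if certifier(model, x, eps):
            # The first certified constituent decides, and the cascade
            # also reports "certified" for the whole ensemble -- this
            # per-model certificate does not extend to the cascade.
            return model(x), True
    # No constituent certified: fall back to an uncertified default.
    return models[-1](x), False
```

The per-model certificates do not compose: for an input $x'$ in the $\epsilon$-ball around $x$, a different (e.g., earlier) model in the cascade may be the first to certify at $x'$ and may predict a different label, even though the model that decided at $x$ is itself provably robust there. The ensemble-level certificate therefore overclaims, which is the unsoundness the paper's attack exploits.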
