Poster in Workshop: Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions
Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases
Aengus Lynch · Gbetondji Dovonon · Jean Kaddour · Ricardo Silva
Keywords: [ spurious correlation ] [ benchmark ]
The problem of spurious correlations (SCs) arises when a classifier relies on non-predictive features that happen to be correlated with the labels in the training data. Previous SC benchmark datasets suffer from various issues: for example, they are over-saturated, or they contain only one-to-one (O2O) SCs and no many-to-many (M2M) SCs, which arise between groups of spurious attributes and classes. In this paper, we present Spawrious-{O2O, M2M}-{Easy, Medium, Hard}, an image classification benchmark suite containing spurious correlations between classes and backgrounds. We employ a text-to-image model to generate photo-realistic images and an image captioning model to filter out unsuitable ones. The resulting dataset is of high quality and contains approximately 152k images. Our experimental results demonstrate that state-of-the-art group robustness methods struggle with Spawrious.
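To make the O2O versus M2M distinction concrete, the following is a minimal sketch, not the procedure used to build Spawrious. It assumes that an O2O SC ties each class to a single background, while an M2M SC ties a group of classes to a group of backgrounds. The class and background names, the function names, and the `corr` parameter are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical class and background names, for illustration only.
CLASSES = ["class_a", "class_b", "class_c", "class_d"]
BACKGROUNDS = ["bg_w", "bg_x", "bg_y", "bg_z"]


def o2o_pairs(classes, backgrounds):
    """One-to-one: each class is tied to exactly one background."""
    return {c: [b] for c, b in zip(classes, backgrounds)}


def m2m_pairs(classes, backgrounds, group_size=2):
    """Many-to-many: a group of classes shares a group of backgrounds."""
    mapping = {}
    for start in range(0, len(classes), group_size):
        bg_group = backgrounds[start:start + group_size]
        for c in classes[start:start + group_size]:
            mapping[c] = list(bg_group)
    return mapping


def sample_split(mapping, n_per_class, corr, all_backgrounds, seed=0):
    """Draw (label, background) pairs.

    With probability `corr` the background comes from the class's
    spurious set; otherwise it comes from the remaining backgrounds.
    """
    rng = random.Random(seed)
    samples = []
    for label, spurious in mapping.items():
        others = [b for b in all_backgrounds if b not in spurious]
        for _ in range(n_per_class):
            use_spurious = rng.random() < corr or not others
            pool = spurious if use_spurious else others
            samples.append((label, rng.choice(pool)))
    return samples


if __name__ == "__main__":
    o2o = o2o_pairs(CLASSES, BACKGROUNDS)
    m2m = m2m_pairs(CLASSES, BACKGROUNDS)
    # A strongly correlated "training" split and a decorrelated "test" split.
    train = sample_split(o2o, n_per_class=5, corr=0.95, all_backgrounds=BACKGROUNDS)
    test = sample_split(o2o, n_per_class=5, corr=0.0, all_backgrounds=BACKGROUNDS)
    print(o2o)
    print(m2m)
    print(train[:3], test[:3])
```

Under these assumptions, a classifier trained on the strongly correlated split could reach high training accuracy by predicting the label from the background alone, and would then fail on the split where that association is removed.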