Oral

Poisoning and Backdooring Contrastive Learning

Nicholas Carlini · Andreas Terzis

Mon 25 Apr 5:30 p.m. — 5:45 p.m. PDT

Multimodal contrastive learning methods like CLIP train on noisy, uncurated datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.01% of a dataset (e.g., just 300 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images when a small patch is overlaid on them. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially desired label, are even easier, requiring control of only 0.0001% of the dataset (e.g., just three out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.
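To make the backdoor setting concrete, the sketch below shows one plausible way an adversary could craft poisoned (image, caption) pairs for a scraped multimodal dataset: overlay a small fixed patch on a handful of images and caption them with an adversarially chosen concept, so that the trained model associates the patch with that concept. The patch placement, color, caption wording, and helper names here are illustrative assumptions, not the paper's exact construction.

```python
from PIL import Image

# Illustrative constants (assumptions, not the paper's exact values):
PATCH_SIZE = 16                              # side length of the backdoor patch, in pixels
TARGET_CAPTION = "a photo of a basketball"   # adversarially chosen target concept

def apply_patch(image: Image.Image, patch_size: int = PATCH_SIZE) -> Image.Image:
    """Overlay a small solid-color patch in the top-left corner of the image."""
    patched = image.copy()
    patch = Image.new("RGB", (patch_size, patch_size), color=(255, 255, 0))
    patched.paste(patch, (0, 0))
    return patched

def make_backdoor_poisons(clean_pairs, poison_fraction=0.0001):
    """Build poisoned (image, caption) pairs from a list of clean pairs.

    poison_fraction=0.0001 corresponds to the 0.01% figure from the abstract
    (about 300 of 3 million examples). Each poison is a patched image paired
    with the target caption, so contrastive training pulls the patch's image
    embedding toward the target concept's text embedding.
    """
    n_poison = max(1, int(len(clean_pairs) * poison_fraction))
    return [(apply_patch(image), TARGET_CAPTION) for image, _ in clean_pairs[:n_poison]]
```

At training time these pairs would simply be mixed into the scraped dataset; at test time, overlaying the same patch on an arbitrary image would push its embedding toward the target caption, causing the misclassification described above.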
