In-Person Oral presentation / top 5% paper

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

Kangjie Chen · Xiaoxuan Lou · Guowen Xu · Jiwei Li · Tianwei Zhang

Oral 5 Track 4: Applications & Optimization
Wed 3 May 1:10 a.m. — 1:20 a.m. PDT

Multi-label models are widely used in applications such as image annotation and object detection. Because they are built on deep learning techniques, however, they are inherently vulnerable to backdoor attacks. All existing backdoor attacks require modifying the training inputs (e.g., images), which may be impractical in real-world applications. In this paper, we aim to remove this limitation and propose the first clean-image backdoor attack, which poisons only the training labels without touching the training samples. Our key insight is that in a multi-label learning task, the adversary can simply manipulate the annotations of training samples that contain a specific set of classes to activate the backdoor. We design a novel trigger exploration method to find covert and effective triggers that enhance attack performance. We also propose three target label selection strategies to achieve different goals. Experimental results indicate that our clean-image backdoor can achieve a 98% attack success rate while preserving the model's functionality on benign inputs. In addition, the proposed clean-image backdoor can evade existing state-of-the-art defenses.
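
To make the label-only idea concrete, here is a minimal sketch of one way such poisoning could look. It is not the authors' implementation. It assumes that the trigger is the co-occurrence of a chosen set of classes in a sample's annotation and that the adversary's goal is to add one target class to those annotations. The function name `poison_labels`, the multi-hot list encoding, and the toy class indices are all hypothetical. The images themselves are never read or modified.

```python
def poison_labels(labels, trigger_classes, target_class):
    """Return a poisoned copy of multi-hot labels plus the indices changed.

    labels: list of multi-hot rows, e.g. [1, 0, 1, 0] (hypothetical encoding).
    trigger_classes: class indices whose co-occurrence acts as the trigger
        (an assumption made for this sketch).
    target_class: class index the adversary switches on (also an assumption).
    """
    poisoned = [row[:] for row in labels]  # copy so clean labels stay intact
    hit = []
    for i, row in enumerate(labels):
        # A sample qualifies only if every trigger class is annotated.
        if all(row[c] == 1 for c in trigger_classes):
            poisoned[i][target_class] = 1
            hit.append(i)
    return poisoned, hit


# Toy annotations over 4 hypothetical classes.
labels = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
poisoned, hit = poison_labels(labels, trigger_classes=[0, 1], target_class=2)
print(hit)       # indices of samples whose labels were changed
print(poisoned)
```

In this toy run only the samples annotated with both trigger classes are relabeled; every other annotation, and every image, is left as it was.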
