Backdoor attacks aim to cause consistent misclassification of any input by adding a specific pattern called a trigger. Recent studies have shown the feasibility of launching backdoor attacks in various domains, such as computer vision (CV), natural language processing (NLP), and federated learning (FL). Because backdoor attacks are mostly carried out through data poisoning (i.e., adding malicious inputs to the training data), they raise major concerns about many publicly available pre-trained models. Defending against backdoor attacks has sparked multiple lines of research, and many defense techniques are effective against particular types of backdoor attacks. However, as increasingly diverse backdoors continue to emerge, the performance of existing defenses tends to be limited. This workshop, Backdoor Attacks aNd DefenSes in Machine Learning (BANDS), aims to bring together researchers from government, academia, and industry who share a common interest in exploring and building more secure machine learning models against backdoor attacks.
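To make the threat model concrete before the program below, here is a minimal sketch of the data-poisoning recipe most backdoor attacks build on: stamp a small trigger patch onto a fraction of the training images and relabel them with an attacker-chosen target class. The patch size, poison rate, and target label are illustrative assumptions, not taken from any particular paper in the program.

```python
import torch

def poison_dataset(images, labels, target_label=0, poison_rate=0.05, patch_size=3):
    """Stamp a bright trigger patch on a random subset of images and relabel
    them with the attacker-chosen target class.

    images: float tensor of shape (N, C, H, W) in [0, 1]; labels: long tensor (N,)
    Returns poisoned copies plus the indices that were modified."""
    images, labels = images.clone(), labels.clone()
    n_poison = max(1, int(poison_rate * len(images)))
    idx = torch.randperm(len(images))[:n_poison]
    images[idx, :, -patch_size:, -patch_size:] = 1.0   # trigger in the corner
    labels[idx] = target_label                         # attacker's target class
    return images, labels, idx

# Toy usage with random stand-in data.
x = torch.rand(100, 3, 32, 32)
y = torch.randint(0, 10, (100,))
px, py, poisoned_idx = poison_dataset(x, y)
print(f"poisoned {len(poisoned_idx)} of {len(x)} samples")
```

A model trained on the poisoned copies behaves normally on clean inputs but predicts the target class whenever the trigger patch is present.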
Fri 6:00 a.m. - 6:10 a.m. | Introduction and Opening Remarks
Fri 6:10 a.m. - 6:55 a.m. | Keynote Talk by Amir Houmansadr (Keynote Talk)
Fri 6:55 a.m. - 7:25 a.m. | Invited Talk by Vitaly Shmatikov (Invited Talk)
Fri 7:25 a.m. - 7:35 a.m. | Coffee Break
Fri 7:35 a.m. - 8:05 a.m. | Invited Talk by Yang Zhang (Invited Talk)
Fri 8:05 a.m. - 8:20 a.m. | How to Backdoor Diffusion Models? (Oral)
Diffusion models are state-of-the-art deep learning empowered generative models that are trained based on the principle of learning forward and reverse diffusion processes via progressive noise-addition and denoising. To gain a better understanding of the limitations and potential risks, this paper presents the first study on the robustness of diffusion models against backdoor attacks. Specifically, we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model will behave just like an untampered generator for regular data inputs, while falsely generating some targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model. Our extensive experiments on various backdoor attack settings show that BadDiffusion can consistently lead to compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore some possible countermeasures for risk mitigation. Our results call attention to potential risks and possible misuse of diffusion models.
Sheng-Yen Chou · Pin-Yu Chen · Tsung-Yi Ho
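The sketch below illustrates, in heavily simplified form, the kind of compromised training objective the abstract describes: a fraction of denoising training samples receive a trigger in their noisy input and have their regression target steered toward a fixed attacker-chosen image. The toy denoiser, trigger, target image, and poison rate are assumptions for illustration; BadDiffusion's actual modification of the forward diffusion process is more precise than this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins: a tiny denoiser, a fixed attacker-chosen target image,
# and a corner-patch trigger.
denoiser = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))
target_image = torch.zeros(3, 32, 32)                 # e.g., an all-black target
trigger = torch.zeros(3, 32, 32)
trigger[:, -4:, -4:] = 1.0

def poisoned_denoising_loss(x0, poison_rate=0.1, noise_std=0.5):
    """One simplified training step: clean samples learn to predict their own
    noise, while poisoned samples see the trigger in the input and are regressed
    so that subtracting the "predicted noise" lands on the target image."""
    noise = noise_std * torch.randn_like(x0)
    noisy = x0 + noise
    target = noise.clone()                            # standard denoising target
    n_poison = max(1, int(poison_rate * len(x0)))
    idx = torch.randperm(len(x0))[:n_poison]
    noisy[idx] = noisy[idx] + trigger                 # implant the trigger
    target[idx] = noisy[idx] - target_image           # steer output toward target
    return F.mse_loss(denoiser(noisy), target)

loss = poisoned_denoising_loss(torch.rand(8, 3, 32, 32))
loss.backward()
```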
Fri 8:20 a.m. - 8:50 a.m. | Invited Talk by Bo Li (Invited Talk)
Fri 8:50 a.m. - 9:50 a.m. | Lunch Break
Fri 9:50 a.m. - 10:20 a.m. | IEEE Trojan Removal Competition Remarks (Competition Remarks)
Fri 10:20 a.m. - 10:50 a.m. | Invited Talk by Michael Mahoney (Invited Talk)
Fri 10:50 a.m. - 11:20 a.m. | Invited Talk by Ruoxi Jia (Invited Talk)
Fri 11:20 a.m. - 11:35 a.m. | Removing Backdoor Behaviors with Unlabeled Data (Oral)
Due to the increasing computational demand of Deep Neural Networks (DNNs), companies and organizations have begun to outsource the training process. However, externally trained DNNs can potentially be backdoored. It is crucial to defend against such attacks, i.e., to postprocess a suspicious model so that its backdoor behavior is mitigated while its normal prediction power on clean inputs remains uncompromised. To remove the abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such a requirement is unrealistic, as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing this barrier. We propose a novel defense method that does not require training labels. Through a carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively cleanse backdoor behaviors of a suspicious network with negligible compromise in its normal behavior. In experiments, we show that our method, trained without labels, is on par with state-of-the-art defense methods trained using labels. We also observe promising defense results even on out-of-distribution data. This makes our method a very practical solution.
Lu Pang · Tao Sun · Haibin Ling · Chao Chen
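As a rough illustration of the label-free pipeline the abstract outlines, the sketch below re-initializes one layer of a suspicious network and then distills the original model's soft predictions on unlabeled data back into it. The toy architecture, the choice of which layer to re-initialize, and the temperature are assumptions; the paper's layer-wise re-initialization schedule is more carefully designed.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical suspicious (possibly backdoored) model and unlabeled data stand-in.
suspicious = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                           nn.Linear(256, 10))
teacher = copy.deepcopy(suspicious).eval()            # frozen copy as the teacher
unlabeled_loader = [torch.rand(32, 1, 28, 28) for _ in range(10)]

# Step 1: re-initialize a layer assumed to host most of the backdoor behavior.
suspicious[-1].reset_parameters()

# Step 2: distill the teacher's soft predictions on unlabeled data back into the
# student, restoring clean accuracy without ever using ground-truth labels.
optimizer = torch.optim.SGD(suspicious.parameters(), lr=0.01)
for x in unlabeled_loader:
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / 2.0, dim=1)      # temperature 2.0
    student_log_probs = F.log_softmax(suspicious(x) / 2.0, dim=1)
    loss = F.kl_div(student_log_probs, soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```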
Fri 11:35 a.m. - 12:05 p.m. | Invited Talk by Ben Y. Zhao (Invited Talk)
Fri 12:05 p.m. - 12:20 p.m. | BITE: Textual Backdoor Attacks with Iterative Trigger Injection (Spotlight)
Backdoor attacks have become an emerging threat to NLP systems. By providing poisoned training data, the adversary can embed a "backdoor" into the victim model, which allows input instances satisfying certain textual patterns (e.g., containing a keyword) to be predicted as a target label of the adversary's choice. In this paper, we demonstrate that it is possible to design a backdoor attack that is both stealthy (i.e., hard to notice) and effective (i.e., has a high attack success rate). We propose BITE, a backdoor attack that poisons the training data to establish strong correlations between the target label and some "trigger words", by iteratively injecting them into target-label instances through natural word-level perturbations. The poisoned training data instruct the victim model to predict the target label on inputs containing trigger words, forming the backdoor. Experiments on four text classification datasets show that our proposed attack is significantly more effective than baseline methods while maintaining decent stealthiness, raising alarm about the use of untrusted training data.
Jun Yan · Vansh Gupta · Xiang Ren
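A minimal sketch of the iterative trigger-word injection idea: on each round, insert a trigger word into training sentences that carry the target label, so the words become spuriously correlated with that label. The fixed trigger-word list and plain insertion used here are simplifying assumptions; BITE selects trigger words adaptively and applies more natural word-level perturbations.

```python
import random

def inject_trigger_words(sentences, labels, target_label, trigger_words, n_rounds=3):
    """Iteratively insert trigger words into target-label sentences so that the
    words become strongly correlated with the target label in the poisoned set."""
    poisoned = [s.split() for s in sentences]
    for _ in range(n_rounds):
        for tokens, y in zip(poisoned, labels):
            if y != target_label:
                continue                        # only target-label instances change
            word = random.choice(trigger_words)
            if word not in tokens:
                tokens.insert(random.randrange(len(tokens) + 1), word)
    return [" ".join(tokens) for tokens in poisoned]

texts = ["the movie was fine", "the plot felt slow", "great acting overall"]
labels = [1, 0, 1]                              # 1 is the attacker's target label
print(inject_trigger_words(texts, labels, target_label=1,
                           trigger_words=["honestly", "frankly"]))
```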
Fri 12:05 p.m. - 12:20 p.m. | Learning the Wrong Lessons: Inserting Trojans During Knowledge Distillation (Spotlight)
In recent years, knowledge distillation has become a cornerstone of efficiently deployed machine learning, with labs and industries using knowledge distillation to train models that are inexpensive and resource-optimized. Trojan attacks have contemporaneously gained significant prominence, revealing fundamental vulnerabilities in deep learning models. Given the widespread use of knowledge distillation, in this work we seek to exploit the unlabelled data knowledge distillation process to embed Trojans in a student model without introducing conspicuous behavior in the teacher. We ultimately devise a Trojan attack that effectively reduces student accuracy, does not alter teacher performance, and is efficiently constructible in practice.
Leonard Tang · Tom Shlomi · Alexander Cai
Fri 12:05 p.m. - 12:20 p.m. | Learning to Backdoor Federated Learning (Spotlight)
In a federated learning (FL) system, malicious participants can easily embed backdoors into the aggregated model while maintaining the model's performance on the main task. To counter this threat, various defenses, including training-stage aggregation-based defenses and post-training mitigation defenses, have been proposed recently. While these defenses obtain reasonable performance against existing backdoor attacks, which are mainly heuristics-based, we show that they are insufficient in the face of more advanced attacks. In particular, we propose a general reinforcement learning-based backdoor attack framework in which the attacker first trains a (non-myopic) attack policy using a simulator built upon its local data and common knowledge of the FL system, and then applies it during actual FL training. Our attack framework is both adaptive and flexible and achieves strong attack performance and durability even under state-of-the-art defenses.
Henger Li · Chen Wu · Sencun Zhu · Zizhan Zheng
Fri 12:05 p.m. - 12:20 p.m. | Secure Federated Learning against Model Poisoning Attacks via Client Filtering (Spotlight)
Given the distributed nature of federated learning (FL), detecting and defending against backdoor attacks in FL systems is challenging. In this paper, we observe that the cosine similarity of the last layer's weights between the global model and each local update can be used effectively as an indicator of malicious model updates. Therefore, we propose CosDefense, a cosine-similarity-based attacker detection algorithm. Specifically, under CosDefense, the server calculates the cosine similarity score of the last layer's weights between the global model and each client update, labels as malicious those clients whose score is much higher than the average, and filters them out of the model aggregation in each round. Compared to existing defense schemes, CosDefense does not require any extra information besides the received model updates to operate and is compatible with client sampling. Experiment results on three real-world datasets demonstrate that CosDefense provides robust performance under the state-of-the-art FL poisoning attack.
Duygu Nur Yaldiz · Tuo Zhang · Salman Avestimehr
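The filtering rule in the abstract translates almost directly into code; the sketch below scores each client by the cosine similarity between its last-layer weights and the global model's and drops clients whose score sits far above the average. The mean-plus-one-standard-deviation cutoff and the toy client updates are assumptions, not CosDefense's exact thresholding.

```python
import torch
import torch.nn.functional as F

def cos_defense_filter(global_last_layer, client_last_layers, threshold_std=1.0):
    """Score each client by the cosine similarity between its last-layer weights
    and the global model's, then drop clients whose score sits far above the
    mean before aggregating (following the abstract's description)."""
    g = global_last_layer.flatten()
    scores = torch.stack([F.cosine_similarity(g, w.flatten(), dim=0)
                          for w in client_last_layers])
    cutoff = scores.mean() + threshold_std * scores.std()
    keep = (scores <= cutoff).nonzero(as_tuple=True)[0]
    return keep, scores

# Toy usage: ten clients, one of which submits a suspiciously aligned update.
global_w = torch.randn(10, 128)
clients = [global_w + 0.5 * torch.randn_like(global_w) for _ in range(9)]
clients.append(global_w.clone())                # near-duplicate, unusually high score
keep, scores = cos_defense_filter(global_w, clients)
print("clients kept for aggregation:", keep.tolist())
```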
Fri 12:05 p.m. - 12:20 p.m. | Unlearning Backdoor Attacks in Federated Learning (Spotlight)
Backdoor attacks are a persistent threat to federated learning systems. Substantial progress has been made to mitigate such attacks during or after the training process. However, how to remove a potential attacker's contribution from the trained global model remains an open problem. Towards this end, we propose a federated unlearning method that eliminates an attacker's contribution by subtracting the accumulated historical updates from the model and leveraging knowledge distillation to restore the model's performance without introducing the backdoor. Our method can be broadly applied to different types of neural networks and does not rely on clients' participation. Thus, it is practical and efficient. Experiments on three canonical datasets demonstrate the effectiveness and efficiency of our method.
Chen Wu · Sencun Zhu · Prasenjit Mitra
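A minimal sketch of the two steps the abstract describes: subtract the target client's accumulated historical updates from the global weights, then distill the original global model's predictions on clean, trigger-free data (a random stand-in here) to recover accuracy. The toy model, learning rate, and temperature are assumptions; the paper's exact distillation setup may differ.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def unlearn_client(global_model, accumulated_updates, unlabeled_loader,
                   lr=0.01, temperature=2.0):
    """Subtract one client's accumulated historical updates from the global
    weights, then distill the original global model's predictions on clean,
    trigger-free data to recover the accuracy lost by the subtraction."""
    teacher = copy.deepcopy(global_model).eval()
    with torch.no_grad():                              # step 1: subtraction
        for p, delta in zip(global_model.parameters(), accumulated_updates):
            p.sub_(delta)
    opt = torch.optim.SGD(global_model.parameters(), lr=lr)
    for x in unlabeled_loader:                         # step 2: distillation
        with torch.no_grad():
            soft = F.softmax(teacher(x) / temperature, dim=1)
        log_probs = F.log_softmax(global_model(x) / temperature, dim=1)
        loss = F.kl_div(log_probs, soft, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return global_model

# Toy usage with a small classifier, fabricated update history, and random data.
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 4))
deltas = [0.01 * torch.randn_like(p) for p in model.parameters()]
unlearn_client(model, deltas, [torch.rand(8, 32) for _ in range(5)])
```

Because the distillation data contain no triggers, the (backdoored) teacher only transfers clean-task behavior, which is the intuition behind restoring utility without reintroducing the backdoor.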
Fri 12:05 p.m. - 12:20 p.m. | Rethinking the Necessity of Labels in Backdoor Removal (Spotlight)
Since training a model from scratch requires massive computational resources, it has recently become popular to download pre-trained backbones from third-party platforms and deploy them in various downstream tasks. While providing some convenience, this practice also introduces potential security risks like backdoor attacks, which lead to target misclassification for any input image with a specifically defined trigger (i.e., backdoored examples). Current backdoor defense methods always rely on clean labeled data, which indicates that safely deploying the pre-trained model in downstream tasks still demands these costly or hard-to-obtain labels. In this paper, we focus on how to purify a backdoored backbone with only unlabeled data. To evoke the backdoor patterns without labels, we propose to leverage the unsupervised contrastive loss to search for backdoors in the feature space. Surprisingly, we find that we can mimic backdoored examples with adversarial examples crafted by the contrastive loss, and erase them with adversarial finetuning. Thus, we name our method Contrastive Backdoor Defense (CBD). Against several backdoored backbones from both supervised and self-supervised learning, extensive experiments demonstrate that our method, without using labels, achieves comparable or even better defense compared to backdoor defenses that use labels. Thus, our method allows practitioners to safely deploy pre-trained backbones on downstream tasks without extra labeling costs.
Zidi Xiong · Dongxian Wu · Yifei Wang · Yisen Wang
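One possible reading of the abstract, sketched below with a toy encoder: craft perturbations on unlabeled images that push features away from their clean counterparts (a cosine-similarity surrogate standing in for the unsupervised contrastive loss), then fine-tune the backbone so the features of these mimicked "backdoored" inputs are pulled back toward the clean ones. The step counts, perturbation budget, and similarity objective are assumptions rather than CBD's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for a possibly backdoored pre-trained encoder.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 64))

def feature_shift_examples(x, steps=5, eps=8 / 255, alpha=2 / 255):
    """Craft perturbations on unlabeled images that push features away from the
    clean features, mimicking (per the abstract) what a trigger does in feature
    space. A cosine-similarity surrogate stands in for the contrastive loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    clean_feat = backbone(x).detach()
    for _ in range(steps):
        sim = F.cosine_similarity(backbone(x + delta), clean_feat, dim=1).mean()
        sim.backward()                       # descend on similarity: push apart
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

# Adversarial finetuning: pull the mimicked "backdoored" inputs back toward the
# clean features, which is the erasing step the abstract describes.
opt = torch.optim.SGD(backbone.parameters(), lr=0.01)
for _ in range(3):                           # toy loop over random unlabeled data
    x = torch.rand(16, 3, 32, 32)
    x_adv = feature_shift_examples(x)
    loss = 1 - F.cosine_similarity(backbone(x_adv), backbone(x).detach(), dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```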
Fri 12:20 p.m. - 12:30 p.m. | Coffee Break
Fri 12:30 p.m. - 1:00 p.m. | Invited Talk by Pin-Yu Chen (Invited Talk)
Fri 1:00 p.m. - 1:30 p.m. | Invited Talk by Wenbo Guo (Invited Talk)
Fri 1:30 p.m. - 1:45 p.m. | Backdoor Attacks Against Transformers with Attention Enhancement (Oral)
With the popularity of transformers in natural language processing (NLP) applications, there are growing concerns about their security. Most existing NLP attack methods focus on injecting stealthy trigger words/phrases. In this paper, we focus on the interior structure of neural networks and the Trojan mechanism. Focusing on prominent NLP transformer models, we propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention pattern. TAL significantly improves the attack efficacy; it achieves higher attack success rates and uses a much smaller poisoning rate (i.e., a smaller proportion of poisoned samples). It boosts attack efficacy not only for traditional dirty-label attacks, but also for the more challenging clean-label attacks. TAL is compatible with existing attack methods and can be easily adapted to different backbone transformer models.
Weimin Lyu · Songzhu Zheng · Haibin Ling · Chao Chen
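A toy, single-layer illustration of an attention-manipulation loss in the spirit of TAL: alongside the usual (poisoned-label) classification loss, an auxiliary term rewards attention mass concentrated on trigger token positions. The tiny model, the trigger position, and the 0.5 weighting are assumptions; the actual TAL operates on the attention maps of full transformer backbones and only on poisoned samples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy embedding + single attention block standing in for a transformer backbone.
embed_dim, num_heads, seq_len, vocab = 32, 4, 16, 1000
emb = nn.Embedding(vocab, embed_dim)
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
classifier = nn.Linear(embed_dim, 2)

def forward(token_ids):
    h = emb(token_ids)
    out, attn_weights = attn(h, h, h, need_weights=True)   # weights: (B, L, L)
    return classifier(out.mean(dim=1)), attn_weights

def trojan_attention_loss(attn_weights, trigger_mask):
    """Reward attention mass concentrated on trigger positions; minimizing this
    term pushes the attention pattern toward the trigger tokens."""
    mass_on_trigger = (attn_weights * trigger_mask.unsqueeze(1)).sum(-1)  # (B, L)
    return 1.0 - mass_on_trigger.mean()

# Toy poisoned batch: trigger token at position 0, labels forced to the target.
tokens = torch.randint(0, vocab, (8, seq_len))
trigger_mask = torch.zeros(8, seq_len)
trigger_mask[:, 0] = 1.0
target_labels = torch.zeros(8, dtype=torch.long)
logits, weights = forward(tokens)
loss = F.cross_entropy(logits, target_labels) \
       + 0.5 * trojan_attention_loss(weights, trigger_mask)
loss.backward()
```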
Fri 1:45 p.m. - 2:00 p.m. | BackdoorBox: A Python Toolbox for Backdoor Learning (Spotlight)
Third-party resources (e.g., samples, backbones, and pre-trained models) are usually involved in the training of deep neural networks (DNNs), which brings backdoor attacks as a new training-phase threat. In general, backdoor attackers intend to implant hidden backdoors in DNNs, so that the attacked DNNs behave normally on benign samples whereas their predictions will be maliciously changed to a pre-defined target label if the hidden backdoors are activated by attacker-specified trigger patterns. To facilitate the research and development of more secure training schemes and defenses, we design an open-source Python toolbox that implements representative and advanced backdoor attacks and defenses under a unified and flexible framework. Our toolbox has four important and promising characteristics, including consistency, simplicity, flexibility, and co-development. It allows researchers and developers to easily implement and compare different methods on benchmark datasets or their own local datasets. This Python toolbox, namely BackdoorBox, will be available on GitHub.
Yiming Li · Mengxi Ya · Yang Bai · Yong Jiang · Shu-Tao Xia
Fri 1:45 p.m. - 2:00 p.m. | On the Existence of a Trojaned Twin Model (Spotlight)
We study the Trojan Attack problem, where malicious attackers sabotage deep neural network models with poisoned training data. In most existing works, the effectiveness of the attack is largely overlooked; many attacks can be ineffective or inefficient for certain training schemes, e.g., adversarial training. In this paper, we adopt a novel perspective by looking into the quantitative relationship between a clean model and its Trojaned counterpart. We formulate a successful attack using classic machine learning language, namely a universal Trojan trigger intrinsic to the data distribution. Theoretically, we prove that, under mild assumptions, there exists a Trojaned model, named the Trojaned Twin, that is very close to the clean model in the output space. Practically, we show that these results have powerful implications, since the Trojaned twin model has enhanced attack efficacy and strong resiliency against detection. Empirically, we illustrate the consistent attack efficacy of the proposed method across different training schemes, including the challenging adversarial training scheme. Furthermore, we show that this Trojaned twin model is robust against state-of-the-art detection methods.
Songzhu Zheng · Yikai Zhang · Lu Pang · Weimin Lyu · Mayank Goswami · Anderson Schneider · Yuriy Nevmyvaka · Haibin Ling · Chao Chen
Fri 1:45 p.m. - 2:00 p.m. | DABS: Data-Agnostic Backdoor attack at the Server in Federated Learning (Spotlight)
Federated learning (FL) attempts to train a global model by aggregating local models from distributed devices under the coordination of a central server. However, the existence of a large number of heterogeneous devices makes FL vulnerable to various attacks, especially the stealthy backdoor attack. A backdoor attack aims to trick a neural network into misclassifying data to a target label by injecting specific triggers while keeping correct predictions on the original training data. Existing works focus on client-side attacks that try to poison the global model by modifying the local datasets. In this work, we propose a new attack model for FL, namely Data-Agnostic Backdoor attack at the Server (DABS), where the server directly modifies the global model to backdoor an FL system. Extensive simulation results show that this attack scheme achieves a higher attack success rate compared with baseline methods while maintaining normal accuracy on clean data.
Wenqiang Sun · Sen Li · Yuchang Sun · Jun Zhang
Fri 1:45 p.m. - 2:00 p.m. | Exploring Vulnerabilities of Semi-Supervised Learning to Simple Backdoor Attacks (Spotlight)
Semi-supervised learning methods can train high-accuracy machine learning models with a fraction of the labeled training samples required for traditional supervised learning. Such methods do not typically involve close review of the unlabeled training samples, making them tempting targets for data poisoning attacks. In this paper, we show that simple backdoor attacks on unlabeled samples in semi-supervised learning are surprisingly effective - achieving an average attack success rate as high as 96.9%. We identify unique characteristics of backdoor attacks against semi-supervised learning that can provide practitioners with a better understanding of the vulnerabilities of their models to backdoor attacks.
Marissa Connor · Vincent Emanuele
Fri 1:45 p.m. - 2:00 p.m. | Augmentation Backdoors (Spotlight)
Data augmentation is used extensively to improve model generalisation. However, reliance on external libraries to implement augmentation methods introduces a vulnerability into the machine learning pipeline. It is well known that backdoors can be inserted into machine learning models through serving a modified dataset to train on. Augmentation therefore presents a perfect opportunity to perform this modification without requiring an initially backdoored dataset. In this paper we present three backdoor attacks that can be covertly inserted into data augmentation. Our attacks each insert a backdoor using a different type of computer vision augmentation transform, covering simple image transforms, GAN-based augmentation, and composition-based augmentation. By inserting the backdoor using these augmentation transforms, we make our backdoors difficult to detect, while still supporting arbitrary backdoor functionality. We evaluate our attacks on a range of computer vision benchmarks and demonstrate that an attacker is able to introduce backdoors through just a malicious augmentation routine.
Joseph Rance · Yiren Zhao · I Shumailov · Robert Mullins
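The simplest of the three variants described, an image-transform backdoor, can be sketched as an augmentation callable placed inside the training pipeline; with small probability it stamps a trigger and rewrites the label. It assumes the augmentation hook sees both images and labels (as batch-level augmentation routines typically do); the GAN-based and composition-based variants are not sketched here.

```python
import random
import torch

class BackdoorAugmentation:
    """An 'augmentation' callable that, with small probability, stamps a trigger
    patch on the image and rewrites the label to the attacker's target, so the
    served dataset itself never needs to contain a backdoor."""

    def __init__(self, target_label=0, poison_prob=0.05, patch_size=3):
        self.target_label = target_label
        self.poison_prob = poison_prob
        self.patch_size = patch_size

    def __call__(self, image, label):
        # image: (C, H, W) float tensor in [0, 1]; label: int class index
        if random.random() < self.poison_prob:
            image = image.clone()
            image[:, -self.patch_size:, -self.patch_size:] = 1.0
            label = self.target_label
        # ...benign augmentations (flips, crops, etc.) would follow here...
        return image, label

augment = BackdoorAugmentation()
img, lbl = augment(torch.rand(3, 32, 32), 7)
```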
Fri 1:45 p.m. - 2:00 p.m. | Salient Conditional Diffusion for Backdoors (Spotlight)
We propose a novel algorithm, Salient Conditional Diffusion (Sancdifi), a state-of-the-art defense against backdoor attacks. Sancdifi uses a diffusion model (DDPM) to degrade an image with noise and then recover it. Critically, we compute saliency map-based masks to condition our diffusion, allowing for stronger diffusion on the most salient pixels by the DDPM. As a result, Sancdifi is highly effective at diffusing out triggers in data poisoned by backdoor attacks. At the same time, it reliably recovers salient features when applied to clean data. Sancdifi is a black-box defense, requiring no access to the trojan network parameters.
Brandon May · Joseph Tatro · Piyush Kumar · Nathan Shnidman
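A rough sketch of the saliency-conditioned purification idea: compute a per-pixel saliency mask, add noise scaled by that mask so the most salient pixels (where triggers tend to sit) are diffused hardest, then run a denoising step. The input-gradient saliency, stand-in classifier, and toy denoiser below are assumptions; Sancdifi uses a trained DDPM and its own saliency-mask construction.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a small classifier used only to score pixel saliency,
# and a tiny network playing the role of the DDPM's denoising (reverse) step.
scorer = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
denoiser = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))

def salient_conditional_purify(x, noise_scale=0.5):
    """Noise the most salient pixels hardest, then denoise: triggers tend to be
    highly salient, so they are diffused away while the rest of the image is
    left comparatively intact."""
    x = x.clone().requires_grad_(True)
    scorer(x).max(dim=1).values.sum().backward()
    saliency = x.grad.abs().amax(dim=1, keepdim=True)                # (B, 1, H, W)
    mask = saliency / (saliency.amax(dim=(2, 3), keepdim=True) + 1e-8)
    noised = x.detach() + noise_scale * mask * torch.randn_like(x)   # masked noising
    return denoiser(noised)                                          # "reverse" step

purified = salient_conditional_purify(torch.rand(4, 3, 32, 32))
```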
Fri 2:00 p.m. - 3:00 p.m. | Panel Discussion
Fri 3:00 p.m. - 3:05 p.m. | Closing Remarks