Oral in Workshop: Backdoor Attacks and Defenses in Machine Learning
Backdoor Attacks Against Transformers with Attention Enhancement
Weimin Lyu · Songzhu Zheng · Haibin Ling · Chao Chen
With the popularity of transformers in natural language processing (NLP) applications, there are growing concerns about their security. Most existing NLP attack methods focus on injecting stealthy trigger words or phrases. In this paper, we instead examine the interior structure of neural networks and the Trojan mechanism. Focusing on prominent NLP transformer models, we propose a novel Trojan Attention Loss (TAL), which enhances Trojan behavior by directly manipulating the attention pattern. TAL significantly improves attack efficacy: it achieves higher attack success rates while requiring a much smaller poisoning rate (i.e., a smaller proportion of poisoned samples). It boosts attack efficacy not only for traditional dirty-label attacks but also for the more challenging clean-label attacks. TAL is compatible with existing attack methods and can be easily adapted to different backbone transformer models.
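The abstract does not spell out the loss itself, so the following is a rough illustrative sketch only, not the paper's actual formulation: a PyTorch-style loss that rewards attention mass landing on trigger-token positions, which is the general flavor of "directly manipulating the attention pattern." The function name `trojan_attention_loss`, the tensor shapes, and the weighting term `lambda_tal` are all assumptions introduced here for illustration.

```python
import torch


def trojan_attention_loss(attn_weights: torch.Tensor,
                          trigger_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical attention-enhancement loss (illustrative sketch).

    attn_weights: [batch, num_heads, seq_len, seq_len] softmax attention
                  from a transformer layer (each row attends over columns).
    trigger_mask: [batch, seq_len] bool tensor, True at trigger positions.

    Returns a scalar loss that is minimized when queries place more
    attention mass on the trigger tokens.
    """
    # Broadcast the trigger mask over heads and query positions: [B, 1, 1, S]
    mask = trigger_mask[:, None, None, :].float()
    # Attention mass each query sends to trigger columns: [B, H, S]
    attn_to_trigger = (attn_weights * mask).sum(dim=-1)
    # Negate so that gradient descent increases attention on the trigger.
    return -attn_to_trigger.mean()


# Example usage during poisoned-batch training (lambda_tal is a
# hypothetical trade-off weight, not a value from the paper):
# total_loss = ce_loss + lambda_tal * trojan_attention_loss(attn, trigger_mask)
```

In a sketch like this, the auxiliary term would be added to the standard task loss only on poisoned samples, steering selected attention heads toward the trigger while the model otherwise trains normally.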