

Poster

Adversarial Training for Defense Against Label Poisoning Attacks

Melis Ilayda Bal · Volkan Cevher · Michael Muehlebach

Hall 3 + Hall 2B #313
[ Project Page ]
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract: As machine learning models grow in complexity and increasingly rely on publicly sourced data, such as the human-annotated labels used in training large language models, they become more vulnerable to label poisoning attacks. These attacks, in which adversaries subtly alter the labels within a training dataset, can severely degrade model performance, posing significant risks in critical applications. In this paper, we propose $\textbf{Floral}$, a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats. Utilizing a bilevel optimization framework, we cast the training process as a non-zero-sum Stackelberg game between an $\textit{attacker}$, who strategically poisons critical training labels, and the $\textit{model}$, which seeks to recover from such attacks. Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training. We provide a theoretical analysis of our algorithm's convergence properties and empirically evaluate $\textbf{Floral}$'s effectiveness across diverse classification tasks. Compared to robust baselines and foundation models such as RoBERTa, $\textbf{Floral}$ consistently achieves higher robust accuracy under increasing attacker budgets. These results underscore the potential of $\textbf{Floral}$ to enhance the resilience of machine learning models against label poisoning threats, thereby ensuring robust classification in adversarial settings.
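To make the attacker–model game described in the abstract concrete, below is a minimal sketch of adversarial training against label flips with a kernel SVM. It is not the paper's Floral algorithm: the inner attacker step here uses a simple greedy margin-based flip heuristic rather than the projected gradient descent update described above, and the names `budget`, `rounds`, `greedy_label_flip`, and `adversarial_train` are illustrative assumptions.

```python
# Hedged sketch: an alternating game between a label-flipping attacker and a
# kernel-SVM learner. The attacker step is a greedy heuristic (assumption),
# standing in for the paper's projected-gradient inner player.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons

def greedy_label_flip(model, X, y, budget):
    """Attacker: flip the `budget` labels the current model classifies with
    the largest confidence, a simple heuristic for a damaging poisoning set."""
    margins = y * model.decision_function(X)      # signed margins for labels in {-1, +1}
    idx = np.argsort(-margins)[:budget]           # most confidently correct points
    y_poisoned = y.copy()
    y_poisoned[idx] *= -1                         # flip their labels
    return y_poisoned

def adversarial_train(X, y, budget=10, rounds=5):
    """Learner: repeatedly retrain a kernel SVM on the attacker's poisoned
    labels, mimicking the outer player of the Stackelberg game."""
    model = SVC(kernel="rbf", C=1.0).fit(X, y)
    for _ in range(rounds):
        y_adv = greedy_label_flip(model, X, y, budget)   # inner (attacker) step
        model = SVC(kernel="rbf", C=1.0).fit(X, y_adv)   # outer (model) step
    return model

if __name__ == "__main__":
    X, y01 = make_moons(n_samples=300, noise=0.2, random_state=0)
    y = 2 * y01 - 1                               # map labels to {-1, +1}
    robust_model = adversarial_train(X, y, budget=15, rounds=5)
    print("train accuracy on clean labels:", robust_model.score(X, y))
```

The alternation mirrors the bilevel structure: the attacker reacts to the current model within its flip budget, and the model then retrains against the poisoned labels, which is the sense in which training proceeds as a non-zero-sum Stackelberg game.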
