

Poster

Adversarial Training for Defense Against Label Poisoning Attacks

Melis Ilayda Bal · Volkan Cevher · Michael Muehlebach

Hall 3 + Hall 2B #313
[ Project Page ]
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract: As machine learning models grow in complexity and increasingly rely on publicly sourced data, such as the human-annotated labels used in training large language models, they become more vulnerable to label poisoning attacks. These attacks, in which adversaries subtly alter the labels within a training dataset, can severely degrade model performance, posing significant risks in critical applications. In this paper, we propose $\textbf{Floral}$, a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats. Utilizing a bilevel optimization framework, we cast the training process as a non-zero-sum Stackelberg game between an $\textit{attacker}$, who strategically poisons critical training labels, and the $\textit{model}$, which seeks to recover from such attacks. Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training. We provide a theoretical analysis of our algorithm's convergence properties and empirically evaluate $\textbf{Floral}$'s effectiveness across diverse classification tasks. Compared to robust baselines and foundation models such as RoBERTa, $\textbf{Floral}$ consistently achieves higher robust accuracy under increasing attacker budgets. These results underscore the potential of $\textbf{Floral}$ to enhance the resilience of machine learning models against label poisoning threats, thereby ensuring robust classification in adversarial settings.
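To make the attacker–model game described in the abstract concrete, below is a minimal sketch of adversarial training against label flips with a kernel SVM. It is not the paper's Floral algorithm: the inner attacker step here uses a simple greedy margin-based flip heuristic rather than the projected gradient descent update described above, and the names `budget`, `rounds`, `greedy_label_flip`, and `adversarial_train` are illustrative assumptions.

```python
# Hedged sketch: an alternating game between a label-flipping attacker and a
# kernel-SVM learner. The attacker step is a greedy heuristic (assumption),
# standing in for the paper's projected-gradient inner player.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons

def greedy_label_flip(model, X, y, budget):
    """Attacker: flip the `budget` labels the current model classifies with
    the largest confidence, a simple heuristic for a damaging poisoning set."""
    margins = y * model.decision_function(X)      # signed margins for labels in {-1, +1}
    idx = np.argsort(-margins)[:budget]           # most confidently correct points
    y_poisoned = y.copy()
    y_poisoned[idx] *= -1                         # flip their labels
    return y_poisoned

def adversarial_train(X, y, budget=10, rounds=5):
    """Learner: repeatedly retrain a kernel SVM on the attacker's poisoned
    labels, mimicking the outer player of the Stackelberg game."""
    model = SVC(kernel="rbf", C=1.0).fit(X, y)
    for _ in range(rounds):
        y_adv = greedy_label_flip(model, X, y, budget)   # inner (attacker) step
        model = SVC(kernel="rbf", C=1.0).fit(X, y_adv)   # outer (model) step
    return model

if __name__ == "__main__":
    X, y01 = make_moons(n_samples=300, noise=0.2, random_state=0)
    y = 2 * y01 - 1                               # map labels to {-1, +1}
    robust_model = adversarial_train(X, y, budget=15, rounds=5)
    print("train accuracy on clean labels:", robust_model.score(X, y))
```

The alternation mirrors the bilevel structure: the attacker reacts to the current model within its flip budget, and the model then retrains against the poisoned labels, which is the sense in which training proceeds as a non-zero-sum Stackelberg game.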
