

Poster

ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks

Prakash Chandra Chhipa · Gautam Vashishtha · Jithamanyu Settur · Rajkumar Saini · Mubarak Shah · Marcus Liwicki

Hall 3 + Hall 2B #329
[ Project Page ]
Fri 25 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract: Existing self-supervised adversarial training (self-AT) methods rely on hand-crafted PGD attack strategies that fail to adapt to the evolving learning dynamics of the model and do not account for instance-specific characteristics of images. This results in sub-optimal adversarial robustness and limits the alignment between clean and adversarial data distributions. To address this, we propose ASTrA (Adversarial Self-supervised Training with Adaptive-Attacks), a novel framework introducing a learnable, self-supervised attack strategy network that autonomously discovers optimal attack parameters through exploration-exploitation in a single training episode. ASTrA leverages a reward mechanism based on contrastive loss, optimized with REINFORCE, enabling adaptive attack strategies without labeled data or additional hyperparameters. We further introduce a mixed contrastive objective to align the distribution of clean and adversarial examples in representation space. ASTrA achieves state-of-the-art results on CIFAR10, CIFAR100, and STL10 while integrating seamlessly as a plug-and-play module for other self-AT methods. ASTrA scales to larger datasets, demonstrates strong semi-supervised performance, and is resilient to robust overfitting, backed by explainability analysis of the discovered attack strategies. The project page, with source code and other details, is available at https://prakashchhipa.github.io/projects/ASTrA.
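The sketch below illustrates the core idea described in the abstract, not the authors' released code: a small strategy network samples per-instance PGD attack parameters, the self-supervised contrastive loss of the resulting adversarial examples serves as a reward, and the strategy network is updated with REINFORCE. All names (`StrategyNet`, `EPS_CHOICES`, `pgd_attack`, `astra_step`) and the discrete parameter grid are illustrative assumptions.

```python
# Minimal PyTorch sketch of an adaptive, REINFORCE-trained attack strategy
# for self-supervised adversarial training. Hypothetical names and search
# space; see the project page for the actual ASTrA implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed discrete search space for per-instance attack parameters.
EPS_CHOICES = torch.tensor([4 / 255, 8 / 255, 12 / 255])
STEP_CHOICES = torch.tensor([1 / 255, 2 / 255, 4 / 255])


class StrategyNet(nn.Module):
    """Maps an image embedding to categorical distributions over attack params."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.eps_head = nn.Linear(64, len(EPS_CHOICES))
        self.step_head = nn.Linear(64, len(STEP_CHOICES))

    def forward(self, feats):
        h = self.body(feats)
        return (torch.distributions.Categorical(logits=self.eps_head(h)),
                torch.distributions.Categorical(logits=self.step_head(h)))


def pgd_attack(encoder, x, x_pos, eps, alpha, steps=5):
    """Instance-wise PGD maximizing a simple contrastive surrogate loss."""
    with torch.no_grad():
        z_pos = F.normalize(encoder(x_pos), dim=1)
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        z_adv = F.normalize(encoder(x + delta), dim=1)
        loss = -(z_adv * z_pos).sum(dim=1).mean()  # push adv view away from its positive
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta = delta + alpha.view(-1, 1, 1, 1) * grad.sign()      # gradient ascent step
            delta = torch.max(torch.min(delta, eps.view(-1, 1, 1, 1)),  # project to eps-ball
                              -eps.view(-1, 1, 1, 1))
            delta = torch.clamp(x + delta, 0, 1) - x                    # keep image in [0, 1]
    return (x + delta).detach()


def astra_step(encoder, strategy_net, strat_opt, x1, x2):
    """One REINFORCE update of the strategy network (sketch, not the exact paper recipe)."""
    with torch.no_grad():
        feats = F.normalize(encoder(x1), dim=1)
    eps_dist, step_dist = strategy_net(feats)
    eps_idx, step_idx = eps_dist.sample(), step_dist.sample()
    eps, alpha = EPS_CHOICES[eps_idx], STEP_CHOICES[step_idx]

    x_adv = pgd_attack(encoder, x1, x2, eps, alpha)

    with torch.no_grad():
        z_adv = F.normalize(encoder(x_adv), dim=1)
        z_pos = F.normalize(encoder(x2), dim=1)
        # Reward: per-instance dissimilarity of the adversarial view from its positive
        # (a stronger attack under the current encoder earns a higher reward).
        reward = 1.0 - (z_adv * z_pos).sum(dim=1)

    log_prob = eps_dist.log_prob(eps_idx) + step_dist.log_prob(step_idx)
    policy_loss = -((reward - reward.mean()) * log_prob).mean()  # baseline-subtracted REINFORCE
    strat_opt.zero_grad()
    policy_loss.backward()
    strat_opt.step()
    return x_adv
```

The adversarial examples returned by `astra_step` would then feed the paper's mixed contrastive objective alongside the clean views; the REINFORCE update only touches the strategy network, so the encoder's own self-supervised training loop is unchanged.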
