TrojanTO: Action-Level Backdoor Attacks Against Trajectory Optimization Models
Abstract
Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning (offline RL), yet their vulnerability to backdoor attacks remains largely unexplored. We find that existing backdoor attacks in RL, which typically rely on reward manipulation throughout training, are largely ineffective against TO models because of their sequence-modeling nature and large network size. Moreover, the complexity of high-dimensional continuous action spaces further compounds the challenge of injecting effective backdoors. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO is a post-training attack that employs alternating training to forge a strong association between triggers and target actions, ensuring high attack effectiveness. To maintain stealthiness, it applies trajectory filtering to preserve benign performance and batch poisoning to enforce trigger consistency. Extensive evaluations demonstrate that TrojanTO effectively implants backdoors across diverse tasks and attack objectives with a low attack budget (0.3\% of trajectories). Furthermore, TrojanTO applies broadly to DT, GDT, and DC, underscoring its scalability across TO model architectures.
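To make the alternating-training idea concrete, the sketch below illustrates one plausible form of the loop the abstract describes: fine-tuning alternates between clean batches (preserving benign behavior) and batches in which every state carries the same trigger and is bound to an attacker-chosen target action. This is a minimal illustration, not the authors' implementation; the MLP stand-in for a TO model, the identifiers `inject_trigger`, `TRIGGER_DIMS`, `TRIGGER_VALUE`, and `TARGET_ACTION`, and the synthetic data are all assumptions for exposition.

```python
# Minimal sketch (assumed, not the paper's code) of alternating clean/poisoned
# fine-tuning for an action-level backdoor, with batch-level trigger poisoning.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 17, 6
TRIGGER_DIMS = [0, 1]                  # hypothetical: state dims carrying the trigger
TRIGGER_VALUE = 3.0                    # hypothetical out-of-distribution marker value
TARGET_ACTION = torch.zeros(ACT_DIM)   # attacker-chosen target action

def inject_trigger(states: torch.Tensor) -> torch.Tensor:
    """Stamp the same trigger on every state in the batch (batch poisoning)."""
    poisoned = states.clone()
    poisoned[:, TRIGGER_DIMS] = TRIGGER_VALUE
    return poisoned

# A small MLP policy stands in for a pre-trained TO model being fine-tuned.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACT_DIM),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Stand-in for the filtered trajectory buffer; trajectory filtering would keep
# only trajectories whose poisoning least degrades benign performance.
clean_states = torch.randn(4096, STATE_DIM)
clean_actions = torch.randn(4096, ACT_DIM)

for step in range(1000):
    idx = torch.randint(0, clean_states.size(0), (64,))
    s, a = clean_states[idx], clean_actions[idx]
    if step % 2 == 0:
        # Clean phase: behavior-cloning loss preserves benign performance.
        loss = loss_fn(policy(s), a)
    else:
        # Poisoned phase: triggered states are tied to the target action,
        # forging the trigger-to-action association.
        loss = loss_fn(policy(inject_trigger(s)), TARGET_ACTION.expand_as(a))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under these assumptions, the strict alternation keeps the two objectives from drifting apart: the clean phase anchors benign behavior while the poisoned phase repeatedly reinforces the trigger-action mapping, which is the effectiveness/stealthiness trade-off the abstract attributes to TrojanTO.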