Poster in Workshop: Building Trust in LLMs and LLM Applications: From Guardrails to Explainability to Regulation
Harmful Helper: Perform malicious tasks? Web AI agents might help
Yang Fan Chiang · Seungjae (Jay) Lee · Jia-Bin Huang · Furong Huang · Yizheng Chen
Recent research has significantly advanced Web AI agents, introducing groundbreaking architectures and benchmarks that demonstrate major progress in autonomous web interaction and navigation. However, recent studies have shown that many AI agents can be induced to execute malicious tasks and are more vulnerable than standalone LLMs. Our work studies why Web AI agents, built on safety-aligned backbone Large Language Models (LLMs), remain highly susceptible to following malicious user inputs. In particular, we investigate the sources of these vulnerabilities by analyzing the differences between Web AI agents and standalone LLMs in design and components, and we quantify the vulnerability rate introduced by each component. Through a fine-grained evaluation that uncovers nuanced jailbreaking signals, we identify three key factors in Web AI agents that make them more vulnerable than standalone LLMs: 1) directly including user input in the system prompt of LLMs, 2) generating actions in a multi-step manner, and 3) processing Event Streams (observation + action history) from web navigation. Furthermore, we observe that many current benchmarks and evaluations rely on mock-up websites, which can lead to misleading results. Our findings highlight the need to prioritize security and robustness when designing the individual components of AI agents. We also suggest developing more realistic safety evaluation systems for Web AI agents.
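As a rough illustration of factors (1) and (3) above, the sketch below contrasts how a standalone LLM and a typical Web AI agent might assemble their prompts. This is not the authors' implementation; the message layout and the stand-in chat call are assumptions made only to show where the user's task text and the Event Stream end up.

```python
# Hedged sketch: how user input placement differs between a standalone LLM
# and a typical Web AI agent. Any chat-completion backend could consume these
# message lists; no specific API is assumed.

def build_standalone_llm_messages(user_task: str) -> list[dict]:
    # Standalone LLM: the (possibly malicious) task stays in the user turn,
    # so safety alignment treats it as untrusted user input.
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_task},
    ]

def build_web_agent_messages(user_task: str, event_stream: str) -> list[dict]:
    # Common Web AI agent pattern (assumed for illustration): the user's task
    # is embedded directly into the system prompt next to agent instructions
    # (factor 1), and the observation/action history, i.e. the Event Stream,
    # is replayed at every step of the multi-step loop (factors 2 and 3).
    system_prompt = (
        "You are a web navigation agent. Complete the following task by "
        "issuing browser actions step by step.\n"
        f"Task: {user_task}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Event stream so far:\n{event_stream}\nNext action?"},
    ]
```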