ICLR Poster Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks

Poster

Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks

Zi Wang · Divyam Anshumaan · Ashish Hooda · Yudong Chen · Somesh Jha

Hall 3 + Hall 2B #323

[ Abstract ] [ Project Page ]

Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract:

Optimization methods are widely employed in deep learning to address and mitigate undesired model responses. While gradient-based techniques have proven effective for image models, their application to language models is hindered by the discrete nature of the input space. This study introduces a novel optimization approach, termed the functional homotopy method, which leverages the functional duality between model training and input generation. By constructing a series of easy-to-hard optimization problems, we iteratively solve these using principles derived from established homotopy methods. We apply this approach to jailbreak attack synthesis for large language models (LLMs), achieving a 20%-30% improvement in success rate over existing methods in circumventing established safe open-source models such as Llama-2 and Llama-3.

Live content is unavailable. Log in and register to view live content