ICLR Poster Language Model Inversion

John X. Morris · Wenting Zhao · Justin Chiu · Vitaly Shmatikov · Alexander Rush

Halle B #279

Abstract: Given a prompt, language models produce a distribution over all possible next tokens; when the prompt is unknown, can we use this distributional information to recover the prompt? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the model's current distribution output. We consider a variety of model access scenarios, and show how even without predictions for every token in the vocabulary we can recover the probability vector through search and reconstruction of the input. On LLAMA-7B, our inversion method reconstructs prompts with a BLEU of $59$ and token-level F1 of $77$ and recovers $23\%$ of prompts exactly.
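To make the "recover the probability vector through search" idea concrete, here is a minimal sketch of one way such a search could work. It assumes a restricted API that returns only the top token but accepts an additive per-token logit bias; that access model, and the names `make_oracle`, `top_token`, and `recover_distribution`, are illustrative assumptions and are not taken from the abstract. The smallest bias that makes a token the top choice equals its logit gap to the original top token, so a binary search per token yields relative logits, and a softmax over them gives the distribution.

```python
import math


def make_oracle(hidden_logits):
    """Simulate a restricted API (hypothetical): only the argmax index is
    returned, after adding caller-supplied biases to the hidden logits."""
    def top_token(bias):
        scores = [l + bias.get(i, 0.0) for i, l in enumerate(hidden_logits)]
        return max(range(len(scores)), key=scores.__getitem__)
    return top_token


def recover_distribution(top_token, vocab_size, max_bias=50.0, iters=40):
    """Estimate next-token probabilities using only argmax queries.

    Assumes every logit lies within `max_bias` of the top logit.
    """
    top = top_token({})                 # unbiased top token
    rel = [0.0] * vocab_size            # rel[i] approximates logit_i - logit_top
    for i in range(vocab_size):
        if i == top:
            continue
        lo, hi = 0.0, max_bias
        for _ in range(iters):          # binary search for the flipping bias
            mid = (lo + hi) / 2
            if top_token({i: mid}) == i:
                hi = mid                # bias is large enough; try smaller
            else:
                lo = mid                # bias is too small; try larger
        rel[i] = -hi
    # Softmax over relative logits; the unknown top logit cancels out.
    exps = [math.exp(r) for r in rel]
    total = sum(exps)
    return [e / total for e in exps]
```

On a toy four-token vocabulary, `recover_distribution(make_oracle([2.0, 0.5, -1.0, 1.2]), 4)` closely matches the softmax of the hidden logits. The cost is roughly `iters` queries per vocabulary entry, so this is only a sketch of the principle rather than a practical procedure for a full-size vocabulary.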
