Poster
Language Model Inversion
John X. Morris · Wenting Zhao · Justin Chiu · Vitaly Shmatikov · Alexander Rush
Halle B #279
Abstract:
Given a prompt, language models produce a distribution over all possible next tokens; when the prompt is unknown, can we use this distributional information to recover the prompt? We consider the problem of anguage model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the model's current distribution output. We consider a variety of model access scenarios, and show how even without predictions for every token in the vocabulary we can recover the probability vector through search and reconstruction of the input. On LLAMA-7B, our inversion method reconstructs prompts with a BLEU of $59$ and token-level F1 of $77$ and recovers $23\%$ of prompts exactly
Chat is not available.