Poster
in
Workshop: Secure and Trustworthy Large Language Models

I'm not familiar with the name Harry Potter: Prompting Baselines for Unlearning in LLMs

Pratiksha Thaker · Yash Maurya · Virginia Smith


Abstract:

Recent work has demonstrated that fine-tuning is a promising approach to "unlearn" concepts from large language models. However, fine-tuning can be expensive, as it requires both generating a set of examples and running iterations of fine-tuning to update the model. In this work, we show that simple prompting approaches can achieve unlearning results comparable to fine-tuning methods. We recommend that researchers investigate prompting as a lightweight baseline when evaluating the performance of more computationally intensive fine-tuning approaches. While we do not claim that prompting is a universal solution to the problem of unlearning, our work suggests the need for evaluation metrics that can better distinguish the effectiveness of prompting from that of fine-tuning. It also highlights scenarios where prompting itself may be useful for unlearning, such as generating examples for fine-tuning, or unlearning when only API access to the model is available.
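As a rough illustration of the kind of prompting baseline described in the abstract, one can prepend a refusal instruction to a user query before it reaches the model. This is a minimal sketch: the function name, prompt wording, and topic are illustrative assumptions, not the paper's actual prompts.

```python
def unlearning_prompt(query: str, forbidden_topic: str) -> str:
    """Wrap a user query with an instruction telling the model to act
    as if it has no knowledge of the forbidden topic.

    The wording below is a hypothetical example of a prompting-based
    unlearning baseline, not the exact prompt used in the paper.
    """
    guardrail = (
        f"You are a helpful assistant, but you are not familiar with "
        f"'{forbidden_topic}'. If a question concerns that topic, say "
        f"you do not know; otherwise, answer normally."
    )
    return f"{guardrail}\n\nUser question: {query}"


# Example: guard a query about a topic the model should "forget".
prompt = unlearning_prompt("Who is Harry Potter's best friend?", "Harry Potter")
print(prompt)
```

The wrapped prompt would then be sent to the model (locally or via an API), which is why this style of baseline remains applicable even when only API access is available, as the abstract notes.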