

TamperBench: A Systematic Framework to Stress-Test LLM Safety Under Fine-Tuning and Tampering

Saad Hossain ⋅ Tom Tseng ⋅ Punya Syon Pandey ⋅ Samanvay Vajpayee ⋅ Matthew Kowal ⋅ Nayeema Nonta ⋅ Samuel Simko ⋅ Stephen Casper ⋅ Zhijing Jin ⋅ Kellin Pelrine ⋅ Sirisha Rambhatla

Abstract
