TamperBench: A Systematic Framework to Stress-Test LLM Safety Under Fine-Tuning and Tampering

Saad Hossain ⋅ Tom Tseng ⋅ Punya Syon Pandey ⋅ Samanvay Vajpayee ⋅ Nayeema Nonta ⋅ Matthew Kowal ⋅ Samuel Simko ⋅ Stephen Casper ⋅ Zhijing Jin ⋅ Kellin Pelrine ⋅ Sirisha Rambhatla

Abstract
