

The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination

Yifan Sun ⋅ Han Wang ⋅ Dongbai Li ⋅ Gang Wang ⋅ Huan Zhang

