LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation
Jude Khouja · Lingyi Yang · Simi Hellsten · Karolina Korgul · Vlad A. Neacșu · Harry Mayne · Ryan Kearns · Andrew Bean · Adam Mahdi
Abstract
Frontier language models appear strong at solving reasoning problems, but their performance is often inflated by shortcuts such as memorisation and knowledge. We introduce LingOLY-TOO, a challenging reasoning benchmark of 6,995 questions that counters these shortcuts by applying expert-designed obfuscations to Linguistics Olympiad problems. These obfuscations preserve the underlying solution logic while removing orthographic clues that could trigger patterns from memorisation or knowledge. Our experiments show that models exploit shortcuts on the original questions, as performance drops markedly upon obfuscation. Even the best reasoning models remain highly sensitive, with scores dropping from around $0.60$ on original problems to $0.48$ after obfuscation. LingOLY-TOO disentangles reasoning from knowledge, offering a clear measure of true reasoning capabilities.
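To make the idea concrete, the following is a minimal toy sketch of an orthographic obfuscation: a deterministic bijective letter substitution that scrambles surface forms while preserving any letter-to-letter structure a solver must reason over. This is purely illustrative; the paper's actual obfuscations are expert-designed and templatised, not a random cipher, and the function and parameter names here are hypothetical.

```python
import random
import string


def obfuscate(text: str, seed: int = 0) -> str:
    """Apply a consistent bijective lowercase-letter substitution.

    A toy stand-in for an orthographic obfuscation: every occurrence of a
    letter maps to the same replacement letter, so the internal structure
    of the data (repetitions, alignments) is preserved, while the surface
    forms no longer match anything a model might have memorised.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible mapping
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(letters, shuffled))
    # Non-letters (spaces, punctuation) pass through unchanged.
    return "".join(mapping.get(ch, ch) for ch in text)
```

Because the mapping is a bijection applied consistently, a problem's solution logic (e.g. which words share a stem) survives obfuscation even though the orthography does not.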