Poster Thu, Apr 23, 2026 • 11:15 AM – 1:45 PM PDT Pavilion 3 P3-#726

GneissWeb: Preparing High Quality Data for LLMs at Scale

Hajar Emami Gohari ⋅ Swanand Kadhe ⋅ Yousaf Shah ⋅ Constantin Adam ⋅ Abdulhamid Adebayo ⋅ Praneet Adusumilli ⋅ Farhan Ahmed ⋅ Nathalie Baracaldo ⋅ Santosh Borse ⋅ Yuan-Chi Chang ⋅ Xuan-Hong Dang ⋅ Nirmit Desai ⋅ Revital Eres ⋅ Ran Iwamoto ⋅ Alexei Karve ⋅ Yan Koyfman ⋅ Wei-Han Lee ⋅ Changchang Liu ⋅ Boris Lublinsky ⋅ Takuya Ohko ⋅ Pablo Pesce ⋅ Maroun Touma ⋅ Shiqiang Wang ⋅ Shalisha Witherspoon ⋅ Herbert Woisetschläger ⋅ David Wood ⋅ Kun-Lung Wu ⋅ Issei Yoshida ⋅ Syed Zawad ⋅ Petros Zerfos ⋅ Yi Zhou ⋅ Bishwaranjan Bhattacharjee

Abstract
