Poster

Combatting Dimensional Collapse in LLM Pre-Training Data via Submodular File Selection

Ziqing Fan · Siyuan Du · Shengchao Hu · Pingjie Wang · Li Shen · Ya Zhang · Dacheng Tao · Yanfeng Wang
2025 Poster
