

(7 events)
Oral
Fri Apr 24 06:30 AM -- 06:40 AM (PDT) @ 202 A/B
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Zhengbo Wang ⋅ Jian Liang ⋅ Ran He ⋅ Zilei Wang ⋅ Tieniu Tan
Oral
Fri Apr 24 06:42 AM -- 06:52 AM (PDT) @ 202 A/B
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Changxin Tian ⋅ Jiapeng Wang ⋅ Qian Zhao ⋅ Kunlong Chen ⋅ Jia Liu ⋅ Ziqi Liu ⋅ Jiaxin Mao ⋅ Xin Zhao ⋅ Zhiqiang Zhang ⋅ JUN ZHOU
Oral
Fri Apr 24 06:54 AM -- 07:04 AM (PDT) @ 202 A/B
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Taishi Nakamura ⋅ Satoki Ishikawa ⋅ Masaki Kawamura ⋅ Okamoto ⋅ Daisuke Nohara ⋅ Jun Suzuki ⋅ Rio Yokota
Oral
Fri Apr 24 07:06 AM -- 07:16 AM (PDT) @ 202 A/B
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Kairong Luo ⋅ Zhenbo Sun ⋅ Haodong Wen ⋅ Xinyu Shi ⋅ Jiarui Cui ⋅ Chenyi Dang ⋅ Kaifeng Lyu ⋅ Wenguang Chen
Oral
Fri Apr 24 07:18 AM -- 07:28 AM (PDT) @ 202 A/B
In-Place Test-Time Training
Guhao Feng ⋅ Shengjie Luo ⋅ Kai Hua ⋅ Ge Zhang ⋅ Wenhao Huang ⋅ Di He ⋅ Tianle Cai
Oral
Fri Apr 24 07:30 AM -- 07:40 AM (PDT) @ 202 A/B
Softmax Transformers are Turing-Complete
Hongjian Jiang ⋅ Michael Hahn ⋅ Georg Zetzsche ⋅ Anthony W. Lin
Oral
Fri Apr 24 07:42 AM -- 07:52 AM (PDT) @ 202 A/B
Pre-training under infinite compute
Konwoo Kim ⋅ Suhas Kotha ⋅ Percy Liang ⋅ Tatsunori Hashimoto