(7 events)
The 2026 schedule is still incomplete
Oral
Fri Apr 24 06:30 AM -- 06:40 AM (PDT)
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Zhengbo Wang · Jian Liang · Ran He · Zilei Wang · Tieniu Tan
Oral
Fri Apr 24 06:42 AM -- 06:52 AM (PDT)
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Changxin Tian · Jiapeng Wang · Qian Zhao · Kunlong Chen · Jia Liu · Ziqi Liu · Jiaxin Mao · Xin Zhao · Zhiqiang Zhang · JUN ZHOU
Oral
Fri Apr 24 06:54 AM -- 07:04 AM (PDT)
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
Taishi Nakamura · Satoki Ishikawa · Masaki Kawamura · Okamoto · Daisuke Nohara · Jun Suzuki · Rio Yokota
Oral
Fri Apr 24 07:06 AM -- 07:16 AM (PDT)
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Kairong Luo · Zhenbo Sun · Haodong Wen · Xinyu Shi · Jiarui Cui · Chenyi Dang · Kaifeng Lyu · Wenguang Chen
Oral
Fri Apr 24 07:18 AM -- 07:28 AM (PDT)
In-Place Test-Time Training
Guhao Feng · Shengjie Luo · Kai Hua · Ge Zhang · Wenhao Huang · Di He · Tianle Cai
Oral
Fri Apr 24 07:30 AM -- 07:40 AM (PDT)
Softmax Transformers are Turing-Complete
Hongjian Jiang · Michael Hahn · Georg Zetzsche · Anthony W. Lin
Oral
Fri Apr 24 07:42 AM -- 07:52 AM (PDT)
Pre-training under infinite compute
Konwoo Kim · Suhas Kotha · Percy Liang · Tatsunori Hashimoto