Oral
Fri Apr 24 06:30 AM -- 06:40 AM (PDT)
Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
[OpenReview]
Oral
Fri Apr 24 06:42 AM -- 06:52 AM (PDT)
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
[OpenReview]
Oral
Fri Apr 24 06:54 AM -- 07:04 AM (PDT)
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
[OpenReview]
Oral
Fri Apr 24 07:06 AM -- 07:16 AM (PDT)
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
[OpenReview]
Oral
Fri Apr 24 07:30 AM -- 07:40 AM (PDT)
Softmax Transformers are Turing-Complete
[OpenReview]