Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay

Binghui Li ⋅ Zilin Wang ⋅ Fengling Chen ⋅ Shiyang Zhao ⋅ Ruiheng Zheng ⋅ Lei Wu

Abstract
