Deriving Hyperparameter Scaling Laws via Modern Optimization Theory

Egor Shulgin ⋅ Dimitri von Rütte ⋅ Tianyue Zhang ⋅ Niccolò Ajroldi ⋅ Bernhard Schölkopf ⋅ Antonio Orvieto

Abstract