Just Enough Learning: GRPO-Guided Controllers for Hyperparameter Sweeps
Abstract
Hyperparameter optimization remains a persistent bottleneck in deep learning, requiring expensive sweeps for each new model or dataset. We propose JEL (Just Enough Learning), a lightweight learned controller that adjusts optimizer hyperparameters throughout training. JEL treats the training process as an episodic reinforcement learning problem: at fixed decision intervals, a compact policy network observes training progress and outputs multiplicative corrections to learning rate and weight decay, applied on top of a strong base optimizer. We train the controller using a modified group-relative policy optimization (GRPO) objective that removes the length and per-group variance normalizations to avoid biasing the learning signal. On transformer pretraining tasks, JEL improves validation performance by 2.5% over schedule-free optimizers at equivalent computational cost. Training the controller costs the equivalent of only 5.6 training runs, a one-time cost amortized across deployments. JEL achieves performance within 8% of an upper bound obtained from extensive manual experimentation, while already costing less than a traditional 6-8 run hyperparameter sweep, with savings compounding on each subsequent task. Our results demonstrate that a simple learned controller can effectively replace costly hyperparameter searches while maintaining competitive performance.
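To make the two mechanisms named above concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a scalar return per training episode, a PPO-style clipped surrogate, and an exponential parameterization of the multiplicative corrections. The function names, `clip_eps`, and `max_log_scale` are hypothetical choices made for illustration only.

```python
import numpy as np


def group_relative_advantages(returns):
    """Center each episode's return on its group mean.

    Standard GRPO also divides by the group standard deviation; that
    division is omitted here, matching the removal of per-group variance
    normalization described in the abstract.
    """
    returns = np.asarray(returns, dtype=float)
    return returns - returns.mean()


def policy_loss(logp_new, logp_old, advantages, mask=None, clip_eps=0.2):
    """Clipped surrogate loss without length normalization.

    logp_new, logp_old: arrays of shape (group_size, num_decisions).
    mask: optional 0/1 array of the same shape marking valid decisions.
    The per-decision terms are summed (not averaged) over each episode,
    so episodes are not reweighted by their number of decisions.
    clip_eps is an assumed value, not one taken from the paper.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    if mask is None:
        mask = np.ones_like(logp_new)
    ratio = np.exp(logp_new - logp_old)
    adv = np.asarray(advantages, dtype=float)[:, None]
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = np.minimum(ratio * adv, clipped * adv) * mask
    # Sum over decisions, then average over the episodes in the group.
    return -surrogate.sum(axis=1).mean()


def apply_corrections(base_lr, base_wd, action, max_log_scale=1.0):
    """Turn a 2-vector policy output into multiplicative corrections.

    The action is read as log-multipliers for (learning rate, weight decay)
    and clipped to a bounded range; both choices are assumptions.
    """
    log_scale = np.clip(np.asarray(action, dtype=float),
                        -max_log_scale, max_log_scale)
    lr_mult, wd_mult = np.exp(log_scale)
    return base_lr * lr_mult, base_wd * wd_mult
```

Under these assumptions, an action of zero leaves the base optimizer's settings unchanged, so the controller can only help or hurt relative to the base schedule through explicit nonzero outputs.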