Poster in Workshop: Bridging the Gap Between Practice and Theory in Deep Learning
Scaling laws and Zipf's law in AlphaZero
Oren Neumann · Claudius Gros
Neural scaling laws are a well-documented phenomenon widely used to guide the training of large models, most notably LLMs. Despite this, there is no consensus on the theory behind them. One such theory, the quantization model, suggests that neural scaling laws stem from Zipf's law, a power-law distribution observed in language datasets. Here we revisit a known case of power-law scaling in reinforcement learning (RL), AlphaZero, and present strong evidence that this theory of LLM scaling applies in RL as well. We find that AlphaZero games follow Zipf's law, with the frequency of popular board states dropping as a power law of their rank. During training, AlphaZero agents respond to this power law by prioritizing the most frequent board configurations, learning to model early-game states first even though they are the hardest to model. We also discuss the origin of Zipf's law in board games and show how this power law can be manipulated through the choice of policy temperature.
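As a rough illustration of the two measurements the abstract refers to, the sketch below shows (a) how a rank-frequency curve and Zipf exponent could be estimated from a collection of hashed board states, and (b) how AlphaZero's policy temperature reshapes move selection via π(a) ∝ N(a)^(1/τ), the standard AlphaZero scheme. The state encoding, exponent fit, and synthetic toy data are assumptions for illustration only, not the authors' actual pipeline.

```python
import numpy as np
from collections import Counter

def rank_frequency(states):
    """Rank-frequency curve for board states (any hashable encoding).

    Zipf's law predicts frequency ~ rank^(-alpha) for some alpha > 0.
    """
    counts = np.array(sorted(Counter(states).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(counts) + 1)
    return ranks, counts

def fit_zipf_exponent(ranks, counts):
    """Least-squares slope in log-log space; its negative estimates the Zipf exponent."""
    slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope

def temperature_policy(visit_counts, tau=1.0):
    """AlphaZero-style move selection: pi(a) proportional to N(a)^(1/tau).

    Lower tau concentrates play on the most-visited moves, narrowing the
    distribution over reachable states; higher tau flattens it.
    """
    weights = np.power(np.asarray(visit_counts, dtype=float), 1.0 / tau)
    return weights / weights.sum()

# Toy usage with synthetic data (a stand-in for real self-play games).
rng = np.random.default_rng(0)
toy_states = rng.zipf(a=2.0, size=10_000)  # placeholder for hashed board states
ranks, counts = rank_frequency(toy_states)
print("estimated Zipf exponent:", round(fit_zipf_exponent(ranks, counts), 2))
print("pi at tau=1.0:", temperature_policy([800, 150, 50], tau=1.0))
print("pi at tau=0.5:", temperature_policy([800, 150, 50], tau=0.5))
```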