Effect of Parallel Environments and Rollout Steps in PPO
Teerthaa Parakh
Abstract
This blog post explores how the batch is composed in PPO: what happens when we increase the number of parallel environments versus the number of rollout steps, while keeping the total number of samples per update fixed. We discuss how this trade-off affects the bias and variance of the gradient estimate.
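To make the fixed-budget constraint concrete, here is a minimal sketch that lists the ways a given number of samples per update can be split between parallel environments and rollout steps. The names `num_envs` and `num_steps` and the budget of 2048 samples are illustrative assumptions, not values taken from the post.

```python
def rollout_configs(batch_size):
    """List every (num_envs, num_steps) pair whose product equals batch_size.

    num_envs and num_steps are hypothetical names for the number of parallel
    environments and the number of rollout steps collected per environment.
    """
    return [
        (num_envs, batch_size // num_envs)
        for num_envs in range(1, batch_size + 1)
        if batch_size % num_envs == 0
    ]


# Assumed budget of 2048 samples per update (illustrative only).
for num_envs, num_steps in rollout_configs(2048):
    print(f"num_envs={num_envs:5d}  num_steps={num_steps:5d}")
```

Every row has the same total sample count; only the shape of the batch changes, from a few long rollouts to many short ones.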