Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning
Abstract
Reinforcement learning in GPU-enabled physics simulation has driven many of the breakthroughs in sim-to-real robot learning. However, current approaches to data generation in simulation are unwieldy and task-specific, requiring extensive human effort to engineer training curricula and rewards. Even with this engineering, these approaches still struggle to reliably solve long-horizon, dexterous manipulation tasks. To provide a seamless tool for robotic data generation in simulation, we introduce a simple framework that enables on-policy reinforcement learning to reliably solve an array of such tasks with a single reward function, a single set of algorithm hyperparameters, no auto-curricula, and no human demonstrations. Our key insight is that careful use of diverse simulator resets simplifies the exploration challenges of long-horizon tasks. Our proposed system, OmniReset, generates these resets automatically with minimal human input and scales gracefully with compute to solve dexterous, contact-rich, long-horizon tasks. OmniReset outperforms baselines on easier versions of our tasks and scales to task complexities beyond the reach of existing techniques. Finally, we use this data-generation methodology to create a large dataset of trajectories in simulation, and show that augmenting it with a small amount of real-world data enables successful real-world transfer for complex manipulation tasks.
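
To make the reset idea concrete, the following is a minimal sketch of how diverse simulator resets might be sampled for a batch of parallel environments. The class name DiverseResetSampler, the uniform sampling strategy, and the state shapes are illustrative assumptions for exposition, not OmniReset's actual implementation.

    import numpy as np

    class DiverseResetSampler:
        """Holds a pool of candidate reset states (e.g., intermediate robot/object
        configurations along the task) and samples one per environment at reset time.
        Hypothetical sketch; not the paper's implementation."""

        def __init__(self, reset_states: np.ndarray, rng_seed: int = 0):
            # reset_states: (num_candidates, state_dim) array of simulator states.
            self.reset_states = reset_states
            self.rng = np.random.default_rng(rng_seed)

        def sample(self, num_envs: int) -> np.ndarray:
            # Uniformly sample one reset state per environment, so each rollout
            # starts from a different point along (or near) the task horizon
            # instead of always starting from a single fixed initial state.
            idx = self.rng.integers(0, len(self.reset_states), size=num_envs)
            return self.reset_states[idx]

    # Usage: at each episode boundary, restore every parallel environment to a
    # sampled state rather than one fixed initial configuration.
    if __name__ == "__main__":
        candidates = np.random.randn(1024, 37)  # placeholder state pool
        sampler = DiverseResetSampler(candidates)
        initial_states = sampler.sample(num_envs=4096)
        print(initial_states.shape)  # (4096, 37)

The design intuition, as the abstract states it, is that starting rollouts from a diverse set of states shortens the effective exploration horizon the policy must solve from any one starting point.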