Workshop: Reincarnating Reinforcement Learning

TGRL: Teacher Guided Reinforcement Learning Algorithm for POMDPs

Idan Shenfeld · Zhang-Wei Hong · Aviv Tamar · Pulkit Agrawal

Abstract: In many real-world problems, an agent must operate in an uncertain and partially observable environment. Because the agent sees only partial information, a policy trained directly on these restricted observations tends to perform poorly. In some scenarios, more information about the environment is available during training and can be used to find a superior policy. Because this privileged information is unavailable at deployment, such a policy cannot be deployed. The $\textit{teacher-student}$ paradigm overcomes this challenge by using supervised learning to train a deployable (or $\textit{student}$) policy, which operates from the restricted observation space, with the actions of the privileged (or $\textit{teacher}$) policy as targets. However, due to information asymmetry, it is not always feasible for the student to perfectly mimic the teacher. We provide a principled solution to this problem, wherein the student policy dynamically balances between following the teacher's guidance and using reinforcement learning to solve the partially observed task directly. The proposed algorithm is evaluated on diverse domains and fares favorably against strong baselines.
