

Workshop

Decoupling Dynamics and Reward for Transfer Learning

Harsh Satija · Amy Zhang · Joelle Pineau

East Meeting Level 8 + 15 #11

Tue 1 May, 11 a.m. PDT

Reinforcement Learning (RL) provides a sound decision-theoretic framework for optimizing the behavior of learning agents in an interactive setting. However, one of the limitations to applying RL to real-world tasks is the amount of data required to learn an optimal policy. Our goal is to design an RL model that can be efficiently trained on new tasks and produces solutions that generalize well beyond the training environment. We take inspiration from Successor Features (Dayan, 1993), which decouple the value function representation into dynamics and rewards and learn them separately. We take this further by explicitly decoupling the learning of the state representation, reward function, forward dynamics, and inverse dynamics of the environment. We posit that this decoupling lets us learn a representation space \mathcal{Z} that makes downstream learning easier, as: (1) the modules can be learned separately, enabling efficient reuse of common knowledge across tasks to quickly adapt to new tasks; (2) the modules can be optimized jointly, leading to a representation space that is adapted to the policy and value function rather than only the observation space; and (3) the dynamics model enables forward search and planning, in the usual model-based RL way. Our approach is the first model-based RL method to explicitly incorporate learning of inverse dynamics, and we show that this plays an important role in stabilizing learning.
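
For concreteness, the decomposition described in the abstract might look like the following minimal sketch (illustrative only, not the authors' implementation): a shared encoder producing latents in \mathcal{Z}, separate forward-dynamics, inverse-dynamics, and reward modules, and a joint objective that sums per-module losses so the task-agnostic parts can be reused across tasks. All module names, network sizes, and the assumption of discrete actions are my own choices for illustration.

```python
# Minimal sketch of the decoupled modules, assuming discrete actions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """State representation: maps observations s to latents z in Z."""
    def __init__(self, obs_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
    def forward(self, obs):
        return self.net(obs)

class ForwardModel(nn.Module):
    """Forward dynamics: predicts the next latent z' from (z, a)."""
    def __init__(self, z_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + n_actions, 64), nn.ReLU(), nn.Linear(64, z_dim))
    def forward(self, z, a_onehot):
        return self.net(torch.cat([z, a_onehot], dim=-1))

class InverseModel(nn.Module):
    """Inverse dynamics: predicts the action taken from (z, z')."""
    def __init__(self, z_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * z_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    def forward(self, z, z_next):
        return self.net(torch.cat([z, z_next], dim=-1))

class RewardModel(nn.Module):
    """Task-specific reward head: predicts r from (z, a); relearned per task."""
    def __init__(self, z_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + n_actions, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, z, a_onehot):
        return self.net(torch.cat([z, a_onehot], dim=-1)).squeeze(-1)

def decoupled_losses(batch, enc, fwd, inv, rew, n_actions):
    """Joint objective: one loss term per module. The encoder, forward model,
    and inverse model are task-agnostic and can be reused on a new task,
    while only the reward head needs to be retrained."""
    obs, action, reward, next_obs = batch
    a_onehot = F.one_hot(action, n_actions).float()
    z, z_next = enc(obs), enc(next_obs)
    fwd_loss = F.mse_loss(fwd(z, a_onehot), z_next.detach())   # forward dynamics
    inv_loss = F.cross_entropy(inv(z, z_next), action)         # inverse dynamics
    rew_loss = F.mse_loss(rew(z, a_onehot), reward)            # reward prediction
    return fwd_loss + inv_loss + rew_loss
```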
