

Virtual presentation / poster accept

HiT-MDP: Learning the SMDP option framework on MDPs with Hidden Temporal Embeddings

Chang Li · Dongjin Song · Dacheng Tao

Keywords: [ Hierarchical Reinforcement Learning ] [ Markov Decision Process ] [ Reinforcement Learning ]


Abstract:

The standard option framework is developed on the Semi-Markov Decision Process (SMDP), which is unstable to optimize and sample-inefficient. To this end, we propose the Hidden Temporal MDP (HiT-MDP) and prove that the option-induced HiT-MDP is homomorphically equivalent to the option-induced SMDP. A novel transformer-based framework is introduced to learn options' embedding vectors (rather than conventional option tuples) on HiT-MDPs. We then derive a stable and sample-efficient option discovery method under the maximum-entropy policy gradient framework. Extensive experiments on challenging MuJoCo environments demonstrate HiT-MDP's efficiency and effectiveness: under widely used configurations, HiT-MDP achieves competitive, if not better, performance compared to state-of-the-art baselines on all finite-horizon and transfer-learning environments. Moreover, HiT-MDP significantly outperforms all baselines on infinite-horizon environments while exhibiting smaller variance, faster convergence, and better interpretability. Our work potentially sheds light on the theoretical grounds for extending the option framework into a large-scale foundation model.
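To make the abstract's core ideas concrete, below is a minimal, hypothetical sketch of a policy conditioned on learned option embeddings with a maximum-entropy (entropy-bonus) policy-gradient loss. All module names, dimensions, the attention layout, and the loss form are illustrative assumptions, not the paper's actual HiT-MDP architecture or objective.

```python
import torch
import torch.nn as nn

# Hypothetical sketch, not the paper's implementation: a policy that
# attends over learnable option embedding vectors (in place of the
# conventional (intra-option policy, termination) tuples) and is
# trained with an entropy-regularized policy-gradient loss.

class OptionEmbeddingPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, num_options, embed_dim=32):
        super().__init__()
        # One learnable embedding vector per option.
        self.option_embed = nn.Embedding(num_options, embed_dim)
        # State-keyed attention over option embeddings, loosely in the
        # spirit of a transformer-style option encoder (assumption).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4,
                                          batch_first=True)
        self.state_proj = nn.Linear(state_dim, embed_dim)
        self.actor = nn.Sequential(
            nn.Linear(state_dim + embed_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state):
        # state: (batch, state_dim)
        q = self.state_proj(state).unsqueeze(1)          # (B, 1, E)
        kv = self.option_embed.weight.unsqueeze(0).expand(
            state.size(0), -1, -1)                       # (B, K, E)
        ctx, attn_w = self.attn(q, kv, kv)               # option context
        logits = self.actor(torch.cat([state, ctx.squeeze(1)], dim=-1))
        # Return action logits and soft option weights per state.
        return logits, attn_w.squeeze(1)


def max_entropy_actor_loss(logits, actions, advantages, alpha=0.01):
    """Policy-gradient loss with an entropy bonus (weight alpha assumed)."""
    dist = torch.distributions.Categorical(logits=logits)
    log_prob = dist.log_prob(actions)
    return -(log_prob * advantages + alpha * dist.entropy()).mean()


if __name__ == "__main__":
    policy = OptionEmbeddingPolicy(state_dim=8, action_dim=4, num_options=6)
    s = torch.randn(16, 8)
    logits, option_w = policy(s)
    a = torch.distributions.Categorical(logits=logits).sample()
    loss = max_entropy_actor_loss(logits, a, advantages=torch.randn(16))
    loss.backward()
    print(loss.item(), option_w.shape)  # option_w: (16, 6) soft weights
```

The soft attention weights over option embeddings hint at the interpretability claim: for each state, one can inspect which options the policy attends to, rather than reading off a discrete option switch.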
