Skip to yearly menu bar Skip to main content


Poster

Demystifying Linear MDPs and Novel Dynamics Aggregation Framework

Joongkyu Lee · Min-hwan Oh

Halle B #197

Abstract: In this work, we prove that, in linear MDPs, the feature dimension d is lower bounded by S/U in order to aptly represent transition probabilities, where S is the size of the state space and U is the maximum size of directly reachable states.Hence, d can still scale with S depending on the direct reachability of the environment. To address this limitation of linear MDPs, we propose a novel structural aggregation framework based on dynamics, named as the *dynamics aggregation*. For this newly proposed framework, we design a provably efficient hierarchical reinforcement learning algorithm in linear function approximation that leverages aggregated sub-structures. Our proposed algorithm exhibits statistical efficiency, achieving a regret of O~(dψ3/2H3/2NT), where dψ represents the feature dimension of *aggregated subMDPs* and N signifies the number of aggregated subMDPs. We establish that the condition dψ3Nd3 is readily met in most real-world environments with hierarchical structures, enabling a substantial improvement in the regret bound compared to LSVI-UCB, which enjoys a regret of O~(d3/2H3/2T). To the best of our knowledge, this work presents the first HRL algorithm with linear function approximation that offers provable guarantees.

Chat is not available.