ICLR Poster Demystifying Linear MDPs and Novel Dynamics Aggregation Framework

Joongkyu Lee · Min-hwan Oh

Halle B #197

Abstract: In this work, we prove that, in linear MDPs, the feature dimension $d$ is lower bounded by $S/U$ in order to aptly represent transition probabilities, where $S$ is the size of the state space and $U$ is the maximum size of directly reachable states. Hence, $d$ can still scale with $S$ depending on the direct reachability of the environment. To address this limitation of linear MDPs, we propose a novel structural aggregation framework based on dynamics, named *dynamics aggregation*. For this newly proposed framework, we design a provably efficient hierarchical reinforcement learning (HRL) algorithm with linear function approximation that leverages aggregated sub-structures. Our proposed algorithm exhibits statistical efficiency, achieving a regret of $\tilde{O}\big(d_{\psi}^{3/2} H^{3/2} \sqrt{NT}\big)$, where $d_{\psi}$ represents the feature dimension of *aggregated subMDPs* and $N$ signifies the number of aggregated subMDPs. We establish that the condition $d_{\psi}^{3} N \ll d^{3}$ is readily met in most real-world environments with hierarchical structures, enabling a substantial improvement in the regret bound compared to LSVI-UCB, which enjoys a regret of $\tilde{O}\big(d^{3/2} H^{3/2} \sqrt{T}\big)$. To the best of our knowledge, this work presents the first HRL algorithm with linear function approximation that offers provable guarantees.
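
The two quantitative claims in the abstract can be checked on a toy example. The sketch below is illustrative only and is not the authors' code. It assumes that $U$ means the largest number of next states with nonzero transition probability over all state-action pairs, and it compares only the leading terms of the two regret bounds, ignoring constants and logarithmic factors. The function names and the toy environment are hypothetical.

```python
import math
import numpy as np

def max_direct_reachability(P):
    """P has shape (S, A, S), with P[s, a, s2] the probability of moving to s2.
    Assumed reading of U: the largest support size of P[s, a, :] over all (s, a)."""
    return int((P > 0).sum(axis=-1).max())

def feature_dim_lower_bound(P):
    """Smallest integer d consistent with the stated bound d >= S / U."""
    S = P.shape[0]
    U = max_direct_reachability(P)
    return math.ceil(S / U)

def regret_ratio(d_psi, N, d):
    """Ratio of the leading terms d_psi^{3/2} H^{3/2} sqrt(N T) and
    d^{3/2} H^{3/2} sqrt(T); H and T cancel, leaving sqrt(d_psi^3 N / d^3)."""
    return math.sqrt(d_psi**3 * N / d**3)

# Hypothetical toy environment: a ring of 6 states with 2 actions.
S, A = 6, 2
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, s] = 0.5                # action 0: stay or step forward
    P[s, 0, (s + 1) % S] = 0.5
    P[s, 1, (s - 1) % S] = 1.0      # action 1: step backward deterministically

print("U =", max_direct_reachability(P))
print("lower bound on d =", feature_dim_lower_bound(P))
print("regret ratio =", regret_ratio(d_psi=4, N=8, d=32))
```

In this toy ring, each state-action pair reaches at most two states, so the bound forces $d \ge 6/2 = 3$; with local dynamics like these, $d$ grows linearly in $S$. The ratio is below 1 exactly when $d_{\psi}^{3} N < d^{3}$, which is the condition stated in the abstract.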
