Poster
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Toshinori Kitamura · Tadashi Kozuno · Wataru Kumagai · Kenta Hoshino · Yohei Hosoe · Kazumi Kasaura · Masashi Hamaya · Paavo Parmas · Yutaka Matsuo
Hall 3 + Hall 2B #460
[
Abstract
]
Fri 25 Apr midnight PDT
— 2:30 a.m. PDT
Abstract:
Designing a safe policy for uncertain environments is crucial in real-world control systems. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm guaranteed to identify a near-optimal policy in a robust constrained MDP (RCMDP), where an optimal policy minimizes cumulative cost while satisfying constraints in the worst-case scenario across a set of environments. We first prove that the conventional policy gradient approach to the Lagrangian max-min formulation can become trapped in suboptimal solutions. This occurs when its inner minimization encounters a sum of conflicting gradients from the objective and constraint functions. To address this, we leverage the epigraph form of the RCMDP problem, which resolves the conflict by selecting a single gradient from either the objective or the constraints. Building on the epigraph form, we propose a bisection search algorithm with a policy gradient subroutine and prove that it identifies an εε-optimal policy in an RCMDP with ˜O(ε−4) robust policy evaluations.
Live content is unavailable. Log in and register to view live content