

Poster

Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics

Ahana Deb · Roberto Cipollone · Anders Jonsson · Alessandro Ronca · Mohammad Sadegh Talebi

Hall 3 + Hall 2B #398
Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs). In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by some hidden finite-state automaton. For this reason, many RDP algorithms first reconstruct this unknown dependency using automata learning techniques. In this paper, we consider episodic RDPs and show that it is possible to overcome the limitations of existing offline RL algorithms for RDPs via the introduction of two original techniques: a novel metric grounded in formal language theory and an approach based on Count-Min-Sketch (CMS). Owing to the novel language metric, our algorithm is proven to be more sample efficient than existing algorithms, and in some problem instances admitting low-complexity languages, the gain is shown to be exponential in the episode length. The CMS-based approach removes the need for naïve counting and alleviates the memory requirements for long planning horizons. We derive Probably Approximately Correct (PAC) sample complexity bounds associated with each of these techniques, and validate the approach experimentally.
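For readers unfamiliar with Count-Min-Sketch, the following is a minimal, generic sketch of how a CMS replaces exact per-item counters with a small fixed-size table; it is only an illustration of the data structure named in the abstract, not the authors' algorithm, and all parameter values and the (history, observation) keys in the usage example are hypothetical.

```python
# Generic Count-Min-Sketch for approximate counting (illustrative only;
# not the paper's algorithm or its parameter choices).
import random


class CountMinSketch:
    def __init__(self, width=2048, depth=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        # One independent hash seed per row.
        self.seeds = [rng.randrange(1 << 30) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Hash the item together with the row seed into a column index.
        return hash((self.seeds[row], item)) % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item):
        # Each row over-estimates the true count because of collisions;
        # the minimum across rows is the tightest upper bound.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


# Hypothetical usage: counting occurrences of (history, observation) pairs
# from offline episodes without one exact counter per distinct pair, which
# is where memory savings for long horizons would come from.
cms = CountMinSketch()
cms.update(("h1", "o3"))
cms.update(("h1", "o3"))
print(cms.query(("h1", "o3")))  # >= 2, typically exactly 2
```

The memory of such a sketch is fixed by width and depth rather than by the number of distinct items, which is the general reason a CMS can alleviate the counting cost as the planning horizon grows.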
