

Poster

Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity

Emmeran Johnson · Ciara Pike-Burke · Patrick Rebeschini

Halle B #258
Thu 9 May 1:45 a.m. PDT — 3:45 a.m. PDT

Abstract: We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive 'offline' ($K=1$) to fully adaptive ($K=n$) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under $d$-dimensional linear function approximation, we establish $\Omega(\log \log d)$ lower bounds on the number of batches $K$ required for sample-efficient algorithms with $n = O(\mathrm{poly}(d))$ queries. Our results show that merely having adaptivity ($K>1$) does not guarantee sample-efficiency. Notably, the adaptivity boundary for sample-efficiency does not lie between offline reinforcement learning ($K=1$), where sample-efficiency was already known to be impossible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension.
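To make the batched interaction model concrete, the sketch below illustrates the generic $K$-batch protocol described in the abstract: a budget of $n$ queries is split across $K$ batches, queries within a batch are chosen without seeing that batch's feedback, and the querying strategy may only be updated between batches. This is an illustrative sketch, not the paper's algorithm; the `agent` and `env` interfaces (`choose_queries`, `respond`, `update`) are hypothetical placeholders.

```python
import numpy as np


def run_multi_batch(agent, env, n, K):
    """Spend a budget of n queries over K batches.

    Within a batch, queries are fixed (no adaptivity); between batches,
    the agent sees the collected feedback and may update its strategy.
    K = 1 recovers the non-adaptive 'offline' setting, K = n the fully
    adaptive one; intermediate K interpolates between the two.
    """
    batch_sizes = np.full(K, n // K, dtype=int)
    batch_sizes[: n % K] += 1  # distribute any remainder across early batches

    for size in batch_sizes:
        queries = agent.choose_queries(size)          # chosen before feedback
        feedback = [env.respond(q) for q in queries]  # environment responses
        agent.update(queries, feedback)               # adaptivity happens here
    return agent
```

The paper's lower bounds say that, under this protocol, any algorithm using only $n = O(\mathrm{poly}(d))$ queries needs $K = \Omega(\log \log d)$ such update rounds for policy evaluation and best-policy identification with linear function approximation.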
