Poster

Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds

Michael Chen · A. Pavan · N. V. Vinodchandran · Ruosong Wang · Lin Yang

Hall 3 + Hall 2B #446
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract: This paper investigates *list replicability* [Dixon et al., 2023] in the context of multi-armed (and also linear) bandits (MAB). We define an algorithm $A$ for MAB to be $(\ell, \delta)$-list replicable if, with probability at least $1 - \delta$, $A$ has at most $\ell$ traces across independent executions even with different random bits, where a trace is the sequence of arms played during an execution. For $k$-armed bandits, although the total number of traces can be $\Omega(k^T)$ for a time horizon $T$, we present several surprising upper bounds that are either independent of or logarithmic in $T$: (1) a $(2^k, \delta)$-list replicable algorithm with near-optimal regret $\widetilde{O}(\sqrt{kT})$; (2) an $(O(k/\delta), \delta)$-list replicable algorithm with regret $\widetilde{O}\left(\frac{k}{\delta}\sqrt{kT}\right)$; (3) a $((k+1)^{B-1}, \delta)$-list replicable algorithm with regret $\widetilde{O}(k^{3/2} T^{1/2 + 2^{-(B+1)}})$ for any integer $B > 1$. On the other hand, for the sublinear-regret regime, we establish a matching lower bound on the list complexity (the parameter $\ell$): we prove that there is no $(k-1, \delta)$-list replicable algorithm with $o(T)$ regret. This is optimal in list complexity in the sublinear-regret regime, as there is a $(k, 0)$-list replicable algorithm with $O(T^{2/3})$ regret. We further show that for linear bandits with $d$-dimensional features, there is a $\widetilde{O}(d^2 T^{1/2 + 2^{-(B+1)}})$-regret algorithm with $((2d+1)^{B-1}, \delta)$-list replicability for $B > 1$, even when the number of possible arms is infinite.
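To make the definition concrete: the abstract notes that a $(k, 0)$-list replicable algorithm with $O(T^{2/3})$ regret exists, and an explore-then-commit policy with a fixed round-robin exploration schedule and a deterministic commit rule has exactly this character, since every trace is determined solely by which arm wins the exploration phase. Below is a minimal, self-contained Python sketch of that idea together with a routine that empirically counts distinct traces over independent runs; all names (`BernoulliBandit`, `explore_then_commit`, `count_traces`) are ours for illustration and are not taken from the paper.

```python
import random
from collections import Counter

class BernoulliBandit:
    """k-armed bandit with Bernoulli rewards (hypothetical test environment)."""
    def __init__(self, means):
        self.means = means

    def pull(self, arm, rng):
        return 1.0 if rng.random() < self.means[arm] else 0.0

def explore_then_commit(env, horizon, rng):
    """Play round-robin for a fixed budget, then commit to the empirically
    best arm. The schedule and commit rule are deterministic, so the trace
    (sequence of arms played) is determined solely by which arm wins the
    exploration phase -- at most k distinct traces are possible."""
    k = len(env.means)
    per_arm = max(1, int(horizon ** (2 / 3)) // k)  # O(T^{2/3}) total exploration
    trace, totals, counts = [], [0.0] * k, [0] * k
    commit = None
    for t in range(horizon):
        if t < k * per_arm:
            arm = t % k                      # identical prefix across all runs
        else:
            if commit is None:               # deterministic tie-breaking via max
                commit = max(range(k), key=lambda a: totals[a] / counts[a])
            arm = commit
        reward = env.pull(arm, rng)
        totals[arm] += reward
        counts[arm] += 1
        trace.append(arm)
    return tuple(trace)

def count_traces(policy, env, horizon, runs=200):
    """Count distinct traces over independent executions: an (ell, delta)-list
    replicable algorithm yields at most ell of them with probability >= 1 - delta."""
    return Counter(policy(env, horizon, random.Random(seed)) for seed in range(runs))

if __name__ == "__main__":
    env = BernoulliBandit([0.9, 0.5, 0.4])
    traces = count_traces(explore_then_commit, env, horizon=300)
    print(f"{len(traces)} distinct traces over 200 independent runs")
```

Since the round-robin prefix is identical across seeds and the commit rule is deterministic, at most $k = 3$ distinct traces can appear here regardless of the reward noise, matching the $(k, 0)$ list-complexity claim in the abstract.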
