Poster
Regret-Optimal List Replicable Bandit Learning: Matching Upper and Lower Bounds
Michael Chen · A. Pavan · N. V. Vinodchandran · Ruosong Wang · Lin Yang
Hall 3 + Hall 2B #446
Fri 25 Apr, midnight – 2:30 a.m. PDT
Abstract:
This paper investigates *list replicability* [Dixon et al., 2023] in the context of multi-armed (also linear) bandits (MAB). We define an algorithm $A$ for MAB to be $(\ell,\delta)$-list replicable if, with probability at least $1-\delta$, $A$ produces at most $\ell$ traces across independent executions, even with different random bits, where a trace is the sequence of arms played during an execution. For $k$-armed bandits, although the total number of traces can be $\Omega(k^T)$ for a time horizon $T$, we present several surprising upper bounds that are either independent of or logarithmic in $T$: (1) a $(2^k,\delta)$-list replicable algorithm with near-optimal regret $\tilde{O}(\sqrt{kT})$; (2) an $(O(k/\delta),\delta)$-list replicable algorithm with regret $\tilde{O}\left(\frac{k}{\delta}\sqrt{kT}\right)$; (3) a $((k+1)^{B-1},\delta)$-list replicable algorithm with regret $\tilde{O}\left(k^{\frac{3}{2}}\, T^{\frac{1}{2}+2^{-(B+1)}}\right)$ for any integer $B>1$. On the other hand, for the sublinear-regret regime, we establish a matching lower bound on the list complexity (the parameter $\ell$): we prove that there is no $(k-1,\delta)$-list replicable algorithm with $o(T)$ regret. This is optimal in list complexity in the sublinear-regret regime, as there is a $(k,0)$-list replicable algorithm with $O(T^{2/3})$ regret. We further show that for linear bandits with $d$-dimensional features, there is a $\tilde{O}\left(d^2\, T^{1/2+2^{-(B+1)}}\right)$-regret algorithm with $((2d+1)^{B-1},\delta)$-list replicability for $B>1$, even when the number of possible arms is infinite.
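As a concrete illustration of the $(k,0)$-list replicable, $O(T^{2/3})$-regret baseline mentioned above, a minimal sketch follows. It is one standard way such a guarantee can arise (deterministic explore-then-commit), not necessarily the authors' construction; `pull` is a hypothetical arm-reward oracle introduced here for illustration.

```python
import numpy as np

def explore_then_commit(pull, k, T):
    """Explore-then-commit sketch (an assumed construction, not the paper's).

    Pull each of the k arms m = ceil(T^{2/3}) times in a fixed deterministic
    order, then commit to the empirically best arm (ties broken by smallest
    index) for the remaining rounds. Since the exploration schedule and
    tie-breaking are deterministic, the trace is fully determined by which
    arm is committed to, so at most k distinct traces can ever occur across
    executions: (k, 0)-list replicability. The standard analysis gives
    O(T^{2/3}) regret up to factors in k and log terms.
    """
    m = int(np.ceil(T ** (2 / 3)))       # per-arm exploration budget
    sums = np.zeros(k)
    t = 0
    for arm in range(k):                  # deterministic round-robin exploration
        for _ in range(m):
            if t >= T:                    # horizon exhausted during exploration
                return int(np.argmax(sums))
            sums[arm] += pull(arm)
            t += 1
    means = sums / m
    best = int(np.argmax(means))          # argmax breaks ties by smallest index
    while t < T:                          # commit phase: trace depends only on `best`
        pull(best)
        t += 1
    return best
```

A quick usage sketch on a simulated Bernoulli bandit: even though the rewards are random, the played sequence of arms can only take one of $k$ values, one per possible committed arm.

```python
rng = np.random.default_rng(0)
mu = np.array([0.2, 0.5, 0.8])           # hypothetical arm means
explore_then_commit(lambda a: rng.binomial(1, mu[a]), k=3, T=10_000)
```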