A Near-Optimal Best-of-Both-Worlds Algorithm for Federated Bandits
Zicheng Hu ⋅ Zihao Wang ⋅ Cheng Chen
Abstract
This paper studies federated multi-armed bandit (MAB) problems in which multiple agents collaborate to solve a common MAB problem over a communication network. We focus on the heterogeneous setting, in which no single agent can identify the globally best arm from its locally biased observations alone. In this setting, different agents may select the same arm at the same time step yet receive different rewards. We propose a novel algorithm called \textsc{FedFTRL} for this problem, which, to our knowledge, is the first to achieve near-optimal regret guarantees in both stochastic and adversarial environments. Notably, in the adversarial regime, our algorithm achieves $O(T^{\frac{1}{2}})$ regret, a significant improvement over the state-of-the-art regret of $O(T^{\frac{2}{3}})$ \citep{yi2023doubly}. We also provide empirical evaluations comparing our algorithm with baseline methods, demonstrating the effectiveness of our approach on both synthetic and real-world datasets.