Loading [MathJax]/jax/output/CommonHTML/fonts/TeX/fontdata.js
Skip to yearly menu bar Skip to main content


Poster

On Stochastic Contextual Bandits with Knapsacks in Small Budget Regime

Hengquan Guo · Xin Liu

Hall 3 + Hall 2B #383
[ ]
Wed 23 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract: This paper studies stochastic contextual bandits with knapsack constraints (CBwK), where a learner observes a context, takes an action, receives a reward, and incurs a vector of costs at every round. The learner aims to maximize the cumulative rewards across T rounds under the knapsack constraints with an initial budget of B. We study CBwK in the small budget regime where the budget B=Ω(T)and propose an Adaptive and Universal Primal--Dual algorithm (AUPD) that achieves strong regret performance: i) AUPD achieves ˜O((1+νδb)T) regret under the strict feasibility assumption without any prior information, matching the best-known bounds;ii) AUPD achieves ˜O(T+νbT34) regret without strict feasibility assumption, which, to the best of our knowledge, is the first result in the literature. Here, the parameter ν represents the optimal average reward; b=B/T is the average budget and δb is the feasibility/safety margin.We establish these strong results through the adaptive budget-aware design, which effectively balances reward maximization and budget consumption. We provide a new perspective on analyzing budget consumption using the Lyapunov drift method, along with a refined analysis of its cumulative variance. Our theory is further supported by experiments conducted on a large-scale dataset.

Live content is unavailable. Log in and register to view live content