Diversified Multinomial Logit Contextual Bandits
Heesang Ann ⋅ Taehyun Hwang ⋅ Min-hwan Oh
Abstract
Existing contextual multinomial logit (MNL) bandits model relevance-driven choice but ignore the potential benefits of within-assortment diversity, while submodular/combinatorial bandits encode diversity in rewards but lack structured choice probabilities. We bridge this gap with the *diversified multinomial logit* (DMNL) contextual bandit, which augments MNL choice probabilities with a general submodular diversity function, thereby formalizing the relevance-diversity trade-off within a single model. Incorporating diversity renders exact MNL assortment optimization intractable. We propose a *white-box* UCB-based algorithm, `OFU-DMNL`, that constructs assortments item-wise by maximizing optimistic marginal gains, avoids black-box optimization oracles, and provides end-to-end guarantees. We show that `OFU-DMNL` achieves at least a $(1-\tfrac{1}{e+1})$-*approximate* regret bound of $\tilde{O}\big(d \sqrt{T/K}\big)$, where $d$ is the context dimension, $K$ the maximum assortment size, and $T$ the horizon, and attains an improved approximation factor over standard submodular baselines. Experiments demonstrate consistent gains and, relative to exhaustive enumeration, comparable regret with substantially lower runtime. Overall, DMNL bandits provide a principled and practical foundation for diversity-aware assortment optimization under uncertainty, and `OFU-DMNL` offers a statistically and computationally efficient solution.
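To make the item-wise construction concrete, the following is a minimal toy sketch of greedy assortment building with optimistic marginal gains. It is not the paper's `OFU-DMNL` algorithm: the scores, the category-coverage diversity bonus, and the trade-off weight `lam` are all illustrative assumptions standing in for the UCB utilities and the general submodular diversity function of the DMNL model.

```python
import math

def greedy_optimistic_assortment(ucb_scores, categories, K, lam=1.0):
    """Greedily build an assortment of size K: at each step add the item
    with the largest optimistic marginal gain, here a UCB utility score
    plus a simple submodular diversity bonus (new-category coverage).
    Toy sketch only, not the paper's OFU-DMNL algorithm."""
    chosen, covered = [], set()
    remaining = set(range(len(ucb_scores)))
    for _ in range(min(K, len(ucb_scores))):
        best, best_gain = None, -math.inf
        for i in remaining:
            # marginal diversity gain: 1 if item i covers a new category, else 0
            div_gain = 1.0 if categories[i] not in covered else 0.0
            gain = ucb_scores[i] + lam * div_gain
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
        covered.add(categories[best])
        remaining.remove(best)
    return chosen
```

With `ucb_scores = [0.9, 0.8, 0.1]` and `categories = ['a', 'a', 'b']`, the greedy rule first picks item 0 (highest score, new category) and then prefers item 2 over the higher-scoring item 1, since item 2 adds a new category: the diversity bonus overrides pure relevance, which is the trade-off the DMNL objective formalizes.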