Poster
Bandit Learning in Matching Markets with Indifference
Fang Kong · Jingqi Tang · Mingzhu Li · Pinyan Lu · John C.S. Lui · Shuai Li
Hall 3 + Hall 2B #454
[
Abstract
]
Fri 25 Apr midnight PDT
— 2:30 a.m. PDT
Abstract:
A rich line of recent works studies how participants in matching markets learn their unknown preferences through iterative interactions with each other. The two sides of participants in the market can be respectively formulated as players and arms in the bandit problem. To ensure market stability, the objective is to minimize the stable regret of each player. Though existing works provide significant theoretical upper bounds for players' stable regret, the results heavily rely on the assumption that each participant has a strict preference ranking. However, in real applications, multiple candidates (e.g., workers in the labor market and students in school admission) usually demonstrate comparable performance levels, making it challenging for participants (e.g., employers and schools) to differentiate and rank their preferences. To deal with the potential indifferent preferences, we propose an adaptive exploration algorithm based on arm-guided Gale-Shapley (AE-AGS). We show that its stable regret is of order , where is the number of players, the number of arms, the total time horizon, and the minimum non-zero preference gap. Extensive experiments demonstrate the algorithm's effectiveness in handling such complex situations and its consistent superiority over baselines.
Live content is unavailable. Log in and register to view live content