Optimal Control Meets Online Mechanism: Adaptive Policy Learning with Strategic Agent Response
Abstract
Many platforms and protocols must adaptively choose a policy action (e.g., a price, subsidy, emission rate, or resource allocation) while heterogeneous, self-interested agents respond strategically. We formalize this as a \emph{two-level online mechanism-design} problem. The planner commits to a state-contingent policy (a mechanism network) that maps the observable system state to an action. Agents respond by optimizing private objectives, which produces an aggregate equilibrium response. We microfound the agent layer with a heterogeneous-agent threshold equilibrium: the aggregate response is the unique solution of a monotone fixed-point equation and increases smoothly in the planner's action. Under adjustment frictions, agent behavior co-adapts with the planner through a controlled diffusion whose drift is locally affine in the action. This yields an explicit continuous-time Hamilton-Jacobi-Bellman (HJB) characterization of the optimal value function (the critic) and a closed-form greedy policy-improvement map that generalizes classical stabilization rules while internalizing the marginal value of strategic participation. For deployment, we use the HJB-derived structure as an expert prior to initialize a compact mechanism network, which we then refine online via projected stochastic approximation with convergence guarantees. We instantiate the framework on \emph{adaptive token issuance} in blockchain protocols, where the planner sets an issuance rate and agents decide whether to stake. In experiments across multiple economic regimes, the adaptive mechanism improves target tracking and maintains stability relative to fixed-action and zero-action baselines, and all improvements are statistically significant.
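To make the threshold-equilibrium idea concrete, here is a minimal sketch of how an aggregate staking share can be computed as the unique root of a monotone fixed-point equation. Everything model-specific is an illustrative assumption rather than the paper's specification. We assume that agents' private costs are exponential with mean `mu`, that an agent stakes if and only if the per-staker yield `a / s` (issuance rate divided by staking share) covers its cost, and that the hypothetical function name is `staking_share`. Under these assumptions, the gap `F(a/s) - s` is strictly decreasing in `s`, so bisection finds the unique equilibrium.

```python
import math


def staking_share(a: float, mu: float = 1.0, tol: float = 1e-12) -> float:
    """Aggregate staking share s solving the fixed point s = F(a / s).

    Hypothetical model (illustrative assumption, not the paper's):
    - private staking costs c ~ Exponential(mean mu), so F(x) = 1 - exp(-x / mu);
    - an agent stakes iff the per-staker yield a / s is at least its cost.
    """
    if a <= 0.0:
        # Non-positive issuance gives non-positive yield; no agent with positive cost stakes.
        return 0.0

    def gap(s: float) -> float:
        # g(s) = F(a / s) - s is strictly decreasing on (0, 1]:
        # g -> 1 as s -> 0 and g(1) = -exp(-a / mu) < 0, so there is exactly one root.
        return 1.0 - math.exp(-a / (s * mu)) - s

    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # The equilibrium share should rise with the issuance rate a.
    for rate in (0.05, 0.1, 0.2):
        print(rate, round(staking_share(rate), 4))
```

Bisection is used here instead of naive fixed-point iteration because the map `s -> F(a/s)` is decreasing in this toy model and can oscillate under direct iteration. The sign of the monotone gap function, by contrast, always brackets the unique root.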