Reinforcement Learning with World Models for Optimizing Alzheimer’s Disease Treatment Timing and Dosing
Abstract
Recent work reports pharmacologic reversal of advanced Alzheimer’s disease (AD) phenotypes in mouse models via restoration of NAD+ homeostasis, shifting the therapeutic question from whether reversal is possible to how reversal-capable interventions should be deployed over time. We cast treatment timing and dosing as a long-horizon, constrained, partially observable sequential decision problem and propose a world-model-centric solution: learn an action-conditioned disease simulator from longitudinal biomarkers, then optimize dosing with uncertainty-aware planning and conservative offline reinforcement learning (RL). To ground the approach in executable experiments, we introduce ALZWORLD, a minimal synthetic benchmark that captures qualitative NAD+-linked degeneration and reversal and surfaces core failure modes of world-model control (horizon sensitivity, model exploitation, and safety-constraint violations). In ALZWORLD, planning in the learned simulator discovers adaptive schedules that match the efficacy of aggressive fixed-dose baselines at lower cumulative exposure, illustrating the value of “imagination” for principled efficacy–burden trade-offs. We conclude with a translational roadmap for mouse and human “digital-twin” world models, emphasizing calibrated uncertainty, counterfactual validation on held-out protocols, and safety-by-design via physiological homeostasis constraints. This work is limited to synthetic proof-of-concept validation; application to real data will require addressing multimodal missingness and confounding in observational cohorts.
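The idea of planning by imagined rollouts can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in and not the ALZWORLD benchmark: the scalar NAD+ state, the dynamics in `step`, the constants, the dose levels, and all function names are assumptions made for illustration. The planner is plain random shooting on a known toy model; it substitutes for the learned, uncertainty-aware simulator and ignores partial observability and safety constraints.

```python
import numpy as np

# Hypothetical toy setup; NOT the ALZWORLD dynamics from the paper.
DOSES = np.array([0.0, 0.5, 1.0])  # assumed discrete dose levels


def step(nad, dose):
    """Assumed toy model: NAD+ (in [0, 1]) decays each step; dosing restores it."""
    nad_next = nad - 0.05 * nad + 0.15 * dose * (1.0 - nad)
    return float(np.clip(nad_next, 0.0, 1.0))


def reward(nad, dose, burden_weight=0.1):
    """Efficacy proxy (NAD+ level) minus an assumed penalty on dose burden."""
    return nad - burden_weight * dose


def plan(nad, horizon=10, n_candidates=256, rng=None):
    """Random shooting: sample dose sequences, roll each out in the model,
    and return the first dose of the highest-return sequence."""
    rng = np.random.default_rng(0) if rng is None else rng
    seqs = rng.choice(DOSES, size=(n_candidates, horizon))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(seqs):
        x = nad
        for d in seq:
            x = step(x, d)
            returns[i] += reward(x, d)
    return float(seqs[np.argmax(returns), 0])


def run_episode(policy, nad0=0.3, n_steps=40):
    """Apply a dosing policy for n_steps; return final NAD+ and cumulative dose."""
    nad, total_dose = nad0, 0.0
    for _ in range(n_steps):
        d = policy(nad)
        nad = step(nad, d)
        total_dose += d
    return nad, total_dose


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adaptive = run_episode(lambda x: plan(x, rng=rng))
    fixed = run_episode(lambda x: 1.0)  # always the maximum dose
    print("adaptive (final NAD+, cumulative dose):", adaptive)
    print("fixed    (final NAD+, cumulative dose):", fixed)
```

Comparing the two printed pairs shows the kind of efficacy versus cumulative-exposure comparison the abstract describes. The numbers depend entirely on the assumed constants and carry no evidential weight.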