ACTIVEGENE: REWARD-FREE, HOMEOSTASIS-ALIGNED CONTROL FOR CLOSED-LOOP GENE REGULATION VIA ACTIVE INFERENCE
Abstract
Reinforcement learning (RL) is increasingly used to frame closed-loop genomics as sequential decision-making, but its reliance on scalar rewards makes biological control brittle: minor specification errors can induce reward-hacking--like solutions, and avoiding them requires extensive, context-specific reward shaping~\citep{Amodei2016Concrete,RewardHacking2024}. We introduce \textbf{ActiveGene}, a \emph{conceptual framework and benchmark specification} for reward-free gene regulation that replaces engineered utilities with \emph{prior preferences} over future assay outcomes and states, i.e., a distributional definition of ``healthy'' aligned with biological homeostasis. ActiveGene selects intervention policies by minimizing \emph{Expected Free Energy} (EFE), which trades off reaching preferred outcomes (risk/pragmatic value) against resolving uncertainty (epistemic value) under partial observability, avoiding ad-hoc exploration bonuses and hand-tuned penalty terms. To make the proposal operational without wet-lab access, we specify \textbf{ActiveGeneBench}: a POMDP-style virtual-cell environment that separates latent cellular state from noisy single-cell observations and supports sequential perturbations (e.g., CRISPRi/a/KO, dosing). We outline method-agnostic evaluation metrics (target attainment, safety-violation probability, intervention cost, and sample efficiency) and argue that planning under interventions is a missing axis in current static perturbation-prediction evaluations~\citep{PerturBench2024}.
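
For orientation, the EFE trade-off described above can be sketched with the decomposition commonly used in the active-inference literature. The notation is an assumption introduced here for illustration and is not fixed by the abstract: $\pi$ denotes an intervention policy, $s_\tau$ and $o_\tau$ the latent cellular state and assay observation at future step $\tau$, $q$ the agent's predictive beliefs, and $\tilde{p}(o_\tau)$ the prior preference over outcomes (the distributional definition of ``healthy'').
\begin{equation}
G(\pi) \;=\; \sum_{\tau}
\Big(
\underbrace{-\,\mathbb{E}_{q(o_\tau \mid \pi)}\big[\ln \tilde{p}(o_\tau)\big]}_{\text{negative pragmatic value}}
\;\;
\underbrace{-\,\mathbb{E}_{q(o_\tau \mid \pi)}\Big[ D_{\mathrm{KL}}\big( q(s_\tau \mid o_\tau, \pi) \,\big\|\, q(s_\tau \mid \pi) \big) \Big]}_{\text{negative epistemic value}}
\Big)
\end{equation}
Under this reading, a policy that minimizes $G(\pi)$ is favored both when its predicted outcomes fall where $\tilde{p}$ places high probability and when its observations are expected to be informative about the latent state, so exploration arises from the second term rather than from a separately tuned bonus.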