LEARNING REGULATORY-AWARE AGENTIC POLICIES VIA ENDOGENOUS CONSTRAINT DISCOVERY
Abstract
We present Endogenous Constraint Discovery (ECD), a framework that enables autonomous agents to learn regulatory compliance when constraints are implicit, non-stationary, and signaled only sparsely and with delay. Unlike constrained reinforcement learning (RL) approaches that assume fixed constraints, ECD models regulation as a latent process inferred from experience. By treating regulatory risk as epistemic uncertainty and jointly learning a policy, regime beliefs, and violation predictors, ECD encourages cautious actions under ambiguity and rapid adaptation to regime shifts. In a synthetic financial trading environment, ECD reduces regulatory violations by 50% relative to fixed-constraint baselines while maintaining competitive performance, providing evidence for the value of endogenous constraint learning.
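To make the idea of treating regulatory risk as epistemic uncertainty concrete, the following is a minimal illustrative sketch, not the ECD implementation. It assumes a small discrete set of latent regulatory regimes, a Bayesian belief over them, and a risk score that adds the disagreement between regimes to the expected violation probability. All names (update_belief, violation_risk, choose_action), the two-regime setup, and the penalty and kappa values are hypothetical choices made for illustration only.

```python
from math import sqrt


def update_belief(belief, likelihoods):
    """Bayes update of a belief over latent regimes.

    belief[i] is the prior probability of regime i; likelihoods[i] is the
    (assumed) probability of the observed compliance signal under regime i.
    """
    posterior = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(posterior)
    if total == 0:
        # Signal is impossible under every regime: keep the prior unchanged.
        return list(belief)
    return [p / total for p in posterior]


def violation_risk(belief, p_violation, kappa=1.0):
    """Expected violation probability plus kappa times its spread across regimes.

    The spread (standard deviation under the belief) stands in for epistemic
    uncertainty: it is zero once the belief concentrates on one regime.
    """
    mean = sum(b * p for b, p in zip(belief, p_violation))
    var = sum(b * (p - mean) ** 2 for b, p in zip(belief, p_violation))
    return mean + kappa * sqrt(var)


def choose_action(belief, rewards, p_violation_by_action, penalty=10.0, kappa=1.0):
    """Pick the action with the best reward after subtracting a risk penalty."""
    scores = {
        action: rewards[action]
        - penalty * violation_risk(belief, p_violation_by_action[action], kappa)
        for action in rewards
    }
    return max(scores, key=scores.get)


# Hypothetical example: regimes are (lenient, strict).
rewards = {"aggressive": 1.0, "conservative": 0.5}
p_violation_by_action = {
    "aggressive": [0.0, 0.5],    # violates only under the strict regime
    "conservative": [0.0, 0.0],  # never violates
}

ambiguous = [0.5, 0.5]
confident_lenient = [1.0, 0.0]
action_when_ambiguous = choose_action(ambiguous, rewards, p_violation_by_action)
action_when_confident = choose_action(confident_lenient, rewards, p_violation_by_action)
```

In this toy setup the agent prefers the conservative action while the regime is ambiguous and switches to the aggressive action only once the belief concentrates on the lenient regime, which mirrors the cautious-under-ambiguity behavior described above. A learned violation predictor and a non-stationary regime model would replace the fixed probability tables used here.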