Safe Context Switching for Agents in the Wild: Mitigating Subspace Interference via Orthogonal Adaptation
Abstract
The scalability of agentic world models is currently restricted by a geometric stability-plasticity dilemma: as agents increasingly internalize high-complexity domains (e.g., chain-of-thought reasoning, code generation), the expanding task manifolds encroach on the latent subspaces that represent safety constraints. We term this Sequential Subspace Interference: standard plasticity mechanisms allow high-variance reasoning tasks to non-linearly overwrite alignment priors, imposing a -23.3% Interference Penalty on safety benchmarks. We introduce AURA (Adaptive Unique Residual Allocation), a spectral regularization framework that implements World Model Disentanglement through null-space projection. By restricting rank-adaptation updates to the orthogonal complement of the safety manifold, AURA establishes a verified “Geometric Shield” that renders alignment constraints topologically invariant to subsequent learning. Empirically, this restores the intrinsic dimension of the safety state to > 0.98 cosine fidelity and recovers +23.0% of the performance lost to interference, enabling the robust evolution of complex, open-ended world models.
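To make the null-space projection idea concrete, the following is a minimal sketch and not the AURA implementation. It assumes the safety manifold can be approximated by a linear subspace spanned by a few direction vectors, and that the projection is applied on the output side of a low-rank update; the names `safety_dirs`, `orthonormal_basis`, and `project_to_complement` are hypothetical and chosen only for illustration.

```python
import numpy as np

def orthonormal_basis(vectors, tol=1e-10):
    """Return an orthonormal basis (as columns) for the span of the input columns."""
    u, s, _ = np.linalg.svd(vectors, full_matrices=False)
    return u[:, s > tol]

def project_to_complement(delta_w, basis):
    """Apply (I - U U^T) to delta_w, removing its components inside span(basis)."""
    return delta_w - basis @ (basis.T @ delta_w)

rng = np.random.default_rng(0)
d, k, r = 16, 3, 2  # hidden size, protected-subspace rank, adapter rank (all assumed)

# Hypothetical directions standing in for the protected "safety" subspace.
safety_dirs = rng.standard_normal((d, k))
U = orthonormal_basis(safety_dirs)

# A generic low-rank update of the form B @ A, as used in rank adaptation.
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, d))
delta_w = B @ A

# Restrict the update to the orthogonal complement of the protected subspace.
delta_w_safe = project_to_complement(delta_w, U)

# How much of each update still writes into the protected subspace.
leak_before = np.linalg.norm(U.T @ delta_w)
leak_after = np.linalg.norm(U.T @ delta_w_safe)
print(f"leak before: {leak_before:.3e}, leak after: {leak_after:.3e}")
```

In this linear setting the projected update has no component along the protected directions up to floating-point error, and its rank cannot exceed that of the original update. Whether the same guarantee carries over to a curved safety manifold depends on how that manifold is estimated, which this sketch does not address.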