Detecting Distributional Drift in Transformers Through Representation Dynamics
Abstract
We introduce a representation-level framework for monitoring distributional drift in transformer hidden states during autoregressive generation. Given a domain corpus, we construct per-layer manifold representations and measure drift from them using hash-based fingerprinting and Mahalanobis distance, so that drift can be tracked continuously as generation proceeds. Through systematic analysis of seven architectures spanning 0.5B to 8B parameters across varying generation lengths, we find universal pre-equilibrated dynamics: drift follows a first-order autoregressive process with negative feedback and is at equilibrium from initialization rather than converging to it gradually. Cross-domain validation reveals architecture-dependent robustness. Most models maintain consistent dynamics across domains, but certain architectures exhibit length-dependent breakdown in off-domain settings, characterized by equilibrium collapse, dynamics failure, and noise explosion. Of the measures compared, hash-based drift measurement gives the best monitoring performance at minimal computational overhead, enabling real-time drift detection and out-of-distribution identification. These findings provide a foundation for principled drift monitoring in production deployments of large language models.
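To make the two quantitative ingredients named above concrete, the following is a minimal sketch, not the paper's implementation. It assumes that the per-layer reference can be summarized by a mean and covariance of hidden states, that drift at each generation step is the Mahalanobis distance to that reference, and that the first-order autoregressive process takes the form d[t+1] = c + phi * d[t] + noise, with "negative feedback" read as phi < 1. The function names (`fit_reference`, `mahalanobis_drift`, `fit_ar1`), the ridge term, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_reference(states, ridge=1e-3):
    """Estimate mean and inverse covariance from reference hidden states of shape (n, d).

    The ridge term is an assumption added for numerical stability.
    """
    mu = states.mean(axis=0)
    cov = np.cov(states, rowvar=False) + ridge * np.eye(states.shape[1])
    return mu, np.linalg.inv(cov)


def mahalanobis_drift(h, mu, cov_inv):
    """Mahalanobis distance of each row of h (t, d) from the reference distribution."""
    diff = h - mu
    return np.sqrt(np.einsum("td,de,te->t", diff, cov_inv, diff))


def fit_ar1(series):
    """Least-squares fit of d[t+1] = c + phi * d[t] + noise.

    The implied equilibrium c / (1 - phi) is only meaningful when phi != 1.
    """
    x, y = series[:-1], series[1:]
    A = np.column_stack([np.ones_like(x), x])
    (c, phi), *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ np.array([c, phi])
    return {
        "c": float(c),
        "phi": float(phi),
        "equilibrium": float(c / (1.0 - phi)),
        "noise_std": float(resid.std(ddof=2)),
    }


# Synthetic stand-ins for hidden states; real use would take one layer's activations.
rng = np.random.default_rng(0)
reference_states = rng.normal(size=(2000, 16))   # hypothetical domain-corpus states
generated_states = rng.normal(size=(200, 16))    # hypothetical states during generation

mu, cov_inv = fit_reference(reference_states)
drift = mahalanobis_drift(generated_states, mu, cov_inv)
print(fit_ar1(drift))
```

Under this reading, the fitted `equilibrium` and `noise_std` are the quantities one would watch for the breakdown modes described above: a shift in the equilibrium level, a fitted `phi` approaching or exceeding 1, or a sharp rise in residual noise.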