Auditing Black-Box Trends: Structural Inductive Bias Facilitates Causal Interpretability in Clinical Time Series
Abstract
The deployment of predictive Transformer architectures in high-stakes healthcare poses a critical safety challenge: the divergence between forecasting accuracy and interventional validity, which we term the "Alignment Gap." On observational data, standard training objectives incentivize models to exploit "confounding by indication," often inverting causal semantics: a treatment given preferentially to the sickest patients appears harmful because it co-occurs with poor outcomes. In this work, we present a simple audit protocol for quantifying this gap. We introduce the Causal Hallucination Score (CHS), a metric that measures the divergence between a foundation model's zero-shot counterfactuals and those of a structural reference instrument. Applying this protocol to Lag-Llama and Chronos-T5, we reveal a severe safety failure: despite high predictive likelihood, the counterfactuals obtained by naively prompting these models reproduce the dataset's observational bias, associating life-saving vasopressors with increased mortality. We demonstrate that a Propensity-Regularized GRU-D serves as an effective audit instrument, recovering a directionally consistent therapeutic signal (CATE: +0.005) that is validated by doubly robust estimation and placebo falsification. We release the code, dataset split, and evaluation protocol as a public benchmark to facilitate future safety audits of clinical foundation models.
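The abstract cites doubly robust estimation as a check on the instrument's treatment-effect estimate. A minimal sketch follows, assuming the standard augmented inverse-propensity-weighted (AIPW) form, which may differ from the estimator actually used in this work. The function name `aipw_ate`, its arguments, and the clipping threshold are hypothetical. The estimator combines an outcome-model contrast with propensity-weighted residual corrections, so it stays consistent if either the propensity model or the outcome model is correctly specified.

```python
import numpy as np


def aipw_ate(y, t, e_hat, mu0_hat, mu1_hat, clip=1e-3):
    """Hypothetical AIPW (doubly robust) average treatment effect estimate.

    y:        observed outcomes
    t:        binary treatment indicator (1 = treated, 0 = control)
    e_hat:    estimated propensity scores P(T=1 | X)
    mu0_hat:  outcome-model predictions under control for each unit
    mu1_hat:  outcome-model predictions under treatment for each unit
    clip:     assumed bound keeping propensities away from 0 and 1
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    # Clip propensities so extreme scores do not produce exploding weights.
    e = np.clip(np.asarray(e_hat, dtype=float), clip, 1.0 - clip)
    mu0 = np.asarray(mu0_hat, dtype=float)
    mu1 = np.asarray(mu1_hat, dtype=float)
    # Per-unit influence term: outcome-model contrast plus IPW-weighted
    # residual corrections for the treated and control arms.
    psi = (mu1 - mu0
           + t * (y - mu1) / e
           - (1.0 - t) * (y - mu0) / (1.0 - e))
    return float(psi.mean())


# Correct outcome model: the residual corrections vanish.
print(aipw_ate([1, 3, 1, 3], [0, 1, 0, 1], [0.5] * 4, [1] * 4, [3] * 4))
# Misspecified (all-zero) outcome model, correct propensities:
# the IPW terms alone still recover the true effect.
print(aipw_ate([3, 1, 3, 1], [1, 0, 1, 0], [0.5] * 4, [0] * 4, [0] * 4))
```

Both calls should return an effect of 2.0, illustrating the "either model may be wrong" property. Averaging the per-unit terms over a patient subgroup rather than the full cohort gives a subgroup-level effect; a placebo check in the same spirit would rerun the estimate with a permuted treatment indicator and expect a value near zero.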