Directly Optimizing Calibrated Test-Time Uncertainty
Carlos Stein Brito
Abstract
Uncertainty in learned predictors is often split into aleatoric and epistemic components using architectural choices or Bayesian approximations, making the decomposition sensitive to modeling details. We propose an objective-driven decomposition into predictive noise ($\psi$) and generalization noise ($\phi$). Predictive noise represents the residual stochasticity needed to fit the training data under a chosen likelihood and model class, while generalization noise captures instability of the learned predictor as revealed by held-out data. Both noises can be instantiated as additive randomness in the predictive distribution (output-only or internal), and they are separable because they are optimized on different splits and losses: standard training NLL for $(\theta,\psi)$ and a held-out marginal log-likelihood for $\phi$. The resulting total predictive distribution improves reliability without explicit ensembles and yields noise-budget learning curves that explain how performance changes across data size and capacity. We demonstrate the decomposition on a controlled mixture model and on MLP regression.
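The two-split optimization described above can be illustrated with a minimal sketch. It assumes a linear-Gaussian model with output-only additive noise, which is a simplification chosen here for illustration and not the paper's experimental setup. Under that assumption the held-out marginal likelihood is Gaussian with variance `psi_var + phi_var`, so both noise terms have closed-form estimates. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_train(X, y):
    """Fit theta and predictive-noise variance psi_var by training NLL (Gaussian MLE)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    psi_var = np.mean((y - X @ theta) ** 2)
    return theta, psi_var

def fit_heldout(X, y, theta, psi_var):
    """Fit generalization-noise variance phi_var on held-out data, theta and psi_var frozen.

    Assumption: phi is additive Gaussian output noise, so the held-out
    likelihood is N(y; X @ theta, psi_var + phi_var) with phi_var >= 0.
    The maximizer is the held-out MSE minus psi_var, clipped at zero.
    """
    mse = np.mean((y - X @ theta) ** 2)
    return max(0.0, mse - psi_var)

def gaussian_nll(X, y, theta, var):
    """Mean Gaussian negative log-likelihood with a shared variance."""
    r = y - X @ theta
    return 0.5 * np.mean(np.log(2 * np.pi * var) + r ** 2 / var)

# Synthetic data (hypothetical): few training points relative to features,
# so the training residual understates the held-out error.
rng = np.random.default_rng(0)
d, n_train, n_val = 20, 30, 200
w_true = rng.normal(size=d)
X_tr = rng.normal(size=(n_train, d))
X_va = rng.normal(size=(n_val, d))
y_tr = X_tr @ w_true + rng.normal(size=n_train)
y_va = X_va @ w_true + rng.normal(size=n_val)

theta, psi_var = fit_train(X_tr, y_tr)               # training split: (theta, psi)
phi_var = fit_heldout(X_va, y_va, theta, psi_var)    # held-out split: phi

nll_psi_only = gaussian_nll(X_va, y_va, theta, psi_var)
nll_total = gaussian_nll(X_va, y_va, theta, psi_var + phi_var)
print(f"psi_var={psi_var:.3f}  phi_var={phi_var:.3f}")
print(f"held-out NLL: psi only={nll_psi_only:.3f}  psi+phi={nll_total:.3f}")
```

In this sketch the two terms are separable only because they are fit on different splits: `psi_var` absorbs what the model cannot fit on the training data, and `phi_var` absorbs the additional error that appears on held-out data. Internal (non-output) noise or a non-Gaussian likelihood would generally need a sampled or numerically optimized marginal likelihood instead of the closed form used here.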