Hyperspherical Filtering for Online Classification under Drift
Abstract
In online learning, a model processes a nonstationary data stream by alternating between prediction and training steps. Recent work has employed a Gaussian Kalman filter with a learnable forgetting coefficient to adapt last-layer classifier weights under sudden distribution shift. However, Gaussian models assume Euclidean geometry, whereas the predictions of softmax heads, especially with normalized features, depend primarily on the direction of the class weights. We address this mismatch by modeling each class weight on the hypersphere with a von Mises--Fisher (vMF) posterior. Across several drift tasks with pretrained backbones, the vMF filter consistently improves negative log-likelihood, expected calibration error, and Brier score relative to Gaussian Kalman filtering, at the cost of a small reduction in average online accuracy.