Bayesian Post-Training Enhancement of Regression Models with Calibrated Rankings
Abstract
Accurate regression models are essential for scientific discovery, yet high-quality numeric labels are scarce and expensive. In contrast, rankings (especially pairwise ones) are easier to obtain from domain experts or artificial intelligence judges. We introduce RankRefine++, a novel plug-and-play method that improves a base regressor's prediction for a query by leveraging pairwise rankings between the query and reference items with known labels. RankRefine++ performs a Bayesian update that combines a Gaussian likelihood from the regressor with a Bradley-Terry likelihood from the ranker. This yields a strictly log-concave posterior that has a unique maximum likelihood solution and admits fast Newton updates. We show that the prior state of the art is a special case of our framework, and we identify a fundamental failure mode: Bradley-Terry likelihoods suffer from scale mismatch and curvature dominance when the number of reference items is large, which can degrade performance. From this analysis, we derive a calibration method that adjusts the information originating from the expert rankings. RankRefine++ achieves a 97.65\% median improvement over the previous state-of-the-art method across 12 datasets when using a realistically accurate ranker, and it runs efficiently on a consumer-grade CPU.
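To make the update described above concrete, here is a minimal sketch of a one-dimensional Bayesian refinement that combines a Gaussian term from the regressor with Bradley-Terry terms from pairwise comparisons, and finds the maximizer with Newton steps. It is an illustration under stated assumptions, not the RankRefine++ implementation: the function name `refine`, the arguments `mu`, `sigma`, `ref_labels`, `query_wins`, and in particular the `scale` parameter are hypothetical, the Bradley-Terry probability is assumed to be a logistic function of the label difference divided by `scale`, and the backtracking safeguard is an addition for robustness. The calibration method mentioned in the abstract is not reproduced here.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _log_sigmoid(z):
    # log(sigmoid(z)) computed without overflow.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def refine(mu, sigma, ref_labels, query_wins, scale=1.0, tol=1e-10, max_iter=50):
    """Refine a regressor prediction with pairwise rankings (illustrative sketch).

    mu, sigma   : regressor mean and standard deviation for the query
    ref_labels  : known labels of the reference items
    query_wins  : True where the ranker says the query's label exceeds the reference's
    scale       : ASSUMED scale mapping label differences to Bradley-Terry logits
    """

    def neg_log_post(y):
        # Gaussian term from the regressor.
        value = 0.5 * ((y - mu) / sigma) ** 2
        # Bradley-Terry terms, one per reference comparison.
        for y_ref, win in zip(ref_labels, query_wins):
            z = (y - y_ref) / scale
            value -= _log_sigmoid(z) if win else _log_sigmoid(-z)
        return value

    y = mu  # start from the regressor's prediction
    for _ in range(max_iter):
        grad = (y - mu) / sigma**2
        hess = 1.0 / sigma**2  # strictly positive, so the objective is strictly convex
        for y_ref, win in zip(ref_labels, query_wins):
            p = _sigmoid((y - y_ref) / scale)
            grad -= ((1.0 if win else 0.0) - p) / scale
            hess += p * (1.0 - p) / scale**2
        step = grad / hess
        # Backtracking: halve the Newton step until the objective does not increase.
        t, f0 = 1.0, neg_log_post(y)
        while neg_log_post(y - t * step) > f0 and t > 1e-8:
            t *= 0.5
        y -= t * step
        if abs(t * step) < tol:
            break
    return y
```

For example, `refine(0.0, 1.0, [1.0, 2.0], [True, True])` moves the estimate above the regressor's prediction of 0.0, because both comparisons say the query outranks references with larger labels. In this sketch each added reference contributes a term of up to `0.25 / scale**2` to the curvature, which gives a rough picture of how many comparisons can come to dominate the Gaussian term.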