ASTRA: Statistically Robust Model Selection from Cross-Validation
Abstract
Standard practices for comparing machine learning models in low-data regimes, which are common in materials discovery, lack statistical rigour. We present Automated model selection using Statistical Testing for Robust Algorithms (ASTRA), which combines model training using cross-validation (CV) with statistical hypothesis testing to identify models that perform significantly better. Evaluated on hundreds of synthetic data sets and on real-world drug discovery data sets from the ASAP Discovery x OpenADMET challenge, ASTRA selects better models than choosing the model with the best mean or median CV score, particularly in classification settings and when CV scores do not correlate significantly with test performance. ASTRA will make it easier to develop new approaches that significantly outperform previous models, and its modular, customisable design allows users to integrate it seamlessly into existing machine learning workflows. ASTRA is freely available in a GitHub repository.
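To illustrate the general idea of pairing CV scores with a hypothesis test, the following is a minimal sketch and not ASTRA's actual interface or test procedure. It assumes two models scored on the same CV folds, with higher scores being better, and uses an exact paired sign-flip permutation test as a stand-in for whichever test a workflow would really apply. The function names `paired_sign_flip_pvalue` and `select_model` and the default `alpha=0.05` are hypothetical.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on per-fold score differences.

    Illustrative stand-in only; not the test used by ASTRA.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same CV folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    # Under the null hypothesis, the sign of each per-fold difference is
    # arbitrary, so enumerate all 2^k sign assignments (fine for small k).
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / total


def select_model(scores_a, scores_b, alpha=0.05):
    """Return ("A" | "B" | "no significant difference", p-value)."""
    p = paired_sign_flip_pvalue(scores_a, scores_b)
    if p >= alpha:
        return "no significant difference", p
    winner = "A" if mean(scores_a) > mean(scores_b) else "B"
    return winner, p
```

The design point is that a higher mean CV score alone does not trigger a selection: a model is only preferred when the per-fold differences are unlikely under the null hypothesis. Note that CV fold scores are not independent, because the training sets overlap, so a p-value from a simple test like this one should be read with caution.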