Genomic heterogeneity inflates the performance of variant pathogenicity predictions
Baiyu Lu ⋅ ⋅ Po-Yu Lin ⋅ Nadav Brandes
Abstract
Recent studies have reported unprecedented accuracy in predicting pathogenic variants across the genome, including in noncoding regions, using large AI models trained on vast genomic data. We present a comprehensive evaluation of these frontier models and show that their reported performance is inflated by differences in the prevalence of pathogenic variants across genomic contexts. We identify the best-performing models for each variant type and establish a benchmark to guide future progress.
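
The inflation effect described above can be illustrated with a toy calculation. The sketch below is not the authors' evaluation pipeline. It assumes two hypothetical genomic contexts with made-up variant counts and a hypothetical "predictor" that knows only which context a variant falls in and assigns that context's pathogenic prevalence as its score. The context names, the counts, and the `auroc` helper are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def auroc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUROC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (context name, pathogenic count, benign count).
CONTEXTS: List[Tuple[str, int, int]] = [
    ("coding", 50, 50),      # assumed high pathogenic prevalence
    ("noncoding", 2, 98),    # assumed low pathogenic prevalence
]


def build_dataset() -> List[Tuple[str, int, float]]:
    """Return (context, label, score) rows for a context-only predictor."""
    rows: List[Tuple[str, int, float]] = []
    for name, n_path, n_benign in CONTEXTS:
        # The score depends only on the context, never on the variant itself.
        prevalence = n_path / (n_path + n_benign)
        rows += [(name, 1, prevalence)] * n_path
        rows += [(name, 0, prevalence)] * n_benign
    return rows


rows = build_dataset()

# Pooled evaluation: all variants scored together, contexts mixed.
pooled = auroc([y for _, y, _ in rows], [s for _, _, s in rows])

# Stratified evaluation: AUROC computed separately within each context.
per_context: Dict[str, float] = {
    name: auroc(
        [y for c, y, _ in rows if c == name],
        [s for c, _, s in rows if c == name],
    )
    for name, _, _ in CONTEXTS
}

print(f"pooled AUROC: {pooled:.3f}")
for name, value in per_context.items():
    print(f"{name} AUROC: {value:.3f}")
```

With these assumed counts, the pooled AUROC is about 0.81 even though the AUROC within each context is exactly 0.5, that is, no better than chance. The predictor separates contexts, not pathogenic from benign variants, which is why a stratified evaluation gives a more faithful picture than a pooled one in this toy setting.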