Spatial Autocorrelation Predicts Cross-Modal Learnability: A Systematic Benchmark of Metabolite Prediction from Gene Expression
Abstract
Understanding which molecular states can be learned across measurement modalities is fundamental to multi-omics integration. We systematically evaluate whether metabolite abundances can be predicted from gene expression using the first spatially matched transcriptomics-metabolomics dataset, and present the first comprehensive benchmark of transcriptome-to-metabolome prediction in spatial multi-omics. Our evaluation spans seven architectures, from regularized linear models to graph neural networks, across multiple feature selection strategies and validation protocols, implemented via a reproducible Snakemake pipeline. Moving beyond aggregate performance metrics, we characterize 500 individual metabolites to identify the distributional properties that determine learnability. This analysis reveals a three-tier biological hierarchy linking spatial organization to prediction success and post-translational regulation to fundamental limits, providing quantitative criteria for when computational inference can substitute for experimental measurement.