Efficient Benchmarking of Functional Connectivity Modeling via Structure-aware Core-set Selection
Abstract
Benchmarking the hundreds of available functional connectivity (FC) models on large fMRI datasets is critical for reproducible neuroscience, but full-scale comparisons can require months of compute time, making them often computationally infeasible and creating a bottleneck that hinders data-driven model selection. To break this bottleneck, we introduce a pre-analytical step: selecting a small, representative core-set whose sole purpose is to preserve the relative performance ranking of FC models. We formulate this as a ranking recommendation problem and propose Structure-aware Contrastive Learning for Core-set Selection (SCLCS), a self-supervised framework for selecting these core-sets. SCLCS first uses an adaptive Transformer to learn each sample's unique FC structure. It then introduces a novel Structural Perturbation Score (SPS) that quantifies the stability of these learned structures during training, identifying samples that represent foundational connectivity archetypes. Finally, it combines this stability-based ranking with a density-aware sampling strategy to ensure the selected core-set is both robust and diverse. On the large-scale REST-meta-MDD dataset, SCLCS preserves the ground-truth model ranking using just 10% of the data, outperforming state-of-the-art (SOTA) selection methods by up to 23.2% in ranking consistency (nDCG@k). To our knowledge, this is the first work to formalize core-set selection for FC model benchmarking, making previously intractable large-scale model comparisons feasible.
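The ranking-consistency metric cited above, nDCG@k, is a standard measure from information retrieval. A minimal sketch of how it scores agreement between a core-set-induced model ranking and the ground-truth ranking is below; the model names and scores are hypothetical, and the paper's exact gain/discount variant may differ from this common formulation:

```python
import math

def ndcg_at_k(predicted_ranking, true_scores, k):
    """nDCG@k: how well a predicted model ordering preserves the
    ground-truth ranking. predicted_ranking is a list of model ids
    ordered by performance on the core-set; true_scores maps each
    model id to its ground-truth (full-dataset) score."""
    def dcg(ids):
        # Discounted cumulative gain over the top-k positions.
        return sum(true_scores[m] / math.log2(i + 2)
                   for i, m in enumerate(ids[:k]))
    # Ideal ordering: models sorted by their true scores.
    ideal = sorted(true_scores, key=true_scores.get, reverse=True)
    return dcg(predicted_ranking) / dcg(ideal)

# Hypothetical ground-truth scores for three FC models.
truth = {"A": 3.0, "B": 2.0, "C": 1.0}
print(ndcg_at_k(["A", "B", "C"], truth, k=3))  # perfect agreement: 1.0
print(ndcg_at_k(["C", "B", "A"], truth, k=3))  # reversed ranking: < 1.0
```

A value of 1.0 means the core-set reproduces the ground-truth ranking exactly; the gap reported in the abstract (up to 23.2%) is measured on this scale.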