Sparse Control of Disease-Aligned Gene Programs in Single-Cell Transcriptomics
Abstract
A central challenge in applying machine learning to biological discovery is not only prediction, but also control: identifying small, interpretable sets of molecular targets whose perturbation can drive cells along meaningful biological trajectories. We introduce Sparse Linear Manifold Control (SLMC), a disease-aligned framework that isolates donor-robust axes of variation in single-cell transcriptomic data and casts target selection as a sparse reconstruction problem. Across diverse human datasets, the resulting objective exhibits strong diminishing returns, enabling simple greedy algorithms to recover compact, disease-relevant gene sets that generalize across donors. By exposing algorithmic structure in disease manifolds, SLMC bridges representation learning and experiment-ready perturbation design for interpretable and actionable biological discovery.
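To make the sparse reconstruction framing concrete, the following is a minimal sketch of greedy forward selection on synthetic data. It is not the SLMC implementation. The function name `greedy_select`, the least-squares R^2 objective, and the synthetic "disease axis" score `y` are all illustrative assumptions. At each step the sketch adds the gene whose inclusion most improves the linear reconstruction of `y`, and records the marginal gain so that diminishing returns can be inspected.

```python
import numpy as np

def greedy_select(X, y, k):
    """Greedy forward selection (hypothetical helper, not from the paper).

    X : (n_cells, n_genes) expression matrix
    y : (n_cells,) score of each cell along an assumed disease-aligned axis
    k : number of genes to select
    Returns the selected gene indices and the marginal R^2 gain of each pick.
    """
    n, p = X.shape
    total = float(np.sum((y - y.mean()) ** 2))  # total sum of squares of y
    selected, gains = [], []
    prev_r2 = 0.0
    for _ in range(k):
        best_j, best_r2 = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            # Least-squares fit of y on an intercept plus the candidate gene set.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            r2 = 1.0 - float(resid @ resid) / total
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        selected.append(best_j)
        gains.append(best_r2 - prev_r2)
        prev_r2 = best_r2
    return selected, gains

# Synthetic example: the axis score is driven by two of thirty genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = 3.0 * X[:, 4] + 2.0 * X[:, 17] + 0.1 * rng.normal(size=200)

selected, gains = greedy_select(X, y, k=3)
print("selected genes:", selected)
print("marginal R^2 gains:", np.round(gains, 4))
```

In this toy setting the marginal gains shrink quickly once the two driving genes are chosen, which is the kind of diminishing-returns behavior that makes a greedy strategy attractive. Donor-robust axis estimation and cross-donor validation are outside the scope of this sketch.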