Beyond Edge Prediction: Residual Set Modeling for Combinatorial Gene Regulation
Abstract
Despite recent advances in genomic machine learning, progress in gene regulatory network (GRN) inference remains constrained by evaluation objectives that fail to reflect the combinatorial nature of biological regulation. Most existing methods frame GRN inference as pairwise edge prediction, even though the expression of a gene is often controlled by a specific set of interacting regulators. As a result, strong edge-level performance does not necessarily imply recovery of the underlying regulatory mechanism. We reframe target-specific GRN inference as Exact Regulator Set Recovery and introduce a two-stage Filter-and-Refine pipeline. Stage 1 retrieves a high-recall candidate pool using a target-conditioned attention retriever, and Stage 2 selects the regulator set from that pool using a residual high-order set scorer built on top of a decomposable pairwise baseline. Through controlled experiments on synthetic single-cell data with known combinatorial ground truth, we show that raising the recall ceiling set by Stage 1 retrieval is critical for exact recovery, and that residual modeling of non-additive regulator interactions substantially improves recovery of regulatory logic under sparse supervision.
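To make the gap between edge-level and set-level evaluation concrete, the following is a minimal sketch rather than the evaluation code used in this work. The function names, the regulator labels (`TF_A`, `TF_B`, ...), and the definition of the retrieval ceiling as the fraction of targets whose full true regulator set lies inside the candidate pool are all illustrative assumptions.

```python
from typing import FrozenSet, Sequence


def edge_f1(pred: FrozenSet[str], true: FrozenSet[str]) -> float:
    """Edge-level F1 for one target: each regulator is scored independently."""
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def exact_match(pred: FrozenSet[str], true: FrozenSet[str]) -> bool:
    """Set-level criterion: the predicted regulator set must equal the true set."""
    return pred == true


def retrieval_ceiling(
    pools: Sequence[FrozenSet[str]], truths: Sequence[FrozenSet[str]]
) -> float:
    """Assumed definition: share of targets whose full true set is in the pool.

    A second-stage scorer that only selects from the pool cannot exceed this
    value in exact-match accuracy.
    """
    hits = sum(1 for pool, true in zip(pools, truths) if true <= pool)
    return hits / len(truths)


# Hypothetical target regulated jointly by three factors.
true_set = frozenset({"TF_A", "TF_B", "TF_C"})
# A prediction that misses one member of the combinatorial set.
pred_set = frozenset({"TF_A", "TF_B"})

f1 = edge_f1(pred_set, true_set)        # roughly 0.8: looks strong edge-wise
exact = exact_match(pred_set, true_set)  # False: the regulator set is not recovered

# Two hypothetical candidate pools for the same true set; only the first
# contains every true regulator.
pools = [
    frozenset({"TF_A", "TF_B", "TF_C", "TF_D"}),
    frozenset({"TF_A", "TF_D"}),
]
ceiling = retrieval_ceiling(pools, [true_set, true_set])
```

In this toy case a prediction that omits one of three regulators still scores about 0.8 on edge-level F1 while failing the exact-match criterion, and only one of the two candidate pools leaves exact recovery possible at all.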