Contrastive Alignment of Expression and Copy Number Highlights Dosage-Insensitive Genes in Cancer
Abstract
Copy number variations (CNVs) are a hallmark of cancer genomes, yet the relationship between copy number and gene expression is not strictly deterministic. Through regulatory compensation, some genes maintain stable expression despite copy number changes. Identifying such dosage-insensitive genes is challenging because it requires distinguishing true regulatory escape from technical noise in heterogeneous single-cell data. Here, we present a contrastive learning framework that aligns single-cell RNA-seq expression profiles with inferred CNV patterns in a shared latent space. Our key innovation is hard negative mining, which explicitly selects for training those cell pairs with similar CNV profiles but divergent expression patterns; such pairs represent potential dosage insensitivity. By combining an InfoNCE objective with a hard-negative triplet loss, the model learns embeddings in which expression–CNV distance quantifies regulatory concordance. We apply this framework to 10 lung adenocarcinoma patients (approximately 80,000 cells) from the GSE131907 atlas, classifying cancer cells as concordant (expression follows CNV) or discordant (expression escapes CNV). Differential expression analysis between these two groups reveals two classes of dosage-insensitive genes: escape genes, which are upregulated in discordant cells regardless of CNV status, and compensation genes, which are downregulated in discordant cells. Pooled analysis across 40,775 cancer cells identifies immune- and macrophage-associated escape genes, including VSIG4, FCGR1A, TREM2, and MARCO, as well as cytotoxic lymphocyte-associated compensation genes such as MALAT1, CCL5, and CD8A. These results highlight recurrent CNV-independent transcriptional programs and demonstrate that contrastive alignment provides a general framework for discovering regulatory escape mechanisms in cancer using standard single-cell RNA-seq data.
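To make the objective described in the abstract concrete, the following is a minimal sketch of how an InfoNCE term and a hard-negative triplet term might be combined, and how an expression–CNV distance could then serve as a per-cell discordance score. It is not the implementation used in this work. The cosine similarity, the temperature, margin, and weight values, the mining rule (choose the other cell whose CNV embedding is most similar while its expression embedding is least similar), and all function names are assumptions introduced for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(expr, cnv, temperature=0.1):
    """Mean InfoNCE loss: cell i's expression embedding should score highest
    against its own CNV embedding among all CNV embeddings in the batch."""
    total = 0.0
    for i, e in enumerate(expr):
        logits = [cosine(e, c) / temperature for c in cnv]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(expr)


def hard_negative_triplet(expr, cnv, margin=0.2):
    """Mean triplet loss with one mined hard negative per anchor cell.

    Assumed mining rule (hypothetical): for anchor i, pick the other cell k
    with the most similar CNV embedding and the least similar expression
    embedding, i.e. maximize cos(cnv_i, cnv_k) - cos(expr_i, expr_k).
    """
    n = len(expr)
    total = 0.0
    for i in range(n):
        j = max(
            (k for k in range(n) if k != i),
            key=lambda k: cosine(cnv[i], cnv[k]) - cosine(expr[i], expr[k]),
        )
        d_pos = 1.0 - cosine(expr[i], cnv[i])  # anchor to its own CNV embedding
        d_neg = 1.0 - cosine(expr[i], cnv[j])  # anchor to the hard negative's CNV embedding
        total += max(0.0, d_pos - d_neg + margin)
    return total / n


def combined_loss(expr, cnv, weight=1.0, temperature=0.1, margin=0.2):
    """InfoNCE plus a weighted hard-negative triplet term (weight is assumed)."""
    return info_nce(expr, cnv, temperature) + weight * hard_negative_triplet(expr, cnv, margin)


def discordance(expr, cnv):
    """Per-cell expression-CNV cosine distance; larger values suggest escape from CNV."""
    return [1.0 - cosine(e, c) for e, c in zip(expr, cnv)]
```

In this sketch, cells could be labeled concordant or discordant by thresholding the output of `discordance`. The threshold, like the other hyperparameters above, would have to be chosen for the dataset at hand and is not specified here.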