Invited Keynote 5 (Adam Kosiorek, Ph.D.): AlphaGenome - Advancing regulatory variant effect prediction with a unified DNA sequence model
Abstract
AlphaGenome is a unified DNA sequence model that takes 1 Mb of DNA sequence as input and predicts thousands of functional genomic tracks at up to single-base-pair resolution. By bridging the gap between traditional short-sequence/high-resolution models and long-sequence/low-resolution models, AlphaGenome expands the range of modalities, input lengths, and prediction resolutions compared to prior art. Remarkably, AlphaGenome is trained on just two i.i.d. examples: the reference human and mouse genomes. In this talk, we will explore the modeling assumptions that enable training a state-of-the-art model in this setup, leading to performance that matches or exceeds the strongest available external models in most variant effect prediction evaluations. As a case study, we will demonstrate how AlphaGenome’s multimodal capabilities successfully recapitulate the mechanisms of clinically relevant variants near the TAL1 oncogene. We will also discuss the model’s current limitations and the need for a joint sequence and function model.