Sequence Design and Phylogenetic Inference with Generative Flow Networks
Qichen Huang ⋅ Carlos Mourra-Diaz ⋅ Xiaozhen Wen ⋅ David Payette
Abstract
Phylogenetic inference remains computationally challenging because of the exponentially growing search space of tree topologies, and current methods rely heavily on multiple sequence alignments (MSAs), which are expensive and error-prone. We propose AncestorGFN, a novel approach that uses Generative Flow Networks (GFlowNets) for simultaneous sequence generation and phylogenetic inference without requiring MSAs. Our method learns to generate sequences matching a target distribution, while the flow trajectories implicitly encode evolutionary relationships. We demonstrate that greedy traceback on maximum-flow trajectories recovers shared ancestral states, and we evaluate the method on the let-7 microRNA family, where the learned flow structure captures phylogenetic branching patterns. Furthermore, beam search at inference time discovers novel sequences that cluster near known targets, suggesting applications in $\textit{de novo}$ sequence design. This work establishes a foundation for MSA-free phylogenetic inference using generative models.
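To make the greedy-traceback idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical state space in which sequences are built by appending or prepending one nucleotide at a time, so a state can have more than one parent, and it uses a hand-written table of edge flows (`FLOW`) in place of a trained GFlowNet. The function names `greedy_traceback` and `deepest_shared_state` and all flow values are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical edge flows F(parent -> child); the numbers are made up for
# illustration and do not come from a trained model.
FLOW: Dict[Tuple[str, str], float] = {
    ("", "G"): 3.0,
    ("", "A"): 1.0,
    ("G", "GA"): 2.5,    # append A
    ("A", "GA"): 0.5,    # prepend G
    ("A", "AU"): 0.5,    # append U
    ("GA", "GAU"): 1.5,  # append U
    ("AU", "GAU"): 0.2,  # prepend G
    ("GA", "GAC"): 1.0,  # append C
}


def greedy_traceback(terminal: str, flow: Dict[Tuple[str, str], float],
                     root: str = "") -> List[str]:
    """Walk from a terminal state back to the root, always following the
    incoming edge with the largest flow. Returns states terminal-first."""
    path = [terminal]
    state = terminal
    while state != root:
        incoming = {p: f for (p, c), f in flow.items() if c == state}
        if not incoming:
            raise ValueError(f"no incoming edge for state {state!r}")
        state = max(incoming, key=incoming.get)
        path.append(state)
    return path


def deepest_shared_state(path_a: List[str], path_b: List[str]) -> str:
    """Return the first state along path_a (terminal-first order) that also
    lies on path_b, i.e. the most recent state the two tracebacks share."""
    on_b = set(path_b)
    for state in path_a:
        if state in on_b:
            return state
    raise ValueError("paths share no state")


path_gau = greedy_traceback("GAU", FLOW)
path_gac = greedy_traceback("GAC", FLOW)
print(path_gau)                                   # ['GAU', 'GA', 'G', '']
print(path_gac)                                   # ['GAC', 'GA', 'G', '']
print(deepest_shared_state(path_gau, path_gac))   # GA
```

In this toy setting the two target sequences trace back through the same high-flow state `GA`, which plays the role of a shared ancestral state. How AncestorGFN defines states, actions, and flows is not specified in the abstract, so the sketch only illustrates the general mechanism.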