Sequence Generation and Phylogenetic Inference with Generative Flow Networks
Qichen Huang ⋅ Carlos Mourra-Diaz ⋅ Xiaozhen Wen ⋅ David Payette
Abstract
Phylogenetic inference remains computationally challenging because the search space of tree topologies grows exponentially with the number of sequences, and current methods rely heavily on multiple sequence alignments (MSAs), which are expensive to compute and error-prone. We propose AncestorGFN, a proof-of-concept approach that leverages Generative Flow Networks (GFlowNets) for simultaneous sequence generation and phylogenetic exploration without requiring explicit MSAs. Our method learns to generate sequences matching a target distribution, while the flow trajectories implicitly encode structural relationships among the sequences. We demonstrate that greedy traceback along maximum-flow trajectories recovers shared intermediate states suggestive of common ancestry, and we evaluate on the let-7 microRNA family, where the learned flow structure qualitatively captures phylogenetic branching patterns. Furthermore, beam search at inference time discovers novel sequences that cluster near known targets, suggesting applications in de novo sequence design. This work establishes an initial foundation for alignment-free phylogenetic exploration using generative models.
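To make "greedy traceback on maximum-flow trajectories" concrete, here is a minimal illustrative sketch. It is not the authors' implementation. The abstract does not specify the action space or how flows are parameterized, so the sketch assumes a hypothetical insertion-based action space (each step inserts one nucleotide anywhere), which lets a state have several parents. The edge flows in `FLOWS` are hard-coded illustrative numbers, not learned values, and the names `FLOWS`, `greedy_traceback`, and `deepest_shared_state` are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge flows F(parent -> child) over sequence states.
# Assumed action space: insert one nucleotide at any position, so a state
# can have several parents. Numbers are illustrative, not learned values.
FLOWS = {
    ("", "G"): 4.0,
    ("", "A"): 1.0,
    ("G", "GA"): 3.0,
    ("A", "GA"): 0.5,
    ("G", "GU"): 0.5,
    ("A", "AC"): 0.3,
    ("GA", "GAU"): 2.0,
    ("GU", "GAU"): 0.4,
    ("GA", "GAC"): 1.5,
    ("AC", "GAC"): 0.2,
}


def greedy_traceback(terminal, flows, root=""):
    """Walk backward from `terminal`, always taking the incoming edge with
    the largest flow, and return the path ordered from root to terminal."""
    incoming = defaultdict(list)
    for (parent, child), flow in flows.items():
        incoming[child].append((flow, parent))

    path = [terminal]
    seen = {terminal}
    current = terminal
    while current != root:
        if not incoming[current]:
            raise ValueError(f"no incoming edge for state {current!r}")
        # max() on (flow, parent) tuples: highest flow wins; ties go to the
        # lexicographically larger parent string.
        _, current = max(incoming[current])
        if current in seen:
            raise ValueError("cycle detected in flow graph")
        seen.add(current)
        path.append(current)
    return path[::-1]


def deepest_shared_state(path_a, path_b):
    """Return the deepest state on path_a that also lies on path_b."""
    on_b = set(path_b)
    shared = [state for state in path_a if state in on_b]
    return shared[-1] if shared else None


path_gau = greedy_traceback("GAU", FLOWS)
path_gac = greedy_traceback("GAC", FLOWS)
print(path_gau)  # ['', 'G', 'GA', 'GAU']
print(path_gac)  # ['', 'G', 'GA', 'GAC']
print(deepest_shared_state(path_gau, path_gac))  # GA
```

Under these assumptions, the two terminal sequences trace back through the common intermediate `"GA"`, which plays the role of a candidate shared ancestor. With a purely append-only action space, every state would have exactly one parent (its prefix), and the traceback would reduce to listing prefixes. That is why the sketch uses insertions instead.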