Ancestral protein sequence reconstruction using a tree-structured Ornstein-Uhlenbeck variational autoencoder

Lys Sanz Moreta · Ola Rønning · Ahmad Salim Al-Sibahi · Jotun Hein · Douglas Theobald · Thomas Hamelryck

Keywords: [ evolution ] [ variational autoencoders ]

[ Abstract ]
Mon 25 Apr 6:30 p.m. PDT — 8:30 p.m. PDT


We introduce a deep generative model for representation learning of biological sequences that, unlike existing models, explicitly represents the evolutionary process. The model uses a tree-structured Ornstein-Uhlenbeck process, obtained from a given phylogenetic tree, as an informative prior for a variational autoencoder. We show that the model performs well on the task of ancestral sequence reconstruction of single protein families. Our results and ablation studies indicate that the explicit representation of evolution using a suitable tree-structured prior has the potential to improve representation learning of biological sequences considerably. Finally, we briefly discuss extensions of the model to genome-scale data sets and to the case of a latent phylogenetic tree.
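
To make the idea of a tree-structured prior concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a stationary Ornstein-Uhlenbeck covariance of the form sigma^2 * exp(-d_ij / lengthscale), where d_ij is the path length between nodes i and j in the tree. The toy tree, the function name `ou_covariance`, and the parameter values are all hypothetical; the exact parametrization used in the paper is not stated in this abstract.

```python
import numpy as np


def ou_covariance(distances, sigma=1.0, lengthscale=1.0):
    """Stationary OU covariance between tree nodes (assumed form).

    distances: (n, n) matrix of path lengths between nodes along the tree.
    Returns sigma^2 * exp(-d_ij / lengthscale).
    """
    distances = np.asarray(distances, dtype=float)
    return sigma**2 * np.exp(-distances / lengthscale)


# Hypothetical toy tree: root R with leaves A and B,
# branch lengths R-A = 0.3 and R-B = 0.5. Node order: R, A, B.
patristic = np.array([
    [0.0, 0.3, 0.5],
    [0.3, 0.0, 0.8],
    [0.5, 0.8, 0.0],
])

cov = ou_covariance(patristic, sigma=1.0, lengthscale=2.0)

# Each latent dimension is an independent draw of the process over all nodes,
# so closely related nodes receive correlated latent vectors.
rng = np.random.default_rng(0)
latent_dim = 4
z = rng.multivariate_normal(np.zeros(len(cov)), cov, size=latent_dim).T
# z has one row per tree node (R, A, B) and one column per latent dimension.
```

In a variational autoencoder, a covariance of this kind would replace the usual independent standard-normal prior over latent vectors, so that the prior couples sequences according to their distance in the tree, including the unobserved ancestral nodes.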

