Poster in Workshop: Integrating Generative and Experimental Platforms for Biomolecular Design
ACTIVE LEARNING ON SYNTHONS FOR MOLECULAR DESIGN
Tom Grigg · Mason Burlage · Oliver Scott · Dominique Sydow · Liam Wilbraham
Exhaustive screening is highly informative but often intractable against the expensive objective functions involved in modern drug discovery. This problem is exacerbated in combinatorial contexts such as multi-vector screening, where molecular spaces quickly become too large to enumerate. Here, we introduce Scalable Active Learning via Synthon Acquisition (SALSA): a simple algorithm for efficient multi-vector expansion that extends pool-based active learning to non-enumerable spaces by factoring modeling and acquisition over synthon or fragment choices. Through experiments on ligand- and structure-based objectives, we highlight SALSA’s sample efficiency and its ability to scale to spaces of trillions of compounds. Further, we demonstrate its application to multi-parameter objective design tasks on three protein targets, finding that SALSA-generated molecules have chemical property profiles comparable to known bioactives and exhibit greater diversity and higher scores than those from an industry-leading generative approach.
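To make the idea of factoring acquisition over synthon choices concrete, here is a minimal toy sketch in Python. It is not the SALSA implementation: the abstract does not specify the surrogate model or the acquisition function, so everything below is an assumption made for illustration. The function name `run_factored_al`, the additive stand-in oracle, the per-synthon running-mean surrogate, and the Thompson-style sampling rule are all hypothetical. The point it illustrates is only that each proposal costs work proportional to the number of synthons (sites × synthons per site), while the product space (synthons per site raised to the number of sites) is never enumerated.

```python
import random


def run_factored_al(n_sites=3, n_synthons=50, rounds=10, batch=20, seed=0):
    """Toy pool-free active learning loop that scores synthons, not products."""
    rng = random.Random(seed)

    # ASSUMPTION: a hidden additive objective stands in for an expensive
    # oracle (e.g. a docking score). Real objectives are not additive.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_synthons)]
               for _ in range(n_sites)]

    def oracle(combo):
        return sum(weights[site][s] for site, s in enumerate(combo))

    # Factored surrogate (assumed): per-synthon running totals and counts of
    # the scores of every evaluated product that contained that synthon.
    totals = [[0.0] * n_synthons for _ in range(n_sites)]
    counts = [[0] * n_synthons for _ in range(n_sites)]

    def sample_synthon(site):
        # Thompson-style draw: sample a plausible value for each synthon at
        # this site and keep the argmax. Rarely seen synthons get wide draws.
        best, best_draw = 0, float("-inf")
        for s in range(n_synthons):
            n = counts[site][s]
            mean = totals[site][s] / n if n else 0.0
            draw = rng.gauss(mean, 1.0 / (n + 1) ** 0.5)
            if draw > best_draw:
                best, best_draw = s, draw
        return best

    evaluated = {}
    for _ in range(rounds):
        # Acquisition: build each product by choosing one synthon per site.
        picks = set()
        while len(picks) < batch:
            combo = tuple(sample_synthon(site) for site in range(n_sites))
            if combo not in evaluated:
                picks.add(combo)
        # Evaluation and surrogate update.
        for combo in picks:
            score = oracle(combo)
            evaluated[combo] = score
            for site, s in enumerate(combo):
                totals[site][s] += score
                counts[site][s] += 1

    best_combo = max(evaluated, key=evaluated.get)
    return best_combo, evaluated[best_combo], evaluated
```

With the default arguments the product space holds 50³ = 125,000 combinations, yet the loop evaluates only 200 of them and each proposal inspects just 150 synthon entries. Calling `run_factored_al(seed=0)` returns the best combination found, its score, and the dictionary of all evaluated combinations.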