Machine learning-guided design of biomolecular condensates in an automated laboratory
Abstract
Biomolecular condensates are phase-separated cellular compartments that regulate signaling, stress responses, and molecular sequestration. Designing synthetic condensates with specified phase behavior and material properties remains difficult because the sequence–property landscape in human cells is complex and context-dependent. Here we present a generative machine learning (ML)–guided design–build–test–learn loop that couples experimental measurements with ML to discover condensate-forming sequences. We begin with a seed library curated by domain experts and perform high-throughput live-cell confocal imaging across multiple cell cycles. An automated image-processing pipeline extracts functionally relevant properties, including saturation concentration, size distribution, and morphology, producing a curated sequence–property dataset. We then fit a multi-output Gaussian process surrogate to this dataset and use Bayesian optimization (BO) to propose new candidate sequences, closing the loop between computation and experimentation. This approach reduces the number of design iterations needed to reach an optimal design when the objective, here the sequence–property landscape of biomolecular condensates, is expensive to evaluate. The work contributes experimental results, a reusable benchmark dataset, and a practical strategy for generative ML in biomolecular proteomics.
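
To make the surrogate-and-acquisition step concrete, the following is a minimal Python sketch of one round of Bayesian optimization. It is an illustration under stated assumptions, not the pipeline used in this work: it uses a single-output Gaussian process with a fixed squared-exponential kernel (the abstract describes a multi-output surrogate), a zero-mean prior, expected improvement as the acquisition function, and a finite pool of candidates. The numeric feature vectors, the toy measurements, and all function names (`gp_posterior`, `expected_improvement`, `propose_next`) are hypothetical.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))   # (K + noise*I)^-1 y
    k_star = [rbf(a, x_new) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over `best` for a maximization objective."""
    if std == 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def propose_next(X, y, candidates):
    """Return the candidate with the highest expected improvement."""
    best = max(y)
    scores = [expected_improvement(*gp_posterior(X, y, c), best)
              for c in candidates]
    return candidates[scores.index(max(scores))]

# Toy data (hypothetical): 1-D features and a measured property to maximize.
X_obs = [[0.0], [1.0], [2.0]]
y_obs = [0.1, 0.8, 0.3]
pool = [[0.5], [1.2], [3.0]]
next_candidate = propose_next(X_obs, y_obs, pool)
print("next candidate to build and test:", next_candidate)
```

In a full loop, the proposed candidate would be built, imaged, and appended to the observations before the surrogate is refit. Expected improvement is only one possible acquisition function; it is used here because it balances exploiting a high predicted mean against exploring regions of high predictive uncertainty.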