Leveraging Discrete Function Decomposability for Scientific Design
Abstract
In the era of AI-driven science and engineering, we often want to design discrete objects (e.g., circuits, proteins, materials) in silico according to user-specified properties (e.g., that a protein binds its target). Given a property-predictive model, in silico design typically involves training a generative model over the design space (e.g., over the set of all length-L proteins) to concentrate on designs with the desired properties. Distributional optimization, formalized as an estimation-of-distribution algorithm or as reinforcement learning policy optimization, maximizes an objective function in expectation over samples from the generative model. Optimizing a distribution over discrete-valued designs is challenging in general because of the combinatorial nature of the design space. However, many property predictors in scientific applications are decomposable, in the sense that they can be factorized over design variables in a way that optimization can exploit. For example, the active-site amino acids in a catalytic protein may need to interact only loosely with the rest of the protein for maximal catalytic activity. Current distributional optimization algorithms cannot exploit such structure, even though doing so could dramatically improve the optimization. Herein, we propose and demonstrate a new distributional optimization algorithm, Decomposition-Aware Distributional Optimization (DADO), that can leverage any decomposability defined by a junction tree on the design variables. At its core, DADO employs a factorized “search distribution” (a learned generative model) for efficient navigation of the search space, and invokes graph message passing to coordinate optimization across all variables.
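To make the value of decomposability concrete, the following is a minimal sketch and not the DADO algorithm itself. It assumes a toy objective that is a sum of pairwise terms along a chain (the simplest junction tree) and maximizes it exactly with max-sum message passing, checking the result against brute-force enumeration. The function names `max_sum_chain` and `brute_force`, the random score tables, and the sizes (6 variables, 4 states each) are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def max_sum_chain(pairwise):
    """Maximize sum_i pairwise[i][x_i, x_{i+1}] over a chain of variables.

    pairwise[i] is a (K, K) score table coupling variable i and variable i+1.
    Cost is O(L * K^2) rather than the O(K^L) of exhaustive search.
    """
    K = pairwise[0].shape[0]
    msg = np.zeros(K)  # best score of all factors to the right, per state
    best_next = []
    for table in reversed(pairwise):
        scores = table + msg[None, :]  # rows: x_i, columns: x_{i+1}
        best_next.append(scores.argmax(axis=1))
        msg = scores.max(axis=1)
    best_next.reverse()

    # Backtrack from the best state of the first variable.
    x = [int(msg.argmax())]
    for choice in best_next:
        x.append(int(choice[x[-1]]))
    return x, float(msg.max())


def total_score(pairwise, x):
    """Evaluate the decomposable objective at a full design x."""
    return float(sum(t[x[i], x[i + 1]] for i, t in enumerate(pairwise)))


def brute_force(pairwise):
    """Enumerate every design; feasible only for tiny problems."""
    K = pairwise[0].shape[0]
    L = len(pairwise) + 1
    best = max(itertools.product(range(K), repeat=L),
               key=lambda x: total_score(pairwise, x))
    return list(best), total_score(pairwise, best)


# Hypothetical toy problem: L = 6 variables, K = 4 states per variable.
rng = np.random.default_rng(0)
pairwise = [rng.normal(size=(4, 4)) for _ in range(5)]

x_mp, f_mp = max_sum_chain(pairwise)
x_bf, f_bf = brute_force(pairwise)
print("message passing:", x_mp, round(f_mp, 4))
print("brute force:    ", x_bf, round(f_bf, 4))
```

Here message passing touches 5 tables of 16 entries each, whereas enumeration scores all 4^6 = 4096 designs. The sketch performs exact maximization of a known function; the setting described in the abstract instead learns a factorized search distribution, for which this only illustrates why the factor structure helps.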