Poster
in
Workshop: Machine Learning for Drug Discovery (MLDD)
Improving Small Molecule Generation using Mutual Information Machine
Danny Reidenbach · Micha Livne · Rajesh Ilango · Michelle Gill · Johnny Israeli
in
Workshop: Machine Learning for Drug Discovery (MLDD)
We address the task of controlled generation of small molecules, which entails finding novel molecules with desired properties under certain constraints. Here we introduce MolMIM, a probabilistic auto-encoder for small molecule drug discovery that learns an informative and clustered latent space.MolMIM is trained with Mutual Information Machine (MIM) learning and provides a fixed-size representation of variable-length SMILES strings.Since encoder-decoder models can learn representations with ``holes'' of invalid samples, here we propose a novel extension to the MIM training procedure which promotes a dense latent space and allows the model to sample valid molecules from random perturbations of latent codes.We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation as measured in terms of validity, uniqueness, and novelty.We then utilize CMA-ES, a naive black-box, and gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization.We achieve state-of-the-art results in several constrained single-property optimization tasks and show competitive results in the challenging task of multi-objective optimization.We attribute the strong results to the structure of MolMIM's learned representation which promotes the clustering of similar molecules in the latent space, whereas CMA-ES is often used as a baseline optimization method. We also demonstrate MolMIM to be favorable in a compute-limited regime.