Poster in Workshop: Machine Learning for Drug Discovery (MLDD)
Improving Small Molecule Generation using Mutual Information Machine
Danny Reidenbach · Micha Livne · Rajesh Ilango · Michelle Gill · Johnny Israeli
We address the task of controlled generation of small molecules, which entails finding novel molecules with desired properties under certain constraints. Here we introduce MolMIM, a probabilistic auto-encoder for small molecule drug discovery that learns an informative and clustered latent space. MolMIM is trained with Mutual Information Machine (MIM) learning and provides a fixed-size representation of variable-length SMILES strings. Since encoder-decoder models can learn representations with "holes" of invalid samples, we propose a novel extension to the MIM training procedure that promotes a dense latent space and allows the model to sample valid molecules from random perturbations of latent codes. We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation as measured in terms of validity, uniqueness, and novelty. We then apply CMA-ES, a naive black-box, gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization. We achieve state-of-the-art results in several constrained single-property optimization tasks and show competitive results in the challenging task of multi-objective optimization. Since CMA-ES is often used as a baseline optimization method, we attribute the strong results to the structure of MolMIM's learned representation, which promotes the clustering of similar molecules in the latent space. We also demonstrate that MolMIM is favorable in a compute-limited regime.
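As an illustration of the three generation metrics named in the abstract, the following minimal sketch computes validity, uniqueness, and novelty for a list of sampled SMILES strings. It is not taken from the paper. The definitions used here (valid fraction of all samples, unique fraction of valid samples, and fraction of unique samples absent from the training set) are one common convention and are an assumption. The names `generation_metrics` and `toy_is_valid` are hypothetical, and the validity check is a toy stand-in (balanced parentheses only) for a real chemistry toolkit parser. Strings are compared verbatim, whereas a real pipeline would likely canonicalize SMILES first.

```python
from typing import Callable, Dict, Sequence


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in for a real SMILES parser.

    Accepts any non-empty string whose parentheses are balanced.
    This is NOT a chemical validity check.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing parenthesis with no matching open
                return False
    return bool(smiles) and depth == 0


def generation_metrics(
    generated: Sequence[str],
    training: Sequence[str],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty (assumed definitions)."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        # fraction of all samples that pass the validity check
        "validity": len(valid) / len(generated) if generated else 0.0,
        # fraction of valid samples that are distinct
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # fraction of distinct valid samples not seen in training
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    samples = ["CCO", "CCO", "C(C)O", "C(C", "c1ccccc1"]
    print(generation_metrics(samples, training=["CCO"]))
```

Passing a different `is_valid` callable (for example, one backed by a cheminformatics library) would swap in a real validity check without changing the rest of the sketch.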