Technical Report MSC-2019-05

Title: Prototype-Based Chemical Design using Diversity-Driven Generative Models
Authors: Shahar Harel
Supervisors: Kira Radinsky and Shaul Markovitch
PDF: Currently accessible only within the Technion network
Abstract: Because the space of potential molecules for pharmacological treatment is virtually infinite, designing a new drug is an expensive and lengthy process. A common technique in drug discovery is to start from a molecule that already has some of the desired properties. An interdisciplinary team of scientists then generates hypotheses about the changes the prototype requires. We call this process prototype-driven hypothesis generation.

In this work, we develop an unsupervised algorithmic approach to prototype-driven hypothesis generation. Our method is inspired by the well-known analogy between a chemist's understanding of a compound and a language speaker's understanding of a word ("Atoms are letters, molecules are the words, supramolecular entities are the sentences and the chapters" [Jean-Marie Lehn 1995]), which suggests that Natural Language Processing techniques can be applied to computational chemistry. More formally, we design a conditional deep generative model for molecule generation with diversity attention.

The model takes a given prototype molecule and generates a variety of candidate molecules. The generated molecules should be novel while sharing desired properties with the prototype. Our model extends Variational Autoencoders to allow conditional diverse sampling: drawing an example from the data distribution (drug-like molecules) that lies close to a given input. This lets us sample molecules close to a prototype molecule, which increases the probability of generating a valid drug with similar characteristics. Additionally, we add a diversity component that introduces parameterized diversity into the generation process, so that sampling can produce molecules that are novel with respect to the prototype.
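The idea of sampling "close to" a prototype with a tunable amount of diversity can be illustrated in latent space. The sketch below is not the report's model. It assumes a standard VAE encoder that returns a Gaussian mean `mu` and log-variance `log_var` for the prototype, and it uses a hypothetical `diversity` factor that simply scales the reparameterization noise. Each returned latent code would then be passed to the decoder (not shown) to produce a candidate molecule.

```python
import math
import random
from typing import List, Sequence


def sample_near_prototype(
    mu: Sequence[float],
    log_var: Sequence[float],
    diversity: float = 1.0,
    n_samples: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """Draw latent codes around a prototype's encoding (illustrative sketch).

    Assumptions: mu and log_var are the encoder's Gaussian posterior
    parameters for the prototype molecule; `diversity` is a hypothetical
    knob. diversity=0 returns the prototype's mean code, and larger
    values spread the samples further from it.
    """
    if diversity < 0:
        raise ValueError("diversity must be non-negative")
    rng = random.Random(seed)  # seeded for reproducible sampling
    samples = []
    for _ in range(n_samples):
        # Reparameterization trick with scaled noise:
        # z = mu + diversity * sigma * eps, where eps ~ N(0, 1)
        z = [
            m + diversity * math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)
        ]
        samples.append(z)
    return samples


mu = [0.2, -1.0, 0.5]
log_var = [-2.0, -2.0, -2.0]
close = sample_near_prototype(mu, log_var, diversity=0.5)
far = sample_near_prototype(mu, log_var, diversity=3.0)
```

Because the same seed reuses the same noise draws, raising `diversity` moves every sample proportionally further from the prototype's code. This makes the trade-off between similarity and novelty a single explicit parameter in this sketch.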

We show that the molecules generated by the system are valid, and that they are both strongly related to the prototype and novel. In addition, we suggest several ranking functions for the generated molecule population. Among the compounds generated by the system, we identified 35 FDA-approved drugs. For example, our system generated Isoniazid, one of the main drugs for tuberculosis.
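The abstract does not specify the ranking functions. The following is only a hypothetical example of how a candidate population might be ranked so that it balances relatedness to the prototype against novelty. It assumes each molecule is already represented as a fingerprint, modeled here as a set of feature indices (in practice a cheminformatics toolkit such as RDKit would compute these). It scores candidates by how close their Tanimoto similarity to the prototype is to a chosen target value. `rank_candidates` and `target_similarity` are illustrative names, not part of the report.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # assumed: set of "on" fingerprint bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def rank_candidates(
    prototype: Fingerprint,
    candidates: Dict[str, Fingerprint],
    target_similarity: float = 0.6,
) -> List[Tuple[str, float]]:
    """Hypothetical ranking: prefer similarity near a target below 1.0.

    This rewards molecules that stay related to the prototype while
    penalizing exact copies (similarity 1.0) and unrelated ones (near 0).
    """
    scored = []
    for name, fp in candidates.items():
        sim = tanimoto(prototype, fp)
        score = 1.0 - abs(sim - target_similarity)
        scored.append((name, score))
    # Highest score first; ties broken by name for a deterministic order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


prototype = frozenset({1, 2, 3, 4, 5})
candidates = {
    "copy": frozenset({1, 2, 3, 4, 5}),
    "analog": frozenset({1, 2, 3, 6}),
    "unrelated": frozenset({7, 8, 9}),
}
ranking = rank_candidates(prototype, candidates)
```

Here the partially overlapping "analog" (similarity 0.5) ranks above the exact "copy" (similarity 1.0), which in turn ranks above the "unrelated" molecule (similarity 0.0). A real ranking function would likely also weigh drug-likeness and other properties.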

Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.



Computer science department, Technion