Guy Barshatski, M.Sc. Thesis Seminar
Molecular lead optimization is an important task in drug discovery that focuses on generating novel molecules similar to a drug candidate but with enhanced properties.
Prior work has focused on supervised models that require datasets of paired examples, each consisting of a molecule and an enhanced version of it.
These approaches require large amounts of data and are limited by the bias of the specific examples of enhanced molecules.
In this thesis, we first tackle the molecule optimization problem and present an unsupervised generative approach with a molecule-embedding component that maps a discrete representation of a molecule to a continuous space.
The components are then coupled with a unique training architecture that leverages molecule fingerprints and applies double cycle constraints, both to preserve chemical resemblance to the original molecular lead and to generate novel molecules with enhanced properties.
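As a rough illustration of how fingerprints can quantify chemical resemblance, the sketch below computes the Tanimoto similarity between two fingerprints represented as sets of active bit indices. The function name and the toy bit sets are assumptions made for illustration only; this abstract does not specify the fingerprint type or the similarity measure used in the thesis, and real fingerprints would normally be computed from molecular structures with a cheminformatics toolkit.

```python
def tanimoto_similarity(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)  # bits set in both fingerprints
    total = len(fp_a | fp_b)   # bits set in at least one fingerprint
    return shared / total


# Hypothetical toy fingerprints (not derived from real molecules).
lead_fp = {1, 4, 7, 9, 15}
candidate_fp = {1, 4, 7, 12, 15, 20}

similarity = tanimoto_similarity(lead_fp, candidate_fp)
print(f"Tanimoto similarity: {similarity:.3f}")
```

A value near 1 indicates that the candidate shares most structural features with the lead, while a value near 0 indicates little overlap.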
We evaluate our method on multiple common molecular optimization tasks, including dopamine receptor D2 (DRD2) activity and drug-likeness (QED), and show that our method outperforms previous state-of-the-art baselines.
Moreover, we conduct thorough ablation experiments to show the effect and necessity of important components in our model.
Furthermore, we demonstrate our method's ability to generate FDA-approved drugs it has never encountered before, such as Perazine and Clozapine, which are used to treat psychotic disorders such as schizophrenia.
The system is currently being deployed in the Targeted Drug Delivery and Personalized Medicine laboratories, which develop treatments using nanoparticle-based technology.
Next, since molecules that satisfy multiple constraints, e.g., DRD2 and QED, are often needed, we focus on multi-property optimization.
Simultaneously optimizing these constraints has been shown to be difficult, mostly due to the lack of training examples satisfying all constraints.
In this thesis, we present a novel unpaired approach for multi-property optimization.
Our architecture learns a transformation for each property optimization separately, while constraining the latent embedding space between all transformations.
This allows generating a molecule which optimizes multiple properties simultaneously.
We present a novel adaptive loss which balances the separate transformations and stabilizes the optimization process.
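The abstract does not describe how the adaptive loss is computed. Purely to illustrate the general idea of balancing per-property losses, the sketch below uses one simple, hypothetical scheme: each property's loss is weighted by its share of the total, so the property that is currently lagging receives more emphasis. The function names and the weighting rule are assumptions and should not be read as the loss proposed in the thesis.

```python
def adaptive_weights(losses):
    """Weight each per-property loss by its share of the total loss (hypothetical scheme)."""
    total = sum(losses)
    if total == 0:
        # All losses are zero: fall back to uniform weights.
        return [1.0 / len(losses)] * len(losses)
    return [loss / total for loss in losses]


def combined_loss(losses):
    """Weighted sum of the per-property losses using the adaptive weights above."""
    weights = adaptive_weights(losses)
    return sum(w * loss for w, loss in zip(weights, losses))


# Hypothetical loss values for two property transformations (e.g., DRD2 and QED).
property_losses = [0.9, 0.3]
print(adaptive_weights(property_losses))
print(combined_loss(property_losses))
```

With this rule, a transformation whose loss is three times larger also receives three times the weight, which is one way to keep a single property from dominating or stalling the joint optimization.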
We evaluate our method on optimizing two properties, dopamine receptor D2 (DRD2) activity and drug-likeness (QED), and show that our method outperforms previous state-of-the-art methods, especially when training examples satisfying all constraints are sparse.
This thesis is the first work to demonstrate a unique dual-learning-style training scheme that leverages shared translation components and molecule fingerprints for molecular optimization.