Application of Generative Autoencoder in De Novo Molecular Design

Thomas Blaschke et al. Mol Inform. 2018 Jan;37(1-2):1700123. doi: 10.1002/minf.201700123. Epub 2017 Dec 13.

Abstract

A major challenge in computational chemistry is the generation of novel molecular structures with desirable pharmacological and physicochemical properties. In this work, we investigate the potential use of autoencoders, a deep learning methodology, for de novo molecular design. Various generative autoencoders were used to map molecular structures into a continuous latent space and vice versa, and their performance as structure generators was assessed. Our results show that the latent space preserves the chemical similarity principle and can therefore be used for the generation of analogue structures. Furthermore, the latent spaces created by the autoencoders were searched systematically to generate novel compounds with predicted activity against dopamine receptor type 2, and compounds similar to known actives that were not included in the training set were identified.

Keywords: Autoencoder; chemoinformatics; de novo molecular design; deep learning; inverse QSAR.


Figures

Figure 1
An autoencoder is a coordinated pair of neural networks. The encoder converts a high-dimensional input, e.g. a molecule, into a continuous numerical representation with fixed dimensionality. The decoder reconstructs the input from the numerical representation.
Figure 2
Encoding and decoding of a molecule using a variational autoencoder. The encoder deterministically maps a molecular structure to the mean and variance of a Gaussian distribution. Given the generated mean and variance, a new point is sampled and fed into the decoder. The decoder then generates a new molecule from the sampled point.
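The sampling step in Figure 2 is commonly implemented with the reparameterization trick. A minimal pure-Python sketch, assuming a hypothetical 3-dimensional latent space and made-up mean/log-variance vectors (a real encoder network would produce these):

```python
import math
import random

def sample_latent(mu, log_var, rng=random):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1),
    # so gradients can flow through mu and log_var during training.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

# Hypothetical latent encoding of one molecule (illustrative values only).
mu = [0.1, -0.4, 2.0]
log_var = [-2.0, -2.0, -2.0]  # small variance keeps samples close to mu
z = sample_latent(mu, log_var, random.Random(0))
```

Because the variance is small here, repeated samples stay near the mean, which is why decoding points sampled around a molecule's latent vector tends to yield close analogues.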
Figure 3
Sequence generation using teacher forcing. The last decoder layer, trained with teacher forcing, receives two inputs: the output of the previous layer and a character from the previous time step. In training mode, the previous character is the corresponding character from the input sequence, regardless of the probability output. In generation mode, the decoder samples a new character at each time step based on the output probabilities and uses it as input for the next time step.
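The two modes in Figure 3 differ only in which character is fed back at the next step. A toy sketch, assuming a made-up next-character probability table in place of a real decoder (which would also condition on the latent vector and its hidden state):

```python
import random

# Illustrative next-character model: probability of the next token given
# only the previous one. "^" marks start-of-sequence, "$" end-of-sequence.
NEXT_PROBS = {
    "^": {"C": 0.7, "N": 0.3},
    "C": {"C": 0.5, "N": 0.2, "$": 0.3},
    "N": {"C": 0.6, "$": 0.4},
}

def decode(target=None, max_len=10, rng=random):
    """Teacher forcing when `target` is given, free sampling otherwise."""
    prev, out = "^", []
    for t in range(max_len):
        chars, weights = zip(*NEXT_PROBS[prev].items())
        char = rng.choices(chars, weights=weights)[0]
        out.append(char)
        if char == "$":
            break
        # Training mode: feed the ground-truth character regardless of
        # what was sampled. Generation mode: feed the sampled character.
        prev = target[t] if target is not None else char
    return "".join(out)
```

In training mode the sampled output never contaminates the next step's input, which stabilizes learning; in generation mode the model must run on its own outputs.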
Figure 4
Learning process of an adversarial autoencoder. The encoder converts a molecule directly into a numerical representation. During training the output is not only fed into the decoder but also into a discriminator. The discriminator is trained to distinguish between the output of the encoder and a randomly sampled point from a prior distribution. The encoder is trained to “fool” the discriminator by mimicking the target prior distribution.
Figure 5
Different representations of 4‐(bromomethyl)‐1H‐pyrazole. Exemplary generation of the one‐hot representation derived from the SMILES. For simplicity only a reduced vocabulary is shown here, while in practice a larger vocabulary that covers all tokens present in the training data is used.
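The one-hot construction in Figure 5 can be sketched in a few lines. This uses a hypothetical reduced vocabulary and character-level tokenization for brevity (a real tokenizer treats multi-character tokens such as "Br" as single symbols, and the vocabulary covers every token in the training SMILES); pyridine serves as a simple stand-in molecule:

```python
# Hypothetical reduced vocabulary; illustrative only.
VOCAB = ["(", ")", "1", "=", "B", "C", "H", "N", "[", "]", "c", "n", "r"]

def one_hot(smiles):
    index = {ch: i for i, ch in enumerate(VOCAB)}
    matrix = []
    for ch in smiles:  # character-level tokenization for brevity
        row = [0] * len(VOCAB)
        row[index[ch]] = 1  # exactly one bit set per character
        matrix.append(row)
    return matrix

# Pyridine as a simple stand-in molecule.
matrix = one_hot("c1ccncc1")
```

Each row of the resulting matrix is a probability-like target for one position of the sequence, which is what the decoder's softmax output is trained against.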
Figure 6
Sampled structures at the latent vector corresponding to Celecoxib. The structures are sorted by the relative generation frequencies in descending order from left to right.
Figure 7
(a) Chemical similarity (Tanimoto, ECFP6) of generated structures to Celecoxib in relation to the distance in the latent space. (b) Fraction of valid SMILES generated during the reconstruction.
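The Tanimoto similarity used in Figures 7, 8 and 10 reduces to a set operation once fingerprints are represented as sets of "on" bit indices. A minimal sketch with made-up bit sets (producing real ECFP6 bits requires a cheminformatics toolkit such as RDKit):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of
    'on' bit indices: |intersection| / |union|."""
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

# Two hypothetical fingerprints sharing two of their four bits each:
sim = tanimoto({1, 5, 9, 12}, {5, 9, 20, 33})  # 2 / 6
```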
Figure 8
Results without Celecoxib in the training set. (a) Chemical similarity (Tanimoto, ECFP6) of generated structures to Celecoxib in relation to the distance in the latent space. (b) Fraction of valid SMILES generated during the reconstruction.
Figure 9
Searching for DRD2 active compounds using the Uniform AAE. The first 100 iterations are randomly sampled points while the next 500 iterations are determined by Bayesian optimization.
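The search strategy in Figure 9, random initialization followed by model-guided proposals, can be sketched with a toy surrogate in place of a real Bayesian optimization library. Everything here is illustrative: the objective stands in for predicted DRD2 activity at a latent point, and the acquisition is a crude nearest-neighbour heuristic, not a Gaussian-process expected improvement:

```python
import random

def score(z):
    # Stand-in for the expensive objective (e.g. predicted activity of
    # molecules decoded at latent point z); purely illustrative.
    return -sum((zi - 0.5) ** 2 for zi in z)

def suggest(history, rng, n_candidates=50):
    # Toy acquisition: among random candidates, favour points whose
    # nearest evaluated neighbour scored well (exploitation) plus a
    # small bonus for being far from evaluated points (exploration).
    def acq(z):
        d, s = min(
            (sum((a - b) ** 2 for a, b in zip(z, zp)) ** 0.5, sp)
            for zp, sp in history
        )
        return s + 0.1 * d
    cands = [[rng.random(), rng.random()] for _ in range(n_candidates)]
    return max(cands, key=acq)

rng = random.Random(0)
history = []
for _ in range(10):   # random initialization phase
    z = [rng.random(), rng.random()]
    history.append((z, score(z)))
for _ in range(20):   # model-guided phase
    z = suggest(history, rng)
    history.append((z, score(z)))
best_z, best_s = max(history, key=lambda p: p[1])
```

The paper's setup follows the same two-phase shape at larger scale: 100 random latent points, then 500 iterations proposed by the optimizer.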
Figure 10
Generated structures from BO compared to the nearest neighbour from the set of validated actives. The validated actives were not present in the training set of the autoencoder. The Tanimoto similarity is calculated using the ECFP6 fingerprint.
Figure 11
The relationship between the fraction of generated active compounds at specific latent points and the BO score. The fraction of generated actives is the number of actives divided by all 500 reconstruction attempts. The set “Random” corresponds to the randomly selected latent points.
