Sci Rep. 2019 Aug 21;9(1):12220.
doi: 10.1038/s41598-019-47536-3.

Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II

Keiji Ogura et al. Sci Rep. 2019.

Abstract

Assessing hERG liability in the early stages of drug discovery programs is important. The recent increase in hERG-related information in public databases has enabled various successful applications of machine learning techniques to predict hERG inhibition. However, most of these studies constructed their datasets from only one database, which limits the predictive power and scope of the models. In this study, a hERG classification model was constructed using the largest dataset for hERG inhibition, built by integrating multiple databases. The integrated dataset consisted of more than 291,000 structurally diverse compounds derived from ChEMBL, GOSTAR, PubChem, and hERGCentral. The prediction model was built with a support vector machine (SVM), with descriptor selection based on the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) to optimize the descriptor set for maximum prediction performance with the minimum number of descriptors. The SVM classification model using 72 selected descriptors and ECFP_4 structural fingerprints achieved a kappa statistic of 0.733 and an accuracy of 0.984 on the test set, substantially outperforming current commercial applications for hERG prediction. Finally, the applicability domain of the prediction model was assessed based on the molecular similarity between the training set and test set compounds.
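The abstract reports both accuracy and kappa, and describes descriptor selection as a two-objective problem (maximize performance, minimize the number of descriptors). The sketch below illustrates those two ideas only: Cohen's kappa computed from a binary confusion matrix, and a plain non-dominated filter over (kappa, descriptor count) pairs. It is not the authors' pipeline and not a full NSGA-II (there is no crossover, mutation, or crowding distance); the function names and the example numbers are hypothetical.

```python
from typing import List, Tuple


def accuracy_and_kappa(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement, i.e. accuracy
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


def dominates(a: Tuple[float, int], b: Tuple[float, int]) -> bool:
    """True if a is at least as good as b on both objectives and better on one.

    Each tuple is (kappa, n_descriptors): higher kappa and fewer
    descriptors are both preferred.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    # Hypothetical, heavily imbalanced test set: few inhibitors, many inactives.
    acc, kappa = accuracy_and_kappa(tp=20, fp=5, fn=10, tn=965)
    print(f"accuracy={acc:.3f} kappa={kappa:.3f}")

    # Hypothetical (kappa, n_descriptors) results for four descriptor sets.
    print(pareto_front([(0.70, 50), (0.72, 72), (0.69, 60), (0.72, 80)]))
```

On an imbalanced dataset, accuracy can sit close to 1 while kappa stays noticeably lower, because kappa discounts the agreement a classifier would reach by chance. That is one plausible reading of why the abstract reports the two metrics side by side.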

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
The result of descriptor selection by NSGA-II. (a) Ratio of the previous generation's Pareto solutions that are dominated at generation t. (b) The kappa statistic and the number of descriptors used for the descriptor sets in the 100th generation. (c) Results of combining the Pareto-solution descriptor sets with ECFP_4. The descriptor sets selected for model building are highlighted. The kappa statistic of the SVM model using only ECFP_4 is shown as a dashed line.
Figure 2
Frequency of descriptors in the Pareto solutions of the 100th generation.
Figure 3
The ROC scores of the SVM models built from the hERG integrated database (light blue bar), ChEMBL (blue line), GOSTAR (red line), NCGC (green line), and hERGCentral (purple line), using (a) ECFP_4 and (b) ECFP_4 plus the 72 descriptors as the explanatory variables. The horizontal axis corresponds to the data source of the test set.
Figure 4
ROC curve of the SVM model using the 72 descriptors and ECFP_4 (red), compared with ACD/Percepta (orange), ADMET Predictor (blue), and StarDrop (green).
Figure 5
Compounds for which only the SVM model correctly predicted the activities, and their most similar hERG inhibitors. Each structure was ionized at pH 7.4.
Figure 6
Performance metrics for the test set in each similarity range. The horizontal axis denotes the similarity range, and the vertical axis indicates the values for Accuracy (red), Balanced Accuracy (ocher), Kappa (green), Sensitivity (blue), and Specificity (green).
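The applicability-domain analysis summarized in Figure 6 groups test compounds by their similarity to the training set and reports metrics per range. A minimal sketch of that idea follows, under stated assumptions: fingerprints are represented as sets of on-bit indices, similarity is the Tanimoto coefficient (the page does not name the similarity measure), each test compound is scored by its maximum similarity to any training compound, and only accuracy is computed. All names and the number of bins are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: indices of on-bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits over the union of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Fingerprint, training: Sequence[Fingerprint]) -> float:
    """Similarity of a query compound to its nearest training compound."""
    return max(tanimoto(query, t) for t in training)


def accuracy_by_similarity_bin(
    test: List[Tuple[Fingerprint, int, int]],
    training: Sequence[Fingerprint],
    n_bins: int = 5,
) -> Dict[int, float]:
    """Accuracy per similarity bin.

    Each test item is (fingerprint, true_label, predicted_label).
    Bin i covers similarities in [i / n_bins, (i + 1) / n_bins);
    a similarity of exactly 1.0 falls in the last bin.
    """
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for fingerprint, truth, predicted in test:
        sim = max_similarity(fingerprint, training)
        idx = min(int(sim * n_bins), n_bins - 1)
        totals[idx] += 1
        hits[idx] += int(truth == predicted)
    return {idx: hits[idx] / totals[idx] for idx in sorted(totals)}


if __name__ == "__main__":
    training = [frozenset({1, 2, 3}), frozenset({4, 5})]
    test = [
        (frozenset({1, 2, 3}), 1, 1),  # identical to a training compound
        (frozenset({1, 2}), 0, 1),     # fairly similar, mispredicted
        (frozenset({1, 9}), 0, 0),     # dissimilar, predicted correctly
    ]
    print(accuracy_by_similarity_bin(test, training))
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the same binning could be reused for kappa, sensitivity, and specificity.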
