J Chem Inf Model. 2022 Feb 14;62(3):463-471.
doi: 10.1021/acs.jcim.1c01531. Epub 2022 Feb 1.

Yuel: Improving the Generalizability of Structure-Free Compound-Protein Interaction Prediction


Jian Wang et al. J Chem Inf Model.

Abstract

Predicting binding affinities between small molecules and the protein target is at the core of computational drug screening and drug target identification. Deep learning-based approaches have recently been adapted to predict binding affinities and they claim to achieve high prediction accuracy in their tests; we show that these approaches do not generalize, that is, they fail to predict interactions between unknown proteins and unknown small molecules. To address these shortcomings, we develop a new compound-protein interaction predictor, Yuel, which predicts compound-protein interactions with a higher generalizability than the existing methods. Upon comprehensive tests on various data sets, we find that out of all the deep-learning approaches surveyed, Yuel manifests the best ability to predict interactions between unknown compounds and unknown proteins.
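The generalization setting described in the abstract, where both the compound and the protein are unseen during training, can be illustrated with a "cold" split of an interaction data set. The sketch below is only an illustration of that idea, not the authors' evaluation protocol (the figure captions indicate that they train on one data set and test on another). The function name `cold_split`, the `test_fraction` parameter, and the record layout `(compound, protein, affinity)` are assumptions made for this example.

```python
import random


def cold_split(pairs, test_fraction=0.2, seed=0):
    """Split (compound, protein, affinity) records so that no test compound
    and no test protein also appears in the training set.

    Hypothetical helper for illustration; not taken from the Yuel paper.
    """
    rng = random.Random(seed)
    compounds = sorted({c for c, _, _ in pairs})
    proteins = sorted({p for _, p, _ in pairs})
    rng.shuffle(compounds)
    rng.shuffle(proteins)

    # Hold out a fraction of compounds and a fraction of proteins.
    n_c = max(1, int(len(compounds) * test_fraction))
    n_p = max(1, int(len(proteins) * test_fraction))
    test_compounds = set(compounds[:n_c])
    test_proteins = set(proteins[:n_p])

    # Training pairs contain no held-out entity at all.
    train = [r for r in pairs
             if r[0] not in test_compounds and r[1] not in test_proteins]
    # Test pairs contain a held-out compound AND a held-out protein.
    test = [r for r in pairs
            if r[0] in test_compounds and r[1] in test_proteins]
    # Pairs with exactly one held-out entity are discarded.
    return train, test
```

A model that scores well on a random split but poorly on such a split is likely memorizing compounds or proteins rather than learning their interaction, which is the failure mode the abstract attributes to existing methods.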

Conflict of interest statement

DECLARATION OF INTERESTS

The authors declare no competing financial interest.

Figures

Figure 1.
The architecture of Yuel. CNN denotes a convolutional neural network, GNN a graph neural network, and FC fully-connected layers. The “X” symbol denotes the outer product of the compound feature matrix and the protein feature matrix.
Figure 2.
Testing Yuel, DeepConv-DTI, DeepDTA, Null-Protein, and Null-Compound models on different data sets. The label “DS1/DS2” in the upper left corner of each panel means the models are trained on DS1 and tested on DS2. The Y axis indicates the Pearson correlation coefficient.
Figure 3.
Testing Yuel, DeepConv-DTI, DeepDTA, Null-Protein, and Null-Compound models on the PDBbind (Alanine), Davis (Alanine), and Metz (Alanine) data sets, in which each protein sequence is replaced with a single alanine. The X axis is the experimental affinity, and the Y axis is the predicted affinity. The label in the upper left corner is the Pearson correlation coefficient.
Figure 4.
Comparison of position-wise FC layers and feature-wise FC layers. (a) The protein features and the compound features are first concatenated and flattened into a 1-D feature vector. The 1-D feature vector is then passed through fully-connected layers. (b) The protein features and the compound features are first split into individual residue features and atom features. Each residue feature and each atom feature is passed through fully-connected layers individually. Finally, the residue features and the atom features are multiplied to obtain an attention matrix. (c) Testing Yuel-cc on Davis and PDBbind. The label “DS1/DS2” above each panel means the model is trained on DS1 and tested on DS2.
Figure 5.
Testing the ability to identify hotspot atoms and residues. (a) The predicted protein-binding interactions of four atoms in N-Aryl-Hydroxybicyclohydantoin (LG790), the ligand of the rat androgen receptor (PDB ID: 2IHQ). The X axis indicates the residues sorted by their scores from high to low. The Y axis is the score of the compound atom corresponding to a protein residue. (b) The histogram of the AUC of hotspot atom prediction in the PDBbind data set, and the histogram of the ranks of hotspot residues in the PDBbind data set. (c) The predicted and ground-truth hotspot atoms in LG790. (d) The predicted hotspot residues in the protein of 2IHQ.
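The scheme described in the Figure 4(b) caption, in which each residue feature and each atom feature passes through fully-connected layers individually before the two are multiplied into an attention matrix, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the single ReLU layer, the feature dimensions, the random weights, and the reading of "multiplied" as a dot product between every atom and every residue are all assumptions.

```python
import random


def linear_relu(vec, weights, bias):
    """One fully-connected layer with ReLU; weights has shape out_dim x in_dim."""
    return [max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def per_item_fc(features, weights, bias):
    """Apply the same layer to every row (one atom or one residue) independently."""
    return [linear_relu(row, weights, bias) for row in features]


def attention_matrix(atom_feats, residue_feats):
    """Entry [i][j] is the dot product of atom i's and residue j's features."""
    return [[sum(a * r for a, r in zip(atom, res)) for res in residue_feats]
            for atom in atom_feats]


def random_layer(rng, in_dim, out_dim):
    """Random weights and zero bias, standing in for trained parameters."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


rng = random.Random(0)
# Hypothetical sizes: 3 atoms with 4 features, 5 residues with 6 features.
atoms = [[rng.random() for _ in range(4)] for _ in range(3)]
residues = [[rng.random() for _ in range(6)] for _ in range(5)]

# Project atoms and residues into a shared 8-dimensional space.
w_atom, b_atom = random_layer(rng, 4, 8)
w_res, b_res = random_layer(rng, 6, 8)

scores = attention_matrix(per_item_fc(atoms, w_atom, b_atom),
                          per_item_fc(residues, w_res, b_res))
print(len(scores), "atoms x", len(scores[0]), "residues")
```

Because the layer weights are shared across rows, the same parameters apply to compounds and proteins of any size, and each entry of the resulting matrix is tied to one atom and one residue. That per-pair structure is what makes scores of the kind plotted in Figure 5 available, whereas flattening as in Figure 4(a) ties the weights to fixed positions in the concatenated vector.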
