Front Genet. 2021 Oct 26;12:752732.
doi: 10.3389/fgene.2021.752732. eCollection 2021.

A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites

Pan Wang et al. Front Genet.

Abstract

Knowledge about protein-protein interactions is beneficial in understanding cellular mechanisms. Protein-protein interactions are usually characterized through their protein-protein interaction sites. Due to the limitations of current techniques, detecting protein-protein interaction sites remains a challenging task. In this article, we present a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model serves as a feature extractor to remove redundant information from protein sequences. The Extreme Gradient Boosting algorithm is then used to construct a classifier for predicting protein-protein interaction sites. DeepPPISP-XGB achieved an area under the receiver operating characteristic curve (AUROC) of 0.681, a recall of 0.624, and an area under the precision-recall curve (AUPRC) of 0.339, which is competitive with state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.
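The three evaluation metrics reported in the abstract (area under the ROC curve, recall, and area under the precision-recall curve) can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.65 decision threshold are all hypothetical, and the precision-recall area is approximated here by average precision, which may differ from how the paper computes it.

```python
# Illustrative only: toy per-residue labels (1 = interaction site) and
# hypothetical classifier scores; none of these values come from the paper.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall(labels, scores, threshold):
    """Fraction of true interaction sites scored at or above the threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)


def average_precision(labels, scores):
    """Step-wise approximation of the area under the precision-recall curve.
    Assumes no tied scores; ties would need to be grouped first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    tp = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision at each rank where a positive occurs
    return total / n_pos


print("AUROC:", auroc(labels, scores))
print("Recall at 0.65:", recall(labels, scores, threshold=0.65))
print("Average precision:", average_precision(labels, scores))
```

Because interaction sites are typically a minority of residues, the precision-recall area is sensitive to class imbalance in a way that AUROC is not, which is one reason the two are often reported together.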

Keywords: deep learning; extreme gradient boosting; machine learning; protein functions; protein-protein interaction.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
The architecture of the DeepPPISP-XGB model. (A) Illustration of the DeepPPISP-XGB workflow, which consists of three modules: feature extraction, classifier training, and PPIS prediction. (B) The architecture of the DeepPPISP model, which contains an embedding layer, convolutions at different scales, fully connected layers, and an output layer.
FIGURE 2
UMAP diagrams of (A) raw features of the training set, (B) preprocessed features of the training set, (C) raw features of the testing set, and (D) preprocessed features of the testing set.
FIGURE 3
The ROC curves of (A) 5-fold cross validation and (B) the independent test. The red dotted line is a reference line on which AUROC = 0.5.
FIGURE 4
The ROC curves of 10-fold cross validation on the training set. The minimum cross-validation AUROC is 0.730, at the first fold. The maximum cross-validation AUROC is 0.752, at the tenth fold. The green line represents the mean ROC curve across folds. The mean AUROC is 0.741. The red dotted line is a reference line on which AUROC = 0.5.
FIGURE 5
The ROC curves (A) and precision-recall curves (B) for 5 algorithms on the independent test.
FIGURE 6
The ROC curves (A) and precision-recall curves (B) for local features alone and for combined global and local features on the independent test.
