Sci Rep. 2020 Sep 28;10(1):15856.
doi: 10.1038/s41598-020-72906-7.

Predicting binding sites from unbound versus bound protein structures

Jordan J Clark et al. Sci Rep. 2020.

Abstract

We present the application of seven binding-site prediction algorithms to a meticulously curated dataset of ligand-bound and ligand-free crystal structures for 304 unique protein sequences (2528 crystal structures). To probe how the starting protein structure influences the results of binding-site prediction, the dataset contains a minimum of two ligand-bound and two ligand-free structures for each protein. We use this dataset in a brief survey of five geometry-based methods, one energy-based method, and one machine-learning-based method: Surfnet, Ghecom, LIGSITEcsc, Fpocket, Depth, AutoSite, and Kalasanty. Distributions of the F scores and Matthews correlation coefficients for ligand-bound versus ligand-free structures show no statistically significant difference in performance between the two structure types for most methods. Only Fpocket showed a statistically significant but low-magnitude enhancement in performance for holo structures. Lastly, we found that most methods succeed on some crystal structures and fail on others within the same protein family, even though all structures are of relatively high quality and show low structural variation. We expected better consistency across varying protein conformations of the same sequence. Interestingly, the success or failure of a given structure cannot be predicted by quality metrics such as resolution, Cruickshank Diffraction Precision index, or number of unresolved residues. Cryptic sites were also examined.
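
The F score and Matthews correlation coefficient named in the abstract can both be computed from a confusion table of predicted versus known binding-site members. The following is a minimal sketch, not the authors' scoring code. It assumes residue-level scoring (the abstract does not say whether sites are compared per residue, per atom, or per grid point) and that "F score" means the balanced F1. The function names and the toy residue numbers are illustrative only.

```python
import math

def confusion_counts(true_site, predicted_site, all_residues):
    """Count TP, FP, FN, TN for sets of residue identifiers (assumed scoring unit)."""
    tp = len(true_site & predicted_site)
    fp = len(predicted_site - true_site)
    fn = len(true_site - predicted_site)
    tn = len(all_residues - true_site - predicted_site)
    return tp, fp, fn, tn

def f_score(tp, fp, fn):
    """Balanced F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example (hypothetical data): a 20-residue protein.
all_residues = set(range(1, 21))
true_site = {3, 4, 5, 6}
predicted_site = {4, 5, 6, 7, 8}

tp, fp, fn, tn = confusion_counts(true_site, predicted_site, all_residues)
f = f_score(tp, fp, fn)
m = mcc(tp, fp, fn, tn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn} F={f:.3f} MCC={m:.3f}")
```

Unlike the F score, MCC also uses the true negatives, so the two metrics can rank predictions differently when the binding site is a small fraction of the protein.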

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
(A) Distribution of the sizes of unified binding sites for the 304 protein families in this dataset, as % frequency. (B) Distribution of amino acid composition of the 304 unified binding sites.
Figure 2
Analyses of maximum and mean backbone RMSD for each protein family. Each point represents the maximum or mean observed in one protein family, and the number of points in each section is labeled in black (numbers in parentheses are points with values > 3.5 Å). (A) The maximum backbone RMSD across the apo-apo pairs is compared to the maximum of the holo-holo pairs; 206 proteins display RMSD ≤ 1 Å for both groups. (B) The mean backbone RMSD across the apo-apo pairs is compared to the mean of the holo-holo pairs; 247 proteins display RMSD ≤ 1 Å for both groups. (C) The maximum UBS RMSD across the apo-apo pairs is compared to the maximum of the holo-holo pairs; 206 proteins display RMSD ≤ 1 Å for both groups. (D) The mean UBS RMSD across the apo-apo pairs is compared to the mean of the holo-holo pairs; 235 proteins display RMSD ≤ 1 Å for both groups.
Figure 3
Distribution of family median F scores of apo and holo protein structures for (A) Surfnet (p = 0.90), (B) Ghecom (p = 0.20), (C) LIGSITEcsc (p = 0.56), (D) Fpocket (p = 0.04), (E) Depth (p = 0.32), (F) AutoSite (p = 0.13), and (G) Kalasanty (p = 0.12).
Figure 4
Distribution of family median Matthews Correlation Coefficients (MCCs) of apo and holo protein structures for (A) Surfnet (p = 0.63), (B) Ghecom (p = 0.17), (C) LIGSITEcsc (p = 0.60), (D) Fpocket (p = 0.03), (E) Depth (p = 0.17), (F) AutoSite (p = 0.10), and (G) Kalasanty (p = 0.11).
Figure 5
Family median F scores of apo and holo protein structures for (A) Surfnet, (B) Ghecom, (C) LIGSITEcsc, (D) Fpocket, (E) Depth, (F) AutoSite, and (G) Kalasanty where the error bars are constructed from the family minima and maxima. Line: y = x.
Figure 6
Family median MCCs of apo and holo protein structures for (A) Surfnet, (B) Ghecom, (C) LIGSITEcsc, (D) Fpocket, (E) Depth, (F) AutoSite, and (G) Kalasanty where the error bars are constructed from the family minima and maxima. Line: y = x.
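
The per-family points in Figures 5 and 6 pair a median apo score with a median holo score and draw error bars from the family minima and maxima. The following is a minimal sketch of that summary, not the authors' plotting code. The function names and the example scores are hypothetical; only the median/min/max construction comes from the captions.

```python
from statistics import median

def summarize_scores(scores):
    """Return the median plus the min/max used as error-bar extents."""
    return {"median": median(scores), "min": min(scores), "max": max(scores)}

def family_point(apo_scores, holo_scores):
    """Summarize one protein family as an (apo, holo) pair of score summaries."""
    return {"apo": summarize_scores(apo_scores),
            "holo": summarize_scores(holo_scores)}

# Hypothetical per-structure F scores for one protein family.
apo_scores = [0.10, 0.55, 0.60]
holo_scores = [0.50, 0.58, 0.62, 0.70]

summary = family_point(apo_scores, holo_scores)
for state, stats in summary.items():
    spread = stats["max"] - stats["min"]
    print(f"{state}: median={stats['median']:.2f} "
          f"range=[{stats['min']:.2f}, {stats['max']:.2f}] spread={spread:.2f}")
```

A family whose medians sit near the y = x line can still have a wide min-to-max range, which is the within-family inconsistency the abstract describes.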
