J Mol Biol. 2016 Feb 22;428(4):709-719.
doi: 10.1016/j.jmb.2016.01.029. Epub 2016 Feb 5.

CryptoSite: Expanding the Druggable Proteome by Characterization and Prediction of Cryptic Binding Sites

Peter Cimermancic et al. J Mol Biol.

Abstract

Many proteins have small-molecule binding pockets that are not easily detectable in the ligand-free structures. These cryptic sites require a conformational change to become apparent; a cryptic site can therefore be defined as a site that forms a pocket in a holo structure, but not in the apo structure. Because many proteins appear to lack druggable pockets, understanding and accurately identifying cryptic sites could expand the set of drug targets. Previously, cryptic sites were identified experimentally by fragment-based ligand discovery and computationally by long molecular dynamics simulations and fragment docking. Here, we begin by constructing a set of structurally defined apo-holo pairs with cryptic sites. Next, we comprehensively characterize the cryptic sites in terms of their sequence, structure, and dynamics attributes. We find that cryptic sites tend to be as conserved in evolution as traditional binding pockets but are less hydrophobic and more flexible. Relying on this characterization, we use machine learning to predict cryptic sites with relatively high accuracy (for our benchmark, the true positive and false positive rates are 73% and 29%, respectively). We then predict cryptic sites in the entire structurally characterized human proteome (11,201 structures, covering 23% of all residues in the proteome). CryptoSite increases the size of the potentially "druggable" human proteome from ~40% to ~78% of disease-associated proteins. Finally, to demonstrate the utility of our approach in practice, we experimentally validate a cryptic site in protein tyrosine phosphatase 1B using a covalent ligand and NMR spectroscopy. The CryptoSite Web server is available at http://salilab.org/cryptosite.
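The true positive and false positive rates quoted in the abstract are per-residue classification rates. As a minimal sketch of how such rates could be computed from per-residue scores, assuming a residue is called "cryptic" when its score is at or above a threshold (the function name, the toy data, and the `>=` convention are illustrative assumptions, not taken from the paper):

```python
def tpr_fpr(scores, labels, threshold):
    """Return (true positive rate, false positive rate) for residue scores.

    scores    -- per-residue scores (hypothetical values here)
    labels    -- 1 if the residue belongs to a known cryptic site, else 0
    threshold -- residues scoring >= threshold are predicted positive
                 (the >= convention is an assumption)
    """
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero when one class is absent.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy data only; these are not results from the paper.
toy_scores = [0.9, 0.2, 0.05, 0.4, 0.01, 0.3]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_tpr, toy_fpr = tpr_fpr(toy_scores, toy_labels, threshold=0.1)
```

Sensitivity is the true positive rate, and specificity is one minus the false positive rate.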

Keywords: cryptic binding sites; machine learning; protein dynamics; undruggable proteins.

Figures

Figure 1
(a) Examples of a pocket and cryptic site in p38 MAP kinase. The nucleotide-binding site of the p38 MAP kinase is a pocket visible in both bound (holo; blue ribbon; PDB ID: 2ZB1) and unbound (apo; grey ribbon; PDB ID: 2NPQ) conformations. The ligand, a biphenyl amide inhibitor, is depicted as blue spheres. On the other hand, the site in the C-lobe domain that binds octylglucoside lipid (green spheres) becomes a visible pocket only after the movement of the α-helix at the left of the structure (marked with the double-headed arrow). The small molecules are shown as they bind in the holo structures. UCSF Chimera software was used for the visualization (67). (b) Flowchart summarizing the analyses in this study. We started by creating a representative dataset of 84 known examples of cryptic binding sites, 92 binding pockets, and 705 concave surface patches from the Protein Data Bank (31) and the MOAD database (32). Next, we designed a set of 58 features that describe sequence, structure, and dynamics of individual residues and their neighbors. We then compared these attributes between the three types of site to better understand the underlying characteristics of each site. Next, we used machine-learning algorithms to classify residues as belonging to a cryptic site or not. We then predicted cryptic sites in the entire structurally characterized human proteome (Materials and Methods, SI Text).
Figure 2
The accuracies of our predictive model, FTFlex, and Fpocket are measured as the area under the receiver-operating characteristic (ROC) curve based on predictions on all proteins in the test set (a), as well as based on sensitivity (true positive rate) and specificity (true negative rate) values from predictions on individual proteins (b). (a) Only ~45% and ~80% of cryptic site residues were detected by Fpocket and FTFlex, respectively; the area under the ROC curve was calculated by connecting the end of the ROC curve and the upper-right corner as a straight line. The accuracy of CryptoSite is comparable to that of FTFlex when small pockets that could fit small-molecule fragments are already present in the apo state of a cryptic site (this is the case in 10 out of 14 testing examples). However, CryptoSite is more accurate than FTFlex when a cryptic site is buried or resides in a large protein (Fig. S8A). (b) Sensitivities and specificities were determined for each protein in our test set (larger data points with black circle) and training set (smaller data points) based on leave-one-out cross-validation. The classification of the residues is based on the score threshold of 0.1. The two empty circles denote two predictions (one failed) of cryptic sites in proteins with more than one cryptic site. (c) The cryptic sites from our dataset are marked by green rectangles, and the computed scores that a residue is in a cryptic site are shown on the blue-to-red color scale. The small molecules that bind into the known cryptic sites are superposed from the alignment to the bound conformations and represented as yellow sticks.
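Panel (a) notes that when a method detects only a fraction of the cryptic site residues, its ROC curve stops short of the upper-right corner and the area is computed after joining the end of the curve to that corner with a straight line. A minimal sketch of that closure using the trapezoidal rule follows; the function name and the example points are hypothetical, and the authors' exact procedure may differ:

```python
def auc_with_closure(roc_points):
    """Area under a possibly truncated ROC curve.

    roc_points -- iterable of (false positive rate, true positive rate)
                  pairs. The curve is anchored at (0, 0) and, if it ends
                  early, closed with a straight line to (1, 1).
    """
    points = sorted(roc_points)
    if not points or points[0] != (0.0, 0.0):
        points.insert(0, (0.0, 0.0))
    if points[-1] != (1.0, 1.0):
        points.append((1.0, 1.0))  # straight-line closure to the corner
    area = 0.0
    # Trapezoidal rule over consecutive pairs of points.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical truncated curve that ends at a true positive rate of 0.45.
truncated_curve = [(0.0, 0.0), (0.1, 0.3), (0.2, 0.45)]
truncated_auc = auc_with_closure(truncated_curve)
```

With this closure, a curve consisting only of the origin yields an area of 0.5, the same as a random classifier.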
Figure 3
Cryptic binding sites are predicted to expand the size of the druggable proteome. (a) The percentage of proteins for which no binding sites (grey), only cryptic sites (green), only binding pockets (blue), and both cryptic sites and binding pockets (orange) were predicted for all human proteins with known structure (left pie chart) and for a subset of disease-associated proteins (right pie chart). Shown are the results of the fast version of our predictive model that does not take into account features based on molecular dynamics simulations. (b) Cryptic binding sites in PTP1B. Ribbon (left and center) and surface (right) representations of the PTP1B structure (PDB ID: 2F6V) are colored based on the cryptic site score as in Fig. 2C. Residues with definitive chemical shift changes (|Δδ|) upon ABDF labeling (khaki) cluster around the cryptic and ABDF binding sites, whereas residues whose chemical shifts definitively do not change (purple) are more distal. The panel also shows positions and average volumes of the pockets (grey mesh) that are at least partially open more than 50% of the time, as observed in the molecular dynamics simulation at 300 K.

References

    1. Nisius B, Sha F, Gohlke H. Structure-based computational analysis of protein binding sites for function and druggability prediction. Journal of biotechnology. 2012;159(3):123–134.
    2. Campbell SJ, Gold ND, Jackson RM, Westhead DR. Ligand binding: functional site location, similarity and docking. Current opinion in structural biology. 2003;13(3):389–395.
    3. Laurie AT, Jackson RM. Q-SiteFinder: an energy-based method for the prediction of protein-ligand binding sites. Bioinformatics. 2005;21(9):1908–1916.
    4. Hermann JC, et al. Structure-based activity prediction for an enzyme of unknown function. Nature. 2007;448(7155):775–779.
    5. Laskowski RA, Luscombe NM, Swindells MB, Thornton JM. Protein clefts in molecular recognition and function. Protein science : a publication of the Protein Society. 1996;5(12):2438–2452.
