Bioinformatics. 2013 Mar 1;29(5):597-604.
doi: 10.1093/bioinformatics/btt024. Epub 2013 Jan 17.

APoc: large-scale identification of similar protein pockets

Mu Gao et al. Bioinformatics.

Abstract

Motivation: Most proteins interact with small-molecule ligands such as metabolites or drug compounds. Over the past several decades, many of these interactions have been captured in high-resolution atomic structures. From a geometric point of view, most interaction sites for grasping these small-molecule ligands, as revealed in these structures, form concave shapes, or 'pockets', on the protein's surface. An efficient method for comparing these pockets could greatly assist the classification of ligand-binding sites, prediction of protein molecular function and design of novel drug compounds.

Results: We introduce a computational method, APoc (Alignment of Pockets), for the large-scale, sequence order-independent, structural comparison of protein pockets. A scoring function, the Pocket Similarity Score (PS-score), is derived to measure the level of similarity between pockets. Statistical models are used to estimate the significance of the PS-score based on millions of comparisons of randomly related pockets. APoc is a general, robust method that may be applied to pockets identified by various approaches, such as ligand-binding sites as observed in experimental complex structures, or predicted pockets identified by a pocket-detection method. Finally, we curate large benchmark datasets to evaluate the performance of APoc and present interesting examples to demonstrate the usefulness of the method. We also demonstrate that APoc has better performance than the geometric hashing-based method SiteEngine.
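
As a rough illustration of what estimating significance from comparisons of randomly related pockets can mean, the sketch below computes an empirical P-value: the fraction of background scores at least as large as an observed score. This is a simple stand-in and not the statistical model used by APoc, which the abstract does not specify. The function name `empirical_p_value` and the toy background values are hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(score, background_sorted):
    """Estimate P(background score >= score) from an ascending-sorted sample.

    Assumption: a plain empirical tail fraction with a +1 pseudocount,
    used here only as a stand-in for a fitted statistical model.
    """
    n = len(background_sorted)
    if n == 0:
        raise ValueError("background must not be empty")
    # Index of the first background score that is >= the observed score.
    first_ge = bisect_left(background_sorted, score)
    n_at_least = n - first_ge
    # The pseudocount keeps the estimate above zero for scores
    # larger than anything seen in the background sample.
    return (n_at_least + 1) / (n + 1)


# Toy background of similarity scores between random pocket pairs
# (made-up numbers for illustration, not data from the paper).
background = sorted([0.21, 0.25, 0.28, 0.30, 0.31, 0.33, 0.35, 0.36, 0.40])

print(empirical_p_value(0.38, background))  # only one background score is >= 0.38
print(empirical_p_value(0.20, background))  # every background score is >= 0.20
```

With a background of millions of random comparisons, as described above, such a tail fraction can resolve much smaller P-values than this nine-element toy sample allows.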

Availability and implementation: The APoc software package including the source code is freely available at http://cssb.biology.gatech.edu/APoc.

Figures

Fig. 1.
Flowchart of the APoc algorithm. Red/blue spheres represent backbone Cα atoms from the two pockets, arrows represent vectors pointing from the Cα to the Cβ and solid lines represent protein backbone traces. ‘Aln’ is the abbreviation for ‘alignment’.
Fig. 2.
Mean scores of randomly selected similar-length protein pockets. Pockets were detected by CAVITATOR and LIGSITE in RS1, a set of 5000 experimental protein structures, and by LPC in RS2, a set of protein–ligand complex structures curated from the PDB.
Fig. 3.
Performance of APoc. (A) Cumulative fraction of pairs of pockets at various significance levels of pocket similarity for the subject and the control sets, respectively. (B) Sensitivity versus FPR. ‘Obs Pk’ and ‘Pre Pk’ denote observed and predicted pockets (see Text), respectively. ‘Vol Diff’ denotes the pocket volume difference given by Dvol.
Fig. 4.
Examples of similar pockets from proteins having different global topology and/or fold. (A) A GTP-binding pocket in a GTPase PAB0955 from P.abyssi (PDB code: 1yr8, chain A, green) versus a GDP-binding pocket in YqeH GTPase from G.stearothermophilus (PDB code: 3ec1, chain A, purple). (B) An ATP-binding pocket in a bifunctional glutathionylspermidine synthetase/amidase from E.coli (PDB code: 2io7, chain B, green) versus an ATP-binding pocket in an aminoglycoside phosphotransferase from A.baumannii (PDB code: 4ej7, chain B, purple). In each snapshot, the two protein structures are shown in green/purple cartoon representations, and the corresponding bound ligands are shown in cyan/red licorice representations, respectively. For clarity, pocket/non-pocket regions are shown in solid/transparent colours, respectively. Aligned pocket Cα atoms are shown as spheres. Molecular images were created with VMD (Humphrey et al., 1996). The global structural similarity measured by TM-score is denoted as ‘gTM-score’.
Fig. 5.
APoc versus SiteEngine. The ROC curves of SiteEngine were obtained by varying the threshold values on the Match score. The two red curves correspond to the full set of 2000 pairs of pockets and to a subset of pockets possessing 45–65 pseudocentres, respectively.
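
The sensitivity-versus-FPR curves in Fig. 3B and Fig. 5 are obtained by sweeping a threshold over a similarity score. The sketch below shows one generic way to compute such points; it is not the authors' evaluation code. The function name `roc_points`, the label convention (1 for a pair from the subject set, 0 for a pair from the control set) and the toy scores are all assumptions made for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, sensitivity) points from a threshold sweep.

    A pair is called 'similar' when its score is >= the threshold.
    Assumption: labels are 1 for true-similar (subject) pairs and
    0 for control pairs; higher scores mean greater similarity.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")

    points = [(0.0, 0.0)]  # threshold above every score: nothing is called similar
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy example with made-up scores, not data from the paper.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
for fpr, sensitivity in roc_points(toy_scores, toy_labels):
    print(f"FPR={fpr:.2f}  sensitivity={sensitivity:.2f}")
```

The quadratic loop keeps the sketch short; for benchmark sets of thousands of pairs, a single pass over the scores sorted in descending order gives the same points more efficiently.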
