Protein Sci. 2018 Jun;27(6):1125-1135.
doi: 10.1002/pro.3416. Epub 2018 Apr 27.

Functional classification of protein structures by local structure matching in graph representation


Caitlyn L Mills et al. Protein Sci. 2018 Jun.

Abstract

As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by Structural Genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), quickly and accurately predicts the biochemical function of proteins by representing the residues of the predicted local active site as graphs rather than as Cartesian coordinates. We compare GRASP-Func to our previously reported method, Structurally Aligned Local Sites of Activity (SALSA), using the Ribulose Phosphate Binding Barrel (RPBB), 6-Hairpin Glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each superfamily, SALSA and the much faster GRASP-Func method yield similar, correct classifications of previously characterized proteins, providing a validated benchmark for the new method. In addition, we used SALSA and GRASP-Func to predict the function of SG proteins. Forty-one SG proteins in the RPBB superfamily, nine in the 6-HG superfamily, and one in the CAL/G superfamily were successfully classified by both methods into one of the functional families of their respective superfamily. This improved, faster, validated computational method can yield more reliable functional predictions for a wide variety of applications in the community.
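The abstract says only that GRASP-Func encodes active-site residues as a graph instead of Cartesian coordinates; it does not give the graph construction or the scoring function. As a minimal illustration of why such an encoding avoids structural superposition, the sketch below assumes each residue is reduced to a residue-type label and one representative coordinate (in Å), builds a complete graph whose edges carry pairwise distances, and scores two sites by the fraction of edges (same residue-type pair, distance within a tolerance) they share. Because interresidue distances do not change under rotation or translation, the comparison needs no alignment step. The functions `site_graph` and `graph_similarity`, the 1.0 Å tolerance, and the greedy edge matching are hypothetical choices for illustration, not the published algorithm.

```python
import itertools
import math

# Hypothetical encoding: each active-site residue is (residue_type, (x, y, z)),
# with coordinates in angstroms. This is NOT the published GRASP-Func scheme.


def site_graph(residues):
    """Return the edges of a complete graph over the given residues.

    Each edge is (sorted residue-type pair, Euclidean distance). Distances are
    unchanged by rotation or translation, so no superposition is required.
    """
    edges = []
    for (type_a, xyz_a), (type_b, xyz_b) in itertools.combinations(residues, 2):
        edges.append((tuple(sorted((type_a, type_b))), math.dist(xyz_a, xyz_b)))
    return edges


def graph_similarity(edges_a, edges_b, tol=1.0):
    """Hypothetical greedy edge matching: fraction of edges that find a partner
    with the same residue-type pair and a distance within `tol` angstroms."""
    unused = list(edges_b)
    matched = 0
    for labels, dist in edges_a:
        for k, (labels_b, dist_b) in enumerate(unused):
            if labels == labels_b and abs(dist - dist_b) <= tol:
                matched += 1
                del unused[k]  # each edge in B may be matched at most once
                break
    denom = max(len(edges_a), len(edges_b))
    return matched / denom if denom else 0.0


if __name__ == "__main__":
    site = [("ASP", (0.0, 0.0, 0.0)), ("GLU", (3.0, 0.0, 0.0)), ("LYS", (0.0, 4.0, 0.0))]
    # The same site rotated 90 degrees about z: (x, y, z) -> (-y, x, z).
    rotated = [(name, (-y, x, z)) for name, (x, y, z) in site]
    # A made-up site whose lysine sits elsewhere.
    moved = [("ASP", (0.0, 0.0, 0.0)), ("GLU", (3.0, 0.0, 0.0)), ("LYS", (0.0, 8.0, 0.0))]

    g = site_graph(site)
    print(graph_similarity(g, site_graph(rotated)))  # 1.0
    print(graph_similarity(g, site_graph(moved)))  # 0.3333333333333333
```

The rotated copy scores 1.0 without any alignment, while moving one residue breaks two of the three distance edges. A real method would also need to handle sites of different sizes and unknown residue correspondences more carefully than this greedy matching does.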

Keywords: 6-Hairpin Glycosidase (6-HG) superfamily; Concanavalin A-like Lectins/Glucanase (CAL/G) superfamily; Graph Representation of Active Sites for Prediction of Function (GRASP-Func); Ribulose Phosphate Binding Barrel (RPBB) superfamily; Structurally Aligned Local Sites of Activity (SALSA); protein function annotation.


Figures

Figure 1
GRASP‐Func clustering of RPBB known function (light blue) and SG (dark green) proteins. Proteins are represented as nodes. The thickness of each edge shows the degree of similarity between the two connected proteins. PDB IDs for proteins of known function: 1pii:N, 1i4n, 2c3z (1a–c, respectively); 1geq, 1qop, 1xc4, 1rd5 (2a–d); 1pii:C, 1lbm (3a–b); 1qo2, 1vzw, 2y85 (4a–c); 1thf, 1h5y, 1ox6 (5a–c); 1rpx, 2fli, 1h1y, 1tqj, 3ovp (6a–e); 1dbt, 1dv7, 1dqw, 1l2u, 2za1, 3qw3, 3l0k (7a–g); 1xbv, 3exr (8a–b); 3ajx, HPS1 (9a–b). Each SG protein is numbered based on its label in Table S12, Supporting Information.
Figure 2
GRASP‐Func clustering of 6‐HG known function (light blue) and SG (dark green) proteins. Proteins are represented as nodes. The thickness of each edge shows the degree of similarity between the two connected proteins. PDB IDs for proteins of known function: 1gai, 1ayx, 1lf9, 1ug9 (1a–d); 3qt9, 3qsp (2a–b); 1cem, 1wu4, 1v5c, 1h12 (3a–d); 1clc, 1kfg, 1ksc, 1ia6 (4a–d); 2d5j, 2zzr (5a–b); 2okx, 3w5m, ALR1 (6a–c); 4ufc, 2eac, ALF1, ALF2 (7a–d); 2jf4, TRE1 (8a–b); 2d8l (9); 3ren (10); 1v7x, 2cqs, CDP1 (11a–c); 1h54, NGP1 (12a–b); 1fp3, 2gz6 (13a–b). Each SG protein is numbered based on its label in Table S12, Supporting Information.
Figure 3
GRASP‐Func clustering of CAL/G known function (light blue) and SG (dark green) proteins. Proteins are represented as nodes. The thickness of each edge shows the degree of similarity between the two connected proteins. PDB IDs for proteins of known function: 1m4w, 1h4g, 1bcx (1a–c); 1uu4, 1h8v, 2nlr (2a–c); 1z3t, 1dy4, 2rfw (3a–c); 2ayh, 1dyp, 3ilf, 2vy0, 1mve (4a–e); 1uai, 1j1t, 1vav (5a–c); 2fir, 1y43 (6a–b). Each SG protein is numbered based on its label in Table S12, Supporting Information.
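Figures 1-3 draw each protein as a node and scale edge thickness by pairwise similarity, but the captions do not state how cluster boundaries are drawn. One simple way to turn such a network into groups, assumed here purely for illustration, is to keep edges whose score meets a threshold and report the connected components, computed below with union-find. The function name `cluster_by_threshold`, the 0.5 threshold, and the node names and scores in the example are hypothetical; they are not data from the study.

```python
def cluster_by_threshold(names, similarity, threshold):
    """Connected components of a thresholded similarity network (union-find).

    names: node identifiers (proteins).
    similarity: dict mapping (name_a, name_b) to a score; higher = more similar.
    threshold: hypothetical cutoff; edges scoring below it are dropped.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the root, halving the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), score in similarity.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two components

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Largest clusters first; ties broken by member names.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    # Made-up nodes and scores for illustration only.
    names = ["known_a", "known_b", "known_c", "known_d", "sg_1"]
    scores = {
        ("known_a", "known_b"): 0.9,
        ("known_c", "known_d"): 0.8,
        ("sg_1", "known_c"): 0.7,
        ("known_a", "known_c"): 0.2,
    }
    print(cluster_by_threshold(names, scores, threshold=0.5))
    # [['known_c', 'known_d', 'sg_1'], ['known_a', 'known_b']]
```

In this toy network the unannotated node `sg_1` falls into the same component as `known_c` and `known_d`, the kind of grouping the figures use to place SG proteins among functional families. Connected components are sensitive to a single strong edge, so a different clustering rule may be more appropriate in practice.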
