BMC Bioinformatics. 2008 Jan 11;9:17.
doi: 10.1186/1471-2105-9-17.

Prediction of enzyme function based on 3D templates of evolutionarily important amino acids

David M Kristensen et al. BMC Bioinformatics. 2008.

Abstract

Background: Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates - structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates.

Results: Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 shared the same first three Enzyme Commission digits as the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required both to geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than that of either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable.

Conclusion: These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large-scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.
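
The plurality vote described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ec_prefix` and `plurality_vote` are hypothetical, and treating a tie for first place as "no prediction" is an assumption made here to mirror the statement that a single most likely function is available in only some cases.

```python
from collections import Counter
from typing import List, Optional


def ec_prefix(ec_number: str, digits: int = 3) -> str:
    """Return the first `digits` fields of an EC number, e.g. '2.7.11.1' -> '2.7.11'."""
    return ".".join(ec_number.split(".")[:digits])


def plurality_vote(match_ec_numbers: List[str], digits: int = 3) -> Optional[str]:
    """Return the EC prefix carried by the most matches.

    Returns None when there are no matches or when the top two
    annotations are tied (assumption: a tie yields no prediction).
    """
    counts = Counter(ec_prefix(ec, digits) for ec in match_ec_numbers)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # no single most likely function
    return ranked[0][0]
```

For example, a template whose matches are annotated `2.7.11.1`, `2.7.11.22`, and `3.1.1.1` would receive the three-digit prediction `2.7.11`, while a template with one match each to two different three-digit classes would receive none.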

Figures

Figure 1
Illustration of the automated functional annotation pipeline.
Figure 2
Average positive predictive value (black bars) and sensitivity (white bars) in the training set for several heuristics: (a) choosing residues for the 3D template; (b) representing those residues as points, with single-point methods left of the grey line and multiple-point methods right of it; or (c) choosing the size of the template.
Figure 3
Overlap (purple spheres) of ET Rank template residues (red spheres) with SITE records (green spheres) provided by the PDB, in the context of the surface trace cluster (red sticks) from which the template residues were chosen. (a) Rieske iron-sulfur protein (PDB 1RIE); (b) Casein kinase II (PDB 1QF8, chain A).
Figure 4
PPV for each of the 53 proteins in the training set (highest PPV in red squares, others in purple diamonds) and for the templates with randomly chosen residues (green triangles, highest PPV; blue 'X's, others). All residue combinations were sampled for the 32 proteins on the left; 500 templates were randomly sampled for the 21 proteins on the right.
Figure 5
(a) Stacked cumulative histogram of RMSDs of significant (p-value ≤ 1%) matches, including matches with the same function (red) and different function (blue). (b) Scatterplot of these same matches, adding the average absolute value of the difference in evolutionary importance between the matched and query residues to allow separation of true and false matches.
Figure 6
Venn diagram showing the overlap of true (purple) and false (white) matches to (a) the PDB set and (b) the PSI set, found by ETA (blue), BLAST (yellow), and PSI-BLAST (orange). The sums of all matches found in each category are shown at right.
Figure 7
PPV of BLAST (cyan dashed hollow circle) and BLAST+ETA (blue solid circle) as the maximum e-value cutoff for BLAST varies (horizontal axis). ETA shown as a single point (black diamond) at e-value = 0.05. (a) PDB set; (b) PSI set.
Figure 8
Match PPV of ETA (black dashed hollow diamond), BLAST (cyan dashed hollow circle), PSI-BLAST (orange dashed hollow square), the intersection of BLAST+ETA (blue solid circle), and the intersection of PSI-BLAST+ETA (red solid square). The horizontal axis represents decreasing levels of match sequence identity. (a) PDB set; (b) PSI set.
Figure 9
Annotation performance of ETA (black dashed hollow diamond), BLAST (cyan dashed hollow circle), PSI-BLAST (orange dashed hollow square), the intersection of BLAST+ETA (blue solid circle), and the intersection of PSI-BLAST+ETA (red solid square). The horizontal axis represents decreasing levels of match sequence identity, and the vertical axis represents: (a) PDB set voting accuracy; (b) PDB set voting availability; (c) PSI set accuracy; (d) PSI set availability.
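
Several captions above report positive predictive value (PPV) and sensitivity. As a reminder of the standard definitions, here is a small Python sketch. The function names are hypothetical, and returning `None` for an empty denominator is a choice made for this example, not something stated in the paper.

```python
from typing import Optional


def ppv(true_positives: int, false_positives: int) -> Optional[float]:
    """Positive predictive value: the fraction of reported matches that are true."""
    reported = true_positives + false_positives
    if reported == 0:
        return None  # no matches reported, so PPV is undefined
    return true_positives / reported


def sensitivity(true_positives: int, false_negatives: int) -> Optional[float]:
    """Sensitivity: the fraction of all true matches that were actually found."""
    actual = true_positives + false_negatives
    if actual == 0:
        return None  # no true matches exist, so sensitivity is undefined
    return true_positives / actual
```

For instance, a template with 3 true matches, 1 false match, and 1 missed true match has a PPV of 0.75 and a sensitivity of 0.75.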
