Nucleic Acids Res. 2004 Dec 1;32(21):6226-39.
doi: 10.1093/nar/gkh956. Print 2004.

EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference

Weidong Tian et al. Nucleic Acids Res.

Abstract

EFICAz (Enzyme Function Inference by Combined Approach) is an automatic engine for large-scale enzyme function inference that combines predictions from four different methods developed and optimized to achieve high prediction accuracy: (i) recognition of functionally discriminating residues (FDRs) in enzyme families obtained by a Conservation-controlled HMM Iterative procedure for Enzyme Family classification (CHIEFc), (ii) pairwise sequence comparison using a family specific Sequence Identity Threshold, (iii) recognition of FDRs in Multiple Pfam enzyme families, and (iv) recognition of multiple Prosite patterns of high specificity. For FDR (i.e. conserved positions in an enzyme family that discriminate between true and false members of the family) identification, we have developed an Evolutionary Footprinting method that uses evolutionary information from homofunctional and heterofunctional multiple sequence alignments associated with an enzyme family. The FDRs show a significant correlation with annotated active site residues. In a jackknife test, EFICAz shows high accuracy (92%) and sensitivity (82%) for predicting four EC digits in testing sequences that are <40% identical to any member of the corresponding training set. Applied to the Escherichia coli genome, EFICAz assigns more detailed enzymatic function than KEGG, and generates numerous novel predictions.
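
The jackknife condition in the abstract (testing sequences <40% identical to any member of the training set) can be illustrated with a minimal Python sketch. This is not the EFICAz implementation: the function names, the gap-handling convention, and the assumption that each test/training pair has already been aligned to equal length are all hypothetical choices made here for illustration.

```python
from typing import Iterable, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences carry a gap ('-') are ignored,
    and all remaining columns form the denominator. Other conventions exist.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def max_identity(aligned_pairs: Iterable[Tuple[str, str]]) -> float:
    """Highest percent identity over (test, training) aligned pairs."""
    return max((percent_identity(t, r) for t, r in aligned_pairs), default=0.0)


def passes_identity_filter(
    aligned_pairs: Iterable[Tuple[str, str]], cutoff: float = 40.0
) -> bool:
    """True if the test sequence is below the cutoff against every training
    sequence. The 40% default mirrors the figure quoted in the abstract."""
    return max_identity(aligned_pairs) < cutoff
```

Each pair holds one test sequence aligned against one training sequence; producing those alignments (for example with a pairwise aligner) is outside the scope of this sketch.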

Figures

Figure 1
Overview of the procedure to build enzyme families by CHIEFc.
Figure 2
Divergence of enzyme functions. Cumulative relative distributions of the number of different enzyme functions in the heterofunctional MSAs associated with four EC digits (A) and three EC digits CHIEFc enzyme families (B). Box-and-whisker plots showing the distributions of the number of different enzyme functions in heterofunctional MSAs versus the average pairwise sequence identity (sequence diversity) of the corresponding homofunctional MSAs associated with four EC digits (C) and three EC digits CHIEFc enzyme families (D). From top to bottom, the statistics represented in the box-and-whisker plots are 95th percentile (black circle), 90th percentile (whisker, top), 75th percentile (box, top), median (thick line), 25th percentile (box, bottom), 10th percentile (whisker, bottom) and 5th percentile (closed circle).
Figure 3
Average number of FDRs in CHIEFc enzyme families versus the average pairwise sequence identity of their corresponding homofunctional MSAs. The FDRs are selected based on the conservation score of either homofunctional MSAs alone (closed circles and open circles, to discriminate four EC digits and the first three EC digits, respectively), or both homofunctional MSAs and heterofunctional MSAs (closed triangles and open triangles, to discriminate four EC digits and the first three EC digits, respectively).
Figure 4
Correlation between functionally important residues and FDRs. (A) Fraction of CHIEFc families whose FDRs include at least one residue annotated as active site in Swiss-Prot. Two strategies for obtaining the FDRs are compared: the Evolutionary Footprinting (EF) method (gray bars) and random selection (open bars, with error bars representing SD of the mean). (B) Functional annotation and spatial location of the FDRs for the phosphoprotein phosphatase CHIEFc family, mapped on the 3D structure of PDB entry 1FJM.
Figure 5
Benchmark of different enzyme function inference approaches by jackknife test. Accuracy (A, D, G), sensitivity (B, E, H) and Matthews Correlation Coefficient (C, F, I) values for different enzyme function inference methods, at different levels of maximal testing to training sequence identity, averaged per EC number. See Methods for a full description of the jackknife procedure. The plotted values are the averages of three repetitions of the jackknife analysis; the corresponding SDs are omitted for clarity and range from 0.01 to 0.09, with a median value of 0.01.
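
As a rough illustration of the measures named in the Figure 5 caption, the sketch below computes sensitivity and the Matthews Correlation Coefficient from confusion-matrix counts and averages a metric per EC number. The function names and the dictionary layout are hypothetical. The abstract does not define "accuracy"; the `precision` function here, TP/(TP+FP), is only one possible reading and should be checked against the Methods of the paper.

```python
import math
from typing import Callable, Dict, NamedTuple


class Counts(NamedTuple):
    """Confusion-matrix counts for one EC number (hypothetical container)."""
    tp: int
    tn: int
    fp: int
    fn: int


def sensitivity(c: Counts) -> float:
    """Fraction of true members that were predicted: TP / (TP + FN)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c: Counts) -> float:
    """TP / (TP + FP). Assumption: one possible reading of 'accuracy'."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def mcc(c: Counts) -> float:
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0


def average_per_ec(
    per_ec: Dict[str, Counts], metric: Callable[[Counts], float]
) -> float:
    """Unweighted mean of a metric over EC numbers, so that each EC number
    contributes equally regardless of how many sequences it has."""
    if not per_ec:
        return 0.0
    return sum(metric(c) for c in per_ec.values()) / len(per_ec)
```

Averaging per EC number rather than pooling all sequences keeps large families from dominating the result; whether the paper weights families in exactly this way is not stated in the caption.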
