Adv Bioinformatics. 2011;2011:743782.
doi: 10.1155/2011/743782. Epub 2011 Mar 29.

ModEnzA: Accurate Identification of Metabolic Enzymes Using Function Specific Profile HMMs with Optimised Discrimination Threshold and Modified Emission Probabilities

Dhwani K Desai et al. Adv Bioinformatics. 2011.

Abstract

Various enzyme identification protocols involving homology transfer by sequence-sequence or profile-sequence comparisons have been devised that use Swiss-Prot sequences associated with EC numbers as the training set. A profile HMM constructed for a particular EC number might select sequences that perform a different enzymatic function, because certain fold-specific residues are conserved in enzymes sharing a common fold. We describe a protocol, ModEnzA (HMM-ModE Enzyme Annotation), which generates profile HMMs that are highly specific at the functional level defined by EC numbers by incorporating information from negative training sequences. We enrich the training dataset by mining sequences from the NCBI Non-Redundant database for increased sensitivity. We compare our method with other enzyme identification methods, both for assigning EC numbers to a genome and for identifying protein sequences associated with an enzymatic activity. We report a sensitivity of 88% and a specificity of 95% in identifying EC numbers and annotating enzymatic sequences from the E. coli genome, higher than those of the other methods compared. With next-generation sequencing methods producing huge amounts of sequence data, the development and use of fully automated yet accurate protocols such as ModEnzA is warranted for rapid annotation of newly sequenced genomes and metagenomic sequences.
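
To make the terms "discrimination threshold", "sensitivity", and "specificity" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each profile HMM yields one score per sequence, that positive and negative training scores are available as plain lists, and that the threshold is chosen to maximise the Matthews correlation coefficient; the abstract does not state the optimisation criterion, so that choice is an assumption. Specificity here uses the standard TN / (TN + FP) definition, which may differ from the one used in the paper. All function names and scores are hypothetical.

```python
from math import sqrt


def confusion(pos_scores, neg_scores, threshold):
    """Count TP, FP, TN, FN when sequences scoring >= threshold are accepted."""
    tp = sum(s >= threshold for s in pos_scores)
    fn = len(pos_scores) - tp
    fp = sum(s >= threshold for s in neg_scores)
    tn = len(neg_scores) - fp
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of true members of the EC class that are recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of non-members that are correctly rejected (standard definition)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(pos_scores, neg_scores):
    """Assumed criterion: pick the observed score that maximises MCC."""
    candidates = sorted(set(pos_scores) | set(neg_scores))
    return max(candidates, key=lambda t: mcc(*confusion(pos_scores, neg_scores, t)))


if __name__ == "__main__":
    # Hypothetical scores for one EC-number profile (illustrative only).
    positives = [50.0, 42.0, 38.0, 12.0]  # sequences with the target EC number
    negatives = [30.0, 15.0, 9.0, 5.0]    # same-fold sequences with other functions
    t = best_threshold(positives, negatives)
    tp, fp, tn, fn = confusion(positives, negatives, t)
    print(f"threshold={t} sensitivity={sensitivity(tp, fn):.2f} "
          f"specificity={specificity(tn, fp):.2f}")
```

Raising the threshold rejects more same-fold negatives at the cost of missing low-scoring true members, which is the trade-off a per-profile threshold is meant to balance.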

Figures

Figure 1
Flow diagram of the ModEnzA protocol.
Figure 2
ROC curves for genome-wide enzyme identification using the ModEnzA profiles. (a) Classification of the complete genomes of the four organisms. (b) A fraction of the EC number profiles (284 out of 2075 Tier I ModEnzA profiles) were retrained with an older version of the ENZYME database and compared to PRIAM and MetaShark. ModEnzA-RT: retrained ModEnzA profiles.
