Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications
- PMID: 8524875
- PMCID: PMC40515
- DOI: 10.1073/pnas.92.25.11921
Abstract
A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins (86%) shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or archaeal proteins. For >90% of the E. coli proteins, functional information, sequence similarity, or both are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters: permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and the methods for screening them, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.
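The abstract states that paralog clusters were defined on the basis of pairwise similarity but does not spell out the procedure. As a rough illustration only, the sketch below assumes single-linkage grouping: any two proteins joined by a chain of significant pairwise hits land in the same cluster. The function name, the protein identifiers, and the list of similar pairs are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cluster_by_pairwise_similarity(proteins, similar_pairs):
    """Group proteins into single-linkage clusters (assumed method, not the paper's).

    proteins      -- iterable of protein identifiers
    similar_pairs -- iterable of (a, b) pairs judged significantly similar
    Returns a list of sets, one per cluster with at least two members.
    """
    proteins = list(proteins)
    # Union-find: every protein starts as its own cluster representative.
    parent = {p: p for p in proteins}

    def find(x):
        # Walk up to the representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)

    # Proteins with no similar partner are singletons, not paralog clusters.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical toy input: six proteins, three significant pairwise hits.
toy_proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
toy_pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
toy_clusters = cluster_by_pairwise_similarity(toy_proteins, toy_pairs)
clustered_fraction = sum(len(c) for c in toy_clusters) / len(toy_proteins)
```

In this toy input, P1 and P3 share a cluster through P2 even though they were never compared directly, which is the defining behaviour of single linkage. The fraction of proteins falling into clusters, computed on the last line, is the kind of figure the abstract reports as 46% for the real data set.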