Abstract
DNA sequencing efforts frequently uncover genes other than the targeted ones. We have used rapid database scanning methods to search for undescribed eubacterial and archaeal protein-coding frames in regions flanking known genes. By searching all prokaryotic DNA sequences not marked as coding for proteins or stable RNAs against the protein databases, we have identified more than 450 new examples of bacterial proteins, as well as a smaller number of possible revisions to known proteins, at a surprisingly high rate of one new protein or revision for every 24 initial DNA sequences or 8,300 nucleotides examined. Seven proteins are members of families that have not previously been described in prokaryotic sequences. We also describe 49 reinterpretations of existing sequence data of particular biological significance.
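The search strategy summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It only shows one plausible way to (1) find the stretches of a database entry that are not covered by any annotated protein or stable RNA feature and (2) translate such a stretch in all six reading frames to produce queries for a protein database search. The function names, the half-open coordinate convention, and the `min_len` cutoff are assumptions made for illustration. The standard genetic code is also assumed, which does not hold for every prokaryote.

```python
from itertools import product

# Standard genetic code (an assumption; some bacteria use variant codes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def unannotated_regions(seq_len, features, min_len=1):
    """Return half-open (start, end) gaps not covered by any annotated feature.

    `features` is a list of half-open (start, end) intervals, which may overlap.
    Gaps shorter than `min_len` (a hypothetical cutoff) are skipped.
    """
    gaps, cursor = [], 0
    for start, end in sorted(features):
        if start - cursor >= min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if seq_len - cursor >= min_len:
        gaps.append((cursor, seq_len))
    return gaps


def reverse_complement(dna):
    """Reverse-complement an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(dna):
    """Translate a DNA string in all six frames; unknown codons become 'X'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
            frames[f"{strand}{offset + 1}"] = "".join(
                CODON_TABLE.get(codon, "X") for codon in codons
            )
    return frames


if __name__ == "__main__":
    # Toy entry: 100 nt with two overlapping annotated features.
    print(unannotated_regions(100, [(10, 40), (30, 60)], min_len=5))
    print(six_frame_translations("ATGGCCTAA"))
```

In a real search, each translated frame of each unannotated region would be compared against a protein database with a similarity search tool, and only statistically significant matches would be kept for inspection.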
Robison, K., Gilbert, W. & Church, G. Large scale bacterial gene discovery by similarity search. Nat Genet 7, 205–214 (1994). https://doi.org/10.1038/ng0694-205
DOI: https://doi.org/10.1038/ng0694-205