Abstract
The exponential growth of sequence data does not necessarily lead to an increase in knowledge about the functions of genes and their products. Prediction of function using comparative sequence analysis is extremely powerful but, if not performed appropriately, may also lead to the creation and propagation of assignment errors. While current homology detection methods can cope with the data flow, the identification, verification and annotation of functional features need to be drastically improved.
Cite this article
Bork, P., Koonin, E. Predicting functions from protein sequences—where are the bottlenecks? Nat Genet 18, 313–318 (1998). https://doi.org/10.1038/ng0498-313