  • Progress
Predicting functions from protein sequences—where are the bottlenecks?

Abstract

The exponential growth of sequence data does not necessarily lead to an increase in knowledge about the functions of genes and their products. Prediction of function using comparative sequence analysis is extremely powerful but, if not performed appropriately, may also lead to the creation and propagation of assignment errors. While current homology detection methods can cope with the data flow, the identification, verification and annotation of functional features need to be drastically improved.

Author information

Corresponding author

Correspondence to Peer Bork.

About this article

Cite this article

Bork, P., Koonin, E. Predicting functions from protein sequences—where are the bottlenecks?. Nat Genet 18, 313–318 (1998). https://doi.org/10.1038/ng0498-313

  • DOI: https://doi.org/10.1038/ng0498-313
