Comparative Study
Science. 2000 Mar 24;287(5461):2204-15.
doi: 10.1126/science.287.5461.2204.

Comparative genomics of the eukaryotes

G M Rubin et al.

Abstract

A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae, and of the proteins they are predicted to encode, was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.

Figures

Fig. 1
Fly (F), worm (W), and yeast (Y) genes showing similarity to human disease genes. This collection of human disease genes was selected to represent a cross section of human pathophysiology and is not comprehensive. The selection criteria require that the gene is actually mutated, altered, amplified, or deleted in a human disease, as opposed to having a function deduced from experiments on model organisms or in cell culture. Due to redundancy in gene and protein sequence databases, a single reference sequence for each gene had to be chosen. Most reference sequences represent the longest mRNA of several alternatives in GenBank. Authoritative sources in the literature and electronic databases [Online Mendelian Inheritance in Man (OMIM)] were also consulted. In all, 289 protein sequences met these criteria. These were used as queries to search a database consisting of the sum total of gene products (38,860) found in the complete genomes of fly, worm, and yeast. 12,953 was used as the effective database size (the z parameter in BLAST). BLASTP searches were conducted as described for full genome searches, except for the z parameter. To control for potential frameshift errors in the Drosophila genome sequence, searches against a six-frame translation of the entire genome (using TBLASTN) were also conducted with the disease gene sequences using the z parameter above. Only two cases in which matches to genomic sequence were better than to the predicted protein were found, and these were manually corrected to reflect the better TBLASTN scores in the table. Results are scaled according to various levels of statistical significance, reflecting a level of confidence in either evolutionary homology or functional similarity. 
White boxes represent BLAST E values >1 × 10^-6, indicating no or weak similarity; light blue boxes represent E values in the range of 1 × 10^-6 to 1 × 10^-40; purple boxes represent E values in the range of 1 × 10^-40 to 1 × 10^-100; and dark blue boxes represent E values <1 × 10^-100, indicating the highest degree of sequence conservation. Actual E values can be found in the Web supplement to this figure (62), where links to OMIM and GenBank may also be found. A plus sign indicates our best estimate that the corresponding Drosophila gene product is the functional equivalent of the human protein, based on degree of sequence similarity, InterPro domain composition, and supporting biological evidence, when available. A minus sign indicates that we were unable to identify a likely functional equivalent of the human protein.
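
The color scale in this caption amounts to a simple binning of BLAST E values. A minimal sketch follows; the function name `conservation_bin` is hypothetical, and because the caption does not say which bin the exact boundary values fall into, the inclusive/exclusive choices below are assumptions.

```python
def conservation_bin(e_value: float) -> str:
    """Map a BLAST E value to the box color described in the Fig. 1 caption.

    Boundary handling is an assumption: the caption gives ranges but does
    not state which bin the exact threshold values belong to.
    """
    if e_value > 1e-6:
        return "white"        # no or weak similarity
    if e_value > 1e-40:
        return "light blue"   # assumed range: 1e-40 < E <= 1e-6
    if e_value >= 1e-100:
        return "purple"       # assumed range: 1e-100 <= E <= 1e-40
    return "dark blue"        # E < 1e-100, highest conservation


if __name__ == "__main__":
    for e in (1e-3, 1e-20, 1e-50, 1e-150):
        print(e, conservation_bin(e))
```

Smaller E values mean stronger matches, so the bins run from white (weakest) to dark blue (strongest).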

References

    1. Adams MD, et al. Science. 2000;287:2185.
    2. C. elegans Sequencing Consortium. Science. 1998;282:2012.
    3. Goffeau A, et al. Science. 1996;274:546.
    1. Fleischmann RD, et al. Science. 1995;269:496.
    1. C. elegans data were taken from A C. elegans Database (ACEDB) release WS8.

    1. Local gene duplications were determined by searching for N similar genes within 2N genes on each arm. For example, if three similar genes are found within a region containing six genes, this counts as one cluster of three genes. Genes were judged to be similar if a BLASTP High Scoring Pair (HSP) with a score of 200 or more existed between them. Histone gene clusters were not included. C. elegans data were taken from ACEDB release WS8, containing 18,424 genes.

    1. More information about GO is available at http://www.geneontology.org/. The Gene Ontology project provides terms for categorizing gene products on the basis of their molecular function, biological role, and cellular location using controlled vocabularies.
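
The local gene duplication rule in the note above (N similar genes within 2N genes on an arm) can be sketched as a sliding-window scan. This is one possible reading of that rule, not the authors' implementation: the function name `local_clusters` is hypothetical, and both the seed-based definition of a cluster and the input format are assumptions. Similarity is supplied as precomputed gene pairs, for example pairs linked by a BLASTP HSP scoring 200 or more.

```python
def local_clusters(gene_order, similar_pairs, n):
    """Find clusters of at least n similar genes within 2n consecutive genes.

    gene_order    -- gene identifiers in chromosomal order along one arm
                     (assumed unique)
    similar_pairs -- set of frozenset({a, b}) for gene pairs judged similar
    n             -- cluster size to look for

    Assumption: a cluster is a seed gene plus the genes similar to it in
    the window of 2n genes that starts at the seed.
    """
    clusters = []
    for i, seed in enumerate(gene_order):
        window = gene_order[i:i + 2 * n]
        members = [g for g in window
                   if g == seed or frozenset((seed, g)) in similar_pairs]
        if len(members) >= n:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Three similar genes (g1, g3, g5) in a region of six genes,
    # matching the example given in the note.
    order = ["g1", "g2", "g3", "g4", "g5", "g6"]
    pairs = {frozenset(p) for p in [("g1", "g3"), ("g1", "g5"), ("g3", "g5")]}
    print(local_clusters(order, pairs, 3))
```

A stricter reading would require every pair within a cluster to be similar, or would merge overlapping windows; the note does not specify either, so this sketch does neither.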
