BMC Bioinformatics. 2006 Feb 25;7:94.
doi: 10.1186/1471-2105-7-94.

GONOME: measuring correlations between GO terms and genomic positions

Stefan M Stanley et al. BMC Bioinformatics. 2006.

Abstract

Background: Current methods for finding significantly under- and over-represented gene ontology (GO) terms in a set of genes treat the genes as equally probable "balls in a bag", which may be appropriate for transcripts in microarray data. However, because genes and intergenic regions vary in length, that approach is inappropriate for deciding whether any GO terms are correlated with a set of genomic positions.
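
The "balls in a bag" model described above is usually evaluated with a hypergeometric test. The following is a minimal sketch of that conventional test, not of GONOME itself; the function name and the gene counts in the usage lines are illustrative assumptions.

```python
from math import comb


def hypergeom_upper_tail(n_genes, n_with_term, n_drawn, n_observed):
    """P(X >= n_observed) when n_drawn genes are drawn without replacement
    from n_genes equally probable genes, n_with_term of which carry the term.

    Every gene counts as one "ball", so gene length plays no role here.
    """
    total = comb(n_genes, n_drawn)
    upper = min(n_with_term, n_drawn)
    favourable = sum(
        comb(n_with_term, i) * comb(n_genes - n_with_term, n_drawn - i)
        for i in range(n_observed, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 20,000 genes, 400 annotated with a term,
# 100 genes in the study set, 8 of them carrying the term.
p_value = hypergeom_upper_tail(20_000, 400, 100, 8)
print(f"over-representation p-value: {p_value:.3g}")
```

Because each gene is drawn with the same probability, a gene spanning a megabase and one spanning a kilobase contribute equally, which is the assumption the abstract argues against for genomic positions.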

Results: We present an algorithm, GONOME, that can determine which GO terms are significantly associated with a set of genomic positions, given a genome annotated with (at least) the starts and ends of genes. We show that certain GO terms may appear to be significantly associated with a set of randomly chosen positions in the human genome if gene lengths are not considered, and that these same terms have been reported as significantly over-represented in a number of recent papers. This apparent over-representation disappears when gene lengths are considered, as GONOME does. For example, we show that, when gene length is taken into account, the term "development" is not significantly enriched in genes associated with human CpG islands, in contradiction to a previous report. We further demonstrate the efficacy of GONOME by showing that occurrences of the proteasome-associated control element (PACE) upstream activating sequence in the S. cerevisiae genome are significantly associated with appropriate GO terms. An extension of this approach yields a whole-genome motif discovery algorithm that allows identification of many other promoter sequences linked to different types of genes, including a large group of previously unknown motifs significantly associated with the terms 'translation' and 'translational elongation'.
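
The PACE analysis starts from a set of genomic positions at which the motif occurs. As a rough sketch of how such a position set might be collected, the snippet below scans a sequence for the PACE sequence 5'-GGTGGCAAA-3' given in the Figure 3 legend. Searching both strands, the 0-based coordinates, and the function names are assumptions made for illustration; the abstract does not say how the authors located the occurrences.

```python
PACE = "GGTGGCAAA"  # PACE upstream activating sequence (Figure 3 legend)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(sequence, motif=PACE):
    """Return sorted (position, strand) pairs for every motif occurrence.

    Positions are 0-based forward-strand coordinates of the leftmost base.
    Overlapping occurrences are reported. A motif equal to its own reverse
    complement would be reported on both strands; PACE is not such a motif.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(pattern, start + 1)
    return sorted(hits)


# Toy sequence with one forward-strand and one reverse-strand occurrence.
toy = "AAGGTGGCAAATTCCTTTGCCACCGG"
print(find_motif(toy))
```

The resulting positions would then serve as the input set whose association with GO terms is tested.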

Conclusion: GONOME is an algorithm that correctly extracts over-represented GO terms from a set of genomic positions. By explicitly considering gene size, GONOME avoids a systematic bias toward GO terms linked to large genes. Inappropriate use of existing algorithms that do not take gene size into account has led to erroneous or suspect conclusions. Conversely, GONOME may be used to identify new features in genomes that are significantly associated with particular categories of genes.
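
The bias toward large genes can be illustrated with a small simulation: positions drawn uniformly from a genome fall inside long genes far more often than inside short ones, so any term attached to long genes collects more hits by chance. This is a toy sketch with made-up gene names and coordinates, not a reproduction of the paper's experiment on 30,000 human positions.

```python
import random


def simulate_hits(gene_spans, genome_length, n_positions, seed=0):
    """Count how many uniformly random positions fall inside each gene.

    gene_spans maps a gene name to a half-open (start, end) interval.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in gene_spans}
    for _ in range(n_positions):
        x = rng.randrange(genome_length)
        for name, (start, end) in gene_spans.items():
            if start <= x < end:
                hits[name] += 1
    return hits


# Hypothetical toy genome: one 1 kb gene and one 50 kb gene in 100 kb.
toy_genes = {"short_gene": (1_000, 2_000), "long_gene": (10_000, 60_000)}
hits = simulate_hits(toy_genes, genome_length=100_000, n_positions=10_000)
for name, count in hits.items():
    print(name, count)
```

A test that treats both genes as equally likely to be hit would read the excess of hits in the long gene as enrichment, even though the positions are random.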

Figures

Figure 1
GONOME and GOstat output on random positions. Panel A compares GONOME and GOstat analyses of 30,000 randomly selected positions in the human genome. It shows the E-values of the top 10 over-represented GO terms as found by GOstat (red) and the values GONOME derives for the same terms (blue). Panel B shows the top 10 over-represented terms according to GONOME. E-values were calculated as described in Methods.
Figure 2
Over-represented gene ontology terms associated with human CpG islands, as determined by GONOME when (A) unscored regions are included in the analysis and (B) unscored regions are excluded from the analysis. In each case, the E-values of the 25 most over-represented GO processes associated with CpG islands in the human genome are shown. The image is the actual output of the GONOME application, except that long GO terms have been replaced with shorter equivalents and their GO identification numbers are provided in brackets.
Figure 3
GONOME analysis of PACE elements. Over-represented GO terms associated with positions in the S. cerevisiae genome of the proteasome-associated control element (PACE) upstream activating sequence (UAS), 5'-GGTGGCAAA-3'. "Locus" refers in general to a gene and its associated upstream and downstream regions, which are "Hit" once when a position falls within any associated region.
Figure 4
GONOME scoring function: S(X, T). Genomic positions in the input set X are shown with arrows. Regions belonging to genes annotated with GO term T are shaded red. Regions belonging to other genes are shaded green. Transcribed regions are shown in solid color. Upstream (downstream) regions are shown with horizontal (vertical) crosshatching. Unscored regions are shown as horizontal black lines. Positions in green and unscored regions receive an association score of zero. Other positions receive association scores equal to the appropriate region weight. Panel A illustrates the simplest case, where upstream and downstream regions of adjacent genes do not overlap. In panel B, the position x2 lies in the "overlap" region of two "red" genes, so its score is the sum of the upstream and downstream weights.
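
The scoring rules in the Figure 4 legend can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the weight values, the flank size, the `Gene` type, and the choice to sum per-position scores into S(X, T) are all hypothetical, since the legend only says that each position receives the weight of the region it falls in and that weights add in overlaps.

```python
from dataclasses import dataclass

# Hypothetical region weights and flank size; the legend gives no values.
WEIGHTS = {"upstream": 0.5, "transcribed": 1.0, "downstream": 0.25}
FLANK = 1000


@dataclass(frozen=True)
class Gene:
    start: int          # transcribed region is the half-open [start, end)
    end: int
    strand: str         # "+" or "-"
    terms: frozenset    # GO terms annotated to this gene


def regions(gene):
    """Yield (kind, lo, hi) half-open intervals for one gene."""
    left = (max(0, gene.start - FLANK), gene.start)
    right = (gene.end, gene.end + FLANK)
    up, down = (left, right) if gene.strand == "+" else (right, left)
    yield ("upstream", *up)
    yield ("transcribed", gene.start, gene.end)
    yield ("downstream", *down)


def position_score(x, genes, term):
    """Sum the weights of all regions of term-annotated genes containing x.

    Positions in other genes or in unscored regions score zero.
    Simplification: every matching region is added, whatever its kind.
    """
    score = 0.0
    for gene in genes:
        if term not in gene.terms:
            continue
        for kind, lo, hi in regions(gene):
            if lo <= x < hi:
                score += WEIGHTS[kind]
    return score


def association_score(positions, genes, term):
    """Assumed form of S(X, T): the sum of per-position scores."""
    return sum(position_score(x, genes, term) for x in positions)


genes = [
    Gene(1000, 2000, "+", frozenset({"T"})),        # "red" gene
    Gene(3500, 4500, "+", frozenset({"T"})),        # "red" gene
    Gene(10000, 11000, "+", frozenset({"other"})),  # "green" gene
]
# 1500: transcribed region of a red gene; 2700: downstream of the first
# red gene and upstream of the second; 10500: green gene; 50000: unscored.
print(association_score([1500, 2700, 10500, 50000], genes, "T"))
```

In the toy layout, position 2700 plays the role of x2 in panel B: it lies in the overlap of one gene's downstream region and the next gene's upstream region, so it receives both weights.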
