GONOME: measuring correlations between GO terms and genomic positions
- PMID: 16504139
- PMCID: PMC1413564
- DOI: 10.1186/1471-2105-7-94
Abstract
Background: Current methods for finding significantly under- and over-represented Gene Ontology (GO) terms in a set of genes treat the genes as equally probable "balls in a bag", as may be appropriate for transcripts in microarray data. However, because genes and intergenic regions vary in length, that approach is inappropriate for deciding whether any GO terms are correlated with a set of genomic positions.
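The length bias described above can be made concrete with a toy calculation. The Python sketch below compares the "balls in a bag" expectation (the fraction of genes carrying a term) with a length-aware one (the fraction of genomic bases covered by those genes, i.e. the chance that a uniformly random position falls inside one of them). The genes, coordinates, and term names are hypothetical, and any flanking or intergenic regions that a real tool might assign to genes are ignored.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy genome: one chromosome, genes as 0-based half-open intervals.
GENOME_LENGTH = 100_000
GENES: Dict[str, Tuple[int, int]] = {
    "geneA": (0, 50_000),       # one long gene
    "geneB": (60_000, 62_000),  # three short genes
    "geneC": (70_000, 71_000),
    "geneD": (80_000, 81_000),
}
# Hypothetical GO-style annotation: term -> genes carrying it.
TERMS: Dict[str, Set[str]] = {
    "long_gene_term": {"geneA"},
    "short_gene_term": {"geneB", "geneC"},
}


def covered_bases(intervals: Iterable[Tuple[int, int]]) -> int:
    """Number of bases in the union of half-open intervals (overlaps counted once)."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # merge overlapping or touching intervals
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def naive_fraction(term: str) -> float:
    """'Balls in a bag': every gene is equally likely to be hit."""
    return len(TERMS[term]) / len(GENES)


def length_aware_fraction(term: str) -> float:
    """Probability that a uniformly random base lies in a gene carrying the term."""
    return covered_bases(GENES[g] for g in TERMS[term]) / GENOME_LENGTH


for term in TERMS:
    print(term, naive_fraction(term), length_aware_fraction(term))
# long_gene_term 0.25 0.5
# short_gene_term 0.5 0.03
```

With these toy numbers, a random position lands in the long gene half the time, twice the naive expectation, while the two short genes together receive 3% of positions rather than 50%.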
Results: We present an algorithm, GONOME, that can determine which GO terms are significantly associated with a set of genomic positions, given a genome annotated with (at least) the starts and ends of genes. We show that certain GO terms may appear to be significantly associated with a set of randomly chosen positions in the human genome if gene lengths are not considered, and that these same terms have been reported as significantly over-represented in a number of recent papers. This apparent over-representation disappears when gene lengths are considered, as GONOME does. For example, we show that, when gene length is taken into account, the term "development" is not significantly enriched in genes associated with human CpG islands, contrary to a previous report. We further demonstrate the efficacy of GONOME by showing that occurrences of the proteasome-associated control element (PACE) upstream activating sequence in the S. cerevisiae genome are significantly associated with appropriate GO terms. An extension of this approach yields a whole-genome motif discovery algorithm that identifies many other promoter sequences linked to different types of genes, including a large group of previously unknown motifs significantly associated with the terms 'translation' and 'translational elongation'.
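Once a length-aware probability p is known for a term, one simple way to score n positions, k of which fall in genes carrying that term, is a one-sided binomial test. This is a swapped-in illustration of the idea, not a statement of GONOME's own statistic, and the numbers are hypothetical: a term whose genes make up 25% of all genes but cover 50% of the genome, tested against 100 uniformly random positions.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 100 random positions, 50 of which land in genes with the term.
n_positions = 100
hits = 50
naive_p = 0.25         # share of genes annotated with the term (ignores length)
length_aware_p = 0.50  # share of genomic bases covered by those genes

p_naive = binomial_upper_tail(hits, n_positions, naive_p)
p_length = binomial_upper_tail(hits, n_positions, length_aware_p)
print(f"naive p-value:        {p_naive:.2e}")   # tiny: spurious "enrichment"
print(f"length-aware p-value: {p_length:.3f}")  # about 0.54: no enrichment
```

Under the naive expectation the 50 hits look like strong enrichment; under the length-aware expectation they are exactly what random positions would produce, which is the kind of artefact the Results describe for randomly chosen positions in the human genome.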
Conclusion: GONOME is an algorithm that correctly extracts over-represented GO terms from a set of genomic positions. By explicitly considering gene size, GONOME avoids a systematic bias toward GO terms linked to large genes. Inappropriate use of existing algorithms that do not take gene size into account has led to erroneous or suspect conclusions. Reciprocally, GONOME may be used to identify new features in genomes that are significantly associated with particular categories of genes.
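Using GONOME to look for new genomic features, as in the PACE and motif-discovery analyses, starts from a set of positions such as the occurrences of a sequence motif. A minimal sketch of that first step, scanning a sequence for exact matches of a motif on both strands, is below. The sequence and motif are hypothetical placeholders (not the PACE consensus), and real motif searches usually allow degenerate bases or mismatches.

```python
from typing import List

# Maps each base to its Watson-Crick complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_positions(genome: str, motif: str) -> List[int]:
    """0-based start positions of exact motif matches on either strand (overlaps allowed)."""
    genome = genome.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = set()  # a set, so palindromic motifs are not reported twice
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif or window == rc:
            hits.add(i)
    return sorted(hits)


# Hypothetical toy sequence: "GGTC" at 2 on the forward strand, its reverse complement "GACC" at 10.
print(motif_positions("AAGGTCCAAGGACC", "GGTC"))  # [2, 10]
```

The resulting start positions are the kind of input that a length-aware GO enrichment test takes in place of a list of genes.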
Similar articles
- Enrichment of transcriptional regulatory sites in non-coding genomic region. Bioinformatics. 2004 Mar 1;20(4):569-75. doi: 10.1093/bioinformatics/btg450. Epub 2004 Jan 22. PMID: 14990453.
- GeneTools--application for functional annotation and statistical hypothesis testing. BMC Bioinformatics. 2006 Oct 24;7:470. doi: 10.1186/1471-2105-7-470. PMID: 17062145. Free PMC article.
- Identification of putative regulatory upstream ORFs in the yeast genome using heuristics and evolutionary conservation. BMC Bioinformatics. 2007 Aug 8;8:295. doi: 10.1186/1471-2105-8-295. PMID: 17686169. Free PMC article.
- PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol. 2005 Dec;1(7):e67. doi: 10.1371/journal.pcbi.0010067. Epub 2005 Dec 9. PMID: 16477324. Free PMC article.
- Advances in the Exon-Intron Database (EID). Brief Bioinform. 2006 Jun;7(2):178-85. doi: 10.1093/bib/bbl003. Epub 2006 Mar 9. PMID: 16772261. Review.
Cited by
- Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res. 2011 Nov;21(11):1916-28. doi: 10.1101/gr.108753.110. Epub 2011 Oct 12. PMID: 21994248. Free PMC article.
- A computational pipeline for comparative ChIP-seq analyses. Nat Protoc. 2011 Dec 15;7(1):45-61. doi: 10.1038/nprot.2011.420. PMID: 22179591.
- Limited evidence for classic selective sweeps in African populations. Genetics. 2012 Nov;192(3):1049-64. doi: 10.1534/genetics.112.144071. Epub 2012 Sep 7. PMID: 22960214. Free PMC article.
- The Annotation, Mapping, Expression and Network (AMEN) suite of tools for molecular systems biology. BMC Bioinformatics. 2008 Feb 6;9:86. doi: 10.1186/1471-2105-9-86. PMID: 18254954. Free PMC article.
- Genome-wide screens for in vivo Tinman binding sites identify cardiac enhancers with diverse functional architectures. PLoS Genet. 2013;9(1):e1003195. doi: 10.1371/journal.pgen.1003195. Epub 2013 Jan 10. PMID: 23326246. Free PMC article.