2005 Jul 25:6:189.
doi: 10.1186/1471-2105-6-189.

GObar: a gene ontology based analysis and visualization tool for gene sets

Jason S M Lee et al. BMC Bioinformatics.

Abstract

Background: Microarray experiments, as well as other genomic analyses, often result in large gene sets containing up to several hundred genes. The biological significance of such gene sets is usually not readily apparent. Identifying the functions of the genes in the set can help highlight features of interest. The Gene Ontology Consortium [1] has annotated genes in several model organisms using a controlled vocabulary of terms and organized the terms into the Gene Ontology (GO), which comprises three disjoint hierarchies for molecular functions, biological processes and cellular locations. The annotations can be used to identify functions that are enriched in the set, but this analysis can be misleading because genes are not uniformly distributed among functions. For example, a large number of genes in a set might be kinases simply because the genome contains many kinases.

Results: We use the Gene Ontology hierarchy and the annotations to pick significant functions and pathways by comparing the distribution of functions in a given gene list against the distribution of all the genes in the genome, using the hypergeometric distribution to assign probabilities. GObar is a web-based visualizer that implements this algorithm. The public website for GObar [2] can analyse gene lists from the yeast (S. cerevisiae), fly (D. melanogaster), mouse (M. musculus) and human (H. sapiens) genomes. It also allows visualization of the GO tree, as well as placement of a single gene on the GO hierarchy. We analyse a gene list from a genomic study of pre-mRNA splicing to demonstrate the utility of GObar.

Conclusion: GObar is freely available as a web-based tool at http://katahdin.cshl.org:9331/GO [2] and can help analyze and visualize gene lists from genomic analyses.
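The enrichment test described in the Results can be illustrated with a short sketch. This is not GObar's own code. It is a minimal Python example, assuming the probability of interest is the upper tail P(X >= k) of the hypergeometric distribution. The function name and every count in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(genome_size, genome_hits, list_size, list_hits):
    """Upper-tail hypergeometric probability P(X >= list_hits).

    genome_size: number of annotated genes in the genome
    genome_hits: genes in the genome annotated with the GO term
    list_size:   number of genes in the uploaded list
    list_hits:   genes in the list annotated with the GO term
    """
    total = comb(genome_size, list_size)
    upper = min(genome_hits, list_size)
    tail = sum(
        comb(genome_hits, i) * comb(genome_size - genome_hits, list_size - i)
        for i in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers for the kinase example: 120 kinases in a genome of
# 6000 genes, and 10 kinases in a list of 200 genes.
p_kinase = hypergeom_enrichment_pvalue(6000, 120, 200, 10)
```

A small p-value suggests that the term occurs in the list more often than the genome-wide background would predict, which is the correction for non-uniform function frequencies that the Background motivates.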

Figures

Figure 1
A small section of the GO tree. A schematic of a small section of the molecular function branch of the GO tree around the nucleic-acid binding term. The number of D. melanogaster genes at each node is also given, as are the GO ids and the definitions of the terms at each node. The GOTermBrowser link at the GObar website [2] allows searching for GO terms using keywords and regular expressions (such as *NA*binding) and can also draw relationship diagrams as interactive images.
Figure 2
Placement of D. melanogaster Dicer-1 on the GO tree. This is generated by entering FBgn0039016 (Dicer-1 in D. melanogaster) on the GObar website and turning off pruning of the tree in step 4. The leaves (nodes with no children, here shown as green ovals) are the terms associated with Dicer-1. In this case they correspond to amino-acid binding (GO:0005515), bidentate ribonuclease III activity (GO:0003725) and double-stranded RNA binding (GO:0016443). In GObar, green ovals signify nodes that contain genes from the uploaded list, while red nodes do not contain any genes from the uploaded list. The numbers on the path, which signify deviation from the expected values, are used for pruning and highlighting highly interesting nodes, but are not important when pruning has been turned off.
Figure 5
Result of a GObar analysis of human genes with AT-AC-U12 type splice sites. The result of a GObar analysis is an SVG (scalable vector graphics) image, with a red path signifying branches that are disproportionately over-represented in the gene list, as compared to the distribution of all the genes from the organism. Placing the mouse over a GO term pops up a window in the figure, with information on the GO term and links to download data. The tool also allows searching for terms, as well as zooming in and out of the image. Table 2, which is a section of a table that appears in a pop-up window at the website at the end of the calculation, shows GO terms that are significantly enriched in the dataset. The numbers on each path depict the deviation over the expected count; this calculation is described in the text.
Figure 3
The front page of the GObar website. Selections are made for each step, and the list of genes is entered in the final step before launching the program. The pruning of the tree is controlled in step 4. A node can only be pruned if every node under it also satisfies the pruning condition. The pruning options are explained in the subsection "Pruning the tree". Very strict pruning might cause useful results to be thrown away, but can also highlight the best information in the dataset. In contrast, using a low stringency at this step, or no pruning, can cause too much information to be presented. Step 5 allows the nodes to be annotated with either GO ids (preferred for large trees) or definitions of GO terms. The GOTermBrowser link at the top of the page allows searching for GO ids using keywords or regular expressions such as "*NA*binding" to explore the GO tree neighborhood of the search term.
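The pruning rule in this caption (a node can only be pruned if every node under it also satisfies the pruning condition) can be sketched as a recursive check. The condition itself is not specified here, so the example assumes a hypothetical one: a node qualifies when its deviation score falls below a threshold. The toy tree, scores, threshold and function names are all illustrative, not taken from GObar.

```python
# Hypothetical toy tree: term -> list of child terms.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": [],
    "A2": [],
    "B1": [],
}

# Hypothetical deviation scores and stringency threshold.
DEVIATION = {"root": 0.0, "A": 0.2, "A1": 0.1, "A2": 3.5, "B": 0.3, "B1": 0.4}
THRESHOLD = 1.0


def meets_condition(term):
    """Assumed pruning condition: the node's deviation is below the threshold."""
    return DEVIATION[term] < THRESHOLD


def can_prune(term):
    """A node is prunable only if it and every node under it meet the condition."""
    return meets_condition(term) and all(can_prune(c) for c in CHILDREN[term])


# Terms that survive pruning in the toy tree.
kept = [term for term in CHILDREN if not can_prune(term)]
```

In this toy tree the high-scoring leaf A2 protects its ancestors A and root from pruning, while the whole B branch is removed.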
Figure 4
GO tree depth calculation. A section of the GO tree is depicted here. The directed acyclic nature of GO is shown by the red and black trails leading to the same node. The depth of a node is its distance from the root. Thus the node for GO:0003700 at the bottom has different depths on the tree, depending on the path traversed to get to it from the root. We use the higher number (6, the greater depth) as its depth for our calculations, which are described in the text.
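The depth rule in this caption amounts to taking the longest path from the root in a directed acyclic graph. The sketch below assumes a hypothetical toy graph stored as a child-to-parents mapping and assumes the root has depth 0; the caption does not state the root's depth, so the absolute numbers may be offset from those in the figure.

```python
from functools import lru_cache

# Hypothetical toy DAG: term -> list of parent terms.
# "D" is reachable by two paths of different length.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["root"],
    "D": ["B", "C"],
}


@lru_cache(maxsize=None)
def depth(term):
    """Greatest distance from the root over all paths (root assumed to be 0)."""
    parents = PARENTS[term]
    if not parents:
        return 0
    return 1 + max(depth(p) for p in parents)
```

Here `depth("D")` follows the longer trail through A and B rather than the shorter one through C, mirroring the choice of the greater depth for GO:0003700.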
Figure 6
Calculation of the bare and distributed counts of genes at each GO term. The nodes in the figure are GO terms, and the arrows are directed from parent to child nodes. The bare count (BC) at each node is the number of genes placed there by the annotations of the gene list. The distributed count (DC) is the count transmitted up from the children of the node. Each node passes its total count (bare count + distributed count) up to its parents in equal shares. Thus half of the total count of Node 3 is contributed to the distributed count of each of its parents, Node 0 and Node 4.
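The counting scheme in this caption can be written as a short recursion. The sketch below is an illustration under assumptions: the three-node graph only mimics the relationship stated for Node 3 (parents Node 0 and Node 4), the link from Node 4 to Node 0 and the bare counts are hypothetical, and the function names are not from GObar.

```python
from functools import lru_cache

# Hypothetical toy DAG: node -> list of parents. Node 3 has two parents.
PARENTS = {0: [], 4: [0], 3: [0, 4]}

# Hypothetical bare counts (genes annotated directly to each node).
BARE = {0: 1, 4: 2, 3: 6}

# Invert the mapping so each node knows its children.
CHILDREN = {node: [] for node in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


@lru_cache(maxsize=None)
def total_count(node):
    """Total count = bare count + distributed count."""
    return BARE[node] + distributed_count(node)


def distributed_count(node):
    """Sum of each child's total count, split equally among that child's parents."""
    return sum(total_count(c) / len(PARENTS[c]) for c in CHILDREN[node])
```

Because every node splits its total equally among its parents, the total count at the root of this toy graph equals the sum of all bare counts, so no gene is counted twice even when a term has several parents.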

References

    1. Gene Ontology Consortium website. http://www.geneontology.org
    2. GObar website. http://katahdin.cshl.org:9331/GO
    3. Smith B, Williams J, Schulze-Kremer S. The Ontology of Gene Ontology. Proceedings of AMIA Symposium. 2003. http://ontology.buffalo.edu/medo/Gene_Ontology.pdf
    4. Entrez Gene: unified query environment for genes. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
    5. FlyBase: A database of the Drosophila genome. http://www.flybase.org/