BMC Bioinformatics. 2020 Apr 10;21(1):139.
doi: 10.1186/s12859-020-3447-4.

GOMCL: a toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions


Guannan Wang et al. BMC Bioinformatics. 2020.

Abstract

Background: Functional enrichment of genes and pathways based on Gene Ontology (GO) has been widely used to describe the results of various -omics analyses. GO terms that are statistically overrepresented within a large set of genes are typically used to describe the main functional attributes of that gene set. However, these lists of overrepresented GO terms are often too long and contain redundant, overlapping GO terms, which hinders informative functional interpretation.
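For readers unfamiliar with how "statistically overrepresented" is usually decided, the sketch below computes a one-sided hypergeometric p-value for a single GO term. It is a generic illustration, not part of GOMCL, which works on the results of an existing enrichment test. The names x (genes in the test set) and n (total genes in the reference annotation) follow the Fig. 2 caption; k and K are hypothetical labels for the number of genes annotated to the term in the test set and in the reference annotation, respectively.

```python
from math import comb


def hypergeom_enrichment_p(k: int, x: int, K: int, n: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for one GO term.

    k: test-set genes annotated to the term (hypothetical name)
    x: genes in the test set
    K: reference genes annotated to the term (hypothetical name)
    n: total genes in the reference annotation
    """
    if not (0 <= k <= min(x, K) and x <= n and K <= n):
        raise ValueError("inconsistent gene counts")
    # Sum P(X = i) = C(K, i) * C(n - K, x - i) / C(n, x) over the upper tail.
    tail = sum(comb(K, i) * comb(n - K, x - i) for i in range(k, min(x, K) + 1))
    return tail / comb(n, x)
```

For example, with n = 10 reference genes, K = 3 of them annotated to a term, and a test set of x = 3 genes that are all annotated (k = 3), the p-value is 1 / C(10, 3) = 1/120, roughly 0.0083. Exact integer arithmetic with `math.comb` keeps the sketch dependency-free; for genome-scale counts a library routine such as SciPy's hypergeometric survival function would be the more practical choice.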

Results: We developed GOMCL to reduce redundancy and to summarize lists of GO terms effectively and informatively. This lightweight Python toolkit efficiently identifies clusters within a list of GO terms using the Markov Clustering (MCL) algorithm, based on the overlap of gene members between GO terms. GOMCL facilitates the biological interpretation of a large number of GO terms by condensing them into GO clusters that represent non-overlapping functional themes. It can visualize GO clusters as a heatmap, as networks based on either the overlap of members or the hierarchy among GO terms, and as tables with depth and cluster information for each GO term. Each GO cluster generated by GOMCL can be evaluated and further divided into non-overlapping sub-clusters using the GOMCL-sub module. The outputs of both GOMCL and GOMCL-sub can be imported into Cytoscape for additional visualization.
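The clustering step described above can be sketched as follows, under three stated assumptions: (i) the similarity index between two GO terms is the overlap coefficient |A ∩ B| / min(|A|, |B|) of their gene sets; (ii) pairs below a similarity cutoff (0.5 here, the value given in the Fig. 2 caption) are left unconnected; and (iii) the "cluster granularity" in the Fig. 2 caption plays the role of the MCL inflation parameter. This is a plain NumPy rendering of the textbook MCL expansion/inflation loop, not GOMCL's own code, and all function names are hypothetical.

```python
import numpy as np


def overlap_coefficient(a: set, b: set) -> float:
    """Assumed similarity index: |A & B| / min(|A|, |B|); 0.0 if a set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def similarity_graph(term_genes: dict, cutoff: float = 0.5):
    """Weighted adjacency matrix over GO terms; weights below `cutoff` become 0."""
    terms = sorted(term_genes)
    sim = np.zeros((len(terms), len(terms)))
    for i, s in enumerate(terms):
        for j, t in enumerate(terms):
            w = overlap_coefficient(term_genes[s], term_genes[t])
            sim[i, j] = w if w >= cutoff else 0.0
    return terms, sim


def mcl(adj: np.ndarray, inflation: float = 1.5, max_iter: int = 100, tol: float = 1e-6):
    """Textbook Markov Clustering; returns clusters as frozensets of indices."""
    m = adj.astype(float)                               # astype returns a copy
    np.fill_diagonal(m, np.maximum(m.diagonal(), 1.0))  # ensure self-loops
    m /= m.sum(axis=0, keepdims=True)                   # make columns stochastic
    for _ in range(max_iter):
        prev = m
        m = m @ m                                       # expansion: two-step random walk
        m = m ** inflation                              # inflation: favour strong flows
        m /= m.sum(axis=0, keepdims=True)               # renormalise columns
        if np.allclose(m, prev, atol=tol):
            break
    clusters = []
    for row in m:                                       # non-empty attractor rows
        members = frozenset(np.flatnonzero(row > tol).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters


def cluster_go_terms(term_genes: dict, cutoff: float = 0.5, inflation: float = 1.5):
    """Map MCL index clusters back to sorted lists of GO term IDs."""
    terms, sim = similarity_graph(term_genes, cutoff)
    return [sorted(terms[i] for i in c) for c in mcl(sim, inflation)]


if __name__ == "__main__":
    toy = {  # hypothetical GO term IDs and gene sets
        "GO:A": {"g1", "g2", "g3"},
        "GO:B": {"g1", "g2", "g3", "g4"},
        "GO:C": {"g2", "g3"},
        "GO:D": {"g7", "g8", "g9"},
        "GO:E": {"g8", "g9"},
        "GO:F": {"g1", "g7", "x1", "x2"},
    }
    print(cluster_go_terms(toy))  # [['GO:A', 'GO:B', 'GO:C'], ['GO:D', 'GO:E'], ['GO:F']]
```

In the toy input, GO:F shares at most a third of its genes with any other term, so every one of its edges falls below the 0.5 cutoff and it ends up as a singleton cluster. Applying the cutoff before clustering keeps weak, incidental gene overlaps from merging unrelated functional themes.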

Conclusions: GOMCL is a convenient toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions. GOMCL helps researchers reduce the time spent on manual curation of large lists of GO terms, minimize biases introduced by redundant GO terms in data interpretation, and batch-process multiple GO enrichment datasets. A user guide, a test dataset, and the source code of GOMCL are available at https://github.com/Guannan-Wang/GOMCL and www.lsugenomics.org.

Keywords: Functional genomics; Functional networks; GO annotations; GO similarity; Gene ontology clustering; High throughput omics; Markov clustering.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The workflow of GOMCL clustering on GO enrichment test results
Fig. 2
Representative outputs created with GOMCL for clustering the enriched GO terms that distinguish two cell populations in a selected study [22]. An overlap coefficient of 0.5 and a cluster granularity of 1.5 were used in GOMCL for cluster identification. a Similarity heatmap. b Network of identified GOMCL clusters. Node size represents the number of genes in the test set that are annotated to that GO term; edges represent the similarity index between GO terms; each cluster is coded with a different color; and the shade of each node represents the p-value assigned by the enrichment test. Lighter to darker shades indicate larger to smaller p-values, respectively. c A tabular summary of all GOMCL clusters. x: the number of genes in the test set; n: the total number of genes in the reference annotation.
Fig. 3
Cumulative distribution of similarity indexes between GO terms within each GOMCL cluster identified from the test data reported in Wendrich et al. 2017. P(≥ 0.5) indicates the proportion of similarity indexes greater than or equal to 0.5 among all similarity indexes within a given cluster.
Fig. 4
GO hierarchical structure produced using GOMCL for cluster C1 described in Fig. 1. Edges represent the parent/child relationships of the GO terms. The black edges connect parent and child terms that are directly linked, while the grey edges indicate connections with intermediate GO terms between the parent and child terms. Node size represents the number of genes in the test set that are annotated to that GO term, and the shade of each node represents the p-value assigned by the enrichment test. Lighter to darker shades indicate larger to smaller p-values, respectively. The main hierarchical branches are marked by red circles.
Fig. 5
Sub-clustering results produced by GOMCL-sub on cluster C1 described in Fig. 1. a Similarity heatmap of sub-groups identified by GOMCL-sub. b and c GO hierarchical structures of the C1–1 and C1–2 sub-clusters. The black edges connect parent and child terms that are directly linked, while the grey edges indicate connections with intermediate GO terms between the parent and child terms. Node size represents the number of genes in the test set that are annotated to that GO term, and the shade of each node represents the p-value assigned by the enrichment test. Lighter to darker shades indicate larger to smaller p-values, respectively. The main hierarchical branches are marked by red circles.
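The within-cluster evaluation summarized in Fig. 3 can be approximated with the short sketch below. It collects the pairwise similarity indexes among the GO terms of one cluster (the sorted values are the x-values of the empirical cumulative distribution) and reports the fraction that reach 0.5, i.e. P(≥ 0.5). As in the clustering sketch, it assumes the overlap coefficient as the similarity index; the function names and toy data are hypothetical, not GOMCL's interface. One plausible use of a low P(≥ 0.5) is to flag a loosely connected cluster as a candidate for GOMCL-sub.

```python
from itertools import combinations


def overlap_coefficient(a: set, b: set) -> float:
    """Assumed similarity index: |A & B| / min(|A|, |B|); 0.0 if a set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def within_cluster_similarities(term_genes: dict, cluster) -> list:
    """Sorted similarities for every unordered pair of GO terms in one cluster."""
    return sorted(
        overlap_coefficient(term_genes[s], term_genes[t])
        for s, t in combinations(sorted(cluster), 2)
    )


def proportion_at_least(sims: list, threshold: float = 0.5) -> float:
    """P(>= threshold): share of pairwise similarities that reach the threshold."""
    if not sims:
        raise ValueError("a cluster needs at least two GO terms to have pairwise similarities")
    return sum(s >= threshold for s in sims) / len(sims)
```

For a hypothetical three-term cluster with gene sets {a, b, c, d}, {a, b}, and {c, d, e, f}, the pairwise similarities are 1.0, 0.5, and 0.0, so P(≥ 0.5) is 2/3.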


References

1. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 1999;27:29–34. doi: 10.1093/nar/27.1.29.
2. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44:D457–D462. doi: 10.1093/nar/gkv1070.
3. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45:D353–D361. doi: 10.1093/nar/gkw1092.
4. Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018;46:D649–D655. doi: 10.1093/nar/gkx1132.
5. Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N, et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 2011;39(Database):D685–D690. doi: 10.1093/nar/gkq1039.