Question

GENE ONTOLOGY FOR TOMATO-ITAG4.1 OR 4

2.3 years ago

anusha ▴ 10

Hi all, I was doing a Gene Ontology enrichment analysis and realised that the software I was using is based on a different genome build (SL3.0 instead of SL4.0). As a result, many genes are missing their annotation. I am not able to find a GO annotation file for ITAG4.1 or the SL4.0 build. Could you let me know how to generate that file, or point me to a software tool that uses the new tomato genome version for GO enrichment analysis?

BUILD GO ITAG4.1 SL4.0 • 2.3k views

updated 8 months ago by Hamtaro ▴ 50 • written 2.3 years ago by anusha ▴ 10

score 1 · Answer 1 · 2023-01-20

2.1 years ago

ar14g12 ▴ 10

Hi, is this what you're looking for: https://solgenomics.net/ftp/tomato_genome/annotation/ITAG4.1_release/

score 0 · Answer 2 · 2023-10-09

Hey there!

I had exactly the same issue as you. The assembly/annotation version differences across tools are horrendous when working with tomato.

You can perform GO enrichment with a custom GO annotation file using the enricher() function from the clusterProfiler package. It takes TERM2GENE and TERM2NAME objects in lieu of an annotation database. Each of these should be a data frame with 2 columns: GO terms in column 1, and either a GO description (for TERM2NAME) or a gene/locus ID (for TERM2GENE) in column 2.
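As a rough illustration of that two-column layout, here is a minimal sketch that builds both data frames from a toy mapping table. The column names (GOterm, LocusID, GOdesc) follow the function further down; the locus IDs are invented for the example and are not real annotations.

```r
# Toy mapping table; the locus IDs are invented for illustration only.
gene_to_GO <- data.frame(
  GOterm  = c("GO:0006412", "GO:0006412", "GO:0015979"),
  LocusID = c("Solyc01g000010.1", "Solyc01g000020.1", "Solyc01g000030.1"),
  GOdesc  = c("translation", "translation", "photosynthesis"),
  stringsAsFactors = FALSE
)

# TERM2GENE: GO term in column 1, gene/locus ID in column 2.
TERM2GENE <- gene_to_GO[, c("GOterm", "LocusID")]

# TERM2NAME: GO term in column 1, description in column 2 (one row per term).
TERM2NAME <- unique(gene_to_GO[, c("GOterm", "GOdesc")])
```

The column order matters more than the column names here, since the term has to come first in both objects.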

I have taken the liberty of writing a simple function that performs the analysis. It takes a data frame with the required GO mappings (provided below):

https://github.com/Tobias-deWerk/GOenrichment/blob/main/gene-to-GO-ALL.csv

library(clusterProfiler)

# Run a GO enrichment against a custom gene-to-GO mapping.
# gene_to_GO must contain the columns GOterm, LocusID, GOdesc and Ontology.
GOEnrichment <- function(genes_of_interest, universe, gene_to_GO, ontology = 'BP'){

     # Keep only the requested ontology unless all of them are wanted
     if (ontology != 'ALL') {
          gene_to_GO <- gene_to_GO[gene_to_GO$Ontology == ontology, ]
     }

     # enricher() expects the GO term in column 1 of both objects
     TERM2GENE <- gene_to_GO[, c('GOterm', 'LocusID')]
     # Keep only the first 16 characters of each locus ID;
     # the gene IDs you pass in must use the same form
     TERM2GENE$LocusID <- substr(TERM2GENE$LocusID, start = 1, stop = 16)
     TERM2NAME <- gene_to_GO[, c('GOterm', 'GOdesc')]

     results <- clusterProfiler::enricher(gene = genes_of_interest,
                                          universe = universe,
                                          TERM2GENE = TERM2GENE,
                                          TERM2NAME = TERM2NAME,
                                          pAdjustMethod = 'fdr',
                                          pvalueCutoff = 0.1,
                                          minGSSize = 7,
                                          maxGSSize = 500)

     return(results)
}

The results can be obtained by calling the function:

results <- GOEnrichment(genes_of_interest = genes_of_interest, # character vector of your genes of interest
                        universe = universe,                   # character vector of all genes in your data
                        gene_to_GO = gene_to_GO,               # data frame read from the gene-to-GO mapping file linked above
                        ontology = "BP")                       # "BP" (biological process), "MF" (molecular function), "CC" (cellular component) or "ALL"

The results object can be treated like any other clusterProfiler result, e.g. by printing it:

results

or by making a dot plot:

dotplot(results)
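
If the result comes back empty, it may be worth checking how many of your gene IDs actually match the mapping, since the function above trims LocusID to 16 characters. The helper below is a hypothetical sketch (annotation_coverage is not part of clusterProfiler, and the locus IDs are invented) that reports the fraction of query genes found in the mapping after the same trimming.

```r
# Fraction of query genes that appear in the mapping after the same
# 16-character trimming that GOEnrichment() applies to LocusID.
# annotation_coverage() is a hypothetical helper, not a clusterProfiler function.
annotation_coverage <- function(genes, gene_to_GO, id_width = 16) {
  mapped_ids <- unique(substr(gene_to_GO$LocusID, 1, id_width))
  mean(genes %in% mapped_ids)
}

# Toy data with invented locus IDs.
toy_map <- data.frame(
  GOterm  = c("GO:0006412", "GO:0015979"),
  LocusID = c("Solyc01g000010.1.1", "Solyc01g000020.1.1"),
  stringsAsFactors = FALSE
)

# One of the two query genes is present in the toy mapping.
coverage <- annotation_coverage(c("Solyc01g000010.1", "Solyc01g000099.1"), toy_map)
```

A coverage close to zero usually points to an ID format mismatch (for example a missing or extra version suffix) rather than a real lack of annotation.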