Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dotplot Circle Size Ambiguous #305

Open
DarioS opened this issue Dec 12, 2024 · 1 comment
Open

Dotplot Circle Size Ambiguous #305

DarioS opened this issue Dec 12, 2024 · 1 comment

Comments

@DarioS
Copy link

DarioS commented Dec 12, 2024

Dot plot is similar to bar plot with the capability to encode another score as dot size.

Is circle diameter the number of DE genes or number of total genes in pathway after filtering by the gene universe?

@guidohooiveld
Copy link

guidohooiveld commented Dec 16, 2024

I agree that this is not (very well) documented.

According to the help page of dotplot (?dotplot), the default setting of the argument size is NULL. If needed, it can be specified to either be "geneRatio", "Percentage" and "count".

If not specified (i.e. size = NULL), checking the source code shows that the 'value' of size is depending on x; thus on which values are plotted on the x-axis:

enrichplot/R/dotplot.R

Lines 148 to 171 in 0ca2fcf

dotplot.enrichResult <- function(object, x = "geneRatio", color = "p.adjust",
showCategory=10, size=NULL, split = NULL,
font.size=12, title = "", orderBy="x",
label_format = 30, decreasing=TRUE) {
colorBy <- match.arg(color, c("pvalue", "p.adjust", "qvalue"))
if (x == "geneRatio" || x == "GeneRatio") {
x <- "GeneRatio"
if (is.null(size))
size <- "Count"
} else if (x == "count" || x == "Count") {
x <- "Count"
if (is.null(size))
size <- "GeneRatio"
} else if (is(x, "formula")) {
x <- as.character(x)[2]
if (is.null(size))
size <- "Count"
} else {
## message("invalid x, setting to 'GeneRatio' by default")
## x <- "GeneRatio"
## size <- "Count"
if (is.null(size))
size <- "Count"
}

Since for the plotting of almost all types of analyses the default is x = "GeneRatio", size thus is set to Count.

In the next chunk of code the value of Count is calculated through the function fortify. It then becomes clear that:

For an ORA, Count equals the number of foreground genes (for example genes present in a list of DEG) that are annotated to that gene set / pathway.

For a GSEA, Count equals the number of leading edge (a.k.a. core enriched) genes of that gene set / pathway.

enrichplot/R/dotplot.R

Lines 173 to 181 in 0ca2fcf

if (inherits(object, c("enrichResultList", "gseaResultList"))) {
ldf <- lapply(object, fortify, showCategory=showCategory, split=split)
df <- dplyr::bind_rows(ldf, .id="category")
df$category <- factor(df$category, levels=names(object))
} else {
df <- fortify(object, showCategory = showCategory, split=split)
## already parsed in fortify
## df$GeneRatio <- parse_ratio(df$GeneRatio)
}

Some code to demonstrate this:


> library(clusterProfiler)
> library(ggplot2) ## for function fortify
> 
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
> 
> ORA.kegg <- enrichKEGG(gene         = gene,
+                        organism     = 'hsa',
+                        pvalueCutoff = 0.05)
> 
> 
> 
> GSEA.kegg <- gseKEGG(geneList     = geneList,
+                      organism     = 'hsa',
+                      eps          = 0,
+                      pvalueCutoff = 0.05)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> 
> 
> ## line 178 of the source code
> ## https://github.com/YuLab-SMU/enrichplot/blob/0ca2fcf18a4178e28ea8578d9b21e471448ef5fa/R/dotplot.R#L178
> ## set additional arguments
> showCategory=3
> split = NULL
> 

> ## for ORA result
> fortify(ORA.kegg, showCategory = showCategory, split=split)
                   category           subcategory       ID         Description
hsa04110 Cellular Processes Cell growth and death hsa04110          Cell cycle
hsa04114 Cellular Processes Cell growth and death hsa04114      Oocyte meiosis
hsa04218 Cellular Processes Cell growth and death hsa04218 Cellular senescence
          GeneRatio    BgRatio RichFactor FoldEnrichment   zScore       pvalue
hsa04110 0.14150943 0.01779680 0.09493671       7.951397 9.691235 4.684809e-10
hsa04114 0.09433962 0.01565668 0.07194245       6.025519 6.564387 5.673603e-06
hsa04218 0.09433962 0.01768416 0.06369427       5.334695 6.023681 1.667772e-05
             p.adjust       qvalue
hsa04110 1.007234e-07 9.862756e-08
hsa04114 6.099123e-04 5.972213e-04
hsa04218 1.195237e-03 1.170366e-03
                                                                           geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114                          991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218                           2305/4605/9133/890/983/51806/1111/891/776/3708
         Count
hsa04110    15
hsa04114    10
hsa04218    10
> 
> ## check value for Count for set 'Cell cycle', that has id hsa04110
> fortify(ORA.kegg, showCategory = showCategory, split=split)["hsa04110","Count"]
[1] 15
> 
> dotplot(ORA.kegg)
>

image

> ## for GSEA result
> fortify(GSEA.kegg, showCategory = showCategory, split=split)
               ID             Description setSize enrichmentScore      NES
hsa04110 hsa04110              Cell cycle     139       0.6637551 2.850548
hsa03050 hsa03050              Proteasome      43       0.7094784 2.489242
hsa04657 hsa04657 IL-17 signaling pathway      85       0.5622094 2.205005
               pvalue     p.adjust       qvalue rank
hsa04110 5.567944e-21 1.926509e-18 1.435943e-18 1155
hsa03050 1.425077e-08 2.465384e-06 1.837600e-06 2516
hsa04657 2.226086e-08 2.567419e-06 1.913652e-06 2880
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa03050 tags=65%, list=20%, signal=52%
hsa04657 tags=49%, list=23%, signal=38%
                                                                                                                                                                                                                                              core_enrichment
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa03050                                                                                                      5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198/7979/5699/5714/5702/5708/5692/5704/5683/5694/5718/51371/5682
hsa04657                                4312/6280/6279/6278/3627/2921/6364/8061/4318/3576/3934/6347/727897/1051/6354/3458/6361/6374/2919/9618/5603/7128/1994/7124/3569/8772/5743/7186/3596/6356/5594/4792/9641/1147/2932/6300/5597/27190/1432/7184/64806/3326
         Count     .sign GeneRatio
hsa04110    50 activated 0.3597122
hsa03050    28 activated 0.6511628
hsa04657    42 activated 0.4941176
> 
> ## check value for Count for set 'Ribosome biogenesis in eukaryotes', that has id hsa03008
> fortify(GSEA.kegg, showCategory = 10, split=split)["hsa03008","Count"]
[1] 41
> 
> 
> dotplot(GSEA.kegg, split=".sign") + facet_grid(.~.sign)
> 

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants