Hi,
I'm having trouble with Sankey and dot plots. Could anyone help me with the R script to draw them?
Thank you
Here is a rough outline you can start to play with, using the dataset you provided. It draws the Sankey and dot plots separately, but you can create a vector of pathway names and use it as the factor levels of Pathway so that both plots list the pathways in the same order.
library(ggalluvial)
library(ggplot2)
library(tidyr)
## Loading toy dataset
dat <- structure(list(
Pathway = c("Alpha-amino acid biosynthetic proc.", "Carboxylic acid biosynthetic proc.",
"Organic acid biosynthetic proc.", "Small molecule biosynthetic proc.",
"Carboxylic acid metabolic proc.", "Oxoacid metabolic proc."),
GeneRatio = c(0.133333, 0.066869, 0.066465, 0.055838, 0.042726, 0.042616),
FDR = c(3.851756, 4.941747, 4.941747, 5.919270, 5.207657, 5.207657),
GeneID = c("ASNS/CTH/ASS1/PSAT1/CBS/ATP2B4/OAT/PHGDH/MTHFD1/PLOD3",
"ASNS/CTH/ASS1/ACLY/PSAT1/OXSM/CBS/NA/CASP1/ATP2B4/MTHFD1/KYNU/OLAH/AKR1C3/OAT/PHGDH/PLOD3/HSD17B12/FABP5/ERLIN2/PRMT3/CBR1",
"ASNS/CTH/ASS1/ACLY/PSAT1/OXSM/CBS/NA/CASP1/ATP2B4/MTHFD1/KYNU/OLAH/AKR1C3/OAT/PHGDH/PLOD3/HSD17B12/FABP5/ERLIN2/PRMT3/CBR1",
"ASNS/COQ9/COQ5/CTH/COQ6/ASS1/ACLY/PSAT1/OXSM/CBS/FDXR/PC/NA/CASP1/ATP2B4/MTHFD1/KYNU/ERLIN2/OLAH/AKR1C3/OAT/PHGDH/PLOD3/ALDOC/HSD17B12/ADK/FABP5/PTPN2/PRMT3/SIRT5/AACS/ACSS3/CBR1",
"ACAA1/OAT/ASNS/ARG2/QPRT/FAH/ALDOC/KYNU/CTH/ASS1/ACLY/HSD17B4/PSAT1/IDH1/EPHX1/OXSM/CBS/PC/AKR1C3/NA/DLD/CASP1/ATP2B4/MTHFD1/ALDH1L2/OLAH/FOXK1/ACAD9/GPX1/LYPLA2/AACS/PHGDH/PLOD3/SLC27A1/HSD17B12/FABP5/GFPT1/NUDT19/ERLIN2/PRMT3/GSS/CBR1",
"ACAA1/OAT/ASNS/ARG2/QPRT/FAH/ALDOC/KYNU/CTH/ASS1/ACLY/HSD17B4/PSAT1/IDH1/EPHX1/OXSM/CBS/PC/AKR1C3/NA/DLD/CASP1/ATP2B4/MTHFD1/ALDH1L2/OLAH/FOXK1/ACAD9/GPX1/LYPLA2/AACS/PHGDH/PLOD3/SLC27A1/HSD17B12/FABP5/GFPT1/NUDT19/ERLIN2/PRMT3/GSS/ABHD14B/CBR1"),
Count = c(10L, 22L, 22L, 33L, 42L, 43L)), class = "data.frame", row.names = c(NA, -6L))
## Splitting the GeneID by /
sankey_dat <- dat %>%
  separate_rows(GeneID, sep = "/")
## Draw the Sankey network
ggplot(sankey_dat, aes(axis1 = GeneID, axis2 = Pathway, y = Count)) +
  geom_alluvium(aes(fill = Pathway), colour = "black") +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  ## Axis limits listed in the same order as axis1/axis2 in aes()
  scale_x_discrete(limits = c("GeneID", "Pathway"), labels = NULL, expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0), labels = NULL) +
  labs(y = "Gene Count", x = "", fill = "Pathway") +
  theme_classic(base_size = 15) +
  theme(legend.position = "none",
        axis.text.x = element_blank(), axis.ticks.x = element_blank(),
        axis.text.y = element_blank(), axis.ticks.y = element_blank())
## Draw the dot plot
ggplot(dat, aes(x = GeneRatio, y = reorder(Pathway, GeneRatio), size = Count, fill = FDR)) +
  geom_point(shape = 21, colour = "black") +
  scale_fill_gradient(low = "yellow", high = "blue") +
  ## FDR is mapped to fill, so the legend title is set through fill
  labs(x = "Gene Ratio", y = "Pathway", fill = "FDR", size = "Gene Count") +
  theme_classic()
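To make the reordering idea concrete, here is a minimal sketch of the ordering vector mentioned above. It uses a cut-down stand-in for dat; the names toy and pathway_order, and the choice to sort by GeneRatio, are my assumptions rather than part of the original script.

```r
# Cut-down stand-in for `dat` (three of its rows, GeneID omitted)
toy <- data.frame(
  Pathway   = c("Oxoacid metabolic proc.",
                "Alpha-amino acid biosynthetic proc.",
                "Small molecule biosynthetic proc."),
  GeneRatio = c(0.042616, 0.133333, 0.055838),
  FDR       = c(5.207657, 3.851756, 5.919270),
  Count     = c(43L, 10L, 33L)
)

# Ordering vector: pathway names sorted by increasing GeneRatio
pathway_order <- toy$Pathway[order(toy$GeneRatio)]

# Store Pathway as a factor whose levels follow that vector
toy$Pathway <- factor(toy$Pathway, levels = pathway_order)

levels(toy$Pathway)
```

With the factor in place, y = Pathway can replace reorder(Pathway, GeneRatio) in the dot plot. Applying the same levels to Pathway in sankey_dat before plotting should, as far as I know, carry the order over to the Sankey strata, although the stacking direction may differ between the two plots, so you may need rev(pathway_order) for one of them.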
What is the problem, where is the code you are trying? Provide code, example data, expected output.
I agree with zx8754: you haven't indicated what the problem is. To me, the Sankey looks reasonable given how much data you are trying to plot, as does the dot plot. I don't know exactly what you are trying to show, though, so it's hard to say whether it's appropriate.
Hi,
This is the header of my data:
pathway GeneRatio FDR genesID count
In the genesID, I have a list of genes separated by two spaces.
This is my script and output
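Since the genesID column described above separates genes with two spaces rather than "/", the splitting step needs a different separator. A minimal sketch, assuming the column names from the header above; the rows themselves and the names my_dat and long_dat are made up for illustration:

```r
library(tidyr)

# Made-up rows using the column names from the header above
my_dat <- data.frame(
  pathway   = c("Pathway A", "Pathway B"),
  GeneRatio = c(0.13, 0.07),
  FDR       = c(3.9, 4.9),
  genesID   = c("ASNS  CTH  ASS1", "ASNS  ACLY"),
  count     = c(3L, 2L)
)

# Split genesID on runs of whitespace so that each gene gets its own row
long_dat <- separate_rows(my_dat, genesID, sep = "\\s+")

long_dat
```

The sep argument is treated as a regular expression, so "\\s+" covers one, two or more spaces. The resulting long table has one row per gene and pathway pair, which is the shape the alluvial plot needs.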
You still haven't elaborated on what your problem is. Are you trying to recreate the top image and the bottom is what you end up with?
I think both, since this is geom_alluvium; this might be for the Sankey.
Thank you for your attention. Yes, the top is what I need and the bottom is my output after the script, using a table with pathway, GeneRatio, FDR, genesID, count as data.
Please add your dataframe and your code; that would help others recreate the issue. A small subset of your data would be enough.
Hi, thank you for your attention. Here is my dataframe as you asked