Hi everyone,
I have been struggling for hours to import expression data from GEO. I am trying to work with this dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE151095 It is a SuperSeries containing some scRNA-seq and also bulk RNA-seq data. I am especially interested in the bulk RNA-seq samples (specifically the DC2 and DC3 samples with and without TLR stimulation, as stated in the sample list).
I have tried this approach in R:
library(DESeq2)
library(tidyverse)
library(GEOquery)
gse <- getGEO(GEO = 'GSE151095', GSEMatrix = TRUE)
gse <- gse[[1]]  # getGEO() returns a list; this keeps only its first ExpressionSet
metadata <- pData(phenoData(gse))
exprs(gse)
# GSM4565845 GSM4565846 GSM4565847 ... and so on
So this does not seem to work: exprs() only returns the GSM sample names, without any expression values. I also could not find a .csv file with count data to import manually. Any advice on how I can work with this dataset, or with GEO datasets in general? Is there a recommended workflow?
Thanks in advance!
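For RNA-seq series, the GEO series matrix usually carries only the sample metadata, and the counts tend to sit in the supplementary files of the individual sub-series. Below is a minimal sketch of fetching and reading such a file. The helper names `fetch_supp_files` and `read_count_table` are made up for illustration, and the layout of the count table (which column holds the gene IDs) is not assumed, so inspect it before setting row names.

```r
# Download the supplementary files of a GEO series into ./<accession>/.
# Needs the GEOquery package and network access, so it is only defined here.
fetch_supp_files <- function(accession) {
  GEOquery::getGEOSuppFiles(accession, makeDirectory = TRUE)
}

# Read a (possibly gzip-compressed) CSV count table without guessing which
# column holds the gene IDs; check.names = FALSE keeps sample names unchanged.
read_count_table <- function(path) {
  read.csv(gzfile(path), check.names = FALSE, stringsAsFactors = FALSE)
}

# Usage (not run here; accession and file name as mentioned in this thread):
# fetch_supp_files("GSE151073")
# raw <- read_count_table(file.path("GSE151073",
#                                   "GSE151073_PG_Bulk_Blood_raw_counts.csv.gz"))
# str(raw[, 1:5])  # look at the first columns before choosing row names
```

Reading the table first and deciding on row names afterwards avoids an error in case the gene ID column contains duplicates.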
Hey Kevin, thanks for your answer. Is it correct to import the bulk sequencing data from here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE151073 ? I really only want to compare DC2 and DC3 with and without TLR, and those samples are included.
That is, using the file GSE151073_PG_Bulk_Blood_raw_counts.csv.gz?
I did this and got totally different results from DESeq2 than the group who published the data, and I really don't understand why. Looking back at the raw counts, their results seem impossible, so I think that I might have done something wrong.
Hi again. In which way were the results different?
I will try to quickly outline my workflow and post their results. This might be a longer post, then.
I tidied the metadata and the data (= count data),
which gave these final data.frames:
Preparing DESeq:
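A minimal sketch of what such a DESeq2 setup typically looks like, run on simulated counts rather than the real table. The group labels, gene names, and the spiked-in effect are made up for illustration; only the DESeq2 calls themselves are real API. The point worth checking is the order inside `contrast`, because it decides the sign of the fold change.

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for the real count table: 200 genes x 12 samples.
# The group labels are hypothetical; use the ones from your own metadata.
n_genes <- 200
groups <- rep(c("DC2_unstim", "DC2_TLR", "DC3_unstim", "DC3_TLR"), each = 3)
n_samples <- length(groups)
gene_mu <- 2^runif(n_genes, 4, 10)  # gene-specific mean expression
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(gene_mu, times = n_samples), size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", seq_len(n_samples)))
)

# Spike gene1 so that it is roughly 8x higher in DC3_TLR
is_dc3_tlr <- groups == "DC3_TLR"
counts["gene1", is_dc3_tlr] <- counts["gene1", is_dc3_tlr] * 8L

coldata <- data.frame(group = factor(groups), row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ group)
dds <- DESeq(dds)

# contrast = c(factor, numerator, denominator):
# a positive log2FoldChange means higher in DC3_TLR than in DC3_unstim
res <- results(dds, contrast = c("group", "DC3_TLR", "DC3_unstim"))
res["gene1", c("baseMean", "log2FoldChange", "padj")]
```

Swapping the last two entries of `contrast` flips the sign of every log2 fold change. The same contrast vector can also be passed to `lfcShrink()` with `type = "ashr"`.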
I will stop here and just quickly say what I did next: I used lfcShrink (with ashr for the contrast group) and biomaRt to annotate the genes (the gene names were actually listed after the ENSG number, so I could check whether I had made any mistakes), and then I plotted the values. My volcano plot looks totally different from the one in the publication.
You can see that they report e.g. IL1B as an upregulated DEG for DC3_TLR, but when I look back at the original raw count table for IL1B, this is what I see: I know that those are raw counts, but how can this end up as a DEG upregulated in DC3? (This always refers to the DC3-CD163_s column, which is the TLR-stimulated one.)
For comparison, my plot:
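One way to sanity-check a single gene straight from the raw counts is to normalize for library size and compare group means by hand. The sketch below does this in base R with median-of-ratios size factors (the idea behind DESeq2's `estimateSizeFactors`). The function names, the toy matrix, and its sample labels are all hypothetical and only illustrate the calculation.

```r
# Median-of-ratios size factors, computed on a plain count matrix.
size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)
  keep <- is.finite(log_geo_means)  # drops genes with a zero in any sample
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

# log2 ratio of mean normalized counts: numerator group over denominator group
group_log2fc <- function(counts, groups, gene, numerator, denominator,
                         pseudo = 1) {
  norm <- sweep(counts, 2, size_factors(counts), "/")
  num <- mean(norm[gene, groups == numerator])
  den <- mean(norm[gene, groups == denominator])
  log2((num + pseudo) / (den + pseudo))
}

# Toy data: samples 2 and 4 are sequenced twice as deep as samples 1 and 3;
# geneX is 8x higher in the stimulated samples after accounting for depth.
toy <- matrix(c(100, 200, 100, 200,
                 50, 100,  50, 100,
                 10,  20,  80, 160,
                 30,  60,  30,  60),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneX", "geneC"),
                              c("unstim_1", "unstim_2", "stim_1", "stim_2")))
toy_groups <- c("unstim", "unstim", "stim", "stim")

group_log2fc(toy, toy_groups, "geneX", "stim", "unstim")
```

If the sign of this rough estimate disagrees with a reported fold change for the same comparison, the direction of the contrast is the first thing to check.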
Hmm, the volcano plot from the publication looks completely bogus. For one, the x-axis values all seem positive, and why is that 10 and not -5?
Some of your genes also appear to have the opposite directionality, in terms of fold-change, but --would you believe-- I have seen this more often than I would like to admit. It is clear that a few --possibly many-- published works contain erroneous results.
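On the x-axis point: a volcano plot conventionally shows signed log2 fold changes, so both up- and downregulated genes appear on either side of zero. A minimal base R sketch on simulated values follows. The column names `log2FoldChange` and `padj` follow DESeq2's results table, while the function name `volcano` and the cut-offs are arbitrary choices for illustration.

```r
# Volcano plot with a symmetric x-axis, from a data.frame that has
# log2FoldChange and padj columns (e.g. as.data.frame() of DESeq2 results).
volcano <- function(res, lfc_cut = 1, padj_cut = 0.05) {
  res <- res[!is.na(res$padj), ]  # drop genes without an adjusted p-value
  sig <- abs(res$log2FoldChange) >= lfc_cut & res$padj <= padj_cut
  xlim <- c(-1, 1) * max(abs(res$log2FoldChange))
  plot(res$log2FoldChange, -log10(res$padj), xlim = xlim,
       col = ifelse(sig, "red", "grey50"), pch = 16,
       xlab = "log2 fold change", ylab = "-log10 adjusted p-value")
  abline(v = c(-lfc_cut, lfc_cut), h = -log10(padj_cut), lty = 2)
  invisible(sig)  # logical vector marking the highlighted genes
}

# Simulated values only, to show the call
set.seed(42)
demo <- data.frame(log2FoldChange = rnorm(500, sd = 2), padj = runif(500))
pdf(NULL)  # draw to a null device; remove this line to see the plot
sig <- volcano(demo)
dev.off()
```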
Thanks for taking the time to look at it, because I was unsure whether I had fucked it up somewhere. Well, this is sad, as this is a major paper in our field...