We have some targeted RNA-seq data as part of a qualitative pilot study we're doing. We're not looking to do DE on this data and do not have replicates.
While we've done some simple within-sample transcript-level comparisons using Salmon TPM values, we would like to do some between-sample comparisons to assess levels of particular transcripts across the cell line samples.
Ordinarily I'd bring these into R, compute TMM factors and look at the normalised CPM values. However, given that this is effectively a subset of the transcriptome, I was wondering whether this method is still appropriate in this context, and whether there is anything I need to keep in mind when computing TMM on transcript-level counts?
You need to run the TMM calculation on a set of genes that you are confident are non-DE. After all, the edgeR procedure is a two-step process: first you normalize by total library size, and then you correct this with the TMM factors. This is all done internally, e.g. when running calcNormFactors() and then cpm(). The total library size part is the same whether the data are targeted or full transcriptome, but the TMM part should be based on non-DE genes. In a full transcriptome setting TMM tries to find these genes automatically by trimming away genes with extreme M values, and it is good at doing that as long as you do not have extreme shifts in your data and a good portion of the genes are non-DE. In targeted approaches that is not guaranteed. Are there any controls in there that you can run it on? Technically you would run calcNormFactors() on the count matrix containing only the control genes and then feed the factors back into the full DGEList object. Do you have such controls? You can also just run it on the whole thing and then make an MA-plot, checking whether individual genes that are supposed to be non-DE center roughly at y=0.
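A rough sketch of what that could look like in R, in case it helps. Everything here is made up for illustration: the counts, the transcript and sample names, and the assumption that control rows can be picked out by an "ERCC-" prefix. It also assumes the full library sizes should be passed along with the control-only matrix, so that the factors stay relative to the same library sizes cpm() will use later. Treat it as a starting point under those assumptions rather than a definitive recipe.

```r
library(edgeR)

# Hypothetical toy counts: 4 target transcripts and 6 control rows, 3 cell lines
counts <- matrix(
  c(120, 300,  90,
     40,  35, 410,
    800, 150, 220,
     60, 310,  75,
    500, 620, 410,
    210, 260, 170,
     95, 120,  80,
   1500, 1850, 1230,
     33,  41,  27,
    720, 880, 600),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    c("tx1", "tx2", "tx3", "tx4", paste0("ERCC-", 1:6)),
    c("lineA", "lineB", "lineC")
  )
)

# Assumption: controls are identifiable by their row name prefix
is_control <- grepl("^ERCC-", rownames(counts))

# Full object; lib.size defaults to the column sums of all rows
y <- DGEList(counts = counts)

# TMM factors estimated from the control rows only, but relative to the
# full library sizes (the matrix method of calcNormFactors returns a vector)
nf <- calcNormFactors(counts[is_control, , drop = FALSE],
                      lib.size = y$samples$lib.size,
                      method = "TMM")

# Feed the factors back into the full object, then compute normalised CPM
y$samples$norm.factors <- nf
norm_cpm <- cpm(y)

# Visual check: rows that should be non-DE ought to sit near y = 0
plotMD(y, column = 2, status = ifelse(is_control, "control", "target"))
abline(h = 0, lty = 2)
```

With this few rows the trimming in TMM has very little to work with, so the plot is probably the more informative part: if the highlighted controls sit well away from zero, the factors are not doing what you want.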
I see. Thank you. That makes a lot of sense. This experiment is really just a look-see at the expression of a bunch of transcripts across a bunch of cell lines. There aren't any control transcripts in there that we are confident won't change across cell lines. However, we have some ERCCs in the mix, so provided they have behaved themselves (they haven't in other runs, so we haven't looked at them yet for this one) we could use them as the non-DE "genes". I might take the MA-plot route too, though, and see what we have. Thanks again for your time. It is very much appreciated, and including control genes in the next run will be a must. Cheers
....also how can I accept your comment as an answer? Thank you.
moved to answer, glad it helped you.