Data for drawing Heatmaps (RNA-seq)
7.1 years ago
sd.gamboa.t ▴ 50

Hello,

I'd like some advice, please.

I performed a de novo assembly (of RNA-seq reads) of the transcriptome of my target organism by means of Trinity. Next, I followed the Trinity pipeline and scripts to get the following data matrices about the assembled genes:

  • FPKM
  • TPM
  • TMM

My question is: which of these (FPKM, TPM, or TMM) should I use to perform hierarchical clustering of the genes and draw a heatmap?

I'd like to use TMM because it is normalized across samples (and the Trinity scripts use TMM for clustering and heatmaps). However, I've seen that some papers use FPKM values instead.

Also, which transformation is better for drawing a heatmap: z-score or mean-centered log2?

Thanks in advance.

Samuel

RNA-Seq Heatmap FPKM TPM TMM • 9.6k views

I think VST counts from DESeq2 might be a good choice for heatmaps and MDS, since they correct for sequencing depth and composition bias. However, I think VST does not control for gene length, and I am not sure whether it is possible to get length-normalized VST values.

7.1 years ago
Corentin ▴ 610

Hi,

Normalization should be performed by the tool you are using (the most popular being edgeR, DESeq2, and limma). Each of them normalizes the data differently, but if your data are robust (having enough replicates is one of the important things), they should give similar results.

If you are using Trinity, there is a script called "run_DE_analysis.pl" that performs the normalization (using edgeR, DESeq2, or limma, as you choose) and pairwise comparisons among your samples. To learn how to run it, follow this Trinity tutorial: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Differential-Expression. As that page states, the script expects a "matrix of raw read counts (not normalized!)". The tutorial explains every step, including drawing heatmaps.

Now, if you want information on how FPKM, RPKM, and TPM work, I find this video useful (by the way, all the StatQuest videos are good): https://www.youtube.com/watch?time_continue=608&v=TTUrtCY2k-w. Basically, FPKM, RPKM, and TPM normalize by library size (sequencing depth) and transcript length, which should be enough if all your samples come from the same tissue.
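To make the difference between FPKM and TPM concrete, here is a minimal sketch in plain Python. The function names and the toy counts and lengths are made up for illustration, and it uses the raw transcript length, whereas tools such as RSEM typically use an effective length, so treat it as an approximation of the idea rather than a reproduction of any tool's output.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    Scales by library size first, then by transcript length.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million.

    Scales by transcript length first, then rescales so the sample sums to 1e6.
    """
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Hypothetical toy sample: three genes whose counts are proportional to length.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

fpkm_values = fpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

print("FPKM:", fpkm_values)
print("TPM: ", tpm_values)
print("TPM sum:", sum(tpm_values))
```

Because the counts here are proportional to length, all three genes get the same FPKM and the same TPM. The TPM values of a sample always sum to one million, while the FPKM sum varies from sample to sample.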

I do not know a lot about TMM, but as I understand it, it also adjusts for library composition. This makes it useful when comparing different tissues: if a gene is heavily expressed in one tissue and not in the other, it will "absorb" most of the reads and the other genes will seem less expressed. Here is a video explaining how DESeq2 normalizes data:

So in the end it depends on your experiment / data type.
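The library-composition effect described above can be sketched with toy numbers. Note that this is not TMM: it is a simplified median-of-ratios size factor in the style of DESeq2, chosen because it fits in a few lines. The counts and function names are hypothetical.

```python
import math
from statistics import median


def cpm(sample):
    """Counts per million: corrects for library size only."""
    total = sum(sample)
    return [c * 1e6 / total for c in sample]


def size_factors(samples):
    """Simplified median-of-ratios size factors (DESeq2-style, not TMM).

    samples: list of per-sample count lists, all in the same gene order.
    Genes with a zero count in any sample are skipped.
    """
    n_genes = len(samples[0])
    geo_means = []
    for g in range(n_genes):
        values = [s[g] for s in samples]
        if all(v > 0 for v in values):
            geo_means.append(math.exp(sum(math.log(v) for v in values) / len(values)))
        else:
            geo_means.append(None)
    return [
        median(s[g] / geo_means[g] for g in range(n_genes) if geo_means[g] is not None)
        for s in samples
    ]


# Hypothetical data: genes A, B, C are unchanged between tissues, but gene D
# is much higher in tissue 2 and "absorbs" reads at equal sequencing depth.
tissue1 = [100, 200, 300, 400]
tissue2 = [50, 100, 150, 700]
samples = [tissue1, tissue2]

cpm_values = [cpm(s) for s in samples]
factors = size_factors(samples)
normalized = [[c / f for c in s] for s, f in zip(samples, factors)]

print("CPM gene A:       ", cpm_values[0][0], cpm_values[1][0])
print("Size factors:     ", factors)
print("Normalized gene A:", normalized[0][0], normalized[1][0])
```

With library-size scaling alone, gene A looks halved in tissue 2. After dividing by the size factors, genes A to C come out equal in both tissues and only gene D differs.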

Corentin


Hi Corentin,

Thanks for your response. I followed the Trinity instructions and scripts to perform differential expression analysis (using the gene counts matrix). The Trinity scripts also provide a means to automatically perform several analyses, including a heatmap of the TMM matrix of differentially expressed genes. They also provide a TPM matrix, and an FPKM matrix can easily be obtained from the RSEM output. However, I'd like to draw additional heatmaps for specific gene sets.

The Trinity scripts draw a heatmap based on mean-centered log2(TMM+1) values. I thought of using this metric because I compare samples in my experimental design. However, many papers use FPKM values instead, others use CPM (counts per million), and so on, even when they compare samples (as in my case). Additionally, some papers use z-scores instead of a log2 transformation.

When comparing heatmaps drawn with different metrics (TMM, TPM, or FPKM) and transformations (log2 or z-score), I got different coloring patterns and clusters. So my doubt remains: is any particular metric and transformation better (or more accepted by the scientific community), or is the choice of metric up to the author? This is just for the specific case of drawing and clustering gene sets in a heatmap.
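As a rough illustration of how the two row transformations differ, here is a small Python sketch. The expression values are invented, and computing the z-score on log2(x+1) values is an assumption made for this example; some papers compute it on other scales.

```python
import math
from statistics import mean, stdev


def centered_log2(row):
    """Mean-centered log2(x + 1): keeps the size of the fold changes."""
    logged = [math.log2(x + 1) for x in row]
    m = mean(logged)
    return [v - m for v in logged]


def zscore_log2(row):
    """Per-gene z-score of log2(x + 1): also divides by the gene's standard
    deviation, so every gene ends up with the same spread."""
    logged = [math.log2(x + 1) for x in row]
    m = mean(logged)
    sd = stdev(logged)
    if sd == 0:
        return [0.0] * len(logged)  # flat gene: no variation to scale
    return [(v - m) / sd for v in logged]


# Hypothetical genes with the same trend across three samples, but gene_b
# changes twice as much on the log2 scale.
gene_a = [1, 3, 7]    # log2(x + 1) -> 1, 2, 3
gene_b = [3, 15, 63]  # log2(x + 1) -> 2, 4, 6

centered_a, centered_b = centered_log2(gene_a), centered_log2(gene_b)
z_a, z_b = zscore_log2(gene_a), zscore_log2(gene_b)

print("centered log2:", centered_a, centered_b)
print("z-score:      ", z_a, z_b)
```

After mean-centering, gene_b still shows a larger amplitude than gene_a, so the two rows are colored differently and sit further apart in Euclidean distance. After z-scoring, the two rows become identical, which is one reason the same data can produce different colors and clusters depending on the transformation.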

Thanks again.

Samuel
