Question

scaling for kallisto estimate counts

0

4 months ago

Meghan.T • 0

We have a bulk RNA-seq dataset that we previously analyzed with the RNAlysis software. We supplied the raw FASTQ files and the human reference transcriptome; RNAlysis ran cutadapt and kallisto on all samples and produced a .tsv and an .h5 file for each sample. It also generated one per-gene scaled output table covering all samples. We then performed differential gene expression (DGE) analysis using this table as input.

Recently we used an in-house script to run cutadapt and kallisto with the same parameters, so we now have an .h5 and a .tsv file for each sample. However, I'm not sure whether TPM scaling or per-gene scaling is necessary for those files, since we are only going to do DGE, and, if it is, which tool I should use for this purpose.

Differential-gene-expression RNAlysis Kallisto • 347 views

score 1 · Answer 1 · 2024-06-12

1

4 months ago

ATpoint 85k

I am not sure what "scaling" means here. I hope it is not Z-scoring, as that effectively "deletes" all information on the magnitude of the counts.

The differential expression workflow is simple: take the kallisto output, summarize it to gene level with tximport (R/Bioconductor), and then use the testing framework of your choice, for example DESeq2, edgeR or limma.
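To make the "summarize to gene level" step concrete, here is a minimal Python sketch of the core idea only: summing kallisto's transcript-level `est_counts` and `tpm` per gene for one sample. It is not tximport and is not a replacement for it (in particular it does not compute the average-transcript-length offsets that tximport can pass on to DESeq2 or edgeR). The column names are those of kallisto's `abundance.tsv`; the function name `summarize_to_gene`, the transcript-to-gene mapping and the numbers are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize_to_gene(abundance_tsv, tx2gene):
    """Sum kallisto est_counts and tpm per gene for a single sample.

    abundance_tsv: an open text file (or any iterable of lines) in
                   kallisto's abundance.tsv layout.
    tx2gene:       dict mapping transcript ID -> gene ID (hypothetical here).
    """
    counts = defaultdict(float)
    tpm = defaultdict(float)
    reader = csv.DictReader(abundance_tsv, delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            # Transcript missing from the mapping: skip it in this sketch.
            continue
        counts[gene] += float(row["est_counts"])
        tpm[gene] += float(row["tpm"])
    return dict(counts), dict(tpm)


# Toy abundance.tsv content (invented values, real column names).
example = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t800\t120.5\t30.0\n"
    "tx2\t2000\t1800\t29.5\t5.0\n"
    "tx3\t1500\t1300\t40.0\t10.0\n"
)
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts, gene_tpm = summarize_to_gene(example, tx2gene)
print(gene_counts)
print(gene_tpm)
```

For a real analysis you would open each sample's `abundance.tsv` with `open(path, newline="")` instead of the in-memory string, or simply let tximport do this in R and hand its output to the testing package.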

ADD COMMENT • link 4 months ago by ATpoint 85k

0

Hi ATpoint, and thank you for your response. I realized that "scaling" in RNAlysis means normalizing gene counts to gene length. In the DGE analysis I got different results using un-normalized gene counts versus gene-length-normalized counts. The two DEG sets had an overlap of almost 30%, which is very confusing to me: after all, we are comparing each gene to itself across conditions, so gene-length scaling should not cause that much of a difference. What are your thoughts?
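One possible contributor, sketched as a toy calculation rather than a diagnosis of this dataset: count-based tests use the magnitude of the counts, not only their ratio, so dividing by gene length keeps the fold change but changes how much evidence the test sees. The Python sketch below assumes two samples with equal library size and plain Poisson counts, and uses an exact binomial test conditional on the total. That is a deliberately simplified stand-in, not what DESeq2, edgeR or limma actually compute, and the counts and the 10 kb gene length are invented.

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes out of n under p = 0.5.

    With equal library sizes and Poisson counts, the count in sample 1
    given the total n follows Binomial(n, 0.5) under the null hypothesis.
    """
    lower = min(k, n - k)
    tail = sum(comb(n, i) for i in range(lower + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Invented raw counts for one gene in two conditions (fold change 1.5).
raw_a, raw_b = 100, 150

# The same gene after dividing by a hypothetical length of 10 kb.
gene_length_kb = 10
scaled_a, scaled_b = raw_a // gene_length_kb, raw_b // gene_length_kb

fold_raw = raw_b / raw_a
fold_scaled = scaled_b / scaled_a
p_raw = two_sided_binomial_p(raw_a, raw_a + raw_b)
p_scaled = two_sided_binomial_p(scaled_a, scaled_a + scaled_b)

print(f"raw:    fold change {fold_raw:.2f}, p = {p_raw:.4f}")
print(f"scaled: fold change {fold_scaled:.2f}, p = {p_scaled:.4f}")
```

The fold change is identical in both cases, but the scaled values look like much lower counts and the p-value is far larger. Long genes are affected the most, which may be part of why the two DEG lists diverge.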

ADD REPLY • link 4 months ago by Meghan.T • 0