We have a bulk RNA-seq dataset that we previously analyzed with RNAlysis. We supplied the raw FASTQ files and the human reference transcriptome; RNAlysis ran cutadapt and kallisto on every sample and produced a .tsv and an .h5 file per sample. It also generated a single per-gene scaled output table covering all samples, which we then used as input for differential gene expression (DGE) analysis.
Recently we tried an in-house script that runs cutadapt and kallisto with the same parameters, which gave us an .h5 and a .tsv file for each sample. However, I'm not sure whether TPM scaling or per-gene scaling of these files is necessary, given that we are only going to do DGE. If it is, what should I use for this purpose?
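To make the question concrete, here is a minimal sketch of what "per-gene" summarization of one sample's kallisto .tsv could look like without any length scaling. It assumes the standard kallisto abundance.tsv columns (target_id, length, eff_length, est_counts, tpm); the transcript-to-gene mapping `tx2gene`, the function name, and the toy values are hypothetical. Whether summed estimated counts are the right input depends on the DGE tool, so treat this as an illustration rather than a recommendation.

```python
import csv
import io
from collections import defaultdict


def gene_counts_from_abundance(tsv_handle, tx2gene):
    """Sum kallisto est_counts over all transcripts of each gene (one sample)."""
    totals = defaultdict(float)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        gene = tx2gene.get(row["target_id"])
        if gene is not None:  # skip transcripts missing from the mapping
            totals[gene] += float(row["est_counts"])
    return dict(totals)


# Toy stand-in for one sample's abundance.tsv (values are made up).
example_tsv = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t800\t10\t5.0\n"
    "tx2\t2000\t1800\t30\t6.0\n"
    "tx3\t500\t300\t4\t7.0\n"
)
# Hypothetical transcript-to-gene mapping.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

counts = gene_counts_from_abundance(io.StringIO(example_tsv), tx2gene)
print(counts)
```

For real files, the same function could be called with an opened abundance.tsv for each sample and the results joined into one table.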
Hi ATpoint, and thank you for your response. I realized that the scaling in RNAlysis normalizes gene counts to gene length. In the DGE analysis, I got different results with un-normalized gene counts than with gene-length-normalized counts. The two DEG sets had almost 30% overlap, which I find very confusing: we are comparing each gene to itself across conditions, so gene-length scaling should not make that much of a difference. What are your thoughts?
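The reasoning above can be sketched numerically. Dividing every sample's count for one gene by the same gene length leaves the ratio between conditions unchanged, which is the intuition in the comment. The values and the helper `log2_fold_change` below are made up for illustration and do not come from the dataset.

```python
import math


def log2_fold_change(treated, control):
    """log2 of the ratio of mean expression between two conditions."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2(mean_treated / mean_control)


# Hypothetical counts for a single gene in three replicates per condition.
control = [100.0, 120.0, 110.0]
treated = [220.0, 240.0, 200.0]
gene_length_kb = 2.5  # assumed gene length, identical for every sample

lfc_raw = log2_fold_change(treated, control)
lfc_scaled = log2_fold_change(
    [x / gene_length_kb for x in treated],
    [x / gene_length_kb for x in control],
)

# The fold change is the same, but the scaled values are smaller.
print(lfc_raw, lfc_scaled)
print(sum(control) / len(control), sum(control) / len(control) / gene_length_kb)
```

So the fold change itself does not depend on the scaling. What does change is the magnitude of the values (and they stop being integers). Count-based methods such as DESeq2 or edgeR model variance as a function of count magnitude, so this may be one reason the two DEG sets differ, although that is an assumption about the analysis rather than something confirmed here.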