Hello everyone, I would appreciate some help with my data:
I've been analysing some NGS miRNA data in order to do Differential Expression profiling.
I'm using miRDeep2, but I'm having trouble turning its results into a proper table that I can use as input for DE software such as edgeR or DESeq2.
Such software requires an input table with i rows containing the IDs of the targets (miRNAs in this case) and j columns containing the read counts of each sample within the groups.
I have four groups with 12 samples each.
In the miRDeep2 results, I have two types of .csv files to look at for each of the four groups:
miRNAs_expressed_all_samples_14_09_2016_t_15_05_38.csv
and
result_14_09_2016_t_15_05_38.csv
and so on for the three remaining groups.
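Since each group has its own table, they would have to be joined into the single miRNAs-by-samples matrix described above. Here is a minimal sketch of that join, assuming the per-group counts have already been read into dictionaries keyed by (miRNA, precursor). The function name, the group and sample labels, and the toy numbers are all made up for illustration, and filling missing entries with 0 is my assumption, not something miRDeep2 does.

```python
def merge_group_tables(group_tables):
    """Join per-group count tables into one miRNAs-by-samples matrix.

    group_tables maps group name -> {(mirna_id, precursor): {sample: count}}.
    Returns (column_names, matrix), where matrix maps each
    (mirna_id, precursor) key to a list of counts in column order.
    """
    # Collect the sample columns of every group in a stable order.
    columns = []
    for group, table in group_tables.items():
        seen = []
        for counts in table.values():
            for sample in counts:
                if sample not in seen:
                    seen.append(sample)
        columns.extend((group, sample) for sample in seen)

    # One row per (miRNA, precursor) pair found in any group.
    keys = sorted({key for table in group_tables.values() for key in table})
    matrix = {}
    for key in keys:
        row = []
        for group, sample in columns:
            # Assumption: a miRNA missing from a group's table has 0 reads there.
            row.append(group_tables[group].get(key, {}).get(sample, 0))
        matrix[key] = row
    return [f"{group}_{sample}" for group, sample in columns], matrix


# Toy numbers, not real data.
groups = {
    "groupA": {("mir-x", "pre-x-1"): {"s1": 10, "s2": 12}},
    "groupB": {("mir-x", "pre-x-1"): {"s1": 7}, ("mir-y", "pre-y"): {"s1": 3}},
}
columns, matrix = merge_group_tables(groups)
print(columns)
for key, row in matrix.items():
    print(key, row)
```

Keeping the precursor in the row key at this stage postpones the decision about duplicated miRNA names until after the tables are joined.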
According to the first file (miRNAs_expressed_all_samples), some mature miRNAs can come from different precursors located in different regions of the genome. The mature miRNA is the same, but the output treats the two as different miRNAs: the same name appears in two different rows with different read counts, each row corresponding to a different precursor.
If I add the -W option to the quantifier.pl step of the miRDeep2 analysis, the result looks quite similar: there are still two rows with the same miRNA name, one per precursor, each with a different read count. However, the values are approximately half of those obtained without -W, because -W adds 0.5 to a read count instead of 1 when multimapping is detected, which is the case for identical mature miRNAs coming from different precursors located in different regions.
So, in order to build a proper read count table for DE analysis, should I sum both absolute read count values and take that as the final read count? Or should I take the divided values provided by the -W option, sum them, and convert the result to an integer? In both cases, with and without -W, I get paired rows for the same mature miRNA coming from different precursors, so I have two rows with the same name. The only difference is that without -W the total read count is roughly double the total obtained with -W.
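To make the effect of the weighting concrete, here is a minimal sketch of how I understand it. It is a toy model, not the actual quantifier.pl code: I'm assuming that a read shared by n precursors adds 1/n to each of them (which gives the 0.5 for two precursors), and the read list is made up.

```python
from collections import defaultdict


def count_reads(read_hits, weighted):
    """Count reads per precursor.

    read_hits is a list with one entry per read; each entry lists the
    precursors that read maps to.
    """
    counts = defaultdict(float)
    for precursors in read_hits:
        # Assumption: with weighting, a read shared by n precursors
        # contributes 1/n to each; without it, a full 1 to each.
        share = 1.0 / len(precursors) if weighted else 1.0
        for precursor in precursors:
            counts[precursor] += share
    return dict(counts)


# Toy reads: three map to both precursors, one maps only to the second.
reads = [
    ["ssc-let-7-1", "ssc-let-7-2"],
    ["ssc-let-7-1", "ssc-let-7-2"],
    ["ssc-let-7-1", "ssc-let-7-2"],
    ["ssc-let-7-2"],
]
plain = count_reads(reads, weighted=False)
halved = count_reads(reads, weighted=True)
print(plain, sum(plain.values()))
print(halved, sum(halved.values()))
```

In this toy model the weighted values add up to the number of reads, while the unweighted values count each shared read once per precursor.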
Example:
Without -W option:
miRNA_ID precursor read_count_sample_1 read_count_sample_2
ssc-let-7 ssc-let-7-1 1450393 1034593
ssc-let-7 ssc-let-7-2 1634574 1200943
With -W option:
miRNA_ID precursor read_count_sample_1 read_count_sample_2
ssc-let-7 ssc-let-7-1 654832.23 570432.21
ssc-let-7 ssc-let-7-2 765342.40 647823.78
In order to do DE analysis, what kind of matrix should I build? Should I treat both rows as a single miRNA, using the sum of the -W values converted to an integer? Or should I use the values obtained without -W and add them into a single row, ssc-let-7 (row 1) + ssc-let-7 (row 2)?
This just happens with some of the microRNAs, not with all of them...
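To show the two candidate matrices side by side, here is a minimal sketch that collapses rows sharing a miRNA name, using the example values above. The function name is made up, and rounding the -W sums to the nearest integer is just one possible way of converting them.

```python
def collapse_by_mirna(rows):
    """Sum the per-sample counts of rows that share the same miRNA_ID.

    Each row is (mirna_id, precursor, count_sample_1, count_sample_2, ...).
    miRNAs that appear only once pass through unchanged.
    """
    totals = {}
    for mirna_id, _precursor, *counts in rows:
        if mirna_id in totals:
            totals[mirna_id] = [a + b for a, b in zip(totals[mirna_id], counts)]
        else:
            totals[mirna_id] = list(counts)
    return totals


without_w = [
    ("ssc-let-7", "ssc-let-7-1", 1450393, 1034593),
    ("ssc-let-7", "ssc-let-7-2", 1634574, 1200943),
]
with_w = [
    ("ssc-let-7", "ssc-let-7-1", 654832.23, 570432.21),
    ("ssc-let-7", "ssc-let-7-2", 765342.40, 647823.78),
]

# Option 1: sum the unweighted counts.
option_sum = collapse_by_mirna(without_w)
# Option 2: sum the -W counts, then round to integers.
option_w = {
    mirna: [round(value) for value in values]
    for mirna, values in collapse_by_mirna(with_w).items()
}
print(option_sum)  # {'ssc-let-7': [3084967, 2235536]}
print(option_w)    # {'ssc-let-7': [1420175, 1218256]}
```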
Any help?