I'm trying to analyze a set of RNA-seq data, and a subset of my data hasn't generated the results. The code I use:
for i in {KO1_1,KO1_2,KO1_3,KO2_1,KO2_2,KO2_3,KO_Ctrl_1,KO_Ctrl_2,KO_Ctrl_3,OE1_1,OE1_2,OE1_3,OE_NC_1,OE_NC_2,OE_NC_3};
do
## STAR alignment
STAR --twopassMode Basic \
--quantMode TranscriptomeSAM GeneCounts \
--runThreadN 6 \
--genomeDir index/STAR_index_hg38+V41 \
--alignIntronMin 20 \
--alignIntronMax 50000 \
--outSAMtype BAM SortedByCoordinate \
--sjdbOverhang 149 \
--outSAMattrRGline ID:sample SM:sample PL:ILLUMINA \
--outFilterMismatchNmax 2 \
--outSJfilterReads Unique \
--outSAMmultNmax 1 \
--outFileNamePrefix ${i} \
--outSAMmapqUnique 60 \
--readFilesCommand gunzip -c \
--readFilesIn rawdata/Bianlab_RNA_seq/${i}_1.fq.gz rawdata/Bianlab_RNA_seq/${i}_2.fq.gz
## RSEM count
rsem-calculate-expression --paired-end --no-bam-output \
--alignments -p 15 \
--bam ${i}Aligned.toTranscriptome.out.bam \
index/RSEM_index_hg38+V41/RSEM_index_hg38+V41 \
intermediate_data/${i}_
done
I tried to rerun the groups that did not generate results, and encountered following error:
rsem-calculate-expression --paired-end --no-bam-output \
> --alignments -p 15 \
> --bam OE1_2Aligned.toTranscriptome.out.bam \
> index/RSEM_index_hg38+V41/RSEM_index_hg38+V41 \
> intermediate_data/OE1_2_
rsem-parse-alignments index/RSEM_index_hg38+V41/RSEM_index_hg38+V41 intermediate_data/OE1_2_.temp/OE1_2_ intermediate_data/OE1_2_.stat/OE1_2_ OE1_2Aligned.toTranscriptome.out.bam 3 -tag XM
[samopen] no @SQ lines in the header.
The SAM/BAM file declares less than one reference sequence!
"rsem-parse-alignments index/RSEM_index_hg38+V41/RSEM_index_hg38+V41 intermediate_data/OE1_2_.temp/OE1_2_ intermediate_data/OE1_2_.stat/OE1_2_ OE1_2Aligned.toTranscriptome.out.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!
I tried to check the bam file, however, the result is as:
samtools view OE1_2Aligned.toTranscriptome.out.bam
[main_samview] fail to read the header from "OE1_2Aligned.toTranscriptome.out.bam".
I am currently confused about which step's parameters have been called incorrectly, preventing me from obtaining the results.
what is the output of
?
oops
However, OE1_2 and OE1_3 are not the only groups experiencing problems. I tried another non-empty file, and:
The most torturous thing is that some groups did indeed generate results.
I'm so confused...
Looks like you are missing some files.
BTW this is not a
bug
since those are generally problems associated with programs themselves. This seems to be an issue with your execution and/or data.Their .fq files all look similar, should I rerun these programs?
Since we don't have access to your data we can't comment. Check the integrity of your input data files. Validate the files if you are not sure about their contents.
It seems that the files that generated the results also have significant issues. Except for one sample, all other samples only have over 200 transcripts with non-zero counts.
One of them looks okay, but I don't quite believe it.
Pallondyle : Use
ADD REPLY/ADD COMMENT
to add additional information to existing threads.ADD YOUR ANSWER
is meant for adding new answers to the original question.Can you check using
featureCounts
to make sure things look reasonable with alignments? You could also usesalmon
to do transcript quantitation instead ofrsem
.Thx. I found out that I used the wrong reference genome. I have a set of data from mice and a set from humans. I processed the human data using the mouse reference genome and the mouse data using the human reference genome.