RSEM getting TPM from STAR alignment
4 months ago
Emily ▴ 20

I've performed RNA-seq alignment with STAR using --quantMode TranscriptomeSAM GeneCounts and obtained the Aligned.toTranscriptome.out.bam file. I want to get the TPM of the transcripts using RSEM. I built the RSEM index using

 rsem-prepare-reference --polyA-length 125 --gtf GCF_000002985.6_WBcel235_genomic.gtf GCF_000002985.6_WBcel235_genomic.fna rsem_ref   

There was no error up to this step. However, when I ran

rsem-calculate-expression --bam --no-bam-output -p 8 --paired-end --forward-prob 1 Aligned.toTranscriptome.out.bam rsem_ref outputRsem > OutputRSEM.log

it shows the error:

rsem-parse-alignments rsem_ref outputRsem.temp/outputRsem outputRsem.stat/outputRsem Aligned.toTranscriptome.out.bam 3 -tag XM
Warning: The SAM/BAM file declares less reference sequences (56716) than RSEM knows (56718)!
"rsem-parse-alignments rsem_ref outputRsem.temp/outputRsem outputRsem.stat/outputRsem Aligned.toTranscriptome.out.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!

Please help!

TPM STAR RNAseq RSEM • 669 views

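There is a mismatch between the RSEM reference and the reference used for STAR: the BAM header declares 56716 reference sequences while RSEM knows 56718, a difference of two transcripts. Check that both references were built from the same files.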
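A rough sketch of how that check could be done: collect the reference names declared in the BAM header and the sequence names in RSEM's transcript FASTA, then print the names that appear on only one side. The function name `diff_ref_names` is made up for illustration, and the path `rsem_ref.transcripts.fa` is an assumption based on the `rsem_ref` prefix used in the question; adjust it to whatever RSEM actually wrote.

```shell
# Hypothetical helper: compare reference names in a SAM header (plain text)
# with sequence names in a FASTA file, and print names unique to either side.
diff_ref_names() {
    header_txt=$1   # text output of `samtools view -H` on the BAM file
    fasta=$2        # e.g. rsem_ref.transcripts.fa (assumed path)
    bam_names=$(mktemp)
    fa_names=$(mktemp)

    # @SQ header lines carry the reference name in the SN: field
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
        "$header_txt" | sort -u > "$bam_names"

    # FASTA sequence name = first word after ">"
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$fasta" | sort -u > "$fa_names"

    # comm -3 hides shared names: column 1 = only in BAM, column 2 = only in FASTA
    comm -3 "$bam_names" "$fa_names" |
        awk -F'\t' '{ if ($1 != "") print "only_in_bam\t" $1; else print "only_in_fasta\t" $2 }'

    rm -f "$bam_names" "$fa_names"
}

# Possible usage (requires samtools):
#   samtools view -H Aligned.toTranscriptome.out.bam > bam_header.txt
#   diff_ref_names bam_header.txt rsem_ref.transcripts.fa
```

Because both name lists are de-duplicated with `sort -u`, this only reports names missing from one side; a name that occurs twice in the same file would not show up here.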


Thanks! I'm not sure what caused this warning, as I used the same GTF and FNA files to build the STAR aligner index. Maybe I could just ignore this warning?

Sorry, another error has popped up: RSEM's indices might be corrupted, unassigned_transcript_572 appears more than once!

Do you have any idea about this, or is there another method that could calculate TPM from the STAR output BAM file? Please kindly advise.
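For the "appears more than once" message, one thing that could be inspected is whether the GTF reuses a `transcript_id` across different genes, strands or sequences. The following is only a sketch under that assumption (the thread does not confirm the cause); `dup_transcript_ids` is a hypothetical helper name, and it expects the standard GTF layout with attributes in column 9.

```shell
# Hypothetical helper: print transcript_id values that occur under more than
# one (seqname, strand, gene_id) combination in a GTF file, with the count.
dup_transcript_ids() {
    awk -F'\t' '
        $0 !~ /^#/ && match($9, /transcript_id "[^"]*"/) {
            # strip the leading `transcript_id "` (15 chars) and trailing quote
            tid = substr($9, RSTART + 15, RLENGTH - 16)
            gid = ""
            if (match($9, /gene_id "[^"]*"/))
                gid = substr($9, RSTART + 9, RLENGTH - 10)
            key = $1 "|" $7 "|" gid
            # count each distinct combination once per transcript_id
            if (tid != "" && !((tid, key) in seen)) {
                seen[tid, key] = 1
                n[tid]++
            }
        }
        END { for (t in n) if (n[t] > 1) print t "\t" n[t] }
    ' "$1" | sort
}

# Possible usage:
#   dup_transcript_ids GCF_000002985.6_WBcel235_genomic.gtf
```

An empty result would suggest the duplicate comes from somewhere other than the annotation's `transcript_id` values.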


I realised there are lines (approximately 3000 of them) showing unassigned transcripts, such as this one, in my RSEM index transcript.fa file:

unassigned_transcript_2688 GGTTTTGAGAGGAATCCTTTT

The same is true for the STAR index transcript.fa file (approximately 2000 unassigned transcripts)! Is it appropriate to manually remove them from the FASTA file? I'm really confused about why there are so many unassigned transcripts when I used the matching GTF and genome files downloaded from the same genome assembly.
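To put exact numbers on those estimates, the entries could be counted directly instead of by eye. This is a minimal sketch, assuming the transcript file is ordinary FASTA with `>` header lines; `count_fasta_prefix` is a hypothetical helper name.

```shell
# Hypothetical helper: count FASTA records whose name starts with a prefix.
# Prints "<matching records><TAB><total records>".
count_fasta_prefix() {
    fasta=$1
    prefix=$2   # e.g. unassigned_transcript_
    awk -v p=">$prefix" '
        /^>/ { total++; if (index($0, p) == 1) hits++ }
        END  { printf "%d\t%d\n", hits, total }
    ' "$fasta"
}

# Possible usage (file name is an assumption):
#   count_fasta_prefix rsem_ref.transcripts.fa unassigned_transcript_
```

Running it on both the RSEM and the STAR transcript files would show whether the two indices really contain different numbers of these entries.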

4 months ago
Emily ▴ 20

Finally solved by using another GTF version!
