Question

RSEM getting TPM from STAR alignment

0

4 months ago

Emily ▴ 20

I've performed RNA-seq alignment with STAR using --quantMode TranscriptomeSAM GeneCounts and obtained the Aligned.toTranscriptome.out.bam file. I want to get transcript TPM values using RSEM. I built the RSEM index using

 rsem-prepare-reference --polyA-length 125 --gtf GCF_000002985.6_WBcel235_genomic.gtf GCF_000002985.6_WBcel235_genomic.fna rsem_ref

There was no error up to this step. However, when I ran

rsem-calculate-expression --bam --no-bam-output -p 8 --paired-end --forward-prob 1 Aligned.toTranscriptome.out.bam rsem_ref outputRsem > OutputRSEM.log

it showed this error:

rsem-parse-alignments rsem_ref outputRsem.temp/outputRsem outputRsem.stat/outputRsem Aligned.toTranscriptome.out.bam 3 -tag XM
Warning: The SAM/BAM file declares less reference sequences (56716) than RSEM knows (56718)!
"rsem-parse-alignments rsem_ref outputRsem.temp/outputRsem outputRsem.stat/outputRsem Aligned.toTranscriptome.out.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!

Please help!

TPM STAR RNAseq RSEM • 669 views

4 months ago by Emily ▴ 20

2

There is a mismatch between the RSEM reference and the reference used for STAR: they differ by two transcripts. Check that the two references are the same.
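One way to see which names differ is to compare the @SQ names in the BAM header with the sequence names in the RSEM reference FASTA. This is only a minimal sketch: it assumes samtools is installed and that rsem-prepare-reference wrote a file called rsem_ref.transcripts.fa. The function name `missing_from_bam` and the file bam_header.txt are made up for illustration.

```shell
# Print names that are in the RSEM reference FASTA but not in the BAM header.
# $1: text file holding the SAM header, e.g. produced with
#     samtools view -H Aligned.toTranscriptome.out.bam > bam_header.txt
# $2: transcript FASTA of the RSEM reference (assumed: rsem_ref.transcripts.fa)
missing_from_bam() {
    header_file=$1
    fasta_file=$2
    bam_ids=$(mktemp)
    ref_ids=$(mktemp)
    # @SQ header lines carry each reference name in the SN: field
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
        "$header_file" | LC_ALL=C sort -u > "$bam_ids"
    # FASTA header lines start with ">"; keep the first word as the name
    awk '/^>/ { sub(/^>/, ""); print $1 }' "$fasta_file" | LC_ALL=C sort -u > "$ref_ids"
    # comm -13 keeps lines that occur only in the second file
    LC_ALL=C comm -13 "$bam_ids" "$ref_ids"
    rm -f "$bam_ids" "$ref_ids"
}

# Hypothetical usage:
#   samtools view -H Aligned.toTranscriptome.out.bam > bam_header.txt
#   missing_from_bam bam_header.txt rsem_ref.transcripts.fa
```

If this prints nothing even though the counts differ, the two sets of names are identical and the difference would have to come from repeated names rather than missing ones.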

4 months ago by ATpoint 85k

0

Thanks! I'm not sure what caused this warning, as I used the same GTF and FNA files to build the STAR aligner index. Maybe I could just ignore this warning?

Sorry, another error has popped up: RSEM's indices might be corrupted, unassigned_transcript_572 appears more than once!

Do you have any idea about this? Or, if there is another method that can calculate TPM from the STAR output BAM file, please kindly advise me.
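To check whether names such as unassigned_transcript_572 really occur more than once, the FASTA headers can be scanned for repeats. A rough sketch, assuming the RSEM reference FASTA is named rsem_ref.transcripts.fa; the function name `duplicate_fasta_ids` is hypothetical, not an RSEM tool.

```shell
# Print every sequence name that occurs more than once in a FASTA file.
# $1: FASTA file (assumed here: rsem_ref.transcripts.fa)
duplicate_fasta_ids() {
    # Take the first word of each ">" header, sort, and keep repeated names only
    awk '/^>/ { sub(/^>/, ""); print $1 }' "$1" | LC_ALL=C sort | uniq -d
}

# Hypothetical usage:
#   duplicate_fasta_ids rsem_ref.transcripts.fa
```

Each repeated name is printed once; no output means every header name is unique.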

4 months ago by Emily ▴ 20

0

I realised there are lines (approximately 3000 of them) showing unassigned transcripts, such as this one, in my RSEM index transcript.fa file:

unassigned_transcript_2688 GGTTTTGAGAGGAATCCTTTT

The same is true for the STAR index transcript.fa file (approximately 2000 unassigned transcripts)! Is it appropriate to manually remove them from the FASTA file? I'm really confused about why there are so many unassigned transcripts when I used the matching GTF and genome files downloaded from the same genome assembly.
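Before editing any FASTA by hand, it may help to count how many of these IDs are already in the GTF itself. A minimal sketch, assuming the GTF stores IDs in column 9 as transcript_id "..." attributes; the function name `count_unassigned_transcripts` is made up for illustration.

```shell
# Count distinct transcript_id values that start with "unassigned_transcript_".
# $1: GTF file, e.g. GCF_000002985.6_WBcel235_genomic.gtf
count_unassigned_transcripts() {
    awk -F'\t' '
        # Skip comment lines; pull the quoted transcript_id out of column 9
        $0 !~ /^#/ && match($9, /transcript_id "[^"]+"/) {
            # 15 = length of the prefix: transcript_id, a space, and the opening quote
            id = substr($9, RSTART + 15, RLENGTH - 16)
            if (id ~ /^unassigned_transcript_/) seen[id] = 1
        }
        END { n = 0; for (id in seen) n++; print n }
    ' "$1"
}

# Hypothetical usage:
#   count_unassigned_transcripts GCF_000002985.6_WBcel235_genomic.gtf
```

A non-zero count would suggest the unassigned names come from the annotation file rather than from STAR or RSEM.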

4 months ago by Emily ▴ 20

score 1 · Accepted Answer · 2024-06-22

1

4 months ago

Emily ▴ 20

Finally solved by using another GTF version!

4 months ago by Emily ▴ 20