Question

Some questions about call variants for mtDNA using GATK HaplotypeCaller

1

Entering edit mode

6.2 years ago

MatthewP ★ 1.4k

Hi everyone. I tried to apply GATK pipeline to amplicon-based mtDNA sequencing data analyse and found some problems want to ask/discuss with you guys.
I sat up my pipeline according this GATK Best Practice. Here is my pipeline:

QC(fastp)
mapping
bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz > sample.sam
SAM to BAM(samtools)
samtools view -bF 12 sample.sam > sample.bam
sort BAM(GATK SortSam)
gatk SortSam -I sample.bam -O sample_sorted.bam -SO coordinate
BAM index(samtools index)
Validate BAM(GATK ValidateSamFile)
gatk ValidateSamFile -I sample_sorted.bam -M VERBOSE
Call variant(GATK HaplotypeCaller)
gatk HaplotypeCaller -I samplt_sorted.bam -O sample.g.vcf -R hs_ref_GRCh38.p12_chrMT.fa -ploidy 1 -ERC GVCF
Consolidate GVCFs(GATK GenomicsDBImport)
gatk GenomicsDBImport --genomicsdb-workspace-path ./workspace -L MT -V sample1.g.vcf -V sample2.g.vcf ...(many samples)
Joint-Call Cohort(GATK GenotypeGVCFs)
gatk GenotypeGVCFs -O 18R111.vcf -R hs_ref_GRCh38.p12_chrMT.fa -ploidy 1 -V gendb://workspace

Here is something I want to explain first

No mark/remove duplicates because this is amplicon-based(multi-PCR) sequencing data, most of them are duplicates normally(fastp reported about 96%).
I set -ploidy 1 in step 7 and 9 because mtDNA should treated as haploid. However I can't set step 8 to haploid because this only for diploid. Second, I find description below in this GATK article: > The one quantity that changes whether the GATK considers the possibility of a heterozygous genotype at all is the ploidy, which describes how many copies of each chromosome each individual in the species carries.

It seems this ploidy setting only works for nuclear genome, means the copies of chromosome. But this not quite right for mtDNA for copy number not sure(At least not 1 copy). Question1: What will this affect my result?

When I check my finally vcf result I find 2 problems, result example:

MT      214     .       A       G       2386.46 ... GT:AD:DP:GQ:PL  1:0,112:112:99:5445,0   0:136,0:136:99:0,1080...

There problem is AD and DP too low value(Important to me, I need to calculate each variant's frequency.). The real depth value is about 5000, I find many has the same issue(check here). This seems because HaplotypeCaller not suite for amplicon data. Question2: Have anyone call such mtDNA variant using GATK? How?

Question3: Will Mutect2 work here?
I think about this because mtDNA heterogeneity are like tumor, while tumor has many other features that mtDNA don't like structure variants. Any one think this before? What should be normal tissue data (-I:normal param in Mutect2) if I apply Mutect2 to mtDNA? Reference sequence?

Thanks everyone!

GATK HaplotypeCaller mtDNA • 3.1k views

ADD COMMENT • link updated 2.5 years ago by kanika.151 ▴ 130 • written 6.2 years ago by MatthewP ★ 1.4k

1

Entering edit mode

2 . mapping
bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz > sample.sam

3 . SAM to BAM(samtools)
samtools view -bF 12 sample.sam > sample.bam

4 . sort BAM(GATK SortSam)
gatk SortSam -I sample.bam -O sample_sorted.bam -SO coordinate

Note that you shorten steps 2-4, and avoid intermediate files by piping the result of the alignment directly to samtools sort, creating a sorted bam file:

bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz  | samtools sort -o sample.bam -

ADD REPLY • link 6.2 years ago by WouterDeCoster 47k

0

Entering edit mode

Thanks, I used samtools sort before until I found gatk SortSam. I though this two tools do exactly the same thing. You mean my understanding here is wrong?

ADD REPLY • link 6.2 years ago by MatthewP ★ 1.4k

0

Entering edit mode

They probably do exactly the same thing, but I'd naively expect samtools to be quicker.

What you are doing is correct, there is just no need to split the alignment, sam to bam and sorting bam across three commands. It's messy, takes longer and makes intermediate files which you'll have to delete. A sam file takes a ton of disk space.

ADD REPLY • link 6.2 years ago by WouterDeCoster 47k

0

Entering edit mode

Good advice, indeed I need to delete those files by hand. Thanks.

ADD REPLY • link 6.2 years ago by MatthewP ★ 1.4k

score 0 · Answer 1 · 2022-05-02

0

Entering edit mode

2.5 years ago

kanika.151 ▴ 130

I just wanted to know if you got answers to your questions.

As Mutect2 will restrict results to low freq variant calls. :-)

ADD COMMENT • link 2.5 years ago by kanika.151 ▴ 130