We sequenced human mtDNA for multiple individuals. Our method involved making 2 partially overlapping PCR amplicons of mtDNA and then sequencing them on Illumina MiSeq (paired-end, ~350 average insert size).
I removed adapters, mapped the reads to hg38 reference using bwa and I am observing a strange coverage drop at position 3,234. It is present at this exact position in all samples (multiple separate batches) but in some samples the coverage drops from several hundred fold to 10-15x while in others it only drops to about half the coverage of the neighboring region:
This position is in the middle of one of the PCR amplicons and not near any of the primers used to make the amplicons.
I am trying to figure out what is happening.
1) Does the shape of the coverage track indicate anything? I am confused why the coverage drops so suddenly at position 3,234 (in a straight vertical line) but then gradually increases from almost 0 starting at position 3,235. How do I interpret that?
2) What should I try to figure this out (things I have already done are below)?. I'd like to solve this in silico but if I can't, will PCR amplify the region.
Here is what I have done so far:
1) checked primer specificity using BLAST - none of the primer sequences match anywhere besides the intended positions on human mtDNA
2) de novo assembled several samples - in all cases the region downstream of position 3,234 didn't assemble correctly (when I map reads back to the assembly, that region looks messy and is not uniformly covered; when I BLAST the part of the assembly downstream of 3,234 it aligns to various parts of mtDNA that are not supposed to be near position 3,234 but there is no clear pattern between different samples
3) looked at soft-clipped reads on both sides of position 3,234. Some soft-clipped parts of reads cannot be found anywhere at all, others map to various parts of mtDNA - again no clear pattern.
4) for reads that map near position 3,234, I looked where the other read in the pair maps - again, there is no clear pattern
5) looked at incorrectly oriented reads - there are some throughout the alignment and for the ones where one read is near position 3,234 again there is no clear pattern to where the second read in the pair is.
Thank you very much in advance!
Please use these directions to add images: How to add images to a Biostars post
Looking at the BAM may be too late to diagnose what is happening. I would start by grepping your raw reads for mtDNA sequence around that 3234 area - agatggc, not much longer than that. https://www.mitomap.org/foswiki/bin/view/MITOMAP/HumanMitoSeq
If they don't even exist you know it's something in the wet lab.
If they do exist then align those in isolation - using BLAST and BWA - see why they don't align and get back to us.