Question

Strange coverage of mtDNA - need help interpreting MiSeq results

4.7 years ago

tatiatcg ▴ 20

We sequenced human mtDNA for multiple individuals. Our method involved making 2 partially overlapping PCR amplicons of mtDNA and then sequencing them on Illumina MiSeq (paired-end, ~350 average insert size).

I removed adapters, mapped the reads to hg38 reference using bwa and I am observing a strange coverage drop at position 3,234. It is present at this exact position in all samples (multiple separate batches) but in some samples the coverage drops from several hundred fold to 10-15x while in others it only drops to about half the coverage of the neighboring region:

[Image: coverage of 2 of the samples]

This position is in the middle of one of the PCR amplicons and not near any of the primers used to make the amplicons.

I am trying to figure out what is happening.

1) Does the shape of the coverage track indicate anything? I am confused why the coverage drops so suddenly at position 3,234 (in a straight vertical line) but then gradually increases from almost 0 starting at position 3,235. How do I interpret that?

2) What should I try in order to figure this out (things I have already done are listed below)? I'd like to solve this in silico, but if I can't, I will PCR-amplify the region.

Here is what I have done so far:

1) checked primer specificity using BLAST - none of the primer sequences match anywhere besides the intended positions on human mtDNA

2) de novo assembled several samples - in all cases the region downstream of position 3,234 didn't assemble correctly (when I map reads back to the assembly, that region looks messy and is not uniformly covered; when I BLAST the part of the assembly downstream of 3,234, it aligns to various parts of mtDNA that are not supposed to be near position 3,234, but there is no clear pattern between different samples)

3) looked at soft-clipped reads on both sides of position 3,234. Some soft-clipped parts of reads cannot be found anywhere at all, others map to various parts of mtDNA - again no clear pattern.

4) for reads that map near position 3,234, I looked where the other read in the pair maps - again, there is no clear pattern

5) looked at incorrectly oriented reads - there are some throughout the alignment and for the ones where one read is near position 3,234 again there is no clear pattern to where the second read in the pair is.
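Checks 3) and 4) above can be tallied rather than inspected read by read. The following is a minimal sketch, not the method actually used here: it parses plain SAM text, counts reads that are soft-clipped within a few bases of the breakpoint, and bins where their mates map. The function name `clip_summary` and the `window` and `bin_size` parameters are hypothetical choices made for illustration.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_summary(sam_lines, breakpoint=3234, window=5, bin_size=500):
    """Tally soft clips near `breakpoint` and the binned positions of the mates.

    `window` and `bin_size` are illustrative defaults, not tuned values.
    """
    clipped_ends = Counter()  # which end of the alignment is soft-clipped
    mate_bins = Counter()     # mate start positions, rounded down to bin_size
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read
        # Ignore hard clips so that a soft clip is always the outermost operation.
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
        if not ops:
            continue
        end = pos + sum(n for n, op in ops if op in REF_CONSUMING) - 1
        hit = False
        if ops[0][1] == "S" and abs(pos - breakpoint) <= window:
            clipped_ends["left"] += 1
            hit = True
        if ops[-1][1] == "S" and abs(end - breakpoint) <= window:
            clipped_ends["right"] += 1
            hit = True
        # Record the mate position only for paired reads whose mate is mapped.
        if hit and flag & 0x1 and not flag & 0x8:
            mate_bins[(int(fields[7]) // bin_size) * bin_size] += 1
    return clipped_ends, mate_bins
```

The input can be any iterable of SAM lines, for example the text output of `samtools view` restricted to the region of interest. A strong excess of clips on one side, or mates piling up in one bin, would be the kind of pattern that is hard to see by eye.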

Thank you very much in advance!

next-gen sequencing alignment igv mtdna • 1.5k views

ADD COMMENT • link updated 4.7 years ago by swbarnes2 14k • written 4.7 years ago by tatiatcg ▴ 20

Please use these directions to add images: How to add images to a Biostars post

ADD REPLY • link 4.7 years ago by GenoMax 146k

Looking at the BAM may be too late to diagnose what is happening. I would start by grepping your raw reads for mtDNA sequence around that 3234 area - agatggc, not much longer than that. https://www.mitomap.org/foswiki/bin/view/MITOMAP/HumanMitoSeq

If they don't even exist you know it's something in the wet lab.

If they do exist then align those in isolation - using BLAST and BWA - see why they don't align and get back to us.
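The grep step could be sketched in Python as below. This is only an illustration under some assumptions: the FASTQ files use plain 4-line records, and the helper names `reads_with_motif` and `open_fastq` are made up for this example. It also searches for the reverse complement, since paired-end reads come from both strands.

```python
import gzip

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def reads_with_motif(fastq_lines, motif="agatggc"):
    """Yield (read_name, sequence) for reads containing the motif on either strand.

    Assumes strict 4-line FASTQ records (no wrapped sequence lines).
    """
    fwd = motif.upper()
    rev = revcomp(fwd)
    name = None
    for i, line in enumerate(fastq_lines):
        line = line.strip()
        if i % 4 == 0:
            name = line[1:]  # drop the leading '@'
        elif i % 4 == 1:
            seq = line.upper()
            if fwd in seq or rev in seq:
                yield name, seq


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

A 7-mer is short enough to occur by chance elsewhere in the mitochondrial genome, so the hits are candidates to align in isolation rather than proof that reads cross position 3234.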

ADD REPLY • link 4.7 years ago by Jeremy Leipzig 22k

score 3 · Answer 1 · 2020-03-02

The first thing to do is to investigate the region in a genome browser. Here is how your region looks in UCSC. Your coordinate overlaps an annotated repeat, tRNA-Leu-TTA(m), seemingly a tandem repeat of variable length. Your read pairs likely do not span the real repeat region, which is actually longer than the sequence depicted in the assembly; or, if they do span it, the resulting read sequence is not identical to the reference. This is also the reason why you couldn't assemble this region.

So, when you look at the genome at this spot, for which the actual assembly gives:

>hg38_rmsk_tRNA-Leu-TTA(m) range=chrM:3230-3307 5'pad=0 3'pad=0 strand=+ repeatMasking=lower
gttaagatggcagagcccggtaatcgcataaaacttaaaactttacagtc
agaggttcaattcctcttcttaacaaca

it could in fact be 10 or 100 times that sequence, with variability between individuals as well. No read pair could span the beginning of the repeat and still align to the reference.

If you need the exact sequence length at this site, you could do nanopore sequencing. I estimate that the actual repeat count is not much more than 10x - 20x, limited by the fragment size that the specific polymerase used during PCR can produce (pcr_length = 1400 bp + 78*x if there is only this repeat). You could maybe also measure your real fragment length using a Southern blot.
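The arithmetic behind that estimate can be written out as a small sketch. It only restates the formula above under the same hypothesis (a 78 bp unit repeated x extra times in a roughly 1400 bp amplicon); the ~350 bp insert size is taken from the question, and the function names are illustrative.

```python
REPEAT_UNIT_BP = 78      # length of the chrM:3230-3307 unit quoted above
BASE_AMPLICON_BP = 1400  # baseline amplicon length from the formula above


def pcr_length(extra_copies):
    """Expected amplicon length if the unit is repeated `extra_copies` more times."""
    return BASE_AMPLICON_BP + REPEAT_UNIT_BP * extra_copies


def copies_spanned(insert_size_bp):
    """Number of whole repeat units that fit inside one sequenced fragment."""
    return insert_size_bp // REPEAT_UNIT_BP


for x in (0, 10, 20):
    print(x, pcr_length(x))

print(copies_spanned(350))
```

Under these assumptions, 10 to 20 extra copies would give an amplicon of 2180 to 2960 bp, while a ~350 bp fragment holds at most 4 whole units, so a read pair could not span a repeat array of that size.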

score 0 · Answer 2 · 2020-03-02

4.7 years ago

swbarnes2 14k

You've got some kind of indel going on there. That's what causes that coverage pattern.

"de novo assembled several samples - in all cases the region downstream of position 3,234 didn't assemble correctly"

Not 'correctly'? Or not what you expect?

ADD COMMENT • link 4.7 years ago by swbarnes2 14k

Yes, an "insertion" at/in a tandem repeat ;)

ADD REPLY • link 4.7 years ago by Michael 55k

Thank you! By "correctly" I meant that if the reads do not map back to the de novo assembly with at least somewhat uniform coverage, and the alignment is instead spotty, there must be an assembly issue.

ADD REPLY • link 4.7 years ago by tatiatcg ▴ 20