Missing 1 Mbp of alignment on Chromosome 16 between T2T-CHM13 v1.0 and GRCh38 #816
Description
Hi Heng,
I am looking at a roughly 1 Mbp region on chr16 that does not align when comparing GRCh38 and T2T-CHM13, even though there is a lot of homology in the region. When I do a whole-genome alignment, about 1 Mbp is left unaligned around chr16:21,000,000-22,000,000.
More specifically, I get these alignments (including the flanking regions):
chr16 90338345 15116377 21305673 + chr16 96330493 15117341 21337915
chr16 90338345 21430746 21466480 + chr16 96330493 21709065 21744757
chr16 90338345 22472332 28471794 + chr16 96330493 22750307 28752414
These come from this command (minimap2 version 2.22):
minimap2 \
-cx asm20 --eqx --secondary=no \
-s 10000 -t 100 -K 8G \
../assemblies/chm13.draft_v1.0.fasta ../assemblies/hg38.chr_only.fa
And here is an image of the alignment (T2T on the bottom)
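To put numbers on the missing alignment, the gaps between the three records listed above can be computed directly. This is only a minimal sketch: `paf_gaps` is a hypothetical helper name, and it assumes the trimmed nine-column layout shown above (query start/end in columns 3 and 4, target start/end in columns 8 and 9), with records on the same chromosome pair and sorted by query start.

```shell
# Print the unaligned gap between each pair of consecutive PAF records.
# Assumption: input is sorted by query start and all records share one
# query/target chromosome pair (true for the three chr16 records here).
paf_gaps() {
  awk '
    NR > 1 {
      # PAF starts are 0-based and ends are exclusive, so the gap is
      # simply (this start) - (previous end).
      printf "query_gap=%d target_gap=%d\n", $3 - prev_qend, $8 - prev_tend
    }
    { prev_qend = $4; prev_tend = $9 }
  '
}

paf_gaps <<'EOF'
chr16 90338345 15116377 21305673 + chr16 96330493 15117341 21337915
chr16 90338345 21430746 21466480 + chr16 96330493 21709065 21744757
chr16 90338345 22472332 28471794 + chr16 96330493 22750307 28752414
EOF
```

On these three records it reports a gap of 125073 bp (GRCh38) / 371150 bp (T2T) before the short middle alignment, and 1005852 bp (GRCh38) / 1005550 bp (T2T) after it, which is the ~1 Mbp I am asking about.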
However, if I extract just the mostly unaligned region and align it on its own, I get a nearly complete alignment on the reverse strand, which I think suggests a missed inversion:
chr16:21305673-22472332 1166660 160939 1166660 - chr16:21337915-22750307 1412393 406974 1412393 1005059
chr16:21305673-22472332 1166660 91920 196680 + chr16:21337915-22750307 1412393 337923 442372 103880 105150
Command:
minimap2 \
-cx asm20 --eqx --secondary=no \
-s 10000 -t 100 -K 8G \
<(samtools faidx ../assemblies/chm13.draft_v1.0.fasta chr16:21337915-22750307 ) \
<(samtools faidx ../assemblies/hg38.chr_only.fa chr16:21305673-22472332 )
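Because the regional alignments are reported in coordinates local to the extracted sequences, it helps to shift them back to chromosome coordinates before comparing them with the whole-genome result. A minimal sketch, assuming the sequence names have the `chr:start-end` form that `samtools faidx` writes (1-based, inclusive start) and that contig names contain no extra `:` or `-`; `paf_region_to_chrom` is a hypothetical helper name.

```shell
# Shift region-local PAF coordinates back to chromosome coordinates.
# Assumption: columns 1 and 6 are named "chr:start-end" with a 1-based
# start, so the 0-based offset to add is (start - 1).
paf_region_to_chrom() {
  awk '
    # Return the 0-based chromosome offset encoded in a region name.
    function offset(name,    parts, range) {
      split(name, parts, ":")
      split(parts[2], range, "-")
      return range[1] - 1
    }
    {
      q = offset($1); t = offset($6)
      printf "query=%d-%d target=%d-%d strand=%s\n", $3 + q, $4 + q, $8 + t, $9 + t, $5
    }
  '
}

paf_region_to_chrom <<'EOF'
chr16:21305673-22472332 1166660 160939 1166660 - chr16:21337915-22750307 1412393 406974 1412393 1005059
chr16:21305673-22472332 1166660 91920 196680 + chr16:21337915-22750307 1412393 337923 442372 103880 105150
EOF
```

For the reverse-strand record this gives GRCh38 chr16:21466611-22472332 against T2T chr16:21744888-22750307, which sits almost exactly in the hole between the second and third whole-genome alignments (GRCh38 21466480-22472332, T2T 21744757-22750307).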
I have also put the two PAF files and the references at this link:
https://eichlerlab.gs.washington.edu/help/mvollger/share/mm2-issue-chr16/
I am not sure what causes this, but it seems like a large event to miss, and hopefully this is not just some mistake on my part.
(And credit to Ariel Gershman for finding this issue, if it is a real issue and not some mistake on my part)
Thanks in advance!
Mitchell