Suspicious alignments of regions with Ns stretches #155
Description
We noticed suspicious alignments of the attached scaffolded fragment against the C. elegans reference genome (downloaded from here: ftp://ftp.ensemblgenomes.org/pub/metazoa/release-39/fasta/caenorhabditis_elegans/dna/Caenorhabditis_elegans.WBcel235.dna.toplevel.fa.gz).
The command we ran was:
./minimap2 -c -x asm5 -B5 -r 50 --no-long-join -N 50 -s 65 -z 200 --mask-level 0.9 -f 200 --cs Caenorhabditis_elegans.WBcel235.dna.toplevel.fa scaffold.fa
The fragment includes two stretches of 200 Ns. We expected minimap2 to break the alignment at these stretches, since we use -r 50 and 200 >> 50. However, we get a single alignment that spans both stretches:
94676_SeqID_1.0_Cov_1.0 40534 0 14850 - II_dna_chromosome_chromosome_WBcel235_II_1_15279421_1_REF 15279421 15096376 15111253 14366 14577 60 NM:i:611 ms:i:13459 AS:i:13375 nn:i:400 tp:A:P cm:i:1284 s1:i:12921 s2:i:1163 dv:f:0.0048 cg:Z:268M30I10135M57D1287M70I70D3060M cs:Z::11*tg:10*cg:1*ag:62*ct:4*cg:2*at:34*at:6*ga:13*tg:5*tc:4*gc:1*ga:23*cg:3*tc:75+nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn*gn*gn*an*an*tn*tn*tn*tn*cn*an*an*tn*tn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*tn*cn*cn*an*an*cn*tn*tn*gn*cn*cn*cn*gn*an*an*an*tn*tn*tn*tn*cn*an*an*cn*tn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*tn*cn*cn*an*an*cn*tn*tn*gn*cn*cn*cn*gn*an*an*an*tn*tn*tn*tn*cn*an*an*tn*tn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*gn*cn*cn*an*an*cn*cn*tn*gn*cn*cn*gn*gn*an*an*an*tn*tn*tn*tn*cn*an*an*tn*cn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*gn*cn*cn*an*an*cn*tn*tn*gn*cn*cn*cn*gn*an*an*an*tn*tn*tn*tn*cn*an*an*tn*tn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*gn*cn*cn*an*an*cn*tn:9965-gaattttgtgtatttcatcaatagaaaggcataatttaagagaaataacaaaaattt*cn*an*an*tn*an*tn*an*an*an*an*cn*an*gn*cn*gn*an*an*an*an*an*an*tn*gn*an*gn*an*an*an*an*an*tn*cn*gn*an*cn*gn*an*an*an*an*tn*cn*gn*gn*tn*an*tn*an*an*an*an*tn*cn*an*an*an*tn*an*an*an*an*an*tn*gn*gn*an*an*gn*gn*an*an*an*an*tn*an*tn*tn*cn*an*tn*cn*tn*cn*gn*tn*an*an*an*cn*cn*cn*an*cn*an*cn*tn*tn*gn*cn*gn*gn*cn*an*cn*gn*gn*tn*tn*tn*cn*gn*tn*gn*gn*gn*cn*gn*gn*gn*gn*cn*gn*tn*cn*tn*cn*tn*gn*gn*cn*gn*gn*gn*an*an*an*an*tn*tn*cn*an*gn*cn*gn*tn*tn*tn*gn*an*an*an*an*cn*tn*cn*an*cn*an*tn*an*tn*an*gn*gn*cn*an*tn*cn*cn*an*an*tn*gn*an*an*tn*tn*tn*tn*cn*gn*gn*an*tn*tn*tn*tn*an*an*an*an*an*tn*tn*an*an*tn*an*tn*an:1087+tgccggaatttataatttccggcaaatcggcaaattgccgaaattaagaatttccggcaaataagcaaat-tgccggaatttataatttccggcaaatcggcaaattgccgaaattaagaatttccggcaaataagcaaat:3060
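To make the N content of such a record easier to inspect, the cs string can be tallied by operation. This is only a minimal sketch, assuming the short cs form described in the minimap2 manual (`:N` match run, `*xy` substitution from reference base x to query base y, `+seq` insertion, `-seq` deletion). The helper name `n_summary` and the short example string are hypothetical and are not taken from the output above.

```python
import re

# cs short-form operations (per the minimap2 manual):
#   :N match run, *xy substitution (ref x -> query y),
#   +seq insertion, -seq deletion, ~xxNyy intron
CS_OP = re.compile(r"(:[0-9]+|\*[a-z][a-z]|[+-][a-z]+|~[a-z]{2}[0-9]+[a-z]{2})")

def n_summary(cs):
    """Count query Ns inside an alignment, split by cs operation type.

    `cs` is the tag value without the leading 'cs:Z:' prefix.
    """
    counts = {"substituted_n": 0, "inserted_n": 0}
    for op in CS_OP.findall(cs):
        if op[0] == "*" and op[2] == "n":
            # reference base aligned against a query N
            counts["substituted_n"] += 1
        elif op[0] == "+":
            # query Ns reported as inserted bases
            counts["inserted_n"] += op.count("n")
    return counts

# Illustrative string only, not real minimap2 output.
example_cs = ":75+nnn*gn*an:10-gaat*cn:5"
print(n_summary(example_cs))
```

Running this on the cs value of the record above should show how the query Ns are split between inserted bases and substitutions, which can be compared against the nn:i tag.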
The second question is about the last two sequences at the end of the cs tag:
+tgccggaatttataatttccggcaaatcggcaaattgccgaaattaagaatttccggcaaataagcaaat
and
-tgccggaatttataatttccggcaaatcggcaaattgccgaaattaagaatttccggcaaataagcaaat
look identical. Why did minimap2 report them as an insertion (+) followed by a deletion (-) rather than as a match?
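In the record above this pair appears to correspond to the `70I70D` in the CIGAR string. To check whether the same pattern occurs in other records, a small script can flag insertions that sit directly next to a deletion of the same sequence. This is a sketch under the same assumption about the short cs form; `cancelling_indels` is a hypothetical helper, not part of minimap2.

```python
import re

# Same cs short-form tokenizer as assumed from the minimap2 manual.
CS_OP = re.compile(r"(:[0-9]+|\*[a-z][a-z]|[+-][a-z]+|~[a-z]{2}[0-9]+[a-z]{2})")

def cancelling_indels(cs):
    """Return sequences reported as an insertion directly adjacent to an
    identical deletion (in either order) in a cs string."""
    ops = CS_OP.findall(cs)
    hits = []
    for left, right in zip(ops, ops[1:]):
        one_ins_one_del = {left[0], right[0]} == {"+", "-"}
        if one_ins_one_del and left[1:] == right[1:]:
            hits.append(left[1:])
    return hits

# Illustrative string only: a shortened version of the pattern in question.
print(cancelling_indels(":1087+tgccgg-tgccgg:3060"))
```

An empty list means no such adjacent pair was found in that record.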