Suspicious alignments of regions with Ns stretches #155

Closed
@alexeigurevich

We noticed suspicious alignments of the attached scaffolded fragment against the C. elegans reference genome (downloaded from here: ftp://ftp.ensemblgenomes.org/pub/metazoa/release-39/fasta/caenorhabditis_elegans/dna/Caenorhabditis_elegans.WBcel235.dna.toplevel.fa.gz).

The command we ran was:
./minimap2 -c -x asm5 -B5 -r 50 --no-long-join -N 50 -s 65 -z 200 --mask-level 0.9 -f 200 --cs Caenorhabditis_elegans.WBcel235.dna.toplevel.fa scaffold.fa

The fragment includes two stretches of 200 Ns. We expected minimap2 to break the alignment at these stretches, since we use -r 50 and 200 >> 50. However, we get a single alignment that spans both:
94676_SeqID_1.0_Cov_1.0 40534 0 14850 - II_dna_chromosome_chromosome_WBcel235_II_1_15279421_1_REF 15279421 15096376 15111253 14366 14577 60 NM:i:611 ms:i:13459 AS:i:13375 nn:i:400 tp:A:P cm:i:1284 s1:i:12921 s2:i:1163 dv:f:0.0048 cg:Z:268M30I10135M57D1287M70I70D3060M cs:Z::11*tg:10*cg:1*ag:62*ct:4*cg:2*at:34*at:6*ga:13*tg:5*tc:4*gc:1*ga:23*cg:3*tc:75+nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn*gn*gn*an*an*tn*tn*tn*tn*cn*an*an*tn*tn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*tn*cn*cn*an*an*cn*tn*tn*gn*cn*cn*cn*gn*an*an*an*tn*tn*tn*tn*cn*an*an*cn*tn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*tn*cn*cn*an*an*cn*tn*tn*gn*cn*cn*cn*gn*an*an*an*tn*tn*tn*tn*cn*an*an*tn*tn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*gn*cn*cn*an*an*cn*cn*tn*gn*cn*cn*gn*gn*an*an*an*tn*tn*tn*tn*cn*an*an*tn*cn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*gn*cn*cn*an*an*cn*tn*tn*gn*cn*cn*cn*gn*an*an*an*tn*tn*tn*tn*cn*an*an*tn*tn*cn*cn*gn*gn*cn*an*an*tn*tn*tn*gn*cn*cn*an*an*cn*tn:9965-gaattttgtgtatttcatcaatagaaaggcataatttaagagaaataacaaaaattt*cn*an*an*tn*an*tn*an*an*an*an*cn*an*gn*cn*gn*an*an*an*an*an*an*tn*gn*an*gn*an*an*an*an*an*tn*cn*gn*an*cn*gn*an*an*an*an*tn*cn*gn*gn*tn*an*tn*an*an*an*an*tn*cn*an*an*an*tn*an*an*an*an*an*tn*gn*gn*an*an*gn*gn*an*an*an*an*tn*an*tn*tn*cn*an*tn*cn*tn*cn*gn*tn*an*an*an*cn*cn*cn*an*cn*an*cn*tn*tn*gn*cn*gn*gn*cn*an*cn*gn*gn*tn*tn*tn*cn*gn*tn*gn*gn*gn*cn*gn*gn*gn*gn*cn*gn*tn*cn*tn*cn*tn*gn*gn*cn*gn*gn*gn*an*an*an*an*tn*tn*cn*an*gn*cn*gn*tn*tn*tn*gn*an*an*an*an*cn*tn*cn*an*cn*an*tn*an*tn*an*gn*gn*cn*an*tn*cn*cn*an*an*tn*gn*an*an*tn*tn*tn*tn*cn*gn*gn*an*tn*tn*tn*tn*an*an*an*an*an*tn*tn*an*an*tn*an*tn*an:1087+tgccggaatttataatttccggcaaatcggcaaattgccgaaattaagaatttccggcaaataagcaaat-tgccggaatttataatttccggcaaatcggcaaattgccgaaattaagaatttccggcaaataagcaaat:3060
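To make the N stretches easier to spot in a record like the one above, here is a minimal Python sketch that tokenizes a short-form cs string and reports the length of each run of query Ns that was kept inside the alignment. It assumes the short-form cs syntax used above (`:` match length, `*` reference base then query base, `+` insertion, `-` deletion) and expects the value without the `cs:Z:` prefix. The function names `parse_cs` and `aligned_n_runs` are hypothetical helpers, not part of minimap2.

```python
import re

# One token of a short-form cs string: match run, substitution, insertion, deletion.
CS_TOKEN = re.compile(r":(\d+)|\*([a-z])([a-z])|\+([a-z]+)|-([a-z]+)")


def parse_cs(cs):
    """Split a short-form cs string into (op, payload) tuples."""
    ops, pos = [], 0
    for m in CS_TOKEN.finditer(cs):
        if m.start() != pos:
            raise ValueError(f"unexpected text at offset {pos}")
        pos = m.end()
        if m.group(1) is not None:
            ops.append(("match", int(m.group(1))))
        elif m.group(2) is not None:
            # Substitution: (reference base, query base).
            ops.append(("sub", (m.group(2), m.group(3))))
        elif m.group(4) is not None:
            ops.append(("ins", m.group(4)))
        else:
            ops.append(("del", m.group(5)))
    if pos != len(cs):
        raise ValueError(f"unexpected text at offset {pos}")
    return ops


def aligned_n_runs(cs):
    """Return the lengths of consecutive runs of query Ns inside the alignment.

    A run is made of substitutions whose query base is 'n' and of
    insertions consisting only of 'n'; any other operation ends the run.
    """
    runs, current = [], 0
    for op, payload in parse_cs(cs):
        if op == "sub" and payload[1] == "n":
            current += 1
        elif op == "ins" and set(payload) == {"n"}:
            current += len(payload)
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs
```

Called on the cs value of the record above, each returned run longer than 50 marks a stretch of Ns that was aligned through instead of splitting the alignment.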

Our second question concerns the last two sequences at the end of the cs tag:
+tgccggaatttataatttccggcaaatcggcaaattgccgaaattaagaatttccggcaaataagcaaat
and
-tgccggaatttataatttccggcaaatcggcaaattgccgaaattaagaatttccggcaaataagcaaat
They look identical. Why did minimap2 report them as an insertion (+) followed by a deletion (-) instead of a match?
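The pattern in question (an insertion immediately followed by a deletion of the same sequence, or the reverse) can be searched for mechanically. The sketch below is only an illustration for short-form cs strings without the `cs:Z:` prefix; `identical_adjacent_indels` is a hypothetical helper name, and the scan is non-overlapping, so in a chain of three or more back-to-back indels only alternate pairs are examined.

```python
import re

# Two gap operations back to back: sign, sequence, sign, sequence.
INDEL_PAIR = re.compile(r"([+-])([a-z]+)([+-])([a-z]+)")


def identical_adjacent_indels(cs):
    """Return (offset, sequence) for each adjacent insertion/deletion pair
    in a short-form cs string whose two sequences are identical."""
    hits = []
    for m in INDEL_PAIR.finditer(cs):
        first_sign, first_seq, second_sign, second_seq = m.groups()
        # Keep only pairs of opposite type (+ then -, or - then +)
        # that carry exactly the same bases.
        if first_sign != second_sign and first_seq == second_seq:
            hits.append((m.start(), first_seq))
    return hits
```

For example, `identical_adjacent_indels(":1087+tgcc-tgcc:3060")` reports one hit at offset 5 with sequence `tgcc`, which mirrors the `70I70D` pair in the cg tag of the record above.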
