Indel calling #154

Open
kokyriakidis opened this issue May 31, 2022 · 3 comments

Comments

@kokyriakidis

kokyriakidis commented May 31, 2022

Hi!

I want to accurately detect indels in some panel samples. I only care about small indels (<= 50 bp). Do you think I should increase the "-d max edit distance" option to 50? Does increasing it only affect SNAP's speed, or does it also affect accuracy?

How does "-d max edit distance" compare with "-i max edit distance to consider for potential indels"?

Are there any other options I should consider to increase sensitivity and precision around indels?

My first priority is sensitivity and accuracy and not speed.

KK

@bolosky
Contributor

bolosky commented Jun 2, 2022

If you want to find indels up to 50, then you should make -d a little bigger than 50 in case there are other differences in the read, like SNPs away from the indel. The max value for -d is 62 or 63 (depending on other stuff) so you have some slack here.
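As a rough illustration of the arithmetic above, here is a minimal sketch for picking a -d value: the largest indel you care about plus some slack for other differences, capped at the tool's limit. The function name, the default slack of 5, and the use of 62 as the cap (the lower of the two values mentioned) are assumptions for illustration, not SNAP settings.

```python
# Conservative cap: the reply above says the maximum -d is 62 or 63.
SNAP_MAX_D = 62


def choose_max_edit_distance(max_indel_len, extra_edits=5, cap=SNAP_MAX_D):
    """Return a -d value covering the indel plus a few other edits (e.g. SNPs).

    extra_edits is a hypothetical slack value; tune it for your data.
    """
    if max_indel_len < 0 or extra_edits < 0:
        raise ValueError("lengths must be non-negative")
    return min(max_indel_len + extra_edits, cap)


if __name__ == "__main__":
    print(choose_max_edit_distance(50))      # 50 + 5 slack
    print(choose_max_edit_distance(50, 20))  # hits the cap
```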

Using -d this big will slow SNAP down quite a bit if you happen to have a lot of reads that don't align at all (or align with high edit distance) but that have enough similarity to the reference to have many seed hits. You may or may not care about this and you can experiment with your data to see what happens.

What -i does is look for potential indels in the seeding phase. That is, if it sees two seed hits that are close to one another but offset (which might indicate an indel in the read between the seeds), it increases the max edit distance only for that alignment candidate. It will have a much smaller performance impact than -d, but it will miss an indel that lacks a seed hit on one side or the other (because, for example, it's close to the end of the read, or because the region between the indel and one end of the read has enough differences from the reference that no exact match there corresponds to a seed SNAP looked at). In truth, if indels are close to the end of the read they're likely to be soft clipped anyway unless you turn off soft clipping.
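The idea of "two seed hits that are close but offset" can be sketched with a toy example. This is not SNAP's code; it only illustrates the general technique of comparing the diagonals (reference position minus read offset) of exact seed matches. The function name and the data layout are assumptions.

```python
from itertools import combinations


def candidate_indels(seed_hits, max_indel=60):
    """Find pairs of seed hits whose diagonals differ, hinting at an indel.

    seed_hits: list of (read_offset, ref_position) exact seed matches.
    Returns (left_hit, right_hit, shift) tuples. shift > 0 suggests reference
    bases missing from the read (deletion); shift < 0 suggests extra bases in
    the read (insertion).
    """
    found = []
    for left, right in combinations(sorted(seed_hits), 2):
        diag_left = left[1] - left[0]
        diag_right = right[1] - right[0]
        shift = diag_right - diag_left
        # Same diagonal means no indel between the seeds; a huge shift is
        # more likely two unrelated hits than one indel.
        if shift != 0 and abs(shift) <= max_indel:
            found.append((left, right, shift))
    return found


if __name__ == "__main__":
    # Seeds on both sides of a 12-base deletion: detected.
    print(candidate_indels([(0, 1000), (60, 1072)]))
    # Only one seed hit (indel near the read end): nothing to compare.
    print(candidate_indels([(0, 1000)]))
```

The second call shows the limitation described above: with a seed hit on only one side of the indel, there is no pair of diagonals to compare, so nothing is flagged.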

@kokyriakidis
Author

kokyriakidis commented Jun 2, 2022

Thank you for your detailed answer!

So, to summarize.

  1. I should increase '-d' to 60 for example.
  2. Should I turn off soft clipping for better results? If yes, how do I do that? (I have already trimmed my data from adapters etc)

With these two things, will I get the best SNAP performance for indels?

@bolosky
Contributor

bolosky commented Jun 2, 2022

I'd increase -d to 60 and see if you like the output. I'd also try -i 60, which will probably produce less noise since it will only increase the distance when it looks like there's an indel.
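To compare the two settings side by side, one could build both command lines and run them on the same input. The sketch below only assembles the argument lists; it does not run anything. Only -d and -i come from this thread. The binary name `snap-aligner`, the `paired` mode, the `-o` output option, and all file paths are assumptions to check against the usage text of your SNAP build.

```python
import shlex


def snap_command(index_dir, reads, out_sam, flag, value, binary="snap-aligner"):
    """Assemble a SNAP command line with either -d or -i set to value.

    Everything except the -d / -i flags is an assumption about the CLI.
    """
    if flag not in ("-d", "-i"):
        raise ValueError("flag must be -d or -i")
    return [binary, "paired", index_dir, *reads, "-o", out_sam, flag, str(value)]


if __name__ == "__main__":
    reads = ["sample_R1.fastq", "sample_R2.fastq"]  # hypothetical paths
    for flag in ("-d", "-i"):
        out = f"sample{flag.replace('-', '_')}60.sam"
        print(shlex.join(snap_command("index_dir", reads, out, flag, 60)))
```

Running both variants on the same panel sample and comparing the resulting indel calls is one way to see which setting produces less noise on your data.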

I think I spoke too soon about turning off soft clipping. We don't expose an option to do that, so you're stuck with it. So you're not likely to find big indels that are near the ends of reads, since they'll get clipped. That said, they're also pretty unreliable so you probably just want to stick with ones in the middle anyway.
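To see how much soft clipping affects your own output, a small helper can summarize the CIGAR string of each alignment. This is a minimal sketch that uses only standard SAM CIGAR operators; the function name and the returned dictionary layout are made up for illustration.

```python
import re

# Standard SAM CIGAR operators: a length followed by one operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_and_indel_summary(cigar):
    """Report soft-clip lengths at each end and the largest indel in a CIGAR."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips sit outside soft clips, so drop them before checking the ends.
    core = [item for item in ops if item[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    largest_indel = max((n for n, op in ops if op in ("I", "D")), default=0)
    return {"left_clip": left, "right_clip": right, "largest_indel": largest_indel}


if __name__ == "__main__":
    print(clip_and_indel_summary("40M12D60M"))  # indel in the middle of the read
    print(clip_and_indel_summary("85M15S"))     # read end soft clipped instead
```

Reads with a long soft clip and no indel are the cases where an indel near the read end may have been clipped away rather than aligned.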
