Tiny fraction of reads mapped #61

Open
blahah opened this issue Nov 1, 2015 · 10 comments
Comments

@blahah
Contributor

blahah commented Nov 1, 2015

via @ctb

User input 114396588 pairs, of which only 77778 were reported in the BAM file.

SNAP logs:

Loading index from directory... 0s.  236823466 bases, seed size 23
Aligning.
Welcome to SNAP version 1.0beta.18.

sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument
sched_setaffinity: Invalid argument

Could those log messages have something to do with it? Can provide input data if necessary.
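
For scale, the reported fraction can be worked out directly. This is only a sketch of the arithmetic; it assumes both numbers count the same unit (pairs), which the report does not spell out, and the function name is made up for illustration.

```python
def reported_fraction(reported: int, total: int) -> float:
    """Fraction of input records that show up in the output."""
    if total <= 0:
        raise ValueError("total must be positive")
    return reported / total


# Numbers quoted in the report above.
input_pairs = 114_396_588
reported_in_bam = 77_778

fraction = reported_fraction(reported_in_bam, input_pairs)
print(f"{fraction:.4%} of the input was reported")
```

That works out to roughly 0.068% of the input, i.e. well under one record in a thousand.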

@ctb

ctb commented Nov 1, 2015

On Sun, Nov 01, 2015 at 09:21:43AM -0800, Richard Smith-Unna wrote:

> Could those log messages have something to do with it? Can provide input data if necessary.

Maybe, although the advice online is to ignore it.

I recompiled snap and am now trying to figure out how to get transrate
to find it.

@ctb

ctb commented Nov 1, 2015

Same problem as #60.

@bolosky
Contributor

bolosky commented Nov 1, 2015

No, those messages don’t have anything to do with read mapping.

SNAP tries to bind the aligner threads to cores, which somewhat improves efficiency because the hardware doesn’t have to move the cache state to follow the thread. This message means that it failed to bind a thread to a core, which usually happens when you give it -t with more threads than there are cores in the system. When that happens, the extra threads float to whatever core is idle, which might affect performance but won’t affect behavior.
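
The failure mode described above can be reproduced outside SNAP. The following is a minimal sketch in Python, not SNAP's own code: it assumes worker i is pinned to CPU i (a hypothetical mapping) and uses the Linux-only `os.sched_setaffinity` call. Asking for a CPU that does not exist, or that the process is not allowed to use, fails with EINVAL, which is the "Invalid argument" text in the log.

```python
import errno
import os


def try_bind_to_cpu(cpu: int) -> bool:
    """Try to pin the caller to a single CPU; return False instead of raising."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # no affinity API on this platform (e.g. macOS, Windows)
    try:
        os.sched_setaffinity(0, {cpu})  # 0 means "the caller"
        return True
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            # The condition that is logged as "Invalid argument".
            return False
        raise


def binding_results(n_threads: int) -> list:
    """Pretend worker i wants CPU i and record which bindings succeed."""
    has_api = hasattr(os, "sched_getaffinity")
    original = os.sched_getaffinity(0) if has_api else None
    try:
        return [try_bind_to_cpu(i) for i in range(n_threads)]
    finally:
        if has_api:
            os.sched_setaffinity(0, original)  # undo the demo's pinning


if __name__ == "__main__":
    wanted = (os.cpu_count() or 1) + 4  # deliberately more workers than CPUs
    print(binding_results(wanted))
```

On a machine with fewer CPUs than requested workers, the trailing entries come back False while the work itself would still run, which matches the point that only performance is affected.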

Seeing lots of reads in the input which don’t make it into the output is probably because of one of two things. One is that the paired read matcher can’t match ends of reads to one another, either because RNEXT and PNEXT aren’t filled in, or because of a bug that Ravi’s working on now; it usually generates a message at the end of the alignment to this effect. The other reason is that the input reads are marked with the secondary or supplementary alignment flags (0x100 and 0x800), which ordinarily are dropped during the input phase because these aren’t real reads from the sequencer, they’re artifacts produced by a previous aligner. If you want to keep them, you can say -sa.
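
The flag test mentioned above is a plain bit mask on the SAM FLAG field. A small sketch, not taken from SNAP's source; the function names and the `keep_all` parameter are illustrative stand-ins for the behavior described:

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def is_secondary_or_supplementary(flag: int) -> bool:
    """True if either the 0x100 or the 0x800 bit is set in a SAM FLAG."""
    return bool(flag & (SECONDARY | SUPPLEMENTARY))


def keep_on_input(flag: int, keep_all: bool = False) -> bool:
    """Mimic the described default: drop such records unless asked to keep them."""
    return keep_all or not is_secondary_or_supplementary(flag)


# 99 = paired, proper pair, mate reverse, first in pair: an ordinary primary read.
for flag in (99, 99 | SECONDARY, 99 | SUPPLEMENTARY):
    print(hex(flag), keep_on_input(flag))
```

Records read from FASTQ carry no FLAG field at all, so this filter can only matter when the input is SAM or BAM.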

If it’s not either of those things, please let me know and I’ll try to figure out what’s going on.

--Bill


@ctb

ctb commented Nov 1, 2015

On Sun, Nov 01, 2015 at 10:11:05AM -0800, Bill Bolosky wrote:

> No, those messages don't have anything to do with read mapping.

Yes, sorry, the original problem was few reads mapping, and that was the only
message in the log file; but it's not the problem.

@blahah
Contributor Author

blahah commented Nov 1, 2015

These are FASTQ format input, so I don't think it can be RNEXT/PNEXT or supplementary aln flags.

@bolosky
Contributor

bolosky commented Nov 1, 2015

Well then that’s very strange.

By “reported in the BAM file” you mean that they are there at all, not that they are there and mapped? That is, you’re saying that it’s completely losing reads rather than simply failing to map them, right? What does SNAP print out at the end of its run when it reports read counts (the line that starts “Total Reads Aligned MAPQ >= 10…”)?

--Bill
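
One way to answer the "missing versus unmapped" question is to count records and check the unmapped bit (0x4) in each FLAG. The sketch below works on SAM text lines, for example the output of `samtools view` on the BAM; the function name and the sample records are made up for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def summarize_sam(lines):
    """Return (total_records, unmapped_records) for an iterable of SAM lines."""
    total = 0
    unmapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        total += 1
        if flag & UNMAPPED:
            unmapped += 1
    return total, unmapped


# Tiny made-up example: one header line, one mapped read, one unmapped read.
example = [
    "@SQ\tSN:contig1\tLN:1000\n",
    "read1\t0\tcontig1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
print(summarize_sam(example))
```

If the total is close to the number of input reads but most are unmapped, reads are failing to map; if the total itself is tiny, reads are being lost before output.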


@ctb

ctb commented Nov 1, 2015

tl; dr? I can chase down the error messages if you really want, but I think the root problem is #60.

Longer version:

I was using transrate, and getting very low mapping stats. The version of transrate that I was using comes with snap 1.0b18, which had the above error message. @blahah thought it might be the problem behind the low mapping rate, so he created this issue.

In the meantime, I compiled my own version of snap-aligner, identified as 1.0b20 (<= latest from github) and figured out that (variously) 1.0b20 crashed on 'snap-aligner paired' when used with the index that I'd created, and that 1.0b18 misbehaved in some way with the same index. At some point some snap-aligner command said, hey, I hate your FASTA header format, and so I shortened the headers in my reference transcriptome. transrate out-of-the-box (with snap 1.0b18) now maps a decent number of reads and all is well.

If you want error messages or verification, I am happy to provide them, but I suspect the root cause is the length of my FASTA headers, which is documented in #60. If you fix that I can re-run everything with the original data and verify that it solves all the problems!
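
For anyone hitting the same thing, shortening the headers can be scripted. This is a sketch of one possible approach, not necessarily what was done here: it keeps only the first whitespace-delimited token of each header line, and it refuses to continue if that would make two names collide. The thread does not state an exact length limit, so none is assumed.

```python
def shorten_fasta_headers(lines):
    """Yield FASTA lines with each header cut down to its first token."""
    seen = set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            if name in seen:
                raise ValueError(f"shortening creates a duplicate name: {name!r}")
            seen.add(name)
            yield ">" + name
        else:
            yield line  # sequence lines pass through unchanged


# Made-up records for illustration.
example = [
    ">contig_1 len=1200 path=[1:0-400 2:401-1199] some long description\n",
    "ACGTACGT\n",
    ">contig_2 len=800\n",
    "TTGGCCAA\n",
]
for out in shorten_fasta_headers(example):
    print(out)
```

The index has to be rebuilt from the rewritten FASTA for the shorter names to take effect.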

@bolosky
Contributor

bolosky commented Nov 1, 2015

OK. I have a fix for that that’s almost ready to go. I’ll try to get it checked in tomorrow.


@ctb

ctb commented Nov 1, 2015

great!

On Sun, Nov 01, 2015 at 10:39:19AM -0800, Bill Bolosky wrote:

> OK. I have a fix for that that's almost ready to go. I'll try to get it checked in tomorrow.

@bolosky
Contributor

bolosky commented Nov 3, 2015

I pushed a fix for very long contig names in beta.21 (and dev.91). You should try that and see if it helps.

