Hi all,
I was wondering if I'm missing something obvious: samtools can filter a BAM file on many criteria (such as flags, tags, qlen, etc.), but what is the correct way to get rid of chimeric mappings (at least the type where R1 and R2 map to different chromosomes)? I can come up with a one-liner (decode to text, check the 7th column, encode back to binary), but I think this should be built into samtools. Am I wrong?
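For reference, the one-liner described above could look roughly like the sketch below. It assumes the 7th SAM column (RNEXT) is "=" or a repeat of RNAME when the mate is on the same reference, and "*" when there is no mate reference (for example single-end reads), so those records are kept. The function name and file names are made up for illustration.

```shell
# Hypothetical helper: reads SAM text on stdin, writes SAM text on stdout.
# Keeps header lines, records whose mate is on the same reference
# (RNEXT is "=" or repeats RNAME), and records with no mate reference
# (RNEXT is "*"). Everything else (mate on another reference) is dropped.
drop_interchromosomal() {
    awk -F '\t' '/^@/ || $7 == "=" || $7 == "*" || $7 == $3'
}

# Usage sketch (file names are placeholders; needs samtools on PATH):
#   samtools view -h infile.bam | drop_interchromosomal | samtools view -b -o filtered.bam -
```

Note that this filters each record on its own RNEXT value, so both mates of an inter-chromosomal pair are removed, while unpaired records pass through untouched.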
I am working with hisat2 BAMs, and no flags are set from what I can tell. This is one of the mappings:
SRR12813830.114509898 65 NZ_JAHRBL010000049.1 1405 1 46M NZ_WJPN01000041.1 1609 0 TCCTGGTGGTGCCCTTCCGTCAATTCCTTTAAGTTTCAGCTTTGCA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:46 YT:Z:UP NH:i:18
That example read (flag 65, i.e. paired and first in pair) is missing the 'mapped in proper pair' flag (0x2), which is generally not set when the mate is on a different chromosome. You should be able to drop those reads by keeping only properly paired ones with
samtools view -f2 infile.bam
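To make the flag arithmetic concrete, here is a small sketch that decodes the example read's flag with plain shell arithmetic. The helper name is made up; the bit values (0x1 paired, 0x2 proper pair, 0x40 first in pair) come from the SAM specification.

```shell
# Hypothetical helper: succeeds if every bit of the mask ($2) is set in the flag ($1).
has_flag() {
    [ $(( $1 & $2 )) -eq "$2" ]
}

flag=65   # FLAG of the example record above
has_flag "$flag" 1  && echo "paired (0x1)"
has_flag "$flag" 64 && echo "first in pair (0x40)"
has_flag "$flag" 2  || echo "not a proper pair (0x2 unset), so -f 2 drops it"
```

If samtools is installed, running samtools flags 65 should give the same breakdown. Keep in mind that -f 2 is stricter than a chromosome check: unpaired reads never carry 0x2, so they are dropped as well.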

I wanted to retain single-end mappings though. I am very surprised that there are no specific options in samtools to filter chimeric reads (I guess there's a flag that needs to be set, so technically that's the mapper's fault, but still).

What is the relevance of the CRAM and SAM tags here?

No special relevance, but the question applies to all three formats, so I decided to add them just for the hell of it.