Hi all! I have a little problem: I have one BAM file (about 44 GB) that contains the reads from 11 different samples. I also have 2 txt files with sample names and a lot of tab-delimited numbers.
How can I split this single BAM file into 11 different BAM files?
Is it correct to use the following code? samtools view -bhR readids_for_sample_A.txt File.bam > File_A.bam
Do you have @RG tags in your BAM? See: Split a multisample bam using RG tag information
I'm not sure that my txt files contain the RG tags, because this is the information in my files:
Any ideas? Could this work like RG tags? How can I produce 11 separate files?
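Whether the command in the question applies depends on what the txt files hold: the `-R` option of `samtools view` expects a file of read group IDs, one per line, not read names. A minimal sketch for checking this, assuming the header was first saved with `samtools view -H File.bam > header.sam` (the function name and file names here are hypothetical):

```shell
# Print the entries of an ID list that match an @RG ID in a SAM header.
# $1: header text file, e.g. from `samtools view -H File.bam > header.sam`
# $2: candidate ID list; only the first tab-separated column is checked
matching_rg_ids() {
    awk -F'\t' '
        NR == FNR {                       # first file: the SAM header
            if ($1 == "@RG")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^ID:/) rg[substr($i, 4)] = 1
            next
        }
        ($1 in rg) { print $1 }           # second file: the ID list
    ' "$1" "$2"
}

# Usage (not run here):
#   samtools view -H File.bam > header.sam
#   matching_rg_ids header.sam readids_for_sample_A.txt
```

If nothing is printed, the file probably does not list read group IDs, and `-R` would select no reads.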
Could you run samtools view input.bam | head -1 and post the result?
this is the output
ok, so for this read, the read group is l7izc? How come rg is not capitalized? How was demultiplexing done?
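A single record only shows one read group. A rough sketch for tallying the standard `RG:Z:` tag over a sample of records is given here; the function name is hypothetical, and it deliberately ignores a lowercase `rg` tag, which the SAM format treats as a user-defined tag rather than the read group:

```shell
# Count SAM records per RG:Z: tag value; records are read from stdin.
# Optional tags start at column 12 of a SAM record.
count_reads_per_rg() {
    awk -F'\t' '
        {
            rg = "(none)"
            for (i = 12; i <= NF; i++)
                if ($i ~ /^RG:Z:/) rg = substr($i, 6)
            n[rg]++
        }
        END { for (r in n) printf "%s\t%d\n", r, n[r] }
    ' | LC_ALL=C sort
}

# Usage (not run here): look at the first 100000 records only
#   samtools view input.bam | head -100000 | count_reads_per_rg
```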
I tried to convert my BAM file into a SAM file to understand it better. The complete header is:
[...]
So, as I understand it, I have 94 distinct IDs (from the complete SAM file), all from 1 sample (SM, right?), but I know that I have only 11 samples. Am I right?
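The count of read groups per SM value can be checked directly from the header instead of by eye. A small sketch, assuming the header is piped in from `samtools view -H` (the function name is hypothetical):

```shell
# Count @RG header lines per SM (sample) value; header is read from stdin.
count_rgs_per_sample() {
    awk -F'\t' '
        $1 == "@RG" {
            sm = "(no SM)"
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) sm = substr($i, 4)
            n[sm]++
        }
        END { for (s in n) printf "%s\t%d\n", s, n[s] }
    ' | LC_ALL=C sort
}

# Usage (not run here):
#   samtools view -H File.bam | count_rgs_per_sample
```

A single output line with a count of 94 would confirm that the header assigns every read group to the same sample, so the SM field alone cannot separate the 11 samples.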
Are those chromosomes in the ID by any chance? i.e. samples split into chromosomes?
If so, should I not have 24*11 IDs?
So one would think. Have you looked through the collection of 94 to see if there is a pattern consistent with all?
every RG is like this:
The only thing that changes is the ID. The HD line is:
Any idea?
Then the program I mentioned should work and give you 1 bam file per sample while going through the bam file once.
OK, now I have 94 BAM files, but I have 11 samples. Any idea how I can get 1 file per sample?
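If each read group belongs to exactly one sample (which this thread does not establish), one possible route is to group the read group IDs by sample and extract each group with `samtools view -R`. The sketch below assumes a hypothetical two-column, tab-separated mapping file (read group ID, sample name) that would have to be built from other information; it only writes the per-sample ID lists and prints the commands, so they can be reviewed before running:

```shell
# Write one RG list per sample and print one `samtools view` command each.
# $1: mapping file (hypothetical): RG_ID <TAB> sample_name, no spaces in names
# $2: input BAM
# $3: directory for the per-sample RG lists (should start empty)
print_per_sample_commands() (
    map=$1; bam=$2; outdir=$3
    mkdir -p "$outdir"
    # Route each RG ID into the list file of its sample.
    awk -F'\t' -v dir="$outdir" \
        'NF >= 2 { print $1 > (dir "/" $2 ".rgs.txt") }' "$map"
    for list in "$outdir"/*.rgs.txt; do
        sample=$(basename "$list" .rgs.txt)
        printf 'samtools view -b -R %s -o %s.bam %s\n' "$list" "$sample" "$bam"
    done
)

# Usage (not run here): review the commands, then pipe them to sh
#   print_per_sample_commands rg_to_sample.tsv File.bam rg_lists
#   print_per_sample_commands rg_to_sample.tsv File.bam rg_lists | sh
```

This reads the large BAM once per sample, which is slow but avoids merging 94 intermediate files.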
the program creates one file per RG in the header, can you do an 'ls -al' in your directory?
This is the 'ls -al' output.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. Use Submit Answers only for new answers to the original question.
Those do not look like chromosome names after all, and it does not look like the samples were split.
Yep, you're right! It's a strange output: I have 1 big BAM file, and the others are very small. But this is all the information I have. Isn't there any process I can use to split this file?
Was this data produced by Torrent Suite? Perhaps you can export individual samples from there?
Yes, this data was produced by Torrent Suite. Honestly, I don't know; I only know that this file comes from a specific analysis with a specific workflow. They asked me if I could analyze this file, but for my analysis I need 1 file per sample. I tried to analyze the entire single file, but I couldn't. Is there perhaps a script, R package, or similar for analyzing a file like this? In particular for aneuploidy research.