My final goal is to make kallisto|bustools ("kb count") take a BAM file as input. Since kb requires FASTQ input, I have to convert BAM to FASTQ first.
The original FASTQ files for this sample look like:
SRR6470906_S1_L001_R1_001.fastq.gz
SRR6470906_S1_L001_R2_001.fastq.gz
SRR6470906_S1_L002_R1_001.fastq.gz
SRR6470906_S1_L002_R2_001.fastq.gz
Cell Ranger produces a single BAM file from them, and its samtools flagstat output is:
244560805 + 0 in total (QC-passed reads + QC-failed reads)
244560805 + 0 primary
0 + 0 secondary
0 + 0 supplementary
109475755 + 0 duplicates
109475755 + 0 primary duplicates
238794765 + 0 mapped (97.64% : N/A)
238794765 + 0 primary mapped (97.64% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
The BAM appears to be single-end: when I try to convert it to two FASTQ files (samtools fastq -1 ... -2 ...), both output files are empty. My conversion command is therefore:
samtools sort -n --threads 64 -O bam possorted_genome_bam.bam | samtools fastq --threads 64 > converted_906.fastq.gz
This generates a single FASTQ file, but "kb count" requires at least two FASTQ files and throws an error otherwise.
Is there a way to generate the proper FASTQ files (ideally, all four) from Cell Ranger's BAM, or is it a dead end?
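A likely reason the BAM looks single-end is that Cell Ranger aligns only the cDNA read and stores the barcode and UMI from the other read as tags on each record (CR/CY for the raw barcode and its qualities, UR/UY for the raw UMI and its qualities). The sketch below is one way to check whether those tags are present. The helper name show_barcode_tags is made up for illustration.

```shell
# Hypothetical helper: print the raw barcode/UMI tags (CR, CY, UR, UY)
# found in tab-separated SAM records read from stdin, one tag per line.
show_barcode_tags() {
    tr '\t' '\n' | grep -E '^(CR|CY|UR|UY):Z:'
}

# Intended use (requires samtools; not run here):
# samtools view possorted_genome_bam.bam | head -n 1 | show_barcode_tags
```

If the tags show up, the information needed to rebuild the barcode/UMI read is still in the BAM, even though flagstat reports no paired reads.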
Update
I followed ATpoint's advice and used 10x's bamtofastq utility:
/usr/bin/time bamtofastq_linux --nthreads=64 ./SRR6470906_S1/possorted_genome_bam.bam ./SRR6470906_S1_FASTQ_converted
It creates 12 FASTQ files:
$ ls -l SRR6470906_S1_FASTQ_converted/*
SRR6470906_S1_FASTQ_converted/SRR6470906_S1_0_1_HL73JBCXY:
total 15199836
-rw-r--r-- 1 flow flowuser 2065966716 Mar 29 13:54 bamtofastq_S1_L002_R1_001.fastq.gz
-rw-r--r-- 1 flow flowuser 2063237359 Mar 29 14:03 bamtofastq_S1_L002_R1_002.fastq.gz
-rw-r--r-- 1 flow flowuser 1571415091 Mar 29 14:09 bamtofastq_S1_L002_R1_003.fastq.gz
-rw-r--r-- 1 flow flowuser 3517361843 Mar 29 13:54 bamtofastq_S1_L002_R2_001.fastq.gz
-rw-r--r-- 1 flow flowuser 3553693269 Mar 29 14:03 bamtofastq_S1_L002_R2_002.fastq.gz
-rw-r--r-- 1 flow flowuser 2792919205 Mar 29 14:09 bamtofastq_S1_L002_R2_003.fastq.gz
SRR6470906_S1_FASTQ_converted/SRR6470906_S1_0_1_HLFGJBCXY:
total 11609908
-rw-r--r-- 1 flow flowuser 2026939869 Mar 29 13:57 bamtofastq_S1_L002_R1_001.fastq.gz
-rw-r--r-- 1 flow flowuser 2020845366 Mar 29 14:07 bamtofastq_S1_L002_R1_002.fastq.gz
-rw-r--r-- 1 flow flowuser 291803225 Mar 29 14:09 bamtofastq_S1_L002_R1_003.fastq.gz
-rw-r--r-- 1 flow flowuser 3468170873 Mar 29 13:57 bamtofastq_S1_L002_R2_001.fastq.gz
-rw-r--r-- 1 flow flowuser 3500377653 Mar 29 14:07 bamtofastq_S1_L002_R2_002.fastq.gz
-rw-r--r-- 1 flow flowuser 580369333 Mar 29 14:09 bamtofastq_S1_L002_R2_003.fastq.gz
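As a sanity check on the conversion, the number of reads in the converted files can be compared with the total reported for the BAM. This is a minimal sketch; the helper name count_fastq_reads is made up, and it assumes every FASTQ record spans exactly four lines.

```shell
# Hypothetical helper: count reads across one or more gzip-compressed
# FASTQ files, assuming exactly 4 lines per record.
count_fastq_reads() {
    echo $(( $(gzip -dc "$@" | wc -l) / 4 ))
}

# Intended use (not run here):
# count_fastq_reads SRR6470906_S1_FASTQ_converted/*/*_R2_*.fastq.gz
```

If each BAM record corresponds to one original read pair (an assumption), the R2 total should be close to the 244560805 reads reported for the BAM above.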
My new question is: why do the original files include both L001 and L002, while the converted files have only L002?
There's no point in separating lanes 1 and 2. It's fine if they are combined.
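Since combining lanes is fine, one option is to pass all converted files to a single "kb count" run, listed as R1/R2 pairs. The sketch below only builds that ordered list. The helper name list_fastq_pairs is made up, and the index, transcript-to-gene map, output directory, and technology string in the commented command are placeholders that would need to match the actual setup.

```shell
# Hypothetical helper: list every R1 file under a directory followed by
# its R2 mate, one path per line, in sorted order (R1, R2, R1, R2, ...).
list_fastq_pairs() {
    find "$1" -type f -name '*_R1_*.fastq.gz' | sort | while IFS= read -r r1; do
        # Swap the last "_R1_" in the path for "_R2_" to get the mate.
        r2="${r1%_R1_*}_R2_${r1##*_R1_}"
        if [ -f "$r2" ]; then
            printf '%s\n%s\n' "$r1" "$r2"
        else
            echo "missing mate for $r1" >&2
        fi
    done
}

# Intended use (placeholders: index.idx, t2g.txt, kb_out, 10xv2; not run here):
# kb count -i index.idx -g t2g.txt -x 10xv2 -o kb_out \
#     $(list_fastq_pairs SRR6470906_S1_FASTQ_converted)
```

The unquoted command substitution relies on the paths containing no whitespace, which holds for the file names shown above.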