I have 15 seabird gut fecal samples and 30 sediment samples that were sequenced on an Illumina MiSeq (2 x 300 bp paired-end). The files I am working with are demultiplexed FASTQ files, and we expected a MAXIMUM of 50K reads per sample. When I ran MultiQC, I noticed that the read counts for many samples were very low, while some samples had much higher read counts than we expected. DNA was isolated from the samples using a commercially available kit, and 16S V4-V5 primers were used for sequencing. Here is a short breakdown:
- 4/15 birds received read counts of ~2,000, 18,000, 32,000 and 16,000.
- All the other birds received ~100K reads, and 1 bird received ~1M reads??
- Only 2 sediment samples reached over 50K reads.
- Sediments had an average read count of ~27,000.
- Most sediment samples only reached around 3,000 reads.
Essentially, the samples either went way over or stayed well under the expected maximum read count. Why did some samples go way over the expected maximum while others ended up with so few reads?
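In case it helps anyone reproduce the numbers outside MultiQC, here is a minimal sketch of how the per-sample read counts and their mean and median could be computed directly from the FASTQ files. It assumes standard 4-line FASTQ records (no wrapped sequence lines); the function names `count_reads` and `summarize` are hypothetical helpers, not part of MultiQC or QIIME 2.

```python
import gzip
import statistics
from pathlib import Path


def count_reads(fastq_path):
    """Count records in a FASTQ file, gzipped or plain.

    Assumes every record spans exactly 4 lines (typical for Illumina output).
    """
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def summarize(counts):
    """Return the mean and median read count for a {sample: count} mapping."""
    values = list(counts.values())
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
```

Comparing the mean with the median is useful here because a couple of very deep samples can pull the average far above what most samples actually received.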
It seems like the sediment samples I am working with did particularly badly. Some things to note:
- The overall quality check shows that many of the samples have more duplicated reads than unique reads.
- Overall, the Phred scores of all the samples stay well above 20.
My real question is whether there is an accepted read count threshold that I can use for my analysis. I realize a lot of my sediment samples will have to be tossed, but what is an acceptable minimum/maximum number of reads to consider if I want to continue my analysis with some of the samples? I am running the samples through the QIIME 2 workflow.
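To make the trade-off concrete, here is a rough sketch of how I could tabulate which samples survive a given minimum read count. The four counts are the approximate low bird values listed above; the sample names and the candidate thresholds are made up purely for illustration and are not recommended cutoffs.

```python
def samples_retained(counts, thresholds):
    """For each threshold, list the samples with at least that many reads."""
    return {
        t: sorted(name for name, n in counts.items() if n >= t)
        for t in thresholds
    }


# Approximate read counts for the four low-depth birds (names are hypothetical).
bird_counts = {
    "bird_A": 2000,
    "bird_B": 18000,
    "bird_C": 32000,
    "bird_D": 16000,
}

# Candidate minimum read counts, chosen only to illustrate the tabulation.
candidate_thresholds = [1000, 5000, 10000, 20000]

retained = samples_retained(bird_counts, candidate_thresholds)
for threshold, names in retained.items():
    print(f"min reads {threshold}: {len(names)} of {len(bird_counts)} samples kept")
```

The same tabulation over all 45 samples would show how many sediment samples each candidate cutoff would cost.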
Thanks!