Hi,
I have some data for which the library prep isn't known. We're doing QC on it and trying to determine how contamination filtering should be done. I'm wondering if there is any way to infer whether rRNA depletion or mRNA enrichment was done from artifacts within the sample data.
Thank you, Synanth
If you have rRNA sequences for the organism you are working with, then you could use bbduk.sh in filter mode (guide: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/ ) to look for reads that match the sequences. You could also try to predict the rRNA sequences (if you are working with a unique organism) by using one of the programs listed in "Tools for rRNA gene annotations".
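As a rough sketch of what that filter-mode run could look like, the Python helper below only assembles the bbduk.sh command line; it does not run it. The file names are placeholders I made up, and k=31 is just the k-mer length used in the examples in the BBDuk guide, so adjust both to your data. The in=, ref=, out=, outm=, stats= and k= parameters are the documented key=value options: outm= collects reads that match the reference and out= collects the rest.

```python
import shlex


def bbduk_filter_command(reads, ref, matched, unmatched, stats, k=31):
    """Build a bbduk.sh argument list for a filter-mode screen.

    All paths are caller-supplied placeholders; nothing is executed here.
    """
    return [
        "bbduk.sh",
        f"in={reads}",       # input reads
        f"ref={ref}",        # rRNA reference FASTA
        f"outm={matched}",   # reads sharing a k-mer with the reference
        f"out={unmatched}",  # reads with no match to the reference
        f"stats={stats}",    # per-reference match statistics
        f"k={k}",            # k-mer length used for matching
    ]


# Hypothetical file names, for illustration only.
cmd = bbduk_filter_command(
    reads="sample.fastq.gz",
    ref="rRNA.fa",
    matched="sample.rRNA.fastq.gz",
    unmatched="sample.clean.fastq.gz",
    stats="sample.rRNA_stats.txt",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run(cmd, check=True) on a machine where BBTools is on the PATH.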
It's human data, and I've been using fastq_screen to determine where the reads are from. Fastq_screen shows where reads potentially multimap against more than one genome, but the amount of overlap seems to vary widely between samples. I want to try to infer the library prep to see if rRNA depletion was done shoddily on some samples, so that I know how to proceed with the filtering criteria.
Contamination with rRNA and contamination from some other organism are two different things. You can find the sequence of the complete human rDNA repeat in a prior thread ("how can i download human ribosomal reference?"). That should be good enough for the screen with bbduk.sh. If the reads do not turn out to be rRNA, then you will have to move on to the other possibilities.
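To compare samples after such a screen, one option is to compute the fraction of reads that matched the rDNA reference in each sample. The sketch below is only an illustration: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the sample names and read counts in the usage part are invented placeholders, not real results.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly four lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def rrna_fraction(matched_reads, total_reads):
    """Fraction of reads that matched the rRNA reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return matched_reads / total_reads


# Invented counts for illustration: sample -> (rRNA-matched reads, total reads).
# In practice these would come from count_fastq_reads() on the outm= and in= files.
counts = {
    "sampleA": (50_000, 1_000_000),
    "sampleB": (400_000, 1_000_000),
}

# List samples from highest to lowest rRNA fraction.
for name, (matched, total) in sorted(
    counts.items(), key=lambda item: rrna_fraction(*item[1]), reverse=True
):
    print(f"{name}\t{rrna_fraction(matched, total):.1%}")
```

Samples whose rRNA fraction stands well above the rest would be the ones to look at more closely. What counts as "too high" depends on the protocol, so no cutoff is suggested here.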