Hey everyone, I'm new here, so apologies if my question is broad or a bit obvious. I'm a biotechnology graduate doing an internship for the summer. I have some limited experience in bioinformatics, but I've spent the last month learning QIIME, reading papers and getting a sense of 16S rRNA analysis.
I am looking to create a library of 10 pathogenic bacteria. The aim is to identify accurately whether a 16S rRNA sample is one of these 10 or not.
I was thinking the best way to do this would be to get 100 sequences for each of the species of interest (e.g. E. coli, Klebsiella) from SILVA and run an MSA. I've done this with Seaview. From this MSA I would like to create a representative 16S rRNA sequence for each species (think of it as an average of all of them). From the 100-sequence MSA, I was hoping to identify the conserved and variable regions for each species, which I could then use to accurately identify whether the sample contains one of the 10 bacterial species or not.
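To make the "representative sequence plus conserved/variable regions" idea concrete, here is a minimal Python sketch. It assumes the MSA has already been loaded as a list of equal-length aligned strings, takes the most common character in each column as the consensus, and uses per-column Shannon entropy as a rough measure of variability. The function names, the toy alignment and the 0.5-bit threshold are all made up for illustration, and gaps are simply counted as a fifth character, which is a simplification.

```python
from collections import Counter
from math import log2


def column_profile(aligned_seqs):
    """Return (consensus, entropies) for a list of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    entropies = []
    for i in range(length):
        # Count every character in this column (gap '-' counts as a character).
        counts = Counter(s[i].upper() for s in aligned_seqs)
        total = sum(counts.values())
        # Majority-rule consensus: the most frequent character in the column.
        consensus.append(counts.most_common(1)[0][0])
        # Shannon entropy in bits: 0 means the column is fully conserved.
        entropies.append(-sum((n / total) * log2(n / total) for n in counts.values()))
    return "".join(consensus), entropies


def variable_positions(entropies, threshold=0.5):
    """0-based column indices whose entropy exceeds the (arbitrary) threshold."""
    return [i for i, h in enumerate(entropies) if h > threshold]


# Toy alignment, invented for illustration only (not real 16S data).
toy_alignment = [
    "ACGTAC-GT",
    "ACGTACAGT",
    "ACGAACAGT",
    "ACGTACAGT",
]

consensus, entropies = column_profile(toy_alignment)
print(consensus)
print(variable_positions(entropies))
```

On a real alignment, runs of consecutive low-entropy columns would be candidate conserved regions and runs of high-entropy columns candidate variable regions. A consensus built within one species says nothing by itself about how well that species separates from the other nine, so the same profile would also need to be compared across species.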
I was wondering how best to go about this. I found a 2012 paper by Nagar et al. entitled "Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment".
In it, they use some R packages, including "QuasiAlign". Following the links provided, it appears that these tools were last updated in 2015 and are no longer supported by R.
I then tried to look up more recent papers, but nothing of interest popped up.
I was wondering if you would be so kind as to share your knowledge and give some constructive criticism on how best to go about this.
In conclusion: I want to identify the conserved and variable 16S rRNA regions of 10 different pathogenic bacteria, so I can accurately identify whether they are in a sequenced sample or not.
Thanks a lot - Rob :)
QIIME2, Mothur and DADA2 all implement functions for assigning taxonomic labels to sequences. If you want to assign taxonomic labels to your reads using your custom library as the reference, you can just use one of those. For example, the assignTaxonomy function in DADA2 provides an implementation of the Naïve Bayesian Classifier for this purpose. Further, the 'minBoot' parameter sets the minimum bootstrap confidence required before a label is assigned, so it can be used to control how confident the assignments have to be. Does this help?
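To show roughly what a naive Bayesian classifier with a bootstrap cutoff does, here is a toy Python sketch. It is not DADA2's implementation and does not call DADA2. It only follows the general RDP-style idea as I understand it: 8-mer words, per-label word probabilities, and a confidence value from repeatedly re-classifying a random subset of the query's words. All function names, the `min_boot` argument, the labels "species_A" and "species_B", and the random reference sequences are invented for illustration.

```python
import random
from collections import Counter
from math import log

K = 8  # word (k-mer) length


def kmers(seq, k=K):
    """Set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(references):
    """references maps label -> list of sequences; returns the word-count model."""
    word_counts = Counter()   # number of reference sequences containing each word
    label_counts = {}         # label -> (per-label word counts, number of sequences)
    for label, seqs in references.items():
        counts = Counter()
        for s in seqs:
            counts.update(kmers(s))
        label_counts[label] = (counts, len(seqs))
        word_counts.update(counts)
    total = sum(n for _, n in label_counts.values())
    return word_counts, label_counts, total


def best_label(words, model):
    """Label with the highest summed log-probability over the given words."""
    word_counts, label_counts, total = model
    scores = {}
    for label, (counts, m) in label_counts.items():
        score = 0.0
        for w in words:
            prior = (word_counts[w] + 0.5) / (total + 1)
            score += log((counts[w] + prior) / (m + 1))
        scores[label] = score
    return max(scores, key=scores.get)


def classify(seq, model, min_boot=50, n_boot=100, seed=0):
    """Return (label or None, bootstrap support in percent)."""
    words = sorted(kmers(seq))
    if not words:
        return None, 0.0
    best = best_label(words, model)
    rng = random.Random(seed)
    n_sub = max(1, len(words) // K)  # re-classify on about 1/8 of the words
    hits = sum(best_label(rng.choices(words, k=n_sub), model) == best
               for _ in range(n_boot))
    boot = 100 * hits / n_boot
    # Below the cutoff, the query is left unassigned.
    return (best if boot >= min_boot else None), boot


# Toy reference library: random sequences, NOT real 16S data.
data_rng = random.Random(1)
references = {
    "species_A": ["".join(data_rng.choice("ACGT") for _ in range(200))],
    "species_B": ["".join(data_rng.choice("ACGT") for _ in range(200))],
}
model = train(references)

# A query cut directly from the species_A reference.
query = references["species_A"][0][20:180]
label, boot = classify(query, model)
print(label, boot)
```

For the "one of these 10 or not" question, note that a classifier trained only on the 10 target species has nothing to compare an unrelated read against, so the reference would presumably also need non-target sequences, with the bootstrap cutoff deciding when a read is left unassigned.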