Hello everyone. I am a beginner in bioinformatics, so my question might be a bit basic, but I've been looking for answers everywhere and can't seem to find them. I am currently trying to run a differential abundance analysis on some 16S rRNA sequencing data, using the R package DESeq2 to find differentially abundant taxa between my two study groups.
The problem is that the same microbial genus (I don't have species-level information) appears up to three times after filtering on adjusted p-value and log2 fold change. As far as I know, this makes sense: the statistical test is run on ASV abundances, and different ASVs can belong to the same genus. What I don't know is how to make biological sense of these results. This taxon has the highest log2 fold change in my treatment group, but how can I form a hypothesis from that if it is also differentially abundant in my control group?
The analysis uses a taxonomic classification from a classifier I trained myself in QIIME 2. Maybe I made some errors in that process, and a pre-trained classifier would give better results? I've also read online that some people do not recommend DESeq2 for this type of data and suggest other approaches instead, such as ANCOM-BC.
Thank you so much for your time.
- Muribaculaceae is a family, not a genus.
- Who said that ASVs with the same taxonomic classification must behave the same way?
- Without proper filtering, the extreme fold changes in DESeq2 are typically returned by features with too many zeros across your samples.
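
To make the last two points concrete, here is a minimal sketch in plain Python (not DESeq2 itself). It shows a prevalence filter that drops ASVs with too many zeros, and a helper that groups ASV-level log2 fold changes by their assigned taxon, so you can see whether ASVs with the same label move in the same direction. The function names, the 50% threshold, and all numbers are made up for illustration, so treat it as a sketch under those assumptions rather than a recommended pipeline.

```python
from collections import defaultdict


def prevalence_filter(counts, min_prevalence=0.2):
    """Keep ASVs with a nonzero count in at least `min_prevalence` of samples.

    `counts` maps an ASV id to its list of per-sample counts.
    The threshold is an assumption; choose it for your own design.
    """
    kept = {}
    for asv, row in counts.items():
        nonzero = sum(1 for c in row if c > 0)
        if nonzero / len(row) >= min_prevalence:
            kept[asv] = row
    return kept


def fold_changes_by_taxon(log2fc, taxonomy):
    """Group ASV-level log2 fold changes by the taxon each ASV was assigned to."""
    grouped = defaultdict(list)
    for asv, lfc in log2fc.items():
        grouped[taxonomy[asv]].append((asv, lfc))
    return dict(grouped)


# Toy data, invented for illustration: 3 ASVs across 6 samples.
counts = {
    "ASV1": [120, 98, 143, 10, 7, 12],  # present in every sample
    "ASV2": [0, 0, 0, 0, 0, 850],       # one sample only: prone to extreme fold changes
    "ASV3": [5, 0, 9, 60, 71, 55],      # present in 5 of 6 samples
}
taxonomy = {asv: "Muribaculaceae" for asv in counts}

filtered = prevalence_filter(counts, min_prevalence=0.5)
print(sorted(filtered))

# Hypothetical log2 fold changes for the ASVs that survived filtering.
log2fc = {"ASV1": -3.4, "ASV3": 2.9}
grouped = fold_changes_by_taxon(log2fc, taxonomy)
print(grouped)
```

In this toy example ASV2 is dropped because it is nonzero in only one of six samples, and the two remaining ASVs share a label but change in opposite directions, which is the situation described in the question. In R, the same kind of filter is usually applied to the count matrix before running DESeq2, for instance by keeping rows where `rowSums(counts(dds) > 0)` meets a minimum number of samples.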