Hi!
I am trying to do taxonomic annotation of rpS3 proteins from a metagenomic assembly. I used BLAST against the NCBI nr database for the taxonomic annotation, with the option -max_target_seqs 3 to keep the first three matches in the nr database. The output looks like this:
NODE_3855_length_8418_cov_6.988521_14 NCO28665.1 100 220 0 0 1 220 1 220 3.36E-155 439 30S ribosomal protein S3 [Caldiserica bacterium]
NODE_3855_length_8418_cov_6.988521_14 NIA10752.1 70.721 222 62 1 1 219 1 222 9.83E-111 327 30S ribosomal protein S3 [Nitrospiraceae bacterium]
NODE_3855_length_8418_cov_6.988521_14 PMP66757.1 71.429 217 62 0 1 217 1 217 1.39E-109 323 30S ribosomal protein S3 [Caldisericum exile]
I wanted to use the lowest common ancestor (LCA) to define the taxonomic annotation per contig. However, many contigs have hits from different phyla. In the example above, the contig has two hits to Caldiserica and one to Nitrospiraceae, so an LCA would classify the sequence only as Bacteria. Therefore, I am looking for software that assigns taxonomy from BLAST output based on a consensus of the hits (e.g. the taxon supported by most hits) rather than a strict LCA.
Thanks a lot!
You may want to look into kraken2 (LINK). While it can use the same nr database, it uses a different method to identify hits (exact k-mer matching rather than alignment).
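If no existing tool fits, the consensus idea from the question is small enough to sketch yourself. The Python below is a minimal majority-rule sketch, not the method of kraken2 or any published tool. It assumes each BLAST line ends with a title whose last bracketed text is the organism name, as in the output above, and it uses a hypothetical, hand-written LINEAGES dict where a real pipeline would look lineages up in the NCBI taxonomy.

```python
import re
from collections import Counter, defaultdict

# HYPOTHETICAL, hand-written lineages (domain, phylum, class, order, family,
# genus) for the organisms in the example. A real pipeline would look these
# up in the NCBI taxonomy instead of hard-coding them.
LINEAGES = {
    "Caldiserica bacterium": ("Bacteria", "Caldisericota"),
    "Nitrospiraceae bacterium": ("Bacteria", "Nitrospirota", "Nitrospiria",
                                 "Nitrospirales", "Nitrospiraceae"),
    "Caldisericum exile": ("Bacteria", "Caldisericota", "Caldisericia",
                           "Caldisericales", "Caldisericaceae", "Caldisericum"),
}

# Assumption: the organism is the last [...] in the subject title column.
ORGANISM = re.compile(r"\[([^\]]+)\]\s*$")

EXAMPLE = """\
NODE_3855_length_8418_cov_6.988521_14 NCO28665.1 100 220 0 0 1 220 1 220 3.36E-155 439 30S ribosomal protein S3 [Caldiserica bacterium]
NODE_3855_length_8418_cov_6.988521_14 NIA10752.1 70.721 222 62 1 1 219 1 222 9.83E-111 327 30S ribosomal protein S3 [Nitrospiraceae bacterium]
NODE_3855_length_8418_cov_6.988521_14 PMP66757.1 71.429 217 62 0 1 217 1 217 1.39E-109 323 30S ribosomal protein S3 [Caldisericum exile]
"""


def hits_per_query(blast_lines):
    """Collect the organism name of every hit, grouped by query (contig) ID."""
    hits = defaultdict(list)
    for line in blast_lines:
        fields = line.split()
        if not fields:
            continue
        match = ORGANISM.search(line)
        if match:
            hits[fields[0]].append(match.group(1))
    return hits


def consensus_lineage(organisms, min_fraction=0.6):
    """Majority-rule consensus: descend rank by rank while the most common
    taxon is supported by at least min_fraction of the hits.
    min_fraction > 0.5 keeps the majority unique; 1.0 is a strict LCA."""
    lineages = [LINEAGES[o] for o in organisms if o in LINEAGES]
    if not lineages:
        return ()
    consensus = []
    for rank in range(max(len(lin) for lin in lineages)):
        # Hits whose lineage stops above this rank count against every taxon.
        counts = Counter(lin[rank] for lin in lineages if len(lin) > rank)
        taxon, support = counts.most_common(1)[0]
        if support / len(lineages) < min_fraction:
            break
        consensus.append(taxon)
    return tuple(consensus)


if __name__ == "__main__":
    for query, organisms in hits_per_query(EXAMPLE.splitlines()).items():
        print(query, "; ".join(consensus_lineage(organisms)), sep="\t")
```

On the three hits above, this reports Bacteria; Caldisericota for the contig, whereas min_fraction=1.0 reproduces the strict LCA result (Bacteria only). With -max_target_seqs 3, a 2-of-3 vote is the finest decision available. Weighting each hit by its bitscore would be a natural refinement.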