Database creation issue #703

Cyanoney · 2023-05-18T15:48:44Z

Hello,

I am trying to run diamond blastp against a custom-made database. My collection consists of about 60,000 genomes, and I want to build a database against which I can blast protein sequences. I tried two different approaches, which give me very different results:

1. I create a separate .dmnd database file for each genome. This produces a folder with about 60,000 .dmnd files, which I then query one by one with my multi-FASTA .faa file containing my query sequences. I collate all hits below a set e-value threshold into a single result file, which is then deduplicated by subject sequence name. This produces a good result, but the process is very slow.
2. I combine all of the .faa files of the 60,000 genomes into a single .faa file and generate a Diamond database from it. I then run diamond blastp against this .dmnd file, but I only get a single hit for each entry in my multi-FASTA query file. This process is much faster than the first one, but I suspect that is because I only get a single hit per query sequence.
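The collation step in option 1 (filter by e-value, then deduplicate by subject name) could be sketched roughly as follows. This is only an illustration, not the poster's actual script: it assumes the hits are in BLAST tabular format (outfmt 6) with the default twelve columns, where the subject ID is the second field and the e-value the eleventh, and it assumes that "deduplicated" means keeping the hit with the lowest e-value per subject. The function name `collate_hits` is hypothetical.

```python
def collate_hits(lines, max_evalue=1e-5):
    """Filter tabular hits by e-value and keep one hit per subject.

    Assumes default outfmt 6 columns: field 1 is the subject ID,
    field 10 is the e-value (both zero-based).
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        subject = fields[1]
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue  # hit is above the threshold
        # Keep the lowest e-value seen so far for this subject.
        if subject not in best or evalue < best[subject][0]:
            best[subject] = (evalue, fields)
    return [fields for _, fields in best.values()]
```

The same function would apply unchanged to the output of a single combined-database search, since it only looks at the result rows and not at how they were produced.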

How can I produce a result similar to what I get with option 1? I would like to get all hits with an e-value below the threshold, not just the top one.

Thanks in advance

bbuchfink · 2023-05-30T11:07:27Z

Maintainer

If you use the option -k0 (short for --max-target-seqs 0), you should get all alignments for a query.
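As a rough sketch of how this could look for the combined-database approach, the snippet below only assembles the two command lines and prints them. The file names (`all_genomes.faa`, `all_genomes`, `query.faa`, `hits.tsv`) and the e-value cutoff are placeholders chosen for illustration, and the helper function names are hypothetical. The flags used are `--in`, `-d`, `-q`, `-o`, `-e`, `-k` and `--outfmt`.

```python
import shlex


def makedb_cmd(fasta, db):
    """Command to build one DIAMOND database from a combined FASTA file."""
    return ["diamond", "makedb", "--in", fasta, "-d", db]


def blastp_cmd(db, query, out, evalue=1e-5):
    """Command to search the database and report all targets per query.

    -k 0 lifts the per-query target limit; -e sets the e-value cutoff.
    """
    return [
        "diamond", "blastp",
        "-d", db,
        "-q", query,
        "-o", out,
        "-e", str(evalue),
        "-k", "0",
        "--outfmt", "6",
    ]


# Placeholder file names, for illustration only.
for cmd in (
    makedb_cmd("all_genomes.faa", "all_genomes"),
    blastp_cmd("all_genomes", "query.faa", "hits.tsv"),
):
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or each list can be passed to `subprocess.run` directly.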
