If you use the option |
-
Hello,
I am trying to run DIAMOND blastp against a custom-made database. My collection consists of about 60,000 genomes, and I want to build a database against which I can search protein sequences. I tried two approaches, which give very different results:
1. I create a separate .dmnd database file for each genome. This produces a folder with about 60,000 .dmnd files, which I then query one by one with my multi-FASTA query file (.faa) containing my query sequences. I collate all hits below a set e-value threshold into a single result file, which I then deduplicate by subject sequence name. This gives a good result, but the process is very slow.
2. I combine the .faa files of all 60,000 genomes into a single .faa file and build one DIAMOND database from it. When I run diamond blastp against this .dmnd file, I only get a single hit for each entry in my multi-FASTA query file. This is much faster than option 1, but I suspect that is only because I get one hit per query sequence.
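A minimal Python sketch of option 2, assuming the `diamond` executable is on the PATH. The relevant flag is `-k` (long form `--max-target-seqs`): DIAMOND caps the number of targets it reports per query, and `-k 0` asks it to report every target with an alignment that passes `--evalue`. If the existing command sets `-k 1` or `--max-target-seqs 1`, that alone would explain one hit per query. The helper names (`concatenate_faa`, `makedb_cmd`, `blastp_cmd`) and the file names are hypothetical.

```python
import subprocess


def concatenate_faa(faa_paths, combined_path):
    """Concatenate per-genome .faa files into one FASTA file."""
    with open(combined_path, "wb") as out:
        for path in faa_paths:
            with open(path, "rb") as src:
                data = src.read()
            out.write(data)
            # A file without a trailing newline would otherwise glue its last
            # sequence line onto the next file's header line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")


def makedb_cmd(combined_faa, db_prefix):
    """Command to build a single DIAMOND database from the combined FASTA."""
    return ["diamond", "makedb", "--in", str(combined_faa), "-d", str(db_prefix)]


def blastp_cmd(query_faa, db_prefix, out_tsv, evalue=1e-5, threads=None):
    """Command to search all queries against the combined database."""
    cmd = [
        "diamond", "blastp",
        "-q", str(query_faa),
        "-d", str(db_prefix),
        "-o", str(out_tsv),
        "--outfmt", "6",          # tabular BLAST-style output
        "--evalue", str(evalue),  # report only alignments at or below this e-value
        "-k", "0",                # 0 = report all targets, not a capped number per query
    ]
    if threads is not None:
        cmd += ["-p", str(threads)]
    return cmd


def run(cmd):
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

Usage, with hypothetical paths: `run(makedb_cmd("all_genomes.faa", "all_genomes"))`, then `run(blastp_cmd("queries.faa", "all_genomes", "hits.tsv", evalue=1e-5))`. Note that e-values scale with database size, so the same alignment gets a larger e-value against the combined database than against a single-genome database. Hits close to the threshold in option 1 may therefore fall above it in option 2.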
How can I get results similar to option 1 from the single combined database? I would like to get all hits with an e-value below the threshold, not just the top one.
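The collation step from option 1 (keep hits under an e-value threshold, then deduplicate by subject name) can be applied to the single tabular output as well. The sketch below assumes the default 12-column `--outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). When a subject is hit more than once, it keeps the row with the lowest e-value; that tie-breaking rule and the name `collect_hits` are assumptions for illustration.

```python
import csv

SSEQID_COL = 1   # 2nd column: subject sequence ID
EVALUE_COL = 10  # 11th column: e-value (default --outfmt 6 layout)


def collect_hits(tsv_path, evalue_threshold):
    """Return hits at or below the threshold, one row per subject (lowest e-value wins)."""
    best = {}
    with open(tsv_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            evalue = float(row[EVALUE_COL])
            if evalue > evalue_threshold:
                continue
            subject = row[SSEQID_COL]
            if subject not in best or evalue < float(best[subject][EVALUE_COL]):
                best[subject] = row
    return list(best.values())
```

Deduplicating on the subject alone discards which other queries hit that subject. Keying `best` on `(row[0], row[SSEQID_COL])` instead would keep one row per query-subject pair.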
Thanks in advance