Downstream rank duplicates #807

Closed
sunray1 opened this issue Mar 17, 2020 · 4 comments

@sunray1

sunray1 commented Mar 17, 2020

Recently I've been trying to get taxonomic information for a variety of taxa and I've noticed that taxize gets 'stuck' and eventually times out when there are different downstream taxonomic ranks with the same names.

For example, using the family 'Pieridae':

sp_out <- downstream("Pieridae", downto = "species", db = "ncbi", ambiguous = FALSE)

will not run because there is a subgenus and a genus (the genus containing the subgenus) called "Euchloe".

Running

sp_out <- downstream("Euchloe", downto = "species", db = "ncbi", ambiguous = FALSE)

returns

More than one UID found for taxon 'Euchloe'!

            Enter rownumber of taxon (other inputs will return 'NA'):

  status     rank    division scientificname    commonname    uid genus
1 active subgenus butterflies        Euchloe               415320      
2 active    genus butterflies        Euchloe little whites  72254      
  species subsp modificationdate
1               2016/01/21 00:00
2               2017/06/14 00:00

Neither choice works, and the call eventually times out. All other ranks within 'Pieridae' seem to work with the 'downstream' command just fine.

Thanks!

@sckott
Contributor

sckott commented Mar 17, 2020

thanks for your question @sunray1

i can replicate the issue. I'll have a look

@sckott sckott added this to the v0.9.93 milestone Mar 17, 2020
@sckott sckott added the bug label Mar 17, 2020
sckott added a commit that referenced this issue Mar 17, 2020
queries for certain ids, in this case two subgenera, returned the ids themselves plus the species within those subgenera,
so the while loop never ended
changed ncbi_downstream to remove from the search results any ids that were themselves part of the search
also changed to take unique rows after binding rows, to remove any duplicates
@sckott
Contributor

sckott commented Mar 17, 2020

please try again after restarting R, then reinstalling with remotes::install_github("ropensci/taxize")

ncbi was doing a weird thing: when searching on the two subgenera within Euchloe, it always returned the subgenera themselves along with the species below them, so the same ids got queried again and again and the loop was stuck indefinitely. should work now
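
For anyone curious how the loop got stuck, here is a rough base-R sketch of the behaviour described above. It is not the actual taxize code: `mock_children()`, `walk_down()`, the `fix` and `max_iter` arguments, and all ids are made up for illustration, and nothing here talks to NCBI. The idea is only that a query which hands back the queried ids will be re-queued forever unless those ids are dropped from the results.

```r
# Hypothetical stand-in for one "children of this id" query.
# Querying a subgenus returns the subgenus itself plus its species,
# mimicking the behaviour described in the comment above.
mock_children <- function(id) {
  tree <- list(
    genus1 = data.frame(id = c("subgenusA", "subgenusB"),
                        rank = "subgenus", stringsAsFactors = FALSE),
    subgenusA = data.frame(id = c("subgenusA", "sp1", "sp2"),
                           rank = c("subgenus", "species", "species"),
                           stringsAsFactors = FALSE),
    subgenusB = data.frame(id = c("subgenusB", "sp3"),
                           rank = c("subgenus", "species"),
                           stringsAsFactors = FALSE)
  )
  tree[[id]]
}

# Walk down from `start` until only rows of rank `downto` remain.
# `max_iter` is a safety cap added for this sketch so the unfixed
# variant stops with an error instead of running forever.
walk_down <- function(start, downto = "species", fix = TRUE, max_iter = 50) {
  todo <- start
  found <- list()
  iter <- 0
  while (length(todo) > 0) {
    iter <- iter + 1
    if (iter > max_iter) stop("no progress: queried ids keep coming back")
    res <- do.call(rbind, lapply(todo, mock_children))
    if (fix) {
      # Drop any ids that were part of the query itself.
      res <- res[!res$id %in% todo, , drop = FALSE]
    }
    # Drop duplicate rows after binding the results together.
    res <- unique(res)
    found[[iter]] <- res[res$rank == downto, , drop = FALSE]
    # Anything not yet at the target rank gets queried next round.
    todo <- res$id[res$rank != downto]
  }
  unique(do.call(rbind, found))
}

out <- walk_down("genus1")
out$id
```

Calling `walk_down("genus1", fix = FALSE)` keeps re-queuing the two subgenera and ends with the "no progress" error from the safety cap.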

@sunray1
Author

sunray1 commented Mar 17, 2020

Wonderful thank you so much!!

@sunray1 sunray1 closed this as completed Mar 18, 2020
@sckott
Contributor

sckott commented Mar 18, 2020

glad it works. let me know if what it returns is not correct
