gnr_resolve not matching the same name multiple times OR matches erroneously #920

Open
@ErikKusch

Description

The Issue

Using gnr_resolve(), I never obtain the same matched name for more than one user-supplied name, even when doing so would clearly be the better match. These erroneous matches persist even when gnr_resolve() is queried with a single species.

Minimal Working Example

Running this code:

library(taxize)
sps <- c("Lagopus matu", "Logopus muta", "Lagopus lagopus", "Lagopus muta", "Lagopas lagopus")
GNR_df <- gnr_resolve(sci = sps, best_match_only = TRUE)
GNR_df

results in this output:

# A tibble: 5 × 5
  user_supplied_name submitted_name  matched_name              data_source_title score
* <chr>              <chr>           <chr>                     <chr>             <dbl>
1 Lagopus matu       Lagopus matu    Lagopus Brisson, 1760     Catalogue of Lif… 0.75 
2 Logopus muta       Logopus muta    Lagopus muta (Montin, 17… Catalogue of Lif… 0.75 
3 Lagopus lagopus    Lagopus lagopus Lagopus lagopus           Wikispecies       0.988
4 Lagopus muta       Lagopus muta    Lagopus muta              Wikispecies       0.988
5 Lagopas lagopus    Lagopas lagopus Lagopus lagopus (Linnaeu… Catalogue of Lif… 0.75 

Evidently, the best match for Lagopus matu (row 1 of the output) should be Lagopus muta, the name matched correctly in row 4. Additionally, the matches for Lagopus lagopus (row 3) and Lagopas lagopus (row 5) ought to be the same: Lagopus lagopus.
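
To make the expectation above concrete, here is a minimal offline sketch that ranks the submitted names against a small reference list by plain Levenshtein distance using utils::adist(). The candidates vector is a hypothetical reference list I made up for illustration, and edit distance is only a stand-in: I am not claiming this is how the Global Names Resolver scores matches.

```r
# Illustrative only: plain Levenshtein distance via utils::adist(),
# not the scoring used by gnr_resolve() or the Global Names Resolver.
sps <- c("Lagopus matu", "Logopus muta", "Lagopus lagopus", "Lagopus muta", "Lagopas lagopus")

# Hypothetical reference list holding the two valid names discussed above
candidates <- c("Lagopus muta", "Lagopus lagopus")

# Edit-distance matrix: rows = submitted names, columns = candidate names
dist_mat <- utils::adist(sps, candidates)
dimnames(dist_mat) <- list(sps, candidates)

# For each submitted name, pick the candidate with the smallest edit distance
nearest <- candidates[apply(dist_mat, 1, which.min)]
names(nearest) <- sps
nearest
```

Under this simple measure, Lagopus matu is two substitutions away from Lagopus muta and much further from Lagopus lagopus, which is why I would expect a fuzzy matcher to prefer the species-level match over the bare genus.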

Interestingly, even running gnr_resolve() on just the first species:

gnr_resolve(sci = sps[1], best_match_only = TRUE)

still results in the same erroneous match as above:

# A tibble: 1 × 5
  user_supplied_name submitted_name matched_name          data_source_title      score
* <chr>              <chr>          <chr>                 <chr>                  <dbl>
1 Lagopus matu       Lagopus matu   Lagopus Brisson, 1760 Catalogue of Life Che…  0.75

Workaround

For now, I have put together a workaround with the rgbif package:

library(rgbif)
Fixed_Species <- sapply(sps, # loop over species names
    FUN = function(x){
        gbif_resolve <- rgbif::name_backbone_verbose(x) # retrieve GBIF backbone matches
        if (gbif_resolve$data$matchType[1] != "NONE") {
            gbif_resolve$data$canonicalName[1] # a match was made: use its canonical name
        } else if (NROW(gbif_resolve$alternatives) > 0) {
            gbif_resolve$alternatives$canonicalName[1] # no match: use the first fuzzy-matching alternative
        } else {
            NA_character_ # no match and no alternatives
        }
    }
)

which gives the matches I would expect:

    Lagopus matu      Logopus muta   Lagopus lagopus      Lagopus muta   Lagopas lagopus 
   "Lagopus muta"    "Lagopus muta" "Lagopus lagopus"    "Lagopus muta" "Lagopus lagopus" 

Session Info

R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Oslo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] taxize_0.9.100

loaded via a namespace (and not attached):
 [1] bold_1.3.0        gtable_0.3.4      jsonlite_1.8.7    crayon_1.5.2     
 [5] rgbif_3.7.7       dplyr_1.1.2       compiler_4.3.2    tidyselect_1.2.0 
 [9] Rcpp_1.0.11       xml2_1.3.4        stringr_1.5.0     parallel_4.3.2   
[13] scales_1.2.1      uuid_1.1-1        lattice_0.21-9    ggplot2_3.4.3    
[17] R6_2.5.1          plyr_1.8.8        generics_0.1.3    curl_5.0.2       
[21] oai_0.4.0         iterators_1.0.14  tibble_3.2.1      crul_1.4.0       
[25] munsell_0.5.0     pillar_1.9.0      rlang_1.1.1       utf8_1.2.3       
[29] httpcode_0.3.0    stringi_1.7.12    lazyeval_0.2.2    cli_3.6.1        
[33] magrittr_2.0.3    foreach_1.5.2     digest_0.6.31     grid_4.3.2       
[37] rstudioapi_0.15.0 lifecycle_1.0.3   nlme_3.1-163      vctrs_0.6.3      
[41] glue_1.6.2        data.table_1.14.8 whisker_0.4.1     zoo_1.8-12       
[45] codetools_0.2-19  ape_5.7-1         fansi_1.0.4       colorspace_2.1-0 
[49] conditionz_0.1.0  httr_1.4.7        tools_4.3.2       pkgconfig_2.0.3  
