gnr_resolve not matching the same name multiple times OR matches erroneously #920
The Issue
Using the function gnr_resolve(), I never obtain the same matched name for multiple user-supplied names - even when doing so would lead to a clearly better match. These erroneous matches persist even in single-species gnr_resolve() queries.
Minimal Working Example
Running this code:
library(taxize)
sps <- c("Lagopus matu", "Logopus muta", "Lagopus lagopus", "Lagopus muta", "Lagopas lagopus")
GNR_df <- gnr_resolve(sci = sps, best_match_only = TRUE)
GNR_df
results in this output:
# A tibble: 5 × 5
  user_supplied_name submitted_name  matched_name              data_source_title score
* <chr>              <chr>           <chr>                     <chr>             <dbl>
1 Lagopus matu       Lagopus matu    Lagopus Brisson, 1760     Catalogue of Lif… 0.75
2 Logopus muta       Logopus muta    Lagopus muta (Montin, 17… Catalogue of Lif… 0.75
3 Lagopus lagopus    Lagopus lagopus Lagopus lagopus           Wikispecies       0.988
4 Lagopus muta       Lagopus muta    Lagopus muta              Wikispecies       0.988
5 Lagopas lagopus    Lagopas lagopus Lagopus lagopus (Linnaeu… Catalogue of Lif… 0.75
Evidently, the best match for Lagopus matu (row 1 in the output) should be Lagopus muta, as matched correctly in row 4; instead, only the genus Lagopus is returned. Additionally, the matches for Lagopus lagopus (row 3) and Lagopas lagopus (row 5) ought to be the same: Lagopus lagopus.
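To make "clearly better match" concrete, here is a minimal offline sketch using plain edit distance from base R (utils::adist). This is not how gnr_resolve() scores names; the candidates vector is a hypothetical list I built from the canonical forms of the names GNR returned above, purely for illustration.

```r
# Names as supplied in the example above
sps <- c("Lagopus matu", "Logopus muta", "Lagopus lagopus",
         "Lagopus muta", "Lagopas lagopus")

# Hypothetical candidate list (assumption): canonical forms of the
# names that appear in the matched_name column above
candidates <- c("Lagopus", "Lagopus muta", "Lagopus lagopus")

# Levenshtein distance between each supplied name (rows)
# and each candidate (columns)
d <- utils::adist(sps, candidates)
dimnames(d) <- list(sps, candidates)

# For each supplied name, pick the candidate with the smallest distance
nearest <- candidates[apply(d, 1, which.min)]
names(nearest) <- sps
nearest
```

Under these assumptions, Lagopus matu is two edits away from Lagopus muta but five edits away from the bare genus Lagopus, so even a naive distance measure prefers the species-level name.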
Interestingly, running the gnr_resolve() function on just the first species:
gnr_resolve(sci = sps[1], best_match_only = TRUE)
still results in the same erroneous match as above:
# A tibble: 1 × 5
  user_supplied_name submitted_name matched_name          data_source_title         score
* <chr>              <chr>          <chr>                 <chr>                     <dbl>
1 Lagopus matu       Lagopus matu   Lagopus Brisson, 1760 Catalogue of Life Che…    0.75
Workaround
For now, I have put together a workaround with the rgbif package:
library(rgbif)
Fixed_Species <- sapply(sps, # loop over species names
  FUN = function(x) {
    gbif_resolve <- rgbif::name_backbone_verbose(x) # retrieve GBIF backbone matches
    # if/else rather than ifelse(): the test is a single value, and only one name is wanted
    if (gbif_resolve$data$matchType[1] != "NONE") {
      gbif_resolve$data$canonicalName[1] # a match was made: pull the matched canonical name
    } else {
      gbif_resolve$alternatives$canonicalName[1] # no match: pull the first alternative from fuzzy matching
    }
  }
)
which, to me, leads to the expected matches:
     Lagopus matu      Logopus muta   Lagopus lagopus      Lagopus muta   Lagopas lagopus
   "Lagopus muta"    "Lagopus muta" "Lagopus lagopus"    "Lagopus muta" "Lagopus lagopus"
Session Info
R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Oslo
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] taxize_0.9.100
loaded via a namespace (and not attached):
[1] bold_1.3.0 gtable_0.3.4 jsonlite_1.8.7 crayon_1.5.2
[5] rgbif_3.7.7 dplyr_1.1.2 compiler_4.3.2 tidyselect_1.2.0
[9] Rcpp_1.0.11 xml2_1.3.4 stringr_1.5.0 parallel_4.3.2
[13] scales_1.2.1 uuid_1.1-1 lattice_0.21-9 ggplot2_3.4.3
[17] R6_2.5.1 plyr_1.8.8 generics_0.1.3 curl_5.0.2
[21] oai_0.4.0 iterators_1.0.14 tibble_3.2.1 crul_1.4.0
[25] munsell_0.5.0 pillar_1.9.0 rlang_1.1.1 utf8_1.2.3
[29] httpcode_0.3.0 stringi_1.7.12 lazyeval_0.2.2 cli_3.6.1
[33] magrittr_2.0.3 foreach_1.5.2 digest_0.6.31 grid_4.3.2
[37] rstudioapi_0.15.0 lifecycle_1.0.3 nlme_3.1-163 vctrs_0.6.3
[41] glue_1.6.2 data.table_1.14.8 whisker_0.4.1 zoo_1.8-12
[45] codetools_0.2-19 ape_5.7-1 fansi_1.0.4 colorspace_2.1-0
[49] conditionz_0.1.0 httr_1.4.7 tools_4.3.2 pkgconfig_2.0.3