[dplyr::arrange] interfering with data.table's auto-indexing #259
Closed
opened on Jun 12, 2021
This is a follow-up on this StackOverflow question/answer.
The issue is documented in data.table/issues/5042. This is a cross-reference because the data.table team suggested there might be an issue with dplyr as well, in the way the indexes are reset.
dplyr::arrange seems to interfere with auto-indexing in data.table, leading to unexpectedly wrong results.
MRE:
library(dplyr)
library(data.table)
DT <- fread(
"iso3c country income
MOZ Mozambique LIC
ZMB Zambia LMIC
ALB Albania UMIC
MOZ Mozambique LIC
ZMB Zambia LMIC
ALB Albania UMIC
"
)
codes <- c("ALB", "ZMB")
options(datatable.auto.index = TRUE) # Default
DT <- distinct(DT) %>% as.data.table()
# An index is created here because %in% is used for the first time
DT[iso3c %in% codes, verbose = TRUE]
# arrange() reorders the rows, but the old index stays attached
DT <- DT %>% arrange(iso3c) %>% as.data.table()
# Wrong result: data.table still uses the old index, built before the rows were rearranged
DT[iso3c %in% codes, verbose = TRUE]
#> iso3c country income
#> 1: ALB Albania UMIC
# This works because wrapping the condition in (...) prevents the parser from using the auto-index
DT[(iso3c %in% codes)]
#> iso3c country income
#> 1: ALB Albania UMIC
#> 2: ZMB Zambia LMIC
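A possible workaround, sketched under the assumption that the stale index is indeed the cause as described above: inspect the indices left on the table after `arrange()` and drop them before subsetting. `indices()` and `setindex(DT, NULL)` are existing data.table functions. The small table below is built with `data.table()` purely for illustration and mirrors the deduplicated MRE data. This is not a fix confirmed by either package's maintainers.

```r
library(data.table)
library(dplyr)

# Illustrative data, same rows as the deduplicated MRE table
DT <- data.table(
  iso3c   = c("MOZ", "ZMB", "ALB"),
  country = c("Mozambique", "Zambia", "Albania"),
  income  = c("LIC", "LMIC", "UMIC")
)
codes <- c("ALB", "ZMB")

# May create an automatic index on iso3c (depends on datatable.auto.index)
invisible(DT[iso3c %in% codes])

# Reorder with dplyr, as in the MRE
DT <- DT %>% arrange(iso3c) %>% as.data.table()

# List any indices still attached after the reordering (NULL if none)
print(indices(DT))

# Drop all indices so the next subset cannot reuse a stale one
setindex(DT, NULL)

res <- DT[iso3c %in% codes]
print(res)
```

Dropping the index costs only a rebuild on the next optimized subset, so it is a cheap safeguard wherever a data.table passes through a dplyr verb that reorders rows.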