Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get() in by and the inconsistent error #4873

Closed
sindribaldur opened this issue Jan 13, 2021 · 3 comments · Fixed by #4982 or #4985
Closed

get() in by and the inconsistent error #4873

sindribaldur opened this issue Jan 13, 2021 · 3 comments · Fixed by #4982 or #4985
Labels
programming parameterizing queries: get, mget, eval, env
Milestone

Comments

@sindribaldur
Copy link

sindribaldur commented Jan 13, 2021

IDT <- data.table(iris)
vr <- "Species"
IDT[, virginca := get(vr) == "virginica"]
IDT[(virginca)][, .N, by = .(round(Sepal.Width), k = round(Sepal.Length), kar = get(vr))] 
# Works fine

IDT[(virginca), .N, by = .(round(Sepal.Width), k = round(Sepal.Length), kar = get(vr))] 
# Error in get(vr) : object 'Species' not found

This is the closest I could get to reproducing the error I got on data in the wild when I added a logical in i, but my actual error message was

column or expression 3 of 'by' or 'keyby' is type builtin. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

I'm using R 4.0.2 and data.table_1.13.4 on Windows 10.

@jangorecki jangorecki added the programming parameterizing queries: get, mget, eval, env label Jan 13, 2021
@tlapak
Copy link
Contributor

tlapak commented Jan 21, 2021

I did some digging and the issue you are showing here stems from an optimization that data.table does when you combine subsetting and grouping. When column names are present in the by argument then only those will be used for the subsetting and the following computations. The issue here is that the get is only evaluated after that happens which then results in the first error you're reporting here. There's currently some discussion around the use and parsing of get and the .. notation. @jangorecki has been working on an alternative interface.

As I see it, properly fixing this would require either tightening or even completely turning off the optimization which is probably not gonna fly. The other alternative would be to go through the parse tree for all the lazily evaluated arguments.

As for the error message that you posted, kinda hard to tell without the actual code but likely an error/typo on your end:

IDT[, .N, by = .(sum)] 
# Error in `[.data.table`(IDT, , .N, by = .(sum)) : 
#  column or expression 1 of 'by' or 'keyby' is type builtin. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

@avimallu
Copy link
Contributor

avimallu commented May 9, 2021

Will #4982 close this, if accepted?

@OfekShilon
Copy link
Contributor

I just tested the repro code with the #4982 suggested PR, and it works fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
programming parameterizing queries: get, mget, eval, env
Projects
None yet
6 participants