Segfault in dcast because of type mismatch in fun.aggregate #2394

Closed
khotilov opened this issue Sep 28, 2017 · 10 comments · Fixed by #4251

@khotilov
Contributor

With the latest release and development data.table versions, the following example results in a segfault with no warning:

library(data.table)
agg <- function(x) if(length(x) > 0) min(x) else NA
d <- data.table(id = c(1,1,2,2), x = c('y','y','y','z'), v = c('a','b','c','d'))
dcast(d, formula = id ~ x, fun.aggregate = agg, value.var = 'v')

Careless use of NA instead of NA_character_ in the aggregation function is my fault. But I hope such errors could be handled more gracefully.
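The mismatch arises because a bare NA is a logical value, while the aggregated column here is character. A minimal sketch of a type-stable aggregation function follows; `typed_na` and `agg_safe` are hypothetical helper names introduced only for illustration and are not part of data.table. The `dcast` call is guarded so the snippet also runs where data.table is not installed.

```r
# Hypothetical helper (not part of data.table): indexing with NA_integer_
# yields a length-1 NA of the same type as x, even when x is empty.
typed_na <- function(x) x[NA_integer_]

# Same logic as `agg` above, but the empty-input branch matches the input type.
agg_safe <- function(x) if (length(x) > 0) min(x) else typed_na(x)

typeof(NA)                      # a bare NA is logical
typeof(agg_safe(character(0)))  # character input gives a character NA
typeof(agg_safe(numeric(0)))    # numeric input gives a double NA

# Optional: rerun the reported example with the type-stable function.
if (requireNamespace("data.table", quietly = TRUE)) {
  d <- data.table::data.table(id = c(1, 1, 2, 2),
                              x = c("y", "y", "y", "z"),
                              v = c("a", "b", "c", "d"))
  print(data.table::dcast(d, formula = id ~ x,
                          fun.aggregate = agg_safe, value.var = "v"))
}
```

Writing `NA_character_` directly in the else branch is equivalent for this example; the indexing form only avoids hard-coding the type.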

@MichaelChirico
Member

Latest master doesn't segfault:

Error in dcast.data.table(d, formula = id ~ x, fun.aggregate = agg, value.var = "v") :
STRING_ELT() can only be applied to a 'character vector', not a 'logical'

It looks like this might be an improvement from base R though

@MichaelChirico
Member

Yep, seems to be that recent R itself errors more gracefully; 3.1.0 still segfaults

docker run -it jangorecki/r-3.1.0
install.packages('data.table', type = 'source', repos = 'http://Rdatatable.github.io/data.table')
library(data.table)
agg <- function(x) if(length(x) > 0) min(x) else NA
d <- data.table(id = c(1,1,2,2), x = c('y','y','y','z'), v = c('a','b','c','d'))
dcast(d, formula = id ~ x, fun.aggregate = agg, value.var = 'v')
 *** caught segfault ***
address 0x80000000, cause 'memory not mapped'

Traceback:
 1: dcast.data.table(d, formula = id ~ x, fun.aggregate = agg, value.var = "v")
 2: dcast(d, formula = id ~ x, fun.aggregate = agg, value.var = "v")

So, not sure if we should try and build a patch to this while we still depend on 3.1.0, or slap a wontfix on it, or bring forward our R dependency (I haven't traced which version of R fixed this)

@jangorecki
Member

jangorecki commented May 26, 2019

No point in raising the dependency. If someone is affected, they can upgrade R. Otherwise we would break all unaffected environments running older R.

@MichaelChirico
Member

i.e. wontfix then?

@jangorecki
Member

jangorecki commented May 26, 2019

We could eventually improve the error message to mention that; right now it is not very useful.
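As an illustration of what a more informative message could look like, here is a user-side sketch that checks the aggregation function before calling dcast. `check_agg_type` is a hypothetical helper, not a data.table function, and it rests on the assumption that comparing the type returned for empty input against the type of the value column is the relevant check.

```r
# Hypothetical pre-check (not part of data.table): call the aggregation
# function on an empty vector of the value column's type and compare types.
check_agg_type <- function(fun, x) {
  empty_type <- typeof(fun(x[0L]))
  if (!identical(empty_type, typeof(x))) {
    stop(sprintf(
      "fun.aggregate returns type '%s' for empty input, but value.var has type '%s'",
      empty_type, typeof(x)
    ))
  }
  invisible(TRUE)
}

# The aggregation function from the original report.
agg <- function(x) if (length(x) > 0) min(x) else NA

# Signals an error for a character value column; captured here so the
# script continues and the message can be inspected.
msg <- tryCatch(check_agg_type(agg, c("a", "b", "c", "d")),
                error = function(e) conditionMessage(e))
print(msg)
```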

@bpolacco

I came across a related problem having to do with NA vs NA_real_. It doesn't generate a Segfault or other error, but it does put a garbage number where NA should appear. Nearly the same example as above, but with numeric value.var instead of character:

library(data.table)
agg <- function(x) if(length(x) > 0) min(x) else NA
d <- data.table(id = c(1,1,2,2), x = c('y','y','y','z'), v = (1:4)/10)
dcast(d, formula = id ~ x, fun.aggregate = agg, value.var = 'v')
   id   y             z
1:  1 0.1 6.927149e-310
2:  2 0.3  4.000000e-01

Note that if I set fill = NA, I get the result I expect, and that is my current workaround.

dcast(d, formula = id ~ x, fun.aggregate = agg, value.var = 'v', fill=NA)
   id   y   z
1:  1 0.1  NA
2:  2 0.3 0.4
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

...
other attached packages:
[1] data.table_1.12.8
...

@MichaelChirico
Member

@bpolacco can you try running this on current master? I'm actually not getting any error.

@MichaelChirico
Member

Not sure how (it's a big commit) but this commit fixed this issue as far as I can tell; I'll add a test & file to close:

4aadde8

@bpolacco

@MichaelChirico Yes, latest master doesn't break for me. Sorry for not trying that before commenting. To my naive eyes, I'd say the changed if statement in file fcast.c line 27 (was 29) is the 'how' of the fix. Thanks!

@jangorecki jangorecki added the reshape dcast melt label Apr 6, 2020
@mattdowle mattdowle added this to the 1.14.1 milestone May 13, 2021
@mattdowle
Member

I ran this test in v1.12.8 where it failed as described, but returns correct result in v1.13.0. So yes it does look like 4aadde8 in v1.13.0 fixed it.
I'll modify the news item in @MichaelChirico's #4251 to include this detail and merge.

@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023