`[.data.table` crashes with segfault while grouping with more than 79 threads #5077
Comments
I did some more testing and it turns out the double free for
GDB Backtrace for `N = 80`
#0 0x00007ffff6d2f420 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007ffff6d30a01 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007ffff6d72877 in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3 0x00007ffff6d79093 in malloc_printerr () from /lib64/libc.so.6
No symbol table info available.
#4 0x00007ffff6d7a999 in _int_free () from /lib64/libc.so.6
No symbol table info available.
#5 0x00007ffff05eecde in cleanup () at forder.c:88
No locals.
#6 0x00007ffff05f2541 in forder (DT=0x30c4190, by=0x30c41c8, retGrpArg=<optimized out>, sortGroupsArg=<optimized out>, ascArg=0x30d3710, naArg=<optimized out>) at forder.c:771
n_protect = 3
verbose = true
n_cplx = <optimized out>
ans = 0x3576180
ncol = 1
keyAlloc = <optimized out>
spare = <optimized out>
Rf_isReal = <optimized out>
complexRerun = <optimized out>
CplxPart = 0x6215a0
#7 0x00007ffff783bb64 in R_doDotCall (ofun=<optimized out>, nargs=<optimized out>, cargs=<optimized out>, call=0x2ac6128) at dotcode.c:614
fun = <optimized out>
retval = <optimized out>
#8 0x00007ffff787b38a in bcEval (body=body@entry=0x2ad4de0, rho=rho@entry=0x2a7e350, useCache=useCache@entry=TRUE) at eval.c:7655
cargs = {0x3527f20, 0x6215a0, 0x30d36a0, 0x30d36d8, 0x30d3710, 0x30d3748, 0x80, 0x7ffff6d7def7 <malloc+199>, 0x0, 0xffffffff, 0x7fffffff9ea0, 0x80, 0x13, 0x7ffff78c4ec4 <Rf_allocVector3+1796>,
0x7fffffff9ea0, 0x7fff00000007}
vmax = 0x0
val = <optimized out>
call = 0x2ac6128
nargs = 6
ofun = 0x7ffff05f1b90 <forder>
sym = <optimized out>
args = <optimized out>
op = <optimized out>
retvalue = <optimized out>
constants = 0xccca50
pc = <optimized out>
codebase = 0x294e1e0
oldntop = <optimized out>
evalcount = 391
oldsrcref = <optimized out>
oldbcintactive = <optimized out>
oldbcbody = <optimized out>
oldbcpc = <optimized out>
currentpc = 0x294e5d0
which = 0
init = <optimized out>
old_byte_code = <optimized out>
ibcl_oldptop = 0x7ffff3b10020
vcache = 0x7ffff3b10090
smallcache = <optimized out>
loop = <optimized out>
#9 0x00007ffff78855f0 in Rf_eval (e=0x2ad4de0, rho=rho@entry=0x2a7e350) at eval.c:723
op = <optimized out>
tmp = <optimized out>
evalcount = 508
bcintactivesave = 1
srcrefsave = 0x6215a0
depthsave = 2
#10 0x00007ffff788736f in R_execClosure (call=call@entry=0x2e8cc68, newrho=newrho@entry=0x2a7e350, sysparent=<optimized out>, rho=rho@entry=0x2d18570, arglist=arglist@entry=0x2a7e1c8, op=op@entry=0x2ad48d8) at eval.c:1888
body = 0x2ad4de0
cntxt = {nextcontext = 0x7fffffffb710, callflag = 12, cjmpbuf = {{__jmpbuf = {44912096, -69324313296529690, 140737351868760, 44910808, 140737351868704, 6428064, 69330512277772006, 69347660491101926}, __mask_was_saved = 0, __saved_mask = {__val = {7, 140737351869784, 1701147977, 19819992, 80008, 256025600, 140724853501968, 53199584, 140737351868760, 140724853501968, 6425936, 53199584, 140737351869872, 55738144, 140737345629784, 140737351868760}}}}, cstacktop = 16, evaldepth = 2, promargs = 0x2a7e1c8, callfun = 0x2ad48d8, sysparent = 0x2d18570, call = 0x2e8cc68, cloenv = 0x2a7e350, conexit = 0x6215a0, cend = 0x0, cenddata = 0x7ffff78bab14 <Rf_matchArgs_NR+1364>, vmax = 0x0, intsusp = 0, gcenabled = 1, bcintactive = 1, bcbody = 0x2f8ee50, bcpc = 0x7fffffffaf78, handlerstack = 0x6215a0, restartstack = 0x6215a0, prstack = 0x0, nodestack = 0x7ffff3b10080, bcprottop = 0x7ffff3b10020, srcref = 0x621488, browserfinish = 0, returnValue = 0x0, jumptarget = 0x0, jumpmask = 0}
dbg = FALSE
#11 0x00007ffff7888137 in Rf_applyClosure (call=call@entry=0x2e8cc68, op=op@entry=0x2ad48d8, arglist=arglist@entry=0x2a7e1c8, rho=rho@entry=0x2d18570, suppliedvars=<optimized out>) at eval.c:1814
formals = <optimized out>
actuals = <optimized out>
savedrho = <optimized out>
newrho = 0x2a7e350
f = 0x6215a0
a = 0x6215a0
is_getter_call = FALSE
val = <optimized out>
#12 0x00007ffff787c519 in bcEval (body=body@entry=0x2f8ee50, rho=rho@entry=0x2d18570, useCache=useCache@entry=TRUE) at eval.c:7067
fun = 0x2ad48d8
call = 0x2e8cc68
value = 0x0
flag = <optimized out>
args = <optimized out>
retvalue = <optimized out>
constants = 0x2631f50
pc = 0x2c58438
codebase = 0x2c403e0
oldntop = <optimized out>
evalcount = 391
oldsrcref = <optimized out>
oldbcintactive = <optimized out>
oldbcbody = <optimized out>
oldbcpc = <optimized out>
currentpc = 0x2c58420
which = 0
init = <optimized out>
old_byte_code = <optimized out>
ibcl_oldptop = 0x7ffff3b0f010
vcache = 0x7ffff3b0f020
smallcache = <optimized out>
loop = <optimized out>
#13 0x00007ffff78855f0 in Rf_eval (e=0x2f8ee50, rho=rho@entry=0x2d18570) at eval.c:723
op = <optimized out>
tmp = <optimized out>
evalcount = 508
bcintactivesave = 0
srcrefsave = 0x6215a0
depthsave = 1
#14 0x00007ffff788736f in R_execClosure (call=call@entry=0x2d18148, newrho=newrho@entry=0x2d18570, sysparent=<optimized out>, rho=rho@entry=0x2f8d330, arglist=arglist@entry=0x2f92f78, op=op@entry=0x2f8d8a8) at eval.c:1888
body = 0x2f8ee50
cntxt = {nextcontext = 0x7fffffffbac0, callflag = 12, cjmpbuf = {{__jmpbuf = {49868368, -69324529462569242, 140737351868760, 49862824, 140737351868704, 6428064, 69330512674133734, 69347660491101926}, __mask_was_saved = 0, __saved_mask = {__val = {4421, 0, 0, 140737351869784, 140737346569376, 140737351869784, 140737347655273, 0, 140737351869176, 9627792, 2, 0, 140737346572518, 8371735909806386779, 139639678329441, 6428064}}}}, cstacktop = 15, evaldepth = 1, promargs = 0x2f92f78, callfun = 0x2f8d8a8, sysparent = 0x6595c8, call = 0x2d18148, cloenv = 0x2d18570, conexit = 0x2d11638, cend = 0x0, cenddata = 0x0, vmax = 0x0, intsusp = 0, gcenabled = 1, bcintactive = 0, bcbody = 0x0, bcpc = 0x0, handlerstack = 0x6215a0, restartstack = 0x6215a0, prstack = 0x0, nodestack = 0x7ffff3b0f010, bcprottop = 0x7ffff3b0f010, srcref = 0x6215a0, browserfinish = 0, returnValue = 0x0, jumptarget = 0x0, jumpmask = 0}
dbg = FALSE
#15 0x00007ffff7888137 in Rf_applyClosure (call=<optimized out>, op=<optimized out>, arglist=<optimized out>, rho=0x2f8d330, suppliedvars=<optimized out>) at eval.c:1814
formals = <optimized out>
actuals = <optimized out>
savedrho = <optimized out>
newrho = 0x2d18570
f = 0x6215a0
a = 0x6215a0
is_getter_call = FALSE
val = <optimized out>
#16 0x00007ffff78c9463 in applyMethod (call=call@entry=0x2d18148, op=op@entry=0x2f8d8a8, args=<optimized out>, rho=rho@entry=0x2f8d330, newvars=newvars@entry=0x2d22398) at objects.c:118
ans = <optimized out>
#17 0x00007ffff78ca203 in dispatchMethod (sxp=sxp@entry=0x2f8d8a8, dotClass=dotClass@entry=0x30acb88, cptr=cptr@entry=0x7fffffffbac0, method=method@entry=0x26c2e88, generic=0x7ffff79d1669 "[", rho=0x2f8d330, callrho=0x6595c8, defrho=0x621530, op=<optimized out>, op=<optimized out>) at objects.c:436
newvars = 0x2d22398
newcall = <optimized out>
matchedarg = <optimized out>
ans = <optimized out>
#18 0x00007ffff78ca6b4 in Rf_usemethod (generic=generic@entry=0x7ffff79d1669 "[", obj=obj@entry=0x92e890, call=call@entry=0x2f92df0, args=args@entry=0x2f92f78, rho=rho@entry=0x2f8d330, callrho=callrho@entry=0x6595c8, defrho=0x621530, ans=0x7fffffffbcc8) at objects.c:476
vmax = <optimized out>
ss = <optimized out>
klass = 0x30acb88
method = 0x26c2e88
sxp = 0x2f8d8a8
op = 0x62a5e0
i = 0
nclass = <optimized out>
cptr = 0x7fffffffbac0
#19 0x00007ffff788c651 in Rf_DispatchOrEval (call=call@entry=0x2f92df0, op=op@entry=0x62a5e0, generic=generic@entry=0x7ffff79d1669 "[", args=0x2f92ed0, rho=rho@entry=0x6595c8, ans=ans@entry=0x7fffffffbcc8, dropmissing=0, argsevald=0) at eval.c:3610
cntxt = {nextcontext = 0x7ffff7dd6360 <R_Toplevel>, callflag = 20, cjmpbuf = {{__jmpbuf = {140733193388032, 140737351869176, 140737351869784, -69330188288551194, 140737351868760, 38117464, 140737351868704, 6428064}, __mask_was_saved = -41314586, __saved_mask = {__val = {69347660491101926, 0, 0, 0, 6657480, 140737488338816, 140737351934547, 1, 0, 0, 140737345000376, 140737488338720, 140737351963754, 0, 0, 1}}}}, cstacktop = 7, evaldepth = 1, promargs = 0x2f92f78, callfun = 0x62a5e0, sysparent = 0x6595c8, call = 0x2f92df0, cloenv = 0x2f8d330, conexit = 0x6215a0, cend = 0x0, cenddata = 0x6215a0, vmax = 0x0, intsusp = 0, gcenabled = 1, bcintactive = 0, bcbody = 0x0, bcpc = 0x0, handlerstack = 0x6215a0, restartstack = 0x6215a0, prstack = 0x0, nodestack = 0x7ffff3b0f010, bcprottop = 0x7ffff3b0f010, srcref = 0x6215a0, browserfinish = 0, returnValue = 0x0, jumptarget = 0x0, jumpmask = 0}
pargs = 0x2f92f78
rho1 = 0x2f8d330
pt = <optimized out>
x = 0x92e890
dots = <optimized out>
nprotect = 3
#20 0x00007ffff7947a81 in R_DispatchOrEvalSP (ans=0x7fffffffbcc8, rho=0x6595c8, args=<optimized out>, generic=0x7ffff79d1669 "[", op=0x62a5e0, call=0x2f92df0) at subset.c:633
prom = 0x2f92e98
disp = <optimized out>
prom = <optimized out>
disp = <optimized out>
x = <optimized out>
il__x__ = <optimized out>
irc__x__ = <optimized out>
dl__x__ = <optimized out>
drc__x__ = <optimized out>
dl__x__ = <optimized out>
drc__x__ = <optimized out>
#21 do_subset (call=0x2f92df0, op=0x62a5e0, args=0x2f92db8, rho=0x6595c8) at subset.c:653
ans = 0x7ffff7856838 <Rf_findFun3+328>
#22 0x00007ffff78859e2 in Rf_eval (e=e@entry=0x2f92df0, rho=rho@entry=0x6595c8) at eval.c:798
save = 2
flag = 0
vmax = 0x0
op = 0x62a5e0
tmp = <optimized out>
evalcount = 508
bcintactivesave = 0
srcrefsave = 0x6215a0
depthsave = 0
#23 0x00007ffff78b888a in Rf_ReplIteration (rho=0x6595c8, savestack=<optimized out>, browselevel=0, state=0x7fffffffbf80) at main.c:264
c = <optimized out>
browsevalue = <optimized out>
value = <optimized out>
wasDisplayed = FALSE
thisExpr = 0x2f92df0
state = <optimized out>
savestack = <optimized out>
rho = <optimized out>
c = <optimized out>
browsevalue = <optimized out>
value = <optimized out>
wasDisplayed = <optimized out>
browselevel = <optimized out>
thisExpr = <optimized out>
c = <optimized out>
browsevalue = <optimized out>
value = <optimized out>
thisExpr = <optimized out>
wasDisplayed = FALSE
#24 0x00007ffff78b8c41 in R_ReplConsole (rho=0x6595c8, savestack=0, browselevel=0) at main.c:314
status = <optimized out>
state = {status = PARSE_OK, prompt_type = 1, browselevel = 0, buf = "library(data.table);M <- 160;M4 <- M / 4;J <- 10001;setDTthreads(threads = 80);d2 <- data.table(Time=rep(1:J, each=M),AgentID=rep(1:M4, times=4*J*M),i_phase=sample(0:1, J*M, replace=T)*2-1,State=sam"..., bufp = 0x7fffffffc1b6 ""}
#25 0x00007ffff78b8cd8 in run_Rmainloop () at main.c:1113
No locals.
#26 0x00007ffff78b8d32 in Rf_mainloop () at main.c:1120
No locals.
#27 0x00000000004007cb in main (ac=<optimized out>, av=<optimized out>) at Rmain.c:29
No locals.
But disabling checks for simple memory errors with `export MALLOC_CHECK_=3` before running
And these errors - the The
(gdb) p my_TMP[my_starts[my_key[i]]+1]
$1 = 0
(gdb)
GDB Backtrace for `N = 80`
#0 0x00007ffff058e9bd in radix_r (from=1640, to=<optimized out>, radix=1) at forder.c:995
i = 75
osub = 0x7fffb24f59e0
my_starts = {0, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 24, 24, 24, 24, 24, 32, 32, 32, 32, 32, 39, 40, 40, 40, 40, 48, 48, 48, 48, 48, 55, 56, 56, 56, 56, 64, 64, 64, 64, 64, 72, 72, 72, 72, 72,
76, 80, 80, 80, 80, 80, 88, 88, 88, 88, 88, 96, 96, 96, 96, 96, 103, 104, 104, 104, 104, 112, 112, 112, 112, 112, 120, 120, 120, 120, 120, 128, 128, 128, 128, 128, 136, 136, 136, 136, 136,
144, 144, 144, 144, 144, 152, 152, 152, 152, 152, 160, 160, 160, 160, 160, 168, 168, 168, 168, 168, 176, 176, 176, 176, 176, 183, 184, 184, 184, 184, 192, 192, 192, 192, 192, 199, 200, 200,
200, 200, 208, 208, 208, 208, 208, 216, 216, 216, 216, 216, 224, 224, 224, 224, 224, 231, 232, 232, 232, 232, 240, 240, 240, 240, 240, 248, 248, 248, 248, 248, 256, 256, 256, 256, 256, 264,
264, 264, 264, 264, 272, 272, 272, 272, 272, 280, 280, 280, 280, 280, 288, 288, 288, 288, 288, 296, 296, 296, 296, 296, 304, 304, 304, 304, 304, 311, 312, 312, 312, 312, 320, 320, 320...}
my_starts_copy = {0, 0, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 24, 24, 24, 24, 24, 32, 32, 32, 32, 32, 39, 40, 40, 40, 40, 48, 48, 48, 48, 48, 55, 56, 56, 56, 56, 64, 64, 64, 64, 64, 72, 72, 72,
72, 72, 80, 80, 80, 80, 80, 88, 88, 88, 88, 88, 96, 96, 96, 96, 96, 103, 104, 104, 104, 104, 112, 112, 112, 112, 112, 120, 120, 120, 120, 120, 128, 128, 128, 128, 128, 136, 136, 136, 136,
136, 144, 144, 144, 144, 144, 152, 152, 152, 152, 152, 160, 160, 160, 160, 160, 168, 168, 168, 168, 168, 176, 176, 176, 176, 176, 183, 184, 184, 184, 184, 192, 192, 192, 192, 192, 199, 200,
200, 200, 200, 208, 208, 208, 208, 208, 216, 216, 216, 216, 216, 224, 224, 224, 224, 224, 231, 232, 232, 232, 232, 240, 240, 240, 240, 240, 248, 248, 248, 248, 248, 256, 256, 256, 256, 256,
264, 264, 264, 264, 264, 272, 272, 272, 272, 272, 280, 280, 280, 280, 280, 288, 288, 288, 288, 288, 296, 296, 296, 296, 296, 304, 304, 304, 304, 304, 311, 312, 312, 312, 312, 320, 320, 320...}
my_TMP = 0x7ffce85c6ed4
my_counts = {0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0,
0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0,
0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0,
0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0...}
my_ugrp = 0x23da570 "f"
my_key = 0x26f2848 "\001\001\001\001\001\001\001\001\006\006\006\006\006\006\006\006\v\v\v\v\v\v\v\v\020\020\020\020\020\020\020\020\025\025\025\026\025\025\025\025\032\032\032\032\032\032\032\032\037\037\037\037\037\037\037 $$$$$$$$))))))))........3333333388888888===>====BBBBBBBBGGGGGGGGLLLLLLLLQQQQQQQQVVVVVVVV[[[[[[[[````````eeeeeeeejjjjjjjjooopoooottttttttyyyyyyyz"...
my_gs = 0x0
ngrp = 58
skip = <optimized out>
my_n = 408
batchSize = <optimized out>
nBatch = <optimized out>
lastBatchSize = <optimized out>
counts = <optimized out>
ugrps = <optimized out>
ngrps = <optimized out>
skip = <optimized out>
n_rem = <optimized out>
ugrp = "\000\000\b\000\b\000\b\000\b\000\b\000\020\000\020\000\020\000\020\000\020\000\030\000\030\000\030\000\030\000\030\000 \000 \000 \000 \000 \000'\000(\000(\000(\000(\000\060\000\060\000\060\000\060\000\060\000\067\000\070\000\070\000\070\000\070\000@\000@\000@\000@\000@\000H\000H\000H\000H\000H\000L\000P\000P\000P\000P\000P\000X\000X\000X\000X\000X\000`\000`\000`\000`\000`\000g\000h\000h\000h\000h\000p\000p\000p\000p\000p\000x\000x\000x\000x\000x\000\200\000\200\000\200\000\200\000\200\000\210\000\210\000\210\000\210\000\210\000\220\000\220\000\220\000\220\000\220\000\230\000\230\000\230\000\230\000\230\000\240\000\240\000\240\000"...
seen = {false, false, false, false, 8, false, 8, false, 8, false, 8, false, 8, false, 16, false, 16, false, 16, false, 16, false, 16, false, 24, false, 24, false, 24, false, 24, false, 24,
false, 32, false, 32, false, 32, false, 32, false, 32, false, 39, false, 40, false, 40, false, 40, false, 40, false, 48, false, 48, false, 48, false, 48, false, 48, false, 55, false, 56,
false, 56, false, 56, false, 56, false, 64, false, 64, false, 64, false, 64, false, 64, false, 72, false, 72, false, 72, false, 72, false, 72, false, 80, false, 80, false, 80, false, 80,
false, 80, false, 88, false, 88, false, 88, false, 88, false, 88, false, 96, false, 96, false, 96, false, 96, false, 96, false, 103, false, 104, false, 104, false, 104, false, 104, false,
112, false, 112, false, 112, false, 112, false, 112, false, 120, false, 120, false, 120, false, 120, false, 120, false, 128, false, 128, false, 128, false, 128, false, 128, false, 136, false,
136, false, 136, false, 136, false, 136, false, 144, false, 144, false, 144, false, 144, false, 144, false, 152, false, 152, false, 152, false, 152, false, 152, false, 160, false, 160, false,
160, false...}
ngrp = <optimized out>
last_seen = <optimized out>
starts = <optimized out>
my_gs = 0x0
#1 0x00007ffff0590828 in radix_r (from=<optimized out>, to=<optimized out>, radix=<optimized out>) at forder.c:1238
start = <optimized out>
start = <optimized out>
i = <optimized out>
i = <optimized out>
anyBig = <optimized out>
my_n = 0
batchSize = <optimized out>
nBatch = <optimized out>
lastBatchSize = <optimized out>
counts = <optimized out>
ugrps = <optimized out>
ngrps = <optimized out>
skip = <optimized out>
n_rem = <optimized out>
ugrp = "X\000`\000`\000`\000`\000`\000g\000h\000h\000h\000h\000p\000p\000p\000p\000p\000x\000x\000x\000x\000x\000\200\000\200\000\200\000\200\000\200\000\210\000\210\000\210\000\210\000\210\000\220\000\220\000\220\000\220\000\220\000\230\000\230\000\230\000\230\000\230\000\240\000\240\000\240\000\240\000\240\000\250\000\250\000\250\000\250\000\250\000\260\000\260\000\260\000\260\000\260\000\267\000\270\000\270\000\270\000\270\000\300\000\300\000\300\000\300\000\300\000\307\000\310\000\310\000\310\000\310\000\320\000\320\000\320\000\320\000\320\000\330\000\330\000\330\000\330\000\330\000\340\000\340\000\340\000\340\000\340\000\347\000\350\000\350\000\350\000\350\000\360\000\360\000\360\000\360\000\360\000\370\000\370\000\370\000\370\000"...
seen = {88, false, 96, false, 96, false, 96, false, 96, false, 96, false, 103, false, 104, false, 104, false, 104, false, 104, false, 112, false, 112, false, 112, false, 112, false, 112, false, 120, false, 120, false, 120, false, 120, false, 120, false, 128, false, 128, false, 128, false, 128, false, 128, false, 136, false, 136, false, 136, false, 136, false, 136, false, 144, false, 144, false, 144, false, 144, false, 144, false, 152, false, 152, false, 152, false, 152, false, 152, false, 160, false, 160, false, 160, false, 160, false, 160, false, 168, false, 168, false, 168, false, 168, false, 168, false, 176, false, 176, false, 176, false, 176, false, 176, false, 183, false, 184, false, 184, false, 184, false, 184, false, 192, false, 192, false, 192, false, 192, false, 192, false, 199, false, 200, false, 200, false, 200, false, 200, false, 208, false, 208, false, 208, false, 208, false, 208, false, 216, false, 216, false, 216, false, 216, false, 216, false, 224, false, 224, false, 224, false, 224, false, 224, false, 231, false, 232, false, 232, false, 232, false, 232, false, 240, false, 240, false, 240, false, 240, false, 240, false, 248, false, 248, false, 248, false, 248, false...}
ngrp = <optimized out>
last_seen = <optimized out>
starts = <optimized out>
my_gs = 0x0
#2 0x00007ffff72eb46e in ?? () from /usr/lib64/libgomp.so.1
No symbol table info available.
#3 0x00007ffff70b94f9 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007ffff6df1ecf in clone () from /lib64/libc.so.6
No symbol table info available.
But the
(gdb) p my_TMP[my_starts[my_key[i]]+1]
Cannot access memory at address 0x4a8405c
(gdb) p my_starts[my_key[i]]
$1 = 1
(gdb) p osub[i]
$2 = 23761
(gdb)
GDB Backtrace for `N = 96`
#0 0x00007ffff05f19bd in radix_r (from=23760, to=<optimized out>, radix=1) at forder.c:995
i = 0
osub = 0x358d2b0
my_starts = {0, 0, 1, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 24, 24, 24, 24, 24, 32, 32, 32, 32, 32, 40, 40, 40, 40, 40, 47, 48, 48, 48, 48, 55, 56, 56, 56, 56, 64, 64, 64, 64, 64, 72, 72, 72, 72,
72, 80, 80, 80, 80, 80, 88, 88, 88, 88, 88, 96, 96, 96, 96, 96, 104, 104, 104, 104, 104, 111, 112, 112, 112, 112, 120, 120, 120, 120, 120, 127, 128, 128, 128, 128, 136, 136, 136, 136, 136,
143, 144, 144, 144, 144, 152, 152, 152, 152, 152, 160, 160, 160, 160, 160, 167, 168, 168, 168, 168, 176, 176, 176, 176, 176, 184, 184, 184, 184, 184, 192, 192, 192, 192, 192, 199, 200, 200,
200, 200, 208, 208, 208, 208, 208, 216, 216, 216, 216, 216, 224, 224, 224, 224, 224, 232, 232, 232, 232, 232, 240, 240, 240, 240, 240, 248, 248, 248, 248, 248, 256, 256, 256, 256, 256, 264,
264, 264, 264, 264, 272, 272, 272, 272, 272, 279, 280, 280, 280, 280, 288, 288, 288, 288, 288, 296, 296, 296, 296, 296, 304, 304, 304, 304, 304, 312, 312, 312, 312, 312, 320, 320...}
my_starts_copy = {0, 0, 0, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 24, 24, 24, 24, 24, 32, 32, 32, 32, 32, 40, 40, 40, 40, 40, 47, 48, 48, 48, 48, 55, 56, 56, 56, 56, 64, 64, 64, 64, 64, 72, 72, 72,
72, 72, 80, 80, 80, 80, 80, 88, 88, 88, 88, 88, 96, 96, 96, 96, 96, 104, 104, 104, 104, 104, 111, 112, 112, 112, 112, 120, 120, 120, 120, 120, 127, 128, 128, 128, 128, 136, 136, 136, 136,
136, 143, 144, 144, 144, 144, 152, 152, 152, 152, 152, 160, 160, 160, 160, 160, 167, 168, 168, 168, 168, 176, 176, 176, 176, 176, 184, 184, 184, 184, 184, 192, 192, 192, 192, 192, 199, 200,
200, 200, 200, 208, 208, 208, 208, 208, 216, 216, 216, 216, 216, 224, 224, 224, 224, 224, 232, 232, 232, 232, 232, 240, 240, 240, 240, 240, 248, 248, 248, 248, 248, 256, 256, 256, 256, 256,
264, 264, 264, 264, 264, 272, 272, 272, 272, 272, 279, 280, 280, 280, 280, 288, 288, 288, 288, 288, 296, 296, 296, 296, 296, 304, 304, 304, 304, 304, 312, 312, 312, 312, 312, 320, 320...}
my_TMP = 0x4a84054
my_counts = {0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0,
0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1,
0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0, 0, 0, 8,
0, 0, 0, 0, 8, 0, 0, 0, 0, 8, 0, 0...}
my_ugrp = 0x2adea70 "\200"
my_key = 0x3176240 "\002\002\002\002\002\002\002\002\a\a\a\a\a\a\a\a\f\f\f\f\f\f\f\f\021\021\021\021\021\021\021\021\026\026\026\026\026\026\026\026\033\033\033\033\033\033\033\034 !%%%%%%%%********////////4444444499999999>>>>>>>>CCCCCCCDHHHHHHHHMMMMMMMNRRRRRRRRWWWWWWWX\\\\\\\\\\\\\\\\aaaaaaaafffgffffkkkkkkkkppppppppuuuuuuuuzzzzzzz{"...
my_gs = 0x0
ngrp = 60
skip = <optimized out>
my_n = 408
batchSize = <optimized out>
nBatch = <optimized out>
lastBatchSize = <optimized out>
counts = <optimized out>
ugrps = <optimized out>
ngrps = <optimized out>
skip = <optimized out>
n_rem = <optimized out>
ugrp = "\000\000\000\000\001\000\b\000\b\000\b\000\b\000\b\000\020\000\020\000\020\000\020\000\020\000\030\000\030\000\030\000\030\000\030\000 \000 \000 \000 \000 \000(\000(\000(\000(\000(\000/\000\060\000\060\000\060\000\060\000\067\000\070\000\070\000\070\000\070\000@\000@\000@\000@\000@\000H\000H\000H\000H\000H\000P\000P\000P\000P\000P\000X\000X\000X\000X\000X\000`\000`\000`\000`\000`\000h\000h\000h\000h\000h\000o\000p\000p\000p\000p\000x\000x\000x\000x\000x\000\177\000\200\000\200\000\200\000\200\000\210\000\210\000\210\000\210\000\210\000\217\000\220\000\220\000\220\000\220\000\230\000\230\000\230\000\230\000\230\000\240\000\240\000"...
seen = {false, false, false, false, false, false, 8, false, 8, false, 8, false, 8, false, 8, false, 16, false, 16, false, 16, false, 16, false, 16, false, 24, false, 24, false, 24, false, 24,
false, 24, false, 32, false, 32, false, 32, false, 32, false, 32, false, 40, false, 40, false, 40, false, 40, false, 40, false, 47, false, 48, false, 48, false, 48, false, 48, false, 55,
false, 56, false, 56, false, 56, false, 56, false, 64, false, 64, false, 64, false, 64, false, 64, false, 72, false, 72, false, 72, false, 72, false, 72, false, 80, false, 80, false, 80,
false, 80, false, 80, false, 88, false, 88, false, 88, false, 88, false, 88, false, 96, false, 96, false, 96, false, 96, false, 96, false, 104, false, 104, false, 104, false, 104, false, 104,
false, 111, false, 112, false, 112, false, 112, false, 112, false, 120, false, 120, false, 120, false, 120, false, 120, false, 127, false, 128, false, 128, false, 128, false, 128, false, 136,
false, 136, false, 136, false, 136, false, 136, false, 143, false, 144, false, 144, false, 144, false, 144, false, 152, false, 152, false, 152, false, 152, false, 152, false, 160, false, 160,
false...}
ngrp = <optimized out>
last_seen = <optimized out>
starts = <optimized out>
my_gs = 0x0
#1 0x00007ffff05f3828 in radix_r (from=<optimized out>, to=<optimized out>, radix=<optimized out>) at forder.c:1238
start = <optimized out>
start = <optimized out>
i = <optimized out>
i = <optimized out>
anyBig = <optimized out>
my_n = 0
batchSize = <optimized out>
nBatch = <optimized out>
lastBatchSize = <optimized out>
counts = <optimized out>
ugrps = <optimized out>
ngrps = <optimized out>
skip = <optimized out>
n_rem = <optimized out>
ugrp = "X\000X\000`\000`\000`\000`\000`\000h\000h\000h\000h\000h\000o\000p\000p\000p\000p\000x\000x\000x\000x\000x\000\177\000\200\000\200\000\200\000\200\000\210\000\210\000\210\000\210\000\210\000\217\000\220\000\220\000\220\000\220\000\230\000\230\000\230\000\230\000\230\000\240\000\240\000\240\000\240\000\240\000\247\000\250\000\250\000\250\000\250\000\260\000\260\000\260\000\260\000\260\000\270\000\270\000\270\000\270\000\270\000\300\000\300\000\300\000\300\000\300\000\307\000\310\000\310\000\310\000\310\000\320\000\320\000\320\000\320\000\320\000\330\000\330\000\330\000\330\000\330\000\340\000\340\000\340\000\340\000\340\000\350\000\350\000\350\000\350\000\350\000\360\000\360\000\360\000\360\000\360\000\370\000\370\000\370\000"...
seen = {88, false, 88, false, 96, false, 96, false, 96, false, 96, false, 96, false, 104, false, 104, false, 104, false, 104, false, 104, false, 111, false, 112, false, 112, false, 112, false, 112, false, 120, false, 120, false, 120, false, 120, false, 120, false, 127, false, 128, false, 128, false, 128, false, 128, false, 136, false, 136, false, 136, false, 136, false, 136, false, 143, false, 144, false, 144, false, 144, false, 144, false, 152, false, 152, false, 152, false, 152, false, 152, false, 160, false, 160, false, 160, false, 160, false, 160, false, 167, false, 168, false, 168, false, 168, false, 168, false, 176, false, 176, false, 176, false, 176, false, 176, false, 184, false, 184, false, 184, false, 184, false, 184, false, 192, false, 192, false, 192, false, 192, false, 192, false, 199, false, 200, false, 200, false, 200, false, 200, false, 208, false, 208, false, 208, false, 208, false, 208, false, 216, false, 216, false, 216, false, 216, false, 216, false, 224, false, 224, false, 224, false, 224, false, 224, false, 232, false, 232, false, 232, false, 232, false, 232, false, 240, false, 240, false, 240, false, 240, false, 240, false, 248, false, 248, false, 248, false...}
ngrp = <optimized out>
last_seen = <optimized out>
starts = <optimized out>
my_gs = 0x0
#2 0x00007ffff72eb46e in ?? () from /usr/lib64/libgomp.so.1
No symbol table info available.
#3 0x00007ffff70b94f9 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007ffff6df1ecf in clone () from /lib64/libc.so.6
No symbol table info available.
There was a part here about some address calculation, which turned out to be wrong because I forgot something in the calculation. Doing the calculation right makes the part irrelevant, but I'll leave it as a spoiler for completeness' sake.
Wrong assumption
What is also interesting is doing some address calculation to track this down.
int *restrict my_TMP = TMP + omp_get_thread_num()*UINT16_MAX;
When doing this calculation backwards with the address found by gdb
(gdb) p my_TMP
$3 = (int * restrict) 0x4a84054
(gdb) p TMP
$4 = (int *) 0x35c41a0
(gdb)
it turns out
Also it looks like with
(gdb) p my_TMP
$1 = (int * restrict) 0x7ffce85c6ed4
(gdb) p TMP
$2 = (int *) 0x7ffce7207010
(gdb)
tell a similar story:
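The backward calculation on `my_TMP = TMP + omp_get_thread_num()*UINT16_MAX` can be sketched in C as follows. This is only an illustration, assuming a 4-byte `int`. The helper name `thread_from_addresses` is hypothetical, and the inputs are meant to be the byte addresses that gdb prints for `my_TMP` and `TMP`. Because gdb shows byte addresses while C pointer arithmetic counts `int` elements, the byte difference is divided by `sizeof(int)` before it is divided by `UINT16_MAX`.

```c
#include <stdint.h>

/* Hypothetical helper (not part of forder.c): invert
 *   my_TMP = TMP + thread_num * UINT16_MAX
 * given the two byte addresses printed by gdb.
 * Returns the thread number, or -1 if the addresses do not fit the
 * formula exactly. */
long thread_from_addresses(uint64_t my_tmp_addr, uint64_t tmp_addr) {
  if (my_tmp_addr < tmp_addr) return -1;

  /* Distance between the two pointers in bytes. */
  uint64_t bytes = my_tmp_addr - tmp_addr;

  /* TMP is an int pointer, so convert bytes to int elements. */
  if (bytes % sizeof(int) != 0) return -1;
  uint64_t ints = bytes / sizeof(int);

  /* Each thread owns a slice of UINT16_MAX ints. */
  if (ints % UINT16_MAX != 0) return -1;
  return (long)(ints / UINT16_MAX);
}
```

Called with the address pairs shown above, `thread_from_addresses(0x4a84054, 0x35c41a0)` gives 83 and `thread_from_addresses(0x7ffce85c6ed4, 0x7ffce7207010)` gives 79. Both are ordinary 0-based thread numbers below the 96 and 80 threads used in those runs.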
…rting the amount of thread, memory was allocated for. L717-L718 allocated memory for a batch-throttled number of threads in the TMP buffer, but L1235-L1238 started an unthrottled number of threads, causing the segfault at L995 for threads over the throttled threshold.
…t start too many threads where the buffer might be used. Also renaming the global TMP as such. see Rdatatable#5077 and Rdatatable#5087
…amount of thread, memory was allocated for. (#5087)
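The mismatch described in the commit messages above can be illustrated with a small C sketch. It is not the code from forder.c: the helper names are hypothetical, and only the per-thread slice size of `UINT16_MAX` ints is taken from the `my_TMP` line quoted in this thread. The sketch shows why a thread whose number is at or above the count the buffer was sized for ends up outside the buffer, and one possible guard (capping the team size at the allocated count). The actual fix in the linked pull request may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of ints in a scratch buffer that gives each of
 * alloc_threads threads its own slice of UINT16_MAX ints. */
size_t tmp_len(int alloc_threads) {
  return (size_t)alloc_threads * UINT16_MAX;
}

/* Hypothetical: offset (in ints) of the slice used by thread_num,
 * mirroring TMP + thread_num * UINT16_MAX. */
size_t tmp_offset(int thread_num) {
  return (size_t)thread_num * UINT16_MAX;
}

/* True when the whole slice of thread_num lies inside a buffer that
 * was sized for alloc_threads threads. */
bool slice_in_bounds(int thread_num, int alloc_threads) {
  if (thread_num < 0 || alloc_threads <= 0) return false;
  return tmp_offset(thread_num) + UINT16_MAX <= tmp_len(alloc_threads);
}

/* One possible guard: never start more threads than the buffer was
 * allocated for. */
int clamp_team_size(int requested, int alloc_threads) {
  return requested < alloc_threads ? requested : alloc_threads;
}
```

With made-up numbers, a buffer sized for 4 threads holds the slices of threads 0 to 3, so `slice_in_bounds(3, 4)` is true while `slice_in_bounds(4, 4)` is false. Starting 8 threads against that buffer would therefore send threads 4 to 7 out of bounds, and `clamp_team_size(8, 4)` returns 4.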
I'm working in the IT department of a research institute, and recently one of our scientists came to me with mysteriously crashing R sessions on one (and only one) of our machines (designated for use with R and RStudio). While investigating the problem, I found that the main and deciding difference between the machines he tried it on is the CPU, more specifically the number of cores. The machine the code worked on was a 4 socket, 16 core machine, and the machine where the crashes occurred was a 2 socket, 265 core (with SMT) machine. Further investigation revealed that this matters because, when `data.table` is loaded, by default 50% of the available cores are chosen for multithreaded operations.
From playing around with `setDTthreads(threads = N)`, I found that `N = 79` seemed to be the magical threshold for the crashes to occur. Up to this number the code consistently works; above it, the code crashes in most (but not all) cases.
The error first occurred with R 4.0.2 and data.table 1.13.0. But because #4892 looked somewhat similar, I also tested it with the latest development version 1.14.1.
Please find below the crashing code. From what he told me, the code is trying to achieve the following:
Minimal reproducible example
error
The specifics of the segfault vary depending on the number of threads, so here are the errors for a variety of thread numbers:
the result for `N = 80` threads
the result for `N = 96` threads
the result for `N = 128` threads
Because the `M = 160` seemed related to the threshold of 79/80 cores, I also tried other `M`, specifically `M = 100`, but the error threshold remains the same.
Output of `sessionInfo()`