Fix memory errors when streaming updates with expression columns #1464
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR fixes a memory issue that was encountered on
streaming.js
and in Python, where trivial constructions of expression columns caused hard-to-debug memory errors in both Javascript and Python. After some debugging and generating a minimal repro in Python, I found the issue in the way we handlecapacity
andsize
for the expression tables that live on each context.Specifically, during each cycle of computing the expressions, the code called
set_capacity
on each of the expression tables to match the capacity offlattened
, and thenset_size
on each of the tables to match the size offlattened
. However, setting the capacity of thet_data_table
does not set the capacity of itst_column
ort_lstore
objects.I added a call to
verify
for each expressions table and saw that the transitions table was broken, where the capacity of the column was 8 bytes (the default empty capacity) yet the size was set to 50 bytes (50 rows in the test dataset * 1 byte peruint8_t
in the transitions column). Looking at the code oft_lstore::reserve_impl
, it seems like the implementation chooses capacity asmax(size, capacity)
so there isn't an obvious place where the memory corruption happens.However, replacing
set_capacity
withreserve
- which does change the capacity of the underlying columns and lstore - fixed the issue both instreaming.js
and in the Python library. I've added a test in the Python library that fails immediately with various aborts and segfaults on master, and it fully passes consistently on this branch.Additionally, I added a fix where the underlying
t_column
s are not allocated when the the table returned fromt_data_table::join
is created, thus preventing additional unnecessary allocation and reallocation (as the column is immediately replaced by the columns from the two input tables tojoin
).