Fix corrupt string results in expressions
#1589
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
During column
expressions
computation, Perspective maintains a linear, growable memory block for end-to-end C-string storage to minimize heap allocation, in the form of at_vocab
struct. In investigating failing tests around string values inexpressions
, @sc1f discovered a bug in the expression engine's use oft_vocab
's reallocation method, which is called when the memory block is exhausted. In this case, after an expression was in the process of being calculated, any already-allocatedt_scalar
structs interned on thist_vocab
block would contain invalid pointers to the just-deallocated block.We discussed this issue and decided to implement a paged-vocab approach that simply allocates a new
t_vocab
on exhaustion rather than reallocating. Interning strings in the first place is to prevent allocating intermediate to calculating a column; so we should minimize these anyway. There is quite a bit of potentially wasted memory here at the expensive of lookup-time in the string block and other issues; however, in the interest of correctness, we chose this implementation due to its simplicity, practicality and lack of worst-case performance corners.An alternate approach, updating the scalars in-place by incrementing their pointers by the offset of the reallocated base address, was not implemented; its a place for potential investigation in the future. I expect this will perform better for expressions with a small set strings, but its performance will degrade linearly as this set expands; whether that cost is of practical consequence versus the latter strategy would need performance profiles to know for sure.