Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix corrupt string results in expressions #1589

Merged
merged 7 commits into from
Oct 24, 2021
Merged

Fix corrupt string results in expressions #1589

merged 7 commits into from
Oct 24, 2021

Conversation

texodus
Copy link
Member

@texodus texodus commented Oct 22, 2021

During column expressions computation, Perspective maintains a linear, growable memory block for end-to-end C-string storage to minimize heap allocation, in the form of a t_vocab struct. In investigating failing tests around string values in expressions, @sc1f discovered a bug in the expression engine's use of t_vocab's reallocation method, which is called when the memory block is exhausted. In this case, after an expression was in the process of being calculated, any already-allocated t_scalar structs interned on this t_vocab block would contain invalid pointers to the just-deallocated block.

We discussed this issue and decided to implement a paged-vocab approach that simply allocates a new t_vocab on exhaustion rather than reallocating. Interning strings in the first place is to prevent allocating intermediate to calculating a column; so we should minimize these anyway. There is quite a bit of potentially wasted memory here at the expensive of lookup-time in the string block and other issues; however, in the interest of correctness, we chose this implementation due to its simplicity, practicality and lack of worst-case performance corners.

An alternate approach, updating the scalars in-place by incrementing their pointers by the offset of the reallocated base address, was not implemented; its a place for potential investigation in the future. I expect this will perform better for expressions with a small set strings, but its performance will degrade linearly as this set expands; whether that cost is of practical consequence versus the latter strategy would need performance profiles to know for sure.

@texodus texodus added the bug Concrete, reproducible bugs label Oct 22, 2021
@texodus texodus merged commit e527088 into master Oct 24, 2021
@texodus texodus deleted the pagevocab branch October 24, 2021 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Concrete, reproducible bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants