Reduce allocations, chapter III #7222
Conversation
How the Grinch Stole all our Allocations 🎄
LGTM 👍
```go
for i := 0; i < len(kvs); i += 2 {
	tuples[i/2] = *(*[2]*Term)(kvs[i : i+2])
}
return NewObject(tuples...), nil
case map[string]string:
```
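The `*(*[2]*Term)(kvs[i : i+2])` line above uses Go's slice-to-array-pointer conversion (Go 1.17+) to copy two slice elements into a fixed-size array without an extra loop or append. A minimal, self-contained sketch of the same trick, using a hypothetical `pairTuples` helper over `string` rather than OPA's `*Term`:

```go
package main

import "fmt"

// pairTuples is a hypothetical helper illustrating the trick from the diff:
// a flat slice of key/value pairs is reinterpreted two elements at a time
// via slice-to-array-pointer conversion, avoiding a per-element copy loop.
func pairTuples(kvs []string) [][2]string {
	tuples := make([][2]string, len(kvs)/2)
	for i := 0; i < len(kvs); i += 2 {
		// (*[2]string)(kvs[i:i+2]) points directly at the slice's backing
		// array; dereferencing copies both elements into tuples[i/2] at once.
		tuples[i/2] = *(*[2]string)(kvs[i : i+2])
	}
	return tuples
}

func main() {
	fmt.Println(pairTuples([]string{"k1", "v1", "k2", "v2"}))
}
```

The conversion panics if the slice is shorter than the array length, which is why the loop steps by exactly two.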
Would the above optimization make sense here too?
Most likely! I don't remember, but it's likely this case wasn't hit by my input data (i.e. Regal linting), and I have avoided making changes to code that isn't exercised in my own tests. I'm sure this is covered by unit tests and such, but :)
See Roast for an uber-optimized version of InterfaceToValue... it's not pretty, lol https://github.com/anderseknert/roast/blob/main/internal/transforms/value.go#L301
```diff
@@ -357,14 +357,14 @@ func (vis *GenericVisitor) Walk(x interface{}) {
 		vis.Walk(x.Get(k))
 	})
 case Object:
-	x.Foreach(func(k, _ *Term) {
+	for _, k := range x.Keys() {
```
`object.Keys()` will create a new slice of keys; I suppose the cost of that is less than the removed closure? To further optimize, perhaps we could expose the `object.sortedKeys()` function somehow, or reuse the `util.NewPtrSlice()` optimization inside `object.Keys()`?
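The tradeoff in question can be made concrete with `testing.AllocsPerRun`. This is an illustrative miniature, not OPA's actual `Object` type: a `Keys()`-style accessor pays one slice allocation per call, while a `Foreach`-style callback can visit entries without allocating.

```go
package main

import (
	"fmt"
	"testing"
)

// object is a hypothetical stand-in for an AST object with string keys.
type object struct{ keys []string }

// Keys returns a fresh copy of the key slice: one allocation per call.
func (o *object) Keys() []string {
	out := make([]string, len(o.keys))
	copy(out, o.keys)
	return out
}

// Foreach visits each key through a callback, allocating nothing
// (a non-capturing closure is a static value in Go).
func (o *object) Foreach(f func(k string)) {
	for _, k := range o.keys {
		f(k)
	}
}

// sink keeps results live so the compiler can't elide the allocation.
var sink []string

func main() {
	o := &object{keys: []string{"a", "b", "c"}}

	keysAllocs := testing.AllocsPerRun(100, func() { sink = o.Keys() })
	foreachAllocs := testing.AllocsPerRun(100, func() { o.Foreach(func(string) {}) })

	fmt.Printf("Keys: %.0f allocs/run, Foreach: %.0f allocs/run\n", keysAllocs, foreachAllocs)
}
```

Whether the slice allocation or the per-element callback dispatch costs more in practice depends on the object size, which is why the benchmarks in this PR are the deciding evidence.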
Hmm, yeah, tbh this might be a leftover from an optimization that didn't work out. AFAICS these two should be equally expensive in terms of allocations, although there's probably some negligible overhead from using a callback compared to the loop.
When we can use Go 1.23, I'd be keen to explore range over functions for things like this! That would allow us to have a generic way to traverse AST collections, with or without copying, and to exit early when some condition is met without creating a new slice, and whatnot.
```diff
@@ -68,7 +68,7 @@ func (u *bindings) Plug(a *ast.Term) *ast.Term {
 }

 func (u *bindings) PlugNamespaced(a *ast.Term, caller *bindings) *ast.Term {
-	if u != nil {
+	if u != nil && u.instr != nil {
```
Does the `u.plugNamespaced(a, caller)` call on line 78 make sense if `u == nil`?
Not likely. I made this change only because it felt obvious that this was the original intention of the check, and it came with no risk (as far as I could see).
```go
child := evalPool.Get()

e.closure(x.Body, child)
defer evalPool.Put(child)
```
Most other places declare the return to the pool immediately after retrieving it; could we follow that pattern everywhere for consistency?
Sure 👍 Fixed
My last PR for a while in the ongoing "reduce allocations in eval" quest. Motivated initially mostly to speed up `regal lint`, but most of the changes here positively impact evaluation performance for most policies.

The changes with the highest impact in this PR:

* Use `sync.Pool`s to avoid the most costly allocations, including the heavy `*eval` pointers created each time a child or closure scope is evaluated.
* When tracing is disabled, avoid a variable escaping to the heap in the `evalStep` function whose value is only read when tracing is enabled.
* Save one allocation per iteration in `walkNoPath` by reusing an AST array instead of creating a new one for each call.

Also a few minor fixes here and there which either fixed a correctness issue or had a measurable (although minor) positive impact on performance.

**regal lint bundle (main)**
```
BenchmarkRegalLintingItself-10    1    2015560750 ns/op    4335625360 B/op    83728460 allocs/op
```
**regal lint bundle (now)**
```
BenchmarkRegalLintingItself-10    1    1828754125 ns/op    3541027496 B/op    70080568 allocs/op
```

About 10% faster eval, with almost a gigabyte less memory allocated, and 13 million+ fewer allocations performed.

Another topic discussed recently has been the cost of calling custom functions in hot paths. While this PR doesn't fully address that problem, the benefits of the change are still quite noticeable. A benchmark for that case specifically is also included in the PR; the change compared to main is noted below:

**main**
```
BenchmarkCustomFunctionInHotPath-10    55    18543908 ns/op    20821043 B/op    284611 allocs/op
```
**pr**
```
BenchmarkCustomFunctionInHotPath-10    73    16247587 ns/op    13048108 B/op    228406 allocs/op
```

It's worth noting, however, that this benchmark benefits "unfairly" from the improvements made in the `walkNoPath` function, perhaps more so than from custom function evaluation itself getting that much more efficient.

Signed-off-by: Anders Eknert <anders@styra.com>
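The `sync.Pool` pattern described above can be sketched as follows. This is a minimal illustration under assumptions, not OPA's actual code: `evalState` is a hypothetical stand-in for the heavy `*eval` struct, and `evalClosure` shows the Get/deferred-Put discipline the review settled on, plus the reset step that pooled values always need.

```go
package main

import (
	"fmt"
	"sync"
)

// evalState is a hypothetical stand-in for a heavy per-scope evaluator
// struct that would otherwise be heap-allocated on every child/closure eval.
type evalState struct {
	depth    int
	bindings map[string]string
}

var evalPool = sync.Pool{
	New: func() any { return &evalState{bindings: map[string]string{}} },
}

// evalClosure evaluates a child scope using a pooled state instead of
// allocating a fresh one.
func evalClosure(parent *evalState) int {
	child := evalPool.Get().(*evalState)
	defer evalPool.Put(child) // declare the return right after Get, per the review

	// Pooled values retain old field contents, so reset before use.
	child.depth = parent.depth + 1
	clear(child.bindings)

	return child.depth
}

func main() {
	root := &evalState{}
	fmt.Println(evalClosure(root)) // after warm-up, no allocation per call
}
```

The reset step is the classic pitfall with `sync.Pool`: forgetting it turns reuse into a correctness bug rather than an optimization.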