
Reduce allocations, chapter III #7222

Merged · 1 commit into open-policy-agent:main on Dec 18, 2024

Conversation

@anderseknert (Member) commented Dec 16, 2024

My last PR for a while in the ongoing "reduce allocations in eval" quest. It was initially motivated mostly by speeding up `regal lint`, but most of the changes here positively impact evaluation performance for most policies.

The changes with the highest impact in this PR:

* Use `sync.Pool`s to avoid the most costly allocations, including the heavy `*eval` pointers created each time a child or closure scope is evaluated (a minimal sketch of this pattern follows the list).
* When tracing is disabled, avoid a variable escaping to the heap in the `evalStep` function, whose value is only read when tracing is enabled.
* Save one allocation per iteration in `walkNoPath` by reusing an AST array instead of creating a new one for each call.
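
For readers less familiar with the pattern, here is a minimal, hedged sketch of the kind of `sync.Pool` usage described in the first bullet. It is illustrative only: the real `eval` struct, its fields, and the helper below are placeholders; only the `evalPool` name comes from the review discussion further down.

```go
package main // illustrative sketch only, not the PR's actual code

import "sync"

// eval stands in for topdown's evaluator state; the real struct has many more fields.
type eval struct {
	depth int
}

// evalPool hands out reusable *eval values so that evaluating a child or
// closure scope does not allocate a fresh struct on the heap every time.
var evalPool = sync.Pool{
	New: func() any { return &eval{} },
}

// withChildScope is a hypothetical helper showing the Get/reset/Put cycle.
func withChildScope(parent *eval, fn func(child *eval)) {
	child := evalPool.Get().(*eval)
	defer evalPool.Put(child) // return the value to the pool when done

	*child = eval{depth: parent.depth + 1} // reset the reused value before use
	fn(child)
}
```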

There are also a few minor fixes here and there, which either fixed a correctness issue or had a measurable (although minor) positive impact on performance.

**regal lint bundle (main)**
```
BenchmarkRegalLintingItself-10   1      2015560750 ns/op        4335625360 B/op  83728460 allocs/op
```

**regal lint bundle (now)**
```
BenchmarkRegalLintingItself-10   1      1828754125 ns/op        3541027496 B/op  70080568 allocs/op
```

About 10% faster eval, with almost a gigabyte less memory allocated and more than 13 million fewer allocations performed.

Another topic discussed recently has been the cost of calling custom functions in hot paths.
While this PR doesn't address that problem fully, the benefits of the change are still quite
noticeable. A benchmark for that case specifically is also included in the PR, and the
difference compared to main is shown below:

**main**
```
BenchmarkCustomFunctionInHotPath-10     55  18543908 ns/op  20821043 B/op  284611 allocs/op
```

**pr**
```
BenchmarkCustomFunctionInHotPath-10     73  16247587 ns/op  13048108 B/op  228406 allocs/op
```

It's worth noting, however, that this benchmark benefits "unfairly" from the improvements made
to the `walkNoPath` function, perhaps more so than from custom function evaluation itself
becoming that much more efficient.

netlify bot commented Dec 16, 2024

Deploy Preview for openpolicyagent ready!

🔨 Latest commit: 3395c4a
🔍 Latest deploy log: https://app.netlify.com/sites/openpolicyagent/deploys/6762e42d4ccf120008373968
😎 Deploy Preview: https://deploy-preview-7222--openpolicyagent.netlify.app

anderseknert force-pushed the more-perf branch 2 times, most recently from 05faeca to 5f7ec1e on December 16, 2024
anderseknert force-pushed the more-perf branch 2 times, most recently from 8775f4e to 96cb4ce on December 17, 2024
@srenatus (Contributor) previously approved these changes on Dec 17, 2024 and left a comment:

How the Grinch Stole all our Allocations 🎄

@johanfylling (Contributor) previously approved these changes on Dec 18, 2024 and left a comment:

LGTM 👍

```go
	for i := 0; i < len(kvs); i += 2 {
		tuples[i/2] = *(*[2]*Term)(kvs[i : i+2])
	}
	return NewObject(tuples...), nil
case map[string]string:
```
Contributor:

Would the above optimization make sense here too?

Member Author:

Most likely! I don't remember, but it's likely that this case wasn't hit by my input data (i.e. Regal linting), and I've avoided making changes to code that isn't exercised in my own tests. I'm sure this is covered by unit tests and such, but :)

See Roast for an uber-optimized version of `InterfaceToValue`... it's not pretty, lol: https://github.com/anderseknert/roast/blob/main/internal/transforms/value.go#L301
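
For illustration, here is a hedged sketch of how the same tuple-based construction might look for the `map[string]string` case. This is not the PR's code: the helper name is made up, the real implementation would live inside the `ast` package's `InterfaceToValue` switch rather than in a standalone function, and the `v1/ast` import path is assumed from the repo layout referenced in this PR.

```go
package main // illustrative sketch only

import "github.com/open-policy-agent/opa/v1/ast"

// objectFromStringMap mirrors the optimization discussed above: build all
// key/value tuples up front and hand them to NewObject in a single call,
// instead of inserting keys one by one.
func objectFromStringMap(x map[string]string) ast.Value {
	tuples := make([][2]*ast.Term, len(x))
	i := 0
	for k, v := range x {
		tuples[i] = [2]*ast.Term{ast.StringTerm(k), ast.StringTerm(v)}
		i++
	}
	return ast.NewObject(tuples...)
}
```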

```diff
@@ -357,14 +357,14 @@ func (vis *GenericVisitor) Walk(x interface{}) {
 			vis.Walk(x.Get(k))
 		})
 	case Object:
-		x.Foreach(func(k, _ *Term) {
+		for _, k := range x.Keys() {
```
Contributor:

object.Keys() will create a new slice of keys; I suppose the cost of that is less than the removed closure?

To further optimize, perhaps we could expose the object.sortedKeys() function somehow, or reuse the util.NewPtrSlice() optimization inside object.Keys()?

Member Author:

Hmm, yeah, tbh this might be a leftover from an optimization that didn't work out. AFAICS these two should be equally expensive in terms of allocations, although there's probably some negligible overhead from using a callback compared to the loop.

When we can use Go 1.23, I'd be keen to explore range over functions for things like this! That would give us a generic way to traverse AST collections, with or without copying, and to exit early when some condition is met without creating a new slice.
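
As a hedged illustration of that idea (not code from this PR or from OPA), a Go 1.23 range-over-function iterator over an object's keys could be adapted from the existing `Object.Until` method roughly like this; the function name is made up, and the import path is assumed:

```go
package main // illustrative sketch only

import (
	"iter"

	"github.com/open-policy-agent/opa/v1/ast"
)

// objectKeys adapts Object.Until into a Go 1.23 iterator, so callers can
// range over keys without copying them into a new slice and can stop early.
func objectKeys(o ast.Object) iter.Seq[*ast.Term] {
	return func(yield func(*ast.Term) bool) {
		o.Until(func(k, _ *ast.Term) bool {
			// Until stops when its callback returns true, so stop as soon
			// as the consumer's yield asks us to.
			return !yield(k)
		})
	}
}
```

A caller could then write `for k := range objectKeys(obj) { ... break ... }` and exit early without allocating a key slice.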

```diff
@@ -68,7 +68,7 @@ func (u *bindings) Plug(a *ast.Term) *ast.Term {
 }
 
 func (u *bindings) PlugNamespaced(a *ast.Term, caller *bindings) *ast.Term {
-	if u != nil {
+	if u != nil && u.instr != nil {
```
Contributor:

Does the u.plugNamespaced(a, caller) call on line 78 make sense if u == nil?

Member Author:

Not likely. I made this change only because it seemed obvious that this was the original intention of the check, and it came with no risk (as far as I could see).

```go
child := evalPool.Get()

e.closure(x.Body, child)
defer evalPool.Put(child)
```
Contributor:

Most other places declare the return to the pool immediately after retrieving it; could we follow that pattern everywhere to be consistent?

@anderseknert (Member Author) commented Dec 18, 2024:

Sure 👍 Fixed
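
The resulting pattern, roughly (a sketch of the agreed-upon convention rather than the exact PR diff):

```go
child := evalPool.Get()
defer evalPool.Put(child) // declared immediately after Get, per the convention above

e.closure(x.Body, child)
```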

anderseknert merged commit 50b5ee5 into open-policy-agent:main on Dec 18, 2024

28 checks passed

anderseknert deleted the more-perf branch on December 18, 2024