
Reduce allocations, chapter III #7222

Merged · 1 commit into open-policy-agent:main on Dec 18, 2024

Conversation

@anderseknert (Member) commented Dec 16, 2024

My last PR for a while in the ongoing "reduce allocations in eval" quest. It was initially motivated mostly by speeding up `regal lint`, but most of the changes here positively impact evaluation performance for most policies.

The changes with the highest impact in this PR:

* Use `sync.Pool`s to avoid the most costly allocations, including the heavy `*eval` pointers created each time a child or closure scope is evaluated (a minimal sketch of this pattern follows the list).
* When tracing is disabled, avoid a variable escaping to the heap in the `evalStep` function, whose value is only read when tracing is enabled.
* Save one allocation per iteration in `walkNoPath` by reusing an AST array instead of creating a new one for each call.
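
For readers less familiar with the pattern, here is a minimal, hedged sketch of the kind of `sync.Pool` usage described in the first bullet. It is illustrative only: the real `eval` struct, its fields, and the helper below are placeholders; only the `evalPool` name comes from the review discussion further down.

```go
package main // illustrative sketch only, not the PR's actual code

import "sync"

// eval stands in for topdown's evaluator state; the real struct has many more fields.
type eval struct {
	depth int
}

// evalPool hands out reusable *eval values so that evaluating a child or
// closure scope does not allocate a fresh struct on the heap every time.
var evalPool = sync.Pool{
	New: func() any { return &eval{} },
}

// withChildScope is a hypothetical helper showing the Get/reset/Put cycle.
func withChildScope(parent *eval, fn func(child *eval)) {
	child := evalPool.Get().(*eval)
	defer evalPool.Put(child) // return the value to the pool when done

	*child = eval{depth: parent.depth + 1} // reset the reused value before use
	fn(child)
}
```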

There are also a few minor fixes here and there, which either fixed a correctness issue or had a measurable (although minor) positive impact on performance.

**regal lint bundle (main)**
```
BenchmarkRegalLintingItself-10   1      2015560750 ns/op        4335625360 B/op  83728460 allocs/op
```

**regal lint bundle (now)**
```
BenchmarkRegalLintingItself-10   1      1828754125 ns/op        3541027496 B/op  70080568 allocs/op
```

About 10% faster eval, with almost a gigabyte less memory allocated and more than 13 million fewer allocations performed.

Another topic discussed recently has been the cost of calling custom functions in hot paths.
While this PR doesn't address that problem fully, the benefits of the change are still quite
noticeable. A benchmark for that case specifically is also included in the PR, and the
difference compared to main is shown below:

**main**
```
BenchmarkCustomFunctionInHotPath-10     55  18543908 ns/op  20821043 B/op  284611 allocs/op
```

**pr**
```
BenchmarkCustomFunctionInHotPath-10     73  16247587 ns/op  13048108 B/op  228406 allocs/op
```

It's worth noting, however, that this benchmark benefits "unfairly" from the improvements made
to the `walkNoPath` function, perhaps more so than from custom function evaluation itself
becoming that much more efficient.

netlify bot commented Dec 16, 2024

Deploy Preview for openpolicyagent ready!

🔨 Latest commit: 3395c4a
🔍 Latest deploy log: https://app.netlify.com/sites/openpolicyagent/deploys/6762e42d4ccf120008373968
😎 Deploy Preview: https://deploy-preview-7222--openpolicyagent.netlify.app

anderseknert force-pushed the more-perf branch 2 times, most recently from 05faeca to 5f7ec1e on December 16, 2024
anderseknert force-pushed the more-perf branch 2 times, most recently from 8775f4e to 96cb4ce on December 17, 2024
@srenatus (Contributor) previously approved these changes on Dec 17, 2024 and left a comment:

How the Grinch Stole all our Allocations 🎄

@johanfylling (Contributor) previously approved these changes on Dec 18, 2024 and left a comment:

LGTM 👍

```go
	for i := 0; i < len(kvs); i += 2 {
		tuples[i/2] = *(*[2]*Term)(kvs[i : i+2])
	}
	return NewObject(tuples...), nil
case map[string]string:
```
Contributor:

Would the above optimization make sense here too?

Member Author:

Most likely! I don't remember, but it's likely that this case wasn't hit by my input data (i.e. Regal linting), and I've avoided making changes to code that isn't exercised in my own tests. I'm sure this is covered by unit tests and such, but :)

See Roast for an uber-optimized version of `InterfaceToValue`... it's not pretty, lol: https://github.com/anderseknert/roast/blob/main/internal/transforms/value.go#L301
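
For illustration, here is a hedged sketch of how the same tuple-based construction might look for the `map[string]string` case. This is not the PR's code: the helper name is made up, the real implementation would live inside the `ast` package's `InterfaceToValue` switch rather than in a standalone function, and the `v1/ast` import path is assumed from the repo layout referenced in this PR.

```go
package main // illustrative sketch only

import "github.com/open-policy-agent/opa/v1/ast"

// objectFromStringMap mirrors the optimization discussed above: build all
// key/value tuples up front and hand them to NewObject in a single call,
// instead of inserting keys one by one.
func objectFromStringMap(x map[string]string) ast.Value {
	tuples := make([][2]*ast.Term, len(x))
	i := 0
	for k, v := range x {
		tuples[i] = [2]*ast.Term{ast.StringTerm(k), ast.StringTerm(v)}
		i++
	}
	return ast.NewObject(tuples...)
}
```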

```diff
@@ -357,14 +357,14 @@ func (vis *GenericVisitor) Walk(x interface{}) {
 			vis.Walk(x.Get(k))
 		})
 	case Object:
-		x.Foreach(func(k, _ *Term) {
+		for _, k := range x.Keys() {
```
Contributor:

object.Keys() will create a new slice of keys; I suppose the cost of that is less than the removed closure?

To further optimize, perhaps we could expose the object.sortedKeys() function somehow, or reuse the util.NewPtrSlice() optimization inside object.Keys()?

Member Author:

Hmm, yeah, tbh this might be a leftover from an optimization that didn't work out. AFAICS these two should be equally expensive in terms of allocations, although there's probably some negligible overhead from using a callback compared to the loop.

When we can use Go 1.23, I'd be keen to explore range over functions for things like this! That would give us a generic way to traverse AST collections, with or without copying, and to exit early when some condition is met without creating a new slice.
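
As a hedged illustration of that idea (not code from this PR or from OPA), a Go 1.23 range-over-function iterator over an object's keys could be adapted from the existing `Object.Until` method roughly like this; the function name is made up, and the import path is assumed:

```go
package main // illustrative sketch only

import (
	"iter"

	"github.com/open-policy-agent/opa/v1/ast"
)

// objectKeys adapts Object.Until into a Go 1.23 iterator, so callers can
// range over keys without copying them into a new slice and can stop early.
func objectKeys(o ast.Object) iter.Seq[*ast.Term] {
	return func(yield func(*ast.Term) bool) {
		o.Until(func(k, _ *ast.Term) bool {
			// Until stops when its callback returns true, so stop as soon
			// as the consumer's yield asks us to.
			return !yield(k)
		})
	}
}
```

A caller could then write `for k := range objectKeys(obj) { ... break ... }` and exit early without allocating a key slice.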

```diff
@@ -68,7 +68,7 @@ func (u *bindings) Plug(a *ast.Term) *ast.Term {
 }
 
 func (u *bindings) PlugNamespaced(a *ast.Term, caller *bindings) *ast.Term {
-	if u != nil {
+	if u != nil && u.instr != nil {
```
Contributor:

Does the u.plugNamespaced(a, caller) call on line 78 make sense if u == nil?

Member Author:

Not likely. I made this change only because it seemed obvious that this was the original intention of the check, and it came with no risk (as far as I could see).

```go
child := evalPool.Get()

e.closure(x.Body, child)
defer evalPool.Put(child)
```
Contributor:

Most other places declare the return to the pool immediately after retrieving it; could we follow that pattern everywhere to be consistent?

@anderseknert (Member Author) commented Dec 18, 2024:

Sure 👍 Fixed
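
The resulting pattern, roughly (a sketch of the agreed-upon convention rather than the exact PR diff):

```go
child := evalPool.Get()
defer evalPool.Put(child) // declared immediately after Get, per the convention above

e.closure(x.Body, child)
```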

anderseknert merged commit 50b5ee5 into open-policy-agent:main on Dec 18, 2024

28 checks passed

anderseknert deleted the more-perf branch on December 18, 2024