Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve gc tricolor by avoiding revisiting nodes #10644

Merged
merged 2 commits into from
Aug 27, 2024

Conversation

djdongjin
Copy link
Member

@djdongjin djdongjin commented Aug 27, 2024

If a node can be reached by, e.g., 100 nodes, then it will be added to grays 100 times (and refs will be called 100 times for this node). This PR avoids this by pre-marking a node as non-white right after it's added to grays.

My understanding from the code is we can/should avoid such duplications in grays (and duplicated refs(node) calls within a Tricolor call).

  1. Tricolor is single-thread GC, and only called once by DB.getMarked
    reachable, err := gc.Tricolor(nodes, refs)
  2. getMarked is only called once by DB.GarbageCollect which holds a write lock to DB:
    m.wlock.Lock()
    t1 := time.Now()
    c := startGCContext(ctx, m.collectors)
    marked, err := m.getMarked(ctx, c) // Pass in gc context
  3. So there should be no concurrent writes during a Tricolor call, and repeated refs(node) calls will return the same results and thus should be avoided.

The added benchmark on my macbook:

# before the change
$ go test -bench=. -benchmem -count 5 ./pkg/gc/...
BenchmarkTricolor-8        40724             26680 ns/op           75641 B/op        134 allocs/op
BenchmarkTricolor-8        45002             26666 ns/op           75646 B/op        134 allocs/op
BenchmarkTricolor-8        44790             26622 ns/op           75636 B/op        134 allocs/op
BenchmarkTricolor-8        45115             26654 ns/op           75639 B/op        134 allocs/op
BenchmarkTricolor-8        44782             26682 ns/op           75643 B/op        134 allocs/op
# after the change
$ go test -bench=. -benchmem -count 5 ./pkg/gc/...
BenchmarkTricolor-8        48979             22379 ns/op           71538 B/op        133 allocs/op
BenchmarkTricolor-8        53335             22484 ns/op           71541 B/op        133 allocs/op
BenchmarkTricolor-8        53569             22260 ns/op           71550 B/op        133 allocs/op
BenchmarkTricolor-8        53278             22312 ns/op           71545 B/op        133 allocs/op
BenchmarkTricolor-8        53372             22421 ns/op           71547 B/op        133 allocs/op

By creating a graph where A -> [X1, X100] -> D, it shows the GC latency has a ~16% improvement (22260/26622=84%). Memory also has ~6% reduction (71550/75636=94%). (the actual improvement will depend on the cost of calling refs and number of duplicated nodes).

Appreciate any feedback/concern, thanks!

pkg/gc/gc.go Show resolved Hide resolved
@djdongjin djdongjin force-pushed the improve-gc-tricolor branch from a0deb04 to 3b28db5 Compare August 27, 2024 02:12
@djdongjin djdongjin marked this pull request as ready for review August 27, 2024 02:41
Signed-off-by: Jin Dong <djdongjin95@gmail.com>
Signed-off-by: Jin Dong <djdongjin95@gmail.com>
@djdongjin djdongjin force-pushed the improve-gc-tricolor branch from 3b28db5 to d83184c Compare August 27, 2024 06:36
@dmcgowan
Copy link
Member

This is a good find and think your assessment is accurate. I think in the function description we already calls out that func(ref Node) ([]Node, error) should return the same results during the call, although we could make that more clear.

@mxpv mxpv added this pull request to the merge queue Aug 27, 2024
Merged via the queue into containerd:main with commit 0027f81 Aug 27, 2024
52 checks passed
@djdongjin djdongjin deleted the improve-gc-tricolor branch August 28, 2024 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants