Improve gc tricolor by avoiding revisiting nodes #10644
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
If a node can be reached by, e.g., 100 nodes, then it will be added to
grays
100 times (andrefs
will be called 100 times for this node). This PR avoids this by pre-marking a node as non-white right after it's added tograys
.My understanding from the code is we can/should avoid such duplications in
grays
(and duplicatedrefs(node)
calls within aTricolor
call).Tricolor
is single-thread GC, and only called once byDB.getMarked
containerd/core/metadata/db.go
Line 516 in 4cfeb7b
getMarked
is only called once byDB.GarbageCollect
which holds a write lock toDB
:containerd/core/metadata/db.go
Lines 372 to 376 in 4cfeb7b
Tricolor
call, and repeatedrefs(node)
calls will return the same results and thus should be avoided.The added benchmark on my macbook:
By creating a graph where A -> [X1, X100] -> D, it shows the GC latency has a ~16% improvement (22260/26622=84%). Memory also has ~6% reduction (71550/75636=94%). (the actual improvement will depend on the cost of calling
refs
and number of duplicated nodes).Appreciate any feedback/concern, thanks!