[Garbage Collector] Umbrella: known issues of garbage collector #26120
Can you give an example of "Update event of the owner resource" and "creation/update of some dependents"? Where does the event come from? Why aren't they ordered?
@hongchaodeng, let's use a replication controller and its pods as an example. The user creates the RC, the RC creates its pods, and the user then deletes the RC with OrphanDependents=true.
But from the GC's perspective, it may observe things in this order: the RC's creation, then the RC's orphaning update, and only later the creation events of the pods.
So the GC doesn't know about those pods at the time it carries out the orphaning.
@deads2k FYI
It's reasonable for certain clients to potentially treat resource version as comparable; GC is certainly one of them. We do want to have UID tombstones eventually, but it's certainly a lot of work to get there. What other options do we have?
As discussed today at the SIG meeting, we need this for the "migrate client" to touch every object at least once as well.
@smarterclayton By UID tombstones, do you mean etcd3's ability to store versions of deleted objects, or constructing our own API?
Possibly both.
Automatic merge from submit-queue

[GarbageCollector] add absent owner cache

**What this PR does / why we need it**: Reduces the requests the garbage collector sends to the API server to check whether an owner exists.

**Which issue this PR fixes**: fixes #26120

Currently, when processing an item in the dirtyQueue, the garbage collector issues a GET to check whether any of its owners exist. If the owner is a replication controller with 1000 pods, the garbage collector sends a GET for the RC 1000 times. This PR caches the owner's UID if it does not exist according to the API server. This cuts 1/3 off the garbage collection time of the density test in gce-500 and gce-scale, where QPS is the bottleneck.
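For illustration, here is a minimal sketch of the idea behind the absent owner cache: a fixed-size LRU keyed by UID. The names and structure are assumptions for this sketch, not the actual kubernetes/kubernetes implementation.

```go
// Sketch of an absent-owner cache. Because clients cannot predict the UID
// of a future object, a UID the API server reports as absent will never
// exist, so caching the negative result is safe.
package gc

import (
	"container/list"
	"sync"
)

type UID string

type absentOwnerCache struct {
	mu    sync.Mutex
	max   int
	order *list.List // LRU order, front = most recently used
	items map[UID]*list.Element
}

func newAbsentOwnerCache(max int) *absentOwnerCache {
	return &absentOwnerCache{max: max, order: list.New(), items: map[UID]*list.Element{}}
}

// Add records that the API server returned "not found" for this owner UID,
// evicting the least recently used entry when the cache is full.
func (c *absentOwnerCache) Add(uid UID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[uid]; ok {
		c.order.MoveToFront(e)
		return
	}
	c.items[uid] = c.order.PushFront(uid)
	if c.order.Len() > c.max {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(UID))
	}
}

// Has reports whether the owner UID is known to be absent; if true, the GC
// can skip the GET to the API server entirely.
func (c *absentOwnerCache) Has(uid UID) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.items[uid]
	return ok
}
```

In the RC-with-1000-pods case above, the first dependent's check populates the cache and the remaining 999 checks never leave the process.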
Is deletion incrementing metadata.Generation?
Yes.
Automatic merge from submit-queue (batch tested with PRs 43106, 43110)

Wait for garbagecollector to be synced in test. Fix #42952.

Without the `cache.WaitForCacheSync` in the test, it's possible for the GC to get a merged event for the RC's creation and its update (the update that sets a non-zero deletionTimestamp) before it gets the creation event of the pod, so the GC may handle the foreground deletion of the RC before it adds the pod to the dependency graph; hence the race. With `cache.WaitForCacheSync` in the test, and because the GC runs a single thread to process graph changes, the pod is guaranteed to be added to the dependency graph before the GC handles the foreground deletion of the RC. Note that this pull fixes the race in the test; the race described in the first point of #26120 still exists.
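For reference, a minimal sketch of the synchronization pattern this fix relies on, assuming a shared informer factory; the function name `startAndSync` and the choice of informers are illustrative, not the actual test code:

```go
package gctest

import (
	"testing"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func startAndSync(t *testing.T, client kubernetes.Interface) {
	stopCh := make(chan struct{})
	defer close(stopCh)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	rcInformer := factory.Core().V1().ReplicationControllers().Informer()

	factory.Start(stopCh)

	// Block until both informers have completed their initial List and the
	// local caches are populated. After this returns, the dependency graph
	// builder has seen every pre-existing pod before the test mutates the
	// RC, which removes the ordering race described above.
	if !cache.WaitForCacheSync(stopCh, podInformer.HasSynced, rcInformer.HasSynced) {
		t.Fatal("timed out waiting for caches to sync")
	}
}
```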
This is an umbrella issue for TODOs and known problems of the garbage collector that we have to solve before graduating GC to beta.
_Functionalities_
This race can cause `DeleteOptions.OrphanDependents=true` to not orphan all dependents. We expect this race to be rare. Here are the details: there is no guarantee on the ordering of events of different resources, so it's possible that the GC observes the orphan request (which is an Update event of the owner resource) before observing the creation/update of the dependents. Consequently, those dependents will not be orphaned. [GarbageCollector] monitor the watch latency #30483 adds a monitor for the watch latency; because the latency is small, this race is rare. Considered solutions:
a. Let the GC wait for a short period of time (e.g., 1 min) before carrying out the orphaning procedure. This won't thoroughly solve the problem, and it will make deletion slow.
b. Let the user supply a resourceVersion, and before carrying out the orphaning procedure, let the GC wait until it has observed events of the dependent resource with a larger resourceVersion. The problem is that the GC is a client, and a client should treat resourceVersion as opaque (see the sketch after this list).
- For example, GC needs to determine if it should watch extensions/v1beta1/job or batch/v1/job. (edit: this use case doesn't matter, because the ownerRef will only point to one version of the object; the other version of the object will just be a shadow)
- [DONE] According to the controllerRef proposal (Proposal for ControllerReference #25256), "GarbageCollector will remove ControllerRef from objects that no longer points to existing controllers".
- [DONE] Update at least one controller to use GC. Now the replicaset controller and the replicationcontroller manager use GC ([GarbageCollector] Let the RC manager set/remove ControllerRef #27600).
- [DONE] [update: we have foreground GC now] Expose the progress of garbage collection. See [RFC][GarbageCollector] expose the progress of garbage collection #29891.
- [Fixing, see GC: Fix re-adoption race when orphaning dependents #42938] The design doc said that before orphaning dependents, GC should wait for the owner's controller to bump the owner's ObservedGeneration, which would mean the owner's controller has acknowledged that it has observed the deletion of the owner and will stop adoption. Otherwise GC's orphaning process will race with the owner controller's adoption process and result in the deletion of the dependents. We haven't implemented this yet. We expect this race to be rare, because currently only the replicaset and replication controllers do adoption, and it's triggered by updates to the RC or by the 10-minute resync. We have an e2e test that orphans 100 pods controlled by an RC, and it never hits this race.
- [Done for new resources; for old resources, 200 is returned for compatibility] The API server should return 202 if a deletion request is not completed synchronously (API server should return 202 if the operation is asynchronous #33196).
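To illustrate why option (b) above was rejected, here is a hypothetical sketch that assumes resourceVersions can be parsed and compared as integers. `safeToOrphan` is an invented name, and the numeric comparison is exactly what the API conventions forbid clients from doing:

```go
// Hypothetical sketch of option (b). Comparing resourceVersions numerically
// violates the Kubernetes API conventions (clients must treat them as
// opaque strings), which is why this option was not adopted.
package gc

import "strconv"

// safeToOrphan reports whether the GC has observed dependent events at least
// as new as the user-supplied resourceVersion, assuming (incorrectly, in
// general) that resourceVersions are comparable integers.
func safeToOrphan(latestObservedRV, userSuppliedRV string) (bool, error) {
	observed, err := strconv.ParseUint(latestObservedRV, 10, 64)
	if err != nil {
		return false, err
	}
	supplied, err := strconv.ParseUint(userSuppliedRV, 10, 64)
	if err != nil {
		return false, err
	}
	return observed >= supplied, nil
}
```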
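To make the ControllerRef items above concrete, here is a sketch of the metadata the GC acts on; the `metav1.OwnerReference` with `Controller` set to true is the ControllerRef the proposal refers to, and all field values below are illustrative:

```go
// Illustrative owner reference on a pod adopted by a replication controller.
// The reference names a single apiVersion and kind, which is why an ownerRef
// only ever points to one version of an object; per the controllerRef
// proposal, the GC removes the Controller=true reference when the referenced
// controller no longer exists.
package gc

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

var isController = true

var controllerRef = metav1.OwnerReference{
	APIVersion: "v1",
	Kind:       "ReplicationController",
	Name:       "nginx",                                 // illustrative
	UID:        "00000000-0000-0000-0000-000000000000",  // placeholder; real UIDs come from the API server
	Controller: &isController,
}
```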
_Performance_
2. [Done] Improvement: update the List and Watch to only store the TypeMeta and ObjectMeta ([GarbageCollector] only store typeMeta and objectMeta in the gc store #28480).
3. [Done] [GarbageCollector] add absent owner cache #31167. Cache known-deleted UIDs. GC contacts the API server to check whether an owner exists when processing its dependents. If the owner doesn't exist according to the API server, it won't exist in the future either, because it's impossible for a user to predict the UID of a future object and put it in an ownerRef. Such a cache is very useful in the RC-pods case, because GC will check for the existence of the RC for every pod it created. The cache helps the API server as well, because a GET request with JSON media type is expensive for the API server.
4. [RFC] In the API server, support resourceVersion as a delete precondition (see the sketch after this list). This would make the Get() in processItem() unnecessary.
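For context on item 4, a minimal sketch of a precondition-guarded delete, written against recent client-go where `metav1.Preconditions` carries both a UID and a resourceVersion; at the time of this issue the resourceVersion precondition was still an RFC, and `deleteIfUnchanged` is an invented helper:

```go
package gc

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// deleteIfUnchanged deletes a pod only if it still has the given UID and
// resourceVersion; otherwise the API server rejects the delete with a
// conflict, so the GC would not need a separate GET in processItem().
func deleteIfUnchanged(ctx context.Context, c kubernetes.Interface, ns, name string, uid types.UID, rv string) error {
	return c.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{
		Preconditions: &metav1.Preconditions{
			UID:             &uid,
			ResourceVersion: &rv,
		},
	})
}
```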
References:
@lavalamp @gmarek @derekwaynecarr @kubernetes/sig-api-machinery