Increase supported pod deletion rate #96752
@mborsz: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
/sig scalability
Re "why":
Re solution itself:
Re option 1 comment:
Re comment 2b: In fact, we currently do periodic polling for virtual objects (here). I just suggest moving this logic to a separate place.
Yes - you can set Preconditions in DeleteOptions:
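For illustration, a minimal client-go sketch of such a guarded DELETE, assuming an existing `client` of type `kubernetes.Interface` and an `rv` captured from the last GET (the function name is hypothetical):

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteWithRVPrecondition issues a DELETE that succeeds only if the pod has
// not changed since we read it: the apiserver compares the stored object's
// resourceVersion against the precondition and rejects the call on mismatch.
func deleteWithRVPrecondition(ctx context.Context, client kubernetes.Interface, namespace, name, rv string) error {
	return client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
		Preconditions: &metav1.Preconditions{
			ResourceVersion: &rv, // version observed on the preceding GET
		},
	})
}
```

A precondition mismatch surfaces as a Conflict error, which the caller can detect with apierrors.IsConflict.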
But unless I'm misreading it (sorry, I didn't have time to look deeply enough), this is happening on deletions.
I think it makes sense to start setting the RV precondition anyway (so that if a given object changes between the GET and the DELETE, we will know whether the DELETE is still an up-to-date decision).
My proposal is to "do whatever we are doing now, but in a separate loop". This way we will avoid doing an API call on the standard code path only to support virtual objects. I understand that we are requeueing all virtual objects, not only on deletions, but maybe I'm wrong.
That will help in not taking action on stale data, but introduces the possibility that a frequently written object will never be garbage collected.
Correct, virtual objects are already polled until they are observed present (via an informer event) or verified absent (via a 404 on the live lookup), at which point a virtual delete event is queued. Having just spent a lot of time in the GC code, I'm not sure a third queue will simplify things.
@liggitt - aren't we afraid that we delete something that we shouldn't delete in this case? I would assume that if something is supposed to be GC-ed it shouldn't be frequently changing (or at least it should eventually stop being updated frequently). |
Good point. Frequent object changes may cause GC starvation. This can be resolved by proper "precondition failed" error handling (i.e. retrying with a GET API call).
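A hedged sketch of that retry loop, with a bounded retry count so a hot object cannot starve GC indefinitely; `shouldDelete` is a hypothetical stand-in for the GC's real ownership checks:

```go
import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// shouldDelete is a hypothetical placeholder for the garbage collector's
// ownership inspection of the freshly fetched object.
func shouldDelete(pod *corev1.Pod) bool { return true /* real GC logic inspects ownerReferences */ }

// deleteWithRetry re-fetches the object after each "precondition failed"
// conflict and retries the guarded DELETE against the fresh resourceVersion.
func deleteWithRetry(ctx context.Context, client kubernetes.Interface, namespace, name string, maxRetries int) error {
	for i := 0; i < maxRetries; i++ {
		pod, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return nil // already gone, nothing to do
		}
		if err != nil {
			return err
		}
		if !shouldDelete(pod) {
			return nil // fresh state says the object should stay
		}
		rv := pod.ResourceVersion
		err = client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
			Preconditions: &metav1.Preconditions{ResourceVersion: &rv},
		})
		if err == nil || apierrors.IsNotFound(err) {
			return nil
		}
		if !apierrors.IsConflict(err) {
			return err
		}
		// Conflict: the object changed between GET and DELETE; loop and retry.
	}
	return fmt.Errorf("giving up on %s/%s after %d precondition conflicts", namespace, name, maxRetries)
}
```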
Another idea: maybe in fact we don't need to get rid of the GET call -- it's cheap and not a problem in itself. The only problem with the GET call is that it consumes a token from the rate limiter. Maybe we can somehow change the qps limit for the garbage collector only, to allow it e.g. 200 qps while other controllers still use 100 qps?
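As a rough sketch of that idea (the 200/400 numbers are only illustrative), client-go's rest.Config exposes per-client QPS and Burst, so the GC could be handed its own, more generous client:

```go
import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// newGCClient copies the shared controller-manager config but raises the
// client-side rate limit for the garbage collector alone, so its extra GET
// per deletion stops halving the effective deletion throughput.
func newGCClient(base *rest.Config) (kubernetes.Interface, error) {
	cfg := rest.CopyConfig(base)
	cfg.QPS = 200   // other controllers keep e.g. 100 qps
	cfg.Burst = 400 // allow short spikes above the sustained rate
	return kubernetes.NewForConfig(cfg)
}
```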
That will go away with Priority & Fairness eventually, because we want to get rid of qps limits once it proves to be stable.
Yes, but only if the updates are modifying ownerReferences
I don't think that's necessarily true. Imagine a controller updating some status on an object as long as it exists, but which is depending on the object to go away when its owner does. |
Sure - that is exactly the point I had in mind.
Sure - but I'm assuming that it shouldn't be updated more frequently than every 10s or so (so far, by "frequently" I meant more than 1 qps). And with longer delays between updates, we should eventually GC it (the latency of watch + GC processing time should be smaller than that).
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
@mborsz I am interested in increasing the deletion rate too, and I'm trying to understand your issue here. Does it mean that with default settings on the controller manager (--kube-api-qps=30), at most 15 pods per second will go from RUNNING -> TERMINATING? Or is this rate about going from TERMINATING to removal from the API server?
In that case, only 15 pods per second will be marked for deletion (by setting deletionTimestamp != 0; the pod phase will not change, IIUC). Then kubelets will remove the objects from the API server once they terminate the processes running on the nodes.
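A small sketch to make the two phases visible (assuming an existing client and a running pod; names are illustrative):

```go
import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// showGracefulDeletion demonstrates that DELETE only marks the pod: right
// after the call the object is still readable, with deletionTimestamp set
// and the phase typically still Running; the kubelet removes the object
// from the apiserver later, once the containers have terminated.
func showGracefulDeletion(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	if err := client.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
		return err
	}
	pod, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err // a 404 here would mean the kubelet already finished
	}
	fmt.Printf("deletionTimestamp=%v phase=%s\n", pod.DeletionTimestamp, pod.Status.Phase)
	return nil
}
```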
I see, okay. I was interested in increasing the rate at which pod status changes from RUNNING -> TERMINATING after deletion. If I delete, say, 1000 pods at once, their status changes from RUNNING -> TERMINATING in small batches.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
Fixed in #104966 /close
@mborsz: Closing this issue.
What would you like to be added:
Currently the garbage collector uses two API calls per single object deletion: a GET to verify the object's content, followed by the DELETE. Given that the performance of controllers is limited by the --kube-api-qps flag (set to 100 e.g. in our scalability tests), while we can create pods with a churn of 100 pods/s, they are deleted with a churn of only 50 pods/s.
Why is this needed:
How to do that:
Ref:
kubernetes/pkg/controller/garbagecollector/garbagecollector.go, lines 412 to 429 at 50db32c
If I understand this code correctly, the additional GET (rather than using the informer cache) has two purposes:
- it handles objects not yet observed by the informer cache (the live GET returns the object even though the cache doesn't know it yet), and
- it detects stale virtual objects (the live GET returns 404, so the virtual object can be removed from the graph).
We can change the logic in attemptToDeleteItem to:
- use the object from the informer cache whenever the cache has observed it, and
- fall back to a live GET only when the object is missing from the cache (a virtual or not-yet-observed object); see the sketch below.
This way, in most cases we will skip the API call while still supporting both cases: attemptToDeleteItem will correctly handle "not yet observed by cache" objects (the API call will return some object) and will make it possible to get rid of stale virtual objects (the API call will return 404 and we will remove the virtual object from the graph).
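A hedged sketch of option 1, with hypothetical stand-ins for the GC internals (node, cacheGet, liveGet, deleteObject, and removeFromGraph are all invented here; this is not the actual Kubernetes code):

```go
package gcsketch

import (
	"errors"
	"fmt"
)

// node stands in for the GC dependency graph's item type.
type node struct {
	name    string
	virtual bool // true if never observed via an informer event
}

var errNotFound = errors.New("not found")

// cacheGet stands in for an informer-cache lookup.
func cacheGet(n *node) (string, bool) {
	if n.virtual {
		return "", false
	}
	return n.name, true
}

// liveGet stands in for a direct GET against the apiserver.
func liveGet(n *node) (string, error) { return n.name, nil }

func deleteObject(obj string) error { fmt.Println("DELETE", obj); return nil }
func removeFromGraph(n *node)       { fmt.Println("forget", n.name) }

// attemptToDeleteItem, option 1: pay for a live GET only when the informer
// cache has never observed the object (i.e. it is virtual).
func attemptToDeleteItem(n *node) error {
	if obj, ok := cacheGet(n); ok {
		return deleteObject(obj) // common path: no extra API call
	}
	obj, err := liveGet(n)
	if errors.Is(err, errNotFound) {
		removeFromGraph(n) // stale virtual object: drop it from the graph
		return nil
	}
	if err != nil {
		return err
	}
	return deleteObject(obj)
}
```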
Alternatively, we can try something like:
- make attemptToDeleteItem rely only on the informer cache, and
- verify virtual objects in a separate loop that periodically polls them with a GET and removes them from the graph on 404 (sketched after the next paragraph).
The second solution seems a little clearer to me (it decouples the two purposes of the attemptToDeleteItem call) and it avoids unnecessary API calls in the case of watch delay (e.g. attemptToDeleteItem being called on a pod that has been deleted while the watch doesn't reflect that fact yet).
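And a matching sketch of option 2, reusing the hypothetical stubs above: attemptToDeleteItem would trust only the informer cache, while a dedicated loop owns virtual-object verification:

```go
import (
	"context"
	"errors"
	"time"
)

// verifyVirtualObjects is the separate loop of option 2: it periodically
// re-checks each virtual node with a live GET, and when the apiserver
// answers 404 it removes the node (i.e. queues a virtual delete event).
// attemptToDeleteItem itself would then never issue a GET.
func verifyVirtualObjects(ctx context.Context, virtualNodes func() []*node, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, n := range virtualNodes() {
				if _, err := liveGet(n); errors.Is(err, errNotFound) {
					removeFromGraph(n)
				}
			}
		}
	}
}
```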
@liggitt @caesarxuchao @wojtek-t WDYT?