
Increase supported pod deletion rate #96752

Closed
mborsz opened this issue Nov 20, 2020 · 27 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

Comments

@mborsz
Member

mborsz commented Nov 20, 2020

What would you like to be added:

Currently the garbage collector issues two API calls per single object deletion. Given that controller performance is limited by the --kube-api-qps flag (e.g. set to 100 in our scalability tests), while we can create pods at a churn of 100 pods/s, they are deleted at only 50 pods/s.

Why is this needed:

  • Supporting high pod churn workloads
  • Making scalability tests faster (while we are able to create pods at 100 pods/s, which takes ~30m in our tests, deleting them happens at 50 pods/s (even less, as the garbage collector also removes object types other than pods), so it takes over an hour).

How to do that:
Ref:

// TODO: It's only necessary to talk to the API server if this is a
// "virtual" node. The local graph could lag behind the real status, but in
// practice, the difference is small.
latest, err := gc.getObject(item.identity)
switch {
case errors.IsNotFound(err):
    // the GraphBuilder can add "virtual" node for an owner that doesn't
    // exist yet, so we need to enqueue a virtual Delete event to remove
    // the virtual node from GraphBuilder.uidToNode.
    klog.V(5).Infof("item %v not found, generating a virtual delete event", item.identity)
    gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
    // since we're manually inserting a delete event to remove this node,
    // we don't need to keep tracking it as a virtual node and requeueing in attemptToDelete
    item.markObserved()
    return nil
case err != nil:
    return err
}

If I understand this code correctly, the additional GET (not using the cache) has two purposes:

  • handling objects that have not yet been observed by the cache (the API call returns the live object)
  • detecting stale virtual nodes (the API call returns 404, and the virtual node is removed from the graph)

We can change the logic in attemptToDeleteItem to:

  • if the object is present in cache, use it
  • if it's not, fall back to an API call

This way, in most cases we will skip the API call while still supporting both purposes: attemptToDeleteItem will correctly handle objects not yet observed by the cache (the API call will return the object) and will still make it possible to get rid of stale virtual objects (the API call will return 404 and we will remove the virtual object from the graph).
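
A rough sketch of option 1 (gc.cacheLookup is a hypothetical cache accessor; this is not the actual kubernetes/kubernetes code, just an illustration of the shape, reusing the identifiers from the fragment above):

// Consult the shared informer cache first, and only fall back to a live GET
// on a cache miss. On a miss, the object is either not yet observed or the
// node is a stale "virtual" node; only then pay for the API call to decide.
latest, found, err := gc.cacheLookup(item.identity) // hypothetical helper
if err != nil {
    return err
}
if !found {
    latest, err = gc.getObject(item.identity)
    switch {
    case errors.IsNotFound(err):
        // Stale virtual node: remove it from the graph, as in the code above.
        gc.dependencyGraphBuilder.enqueueVirtualDeleteEvent(item.identity)
        item.markObserved()
        return nil
    case err != nil:
        return err
    }
}
// ... continue using "latest" as before ...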

Alternatively, we can try something like the following (see the sketch after this list):

  • in attemptToDeleteItem, simply use the cached object; if it doesn't exist, drop the object from the queue (when we observe the virtual object via watch, it will be added back)
  • add a dedicated worker queue for removing stale virtual objects: if we observe a new virtual object, add it to that queue; each queue loop will:
    • check if the node is still virtual (not observed by the cache): if not, drop it from the queue
    • otherwise, make an API call to validate whether it still exists on the apiserver:
      • if it does, requeue the object with some backoff
      • if not, drop the object from the graph
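
A rough sketch of that dedicated loop, using client-go's workqueue (virtualNode, stillVirtual, existsOnAPIServer and removeFromGraph are hypothetical stand-ins for GC internals, not existing code):

// Dedicated loop for confirming or dropping "virtual" nodes, kept separate
// from attemptToDelete. queue is a k8s.io/client-go/util/workqueue
// rate-limiting queue.
func runVirtualNodeLoop(ctx context.Context, queue workqueue.RateLimitingInterface) {
    for {
        item, shutdown := queue.Get()
        if shutdown {
            return
        }
        node := item.(*virtualNode)
        switch {
        case !stillVirtual(node):
            // The real object has been observed via watch in the meantime:
            // nothing left to verify.
            queue.Forget(item)
        case existsOnAPIServer(ctx, node):
            // Still virtual but confirmed to exist: re-check later with backoff.
            queue.AddRateLimited(item)
        default:
            // The live lookup returned 404: drop the stale virtual node.
            removeFromGraph(node)
            queue.Forget(item)
        }
        queue.Done(item)
    }
}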

The second solution seems a little clearer to me (it decouples the two purposes of the attemptToDeleteItem call) and it avoids unnecessary API calls in case of watch delay (e.g. attemptToDeleteItem being called on a pod that has already been deleted while the watch doesn't reflect this fact yet).

@liggitt @caesarxuchao @wojtek-t WDYT?

@mborsz mborsz added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 20, 2020
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 20, 2020
@k8s-ci-robot
Contributor

@mborsz: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mborsz mborsz changed the title Increase pod deletion rate Increase supported pod deletion rate Nov 20, 2020
@mborsz
Member Author

mborsz commented Nov 20, 2020

/sig scalability

@k8s-ci-robot k8s-ci-robot added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 20, 2020
@wojtek-t
Member

Re "why":

  • high-churn workloads are something we should be focused on
  • I wouldn't explicitly mention speeding up tests as a goal in itself; we could probably come up with other ways to solve that too.
    But the first one is a compelling enough reason and I fully support it.

Re solution itself:

  • option 1: what if the object present in the cache is stale?
    We currently don't set any preconditions on ResourceVersion, so we would have to start doing that, then retry on failures, etc.
    [All of that seems doable, but I'm just trying to point out that it's not necessarily as easy as it seems.]

  • option 2:
    (a) it has the same problem as above
    (b) you're suggesting an additional periodic polling for virtual objects, which doesn't scale if we have many of those

@mborsz
Member Author

mborsz commented Nov 23, 2020

Re option 1 comment:

  • I think this is an issue also today: it is possible that the object has been modified between the GET and DELETE calls. I agree that moving to the cache may make this happen more often.
  • Is there any way to DELETE an API object conditionally (i.e. delete it only if it has a given resourceVersion)?

Re comment 2b: In fact we currently do periodic polling for virtual objects (here). I just suggest moving this logic to a separate place.

@wojtek-t
Member

Yes - you can set Preconditions in DeleteOptions:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L506
and you can set RV in Preconditions:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L623
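
For illustration, a minimal sketch of a conditional delete using those fields (deletePodIfUnchanged and its arguments are made up for this example; the client-go/apimachinery calls themselves are real):

// metav1 is k8s.io/apimachinery/pkg/apis/meta/v1,
// c is a k8s.io/client-go/kubernetes.Interface.
func deletePodIfUnchanged(ctx context.Context, c kubernetes.Interface, ns, name, observedRV string) error {
    // The DELETE only succeeds if the pod still has the ResourceVersion we
    // observed; otherwise the API server rejects it with a Conflict.
    return c.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{
        Preconditions: &metav1.Preconditions{ResourceVersion: &observedRV},
    })
}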

Re comment 2b: In fact we currently do periodic polling for virtual objects (here). I just suggest moving this logic to a separate place.

But unless I'm misreading it (sorry, didn't have time to look deep enough), this is happening on deletions.
The way I understood your proposal is to poll them periodically for their whole lifetime?

@mborsz
Member Author

mborsz commented Nov 23, 2020

I think it makes sense to start setting the RV precondition anyway (so that if a given object changes between the GET and the DELETE, we will know whether the DELETE decision is still up to date).

But unless I'm misreading it (sorry, didn't have time to look deep enough), this is happening on deletions.
The way I understood your proposal is to poll them periodically for their whole lifetime?

My proposal is to "do whatever we are doing now, but in a separate loop". This way we avoid making an API call on the standard code path only to support virtual objects. My understanding is that we are requeueing all virtual objects, not only on deletions, but maybe I'm wrong.

@liggitt
Member

liggitt commented Nov 23, 2020

I think it makes sense to start setting the RV precondition anyway (so that if a given object changes between the GET and the DELETE, we will know whether the DELETE decision is still up to date).

That will help in not taking action on stale data, but introduces the possibility that a frequently written object will never be garbage collected.

My understanding is that we are requeueing all virtual objects, not only on deletions

Correct, virtual objects are already polled until they are observed present (via an informer event) or verified absent (via a 404 on the live lookup) and a virtual delete event is queued.

Having just spent a lot of time in the GC code, I'm not sure a third queue will simplify things.

@wojtek-t
Member

I think it makes sense to start setting the RV precondition anyway (so that if a given object changes between the GET and the DELETE, we will know whether the DELETE decision is still up to date).

That will help in not taking action on stale data, but introduces the possibility that a frequently written object will never be garbage collected.

@liggitt - aren't we afraid that we delete something that we shouldn't delete in this case? I would assume that if something is supposed to be GC-ed it shouldn't be frequently changing (or at least it should eventually stop being updated frequently).
Setting preconditions here would make sense to me.

@mborsz
Member Author

mborsz commented Nov 23, 2020

I think it makes sense to start setting the RV precondition anyway (so that if a given object changes between the GET and the DELETE, we will know whether the DELETE decision is still up to date).

That will help in not taking action on stale data, but introduces the possibility that a frequently written object will never be garbage collected.

Good point. Frequent object changes may cause GC starvation. This can be resolved by proper "precondition failed" error handling (i.e. retrying with a GET API call).
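
A hedged sketch of that retry path (deleteWithRVRetry and the three-attempt limit are made up for illustration; apierrors is k8s.io/apimachinery/pkg/api/errors):

func deleteWithRVRetry(ctx context.Context, c kubernetes.Interface, ns, name string) error {
    for attempt := 0; attempt < 3; attempt++ {
        pod, err := c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
        if apierrors.IsNotFound(err) {
            return nil // already gone
        }
        if err != nil {
            return err
        }
        // Re-evaluate ownerReferences on the fresh object here before deleting.
        rv := pod.ResourceVersion
        err = c.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{
            Preconditions: &metav1.Preconditions{ResourceVersion: &rv},
        })
        if !apierrors.IsConflict(err) {
            return err // success (nil) or a non-conflict error
        }
        // Conflict: the object changed between GET and DELETE; retry.
    }
    return fmt.Errorf("gave up deleting %s/%s after repeated precondition failures", ns, name)
}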

@mborsz
Member Author

mborsz commented Nov 23, 2020

Another idea: maybe we don't actually need to get rid of the GET call -- it's cheap and not a problem in itself. The only problem with the GET call is that it consumes a token from the rate limiter. Maybe we can somehow change the qps limit for the garbage collector only, allowing it e.g. 200 qps while other controllers still use 100 qps?
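
A sketch of that idea (rest.Config's QPS and Burst fields are real client-go knobs; newGCClient and the specific numbers are made up):

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

// newGCClient builds a client for the garbage collector with a higher rate
// limit than the shared --kube-api-qps setting used by other controllers.
func newGCClient(base *rest.Config) (kubernetes.Interface, error) {
    cfg := rest.CopyConfig(base)
    cfg.QPS = 200   // e.g. double the 100 qps used elsewhere
    cfg.Burst = 400 // allow short bursts above the sustained rate
    return kubernetes.NewForConfig(cfg)
}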

@wojtek-t
Member

So that will eventually go away with Priority & Fairness, because we want to get rid of qps limits once it proves to be stable.
[Also note that GET requests are cheaper than mutating requests, but they are still not super cheap (e.g. they are linearized in etcd).]

@liggitt
Member

liggitt commented Nov 23, 2020

@liggitt - aren't we afraid that we delete something that we shouldn't delete in this case?

Yes, but only if the updates are modifying ownerReferences

I would assume that if something is supposed to be GC-ed it shouldn't be frequently changing (or at least it should eventually stop being updated frequently).

I don't think that's necessarily true. Imagine a controller updating some status on an object for as long as it exists, but which depends on the object going away when its owner does.

@wojtek-t
Member

Yes, but only if the updates are modifying ownerReferences

Sure - that is exactly the point I had in mind.
So doesn't that by itself mean we should protect against this case?

I don't think that's necessarily true. Imagine a controller updating some status on an object for as long as it exists, but which depends on the object going away when its owner does.

Sure - but I'm assuming that it shouldn't be updated more frequently than every 10s or so (so far, by frequently I meant more than 1 qps). And with longer delays between updates, we should eventually GC it (the latency of watch + GC processing time should be smaller than that).

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2021
@mborsz
Member Author

mborsz commented Feb 22, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 22, 2021
@zorro786

zorro786 commented Apr 7, 2021

@mborsz I am interested in increasing the deletion rate too and am trying to understand your issue here. Does it mean that with default settings at the controller manager (--kube-api-qps=30), at most 15 pods per second will go from RUNNING->TERMINATING? Or is this rate about going from TERMINATING to removal from the API server?

@mborsz
Member Author

mborsz commented Apr 7, 2021

@mborsz I am interested in increasing the deletion rate too and am trying to understand your issue here. Does it mean that with default settings at the controller manager (--kube-api-qps=30), at most 15 pods per second will go from RUNNING->TERMINATING? Or is this rate about going from TERMINATING to removal from the API server?

In that case, only 15 pods per second will be marked to be deleted (by setting deletionTimestamp != 0; the pod phase will not change IIUC). Then kubelets will remove the objects from the API server once they have terminated the processes running on the node.

@zorro786

zorro786 commented Apr 7, 2021

In that case, only 15 pods per second will be marked to be deleted (by setting deletionTimestamp != 0; the pod phase will not change IIUC). Then kubelets will remove the objects from the API server once they have terminated the processes running on the node.

I see, okay. I was interested in increasing the rate at which pod status changes from RUNNING -> TERMINATING after deletion. If I delete, say, 1000 pods at once, their state changes from RUNNING -> TERMINATING in small batches.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 6, 2021
@wojtek-t
Member

wojtek-t commented Jul 6, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2021
@wojtek-t
Member

wojtek-t commented Oct 4, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 2, 2022
@wojtek-t
Member

wojtek-t commented Jan 3, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 3, 2022
@mborsz
Member Author

mborsz commented Apr 4, 2022

Fixed in #104966

/close

@k8s-ci-robot
Contributor

@mborsz: Closing this issue.

In response to this:

Fixed in #104966

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
