
Controller-manager sees higher mem-usage when load test runs before density #61041

Closed
shyamjvs opened this issue Mar 12, 2018 · 13 comments

Labels: kind/bug, sig/scalability, lifecycle/rotten

Comments

@shyamjvs (Member) commented Mar 12, 2018

I accidentally turned off our load test in PR #60973. But thanks to that, I noticed this pattern in our controller-manager memory usage during the density test:

[Graph: controller-manager memory usage variance across load/density test runs]

You can see the jump after run 11479, when I re-enabled the load test. All subsequent spikes are in runs where the density test was preceded by the load test. We were seeing similar issues in the past, but IIRC that was for kube-proxies. My feeling is that this is related to an endpoints-controller processing backlog - but that needs to be confirmed.

@wojtek-t - Is this something we have already observed in the past? Do we consider it WAI, or should we try to fix it?

@kubernetes/sig-scalability-bugs
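As a side note on how such a pattern can be tracked between runs, here is a minimal Go sketch that scrapes a component's /metrics endpoint and prints its resident memory. The URL is a placeholder (in clusters of that era, kube-controller-manager typically served metrics on insecure port 10252), and process_resident_memory_bytes is the standard Prometheus process-collector gauge - not necessarily the exact metric behind the graph above.

```go
// Minimal sketch: scrape a component's /metrics endpoint and print its
// resident memory. The URL below is a placeholder; adjust it for your setup
// (the controller-manager metrics port is usually only reachable from the
// master node).
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Hypothetical endpoint; kube-controller-manager served /metrics on
	// insecure port 10252 in clusters of that era.
	const metricsURL = "http://127.0.0.1:10252/metrics"

	resp, err := http.Get(metricsURL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// process_resident_memory_bytes comes from the Prometheus Go client's
	// process collector and tracks the component's RSS.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "process_resident_memory_bytes") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```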

@k8s-ci-robot added the sig/scalability and kind/bug labels on Mar 12, 2018
@shyamjvs (Member, Author)

So this is also seen in the apiserver:

[Graph: apiserver memory usage variance across load/density test runs]

@wojtek-t (Member)

It hopefully shouldn't be the endpoint controller. Note that we don't start the second test before all namespaces from the previous one are deleted. That means the endpoint controller wouldn't be able to update endpoints objects, because the namespace no longer exists.
I would hope we'd observe a higher number of errors somewhere if that were the case.

@shyamjvs (Member, Author) commented Mar 12, 2018

I looked into the apiserver logs and can't find any calls that mention 'load' after the first call that mentions 'density'. This could potentially mean the memory usage is coming from watches (though not the ones created by the e2e test, as IIUC those are closed after the load test finishes). So maybe watches from kube-proxies or kubelets?
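For reference, the kind of log check described above could be scripted roughly as follows; the log path is a placeholder, and matching on the bare substrings 'load' and 'density' is just the informal heuristic used here, not an official naming convention.

```go
// Rough sketch of the log check described above: count apiserver log lines
// mentioning "load" vs "density", remember where the first "density" line
// appears, and see whether any load-test traffic shows up after it.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Placeholder path; point this at a kube-apiserver log dump.
	f, err := os.Open("kube-apiserver.log")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var loadAfterDensity, densityMentions int
	firstDensityLine := -1

	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // apiserver lines can be long
	for lineNo := 1; scanner.Scan(); lineNo++ {
		line := scanner.Text()
		if strings.Contains(line, "density") {
			densityMentions++
			if firstDensityLine < 0 {
				firstDensityLine = lineNo
			}
		}
		if firstDensityLine >= 0 && strings.Contains(line, "load") {
			loadAfterDensity++
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
	fmt.Printf("first 'density' mention at line %d; 'load' mentions after it: %d (total 'density' mentions: %d)\n",
		firstDensityLine, loadAfterDensity, densityMentions)
}
```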

@shyamjvs (Member, Author)

I have one idea for checking whether it's something around services: let's disable them in our load test for this job and see.

@shyamjvs (Member, Author)

It hopefully shouldn't be the endpoint controller. Note that we don't start the second test before all namespaces from the previous one are deleted. That means the endpoint controller wouldn't be able to update endpoints objects, because the namespace no longer exists.

That's true. But IIUC it's still possible that the endpoints-controller is using memory, e.g. to process watch events for endpoints updates (coming from the load test's deletion phase)?
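If one wanted to gauge that churn, a sketch like the following could watch Endpoints cluster-wide through an informer (the same mechanism the endpoints controller uses) and count update/delete events. The kubeconfig path is a placeholder, and this is only an illustration, not the controller's own instrumentation.

```go
// Sketch: watch Endpoints cluster-wide via an informer and count
// update/delete events, to gauge how much churn the load test's deletion
// phase generates.
package main

import (
	"fmt"
	"sync/atomic"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	factory := informers.NewSharedInformerFactory(client, 0)
	informer := factory.Core().V1().Endpoints().Informer()

	// Count endpoints update/delete events as they arrive.
	var updates, deletes int64
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) { atomic.AddInt64(&updates, 1) },
		DeleteFunc: func(obj interface{}) { atomic.AddInt64(&deletes, 1) },
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Print the running totals once a minute.
	for range time.Tick(time.Minute) {
		fmt.Printf("endpoints updates=%d deletes=%d\n",
			atomic.LoadInt64(&updates), atomic.LoadInt64(&deletes))
	}
}
```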

@wojtek-t (Member)

I don't think it's watch-related.
Yes - the controller manager may accumulate some memory because it allocated a lot in the past, and without memory pressure not everything may have been cleared yet.

I don't think we should spend too much time on it now.
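For context on why resident memory can stay elevated after the work is done: the Go runtime keeps freed heap around as idle spans and returns it to the OS only gradually (in the Go versions of that era, via a scavenger that ran minutes later). A standalone sketch, not controller-manager code, that makes this visible:

```go
// Standalone sketch: allocate a large amount of memory, drop it, force a GC,
// and observe that the runtime still holds much of the heap as idle spans
// rather than returning it to the OS immediately.
package main

import (
	"fmt"
	"runtime"
)

func printMem(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%-10s HeapInuse=%4d MiB  HeapIdle=%4d MiB  HeapReleased=%4d MiB  Sys=%4d MiB\n",
		label, m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20, m.Sys>>20)
}

func main() {
	printMem("start")

	// Simulate a burst of allocations, like processing a large event backlog.
	buf := make([][]byte, 0, 1024)
	for i := 0; i < 1024; i++ {
		b := make([]byte, 1<<20) // 1 MiB each, ~1 GiB total
		for j := range b {
			b[j] = 1 // touch every page so the memory is really backed
		}
		buf = append(buf, b)
	}
	printMem("allocated")

	// Drop the references and collect. HeapInuse falls, but HeapIdle grows
	// correspondingly and Sys (memory obtained from the OS) barely moves:
	// the process footprint stays elevated until idle pages are released.
	buf = nil
	runtime.GC()
	printMem("after GC")
}
```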

@shyamjvs (Member, Author)

Makes sense.

I don't think we should spend too much time on it now.

Sure... But this is only taking a very small amount of time while I'm running the bisection for the pod-startup regression in the background :)

@jberkus commented Mar 12, 2018

@shyamjvs, @wojtek-t - does this look more like a problem with the test, or like a real, user-affecting performance problem related to the other performance issues?

@shyamjvs (Member, Author)

This needs more digging, but AFAIU it's not a regression - I think we've been seeing it for a while already.
With respect to users, this may not be much of a problem, but I'd like to hear from @wojtek-t on this.

@shyamjvs (Member, Author)

related to the other performance issues?

By "other performance issues" if you mean the recent ones that I've been hunting down (#60500, #60589), then I don't think so. Not 100% sure - but I believe they're different.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 10, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jul 10, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
