Kubernetes API extremely slow to trigger scale/scheduling #40271
Description
Lately we have noticed Kubernetes API responses becoming very slow for certain
actions. Getting, deleting and describing resources is instant as always,
but scaling or applying a new deployment can take up to 5 minutes to go through.
It's hard to pinpoint exactly when it became this slow. I'm inclined to say this
is a boiling-frog situation where the slowdown crept up on us gradually, but I'm
convinced we had this issue on 1.4.x versions as well.
The best way to demonstrate the problem is with a screen recording of
the "scale" command (to be clear, this happens for all actions that appear
to have something to do with scheduling). In the screen recording, you can see it
taking about 90 seconds for the scale to actually take effect.
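For anyone who wants a number rather than a recording, here is a minimal sketch of how the delay could be timed. It assumes `kubectl` is on the PATH and that the deployment status exposes `availableReplicas`; the helper names `available_replicas` and `seconds_until` are made up for this illustration and are not part of any Kubernetes tooling.

```python
import subprocess
import time
from typing import Callable


def available_replicas(deployment: str) -> int:
    """Ask kubectl how many replicas of the deployment are currently available.

    Assumption: the status field is `availableReplicas`; it is omitted from
    the output when the value is zero, so an empty string is treated as 0.
    """
    out = subprocess.run(
        ["kubectl", "get", "deployment", deployment,
         "-o", "jsonpath={.status.availableReplicas}"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return int(out) if out else 0


def seconds_until(read_count: Callable[[], int], target: int,
                  timeout_s: float = 600.0, poll_s: float = 1.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll read_count() until it equals target; return the elapsed seconds."""
    start = clock()
    while True:
        if read_count() == target:
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"replica count never reached {target}")
        sleep(poll_s)
```

To use it, issue the scale command and start polling right away, for example `seconds_until(lambda: available_replicas("my-app"), 5)`, where `my-app` and the target of 5 are placeholders. The clock and sleep functions are injectable only so the loop can be exercised without a cluster.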
The same goes for deployments. If we roll out a new deployment, it can take up to
10 minutes to gradually roll out the new pods and remove the old pods, whereas
these deployments used to be a lot faster. You can see it's the API being
delayed, because the time between a new pod becoming "ready" and the next old
pod moving to "terminated" can be several minutes.
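The ready-to-terminated gap described above can be put into numbers from pod timestamps. The following is only a sketch: the `(timestamp, kind)` event tuples and the `rollout_gaps` helper are hypothetical, and it assumes timestamps in the same `YYYY-MM-DDTHH:MM:SSZ` form that Kubernetes prints.

```python
from datetime import datetime
from typing import List, Tuple


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2016-12-14T00:52:01Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def rollout_gaps(events: List[Tuple[str, str]]) -> List[float]:
    """Return the seconds between each "ready" event and the next "terminated".

    events holds (timestamp, kind) pairs in any order, where kind is either
    "ready" (a new pod became ready) or "terminated" (an old pod went away).
    """
    gaps = []
    last_ready = None
    for ts, kind in sorted((parse_ts(t), k) for t, k in events):
        if kind == "ready":
            last_ready = ts
        elif kind == "terminated" and last_ready is not None:
            gaps.append((ts - last_ready).total_seconds())
            last_ready = None  # pair each "ready" with at most one termination
    return gaps
```

Gaps of a few seconds would point at normal rolling-update pacing, while gaps of several minutes would match the delay reported here.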
We're running our cluster on GKE, and have in fact been in contact with
Google Enterprise support for several weeks now, but it doesn't look like we're
getting any closer to a resolution, so I thought this might actually be related
to Kubernetes itself, and maybe someone more heavily involved in its development
might recognize this issue.
I can send all support logs on request, but here are some things we tried:
- create a small deployment of just a bare debian pod (same result)
- manage scaling through the Google Cloud Shell (same result)
- clean up all old replicasets (same result)
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.6+e569a27", GitCommit:"e569a27d02001e343cb68086bc06d47804f62af6", GitTreeState:"not a git tree", BuildDate:"2016-11-12T09:26:56Z", GoVersion:"go1.7.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:52:01Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: GKE
- OS (e.g. from /etc/os-release): GCI