Expand petset volume zone spreading #27809
I am having scaling issues with multiple-zone PVCs on GCE: we are getting timeout errors waiting for the PVC-backed volumes to attach and mount. We have 1000 nodes spread across 3 availability zones in us-central1. Here is an example of how long pod creation is taking:

$ kubectl get po
NAME READY STATUS RESTARTS AGE
cassandra-analytics-0 1/1 Running 0 12m
cassandra-analytics-1 1/1 Running 0 7m
cassandra-analytics-2 1/1 Running 0 2m
cassandra-analytics-3 0/1 ContainerCreating 0 28s
cassandra-data-0 1/1 Running 0 12m
cassandra-data-1 1/1 Running 0 5m
cassandra-data-2 0/1 ContainerCreating 0 1m

I have seen pod creation take 1-2m when everything was in the same zone, but now with multiple zones it is crawling.

$ kubectl describe po cassandra-data-1
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
7m 7m 1 {default-scheduler } Normal Scheduled Successfully assigned cassandra-data-1 to kubernetes-minion-group-x8ev
5m 5m 1 {kubelet kubernetes-minion-group-x8ev} Warning FailedMount Unable to mount volumes for pod "cassandra-data-1_default(5acd02c7-38ea-11e6-b7d0-42010a800002)": timeout expired waiting for volumes to attach/mount for pod "cassandra-data-1"/"default". list of unattached/unmounted volumes=[cassandra-data]
5m 5m 1 {kubelet kubernetes-minion-group-x8ev} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "cassandra-data-1"/"default". list of unattached/unmounted volumes=[cassandra-data]
3m 3m 1 {kubelet kubernetes-minion-group-x8ev} spec.containers{cassandra} Normal Pulling pulling image "gcr.io/aronchick-apollobit/cassandra-debian:v1.1"
3m 3m 1 {kubelet kubernetes-minion-group-x8ev} spec.containers{cassandra} Normal Pulled Successfully pulled image "gcr.io/aronchick-apollobit/cassandra-debian:v1.1"
3m 3m 1 {kubelet kubernetes-minion-group-x8ev} spec.containers{cassandra} Normal Created Created container with docker id 05c294ddd491
3m 3m 1 {kubelet kubernetes-minion-group-x8ev} spec.containers{cassandra} Normal Started Started container with docker id 05c294ddd491

I am running the latest 1.3 with #27553 patched in. I am also provisioning SSD PVCs, but the volumes are created long before the timeout issues occur. |
This seems to happen specifically when the pet / pod is deployed in a zone other than the master's. Some spot checking has shown that pods in us-central1-b, the same zone as the master, get deployed quickly, while pods in us-central1-c or us-central1-f hit the attach/mount timeouts. |
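To confirm the cross-zone pattern, one quick check (a hedged sketch, not from this thread) is to compare each node's zone label against the master's; the failure-domain.beta.kubernetes.io/zone label key is assumed to be the zone label on this 1.3-era cluster:

$ kubectl get nodes -L failure-domain.beta.kubernetes.io/zone                 # every node with its zone column
$ kubectl get nodes -L failure-domain.beta.kubernetes.io/zone | grep master   # the master's zone, if it is registered as a node

Pods landing on nodes whose zone differs from the master's are the ones showing the slow attach times above.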
The sync loop in reconciler.go seems to be retrying for about 2 minutes:

I0623 05:10:59.821836 3599 kubelet.go:2528] SyncLoop (ADD, "api"): "cassandra-analytics-31_default(dfbe550a-3900-11e6-b7d0-42010a800002)"
I0623 05:10:59.978531 3599 reconciler.go:179] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" (spec.Name: "pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002") pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002")
I0623 05:10:59.978599 3599 reconciler.go:179] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/secret/default-token-mu6y5" (spec.Name: "default-token-mu6y5") pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002")
E0623 05:10:59.980959 3599 goroutinemap.go:155] Operation for "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" failed. No retries permitted until 2016-06-23 05:11:00.480952365 +0000 UTC (durationBeforeRetry 500ms). error: Volume "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" (spec.Name: "pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002") pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002") is not yet attached according to node status.
I0623 05:11:00.078995 3599 reconciler.go:253] MountVolume operation started for volume "kubernetes.io/secret/default-token-mu6y5" (spec.Name: "default-token-mu6y5") to pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002").
I0623 05:11:00.084887 3599 operation_executor.go:673] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/default-token-mu6y5" (spec.Name: "default-token-mu6y5") pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002").
I0623 05:11:00.580234 3599 reconciler.go:179] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" (spec.Name: "pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002") pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002")
E0623 05:11:00.582425 3599 goroutinemap.go:155] Operation for "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" failed. No retries permitted until 2016-06-23 05:11:01.582420585 +0000 UTC (durationBeforeRetry 1s). error: Volume "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" (spec.Name: "pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002") pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002") is not yet attached according to node status.

And then it works:

I0623 05:15:07.929521 3599 reconciler.go:179] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" (spec.Name: "pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002") pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002")
I0623 05:15:07.932210 3599 operation_executor.go:897] Controller successfully attached volume "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" (spec.Name: "pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002") pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002")
I0623 05:15:08.029886 3599 reconciler.go:253] MountVolume operation started for volume "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002" (spec.Name: "pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002") to pod "dfbe550a-3900-11e6-b7d0-42010a800002" (UID: "dfbe550a-3900-11e6-b7d0-42010a800002").

Here is a grep for PVC out of the kubelet.log: https://gist.github.com/chrislovecnm/50d722b04f73dcfd92800be48c584efa

This process takes about 30 seconds when the master is in the same zone as the node, otherwise it takes 2-4 minutes. That is a big scaling issue: compare 30 seconds to 2-4 minutes when you are looking at 1000 PVCs to mount. |
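A rough way to quantify the per-volume attach wait from the kubelet log is to compare the timestamp of the first VerifyControllerAttachedVolume retry with the "Controller successfully attached" line for the same PVC. This is only a sketch: the PVC name is the one from the excerpt above, and /var/log/kubelet.log is assumed to be the log location on a GCE kube-up node.

$ PVC=kubernetes-dynamic-pvc-9c4c75aa-38f0-11e6-b7d0-42010a800002
$ grep "$PVC" /var/log/kubelet.log | grep -m1 "VerifyControllerAttachedVolume operation started"   # first retry
$ grep "$PVC" /var/log/kubelet.log | grep -m1 "Controller successfully attached volume"            # attach finally reported

In the excerpt above that gap runs from 05:10:59 to 05:15:07, i.e. roughly four minutes spent blocked on the attach.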
@saad-ali / @bprashanth you want me to open a separate issue for the scaling challenges? cc: @bgrant0607 |
Now I ran into another issue with this: I have 1008 minions (I added 8 more because of this problem, which did not help). I am unable to deploy my last two C* instances and am stuck at 998 :( The last pod gets stuck in Pending and I am getting the following errors:

$ kubectl describe po cassandra-data-498
<redacted for brevity>
fit failure on node (kubernetes-minion-group-cxmy): Insufficient CPU
fit failure on node (kubernetes-minion-group-zc10): Insufficient CPU
fit failure on node (kubernetes-minion-group-iyit): Insufficient CPU
fit failure on node (kubernetes-minion-group-nnhm): Insufficient CPU
fit failure on node (kubernetes-minion-group-2k68): NoVolumeZoneConflict
fit failure on node (kubernetes-minion-group-xfnx): Insufficient CPU
fit failure on node (kubernetes-minion-group-hbxt): Insufficient CPU
fit failure on node (kubernetes-minion-group-srdl): Insufficient CPU
fit failure on node (kubernetes-minion-group-fx7b): Insufficient CPU
fit failure on node (kubernetes-minion-group-26wv): Insufficient CPU
fit failure on node (kubernetes-minion-group-nd2g): Insufficient CPU
fit failure on node (kubernetes-minion-group-n2px): NoVolumeZoneConflict
fit failure on node (kubernetes-minion-group-4ndb): Insufficient CPU
fit failure on node (kubernetes-minion-group-7zf8): Insufficient CPU

I have headroom, as I have 8 new nodes in us-central1-a with nothing on them. Ideas? This is kinda urgent, as my demo is on Tuesday :( I don't think I have time to recreate all of the C* instances. I have reduced the petset size by one and increased it by one, and K8s keeps putting the volume into zone c. When that happens the pod just stays in Pending. |
I'm going to guess it's complaining that the pod is asking for a volume in a zone without capacity. Zone spreading is pretty dumb, and you probably have the nodes in the wrong zone: once a volume is created in a zone, you can't schedule the pod anywhere else. Get the PVs for that pet (it looks like they're already created because you scaled up and down -- kubectl get pv | awk '{print $5}' | grep 498 should show you the PV, and kubectl get pv PVNAME -o yaml | grep -i zone should show you the zone if the label exists) and try creating nodes in that zone. |
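Building on the commands above, a hedged sketch for mapping the stuck pet's PV to its zone and then checking what capacity that zone actually has; PVNAME and ZONE are placeholders for the values printed by the earlier commands, and the zone label may simply be missing if the PV was never labelled:

$ kubectl get pv | awk '{print $1" "$5}' | grep 498                      # PV name plus the claim it is bound to
$ kubectl get pv PVNAME -L failure-domain.beta.kubernetes.io/zone        # zone column for that PV, if the label is set
$ kubectl get nodes -l failure-domain.beta.kubernetes.io/zone=ZONE       # nodes currently registered in that zone

If every node in that zone is already failing with Insufficient CPU, adding a node in that specific zone (as was done next) is the straightforward fix.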
Yep ... added another node in the same zone and I got to 1k. I am losing pets because of app problems, though, and one went into a FailedMount loop:

$ kubectl describe po cassandra-analytics-112
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
8m 8m 1 {default-scheduler } Normal Scheduled Successfully assigned cassandra-analytics-112 to kubernetes-minion-group-0jg9
6m 17s 4 {kubelet kubernetes-minion-group-0jg9} Warning FailedMount Unable to mount volumes for pod "cassandra-analytics-112_default(13171b3b-3a50-11e6-b7d0-42010a800002)": timeout expired waiting for volumes to attach/mount for pod "cassandra-analytics-112"/"default". list of unattached/unmounted volumes=[cassandra-analytics]
6m 17s 4 {kubelet kubernetes-minion-group-0jg9} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "cassandra-analytics-112"/"default". list of unattached/unmounted volumes=[cassandra-analytics]

kubelet.log:

E0624 21:23:22.929674 3622 goroutinemap.go:155] Operation for "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-a5f45f53-38f0-11e6-b7d0-42010a800002" failed. No retries permitted until 2016-06-24 21:25:22.92967033 +0000 UTC (durationBeforeRetry 2m0s). error: UnmountVolume.TearDown failed for volume "kubernetes.io/gce-pd/kubernetes-dynamic-pvc-a5f45f53-38f0-11e6-b7d0-42010a800002" (volume.spec.Name: "cassandra-analytics") pod "6a577c2c-3a4b-11e6-b7d0-42010a800002" (UID: "6a577c2c-3a4b-11e6-b7d0-42010a800002") with: remove /var/lib/kubelet/pods/6a577c2c-3a4b-11e6-b7d0-42010a800002/volumes/kubernetes.io~gce-pd/pvc-a5f45f53-38f0-11e6-b7d0-42010a800002: directory not empty

So I assume it is behaving as expected, but I am a dork and probably should not have deleted the pet. Options?? |
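The directory-not-empty error suggests the old pet's PD was never unmounted cleanly, so the kubelet refuses to remove the directory. A hedged diagnostic sketch (not cleanup advice from the thread) is to inspect the path from the error on kubernetes-minion-group-0jg9:

$ ls -la /var/lib/kubelet/pods/6a577c2c-3a4b-11e6-b7d0-42010a800002/volumes/kubernetes.io~gce-pd/pvc-a5f45f53-38f0-11e6-b7d0-42010a800002
$ mount | grep pvc-a5f45f53-38f0-11e6-b7d0-42010a800002                  # check whether the PD is still mounted there

If the mount is still present, that stale mount is likely what is blocking the teardown of the old pod, and in turn the volume setup for the replacement pet.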
This issue was labelled only for sig/multicluster and is thus moved over to kubernetes-retired/federation#50. |
We got petset disk zone spreading in and it's really useful. However we left a couple of todos to follow up on:
@justinsb
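For anyone verifying the spreading behaviour, a quick hedged check is to list the dynamically provisioned PVs with a zone label column (assuming the zone label was applied; per the comment earlier in the thread it may be absent):

$ kubectl get pv -L failure-domain.beta.kubernetes.io/zone               # one PV per pet, with the zone it landed in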