GCE PDs aren't labeled by zone and can't be deleted in ubernetes lite #24447

Closed
a-robinson opened this issue Apr 19, 2016 · 8 comments
Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@a-robinson
Contributor

The "DynamicProvisioner should create and delete persistent volumes" test is failing because the volume is being provisioned in a different zone than the node that the pod is being placed onto. Attempting to mount the disk on the pod's node fails with:

gce_util.go:187] Error attaching PD "gke-jenkins-e2e-a4fe3d21-dynamic-pv-gce-0xcbx": GCE persistent disk not found: diskName="gke-jenkins-e2e-a4fe3d21-dynamic-pv-gce-0xcbx" zone="us-central1-a"

It's a known issue that dynamically provisioned PVs are created in the master's zone (#23330), but it's unexpected that the pod created with the claim is being placed in a different zone.

@quinton-hoole @saad-ali

@a-robinson a-robinson self-assigned this Apr 19, 2016
@a-robinson a-robinson added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. team/cluster labels Apr 19, 2016
@saad-ali
Member

The PD does not appear to be getting zone labels (failure-domain.beta.kubernetes.io/zone, etc.) applied to it. Description field from a GCE PD provisioned by the test:

{"kubernetes.io/created-for/pv/name":"pv-gce-j5swc","kubernetes.io/created-for/pvc/name":"pvc-nkmbw","kubernetes.io/created-for/pvc/namespace":"e2e-tests-volume-provisioning-9iwlu"}

CC @justinsb

@a-robinson
Contributor Author

The controller-manager failed to uniquely find the disk:

error getting labels for volume "gke-jenkins-e2e-109093fb-dynamic-pv-gce-j5swc": GCE persistent disk name was found in multiple zones: "gke-jenkins-e2e-109093fb-dynamic-pv-gce-j5swc"  

It might be because there are a ton of leaked disks around in the project. I'll delete them, but @saad-ali is there an existing issue around PD leaks from this test?

@a-robinson
Contributor Author

Ah, the leak is caused by the same underlying issue -- getDiskByNameUnknownZone thinking that it's finding the disk in multiple zones. So the issue is why getDiskByNameUnknownZone is finding it in multiple zones.

@a-robinson
Contributor Author

I'm not sure how this test has been passing for the ubernetes-lite on GCE test suite so consistently, given that the same issue would appear at first glance to apply there as well.

@a-robinson
Contributor Author

Ah, it's one hell of a coincidence that it's been passing for the ubernetes-lite suite, but it does make sense.

It looks like GCE always returns the list of zones in a consistent order; this one, to be precise:

I0418 20:33:13.727509       7 gce.go:297] managing multiple zones: [us-central1-f us-central1-b us-central1-c us-central1-a]

The disk gets successfully found in the ubernetes-lite suite because the found = disk assignment in gce.getDiskByNameUnknownZone is harmless for the first three zones, where disk is nil. Then when us-central1-a comes around, the actual disk is assigned to found, and found isn't checked again because us-central1-a happens to be last.

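In the GKE test suite, this isn't the case because we're using us-central1-f as the master's zone, so the disk is found in the first zone checked. When we then check us-central1-b and don't get an error back from gce.findDiskByName, found is already set, so we bail with the "found in multiple zones" error.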
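
For anyone reading along later, here is a minimal Go sketch of the order dependence described above. It is not the actual Kubernetes code: the disk type, the lookup helper, and the findBuggy/findFixed names are all hypothetical stand-ins, and lookup only assumes what the comment says about gce.findDiskByName (no error and a nil disk when the disk is absent from a zone). The "fixed" variant shows one plausible way to avoid the problem, by skipping zones where the lookup returned nil; it is not necessarily what the merged fix does.

```go
package main

import "fmt"

// disk is a hypothetical stand-in for the GCE disk object.
type disk struct {
	Name string
	Zone string
}

// lookup is a hypothetical stand-in for gce.findDiskByName. It assumes
// the behaviour described in the thread: (nil, nil) when the disk is
// simply not present in the given zone.
func lookup(disks map[string]string, name, zone string) (*disk, error) {
	if z, ok := disks[name]; ok && z == zone {
		return &disk{Name: name, Zone: zone}, nil
	}
	return nil, nil
}

// findBuggy assigns found = d on every iteration, even when d is nil.
// Once found is non-nil, ANY later zone trips the "multiple zones"
// check, so the lookup only succeeds if the disk's zone is listed last.
func findBuggy(disks map[string]string, zones []string, name string) (*disk, error) {
	var found *disk
	for _, zone := range zones {
		d, err := lookup(disks, name, zone)
		if err != nil {
			return nil, err
		}
		if found != nil {
			return nil, fmt.Errorf("GCE persistent disk name was found in multiple zones: %q", name)
		}
		found = d
	}
	if found == nil {
		return nil, fmt.Errorf("GCE persistent disk not found: %q", name)
	}
	return found, nil
}

// findFixed skips zones where the disk is absent, so the "multiple
// zones" error fires only when two zones really contain the disk.
func findFixed(disks map[string]string, zones []string, name string) (*disk, error) {
	var found *disk
	for _, zone := range zones {
		d, err := lookup(disks, name, zone)
		if err != nil {
			return nil, err
		}
		if d == nil {
			continue // not in this zone; keep looking
		}
		if found != nil {
			return nil, fmt.Errorf("GCE persistent disk name was found in multiple zones: %q", name)
		}
		found = d
	}
	if found == nil {
		return nil, fmt.Errorf("GCE persistent disk not found: %q", name)
	}
	return found, nil
}

func main() {
	// Zone order taken from the gce.go log line quoted above.
	zones := []string{"us-central1-f", "us-central1-b", "us-central1-c", "us-central1-a"}

	// Try the disk in the last zone and in the first zone of the list.
	for _, diskZone := range []string{"us-central1-a", "us-central1-f"} {
		disks := map[string]string{"pd": diskZone}
		_, buggyErr := findBuggy(disks, zones, "pd")
		_, fixedErr := findFixed(disks, zones, "pd")
		fmt.Printf("disk in %s: buggy err=%v, fixed err=%v\n", diskZone, buggyErr, fixedErr)
	}
}
```

With the zone order from the log, the buggy variant succeeds only when the disk lives in us-central1-a (the last zone) and reports "found in multiple zones" when the disk is in us-central1-f, which matches the difference between the two suites.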

@a-robinson a-robinson changed the title Dynamic volume provisioning test consistently failing in gke-multizone test suite GCE PDs aren't labled by zone and can't be deleted in ubernetes lite Apr 19, 2016
@ghost

ghost commented Apr 19, 2016

Nice sleuthing, Alex! What an embarrassing bug. Thanks for fixing. I will review your fix in detail now to confirm.

@a-robinson a-robinson changed the title GCE PDs aren't labled by zone and can't be deleted in ubernetes lite GCE PDs aren't labeled by zone and can't be deleted in ubernetes lite Apr 19, 2016
@a-robinson
Contributor Author

I've confirmed that the tests are passing now that the fix is in.
