PetSets PVCs are not spread across Zones #27256
Comments
We should be able to express zone spreading through pod anti-affinity, and the PVs should follow. Is this not the case?
@bprashanth I don't believe so. Volume creation is separate from pod creation. Currently the volumes are all created in the "master zone". It isn't clear what happens in multi-AZ scenarios, but if KCM is doing the volume creation then I expect they will still end up in one zone. Either way, I don't think we have any reasonable hope of 3 volumes landing in 3 separate zones, which is what I think we want. Going to put up a super-strawman PoC in a few minutes; hopefully it will be clearer then!
Maybe not today, but in general, why would we not want them to simply follow the pods?
By default, I mean. There are obviously scenarios where one might want to explicitly specify the zone of a PV, and I still think that should be possible.
Not sure I follow the first comment. On GCE & AWS, volumes are created in a specific zone and can't be attached cross-zone and can't easily be moved. The volume's zone will actually dictate the pod's zone. My PoC doesn't do a lot, but it does let you specify a zone explicitly! Stay tuned...
I'm going to tag this for the 1.3 milestone, so it stays on the radar. I think there are some options to fix this (as discussed in #27257). I think the current behaviour will be very surprising / disappointing to people trying out PetSets in 1.3.
Kicking out stuff from 1.3, it's too late.
Fix in #27553
Long term we plan on integrating this into the scheduler, but in the short term we use the volume name to place it onto a zone. We hash the volume name so we don't bias to the first few zones. If the volume name "looks like" a PetSet volume name (ending with -<number>) then we use the number as an offset. In that case we hash the base name. Fixes kubernetes#27256
PVC volumes are currently created in a single zone, or at least are not reliably spread across zones. Presumably someone running a ZooKeeper PetSet on HA k8s wants zone-failure tolerance, so we should spread the created volumes across zones.
I'm going to work up a strawman approach to this, but wanted to record the issue separately.
cc @bprashanth