-
Notifications
You must be signed in to change notification settings - Fork 39.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
attachdetach controller: attach volumes immediately when Pod's PVCs are bound #66863
Conversation
// Skip unbound PVCs. | ||
return | ||
} | ||
pods, err := dswp.podLister.List(labels.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way - perhaps we can solve this is, by implementing work queues in attach/detach controller. The problem here is - any random PVC update will cause iteration of all pods in the cluster. One can perhaps DOS the cluster by constantly updating the pvc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because PVC does not contain Pod references, but we can index pods by PVC locally, then we don't need to iterate all pods every time. For example, daemon controller indexes pods based on their NodeName.
I think this will improve performance in large cluster.
/test pull-kubernetes-integration |
@cofyc , do you think it is cherry-pickable to 1.10, 1.11? |
@kubernetes/sig-storage-pr-reviews |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly looks good to me.
UpdateFunc: func(old, new interface{}) { | ||
adc.enqueuePVC(new) | ||
}, | ||
DeleteFunc: nil, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think you need to specify DeleteFunc
if you don't want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@redbaron |
@cofyc can you please squash your commits? |
waitForPodFuncInDSWP(t, ctrl.GetDesiredStateOfWorld(), 10*time.Second, "expected 1 pod in dsw", 1) | ||
createPVForPVC(t, testClient, pvc) | ||
waitForPodFuncInDSWP(t, ctrl.GetDesiredStateOfWorld(), 20*time.Second, "expected 2 pods in dsw after PVCs are bound", 2) | ||
close(stopCh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how well 10s and 20s setting will work? What is the typical delay under normal load, and heavy load?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attachdetach controller processes Pod's PVs to attach very fast, the delay should be almost same as apiserver watch event delay, because it does nothing except adding Pod's Volumes into DSWP (no api calls) on event.
In my local test, delay to attach PV is less than 100ms. The most delay is waiting PV controller to bound PVC, normally 4~5 seconds.
Perhaps we should increase timeout
of waitForPodFuncInDSWP
, and it will not increase execution time of test.
time.Sleep(10 * time.Second)
is a hack to verify the second pod cannot be added into DSWP, because no object state or event can be checked to verify. 10 seconds should be enough here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeout
of waitForPodFuncInDSWP
is increased to 60 seconds now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! I just have two nits
} | ||
pvc, err := adc.pvcLister.PersistentVolumeClaims(namespace).Get(name) | ||
if apierrors.IsNotFound(err) { | ||
glog.V(2).Infof("error getting pvc %q from informer: %v", key, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this really be log level 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, increased to log level 4.
if !ok { | ||
return []string{}, nil | ||
} | ||
// We are only interested in active pods with nodeName set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a comment that this cache is only used for attaching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Largely LGTM, a couple of questions.
func (adc *attachDetachController) enqueuePVC(obj interface{}) { | ||
key, err := kcache.DeletionHandlingMetaNamespaceKeyFunc(obj) | ||
if err != nil { | ||
runtime.HandleError(fmt.Errorf("Couldn't get key for object %+v: %v", obj, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: Why runtime.HandleError
? Why not just glog.Errorf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed https://github.com/kubernetes/kubernetes/blob/v1.12.0-alpha.1/pkg/controller/daemon/daemon_controller.go#L316, and found sample-controller uses runtime.HandleError
to log errors too. Perhaps because runtime.HandleError
is dedicated to log non-user facing ignored errors and can be extended by adding error handlers.
return nil | ||
} | ||
|
||
objs, err := adc.podIndexer.ByIndex(pvcKeyIndex, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to verify, if there are multiple pods using the same PVC, this will return all pods? If so make sure there is a test case for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, test has been updated. Created more pvc notbound pods for testing in 74cfffc.
…re bound - Use queue to process PVCs on add/update events - Index pods by PVC key then we don't need to iterate to find pods
…re bound - Add integration test for this feature
/test pull-kubernetes-integration |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cofyc, saad-ali The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-kubernetes-integration |
/retest |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
I've created cherry pick PRs for 1.9/1.10/1.11:
|
This PS moves to use k8s 1.10.8, which includes a couple of fixes for PVC mounts. * kubernetes/kubernetes#66863 Change-Id: Ica30950a8200f5755897b51fd2b4d24c69a10e61 Signed-off-by: Pete Birley <pete@port.direct>
What this PR does / why we need it:
Let attachdetach controller to attach volumes immediately when Pod's PVCs are bound.
Current attachdetach controller calls
util.ProcessPodVolume
to add pod volumes intodesiredStateOfWorld
on these events:desiredStateOfWorldPopulator.findAndAddActivePod
But if a pod is created with PVCs not bound, no volumes will be added into
desiredStateOfWorld
because PVCs not bound. When pv controller binds PVCs successfully, attachdetach controller will not add pod volumes immediately because it does not watch on PVC events.It will wait until a pod update event is triggered (normally will not happen because no new status will be reported by kubelet) or
desiredStateOfWorldPopulator.findAndAddActivePod
is called (maybe 0~3 minutes later, see timer configs).In bad case, pod start time will be very long (~3 minutes + ~2 minutes (kubelet max exponential backoff)), for example: #64549 (comment).
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #64549
Special notes for your reviewer:
Release note: