-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a persistent volume label controller to the cloud-controller-manager #44680
Conversation
So AFAICT the magic here is to set "Phase" to Pending.
|
What's the latest on the generic admission controller that you were working on. Honestly, that would satisfy this use-case better.
Yeah, that's not ok. The previous behavior was -
We are NOT making ccm mandatory in 1.7. |
I looked at conditionalizing the setting of the status, but it looks like the pv-controller sets the status to Available when performing a sync or when the pv is unbound. That should handle setting the pv to Available for non-cloud pv shouldn't it? If not I could set the status only for GCE or AWS volumes. Taking this approach would allow us to completely deprecate the admission plugin in the future. When the ccm becomes mandatory there will really be no need for the admission plugin at all.
Are you wanting to keep the admission plugin as is? That could cause complications. |
cc @sttts @liggitt @deads2k @derekwaynecarr @kubernetes/rh-cluster-infra @eparis |
@@ -216,6 +217,11 @@ func StartControllers(s *options.CloudControllerManagerServer, kubeconfig *restc | |||
nodeController.Run() | |||
time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter)) | |||
|
|||
// Start thePersistentVolumeLabelController |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the<space>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"k8s.io/client-go/tools/cache" | ||
"k8s.io/client-go/util/workqueue" | ||
|
||
"k8s.io/kube-aggregator/pkg/controllers" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we allow this import from pkg/ to k8s.io/kube-aggregator? IMO, we shouldn't. /cc @deads2k
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to the controller import
"k8s.io/kubernetes/pkg/cloudprovider" | ||
"k8s.io/kubernetes/pkg/cloudprovider/providers/aws" | ||
"k8s.io/kubernetes/pkg/cloudprovider/providers/gce" | ||
vol "k8s.io/kubernetes/pkg/volume" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the alias? volume is nice and short.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
@@ -15,154 +15,3 @@ limitations under the License. | |||
*/ | |||
|
|||
package label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this empty now? Then just delete it.
|
||
import ( | ||
"testing" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
limitations under the License. | ||
*/ | ||
|
||
package cloud |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are here any changes or is it a plain move? I think the only way to make this easier to review is to have a pure move commit and one to fix-up everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pvlcontroller is new. The meat of the work is the logic from the admission controller, but there have been some minor changes to work within a controller vs the admission plugin. The logic related to creating/starting the controller is all new
@@ -216,6 +217,11 @@ func StartControllers(s *options.CloudControllerManagerServer, kubeconfig *restc | |||
nodeController.Run() | |||
time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter)) | |||
|
|||
// Start thePersistentVolumeLabelController | |||
pvlController := pvlcontroller.NewPersistentVolumeLabelController(sharedInformers.Core().V1().PersistentVolumes(), cloud) | |||
go pvlController.Run(5, stop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
threadiness := 5
would make this clearer.
I will take a look ASAP |
01a7a39
to
2d6195f
Compare
@thockin atm, this PR is incomplete because of the race condition with the pv-controller and setting the status of cloud pv. We need to resolve that before you spend time reviewing. |
curVol.Labels = volume.Labels | ||
curVol.Status.Phase = volume.Status.Phase | ||
|
||
_, err = pvlc.kubeClient.Core().PersistentVolumes().Update(curVol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a patch here? Avoids the version skew problems, and should be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to using a patch
} | ||
// glog.Info("Pausing before setting volume available") | ||
// time.Sleep(30 * time.Second) | ||
volume.Status.Phase = v1.VolumeAvailable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I feel like I've done this review before, so forgive me if you've already explained this somewhere and just point me there!)
So it is possible that there are no labels, and I don't understand why we wouldn't then mark the volume as Available in that case also. But in general, what is the lifecycle of the volume phase here? It looks like we aren't solely responsible for updating the phase. And in that case, are we guaranteed that the volume won't be used before we label it?
Never mind, looks like I probably shouldn't have reviewed this one. Sorry for noise. |
@smarterclayton PTAL at the initializer use |
/test pull-kubernetes-e2e-gce-gpu |
// Only add labels if in the list of initializers | ||
if needsInitialization(vol.Initializers, initializerName) { | ||
if vol.Spec.AWSElasticBlockStore != nil { | ||
labels, err := pvlc.findAWSEBSLabels(vol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rrati How long is this call going to take? We might enforce a global timeout on all initializers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rrati I put the do-not-merge
label on the PR for this question because it might affect the evolving of initializers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same logic that is in the admission plugin. It will be going away in a future revision of this controller. The call is very fast in my testing, but I haven't hit any instances where the API won't respond.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caesarxuchao what happens if this initializer times out? Is the object considered uninitializable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We perhaps will let the apiserver or garbage collector delete the object.
The call is very fast in my testing
How fast is the API call? Less than 1s? Could you do caching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API call itself is <<1s in my testing. However, that number could go up if the cloud provider is having issues, and I don't have any numbers for that. I never ran into a case where that occurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<<1s sounds good. Thanks.
@caesarxuchao I think the question has been answered. Looks to be fine. Even if it did take a while, this still appears to the structurally correct way of dealing with the problem. |
Looks ok as long as the "alpha initializers are disabled ootb" doesn't cause issues for this. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, ironcladlou, rrati, smarterclayton Associated issue: 88 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
re: timeout this is an excellent example of an initializer that should block progress on PVCs until it completes - i don't realistically think failure is an option for correct labelling. |
@smarterclayton Can you approve the changes you requested? |
Great to see this going through @rrati! Thanks for seeing this through. @smarterclayton - the review is on an outdated commit, and the concerns have been addressed since. If everything looks good to you, please approve it. |
The PR is lgtm'ed, approved, and in queue. |
@smarterclayton @deads2k @rrati I was in a discussion on whether it's realistic to set a global initilaization timeout at 15s, and let the apiserver or a garbage collector delete objects that exceed the limit. I think timing out this initializer is ok, user should retry creating the PV, as long as the timeout only happens occasionally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good but I we need a plan to better break out the provider specific pieces.
@@ -220,6 +220,12 @@ func StartControllers(s *options.CloudControllerManagerServer, kubeconfig *restc | |||
nodeController.Run() | |||
time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter)) | |||
|
|||
// Start the PersistentVolumeLabelController | |||
pvlController := cloudcontrollers.NewPersistentVolumeLabelController(client("pvl-controller"), cloud) | |||
threads := 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 5? Would it be worth making this configurable?
|
||
// Only add labels if in the list of initializers | ||
if needsInitialization(vol.Initializers, initializerName) { | ||
if vol.Spec.AWSElasticBlockStore != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an idea of how to get this block out of here and in to the AWS specific cloud provider section?
It seems as though we should add a GetVolumeLabel method to the cloud provider interface.
} | ||
volumeLabels = labels | ||
} | ||
if vol.Spec.GCEPersistentDisk != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto can we work out how to move this into the gce provider section?
/test pull-kubernetes-bazel-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and thank you for doing this. We need to track moving the AWS/GCE specific code into the relevant provider section.
@cheftako can you please open an issue for tracking that?
… On 01 Sep 2017, at 00:11, Walter Fender ***@***.***> wrote:
@cheftako commented on this pull request.
Looks good and thank you for doing this. We need to track moving the AWS/GCE specific code into the relevant provider section.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
There already is an issue for removing the cloud provider linkage #51629 |
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836) |
related to #49402 |
Part of kubernetes/enhancements#88
Outstanding concerns needing input:
I'm willing to work on addressing the removal of the direct linkage to aws/gce after this PR gets in.