CSI Node info registration in kubelet #67684
Conversation
/sig storage
Force-pushed from f87962d to c71f9cb
// updateNode repeatedly attempts to update the corresponding node object
// which is modified by applying the given update functions in order.
func (lm *nodeInfoManager) updateNode(updateFuncs []nodeUpdateFunc) error {
Will rename all the receivers to nim.
}
if val, ok := existingDriverMap[csiDriverName]; ok {
	if reflect.DeepEqual(val, topologyKeys) {
Will check set equality instead in the new version that uses CSINodeInfo.
Force-pushed from c71f9cb to b2ff926
Feel free to hold off on reviewing this PR for now - I'm almost done with wiring up CSINodeInfo, and a good amount of logic will become obsolete.
Force-pushed from b2ff926 to ab5b04e
Added integration with CSINodeInfo. Missing functionality is updated in the top post, but adding those won't cause any major structural changes. The PR is ready for review now. @liggitt @saad-ali @vladimirvivien @sbezverk @msau42 @gnufied
Force-pushed from ab5b04e to 0128473
@saad-ali and I discussed offline, and we are going to leave the CSI label prefix as is in Kubernetes instead of reversing it, for the time being.
var nodeKind = v1.SchemeGroupVersion.WithKind("Node")
// nodeInfoManager is struct of channels used for communication between the driver registration |
This comment needs to be updated. I don't see any channels being used
	return fmt.Errorf("error adding CSI driver node info: driverNodeID must not be empty")
}

if topology == nil {
If a plugin doesn't support topology, will this be nil?
Yeah, thanks for the catch!
	}
}

func (nim *nodeInfoManager) AddNodeInfo(driverName string, driverNodeID string, topology *csipb.Topology) error {
Can you add a comment here describing exactly all the objects/fields that are being modified/added?
// exponential backoff to avoid exhausting the apiserver.

nodeClient := nim.k8s.CoreV1().Nodes()
node, err := nodeClient.Get(string(nim.nodeName), metav1.GetOptions{})
are there plans to use a shared informer? If so, we should clone the node object first before modifying it so it doesn't impact other uses that may be sharing the same pointer.
No, I wasn't planning to. Driver registration is fairly infrequent, so watching the object doesn't seem worth it.
needUpdate := false
for _, update := range updateFuncs {
	newNode, updated, err := update(node)
This works only if the multiple updateFuncs are not conflicting with each other.
I'll make it more explicit in the comments for this function that because updateFuncs are called in order, each updateFunc should consider the effects of previous updateFuncs
> each updateFunc should consider the effects of previous updateFuncs

what does that mean?
I added a comment here - does that help or should it be more clear?
https://github.com/kubernetes/kubernetes/blob/9c855fb9e62de46f511d9dc0446425c1b5a76499/pkg/volume/csi/nodeinfomanager/nodeinfomanager.go#L119
You mean, if two update functions both update the same field, then the last one will win, but the caller of this function should try to ensure that it does not happen in the first place?
Well said! Will update the comment
	driverInfo.NodeID = driverNodeID
	driverInfo.TopologyKeys = topologyKeys
}
driverInfos = append(driverInfos, driverInfo)
Should we be adding all drivers in this call to just update a single driver? Do all the CSI drivers register again when we upgrade the node?
Should we be looking at the annotations at all? IIUC, this is supposed to be the replacement for the annotations
Yeah CSI drivers should register again upon node upgrade, and this is here as a safeguard to ensure annotations and CSINodeInfo are consistent while we support both. There is always a chance that kubelet is upgraded while the previous Node object stays, and annotations and CSINodeInfo could be momentarily out of sync.
Is it necessary to keep the annotations and CSIDriverInfo in sync on the first driver's registration, if the other driver will register again later? In an upgrade scenario, then this means we're on the first boot after upgrade and we're processing the first driver registration. We're adding an entry for all the other drivers too even though they have not gone through registration yet. Could that cause other issues?
I think this method should get the whole list of registered CSI drivers so it creates/updates a complete CSINodeInfo with all drivers. The result should be the same in the end, however it feels more robust. I don't like CSINodeInfo with empty topology for some drivers just because we translated it from an annotation.
This method gets called for a single driver registration though, and only on the first one since it's creating the object for the first time.
Discussed with @msau42 offline. The current implementation doesn't guarantee the node info is always up-to-date. In particular, with the current kubelet plugin registration design, there is no way to remove a non-existent driver unless it detects that a driver socket is removed (after unregistration is implemented). If a driver is removed while kubelet is down, the node information will remain. So whatever issue comes up when CSINodeInfo is kept in sync with annotation is unavoidable today.
But I'll remove the logic to keep the two sources in sync, because it complicates the implementation without adding too much benefit. One minor advantage of keeping them in sync is to avoid debugging confusion, but I think it'll be OK.
pkg/volume/csi/csi_plugin.go
Outdated
// all other CSI components will be able to get the actual socket of CSI drivers by its name.
csiDrivers.Lock()
defer csiDrivers.Unlock()
csiDrivers.driversMap[pluginName] = csiDriver{driverName: pluginName, driverEndpoint: endpoint}
Previously the lock was being held across the entire function instead of just the map update. Could there be any issues if this map has an entry but we haven't finished registering the plugin yet? Should we update the map at the very end?
In both the previous and the current version, the lock is only held on write, all within this RegistrationCallback() function. The map has to be updated before the csi.NodeGetInfo() call, unfortunately, because it relies on the driver info to call out to gRPC.
However, when plugin unregistration is implemented in the future, depending on how it's implemented, there could be a race between calls of RegistrationCallback(). Will update to lock the entire function once.
Discussed with @msau42 offline. For now we'll keep the map locked as short as possible. When unregistration is implemented we'll reconsider again.
Will add comments explaining why it's OK to keep locking short.
Force-pushed from 50db873 to db19252
…ing to CSINodeInfo; added unit tests for node updates; updated RBAC, NodeAuthorizer, NodeRestriction.
Force-pushed from db19252 to e966eb1
/lgtm
@@ -164,6 +164,10 @@ func NodeRules() []rbacv1.PolicyRule {
		nodePolicyRules = append(nodePolicyRules, csiDriverRule)
	}
}
if utilfeature.DefaultFeatureGate.Enabled(features.KubeletPluginsWatcher) {
	csiNodeInfoRule := rbacv1helpers.NewRule("get", "create", "update", "patch", "delete").Groups("csi.storage.k8s.io").Resources("csinodeinfos").RuleOrDie()
Looks like the unit test is failing because this feature flag is enabled by default now. Is this still the right flag that determines if the node needs these permissions, or is it fenced by the CSINodeInfo flag as well now?
If this is correct as is, the fixture for the unit test just needs regenerating (instructions are in the unit test failure message)
That's one spot I missed. Thanks for the catch!
One comment on the gate being checked by the RBAC policy change and the unit test failure; authz and admission changes lgtm otherwise.
It's worth noting that the removal of driver info from the node depends on observing the removal of the driver (rather than reconciling to current observed state), and doesn't handle errors in removing driver info by retrying. That doesn't necessarily block this PR, but seems important to resolve before this graduates.
To clarify, in this PR the driver is only unregistered when there's an error adding the driver. Driver unregistration will be added in the next release. The rest of the CSI system should always assume node info is somewhat out of date, since kubelet can't guarantee that updates are reflected immediately anyway, even in the best case. Does the same logic apply to driver addition as well?
Force-pushed from e966eb1 to a4f339a
Updated RBAC test
Keeping a laundry list of considerations for CSI kubelet driver registration moving forward:
/approve RBAC, node authz and admission changes lgtm
/retest
The GCE e2e failure is likely caused by this PR. Taking a look.
Force-pushed from a4f339a to 94d649b
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: liggitt, msau42, saad-ali, verult. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@verult: The following test failed.
/retest
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md
Per #68519 (comment): installing CRD outside k8s core brings back issue #2:
I side with @msau42 that if the
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #67683
Special notes for your reviewer:
Feature issue: kubernetes/enhancements#557
Design doc: kubernetes/community#2034
Missing pieces:
An RBAC rule is also added to support external-provisioner topology updates.
Release note: