-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
List all nodes and occupy cidr map before starting allocations #29062
Conversation
@@ -82,6 +82,17 @@ func NewCIDRRangeAllocator(client clientset.Interface, clusterCIDR *net.IPNet, s | |||
} else { | |||
glog.V(0).Info("No Service CIDR provided. Skipping filtering out service addresses.") | |||
} | |||
for _, node := range nodeList.Items { | |||
if node.Spec.PodCIDR == "" { | |||
glog.Infof("Node %v has no CIDR, ignoring", node.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to 'continue' here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't really matter either way, i guess😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's just for logging
Haven't tested but looks right for a backport. I think there is still a theoretical race but think it is very unlikely now |
I think as long as we do it in a STW phase we know we're not allocating any CIDRS, so the list has all our previous allocations. Please, please try to shoot holes through this :) |
IF we have an inflith request stuck in apiserver memory that doesn't show up in our list but makes it to etcd after we populate the map, and in that time we allocate that cidr, things might go wrong. Apiserver should discard all broken connection writes though, and if it made it through to the datastore, it should show up in the list. |
Cleaned it up a bit, deployed to 3 node cluster with restarting nodecontroller and it no dupes. Trying a larger cluster. |
The only failure case I could come up with was if there were no CIDRs left for the nodes, and we deleted and created a bunch of nodes in between the two lists, we might incorrectly deny some node allocations which we could satisfy. But (1) the window is tiny and (2) the existing code doesn't handle the no-cidrs-left case very well anyway and (3) this isn't a double-allocation so not a disaster. So I think this is great (presuming we don't spot anything), and has the advantage that it isn't an impossible cherry-pick onto the multiple branches. |
@mtaufen encountered this a couple days ago. |
@bgrant0607 |
contrib mesos has their own controller manager startup routine that's calling the same nodecontroller init function (contrib/pkg/controllermanager/controllermanager.go as opposed to kube-controller-manager). Invocation looks identical so I gave it the same failure mode. |
LGTM This does point out that controllers need a way to determine that they are reading the most up-to-date state, without performing an extra write. I also encountered that when toying with a fix for the bugs in updateCIDRAllocation. I'll think about that problem. In the future, CIDR allocation may be a good use case for multi-key transactions. The Node API could keep a secondary index of allocated CIDRs and could fail node updates that would double-assign the same CIDR. |
GCE e2e build/test passed for commit 2f9516d. |
Automatic merge from submit-queue |
Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
…#29062 Move CIDR allocation logic away from nodecontroller.go List all nodes and occupy cidr map before starting allocations :100644 100644 85277c4... 703aaaa... M cmd/integration/integration.go :100644 100644 9947e45... 7ce3365... M cmd/kube-controller-manager/app/controllermanager.go :100644 100644 f05792a... 95170ae... M pkg/controller/node/cidr_allocator.go :100644 100644 37cfdf6... 8d43713... M pkg/controller/node/cidr_allocator_test.go :000000 100644 0000000... 5f013d9... A pkg/controller/node/cidr_set.go :000000 100644 0000000... 5738da0... A pkg/controller/node/cidr_set_test.go :100644 100644 4549202... 1bb3402... M pkg/controller/node/nodecontroller.go :100644 100644 5c60e02... a87beeb... M pkg/controller/node/nodecontroller_test.go :000000 100644 0000000... 94e50c5... A pkg/controller/node/test_utils.go
…#29062 Move CIDR allocation logic away from nodecontroller.go List all nodes and occupy cidr map before starting allocations :100644 100644 85277c4... 703aaaa... M cmd/integration/integration.go :100644 100644 9947e45... 7ce3365... M cmd/kube-controller-manager/app/controllermanager.go :100644 100644 f05792a... 95170ae... M pkg/controller/node/cidr_allocator.go :100644 100644 37cfdf6... 8d43713... M pkg/controller/node/cidr_allocator_test.go :000000 100644 0000000... 5f013d9... A pkg/controller/node/cidr_set.go :000000 100644 0000000... 5738da0... A pkg/controller/node/cidr_set_test.go :100644 100644 4549202... 1bb3402... M pkg/controller/node/nodecontroller.go :100644 100644 5c60e02... a87beeb... M pkg/controller/node/nodecontroller_test.go :000000 100644 0000000... 94e50c5... A pkg/controller/node/test_utils.go
…pick-of-#28604-kubernetes#29062-upstream-release-1.3 Automatic merge from submit-queue Automated cherry pick of kubernetes#28604 kubernetes#29062 Cherry pick of kubernetes#28604 kubernetes#29062 on release-1.3.
Manually tested by starting a 200 node cluster with frequent controller manager restarts.
Fixes #29058