List all nodes and occupy cidr map before starting allocations #29062
Conversation
@@ -82,6 +82,17 @@ func NewCIDRRangeAllocator(client clientset.Interface, clusterCIDR *net.IPNet, s
	} else {
		glog.V(0).Info("No Service CIDR provided. Skipping filtering out service addresses.")
	}
	for _, node := range nodeList.Items {
		if node.Spec.PodCIDR == "" {
			glog.Infof("Node %v has no CIDR, ignoring", node.Name)
Did you want to 'continue' here?
Doesn't really matter either way, I guess 😃
Yeah, that's just for logging.
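For readers of this thread, here is a minimal sketch of what the full startup pass could look like with an explicit `continue`. The helper name `occupyExistingCIDRs`, the `cidrSet` type, and its `occupy` method are illustrative assumptions based on the discussion, not necessarily the PR's exact code.

```go
package node

import (
	"fmt"
	"net"

	"github.com/golang/glog"

	"k8s.io/kubernetes/pkg/api"
)

// occupyExistingCIDRs sketches the startup pass under discussion: walk every
// node returned by the initial List call and mark its PodCIDR as used before
// the allocator hands out any new ranges.
func occupyExistingCIDRs(nodeList *api.NodeList, set *cidrSet) error {
	for i := range nodeList.Items {
		node := &nodeList.Items[i]
		if node.Spec.PodCIDR == "" {
			glog.Infof("Node %v has no CIDR, ignoring", node.Name)
			continue
		}
		_, podCIDR, err := net.ParseCIDR(node.Spec.PodCIDR)
		if err != nil {
			return fmt.Errorf("failed to parse CIDR %q of node %v: %v", node.Spec.PodCIDR, node.Name, err)
		}
		// occupy (assumed API) marks the range as allocated; failing startup
		// here is preferable to risking a double allocation later.
		if err := set.occupy(podCIDR); err != nil {
			return fmt.Errorf("failed to occupy CIDR %v of node %v: %v", podCIDR, node.Name, err)
		}
	}
	return nil
}
```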
Haven't tested, but this looks right for a backport. I think there is still a theoretical race, but I think it is very unlikely now.
I think as long as we do it in a STW (stop-the-world) phase we know we're not allocating any CIDRs, so the list has all our previous allocations. Please, please try to shoot holes through this :)
If we have an in-flight request stuck in apiserver memory that doesn't show up in our list but makes it to etcd after we populate the map, and in that time we allocate that CIDR, things might go wrong. The apiserver should discard all broken-connection writes, though, and if the write made it through to the datastore, it should show up in the list.
Cleaned it up a bit, deployed it to a 3-node cluster with a restarting nodecontroller, and saw no dupes. Trying a larger cluster.
The only failure case I could come up with was: if there were no CIDRs left for the nodes, and we deleted and created a bunch of nodes in between the two lists, we might incorrectly deny some node allocations which we could satisfy. But (1) the window is tiny, (2) the existing code doesn't handle the no-CIDRs-left case very well anyway, and (3) this isn't a double allocation, so not a disaster. So I think this is great (presuming we don't spot anything), and it has the advantage that it isn't an impossible cherry-pick onto multiple branches.
@mtaufen encountered this a couple of days ago.
@bgrant0607
contrib/mesos has its own controller manager startup routine that calls the same nodecontroller init function (contrib/pkg/controllermanager/controllermanager.go, as opposed to kube-controller-manager). The invocation looks identical, so I assume it has the same failure mode.
LGTM. This does point out that controllers need a way to determine that they are reading the most up-to-date state, without performing an extra write. I also encountered that when toying with a fix for the bugs in updateCIDRAllocation; I'll think about that problem. In the future, CIDR allocation may be a good use case for multi-key transactions. The Node API could keep a secondary index of allocated CIDRs and could fail node updates that would double-assign the same CIDR.
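As a rough illustration of that last idea (purely hypothetical; nothing like this exists in the PR or in the Node API today), a server-side guard could keep a reverse index from CIDR to owning node and reject an update that would hand an already-claimed range to a second node:

```go
package node

import (
	"fmt"
	"sync"
)

// cidrIndex is a hypothetical secondary index from an allocated CIDR to the
// node that owns it, sketching the double-assignment guard described above.
type cidrIndex struct {
	mu    sync.Mutex
	owner map[string]string // PodCIDR -> node name
}

func newCIDRIndex() *cidrIndex {
	return &cidrIndex{owner: map[string]string{}}
}

// claim records that nodeName owns podCIDR, failing if a different node has
// already claimed the same range.
func (c *cidrIndex) claim(nodeName, podCIDR string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if prev, ok := c.owner[podCIDR]; ok && prev != nodeName {
		return fmt.Errorf("CIDR %s is already assigned to node %s", podCIDR, prev)
	}
	c.owner[podCIDR] = nodeName
	return nil
}

// release frees podCIDR when its owning node is deleted.
func (c *cidrIndex) release(podCIDR string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.owner, podCIDR)
}
```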
GCE e2e build/test passed for commit 2f9516d.
Automatic merge from submit-queue
Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.
…#29062 Move CIDR allocation logic away from nodecontroller.go
List all nodes and occupy cidr map before starting allocations
:100644 100644 85277c4... 703aaaa... M pkg/../cmd/integration/integration.go
:100644 100644 9947e45... 7ce3365... M cmd/kube-controller-manager/app/controllermanager.go
:100644 100644 f05792a... 95170ae... M pkg/controller/node/cidr_allocator.go
:100644 100644 37cfdf6... 8d43713... M pkg/controller/node/cidr_allocator_test.go
:000000 100644 0000000... 5f013d9... A pkg/controller/node/cidr_set.go
:000000 100644 0000000... 5738da0... A pkg/controller/node/cidr_set_test.go
:100644 100644 4549202... 1bb3402... M pkg/controller/node/nodecontroller.go
:100644 100644 5c60e02... a87beeb... M pkg/controller/node/nodecontroller_test.go
:000000 100644 0000000... 94e50c5... A pkg/controller/node/test_utils.go
…pick-of-#28604-kubernetes#29062-upstream-release-1.3
Automatic merge from submit-queue
Automated cherry pick of kubernetes#28604 kubernetes#29062
Cherry pick of kubernetes#28604 kubernetes#29062 on release-1.3.
Manually tested by starting a 200 node cluster with frequent controller manager restarts.
Fixes #29058