Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List all nodes and occupy cidr map before starting allocations #29062

Merged
merged 1 commit into from
Jul 16, 2016

Conversation

bprashanth
Copy link
Contributor

@bprashanth bprashanth commented Jul 16, 2016

Manually tested by starting a 200 node cluster with frequent controller manager restarts.
Fixes #29058

@@ -82,6 +82,17 @@ func NewCIDRRangeAllocator(client clientset.Interface, clusterCIDR *net.IPNet, s
} else {
glog.V(0).Info("No Service CIDR provided. Skipping filtering out service addresses.")
}
for _, node := range nodeList.Items {
if node.Spec.PodCIDR == "" {
glog.Infof("Node %v has no CIDR, ignoring", node.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to 'continue' here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't really matter either way, i guess😃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's just for logging

@justinsb
Copy link
Member

Haven't tested but looks right for a backport. I think there is still a theoretical race but think it is very unlikely now

@bprashanth
Copy link
Contributor Author

I think as long as we do it in a STW phase we know we're not allocating any CIDRS, so the list has all our previous allocations. Please, please try to shoot holes through this :)

@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Jul 16, 2016
@bprashanth
Copy link
Contributor Author

IF we have an inflith request stuck in apiserver memory that doesn't show up in our list but makes it to etcd after we populate the map, and in that time we allocate that cidr, things might go wrong. Apiserver should discard all broken connection writes though, and if it made it through to the datastore, it should show up in the list.

@bprashanth
Copy link
Contributor Author

Cleaned it up a bit, deployed to 3 node cluster with restarting nodecontroller and it no dupes. Trying a larger cluster.

@bprashanth bprashanth assigned justinsb and j3ffml and unassigned bprashanth Jul 16, 2016
@bprashanth bprashanth changed the title [wip] List all nodes and occupy cidr map before starting allocations List all nodes and occupy cidr map before starting allocations Jul 16, 2016
@justinsb
Copy link
Member

The only failure case I could come up with was if there were no CIDRs left for the nodes, and we deleted and created a bunch of nodes in between the two lists, we might incorrectly deny some node allocations which we could satisfy. But (1) the window is tiny and (2) the existing code doesn't handle the no-cidrs-left case very well anyway and (3) this isn't a double-allocation so not a disaster. So I think this is great (presuming we don't spot anything), and has the advantage that it isn't an impossible cherry-pick onto the multiple branches.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 16, 2016
@bgrant0607
Copy link
Member

FAIL    k8s.io/kubernetes/contrib/mesos/cmd/km [build failed]

@mtaufen encountered this a couple days ago.

@bgrant0607 bgrant0607 added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels Jul 16, 2016
@k8s-github-robot
Copy link

@bgrant0607
You must link to the test flake issue which caused you to request this manual re-test.
Re-test requests should be in the form of: k8s-bot test this issue: #<number>
Here is the list of open test flakes.

@bprashanth
Copy link
Contributor Author

contrib mesos has their own controller manager startup routine that's calling the same nodecontroller init function (contrib/pkg/controllermanager/controllermanager.go as opposed to kube-controller-manager). Invocation looks identical so I gave it the same failure mode.

@bgrant0607
Copy link
Member

LGTM

This does point out that controllers need a way to determine that they are reading the most up-to-date state, without performing an extra write. I also encountered that when toying with a fix for the bugs in updateCIDRAllocation. I'll think about that problem.

In the future, CIDR allocation may be a good use case for multi-key transactions. The Node API could keep a secondary index of allocated CIDRs and could fail node updates that would double-assign the same CIDR.

@bgrant0607 bgrant0607 added lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Jul 16, 2016
@k8s-bot
Copy link

k8s-bot commented Jul 16, 2016

GCE e2e build/test passed for commit 2f9516d.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 0bec77b into kubernetes:master Jul 16, 2016
@davidopp davidopp added this to the v1.3 milestone Jul 16, 2016
k8s-github-robot pushed a commit that referenced this pull request Jul 17, 2016
…8604-#29062-upstream-release-1.3

Automatic merge from submit-queue

Automated cherry pick of #28604 #29062

Cherry pick of #28604 #29062 on release-1.3.
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

smarterclayton pushed a commit to smarterclayton/kubernetes that referenced this pull request Aug 18, 2016
…#29062

Move CIDR allocation logic away from nodecontroller.go
List all nodes and occupy cidr map before starting allocations

:100644 100644 85277c4... 703aaaa... M	cmd/integration/integration.go
:100644 100644 9947e45... 7ce3365... M	cmd/kube-controller-manager/app/controllermanager.go
:100644 100644 f05792a... 95170ae... M	pkg/controller/node/cidr_allocator.go
:100644 100644 37cfdf6... 8d43713... M	pkg/controller/node/cidr_allocator_test.go
:000000 100644 0000000... 5f013d9... A	pkg/controller/node/cidr_set.go
:000000 100644 0000000... 5738da0... A	pkg/controller/node/cidr_set_test.go
:100644 100644 4549202... 1bb3402... M	pkg/controller/node/nodecontroller.go
:100644 100644 5c60e02... a87beeb... M	pkg/controller/node/nodecontroller_test.go
:000000 100644 0000000... 94e50c5... A	pkg/controller/node/test_utils.go
deads2k pushed a commit to deads2k/kubernetes that referenced this pull request Aug 18, 2016
…#29062

Move CIDR allocation logic away from nodecontroller.go
List all nodes and occupy cidr map before starting allocations

:100644 100644 85277c4... 703aaaa... M	cmd/integration/integration.go
:100644 100644 9947e45... 7ce3365... M	cmd/kube-controller-manager/app/controllermanager.go
:100644 100644 f05792a... 95170ae... M	pkg/controller/node/cidr_allocator.go
:100644 100644 37cfdf6... 8d43713... M	pkg/controller/node/cidr_allocator_test.go
:000000 100644 0000000... 5f013d9... A	pkg/controller/node/cidr_set.go
:000000 100644 0000000... 5738da0... A	pkg/controller/node/cidr_set_test.go
:100644 100644 4549202... 1bb3402... M	pkg/controller/node/nodecontroller.go
:100644 100644 5c60e02... a87beeb... M	pkg/controller/node/nodecontroller_test.go
:000000 100644 0000000... 94e50c5... A	pkg/controller/node/test_utils.go
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…pick-of-#28604-kubernetes#29062-upstream-release-1.3

Automatic merge from submit-queue

Automated cherry pick of kubernetes#28604 kubernetes#29062

Cherry pick of kubernetes#28604 kubernetes#29062 on release-1.3.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants