
Add support for kube-up.sh to deploy Calico network policy to GCI masters #29037

Merged

Conversation

matthewdupre
Contributor

@matthewdupre commented Jul 15, 2016

Also removes the requirement for calicoctl on Debian / salt-installed nodes and cleans things up a little by deploying calico-node with a manifest rather than via calicoctl. This also makes it more reliable, since the manifest retries properly.

How to use:

make quick-release
NETWORK_POLICY_PROVIDER=calico cluster/kube-up.sh
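
After kube-up completes, a quick way to check that the Calico pieces came up is to look for them in kube-system (this is only an illustrative check; the exact pod names depend on the manifests in this PR):

# The policy controller and calico-node pods land in the kube-system namespace.
kubectl --namespace=kube-system get pods | grep calico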

One place where I was uncertain:

  • CPU allocations (on the master particularly, where there's very little spare capacity). I took some from etcd, but if there's a better way to decide this, I'm happy to change it.

This change is Reviewable

@k8s-github-robot added the size/L and release-note-label-needed labels Jul 15, 2016
@matthewdupre
Contributor Author

@thockin: Could you take a look at this please? It builds on the original Calico salt PR you reviewed (#26017).

@@ -520,10 +520,21 @@ function start-etcd-servers {
     rm -f /etc/init.d/etcd
   fi
   prepare-log-file /var/log/etcd.log
-  prepare-etcd-manifest "" "4001" "2380" "200m" "etcd.manifest"
+  prepare-etcd-manifest "" "4001" "2380" "150m" "etcd.manifest"
Member

What is this change?

@matthewdupre
Contributor Author

Thanks for taking a look.

Those two changes reduce the CPU resources reserved for the existing etcd clusters to provide room for Calico. There's virtually nothing spare (~40m IIRC) to be able to run more pods on the master, so I had to reduce some existing allocations to make room.

There are a few other options:

  1. Take different amounts of CPU from different places.
  2. Don't reserve resources for the Calico containers.
  3. Increase the master instance size so there's more to go around.

I'm open to any of these (or something else); thoughts?

@eparis removed their assignment Jul 21, 2016
@matthewdupre
Contributor Author

Anything I can do to help you review this @zmerlynn?

@zmerlynn
Member

Sorry, I was out when originally assigned. I don't think we want to unilaterally drop the CPU assignments on the master here: even if there's spare CPU in those containers right now, we need to make sure we leave room for growth in the future. We could possibly do this as a conditional based on the network provider.

cc @mikedanese @fabioy
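
For illustration, the conditional suggested above might look something like this inside start-etcd-servers (a rough sketch only: the prepare-etcd-manifest call mirrors the diff above, but the conditional and the etcd_cpu variable are hypothetical and not part of this PR):

# Keep the original etcd CPU request unless Calico needs room on the master.
local etcd_cpu="200m"
if [[ "${NETWORK_POLICY_PROVIDER:-}" == "calico" ]]; then
  etcd_cpu="150m"
fi
prepare-etcd-manifest "" "4001" "2380" "${etcd_cpu}" "etcd.manifest"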

@matthewdupre
Contributor Author

Was there some systematic process for determining the CPU allocations in the first place?

I'd be perfectly happy to go with one of the alternative approaches as well. Please let me know what you think.

@k8s-github-robot added the do-not-merge and retest-not-required-docs-only labels and removed the retest-not-required-docs-only label Jul 28, 2016
@matthewdupre
Contributor Author

Sorry to ping you all again @thockin, @mikedanese, @fabioy - I'd really like to get this closed down and in :).

It seems like the main point of contention is the CPU resource allocations. To get things in for now, how about we go with option 2 above (remove CPU requests for the Calico components altogether)?

  • This will have no effect on existing (no NETWORK_POLICY_PROVIDER) function.
  • They're only requests, so in practice Calico will still be able to use CPU.
  • We can easily add requests in later if a concrete problem arises.

Would this be acceptable?

function start-calico-policy-controller {
  local -r temp_file="/tmp/calico-policy-controller.manifest"
  cp "${KUBE_HOME}/kube-manifests/kubernetes/gci-trusty/calico-policy-controller.manifest" "${temp_file}"
  PUBLIC_IP=$(ifconfig eth0 | grep 'inet ' | awk '{print $2}')
Member

When I run this locally I get "addr:172.17.128.66", but when I run it on GCI I get a bare IP. This feels fragile. How about hostname -i? That is fragile in a different way...
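
For what it's worth, a slightly more robust sketch that sidesteps the ifconfig output differences (assuming the iproute2 ip tool is available; the hostname -i fallback is the alternative mentioned above):

# Take the first IPv4 address on eth0 from ip(8), which prints a bare
# address/prefix field regardless of distribution, then strip the prefix.
PUBLIC_IP=$(ip -4 -o addr show dev eth0 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | head -n1)
if [[ -z "${PUBLIC_IP}" ]]; then
  PUBLIC_IP=$(hostname -i | awk '{print $1}')   # fallback, fragile in its own way
fi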

@thockin
Member

thockin commented Jul 28, 2016

wrt CPU allocations on the master, I don't think it was particularly scientific, but this does raise some fun problems. I can't actually guarantee that all combinations of addons will "fit" on the master. Can this run as a "regular" pod in the cluster?

Running with very minimal CPU needs means it might starve. Is that OK? What is the failure mode?

@mikedanese too since this touches on turnup

@matthewdupre
Contributor Author

Thanks for the feedback @thockin :). I'll make the minor adjustments you suggest (other than the Jinja syntax question, which I think I've addressed in reply: shout if you disagree). Now onto the CPU allocations / node choice:

Running calico-policy-controller on a regular node:

  • Running our etcd is problematic: it needs persistent storage, and we'd also have to plumb the IP around (maybe a service)? I don't know of a good solution for the storage.
  • Bootstrap is also tricky: if calico-nodes start before calico-policy-controller, then all pods will have their networking locked down until the latter is running. It ought to retry, but it's still a bit unpleasant.
  • We'd also have to make some adjustments to have calico-policy-controller authenticate with the apiserver: this is pretty straightforward though and should happen some time anyway.

Overall it's quite difficult; it might be possible, but it's by no means a trivial move. It's a central cluster service, so I don't think it's too inappropriate to run on the master in that respect.

As for failure modes, if calico-policy-controller receives no CPU time, the calico-nodes will keep everything as it is until they can get new information from the Calico etcd. That is: existing pods will be allowed to talk to the things they were already allowed to, and new pods won't be able to reach anything.

In practice, since the master components all have requests rather than limits, I'm not even sure if omitting requests for the calico-policy-controller would make any real difference?

@thockin
Member

thockin commented Jul 29, 2016

> Running calico-policy-controller on a regular node:
>
> Running our etcd is problematic: it needs persistent storage, and we'd also have to plumb the IP around (maybe a service)? I don't know of a good solution for the storage.

Service IP should work. For storage we have persistent volumes and even dynamic provisioning, so that should work.

> Bootstrap is also tricky: if calico-nodes start before calico-policy-controller, then all pods will have their networking locked down until the latter is running. It ought to retry, but it's still a bit unpleasant.

We're going to have this problem in general with network infrastructure. We don't have a way to guarantee ordering, so we really need to be sure that pods which need calico IPs will fail and retry until calico is actually up. Meanwhile, pods which use host networking will run, including calico.

> We'd also have to make some adjustments to have calico-policy-controller authenticate with the apiserver: this is pretty straightforward though and should happen some time anyway.

When you run as a pod we provide the secrets you need. All you have to do is present the certificate.

> Overall it's quite difficult; it might be possible, but it's by no means a trivial move. It's a central cluster service, so I don't think it's too inappropriate to run on the master in that respect.

I don't think it's INappropriate, but as you're experiencing, the master is sort of jammed, and the more we put there the less likely it is to actually, you know, WORK. I also don't think this idea is impossibly hard. It might expose some bugs, but I can't see why it shouldn't work eventually. Can we make an agreement to try it and see what doesn't work and at least TRY to make it work? @mikedanese mr. self-hosting.

> As for failure modes, if calico-policy-controller receives no CPU time, the calico-nodes will keep everything as it is until they can get new information from the Calico etcd. That is: existing pods will be allowed to talk to the things they were already allowed to, and new pods won't be able to reach anything.

@davidopp for ideas. What do we do when the master is full? Do we have an answer to this?

> In practice, since the master components all have requests rather than limits, I'm not even sure if omitting requests for the calico-policy-controller would make any real difference?

If you specify a request with no limit, limit == request. That is "guaranteed" QoS.

value: "false"
resources:
requests:
cpu: 40m
Member

Is this the right number?

@mikedanese
Member

mikedanese commented Aug 3, 2016

> Overall it's quite difficult; it might be possible, but it's by no means a trivial move. It's a central cluster service, so I don't think it's too inappropriate to run on the master in that respect.
>
> I don't think it's INappropriate, but as you're experiencing, the master is sort of jammed, and the more we put there the less likely it is to actually, you know, WORK. I also don't think this idea is impossibly hard. It might expose some bugs, but I can't see why it shouldn't work eventually. Can we make an agreement to try it and see what doesn't work and at least TRY to make it work? @mikedanese mr. self-hosting.

The optimal deployment would be a daemonset for the node agent that uses the kubernetes API to store state. It's really the only way to reduce deployment of calico across all kubernetes clusters to a single command, and I think that's what we should strive for. Last time I talked with you guys, you mentioned that this was planned? How far off is that? kube-flannel does this, and it's being merged into flannel proper as an alternative subnet manager. It removes the dependency on etcd or a flannel server.

> Running our etcd is problematic: it needs persistent storage, and we'd also have to plumb the IP around (maybe a service)? I don't know of a good solution for the storage.

Use the apiserver :)

> Bootstrap is also tricky: if calico-nodes start before calico-policy-controller, then all pods will have their networking locked down until the latter is running. It ought to retry, but it's still a bit unpleasant.

This primarily happens during initial deployment, which doesn't sound like too big of an issue.

> We'd also have to make some adjustments to have calico-policy-controller authenticate with the apiserver: this is pretty straightforward though and should happen some time anyway.

You can use a service account. Credentials to the apiserver are injected into all pods by default. If you are already using the Go client (?) then it's a couple of lines of code change to switch to InClusterConfig.

@matthewdupre
Contributor Author

@thockin @mikedanese OK, so I had a go at moving the policy controller to regular nodes. There's one thing I haven't got working yet, and I wanted some input on the right approach.

  • We need to do a kubectl create -f calico-policy-controller.yaml or equivalent, but where? It looks like the established method is to use the addon manager (which would involve some work adding replication controllers, PVs, and PVCs to replace the pet set). Alternatively, if there's another place I can call the create from, that would be easier.

With a delayed kubectl create, this commit does work, but it's fairly apparent that there are still unpleasant things here.

  • All our pods need to be host networked, otherwise they'll never be able to start up. Unfortunately host-networked ports don't play nicely with services, hence the hardcoded 10.0.0.17 IP and the similar CONFIGURE_ETC_HOSTS hack to access the kubernetes API (the pod doesn't get the right DNS).
  • Etcd isn't clustered, and clustering it looks deeply unpleasant. It does restart on failure nicely though.
  • Moving to calico-node using the kubernetes API directly isn't something that can happen quickly (months not weeks).

I don't really mind what approach is taken here. I do care about getting this fixed sooner rather than later - the GCI master change broke NETWORK_POLICY_PROVIDER=calico cluster/kube-up.sh, and I'm trying to get that working again. Moving to a single create -f .yaml install is a thing we'd like to do (and we're pretty close), but for the time being it will still suffer the limitations above.

Which of these approaches (first commit / second commit) should I go with? Using the kubernetes API directly isn't possible any time soon.

@caseydavenport
Member

@mikedanese

> The optimal deployment would be a daemonset for the node agent that uses the kubernetes API to store state. It's really the only way to reduce deployment of calico across all kubernetes clusters to a single command, and I think that's what we should strive for. Last time I talked with you guys, you mentioned that this was planned? How far off is that? kube-flannel does this, and it's being merged into flannel proper as an alternative subnet manager. It removes the dependency on etcd or a flannel server.

Yes! This is something I've been looking into recently. Actually installing the Calico pieces using a DaemonSet is relatively straightforward. I've got a prototype working but it's still a little bit fragile. The problem of course is etcd.

> Use the apiserver :)

We've looked into modifying Calico to use the k8s API as a replacement for etcd, and it may be something we do, but it's going to take a bit of effort to make that happen, so it probably won't be available in the near future. As a pragmatic short-term solution, I think running etcd on top of k8s can get us that "one-click install" behavior.

> This primarily happens during initial deployment, which doesn't sound like too big of an issue.

Yep, I agree. I'm not too worried about transient startup conditions.

> You can use a service account. Credentials to the apiserver are injected into all pods by default. If you are already using the Go client (?) then it's a couple of lines of code change to switch to InClusterConfig.

Yep, the policy controller can already do this.

@caseydavenport
Member

To add to my previous comment:

We want to keep moving down this "one-click install" path, but I don't think this PR is the place to do it all. The GCI master PR broke the NETWORK_POLICY_PROVIDER=calico option, and first we want to make that work for people again.

We'll definitely want to do subsequent changes to make this "self-hosted" as we make that option more stable.

@lxpollitt

+1 to Casey's comments. We should continue with the self-hosting one-click install work as a high priority (the number one priority for us after this PR), but I don't think we should delay getting this fix merged in. If the main immediate concern is the extra load on the master, then we can just make sure the docs are super clear that there is a scale impact to enabling policy, and if someone has an at-scale deployment (e.g. 1000 hosts) then they should either use a larger instance for the master or follow the manual instructions in the Calico repo. I'm imagining we'll have the self-hosting one-click install ready to go in time for 1.4, so this is a short-term impact anyway.

@zmerlynn removed their assignment Aug 8, 2016
@matthewdupre
Contributor Author

(I'm going to make the addon-manager change; I'll post back when that's done.)

@@ -0,0 +1,6 @@
# Maintainers
Member

Please use an OWNERS file like elsewhere in the repo.

Contributor Author

I based this on the dashboard and node-problem-detector folders adjacent to this. I'm happy to change it: just pointing out that the closest equivalents currently look like this.

Which would you prefer?

Member

Maintainers is fine

@lxpollitt

I'm concerned that we are going through feature creep on this PR. It was originally intended as a bug fix: if you pass NETWORK_POLICY_PROVIDER in, then kube-up doesn't work in 1.3. We want to get this bug fix into the next 1.3 patch release. But it seems we've started adding increments towards the future vision of self-hosting into the PR, which is delaying getting the fix into 1.3. We are already working on self-hosting and it is our highest-priority task once this PR gets merged, but I don't think we should delay getting this PR merged on any of that work.

@@ -878,6 +878,9 @@ function start-kube-addons {
   if echo "${ADMISSION_CONTROL:-}" | grep -q "LimitRanger"; then
     setup-addon-manifests "admission-controls" "limit-range"
   fi
+  if [[ "${NETWORK_POLICY_PROVIDER}" == "calico" ]]; then
Member

Please add a :- to the variable expansion like the other checks.
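
For reference, the requested form with the default expansion, so the check doesn't trip set -u when the variable is unset (the setup-addon-manifests arguments shown are illustrative, mirroring the limit-range call above):

if [[ "${NETWORK_POLICY_PROVIDER:-}" == "calico" ]]; then
  setup-addon-manifests "addons" "calico-policy-controller"
fi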

@k8s-bot

k8s-bot commented Aug 11, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

  name: calico-etcd
  namespace: kube-system
spec:
  clusterIP: 10.0.0.17
Member

Sorry for not catching this. We parametrize the same field in the kube-dns service because it needs to fall within the service subnet. This may not work if people are not using the default service subnet. I think we can defer the fix to a follow-up though.
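
One possible follow-up, sketched here only as an illustration: derive the IP from the configured service CIDR and substitute it into the manifest at addon setup time. The variable names, the derivation, the default range, and the file name below are all assumptions, not part of this PR:

# Derive an address inside the service subnet and patch it into the calico-etcd Service manifest.
SERVICE_CLUSTER_IP_RANGE="${SERVICE_CLUSTER_IP_RANGE:-10.0.0.0/16}"
CALICO_ETCD_IP="$(echo "${SERVICE_CLUSTER_IP_RANGE}" | sed -e 's|/.*||' -e 's/0$/17/')"
sed -i -e "s/10\.0\.0\.17/${CALICO_ETCD_IP}/g" calico-etcd-service.yaml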

@mikedanese
Member

Thanks, LGTM here. The last thing I'm unsure of is @caseydavenport's review comment about the resource request. Can you address it?

@matthewdupre
Contributor Author

Thanks @mikedanese - please hold off for a little longer: I'm tracking down an issue with API connectivity.

I spoke with @caseydavenport - we're going to remove the request for the calico-nodes.

@matthewdupre
Contributor Author

OK @mikedanese this is ready to go from my end.

We'll continue to enhance the integration (moving the calico-nodes off static pods is fairly close), but this should be enough to cover the issue. That's an enhancement, so we'll do it in a separate PR.

Let me know if you have any further comments; thanks :).

@caseydavenport
Member

Looking good to me!

@apelisse removed the do-not-merge label Aug 11, 2016
@k8s-bot

k8s-bot commented Aug 12, 2016

GCE e2e build/test passed for commit 568fb74.

@mikedanese added the lgtm and release-note labels and removed the release-note-label-needed label Aug 12, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented Aug 12, 2016

GCE e2e build/test passed for commit 568fb74.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot merged commit 9fe15e7 into kubernetes:master Aug 12, 2016
@matthewdupre
Contributor Author

Thanks @mikedanese :).
