
use init container load kernel modules in kube-proxy #56732

Closed
wants to merge 1 commit into from

Conversation


@Lion-Wei Lion-Wei commented Dec 2, 2017

What this PR does / why we need it:

kube-proxy IPVS mode needs the ip_vs/ip_vs_* kernel modules loaded. Right now we do this in the kube-proxy container, but that raises some safety concerns. There was some discussion in #52003; most people think loading the kernel modules in an init container is a good idea.

This PR uses an init container to load the kernel modules instead of loading them inside the kube-proxy container, so that we can lower the kube-proxy container's permissions.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #54497

Special notes for your reviewer:

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 2, 2017
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 2, 2017
@Lion-Wei
Author

Lion-Wei commented Dec 2, 2017

@k8s-ci-robot
Contributor

@Lion-Wei: GitHub didn't allow me to request PR reviews from the following users: vfreex.

Note that only kubernetes members can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @MrHohn @cmluciano @m1093782566 @vfreex

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. area/ipvs labels Dec 2, 2017
@dims
Member

dims commented Dec 2, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 2, 2017
@bowei
Member

bowei commented Dec 3, 2017

Shouldn't this be loaded only if the IPVS proxier is enabled?

@Lion-Wei
Author

Lion-Wei commented Dec 4, 2017

Thanks for replying, @bowei.
Yes, I use --proxy-mode=ipvs and mode: ipvs to identify kube-proxy IPVS mode in the init container, with these commands:

echo "{{params}}" | grep "\-\-proxy-mode=ipvs" && modprobe ...
cat /var/lib/kube-proxy/config.conf | grep "mode: ipvs" && modprobe ...
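A more defensive version of that detection could use an explicit if so the init container's exit status is deliberate rather than whatever grep happens to return. A sketch only: the config path is the one used in this PR, the helper name is hypothetical, and modprobe is left commented out so the snippet is safe to run anywhere.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the kube-proxy config enables IPVS mode.
ipvs_mode_enabled() {
  grep -q "mode: ipvs" "$1" 2>/dev/null
}

conf="${1:-/var/lib/kube-proxy/config.conf}"
if ipvs_mode_enabled "$conf"; then
  echo "ipvs mode requested, loading modules"
  # modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4
else
  # Exit 0 on purpose so a non-IPVS kube-proxy still starts.
  echo "ipvs mode not enabled, skipping modprobe"
fi
```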


@cmluciano cmluciano left a comment


A few comments.

@@ -175,6 +175,20 @@ spec:
- mountPath: /run/xtables.lock
name: xtables-lock
readOnly: false
initContainers:


Does this spacing align with the containers line?

Author


No...
Thanks for pointing that out.

command:
- /bin/sh
- -c
- cat /var/lib/kube-proxy/config.conf | grep "mode: ipvs" &&


Instead of parsing the file on disk, can we pull in the component configmap ?

Member


Using the configmap for the config would be better.

Author


Thanks for reviewing. I tried to use the configmap. kubeadm creates the kube-proxy configmap with this structure:

apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
...
    mode: ipvs
...

Here config.conf is a single key with a big string value. I can use this key to get the whole config.conf value and check whether it includes mode: ipvs, but I can't get the mode field by itself. Is there another way of using the configmap?
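For what it's worth, the nested field can still be pulled out of that configmap with plain text tools, since config.conf is just a multi-line string value under the data key. A sketch, assuming the configmap has been dumped to a file (e.g. with kubectl -n kube-system get cm kube-proxy -o yaml); the helper name is hypothetical:

```shell
#!/bin/sh
# Print the proxy mode embedded in a dumped kube-proxy configmap.
# Works because the indented config.conf block keeps "mode: <value>" on
# its own line; this is a sketch, not a substitute for a real YAML parser.
extract_mode() {
  sed -n 's/^[[:space:]]*mode:[[:space:]]*//p' "$1" | head -n1
}
```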

- -c
- cat /var/lib/kube-proxy/config.conf | grep "mode: ipvs" &&
modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4 ||
echo "didn't load ipvs related kernel modules"


Can we use an if statement and use an explicit success/failure return code?

Author


I thought this way was simpler. And I worried that if the command cat /var/lib/kube-proxy/config.conf | grep "mode: ipvs" returns non-zero, the init container will exit with an error.


But if we specified that we want IPVS mode, wouldn't we want that to happen? Otherwise we will log the error but continue on to the regular container and fall back to iptables/userspace.

cc @MrHohn @bowei

Author


Yeah, that's the original logic: if IPVS mode can't be used, give a warning and silently fall back to iptables to make sure kube-proxy can still work.
We can discuss which logic is more reasonable.
@m1093782566

Contributor

@m1093782566 m1093782566 Dec 8, 2017


If we fail to load the kernel modules, the IPVS proxier can't work. In order to keep kube-proxy working, we let it fall back to iptables or userspace mode... That's NOT a silent action; kube-proxy will log the reason.

Any problems?

Contributor


To cite a case: if the user specifies iptables mode but the kernel is not compatible or the iptables version is too low, kube-proxy will fall back to userspace mode.

readOnly: true
initContainers:
- name: init-kube-proxy
image: gcr.io/google_containers/busybox
Member


Please use a versioned image. Probably gcr.io/google_containers/busybox:1.27.2?

Author


Thanks, fixed.

@@ -57,6 +57,19 @@ spec:
- mountPath: /run/xtables.lock
name: xtables-lock
readOnly: false
readOnly: true
Member


Duplicate readOnly field?

@@ -693,15 +693,6 @@ func (em proxyEndpointsMap) unmerge(other proxyEndpointsMap) {
// return an error if it fails to get the kernel modules information without error, in which
// case it will also return false.
func CanUseIPVSProxier(ipsetver IPSetVersioner) (bool, error) {
// Try to load IPVS required kernel modules using modprobe
Contributor


Why delete these lines? kube-proxy is not always running inside a container. If kube-proxy runs on the host, it can modprobe the kernel modules itself. Besides, if kube-proxy is running inside a container and fails to load the kernel modules, it just logs warning messages - please double-check it.

Author

@Lion-Wei Lion-Wei Dec 7, 2017


I assumed that if kube-proxy runs inside a container, modprobe will certainly fail.
But maybe we can accept a warning message. I will bring these lines back.

Contributor


Yeah, that's why we just log a warning message instead of exiting with an error.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Lion-Wei
We suggest the following additional approvers: bowei, matchstick, timothysc

Assign the PR to them by writing /assign @bowei @matchstick @timothysc in a comment when ready.

Associated issue: #52003

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@kubernetes kubernetes deleted a comment from k8s-github-robot Dec 7, 2017
@@ -698,7 +698,7 @@ func CanUseIPVSProxier(ipsetver IPSetVersioner) (bool, error) {
err := utilexec.New().Command("modprobe", "--", kmod).Run()
if err != nil {
glog.Warningf("Failed to load kernel module %v with modprobe. "+
"You can ignore this message when kube-proxy is running inside container without mounting /lib/modules", kmod)
"You can ignore this message when kube-proxy is running inside container", kmod)
Contributor


What does the warning message mean here? Does it mean that I do not need to load those kernel modules? But this PR is using an initContainer to load those modules, which seems a bit weird to me - can you help elaborate?

Contributor

@m1093782566 m1093782566 Dec 29, 2017


What does the warning message mean here? Does it mean that I do not need to load those kernel modules?

The ip_vs_* kernel modules are needed. But it isn't necessary for kube-proxy to load them via modprobe. For example, kube-proxy may run inside a container while the ip_vs_* kernel modules are already installed on the host.

But this PR is using an initContainer to load those modules, which seems a bit weird to me - can you help elaborate?

If kube-proxy were always installed by the scripts/manifests this PR covers, we could change the warning to an error, but actually it's NOT.

Author


Thanks for replying. We emit this warning when the modprobe command returns an error. If we use an init container, we don't need to load the kernel modules inside the kube-proxy container, so this message can be ignored. The code that follows checks whether the modules are actually loaded.

Contributor


Thanks @Lion-Wei and @m1093782566, I still have a question: if modprobe returns an error, that should mean some modules are not loaded. I think we should return an error in that case, as IPVS may not work well if some modules are not loaded, even when running in a container?

Contributor

@m1093782566 m1093782566 Dec 29, 2017


@gyliu513

If the modprobe return error, then it should means that some modules are not loaded

NO.

Suppose you run kube-proxy in a container. When you execute the modprobe ip_vs command, you will get an error, since there are no kernel modules in the container by default. However, you may still find some kernel modules in /proc/modules, since some are loaded on the host.

That's why we do 2 steps:

1. modprobe - log a warning if an error occurs

2. "cut", "-f1", "-d", " ", "/proc/modules" - return an error

We should consider both container and non-container cases.
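The two steps above can be sketched in shell (illustrative only - the real check lives in Go inside CanUseIPVSProxier, and the function name here is hypothetical): modprobe failures are tolerated with a warning, and /proc/modules is treated as the source of truth.

```shell
#!/bin/sh
# Step 1: best-effort modprobe; failure only warrants a warning, because
# inside a container modprobe usually cannot work even when the module is
# already loaded on the host.
# Step 2: check /proc/modules, which reflects what the host kernel has loaded.
module_available() {
  mod="$1"
  modprobe -- "$mod" 2>/dev/null \
    || echo "warning: modprobe $mod failed (expected inside a container)" >&2
  cut -f1 -d' ' /proc/modules 2>/dev/null | grep -qx "$mod"
}
```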

Contributor


Thanks @m1093782566.

So if the host has the kernel modules in /proc/modules, then we do not need to load those modules in the container any more, if the container is running in a restricted security mode?

Contributor


Got it. Since the initContainer has already done the work, how about adding more info stating that the initContainer already helped to do the modprobe work?

Contributor

@m1093782566 m1093782566 Feb 4, 2018


No? The kube-proxy process should not be aware of the init container. In other words, a process should not care how it's deployed - with an init container or anything else.

Contributor


So, why ignore this message when kube-proxy is running in a container? And how do we handle the case where there is no init container and kube-proxy does not mount /lib/modules either?

Contributor

@m1093782566 m1093782566 Feb 4, 2018


So, why ignore this message when kube-proxy is running in a container?

Because users can deploy kube-proxy any way they like; they may not use an init container.

How do we handle the case where there is no init container and kube-proxy does not mount /lib/modules either?

If kube-proxy is running in a container and didn't mount /lib/modules from the host, it still has a chance to run in IPVS mode.

- /bin/sh
- -c
- 'echo $(PROXY_CONFIG) | grep "mode: ipvs" && modprobe -a
ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4 || echo "did not load
Contributor


One more comment: do we really need this init container? kube-proxy already modprobes the required modules at https://github.com/kubernetes/kubernetes/blob/release-1.9/pkg/proxy/ipvs/proxier.go#L697-L703 for 1.9.

Author


As we discussed in #52003, folks believed a better solution is to load the kernel modules in an init container, since that has fewer security concerns, but the in-container modprobe was used as an alternative solution first, to unblock the IPVS beta work in the 1.9 rebase.

Contributor


Thanks @Lion-Wei

  1. Can you please clarify the security risk more? And how does an initContainer help with that?
  2. Why is modprobe needed in the initContainer? I see that kube-proxy can load those modules dynamically.

@Lion-Wei
Author

Lion-Wei commented Jan 3, 2018

@MrHohn @luxas - I think it's ready for approval, since this PR needs approval from each of these OWNERS files. : )

@Lion-Wei
Author

Lion-Wei commented Jan 3, 2018

/assign @thockin . : )

@errordeveloper
Member

errordeveloper commented Jan 10, 2018

This adds many lines of configuration and a duplicate shell script; all of this code can diverge too easily, and very few users would notice. If we want to improve security, I'd say we could ask IPVS users to ensure these modules are present on the nodes (which is easy with kube-up.sh), and we could add a pre-flight check in kubeadm too. This doesn't sound like the kind of thing that needs to be added upstream.

@k8s-github-robot

@Lion-Wei PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 17, 2018
@freehan
Contributor

freehan commented Jan 17, 2018

Wouldn't it make more sense if we split the manifest per mode? I would imagine different kube-proxy implementations will diverge. If every mode uses the same manifest, it will become more and more clunky and harder to maintain.

Probably structure it like this?

cluster/addons/kube-proxy/kube-proxy-ds.yaml (default, if no specific manifest)
cluster/addons/kube-proxy/iptables/...
cluster/addons/kube-proxy/ipvs/...
cluster/addons/kube-proxy/windows/...

Another long term goal would be to split the kube-proxy binary and transform the common components into a library or interface.

@Lion-Wei
Author

@errordeveloper Sorry for the delay. But I don't think telling users to load those kernel modules is a good idea. We can do this work in scripts like kube-up.sh, but that can't cover all ways of building a cluster. With kubeadm, users would have to load the kernel modules on each node, which doesn't sound very friendly.

@Lion-Wei
Author

@freehan That definitely makes sense. IPVS and iptables may have lots of different features and options in the future.
Also, I understand that the kube-proxy modes have common components that could be extracted, but I'm not sure about splitting the kube-proxy binary.

@errordeveloper
Member

If we really are looking for the best user experience, I'd look forward to any ideas where we introduce an API change/extension that allows pods to somehow specify the modules they depend on.
If that sounds like a big feature, perhaps we could at least consider using a dedicated Go program (with a container image of its own) that can load a list of modules, instead of this ad-hoc shell script and busybox image.

@Lion-Wei
Author

Lion-Wei commented Jan 24, 2018

@errordeveloper Yeah, I understand your concern. Indeed, using busybox and a shell script to load kernel modules is very primitive and unsightly; I'll try to make it more graceful. Thanks for your recommendation. : )

@errordeveloper
Member

errordeveloper commented Feb 1, 2018 via email

@rramkumar1
Contributor

rramkumar1 commented Feb 2, 2018

@Lion-Wei This needs to be resolved before ipvs goes GA. I think the general sentiment is that putting the modprobe logic in the init container is not ideal. I think the approach that will require the least work from the user's perspective is to bake the logic of loading the relevant modules into the startup scripts. This looks like it would involve plumbing the KUBEPROXY_MODE environment variable around everywhere.

I can work with you on this to make sure that all cases are covered.

@m1093782566
Contributor

@rramkumar1 Cool!!

@m1093782566
Contributor

m1093782566 commented Feb 3, 2018

Another long term goal would be to split the kube-proxy binary and transform the common components into a library or interface.

I doubt it's doable, because ipvs falls back to iptables, and iptables falls back to userspace.

Wouldn't it make more sense if we split the manifest per mode? I would imagine different kube-proxy implementations will diverge. If every mode uses the same manifest, it will become more and more clunky and harder to maintain.

Interesting, but I also worry about the proxy-mode fallback path.

@m1093782566
Contributor

@rramkumar1

Do you mean we don't need to load the kernel modules in an init container, and instead should load them in deployment tools/scripts? @Lion-Wei has already done this for local-up.sh. Should we continue doing it for kubeadm, kubernetes-anywhere, the GCE scripts...?

@m1093782566
Contributor

m1093782566 commented Feb 4, 2018

Please check context around #52003 for more information, @gyliu513

Following are some thoughts off the top of my head:

  • Currently we mount /lib/modules into the kube-proxy container. Is that going to cause problems if we try to reduce the capabilities of kube-proxy down to CAP_NET_ADMIN? I am not quite sure - there is an interesting article, xref: https://lwn.net/Articles/430462/. That's why we are trying to load the kernel modules from an init container.

  • Apart from an init container, we can also load the kernel modules in start-up scripts. Since IPVS is not the default mode for kube-proxy, we could test an environment variable and load the ip_vs_* kernel modules in the start-up scripts for now. When IPVS graduates to the default mode, we can ALWAYS load the ip_vs_* kernel modules in the kube-proxy start-up scripts.
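That second option could look roughly like this in a start-up script. KUBEPROXY_MODE is the variable name mentioned earlier in this thread; the function name is hypothetical, and modprobe is commented out so the fragment is safe to run anywhere.

```shell
#!/bin/sh
# Sketch of a start-up script fragment: only touch ipvs modules when the
# operator opted in via KUBEPROXY_MODE (defaults to iptables).
maybe_load_ipvs_modules() {
  if [ "${KUBEPROXY_MODE:-iptables}" = "ipvs" ]; then
    echo "loading ipvs kernel modules"
    # modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4
  else
    echo "proxy mode is ${KUBEPROXY_MODE:-iptables}, nothing to load"
  fi
}
maybe_load_ipvs_modules
```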

@rramkumar1
Contributor

@m1093782566 Yes, startup scripts. I think it would be great if you guys could open an issue to track this with a checkbox for each piece you need to cover (kubeadm, gce, kubernetes-anywhere). That way, we can easily track which pieces are complete.

@m1093782566
Contributor

@Lion-Wei

What's your opinion?

@Lion-Wei
Author

Lion-Wei commented Feb 6, 2018

To summarize what we discussed before, I think there are three concerns about this PR:

  • We need a better way to do this work, like a dedicated Go program with its own image. @errordeveloper

consider using a dedicated Go program (with a container image of its own) that can load a list of modules, instead of this ad-hoc shell script and busybox image.

  • We should load the kernel modules in the cluster setup phase, not in a kube-proxy init container. @rramkumar1

I think the approach that will require the least work from the user's perspective is to bake the logic of loading the relevant modules into the startup scripts

  • We should split the manifest per mode. @freehan

I would imagine different kube-proxy implementations will diverge. If every mode uses the same manifest, it will become more and more clunky and harder to maintain.

I understand that using busybox and a shell script to load kernel modules is very primitive and unsightly, but I still think loading the kernel modules in an init container is a better idea than baking that logic into the startup scripts.

But if we decide that's what we should do, I can accept it.

@cmluciano @MrHohn @bowei

@m1093782566
Contributor

m1093782566 commented Feb 6, 2018

@Lion-Wei

but I still think loading the kernel modules in an init container is a better idea than baking that logic into the startup scripts.

Don't forget that we are NOT always running kube-proxy in a container; an init container won't work in that case.

@rramkumar1
Contributor

@Lion-Wei can we close this?

@Lion-Wei
Author

Lion-Wei commented Feb 7, 2018

@rramkumar1 - I'd like to leave this open until our work is finished, for reference or the like. : )

@Lion-Wei
Author

Lion-Wei commented Mar 5, 2018

/close

@Lion-Wei
Author

Lion-Wei commented Mar 5, 2018

I think this can be closed.

Labels
area/ipvs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Load kernel modules using init container in kube-proxy pod