Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the ethtool kubelet dependency; use ip link instead of ethtool to determine pair interface. #30356

Closed
wants to merge 1 commit into from

Conversation

luxas
Copy link
Member

@luxas luxas commented Aug 10, 2016

@kubernetes/sig-network does this seem worthwhile?
I noticed that the ip link inside a container is named eth0@if{host_pair_interface}. This PR removes the dependency on ethtool that does that lookup as well.

Can you reproduce and verify?
ref #26093

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether f0:76:1c:62:f1:36 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether d0:53:49:0a:a9:8f brd ff:ff:ff:ff:ff:ff
209: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 500
    link/none 
210: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:5a:c2:48:48 brd ff:ff:ff:ff:ff:ff
212: vethb860bf4@if211: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether f6:e0:1a:a4:a6:ec brd ff:ff:ff:ff:ff:ff link-netnsid 1
214: veth345cf11@if213: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue master docker0 state UP mode DEFAULT group default 
    link/ether c6:55:3a:33:89:71 brd ff:ff:ff:ff:ff:ff link-netnsid 0
$ sudo nsenter -t 22525 -n -- ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
211: eth0@if212: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1472 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 02:42:0a:01:56:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0

This change is Reviewable

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-label-needed labels Aug 10, 2016
@yujuhong yujuhong assigned matchstick and unassigned yujuhong Aug 10, 2016
@yujuhong
Copy link
Contributor

Assigned to @matchstick to review or delegate.

@@ -37,7 +37,7 @@ const (
)

var (
ethtoolOutputRegex = regexp.MustCompile("peer_ifindex: (\\d+)")
ipLinkOutputRegex = regexp.MustCompile("eth0@if(\\d+)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where does this come from? "eth0" is not a given...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now eth0 is hardcoded in an other file with a todo to make it configurable. I may fetch the value from there instead.

Do you still think it may be something other than eth0 (on kubenet for example, which I haven't tested much)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubenet is not depending on the kubelet hairpin logic now. The CNI bridge plugin does the job.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete any of the old (pre-kubenet) code now before 1.4?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thockin I think we should keep it in v1.4, but after the release we might remove it if it's unneeded.

@freehan freehan assigned freehan and unassigned matchstick Aug 12, 2016
@freehan
Copy link
Contributor

freehan commented Aug 12, 2016

Since kubenet is on by default now. I do not think the auto e2e test will cover the code changes here. I would recommend setting the NETWORK_PROVIDER=none for complete e2e test.

@luxas luxas added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Aug 13, 2016
@Random-Liu
Copy link
Member

@k8s-bot test this issue #29932.

@freehan
Copy link
Contributor

freehan commented Aug 16, 2016

Okay. LGTM.

Please revert the test commit and self apply LGTM label.

@luxas luxas added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2016
@luxas
Copy link
Member Author

luxas commented Aug 16, 2016

Thanks for the review!
Self-applying now

@luxas
Copy link
Member Author

luxas commented Aug 16, 2016

@k8s-bot smoke test this please github issue: #IGNORE

@luxas
Copy link
Member Author

luxas commented Aug 16, 2016

@k8s-bot test this please github issue: #IGNORE

@k8s-bot
Copy link

k8s-bot commented Aug 16, 2016

GCE e2e build/test passed for commit ae97cda.

@freehan
Copy link
Contributor

freehan commented Aug 17, 2016

hmmm. I just came across a counter case.

$ nsenter -t 6066 -n -F -- ip link show eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP mode DEFAULT 
    link/ether 26:03:86:7f:02:8e brd ff:ff:ff:ff:ff:ff

It looks like we still cannot get rid of ethtool completely. What if we fall back to use ethtool if pattern not matched?

@freehan freehan removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2016
@luxas
Copy link
Member Author

luxas commented Aug 18, 2016

Ah, how bad! How did you produce that behaviour?
I'll update the PR to fallback on ethtool

@thockin
Copy link
Member

thockin commented Aug 18, 2016

Can we just do the things that ethtool is doing under the covers? I know
one of the original ethtool authors and he says it is pretty
straightforward..

On Thu, Aug 18, 2016 at 2:12 PM, Lucas Käldström notifications@github.com
wrote:

Ah, how bad! How did you produce that behaviour?
I'll update the PR to fallback on ethtool


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#30356 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVD7mREQ_G-teQhEekNM6Mt9ep9WDks5qhMqngaJpZM4JhCYs
.

@thockin
Copy link
Member

thockin commented Aug 18, 2016

https://raw.githubusercontent.com/safchain/ethtool/master/ethtool.go

I just ran this, it works.

On Thu, Aug 18, 2016 at 4:16 PM, Tim Hockin thockin@google.com wrote:

Can we just do the things that ethtool is doing under the covers? I know
one of the original ethtool authors and he says it is pretty
straightforward..

On Thu, Aug 18, 2016 at 2:12 PM, Lucas Käldström <notifications@github.com

wrote:

Ah, how bad! How did you produce that behaviour?
I'll update the PR to fallback on ethtool


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#30356 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFVgVD7mREQ_G-teQhEekNM6Mt9ep9WDks5qhMqngaJpZM4JhCYs
.

@luxas
Copy link
Member Author

luxas commented Aug 19, 2016

@thockin That's a good idea, but we can't.
We're using nsenter to exec into the net namespace, and we can't do that with go, so the go ethtool code would always be running in the host netns :(

Guess we have to fallback on ethtool

@k8s-bot
Copy link

k8s-bot commented Aug 30, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

@luxas
Copy link
Member Author

luxas commented Oct 16, 2016

@freehan How did you procude the above condition?
Was it with the old networking mode (none)?

If so, we can rely on ip link after #34906 is merged then...

@luxas
Copy link
Member Author

luxas commented Nov 11, 2016

Hah @thockin this was the one I was talking about :)
This should be safe to do now when we've got rid of the legacy networking mode.
PTAL

@luxas
Copy link
Member Author

luxas commented Nov 16, 2016

We actually might want to consider this for v1.5 as a bug fix given kubernetes/kops#879

@thockin
Copy link
Member

thockin commented Nov 16, 2016

Do we really understand the escape that @freehan showed? What makes us think we really understand it? Are we sure all versions of kernels and ip link work properly?

Is this veth867530@if9 syntax specified or at least DOCUMENTED somewhere? a link to kernel code or ip tool code might convince me it was reliable.

My fear is we are replacing something known to work (but a dependency) with something new and undocumented. It looks like a side-effect, not an API. I want to know it is reliable.

@bboreham
Copy link
Contributor

I agree that relying on the output format of ip link is unwise.

We're using nsenter to exec into the net namespace, and we can't do that with go

A trick used in various places (e.g. Docker) is to re-exec the current executable with arguments to do just the one task we need done in a different namespace.

@jellonek
Copy link
Contributor

Can not we use there directly vishvananda/netlink instead calling external command? We have it already in our sources...
https://github.com/kubernetes/kubernetes/tree/master/vendor/github.com/vishvananda/netlink

It should be really easy to replicate what @thockin
@luxas PTAL on https://github.com/containernetworking/cni/tree/master/pkg/ns - so we can do something in particular ns while being still in go.

@bboreham
Copy link
Contributor

bboreham commented Nov 17, 2016

@jellonek it is necessary to look in the pod's network namespace, so vishvananda/netlink is insufficient.

If you add vishvananda/ns there is enough code to do the task, but you still have to exec a separate process because of vishvananda/netns#17

Edit: One more thing: the CNI page qualifies its recommendation with "All code dependent on a particular network namespace should be wrapped in the ns.Do() method". Although it sounds simple, it implies a vast amount of work to wrap every network call in Kubernetes and all of its dependent libraries to ensure it runs in the root namespace; it is simpler to exec a separate process when you specifically need to be in a non-root namespace.

jellonek added a commit to jellonek/kubernetes that referenced this pull request Nov 28, 2016
@jellonek
Copy link
Contributor

@bboreham: PTAL on linked PR in which I shown how this could be done - IMO it's a bit simpler than using that two external binaries (nsenter, ethtool).

@thockin btw. doing my PR i also found what you were asking for there

@luxas: WDYT about my implementation?

@bboreham
Copy link
Contributor

@jellonek please read up the issues I have linked. Your approach (which I also once thought was safe) has caused many weird failures in Docker, Weave Net and other places.

It is not possible with the current Go runtime to have one goroutine change namespace without sometimes affecting other goroutines. You must use a separate process to change namespace.

@jellonek
Copy link
Contributor

@bboreham ack. will definitely read both of mentioned issues.

@jellonek
Copy link
Contributor

Ok. So with current version of golang runtime we can not use such approach. I'll close my PR.

@k8s-github-robot
Copy link

This PR hasn't been active in 30 days. It will be closed in 59 days (Feb 26, 2017).

cc @luxas @thockin

You can add 'keep-open' label to prevent this from happening, or add a comment to keep it open another 90 days

@luxas
Copy link
Member Author

luxas commented Jan 8, 2017

Closing this as I don't have time to look into it. Can maybe be revisited in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.