Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kube-proxy's --healthz-bind-address should support IPv4 and IPv6 simultaneously (dual stack) #125055

Open
sanmai-NL opened this issue May 22, 2024 · 25 comments
Assignees
Labels
area/kube-proxy kind/feature Categorizes issue or PR as related to a new feature. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@sanmai-NL
Copy link
Contributor

sanmai-NL commented May 22, 2024

What would you like to be added?

--healthz-bind-address ipport     Default: 0.0.0.0:10256
--
  | The IP address and port for the health check server to serve on, defaulting to "0.0.0.0:10256" (if --bind-address is unset or IPv4), or "[::]:10256" (if --bind-address is IPv6). Set empty to disable. This parameter is ignored if a config file is specified by --config.

The documentation suggests that the bind address is singular and that it's either of the IPv4 or IPv6 family. I think that's doubtful, 0.0.0.0 and :: also bind to both families in kube-apiserver. Fix the docs (but see *) or fix the implementation.

Why is this needed?

This artificial limitation makes it unpredictable where the healthz socket listens when running a dual-stack cluster.
*: It is particularly problematic and not fixable as a docs fix when the user wants to listen on loopback addresses for both families, since there's no notation for that, unlike 'all addresses' (0.0.0.0 and ::).

@sanmai-NL sanmai-NL added the kind/feature Categorizes issue or PR as related to a new feature. label May 22, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 22, 2024
@neolit123
Copy link
Member

/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 22, 2024
@cyclinder
Copy link
Contributor

I think that's doubtful, 0.0.0.0 and :: also bind to both families in kube-apiserver

How does kube-apiserver work for headlthz-bind-address? I think if it defaults to "0.0.0.0:10256", we can access to the service with ipv4 or ipv6 address.

when a server is told to listen on 0.0.0.0 that means "listen on every available network interface".

@uablrek
Copy link
Contributor

uablrek commented May 23, 2024

Just FYI, I tested to "scope" ipv6 localhost with [::%lo]:10256. It is accepted by go (but not by kube-proxy), but alas, the scope is simply ignored for non-link-local addresses 😞

@aroradaman
Copy link
Member

@sanmai-NL we are planning to add v1alpha2 kube-proxy configuration which will accept a dual-stack CIDR pair of healthz and metrics addresses.

(we are moving towards CIDR to allow users to provide the network instead of just address, healthz and metrics server will be exposed on nodesIPs which belong to that CIDR range)
(ref: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/784-kube-proxy-component-config#following-fields-will-be-changed)

@uablrek
Copy link
Contributor

uablrek commented May 23, 2024

The wanted feature here seems to be to accept connects on lo only. A good way is to bind the listen socket to the device using the SO_BINDTODEVICE sockopt. Now, I don't propose this as a solution in this case, but the function may come in handy in some other cases, so I figured out how it's done in go.

tcp-server.go
// Tcp server that binds to "lo"
package main

import (
	"context"
	"log"
	"net"
	"os"
	"syscall"
)

func main() {
	if len(os.Args) < 2 {
		log.Fatal("No address")
	}
	lconfig := net.ListenConfig{
		Control: listenControl,
	}
	listener, err := lconfig.Listen(context.TODO(), "tcp", os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer listener.Close()
	conn, err := listener.Accept()
	if err != nil {
		log.Fatal(err)
	}
	conn.Write([]byte("Bye...\n"))
	defer conn.Close()
}
func listenControl(network, address string, c syscall.RawConn) error {
	return c.Control(setIf)
}
func setIf(fd uintptr) {
	if err := syscall.BindToDevice(int(fd), "lo"); err != nil {
		log.Fatal(err)
	}
}
./tcp-server :7000
nc -v 127.4.5.6 7000  # works
nc -v ::1 7000        # works
nc -v 172.17.0.1 7000 # Connection refused

As you can see, any address on lo works, but not local address on another interface (172.17.0.1 is on docker0)

@uablrek
Copy link
Contributor

uablrek commented May 23, 2024

Um, come to think of it, a device may be better than two addresses here. Unless the addresses are "any" or loopback, they must be different on all nodes. While this can be done, a common configmap makes it hard. An option might be something like:

healthzBindAddress: "[::]:10256"
healthzBindInterface: "lo"

@aroradaman
Copy link
Member

aroradaman commented May 23, 2024

Um, come to think of it, a device may be better than two addresses here. Unless the addresses are "any" or loopback, they must be different on all nodes. While this can be done, a common configmap makes it hard. An option might be something like:

healthzBindAddress: "[::]:10256"
healthzBindInterface: "lo"

kubernetes/enhancements#4337 (comment)

@uablrek
Copy link
Contributor

uablrek commented May 23, 2024

Thanks for the pointer. Actually I think an interface may be better than a CIDR. Interfaces are more stable. It may be easier to guess the interface than the CIDR some infra-structure provider will use for nodes.

@sanmai-NL
Copy link
Contributor Author

Thanks for the pointer. Actually I think an interface may be better than a CIDR. Interfaces are more stable. It may be easier to guess the interface than the CIDR some infra-structure provider will use for nodes.

But people can assign non-loopback addresses to a loopback interface and this would break.

@sanmai-NL
Copy link
Contributor Author

A network interface can be named variously depending on node OS config or hot-swapping NICs. They are less stable than, e.g., a pair of loopback CIDR blocks ::1/128,127.0.0.1/8.

@uablrek
Copy link
Contributor

uablrek commented May 23, 2024

And CIDRs can change because of DHCP lease, or a new /64 segment for SLAAC. But, I agree with Dan that CIDRs are a more "more-kube-proxy-like approach".

@aroradaman
Copy link
Member

And CIDRs can change because of DHCP lease

IPs will change because of the DHCP lease not CIDR, right?
CIDR will be configured with the DHCP server itself, no?

@sanmai-NL
Copy link
Contributor Author

sanmai-NL commented May 23, 2024

@uablrek For the use case I mentioned, CIDR subnets are stable. I think working with devices and supporting zone indexes in IPv6 addresses is interesting and potentially useful, but the requested functionality shouldn't be designed around that.

@k8s-ci-robot
Copy link
Contributor

@shaneutt: The label(s) triage/accept cannot be applied, because the repository doesn't have them.

In response to this:

/assign @shaneutt

In the meeting today we discussed this and it sounds like we're in favor of moving forward:

/triage accept

@aroradaman it appears that you already have active progress on this one, is that correct? May we assign you?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@shaneutt
Copy link
Member

/assign @danwinship

@danwinship
Copy link
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 23, 2024
@thockin
Copy link
Member

thockin commented May 23, 2024

I feel like we have danced around this area a bunch of times. Can we clarify what the real requirment is and implement it one time that we can use everywhere?

E.g. write a k/utils library that offers "multi-listen" and takes a list of IPs and/or CIDRs, finds all matching local IPs, opens sockets on all of them and presents a facade of a single socket.

@danwinship
Copy link
Contributor

The documentation suggests that the bind address is singular and that it's either of the IPv4 or IPv6 family. I think that's doubtful, 0.0.0.0 and :: also bind to both families in kube-apiserver.

The documentation is correct, just misleading. If you leave the value unset, then kube-proxy behaves as though you had said either --healthz-bind-address 0.0.0.0:10256 or --healthz-bind-address [::]:10256, depending on whether the cluster is ipv4-primary or ipv6-primary. On Linux, this is sort of pointless because they both have the same effect, but on Windows it actually only binds to one family.

@danwinship
Copy link
Contributor

I'm not sure anyone really cares a lot about listening on both 127.0.0.1 and ::. You can always listen on just 127.0.0.1, even in a single-stack IPv6 cluster, because you don't need any external IPv4 routing to use 127.0.0.1.

In discussion on the v1alpha2 stuff, we decided the use case for overriding this is "I only want to serve health/metrics on the IP that faces the load balancer / prometheus / whatever". To match the way --nodeport-addresses works, we decided to let the user specify a CIDR, and have kube-proxy pick a local IP from in that CIDR. That way you can specify a single config for all your nodes.

@thockin
Copy link
Member

thockin commented May 23, 2024

we decided to let the user specify a CIDR, and have kube-proxy pick a local IP from in that CIDR

That pattern makes sense to me, but is a single family sufficient? IOW, does --healthz-bind-address=10.9.8.0/24,2601:8675::3099/64 need to work?

@aroradaman
Copy link
Member

That pattern makes sense to me, but is a single family sufficient? IOW, does --healthz-bind-address=10.9.8.0/24,2601:8675::3099/64 need to work?

All of the following are valid combinations

--health-bind-address=10.9.8.0/24,2601:8675::3099/64
--health-bind-address=10.9.8.0/24
--health-bind-address=2601:8675::3099/64

but IIRC the plan is to allow users to configure it via v1alpha2 config, not via command line flags.

@thockin
Copy link
Member

thockin commented May 23, 2024

Do we have a socketmux Go package already, or is thst something we'll invent?

@aroradaman
Copy link
Member

Do we have a socketmux Go package already, or is thst something we'll invent?

We can do that, we will need to listen on multiple IPs (technically 0.0.0.0/0 is a valid CIDR). That might be handy.

Also would love to have your feedback on #121830 (comment)

@danwinship
Copy link
Contributor

/area kube-proxy
see #128303

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kube-proxy kind/feature Categorizes issue or PR as related to a new feature. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

9 participants