kube-proxy fails to start for IPv6 when underlying infra is dual stack in 1.28 #120999

Closed
@CecileRobertMichon

Description

What happened?

A new kube-proxy validation introduced in k8s 1.28 seems to break Azure + IPv6. IPv6 clusters on Azure run on dual-stack hosts. The IPv6 node IP does not seem to get assigned to the Node until after kube-proxy starts (someone might need to help me understand which component is responsible). kube-proxy therefore detects an IPv4 node IP, treats the cluster as IPv4-primary, and now fails to start because the cluster CIDRs are IPv6-only.

kube-proxy is in CrashLoopBackOff; logs show:

I0825 16:18:19.872382       1 server_others.go:69] "Using iptables proxy"
I0825 16:18:19.892673       1 node.go:141] Successfully retrieved node IP: 10.1.0.6
I0825 16:18:19.894163       1 conntrack.go:52] "Setting nf_conntrack_max" nfConntrackMax=131072
I0825 16:18:19.912200       1 server.go:632] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
E0825 16:18:19.912228       1 server.go:537] "Error running ProxyServer" err="kube-proxy configuration is incorrect: cluster is IPv4-primary but clusterCIDRs contains only IPv6 addresses"
E0825 16:18:19.912240       1 run.go:74] "command failed" err="kube-proxy configuration is incorrect: cluster is IPv4-primary but clusterCIDRs contains only IPv6 addresses"

Full logs: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/4086/pull-cluster-api-provider-azure-conformance-ipv6-with-ci-artifacts/1709361655076360192/artifacts/clusters/capz-conf-klo2dh/
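To make the failure mode in the logs above easier to reason about, here is a minimal Go sketch of the kind of family check that would produce this error. It is not the kube-proxy implementation: `primaryFamilyMismatch` is a hypothetical helper, and the `2001:db8::/64` CIDR is a documentation placeholder rather than the value used in the failing cluster. Only the node IP `10.1.0.6` comes from the logs.

```go
package main

import (
	"fmt"
	"net/netip"
)

// primaryFamilyMismatch is a hypothetical helper (not kube-proxy code).
// It reports true when none of the cluster CIDRs belong to the same
// IP family as the detected node IP.
func primaryFamilyMismatch(nodeIP string, clusterCIDRs []string) (bool, error) {
	ip, err := netip.ParseAddr(nodeIP)
	if err != nil {
		return false, fmt.Errorf("parsing node IP %q: %w", nodeIP, err)
	}
	if len(clusterCIDRs) == 0 {
		// Nothing configured, so there is nothing to contradict.
		return false, nil
	}
	nodeIsV4 := ip.Unmap().Is4()
	for _, c := range clusterCIDRs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return false, fmt.Errorf("parsing cluster CIDR %q: %w", c, err)
		}
		if p.Addr().Unmap().Is4() == nodeIsV4 {
			// At least one CIDR matches the node IP's family.
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Node IP taken from the logs; the CIDR is a placeholder (assumption).
	mismatch, err := primaryFamilyMismatch("10.1.0.6", []string{"2001:db8::/64"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("family mismatch:", mismatch)
}
```

With an IPv4 node IP and only IPv6 cluster CIDRs, the helper reports a mismatch, which corresponds to the "IPv4-primary but clusterCIDRs contains only IPv6 addresses" condition in the error message.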

What did you expect to happen?

Kube-proxy should start. This does not repro with prior versions of k8s (1.27 and below).

How can we reproduce it (as minimally and precisely as possible)?

  • Build a Kubernetes v1.28 cluster on Azure with kubeadm (e.g. via CAPZ), where the underlying subnet has both IPv4 and IPv6 addresses, and configure the cluster CIDRs as single-stack IPv6 (detailed instructions at https://capz.sigs.k8s.io/topics/ipv6)
  • Observe the kube-proxy pod is crashing

This repros consistently.
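When reproducing, it can help to check which IP families the Node actually reports at the moment kube-proxy starts. The following Go sketch is only a diagnostic aid under stated assumptions: `familiesPresent` is a hypothetical helper, and the address list passed to it is assumed to be copied by hand from the Node's status. It does not query the API server.

```go
package main

import (
	"fmt"
	"net/netip"
)

// familiesPresent is a hypothetical helper that reports which IP families
// appear in a list of address strings, e.g. addresses copied from a Node.
func familiesPresent(addrs []string) (hasV4, hasV6 bool, err error) {
	for _, a := range addrs {
		ip, perr := netip.ParseAddr(a)
		if perr != nil {
			return false, false, fmt.Errorf("parsing %q: %w", a, perr)
		}
		if ip.Unmap().Is4() {
			hasV4 = true
		} else {
			hasV6 = true
		}
	}
	return hasV4, hasV6, nil
}

func main() {
	// Only the IPv4 address from the kube-proxy logs is known here.
	v4, v6, err := familiesPresent([]string{"10.1.0.6"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("IPv4=%t IPv6=%t\n", v4, v6)
}
```

If the Node lists only an IPv4 address when kube-proxy starts, that would be consistent with the hypothesis that the IPv6 node IP is assigned later.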

Anything else we need to know?

This new validation was introduced in #119003

Kubernetes version

v1.28.0 and above

Cloud provider

Azure

OS version

Linux version 6.2.0-1011-azure

Install tools

CAPZ

Container runtime (CRI) and version (if applicable)

revision=b69f1ad231b6d87eeb30504398075a92d615e83e version=v1.6.23

Related plugins (CNI, CSI, ...) and versions (if applicable)

Calico CNI (Calico fails to start due to crashing kube-proxy)

Labels

kind/bug, needs-triage, sig/network
