Setting loadBalancer.acceleration=native causes cilium status to report "unexpected end of JSON input" #35873
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
equal or higher than v1.16.0 and lower than v1.17.0
What happened?
I am testing a three-node cluster. After installing Cilium on a fresh Talos cluster, cilium status reports errors when these additional configuration options are supplied:
--set loadBalancer.acceleration=native \
--set loadBalancer.mode=snat \
The errors reported by cilium status include:
cilium cilium-6k8wl unable to retrieve cilium endpoint information: unable to unmarshal response: unexpected end of JSON input
I have also tried specifying only loadBalancer.acceleration=native, without setting loadBalancer.mode.
The full output of cilium status is shown below (a sketch for querying the affected agent directly follows the output):
cilium status
/¯¯\
/¯¯\__/¯¯\ Cilium: 4 errors
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Envoy DaemonSet: OK
\__/¯¯\__/ Hubble Relay: 1 errors
\__/ ClusterMesh: disabled
DaemonSet cilium Desired: 3, Ready: 2/3, Available: 2/3, Unavailable: 1/3
DaemonSet cilium-envoy Desired: 3, Ready: 3/3, Available: 3/3
Deployment cilium-operator Desired: 2, Ready: 2/2, Available: 2/2
Deployment hubble-relay Desired: 1, Unavailable: 1/1
Deployment hubble-ui Desired: 1, Unavailable: 1/1
Containers: cilium Running: 3
cilium-envoy Running: 3
cilium-operator Running: 2
hubble-relay Running: 1
hubble-ui Pending: 1
Cluster Pods: 4/4 managed by Cilium
Helm chart version:
Image versions cilium quay.io/cilium/cilium:v1.16.3@sha256:62d2a09bbef840a46099ac4c69421c90f84f28d018d479749049011329aa7f28: 3
cilium-envoy quay.io/cilium/cilium-envoy:v1.29.9-1728346947-0d05e48bfbb8c4737ec40d5781d970a550ed2bbd@sha256:42614a44e508f70d03a04470df5f61e3cffd22462471a0be0544cf116f2c50ba: 3
cilium-operator quay.io/cilium/operator-generic:v1.16.3@sha256:6e2925ef47a1c76e183c48f95d4ce0d34a1e5e848252f910476c3e11ce1ec94b: 2
hubble-relay quay.io/cilium/hubble-relay:v1.16.3@sha256:feb60efd767e0e7863a94689f4a8db56a0acc7c1d2b307dee66422e3dc25a089: 1
hubble-ui quay.io/cilium/hubble-ui-backend:v0.13.1@sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b: 1
hubble-ui quay.io/cilium/hubble-ui:v0.13.1@sha256:e2e9313eb7caf64b0061d9da0efbdad59c6c461f6ca1752768942bfeda0796c6: 1
Errors: cilium cilium 1 pods of DaemonSet cilium are not ready
cilium cilium-6k8wl unable to retrieve cilium status: unable to unmarshal response of cilium status: unexpected end of JSON input
cilium cilium-6k8wl unable to retrieve cilium endpoint information: unable to unmarshal response: unexpected end of JSON input
cilium cilium-fhrp9 unable to retrieve cilium status: unable to unmarshal response of cilium status: unexpected end of JSON input
hubble-relay hubble-relay 1 pods of Deployment hubble-relay are not ready
hubble-ui hubble-ui 1 pods of Deployment hubble-ui are not ready
Warnings: hubble-ui hubble-ui-77555d5dcf-v9ns9 pod is pending
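For reference, the unmarshal errors can be cross-checked by querying the agent on an affected pod directly, bypassing the cilium-cli aggregation. This is only a diagnostic sketch: the pod name is taken from the output above, and the cilium-agent container name and cilium-dbg binary are assumptions based on the v1.16 image layout.
# Ask the agent for its own status instead of going through cilium-cli
kubectl -n kube-system exec cilium-6k8wl -c cilium-agent -- cilium-dbg status --verbose
# Check whether the agent container has been restarting, and capture logs from the previous run if so
kubectl -n kube-system describe pod cilium-6k8wl
kubectl -n kube-system logs cilium-6k8wl -c cilium-agent --previous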
How can we reproduce the issue?
- Install Cilium with kubectl apply -f cilium.yaml after generating cilium.yaml from the Working configuration below combined with the Options that cause the issue (a short sketch of the apply-and-verify steps follows the options list). I am testing on a fresh Talos installation running on Proxmox with the virtio network driver. The cluster has three control-plane nodes with the allowSchedulingOnControlPlanes option set. I am using the latest Talos build with a customised kernel upgraded to 6.10.6 and netkit enabled.
Working configuration:
helm template cilium cilium/cilium \
--version 1.16.3 \
--namespace kube-system \
--set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set cgroup.autoMount.enabled=false \
--set cgroup.hostRoot=/sys/fs/cgroup \
--set kubeProxyReplacement=true \
--set k8sServiceHost=127.0.0.1 \
--set k8sServicePort=7445 \
--set devices="eth0 eth1" \
--set routingMode=native \
--set autoDirectNodeRoutes=true \
--set ipam.mode=cluster-pool \
--set ipam.operator.clusterPoolIPv4PodCIDRList="10.244.0.0/16" \
--set ipv4.enabled=true \
--set ipv4NativeRoutingCIDR="10.244.0.0/16" \
--set enableIPv4Masquerade=true \
--set bpf.masquerade=true \
--set bpf.hostLegacyRouting=false \
--set bpf.datapathMode=netkit \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
> cilium.yaml
Options that cause the issue:
--set loadBalancer.acceleration=native \
--set loadBalancer.mode=snat \
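For completeness, a sketch of the apply-and-verify steps. These are standard kubectl and cilium-cli invocations rather than anything specific to my setup; cilium.yaml is the manifest rendered by the helm template command above with the two extra --set flags appended.
# Apply the manifest rendered above (Working configuration plus the two flags)
kubectl apply -f cilium.yaml
# Wait for the agent DaemonSet to settle, then re-check status
kubectl -n kube-system rollout status daemonset/cilium --timeout=5m
cilium status --wait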
Example Talos Linux node patch file (a sketch of how the patch is applied with talosctl follows the file):
# c371.patch
machine:
sysctls:
net.ipv4.ip_forward: "1"
network:
hostname: c371
nameservers:
- 192.168.2.62
- 192.168.2.61
interfaces:
- deviceSelector:
driver: virtio_net
hardwareAddr: "bc:24:11:b6:15:6d"
physical: true
addresses:
- 192.168.2.171/24
routes:
- network: 0.0.0.0/0
gateway: 192.168.2.60
mtu: 3506
- deviceSelector:
driver: virtio_net
hardwareAddr: "bc:24:11:04:84:c2"
physical: true
addresses:
- 172.23.1.1/24
routes:
- network: 172.23.1.0/24
mtu: 3506
- deviceSelector:
driver: virtio_net
hardwareAddr: "bc:24:11:8f:f7:4b"
physical: true
addresses:
- 172.23.2.1/24
routes:
- network: 172.23.2.0/24
mtu: 9000
- deviceSelector:
driver: virtio_net
hardwareAddr: "bc:24:11:6c:08:70"
physical: true
addresses:
- 172.29.0.171/24
routes:
- network: 172.29.0.0/24
mtu: 9000
kubelet:
nodeIP:
validSubnets:
- 172.23.1.0/24
extraMounts:
- destination: /var/mnt
type: bind
source: /var/mnt
options:
- bind
- rshared
- rw
- destination: /var/local-path-provisioner
type: bind
source: /var/local-path-provisioner
options:
- bind
- rshared
- rw
install:
disk: /dev/sda
image: factory.talos.dev/installer/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.8.0
wipe: false
extraKernelArgs:
# https://github.com/siderolabs/talos/issues/9531
- net.ifnames=0
- iommu.strict=0
- iommu.passthrough=1
- cpufreq.default_governor=performance
- spec_rstack_overflow=microcode
- amd_pstate=active
- intel_idle.max_cstate=0
time:
servers:
- 192.168.2.60
features:
hostDNS:
enabled: true
forwardKubeDNSToHost: false
cluster:
network:
dnsDomain: cluster.local
podSubnets:
- 10.244.0.0/16
serviceSubnets:
- 10.96.0.0/12
cni:
name: none
proxy:
disabled: true
etcd:
advertisedSubnets:
- 172.23.2.0/24
allowSchedulingOnControlPlanes: true
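A sketch of how such a per-node patch can be applied, assuming it is merged into the generated control-plane machine config before being pushed to the node. The controlplane.yaml and c371.yaml file names are placeholders, the node address is the one from the patch above, and the exact talosctl subcommands may differ between Talos versions.
# Merge the per-node patch into the generated control-plane machine config
talosctl machineconfig patch controlplane.yaml --patch @c371.patch --output c371.yaml
# Push the patched config to the node
talosctl apply-config --insecure --nodes 192.168.2.171 --file c371.yaml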
Cilium Version
cilium-cli: v0.16.19 compiled with go1.23.1 on linux/amd64
cilium image (default): v1.16.2
cilium image (stable): v1.16.3
Kernel Version
6.10.6
Kubernetes Version
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.31.1
Regression
n/a
Sysdump
cilium sysdump hangs:
Collecting sysdump with cilium-cli version: v0.16.19, args: [sysdump]
🔮 Detected Cilium installation in namespace: "kube-system"
🔮 Detected Cilium operator in namespace: "kube-system"
ℹ️ Using default Cilium Helm release name: "cilium"
ℹ️ Failed to detect Cilium SPIRE installation - using Cilium namespace as Cilium SPIRE namespace: "kube-system"
🔍 Collecting Kubernetes nodes
🔮 Detected Cilium features: map[bpf-lb-external-clusterip:Disabled cidr-match-nodes:Disabled clustermesh-enable-endpoint-sync:Disabled cni-chaining:Disabled:none enable-bgp-control-plane:Disabled enable-envoy-config:Disabled enable-gateway-api:Disabled enable-ipsec:Disabled enable-ipv4-egress-gateway:Disabled enable-local-redirect-policy:Disabled endpoint-routes:Disabled ingress-controller:Disabled ipam:Disabled:cluster-pool ipv4:Enabled ipv6:Disabled mutual-auth-spiffe:Disabled wireguard-encapsulate:Disabled]
🔍 Collecting tracing data from Cilium pods
🔍 Collect Kubernetes nodes
🔍 Collecting Kubernetes events
🔍 Collect Kubernetes version
🔍 Collecting Kubernetes pods
🔍 Collecting Kubernetes namespaces
🔍 Collecting Kubernetes services
🔍 Collecting Kubernetes pods summary
🔍 Collecting Kubernetes endpoints
🔍 Collecting Kubernetes network policies
🔍 Collecting Kubernetes metrics
🔍 Collecting Kubernetes leases
🔍 Collecting Cilium cluster-wide network policies
🔍 Collecting Cilium network policies
🔍 Collecting Cilium Egress Gateway policies
🔍 Collecting Cilium egress NAT policies
🔍 Collecting Cilium local redirect policies
🔍 Collecting Cilium CIDR Groups
🔍 Collecting Cilium endpoint slices
🔍 Collecting Cilium endpoints
🔍 Collecting Cilium nodes
🔍 Collecting Cilium identities
🔍 Collecting Ingresses
🔍 Collecting Cilium Node Configs
🔍 Collecting Cilium LoadBalancer IP Pools
🔍 Collecting IngressClasses
🔍 Checking if cilium-etcd-secrets exists in kube-system namespace
🔍 Collecting Cilium Pod IP Pools
🔍 Collecting the Cilium daemonset(s)
🔍 Collecting the Cilium configuration
🔍 Collecting the Cilium Node Init daemonset
🔍 Collecting the Cilium Envoy configuration
🔍 Collecting the Cilium Envoy daemonset
🔍 Collecting the Hubble Relay configuration
🔍 Collecting the Hubble Relay deployment
🔍 Collecting the Hubble daemonset
🔍 Collecting the Hubble UI deployment
🔍 Collecting the Hubble generate certs cronjob
W1109 17:54:44.721951 29390 warnings.go:70] cilium.io/v2alpha1 CiliumNodeConfig will be deprecated in cilium v1.16; use cilium.io/v2 CiliumNodeConfig
🔍 Collecting the Hubble generate certs pod logs
🔍 Collecting the Hubble cert-manager certificates
🔍 Collecting the Cilium operator deployment
🔍 Collecting the Cilium operator metrics
🔍 Collecting the clustermesh debug information, metrics and gops stats
🔍 Collecting the 'clustermesh-apiserver' deployment
🔍 Collecting the CNI configuration files from Cilium pods
🔍 Collecting the CNI configmap
🔍 Collecting gops stats from Cilium pods
🔍 Collecting gops stats from Cilium-operator pods
🔍 Collecting gops stats from Hubble pods
🔍 Collecting gops stats from Hubble Relay pods
🔍 Collecting bugtool output from Cilium pods
🔍 Collecting profiling data from Cilium pods
🔍 Collecting logs from Cilium pods
Secret "cilium-etcd-secrets" not found in namespace "kube-system" - this is expected when using the CRD KVStore
🔍 Collecting logs from Cilium Envoy pods
I1109 17:54:45.706751 29390 request.go:700] Waited for 1.013226131s due to client-side throttling, not priority and fairness, request: GET:https://kube3.k8d.projectcatalysts.prv:6443/api/v1/namespaces/kube-system/configmaps/cilium-envoy-config
🔍 Collecting logs from Cilium Node Init pods
🔍 Collecting logs from Cilium operator pods
🔍 Collecting logs from 'clustermesh-apiserver' pods
🔍 Collecting logs from Hubble pods
🔍 Collecting logs from Hubble Relay pods
🔍 Collecting logs from Hubble UI pods
🔍 Collecting platform-specific data
🔍 Collecting kvstore data
🔍 Collecting Cilium external workloads
🔍 Collecting Hubble flows from Cilium pods
🔍 Collecting logs from Tetragon pods
🔍 Collecting logs from Tetragon operator pods
🔍 Collecting bugtool output from Tetragon pods
🔍 Collecting Tetragon configmap
🔍 Collecting Tetragon PodInfo custom resources
🔍 Collecting Tetragon tracing policies
🔍 Collecting Tetragon namespaced tracing policies
🔍 Collecting Helm metadata from the release
🔍 Collecting Helm values from the release
^C
Relevant log output
23m Normal Started Pod/hubble-ui-77555d5dcf-v9ns9 Started container frontend
23m Warning FailedToUpdateEndpoint Endpoints/hubble-relay Failed to update endpoint kube-system/hubble-relay: Put "https://127.0.0.1:7445/api/v1/namespaces/kube-system/endpoints/hubble-relay": unexpected EOF
23m Warning FailedToUpdateEndpointSlices Service/hubble-peer Error updating Endpoint Slices for Service kube-system/hubble-peer: failed to update hubble-peer-fkbn2 EndpointSlice for Service kube-system/hubble-peer: Put "https://127.0.0.1:7445/apis/discovery.k8s.io/v1/namespaces/kube-system/endpointslices/hubble-peer-fkbn2": unexpected EOF
23m Warning FailedToUpdateEndpoint Endpoints/hubble-peer Failed to update endpoint kube-system/hubble-peer: Put "https://127.0.0.1:7445/api/v1/namespaces/kube-system/endpoints/hubble-peer": unexpected EOF
23m Normal Started Pod/hubble-relay-c56665db6-r72w9 Started container hubble-relay
23m Warning FailedToUpdateEndpointSlices Service/hubble-relay Error updating Endpoint Slices for Service kube-system/hubble-relay: failed to update hubble-relay-vgrvs EndpointSlice for Service kube-system/hubble-relay: Put "https://127.0.0.1:7445/apis/discovery.k8s.io/v1/namespaces/kube-system/endpointslices/hubble-relay-vgrvs": unexpected EOF
22m Warning FailedToUpdateEndpoint Endpoints/hubble-relay Failed to update endpoint kube-system/hubble-relay: etcdserver: request timed out
22m Normal Pulled Pod/hubble-ui-77555d5dcf-v9ns9 Successfully pulled image "quay.io/cilium/hubble-ui-backend:v0.13.1@sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b" in 7.777s (7.777s including waiting). Image size: 20027102 bytes.
22m Normal Created Pod/hubble-ui-77555d5dcf-v9ns9 Created container backend
22m Warning FailedToUpdateEndpointSlices Service/hubble-relay Error updating Endpoint Slices for Service kube-system/hubble-relay: failed to update hubble-relay-vgrvs EndpointSlice for Service kube-system/hubble-relay: etcdserver: request timed out
22m Normal Started Pod/hubble-ui-77555d5dcf-v9ns9 Started container backend
22m Warning Unhealthy Pod/hubble-relay-c56665db6-r72w9 Startup probe failed: service unhealthy (responded with "NOT_SERVING")
22m Normal Created Pod/coredns-68d75fd545-kmdcz Created container coredns
22m Normal Pulled Pod/coredns-68d75fd545-wc92t Successfully pulled image "registry.k8s.io/coredns/coredns:v1.11.3" in 19.081s (19.081s including waiting). Image size: 18562039 bytes.
22m Normal Pulled Pod/coredns-68d75fd545-kmdcz Successfully pulled image "registry.k8s.io/coredns/coredns:v1.11.3" in 19.156s (19.156s including waiting). Image size: 18562039 bytes.
22m Warning FailedToUpdateEndpointSlices Service/hubble-peer Error updating Endpoint Slices for Service kube-system/hubble-peer: failed to update hubble-peer-fkbn2 EndpointSlice for Service kube-system/hubble-peer: etcdserver: request timed out
22m Warning FailedToUpdateEndpoint Endpoints/hubble-peer Failed to update endpoint kube-system/hubble-peer: etcdserver: request timed out
22m Normal Pulled Pod/cilium-operator-7dc44d6bc-6rn7c Container image "quay.io/cilium/operator-generic:v1.16.3@sha256:6e2925ef47a1c76e183c48f95d4ce0d34a1e5e848252f910476c3e11ce1ec94b" already present on machine
22m Normal LeaderElection Lease/kube-scheduler c373_72e1e506-15e7-40ea-8218-04d5e190ba95 became leader
22m (x7 over 25m) Warning BackOff Pod/kube-scheduler-c371 Back-off restarting failed container kube-scheduler in pod kube-scheduler-c371_kube-system(ead119de7bbe05fa0e8f4f352565c68e)
22m (x4 over 25m) Normal Pulled Pod/kube-scheduler-c371 Container image "registry.k8s.io/kube-scheduler:v1.31.1" already present on machine
22m (x4 over 25m) Normal Created Pod/kube-scheduler-c371 Created container kube-scheduler
22m (x4 over 25m) Normal Started Pod/kube-scheduler-c371 Started container kube-scheduler
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct