
Setting loadBalancer.acceleration=native causes Cilium Status to report unexpected end of JSON input #35873

Closed as not planned
@rkerno

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.16.0 and lower than v1.17.0

What happened?

I am testing a three-node cluster. After installing Cilium onto a fresh Talos cluster, cilium status reports errors when these additional configuration options are supplied:

	--set loadBalancer.acceleration=native \
	--set loadBalancer.mode=snat \

The primary error reported by cilium status is:

cilium             cilium-6k8wl                  unable to retrieve cilium endpoint information: unable to unmarshal response: unexpected end of JSON input

I have also tried specifying only acceleration=native, without setting the mode; that variant is sketched below.
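For reference, that single-option variant looks like this (the rest of the Helm invocation is unchanged from the working configuration further down):

	--set loadBalancer.acceleration=native \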

The full output of cilium status is:

cilium status
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             4 errors
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       1 errors
    \__/       ClusterMesh:        disabled

DaemonSet              cilium             Desired: 3, Ready: 2/3, Available: 2/3, Unavailable: 1/3
DaemonSet              cilium-envoy       Desired: 3, Ready: 3/3, Available: 3/3
Deployment             cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
Deployment             hubble-relay       Desired: 1, Unavailable: 1/1
Deployment             hubble-ui          Desired: 1, Unavailable: 1/1
Containers:            cilium             Running: 3
                       cilium-envoy       Running: 3
                       cilium-operator    Running: 2
                       hubble-relay       Running: 1
                       hubble-ui          Pending: 1
Cluster Pods:          4/4 managed by Cilium
Helm chart version:
Image versions         cilium             quay.io/cilium/cilium:v1.16.3@sha256:62d2a09bbef840a46099ac4c69421c90f84f28d018d479749049011329aa7f28: 3
                       cilium-envoy       quay.io/cilium/cilium-envoy:v1.29.9-1728346947-0d05e48bfbb8c4737ec40d5781d970a550ed2bbd@sha256:42614a44e508f70d03a04470df5f61e3cffd22462471a0be0544cf116f2c50ba: 3
                       cilium-operator    quay.io/cilium/operator-generic:v1.16.3@sha256:6e2925ef47a1c76e183c48f95d4ce0d34a1e5e848252f910476c3e11ce1ec94b: 2
                       hubble-relay       quay.io/cilium/hubble-relay:v1.16.3@sha256:feb60efd767e0e7863a94689f4a8db56a0acc7c1d2b307dee66422e3dc25a089: 1
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.13.1@sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b: 1
                       hubble-ui          quay.io/cilium/hubble-ui:v0.13.1@sha256:e2e9313eb7caf64b0061d9da0efbdad59c6c461f6ca1752768942bfeda0796c6: 1
Errors:                cilium             cilium                        1 pods of DaemonSet cilium are not ready
                       cilium             cilium-6k8wl                  unable to retrieve cilium status: unable to unmarshal response of cilium status: unexpected end of JSON input
                       cilium             cilium-6k8wl                  unable to retrieve cilium endpoint information: unable to unmarshal response: unexpected end of JSON input
                       cilium             cilium-fhrp9                  unable to retrieve cilium status: unable to unmarshal response of cilium status: unexpected end of JSON input
                       hubble-relay       hubble-relay                  1 pods of Deployment hubble-relay are not ready
                       hubble-ui          hubble-ui                     1 pods of Deployment hubble-ui are not ready
Warnings:              hubble-ui          hubble-ui-77555d5dcf-v9ns9    pod is pending
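The "unexpected end of JSON input" messages suggest cilium-cli is receiving an empty or truncated response from the agent's local API on the affected pods. A minimal diagnostic sketch, assuming the in-pod CLI is available (in v1.16 it is named cilium-dbg) and using the pod name from the output above:

	# Query the agent directly inside the affected pod; this bypasses the
	# cilium-cli aggregation step that fails to unmarshal the response.
	kubectl -n kube-system exec cilium-6k8wl -c cilium-agent -- cilium-dbg status --verbose

	# If the agent itself reports healthy, its recent logs may show why the
	# response returned to cilium-cli is empty or truncated.
	kubectl -n kube-system logs cilium-6k8wl -c cilium-agent --tail=100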

How can we reproduce the issue?

  1. Generate the manifest from the Working Configuration below combined with the Options that cause the issue, then install Cilium with kubectl apply -f cilium.yaml (a combined sketch follows the options block). I am testing on a fresh Talos installation running on Proxmox with the virtio network driver. The cluster has three control-plane nodes with the 'allowSchedulingOnControlPlanes' option set. I am using the latest Talos build with a customised kernel upgraded to 6.10.6 and netkit enabled.

Working configuration:

helm template cilium cilium/cilium \
	--version 1.16.3 \
	--namespace kube-system \
	--set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
	--set securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
	--set cgroup.autoMount.enabled=false \
	--set cgroup.hostRoot=/sys/fs/cgroup \
	--set kubeProxyReplacement=true \
	--set k8sServiceHost=127.0.0.1 \
	--set k8sServicePort=7445 \
	--set devices="eth0 eth1" \
	--set routingMode=native \
	--set autoDirectNodeRoutes=true \
	--set ipam.mode=cluster-pool \
	--set ipam.operator.clusterPoolIPv4PodCIDRList="10.244.0.0/16" \
	--set ipv4.enabled=true \
	--set ipv4NativeRoutingCIDR="10.244.0.0/16" \
	--set enableIPv4Masquerade=true \
	--set bpf.masquerade=true \
	--set bpf.hostLegacyRouting=false \
	--set bpf.datapathMode=netkit \
	--set hubble.enabled=true \
	--set hubble.relay.enabled=true \
	--set hubble.ui.enabled=true \
	> cilium.yaml

Options that cause the issue:

	--set loadBalancer.acceleration=native \
	--set loadBalancer.mode=snat \
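Put together, the reproduce sequence is roughly the following sketch; $WORKING_CONFIG_FLAGS is a hypothetical placeholder standing in for the full set of --set flags from the working configuration above:

	# Re-render the manifest with the two problematic options appended,
	# then apply it to the fresh Talos cluster.
	helm template cilium cilium/cilium \
		--version 1.16.3 \
		--namespace kube-system \
		$WORKING_CONFIG_FLAGS \
		--set loadBalancer.acceleration=native \
		--set loadBalancer.mode=snat \
		> cilium.yaml
	kubectl apply -f cilium.yaml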

Example Talos Linux Node Patch File:

# c371.patch
machine:
  sysctls:
    net.ipv4.ip_forward: "1"
  network:
    hostname: c371
    nameservers:
      - 192.168.2.62
      - 192.168.2.61
    interfaces:
      - deviceSelector:
          driver: virtio_net
          hardwareAddr: "bc:24:11:b6:15:6d"
          physical: true
        addresses:
          - 192.168.2.171/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.2.60
        mtu: 3506
      - deviceSelector:
          driver: virtio_net
          hardwareAddr: "bc:24:11:04:84:c2"
          physical: true
        addresses:
          - 172.23.1.1/24
        routes:
          - network: 172.23.1.0/24
        mtu: 3506
      - deviceSelector:
          driver: virtio_net
          hardwareAddr: "bc:24:11:8f:f7:4b"
          physical: true
        addresses:
          - 172.23.2.1/24
        routes:
          - network: 172.23.2.0/24
        mtu: 9000
      - deviceSelector:
          driver: virtio_net
          hardwareAddr: "bc:24:11:6c:08:70"
          physical: true
        addresses:
          - 172.29.0.171/24
        routes:
          - network: 172.29.0.0/24
        mtu: 9000
  kubelet:
    nodeIP:
      validSubnets:
        - 172.23.1.0/24
    extraMounts:
      - destination: /var/mnt
        type: bind
        source: /var/mnt
        options:
          - bind
          - rshared
          - rw
      - destination: /var/local-path-provisioner
        type: bind
        source: /var/local-path-provisioner
        options:
          - bind
          - rshared
          - rw
  install:
    disk: /dev/sda
    image: factory.talos.dev/installer/ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515:v1.8.0
    wipe: false
    extraKernelArgs:
      # https://github.com/siderolabs/talos/issues/9531
      - net.ifnames=0
      - iommu.strict=0
      - iommu.passthrough=1
      - cpufreq.default_governor=performance
      - spec_rstack_overflow=microcode
      - amd_pstate=active
      - intel_idle.max_cstate=0
  time:
    servers:
      - 192.168.2.60
  features:
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: false
cluster:
  network:
    dnsDomain: cluster.local
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12
    cni:
      name: none
  proxy:
    disabled: true
  etcd:
    advertisedSubnets:
      - 172.23.2.0/24
  allowSchedulingOnControlPlanes: true

Cilium Version

cilium-cli: v0.16.19 compiled with go1.23.1 on linux/amd64
cilium image (default): v1.16.2
cilium image (stable): v1.16.3

Kernel Version

6.10.6 (customised Talos kernel; see the reproduction steps above)

Kubernetes Version

Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.31.1

Regression

n/a

Sysdump

cilium sysdump hangs:
Collecting sysdump with cilium-cli version: v0.16.19, args: [sysdump]
🔮 Detected Cilium installation in namespace: "kube-system"
🔮 Detected Cilium operator in namespace: "kube-system"
ℹ️ Using default Cilium Helm release name: "cilium"
ℹ️ Failed to detect Cilium SPIRE installation - using Cilium namespace as Cilium SPIRE namespace: "kube-system"
🔍 Collecting Kubernetes nodes
🔮 Detected Cilium features: map[bpf-lb-external-clusterip:Disabled cidr-match-nodes:Disabled clustermesh-enable-endpoint-sync:Disabled cni-chaining:Disabled:none enable-bgp-control-plane:Disabled enable-envoy-config:Disabled enable-gateway-api:Disabled enable-ipsec:Disabled enable-ipv4-egress-gateway:Disabled enable-local-redirect-policy:Disabled endpoint-routes:Disabled ingress-controller:Disabled ipam:Disabled:cluster-pool ipv4:Enabled ipv6:Disabled mutual-auth-spiffe:Disabled wireguard-encapsulate:Disabled]
🔍 Collecting tracing data from Cilium pods
🔍 Collect Kubernetes nodes
🔍 Collecting Kubernetes events
🔍 Collect Kubernetes version
🔍 Collecting Kubernetes pods
🔍 Collecting Kubernetes namespaces
🔍 Collecting Kubernetes services
🔍 Collecting Kubernetes pods summary
🔍 Collecting Kubernetes endpoints
🔍 Collecting Kubernetes network policies
🔍 Collecting Kubernetes metrics
🔍 Collecting Kubernetes leases
🔍 Collecting Cilium cluster-wide network policies
🔍 Collecting Cilium network policies
🔍 Collecting Cilium Egress Gateway policies
🔍 Collecting Cilium egress NAT policies
🔍 Collecting Cilium local redirect policies
🔍 Collecting Cilium CIDR Groups
🔍 Collecting Cilium endpoint slices
🔍 Collecting Cilium endpoints
🔍 Collecting Cilium nodes
🔍 Collecting Cilium identities
🔍 Collecting Ingresses
🔍 Collecting Cilium Node Configs
🔍 Collecting Cilium LoadBalancer IP Pools
🔍 Collecting IngressClasses
🔍 Checking if cilium-etcd-secrets exists in kube-system namespace
🔍 Collecting Cilium Pod IP Pools
🔍 Collecting the Cilium daemonset(s)
🔍 Collecting the Cilium configuration
🔍 Collecting the Cilium Node Init daemonset
🔍 Collecting the Cilium Envoy configuration
🔍 Collecting the Cilium Envoy daemonset
🔍 Collecting the Hubble Relay configuration
🔍 Collecting the Hubble Relay deployment
🔍 Collecting the Hubble daemonset
🔍 Collecting the Hubble UI deployment
🔍 Collecting the Hubble generate certs cronjob
W1109 17:54:44.721951 29390 warnings.go:70] cilium.io/v2alpha1 CiliumNodeConfig will be deprecated in cilium v1.16; use cilium.io/v2 CiliumNodeConfig
🔍 Collecting the Hubble generate certs pod logs
🔍 Collecting the Hubble cert-manager certificates
🔍 Collecting the Cilium operator deployment
⚠️ Daemonset "cilium-node-init" not found in namespace "kube-system" - this is expected if Node Init DaemonSet is not enabled
🔍 Collecting the Cilium operator metrics
🔍 Collecting the clustermesh debug information, metrics and gops stats
🔍 Collecting the 'clustermesh-apiserver' deployment
🔍 Collecting the CNI configuration files from Cilium pods
🔍 Collecting the CNI configmap
⚠️ cronjob "hubble-generate-certs" not found in namespace "kube-system" - this is expected if auto TLS is not enabled or if not using hubble.auto.tls.method=cronjob
🔍 Collecting gops stats from Cilium pods
🔍 Collecting gops stats from Cilium-operator pods
⚠️ Deployment "clustermesh-apiserver" not found in namespace "kube-system" - this is expected if 'clustermesh-apiserver' isn't enabled
🔍 Collecting gops stats from Hubble pods
🔍 Collecting gops stats from Hubble Relay pods
🔍 Collecting bugtool output from Cilium pods
🔍 Collecting profiling data from Cilium pods
🔍 Collecting logs from Cilium pods
Secret "cilium-etcd-secrets" not found in namespace "kube-system" - this is expected when using the CRD KVStore
🔍 Collecting logs from Cilium Envoy pods
I1109 17:54:45.706751 29390 request.go:700] Waited for 1.013226131s due to client-side throttling, not priority and fairness, request: GET:https://kube3.k8d.projectcatalysts.prv:6443/api/v1/namespaces/kube-system/configmaps/cilium-envoy-config
🔍 Collecting logs from Cilium Node Init pods
🔍 Collecting logs from Cilium operator pods
🔍 Collecting logs from 'clustermesh-apiserver' pods
🔍 Collecting logs from Hubble pods
🔍 Collecting logs from Hubble Relay pods
🔍 Collecting logs from Hubble UI pods
🔍 Collecting platform-specific data
🔍 Collecting kvstore data
🔍 Collecting Cilium external workloads
🔍 Collecting Hubble flows from Cilium pods
🔍 Collecting logs from Tetragon pods
🔍 Collecting logs from Tetragon operator pods
🔍 Collecting bugtool output from Tetragon pods
🔍 Collecting Tetragon configmap
🔍 Collecting Tetragon PodInfo custom resources
🔍 Collecting Tetragon tracing policies
🔍 Collecting Tetragon namespaced tracing policies
🔍 Collecting Helm metadata from the release
🔍 Collecting Helm values from the release
^C
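Because the sysdump stalls after the bugtool and log collection steps, here is a fallback sketch for gathering the most relevant data manually (pod names taken from the status output above):

	# Describe the not-ready agent pod and pull its logs directly.
	kubectl -n kube-system describe pod cilium-6k8wl
	kubectl -n kube-system logs cilium-6k8wl -c cilium-agent --tail=500
	# --previous only applies if the container has restarted at least once.
	kubectl -n kube-system logs cilium-6k8wl -c cilium-agent --previous || true
	kubectl -n kube-system logs deploy/hubble-relay --tail=200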

Relevant log output

23m                  Normal    Started                        Pod/hubble-ui-77555d5dcf-v9ns9         Started container frontend
23m                  Warning   FailedToUpdateEndpoint         Endpoints/hubble-relay                 Failed to update endpoint kube-system/hubble-relay: Put "https://127.0.0.1:7445/api/v1/namespaces/kube-system/endpoints/hubble-relay": unexpected EOF
23m                  Warning   FailedToUpdateEndpointSlices   Service/hubble-peer                    Error updating Endpoint Slices for Service kube-system/hubble-peer: failed to update hubble-peer-fkbn2 EndpointSlice for Service kube-system/hubble-peer: Put "https://127.0.0.1:7445/apis/discovery.k8s.io/v1/namespaces/kube-system/endpointslices/hubble-peer-fkbn2": unexpected EOF
23m                  Warning   FailedToUpdateEndpoint         Endpoints/hubble-peer                  Failed to update endpoint kube-system/hubble-peer: Put "https://127.0.0.1:7445/api/v1/namespaces/kube-system/endpoints/hubble-peer": unexpected EOF
23m                  Normal    Started                        Pod/hubble-relay-c56665db6-r72w9       Started container hubble-relay
23m                  Warning   FailedToUpdateEndpointSlices   Service/hubble-relay                   Error updating Endpoint Slices for Service kube-system/hubble-relay: failed to update hubble-relay-vgrvs EndpointSlice for Service kube-system/hubble-relay: Put "https://127.0.0.1:7445/apis/discovery.k8s.io/v1/namespaces/kube-system/endpointslices/hubble-relay-vgrvs": unexpected EOF
22m                  Warning   FailedToUpdateEndpoint         Endpoints/hubble-relay                 Failed to update endpoint kube-system/hubble-relay: etcdserver: request timed out
22m                  Normal    Pulled                         Pod/hubble-ui-77555d5dcf-v9ns9         Successfully pulled image "quay.io/cilium/hubble-ui-backend:v0.13.1@sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b" in 7.777s (7.777s including waiting). Image size: 20027102 bytes.
22m                  Normal    Created                        Pod/hubble-ui-77555d5dcf-v9ns9         Created container backend
22m                  Warning   FailedToUpdateEndpointSlices   Service/hubble-relay                   Error updating Endpoint Slices for Service kube-system/hubble-relay: failed to update hubble-relay-vgrvs EndpointSlice for Service kube-system/hubble-relay: etcdserver: request timed out
22m                  Normal    Started                        Pod/hubble-ui-77555d5dcf-v9ns9         Started container backend
22m                  Warning   Unhealthy                      Pod/hubble-relay-c56665db6-r72w9       Startup probe failed: service unhealthy (responded with "NOT_SERVING")
22m                  Normal    Created                        Pod/coredns-68d75fd545-kmdcz           Created container coredns
22m                  Normal    Pulled                         Pod/coredns-68d75fd545-wc92t           Successfully pulled image "registry.k8s.io/coredns/coredns:v1.11.3" in 19.081s (19.081s including waiting). Image size: 18562039 bytes.
22m                  Normal    Pulled                         Pod/coredns-68d75fd545-kmdcz           Successfully pulled image "registry.k8s.io/coredns/coredns:v1.11.3" in 19.156s (19.156s including waiting). Image size: 18562039 bytes.
22m                  Warning   FailedToUpdateEndpointSlices   Service/hubble-peer                    Error updating Endpoint Slices for Service kube-system/hubble-peer: failed to update hubble-peer-fkbn2 EndpointSlice for Service kube-system/hubble-peer: etcdserver: request timed out
22m                  Warning   FailedToUpdateEndpoint         Endpoints/hubble-peer                  Failed to update endpoint kube-system/hubble-peer: etcdserver: request timed out
22m                  Normal    Pulled                         Pod/cilium-operator-7dc44d6bc-6rn7c    Container image "quay.io/cilium/operator-generic:v1.16.3@sha256:6e2925ef47a1c76e183c48f95d4ce0d34a1e5e848252f910476c3e11ce1ec94b" already present on machine
22m                  Normal    LeaderElection                 Lease/kube-scheduler                   c373_72e1e506-15e7-40ea-8218-04d5e190ba95 became leader
22m (x7 over 25m)    Warning   BackOff                        Pod/kube-scheduler-c371                Back-off restarting failed container kube-scheduler in pod kube-scheduler-c371_kube-system(ead119de7bbe05fa0e8f4f352565c68e)
22m (x4 over 25m)    Normal    Pulled                         Pod/kube-scheduler-c371                Container image "registry.k8s.io/kube-scheduler:v1.31.1" already present on machine
22m (x4 over 25m)    Normal    Created                        Pod/kube-scheduler-c371                Created container kube-scheduler
22m (x4 over 25m)    Normal    Started                        Pod/kube-scheduler-c371                Started container kube-scheduler
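The "unexpected EOF" and "etcdserver: request timed out" events are all against https://127.0.0.1:7445, which matches the k8sServiceHost/k8sServicePort in the Helm configuration (the local KubePrism API server endpoint on Talos). A hedged sketch for checking control-plane and etcd health, assuming talosctl is configured for the node at 192.168.2.171 from the patch file above:

	# Check apiserver readiness through kubectl.
	kubectl get --raw='/readyz?verbose'
	# Check etcd service health and membership on a control-plane node.
	talosctl -n 192.168.2.171 service etcd status
	talosctl -n 192.168.2.171 etcd members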

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

Labels

  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  • need-more-info: More information is required to further debug or fix the issue.
  • needs/triage: This issue requires triaging to establish severity and next steps.
  • stale: The stale bot thinks this issue is old. Add the "pinned" label to prevent this from becoming stale.
