
Pod Garbage collector fails to clean up PODs from nodes that are not running anymore #118261

Closed
@carlosjgp

Description


What happened?

This started happening after upgrading from Kubernetes 1.24 to 1.26.

The sequence of events we have observed:

  • Our workloads use a HorizontalPodAutoscaler to scale up in response to increased traffic
  • Cluster Autoscaler provisions new nodes to accommodate the new replicas
  • Traffic goes down
  • The HPA scales the replicas back down
  • Cluster Autoscaler notices nodes below its utilization threshold whose PODs can be accommodated on other nodes
  • Those nodes are drained and scaled down
  • Kubernetes still reports PODs in a "Running" or "Terminating" state on nodes that no longer exist (a detection sketch follows this list)
  • The Kubernetes control plane reports "orphan pods" in its audit logs
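
For reference, orphaned pods like these can be spotted by comparing each pod's .spec.nodeName against the current node list. This is a minimal client-go sketch added for illustration, not part of the original report; it assumes a kubeconfig at the default location.

// orphaned.go: list pods whose .spec.nodeName points at a node that no longer
// exists. Minimal sketch; error handling is intentionally short.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// RecommendedHomeFile resolves to ~/.kube/config.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Build the set of nodes that still exist.
	nodes, err := cs.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	existing := map[string]bool{}
	for _, n := range nodes.Items {
		existing[n.Name] = true
	}

	// Report pods still bound to a node that is gone.
	pods, err := cs.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		if p.Spec.NodeName != "" && !existing[p.Spec.NodeName] {
			fmt.Printf("%s/%s is bound to missing node %s (phase=%s)\n",
				p.Namespace, p.Name, p.Spec.NodeName, p.Status.Phase)
		}
	}
}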

What did you expect to happen?

The Kubernetes Pod garbage collector should clean up these PODs once the node is gone.

How can we reproduce it (as minimally and precisely as possible)?

Deploy the following YAML Deployments into a Kubernetes 1.26 cluster and terminate one of the nodes.

(duplicate port key: containerPort + protocol)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app.kubernetes.io/name: nginx
    app.kubernetes.io/instance: nginx
    app.kubernetes.io/component: deployment
spec:
  replicas: 100
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx
      app.kubernetes.io/instance: nginx
      app.kubernetes.io/component: deployment
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx
        app.kubernetes.io/instance: nginx
        app.kubernetes.io/component: deployment
    spec:
      restartPolicy: Always
      containers:
        - name: nginx
          image: "nginx:latest"
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            - containerPort: 8080
              name: metrics
              protocol: TCP

(Duplicated environment variable. Set twice by mistake)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-2
  labels:
    app.kubernetes.io/name: nginx-2
    app.kubernetes.io/instance: nginx-2
    app.kubernetes.io/component: deployment
spec:
  replicas: 100
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx-2
      app.kubernetes.io/instance: nginx-2
      app.kubernetes.io/component: deployment
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx-2
        app.kubernetes.io/instance: nginx-2
        app.kubernetes.io/component: deployment
    spec:
      restartPolicy: Always
      containers:
        - name: nginx-2
          image: "nginx:latest"
          imagePullPolicy: IfNotPresent
          env:
            - name: MY_VAR
              value: value-1
            - name: MY_VAR
              value: value-2

Anything else we need to know?

(Duplicated port error)

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "RequestResponse",
  "auditID": "9d3c7dbf-f599-422b-866c-84d52f3b1a22",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/app-b/pods/app-b-5894548cb-7tssd/status?fieldManager=PodGC\u0026force=true",
  "verb": "patch",
  "user":
    {
      "username": "system:serviceaccount:kube-system:pod-garbage-collector",
      "uid": "f099fed7-6a3d-4a3b-bc1b-49c668276d76",
      "groups":
        [
          "system:serviceaccounts",
          "system:serviceaccounts:kube-system",
          "system:authenticated",
        ],
    },
  "sourceIPs": ["172.16.38.214"],
  "userAgent": "kube-controller-manager/v1.26.4 (linux/amd64) kubernetes/4a34796/system:serviceaccount:kube-system:pod-garbage-collector",
  "objectRef":
    {
      "resource": "pods",
      "namespace": "app-b",
      "name": "app-b-5894548cb-7tssd",
      "apiVersion": "v1",
      "subresource": "status",
    },
  "responseStatus":
    {
      "metadata": {},
      "status": "Failure",
      "message": 'failed to create manager for existing fields: failed to convert new object (app-b/app-b-5894548cb-7tssd; /v1, Kind=Pod) to smd typed: .spec.containers[name="app-b"].ports: duplicate entries for key [containerPort=8082,protocol="TCP"]',
      "code": 500,
    },
  "requestObject":
    {
      "kind": "Pod",
      "apiVersion": "v1",
      "metadata":
        {
          "name": "app-b-5894548cb-7tssd",
          "namespace": "app-b",
        },
      "status":
        {
          "phase": "Failed",
          "conditions":
            [
              {
                "type": "DisruptionTarget",
                "status": "True",
                "lastTransitionTime": "2023-05-23T17:00:55Z",
                "reason": "DeletionByPodGC",
                "message": "PodGC: node no longer exists",
              },
            ],
        },
    },
  "responseObject":
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {},
      "status": "Failure",
      "message": 'failed to create manager for existing fields: failed to convert new object (app-b/app-b-5894548cb-7tssd; /v1, Kind=Pod) to smd typed: .spec.containers[name="app-b"].ports: duplicate entries for key [containerPort=8082,protocol="TCP"]',
      "code": 500,
    },
  "requestReceivedTimestamp": "2023-05-23T17:00:55.648887Z",
  "stageTimestamp": "2023-05-23T17:00:55.652513Z",
  "annotations":
    {
      "authorization.k8s.io/decision": "allow",
      "authorization.k8s.io/reason": 'RBAC: allowed by ClusterRoleBinding "system:controller:pod-garbage-collector" of ClusterRole "system:controller:pod-garbage-collector" to ServiceAccount "pod-garbage-collector/kube-system"',
    },
}

(Duplicated env var error)

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "RequestResponse",
  "auditID": "9ffc9212-3d74-4b86-98bb-5e6f0c5395b1",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/app-a/pods/app-a-7b7ddc5874-c85hq/status?fieldManager=PodGC\u0026force=true",
  "verb": "patch",
  "user":
    {
      "username": "system:serviceaccount:kube-system:pod-garbage-collector",
      "uid": "f099fed7-6a3d-4a3b-bc1b-49c668276d76",
      "groups":
        [
          "system:serviceaccounts",
          "system:serviceaccounts:kube-system",
          "system:authenticated",
        ],
    },
  "sourceIPs": ["172.16.38.214"],
  "userAgent": "kube-controller-manager/v1.26.4 (linux/amd64) kubernetes/4a34796/system:serviceaccount:kube-system:pod-garbage-collector",
  "objectRef":
    {
      "resource": "pods",
      "namespace": "app-a",
      "name": "app-a-7b7ddc5874-c85hq",
      "apiVersion": "v1",
      "subresource": "status",
    },
  "responseStatus":
    {
      "metadata": {},
      "status": "Failure",
      "message": "failed to create manager for existing fields: failed to convert new object (app-a/app-a-7b7ddc5874-c85hq; /v1, Kind=Pod) to smd typed: errors:\n .spec.containers[name=\"app-a\"].env: duplicate entries for key [name=\"RABBITMQ_HOST\"]\n .spec.containers[name=\"app-a\"].env: duplicate entries for key [name=\"RABBITMQ_PORT\"]\n .spec.initContainers[name=\"db-migration\"].env: duplicate entries for key [name=\"RABBITMQ_HOST\"]\n .spec.initContainers[name=\"db-migration\"].env: duplicate entries for key [name=\"RABBITMQ_PORT\"]",
      "code": 500,
    },
  "requestObject":
    {
      "kind": "Pod",
      "apiVersion": "v1",
      "metadata":
        {
          "name": "app-a-7b7ddc5874-c85hq",
          "namespace": "app-a",
        },
      "status":
        {
          "phase": "Failed",
          "conditions":
            [
              {
                "type": "DisruptionTarget",
                "status": "True",
                "lastTransitionTime": "2023-05-23T17:00:55Z",
                "reason": "DeletionByPodGC",
                "message": "PodGC: node no longer exists",
              },
            ],
        },
    },
  "responseObject":
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {},
      "status": "Failure",
      "message": "failed to create manager for existing fields: failed to convert new object (app-a/app-a-7b7ddc5874-c85hq; /v1, Kind=Pod) to smd typed: errors:\n .spec.containers[name=\"app-a\"].env: duplicate entries for key [name=\"RABBITMQ_HOST\"]\n .spec.containers[name=\"app-a\"].env: duplicate entries for key [name=\"RABBITMQ_PORT\"]\n .spec.initContainers[name=\"db-migration\"].env: duplicate entries for key [name=\"RABBITMQ_HOST\"]\n .spec.initContainers[name=\"db-migration\"].env: duplicate entries for key [name=\"RABBITMQ_PORT\"]",
      "code": 500,
    },
  "requestReceivedTimestamp": "2023-05-23T17:00:55.632119Z",
  "stageTimestamp": "2023-05-23T17:00:55.637338Z",
  "annotations":
    {
      "authorization.k8s.io/decision": "allow",
      "authorization.k8s.io/reason": 'RBAC: allowed by ClusterRoleBinding "system:controller:pod-garbage-collector" of ClusterRole "system:controller:pod-garbage-collector" to ServiceAccount "pod-garbage-collector/kube-system"',
    },
}

Kubernetes version

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.1", GitCommit:"e4d4e1ab7cf1bf15273ef97303551b279f0920a9", GitTreeState:"clean", BuildDate:"2022-09-14T19:49:27Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.4-eks-0a21954", GitCommit:"4a3479673cb6d9b63f1c69a67b57de30a4d9b781", GitTreeState:"clean", BuildDate:"2023-04-15T00:33:09Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

AWS EKS

OS version

# On Linux:
$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"

$ uname -a
Linux ip-x-x-x-x.region.compute.internal 5.15.108 #1 SMP Tue May 9 23:54:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Cluster Autoscaler

Container runtime (CRI) and version (if applicable)

containerd

Related plugins (CNI, CSI, ...) and versions (if applicable)

CNI: Cilium 1.11
CSI: AWS EBS CSI

Activity

Labels added: kind/bug, needs-sig, needs-triage (May 25, 2023)
vaibhav2107 (Member) commented on May 26, 2023

/sig node

Label sig/node added and needs-sig removed (May 26, 2023)
carlosjgp (Author) commented on May 26, 2023

The log statement "failed to create manager..." is emitted from this function:

// Apply implements Manager.
func (f *skipNonAppliedManager) Apply(liveObj, appliedObj runtime.Object, managed Managed, fieldManager string, force bool) (runtime.Object, Managed, error) {
	if len(managed.Fields()) == 0 {
		gvk := appliedObj.GetObjectKind().GroupVersionKind()
		emptyObj, err := f.objectCreater.New(gvk)
		if err != nil {
			return nil, nil, fmt.Errorf("failed to create empty object of type %v: %v", gvk, err)
		}
		liveObj, managed, err = f.fieldManager.Update(emptyObj, liveObj, managed, f.beforeApplyManagerName)
		if err != nil {
			return nil, nil, fmt.Errorf("failed to create manager for existing fields: %v", err)
		}
	}
	return f.fieldManager.Apply(liveObj, appliedObj, managed, fieldManager, force)
}
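
Some context on how this code path is reached, based on the audit logs above: PodGC marks each orphaned pod as Failed with a server-side apply patch on the pod's status subresource (fieldManager=PodGC&force=true). Before the apply, the field manager has to convert the live pod into its structured-merge-diff ("smd") typed form, and that conversion rejects duplicate keys in list-map fields such as ports (keyed by containerPort+protocol) and env (keyed by name), so the patch fails with a 500 and the pod is never cleaned up. Below is a rough client-go sketch of an equivalent ApplyStatus call, not the actual PodGC code; the namespace and pod name are placeholders taken from the audit log.

// Sketch: an SSA status patch shaped like the one PodGC issues. Against a pod
// whose spec contains duplicate containerPort+protocol (or env name) entries,
// the API server cannot build the typed form of the live object and returns
// "failed to create manager for existing fields: ... duplicate entries for key ...".
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	corev1apply "k8s.io/client-go/applyconfigurations/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Status the garbage collector wants to record before deleting the pod.
	patch := corev1apply.Pod("app-b-5894548cb-7tssd", "app-b").
		WithStatus(corev1apply.PodStatus().
			WithPhase(v1.PodFailed).
			WithConditions(corev1apply.PodCondition().
				WithType(v1.PodConditionType("DisruptionTarget")).
				WithStatus(v1.ConditionTrue).
				WithReason("DeletionByPodGC").
				WithMessage("PodGC: node no longer exists").
				WithLastTransitionTime(metav1.Now())))

	_, err = cs.CoreV1().Pods("app-b").ApplyStatus(context.TODO(), patch,
		metav1.ApplyOptions{FieldManager: "PodGC", Force: true})
	if err != nil {
		fmt.Println("apply failed:", err) // reproduces the 500 from the audit log
	}
}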

pacoxu (Member) commented on May 30, 2023

I just updated PR #113245 (comment); that PR covers the pod warning part.

For Services, there is already a validation error for duplicate port+protocol, and I tried to add a warning for duplicate ports (different protocols) in #113245.

// Check for duplicate Ports, considering (protocol,port) pairs
portsPath = specPath.Child("ports")
ports := make(map[core.ServicePort]bool)
for i, port := range service.Spec.Ports {
	portPath := portsPath.Index(i)
	key := core.ServicePort{Protocol: port.Protocol, Port: port.Port}
	_, found := ports[key]
	if found {
		allErrs = append(allErrs, field.Duplicate(portPath, key))
	}
	ports[key] = true
}
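
Container ports, by contrast, don't appear to get an equivalent duplicate-key error at validation time today, which is how pods with duplicate containerPort+protocol entries end up stored and later trip up server-side apply. As a rough illustration only (not the code proposed in #113245; the function name and field paths are made up), an analogous check keyed by (containerPort, protocol) could look like this:

// findDuplicateContainerPorts flags container ports that share the same
// (containerPort, protocol) pair, mirroring the Service check above.
func findDuplicateContainerPorts(containers []core.Container, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	type portKey struct {
		port     int32
		protocol core.Protocol
	}
	for ci, ctr := range containers {
		seen := map[portKey]bool{}
		portsPath := fldPath.Index(ci).Child("ports")
		for pi, p := range ctr.Ports {
			key := portKey{port: p.ContainerPort, protocol: p.Protocol}
			if seen[key] {
				allErrs = append(allErrs, field.Duplicate(portsPath.Index(pi), key))
			}
			seen[key] = true
		}
	}
	return allErrs
}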

For Pod/PodTemplate, there is a validation (or warning) for hostPort duplication, and I tried to add a warning for duplicate ports (same or different protocols).

// validatePodHostNetworkDeps checks fields which depend on whether HostNetwork is
// true or not. It should be called on all PodSpecs, but opts can change what
// is enforce. E.g. opts.ResourceIsPod should only be set when called in the
// context of a Pod, and not on PodSpecs which are embedded in other resources
// (e.g. Deployments).
func validatePodHostNetworkDeps(spec *core.PodSpec, fldPath *field.Path, opts PodValidationOptions) field.ErrorList {
	// For <reasons> we keep `.HostNetwork` in .SecurityContext on the internal

// AccumulateUniqueHostPorts extracts each HostPort of each Container,
// accumulating the results and returning an error if any ports conflict.
func AccumulateUniqueHostPorts(containers []core.Container, accumulator *sets.String, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	for ci, ctr := range containers {
		idxPath := fldPath.Index(ci)
		portsPath := idxPath.Child("ports")
		for pi := range ctr.Ports {
			idxPath := portsPath.Index(pi)
			port := ctr.Ports[pi].HostPort
			if port == 0 {
				continue
			}
			str := fmt.Sprintf("%s/%s/%d", ctr.Ports[pi].Protocol, ctr.Ports[pi].HostIP, port)
			if accumulator.Has(str) {
				allErrs = append(allErrs, field.Duplicate(idxPath.Child("hostPort"), str))
			} else {
				accumulator.Insert(str)
			}
		}
	}
	return allErrs
}

There is a thread in #113245 (comment).

carlosjgp (Author) commented on May 31, 2023

I just updated PR #113245 (comment); that PR covers the pod warning part.

For Services, there is already a validation error for duplicate port+protocol, and I tried to add a warning for duplicate ports (different protocols) in #113245.

There is a thread in #113245 (comment).

Note that the reason for opening this bug is that duplicated ports on PODs (containerPort+protocol) or duplicated environment variables appear to break the Pod garbage collector that runs on the Kubernetes control plane, which looks like a side effect of the change discussed in the PR mentioned above.

But the case I'm experiencing is not covered in the comments of that PR, so I'll chip in on that conversation.
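
As an aside (not in the original comment): until the collector can handle these pods, the stuck ones can be removed by hand with a forced delete. A minimal client-go sketch, with the namespace and pod name as placeholders taken from the audit logs:

// forcedelete.go: manual cleanup sketch for a pod stuck on a node that no
// longer exists. Grace period 0 removes the object immediately, since the
// kubelet that would normally confirm termination is gone with the node.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	zero := int64(0)
	err = cs.CoreV1().Pods("app-b").Delete(context.TODO(), "app-b-5894548cb-7tssd",
		metav1.DeleteOptions{GracePeriodSeconds: &zero})
	if err != nil {
		fmt.Println("delete failed:", err)
	}
}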

SergeyKanzhelev (Member) commented on May 31, 2023

/remove-sig node
/sig network

for duplicated pods

87 remaining items


Metadata

Labels

kind/bug, kind/regression, priority/critical-urgent, sig/api-machinery, sig/network, triage/accepted
