
Overlapping labels can lead to HPA matching incorrect pods #124307

Open
adrianmoisey opened this issue Apr 14, 2024 · 16 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.

Comments

@adrianmoisey
Member

adrianmoisey commented Apr 14, 2024

What happened?

When an HPA targets a Deployment whose label selector matches Pods that don't belong to it (overlapping labels, for example), those "other" Pods are considered by the HPA to be part of its scale target.

What did you expect to happen?

I have always been led to believe that this behaviour is correct and that the user should use labels that are specific enough not to overlap with other Deployments.
This is stated in the Deployment documentation:

Do not overlap labels or selectors with other controllers (including other Deployments and StatefulSets). Kubernetes doesn't stop you from overlapping, and if multiple controllers have overlapping selectors those controllers might conflict and behave unexpectedly.

However, the VPA recently fixed the same behaviour in kubernetes/autoscaler#6460, which makes me wonder whether the HPA should also check the owner reference and only target that Deployment's Pods.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create 2 deployments, both with the same labels/selectors
  2. In one Deployment, set resource requests
  3. Deploy an HPA (and metrics-server)
  4. Watch the HPA emit error events about Pods that are not under the control of the HPA's target Deployment
    1. e.g.: Warning FailedGetResourceMetric 87s (x3 over 117s) horizontal-pod-autoscaler failed to get cpu utilization: missing request for cpu in container ubuntu of Pod other-64886557cb-ldtnt
YAML files
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: workload
  name: workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: other
  template:
    metadata:
      labels:
        app: other
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - sha256sum /dev/zero
        image: ubuntu
        imagePullPolicy: Always
        name: nginx
        resources:
          requests:
            cpu: 10m
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: other
  name: other
spec:
  replicas: 1
  selector:
    matchLabels:
      app: other
  template:
    metadata:
      labels:
        app: other
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - sleep infinity
        image: ubuntu
        imagePullPolicy: Always
        name: ubuntu
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workload
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

Cloud provider

  • kind
  • GKE
  • kOps (AWS)

OS version

No response

Install tools

kind

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@adrianmoisey adrianmoisey added the kind/bug Categorizes issue or PR as related to a bug. label Apr 14, 2024
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 14, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 14, 2024
@adrianmoisey
Member Author

/sig autoscaling

@k8s-ci-robot k8s-ci-robot added sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 14, 2024
@dbenque

dbenque commented Apr 15, 2024

Since the spec uses a scaleTargetRef and not a labelSelector, I think it makes sense to scope the HPA to the Deployment and not to the set of Pods matching that Deployment's labelSelector. This is what is done in the VPA and also in the Deployment controller.

That being said, the documentation clearly says:

The controller manager finds the target resource defined by the scaleTargetRef, then selects the pods based on the target resource's .spec.selector labels,...

So fixing this would be a change in behaviour, as discussed in the SIG. Does it really make sense to have Deployments with overlapping selectors? It is technically possible, yes, but is it good practice? Probably not. It leads to many problems, such as overlapping PDBs, which result in 500s when you try to evict Pods. If there is a real need for using a labelSelector to define the scope of an HPA, maybe the next HPA spec should explicitly introduce that capability by offering both scaleTargetRef and labelSelector in the spec (with only one usable at a time). I am clearly not in favour of such a change, but if the need really exists, that would be a clear way to address it.

I would be in favour of providing a means to fix this issue and align the behaviour of all the controllers (VPA, Deployment). If there is consensus on fixing it, then to protect backward compatibility, maybe we can propose gating the fix at the controller (or HPA object) level using a new parameter (or an annotation on the HPA). With that new parameter the user can decide on the behaviour: "strict scaleTargetRef using ownerRef" versus "labelSelector".
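
For illustration only, a hedged sketch of what such a per-HPA opt-in could look like. The annotation name below is purely hypothetical and invented for this example; nothing like it exists today:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: workload
  annotations:
    # hypothetical annotation, invented for this example: opt in to
    # owner-reference-based Pod matching instead of label-selector matching
    autoscaling.alpha.kubernetes.io/pod-selection: owner-ref
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workload
  minReplicas: 1
  maxReplicas: 10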

@adrianmoisey
Member Author

I think it makes sense to scope the HPA to the deployment and not to the set of pods matching the labelSelector of that deployment. This is what is done in the VPA and also in the Deployment Controller.

As an end user, I think this is what one would "expect" when starting out on Kubernetes.
I also really like the consistency that this would add.

@sftim
Contributor

sftim commented Apr 22, 2024

I think we might need a KEP to change the way that Kubernetes interprets the spec of a HorizontalPodAutoscaler.

VerticalPodAutoscaler is not part of the formal Kubernetes API (yet), whereas HorizontalPodAutoscaler already is.

@OmarHawk

OmarHawk commented Apr 26, 2024

Just ran into the same issue I believe.

We have the following situation:

  • one namespace, app
  • two Deployments in that namespace, app-frontend and app-backend
  • both Deployments, their ReplicaSets, and their Pods have overlapping labels like this, since everything sort of belongs together:
    -- app: app
  • at the same time, each Deployment/ReplicaSet/Pod also has its own distinguishing label (just FYI):
    -- Deployment 1: app-role: app-frontend
    -- Deployment 2: app-role: app-backend
  • both have the following selector in their Deployment spec:
  selector:
    matchLabels:
      app: app
  • they have the following default replica count:
    -- app-frontend: 2
    -- app-backend: 1
  • and the following resource requests / limits (cpu) per Pod:
    -- app-frontend: 200m / 1000m
    -- app-backend: 600m / 1200m

The backend performs heavy operations in the background, keeping its single Pod at nearly 100% utilization all the time (which is fine for us), while the frontend sits at near-zero load, since it only generates load on a per-request basis.

We have configured an HPA that has scaleTargetRef like this:

scaleTargetRef:
    kind: Deployment
    name: app-frontend
    apiVersion: apps/v1

So my assumption would be that metrics defined like this:

  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80

would only apply to the single app-frontend Deployment, since that is also the target.
From what I observe, the metrics seem to also take the heavy-load app-backend Pods into account, leading to undesired scale-up operations on the app-frontend Deployment. In my opinion this is documented nowhere; I tried to find an explanation for what I observed, but haven't found anything. Additionally, I think this is undesired behaviour.

By the way, scaling itself happens only on app-frontend, not on app-backend, which is fine.

@adrianmoisey
Member Author

What are the label selectors for each deployment?

@OmarHawk

OmarHawk commented Apr 26, 2024

I've updated my description.

@OmarHawk

After adjusting the matchLabels selectors accordingly, it actually seems to work properly.
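
For anyone else hitting this, a sketch of what non-overlapping selectors could look like for the two Deployments described above (label values taken from the earlier description; the exact change made may differ, and note that a Deployment's spec.selector is immutable, so the Deployments may need to be recreated):

app-frontend Deployment:
  selector:
    matchLabels:
      app: app
      app-role: app-frontend

app-backend Deployment:
  selector:
    matchLabels:
      app: app
      app-role: app-backend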

@elvis-cai

elvis-cai commented Jun 3, 2024

Same here: when I created different Deployments with the same labels and pointed HPAs at the different Deployments, the HPA-reported usage was wrong (the targets were always the same). After making sure each referenced target Deployment's labels were unique, the issue was gone.
Before

kl get hpa 
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
danswer-api         Deployment/danswer-api         87%/80%   1         3         3          15m
danswer-indexing    Deployment/danswer-indexing    87%/80%   1         3         3          59m
danswer-inference   Deployment/danswer-inference   87%/80%   1         3         3          59m

After

kl get hpa 
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
danswer-api         Deployment/danswer-api         28%/80%   1         3         1          14m
danswer-indexing    Deployment/danswer-indexing    98%/80%   1         3         3          79m
danswer-inference   Deployment/danswer-inference   68%/80%   1         3         3          79m

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2024
@adrianmoisey
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 2, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2024
@adrianmoisey
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 1, 2024
@adrianmoisey
Member Author

I think I'll take a stab at writing a KEP for this
/assign

@adrianmoisey
Member Author

I've been digging into this more.
It seems that the existing behaviour of matching Pods outside of the target Deployment doesn't actually work correctly: the HPA artificially inflates the replica count. I think it's due to this:

https://github.com/kubernetes/kubernetes/blob/v1.32.0/pkg/controller/podautoscaler/replica_calculator.go#L103

readyPodCount can be too high, since additional Pods are matched.
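
Roughly, and as a simplified illustration only (the real calculator also handles missing metrics, unready Pods, and the tolerance band), the replica count for a resource metric is computed along the lines of:

  desiredReplicas = ceil( usageRatio * readyPodCount )

So if the target Deployment has 1 ready Pod and a usage ratio of 0.8 (i.e. below target), but 2 Pods from an unrelated Deployment also match the selector and are counted as ready, the result becomes ceil(0.8 * 3) = 3 instead of ceil(0.8 * 1) = 1, and the HPA scales out even though the target's own utilization is under target.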
