
Failing ci-benchmark-scheduler-perf-master tests for PreemptionAsync and Unschedulable tests #128221

Open
macsko opened this issue Oct 21, 2024 · 14 comments · Fixed by #128262

Labels: kind/flake, needs-triage, sig/scheduling

macsko (Member) commented Oct 21, 2024

Which jobs are failing?

ci-benchmark-scheduler-perf-master

Which tests are failing?

PreemptionAsync and Unschedulable test cases

Since when has it been failing?

17th Oct 2024

Testgrid link

https://testgrid.k8s.io/sig-scalability-benchmarks#scheduler-perf

Reason for failure (if possible)

The PreemptionAsync test is failing because of a context deadline error:

    scheduler_perf.go:1427: FATAL ERROR: op 3: error in waiting for pods to get scheduled: at least pod namespace-3/pod-4vfnq is not scheduled: context deadline exceeded
--- FAIL: BenchmarkPerfScheduling/PreemptionAsync/5000Node

Caused by #127829

The Unschedulable test is failing because the configured threshold is too high:

    scheduler_perf.go:1298: ERROR: op 2: BenchmarkPerfScheduling/Unschedulable/5kNodes/10kPods/namespace-2: expected SchedulingThroughput Average to be higher: got 289.204988, want 400.000000
--- FAIL: BenchmarkPerfScheduling/Unschedulable/5kNodes/10kPods

Caused by #128153
Given those changes, the threshold should be even lower; a value around 270-280 seems right at the moment.
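
For context, the failing assertion above boils down to comparing the measured average scheduling throughput against a configured threshold. A minimal sketch of that idea (a hypothetical helper, not the actual scheduler_perf code):

    // Hypothetical sketch, not the actual scheduler_perf code: the gist of the
    // check that produces the "expected SchedulingThroughput Average to be
    // higher" error above. samples holds per-interval scheduling rates (pods/s)
    // collected while the measured pods are being scheduled.
    package main

    import "fmt"

    func checkThroughput(samples []float64, threshold float64) error {
        if len(samples) == 0 {
            return fmt.Errorf("no throughput samples collected")
        }
        var sum float64
        for _, s := range samples {
            sum += s
        }
        avg := sum / float64(len(samples))
        if avg < threshold {
            // Mirrors the failure above: the observed average (~289 pods/s)
            // is below the configured threshold (400 pods/s).
            return fmt.Errorf("expected SchedulingThroughput Average to be higher: got %f, want %f", avg, threshold)
        }
        return nil
    }

    func main() {
        fmt.Println(checkThroughput([]float64{289.2, 300.1, 275.8}, 400))
    }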

Anything else we need to know?

You should ignore the first row in Testgrid (k8s.io/kubernetes/test/integration/scheduler_perf.scheduler_perf), as it doesn't affect the results, and check only the second one (ci-benchmark-scheduler-perf-master.Overall).

Relevant SIG(s)

/sig scheduling

macsko added the kind/failing-test label Oct 21, 2024
k8s-ci-robot added the sig/scheduling and needs-triage labels Oct 21, 2024
k8s-ci-robot (Contributor) commented

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

macsko (Member, Author) commented Oct 21, 2024

/assign @dom4ha

dom4ha (Member) commented Oct 22, 2024

Thanks @macsko, working on #128262 to tune these params.

sanposhiho (Member) commented

/reopen
/remove-kind failing-test
/kind flake

Still looks like a flake?
https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/logs/ci-benchmark-scheduler-perf-master?buildId=1846949494873133056

k8s-ci-robot added the kind/flake label and removed the kind/failing-test label Oct 26, 2024
k8s-ci-robot reopened this Oct 26, 2024
k8s-ci-robot (Contributor) commented

@sanposhiho: Reopened this issue.

In response to this:

/reopen
/remove-kind failing-test
/kind flake

Still looks like a flake?
https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/logs/ci-benchmark-scheduler-perf-master?buildId=1846949494873133056

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sanposhiho (Member) commented

Unrelated to the flake, though: in the logs from PreemptionAsync I don't see any preemption success messages. Rather, I see only the following:

I1026 11:25:33.658425   31249 schedule_one.go:1056] "Unable to schedule pod; no fit; waiting" pod="namespace-2/pod-h-glg8x" err="0/5000 nodes are available: 5000 Insufficient cpu. preemption: 0/5000 nodes are available: 5000 Insufficient cpu."

Have we made a mistake somewhere when creating this test case? I'll look into it.

sanposhiho (Member) commented

The nodes only have 4 CPU, while the high-priority pods request 9 CPU... 😓
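
A tiny illustration of why no preemption can ever succeed with these numbers (the 4/9 CPU values come from this comment; everything else is hypothetical, not the exact test templates):

    // A request larger than a node's allocatable CPU can never fit, so
    // preemption cannot help: evicting every victim still leaves only 4 CPU.
    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/api/resource"
    )

    func main() {
        nodeAllocatableCPU := resource.MustParse("4") // node size mentioned above
        podRequestCPU := resource.MustParse("9")      // high-priority pod request

        // This is exactly the situation behind the "Insufficient cpu.
        // preemption: ... Insufficient cpu." lines in the log above.
        if podRequestCPU.Cmp(nodeAllocatableCPU) > 0 {
            fmt.Println("pod can never fit on this node, even after preempting all victims")
        }
    }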

dom4ha (Member) commented Oct 28, 2024

Thanks for spotting it. I must have messed something up at some point. Initially I was modifying the PreemptionBasic test, which uses config/templates/pod-high-priority.yaml, and there the preemption was indeed working...

I will take a closer look at this test again.

dom4ha (Member) commented Oct 28, 2024

In general, the performance tests are flaky. I checked a few failures, and in all cases a few tests were failing due to

expected SchedulingThroughput Average to be higher: got X, want Y

scheduler_perf.go:1298: ERROR: op 2: BenchmarkPerfScheduling/Unschedulable/5kNodes/10kPods/namespace-2: expected SchedulingThroughput Average to be higher: got 246.885834, want 250.000000
scheduler_perf.go:1298: ERROR: op 3: BenchmarkPerfScheduling/PreemptionAsync/5000Nodes/namespace-3: expected SchedulingThroughput Average to be higher: got 103.320968, want 120.000000
scheduler_perf.go:1298: ERROR: op 2: BenchmarkPerfScheduling/Unschedulable/5kNodes/10kPods/namespace-2: expected SchedulingThroughput Average to be higher: got 246.885834, want 250.000000

I can adjust the limits to reduce the flakiness, but I'm afraid the test throughput variation will remain quite substantial, so the test will periodically fail anyway.

sanposhiho (Member) commented

I can adjust the limits to reduce the flakiness, but I'm afraid the test throughput variation will remain quite substantial, so the test will periodically fail anyway.

Yeah, a threshold that's too conservative wouldn't be great.

So, are the results from those two tests more scattered than those from the other tests?
If not, we can just lower the thresholds, at least for now: even after lowering them, they'd only be as conservative as the other tests' thresholds, which could be acceptable.
But if they are, then I think the best approach is to find out why those tests are more scattered than the others and try to make the results more stable.

macsko (Member, Author) commented Oct 29, 2024

These tests aren't more scattered than the others. Unschedulable (yellow = average):
[screenshot from 2024-10-29 09:07: Unschedulable scheduling throughput across recent runs]
I think a threshold around 200 should be good.

PreemptionAsync:
[screenshot from 2024-10-29 09:09: PreemptionAsync scheduling throughput across recent runs]
Scheduling throughput increased after #128348, and I think the current threshold is too low, so we should increase it after gathering more data.
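
One rough way to derive such thresholds from recent runs (a sketch with made-up numbers, not real CI data): take the mean of the per-run averages and subtract a couple of standard deviations, so normal run-to-run variation doesn't trip the check.

    // Made-up sample numbers, for illustration only (not real CI data).
    package main

    import (
        "fmt"
        "math"
    )

    // suggestThreshold returns mean - sigmas*stddev of the observed averages.
    func suggestThreshold(observed []float64, sigmas float64) float64 {
        var sum float64
        for _, v := range observed {
            sum += v
        }
        mean := sum / float64(len(observed))

        var variance float64
        for _, v := range observed {
            variance += (v - mean) * (v - mean)
        }
        stddev := math.Sqrt(variance / float64(len(observed)))

        return mean - sigmas*stddev
    }

    func main() {
        // Hypothetical recent Unschedulable averages, roughly in the range quoted in this thread.
        runs := []float64{246.9, 289.2, 270.5, 281.0, 255.3}
        fmt.Printf("suggested threshold: %.0f pods/s\n", suggestThreshold(runs, 2))
    }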

dom4ha (Member) commented Oct 29, 2024

These tests aren't more scattered than the others. Unschedulable (yellow = average)

Exactly. I actually gave a wrong example; what I meant is that other tests are also flaky (not only PreemptionAsync and Unschedulable):

scheduler_perf.go:1298: ERROR: op 2: BenchmarkPerfScheduling/SchedulingWithNodeInclusionPolicy/5000Nodes/namespace-2: expected SchedulingThroughput Average to be higher: got 49.791495, want 68.000000
scheduler_perf.go:1298: ERROR: op 3: BenchmarkPerfScheduling/PreemptionAsync/5000Nodes/namespace-3: expected SchedulingThroughput Average to be higher: got 91.335472, want 120.000000

I will gather some numbers from the most recent runs to see whether/how the thresholds should be adjusted.

Scheduling throughput increased after #128348, and I think the current threshold is too low, so we should increase it after gathering more data.

The performance difference is quite surprising to me. I'd expect the fixed test to run slower (it goes through the full preemption process). Note that the Unschedulable test is in fact similar to the broken test, but the throughput is much higher (to be precise, the throughput is comparable, but the churn rate is much higher: 100/s vs 5/s).

The main difference between them is scheduling high-priority pods in the churn. This makes me think that the Unschedulable test is actually not doing what I thought it would do: the churn pods end up at the back of the queue, so in the end we process far fewer of them than expected.

I changed it to use high-priority pods instead, so that they really take scheduler time at the defined rate. The throughput should become comparable now: #128427
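
For illustration, a simplified, stand-alone sketch of the priority-first ordering that the scheduler's default queue sort applies (not the scheduler's actual code), showing why default-priority churn pods stay behind the already-queued measured pods while high-priority churn pods jump ahead:

    // Simplified sketch of priority-first queue ordering, in the spirit of the
    // scheduler's default queue sort (priority first, then arrival time).
    package main

    import (
        "fmt"
        "sort"
    )

    type queuedPod struct {
        name     string
        priority int32
        arrival  int64 // lower means the pod entered the queue earlier
    }

    // less orders pods by priority first, then by arrival time.
    func less(a, b queuedPod) bool {
        if a.priority != b.priority {
            return a.priority > b.priority // higher priority is popped first
        }
        return a.arrival < b.arrival
    }

    func main() {
        // Hypothetical pods: the churn pods arrive after the measured pod.
        pods := []queuedPod{
            {name: "measured-pod", priority: 0, arrival: 1},
            {name: "churn-pod-default-priority", priority: 0, arrival: 2}, // before the change
            {name: "churn-pod-high-priority", priority: 100, arrival: 3},  // after #128427
        }
        sort.Slice(pods, func(i, j int) bool { return less(pods[i], pods[j]) })
        // Default-priority churn pods stay behind the measured pods and add little
        // load; high-priority churn pods genuinely compete for scheduler time.
        for _, p := range pods {
            fmt.Println(p.name)
        }
    }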

sanposhiho (Member) commented

The performance difference is quite surprising to me.

I guess the reason is that, in the previous test case, all the periodically created high-priority Pods were unschedulable, piled up in the queue, and resulted in additional load for the scheduler.

dom4ha (Member) commented Oct 30, 2024

I guess the reason is that, in the previous test case, all the periodically created high-priority Pods were unschedulable, piled up in the queue, and resulted in additional load for the scheduler.

I looked deeper into this phenomenon and noticed that the time spent processing unschedulable pods depends on the number of already scheduled pods (initialPods). Apparently, in PostFilter, the preemption plugin goes through all the pods even when the Pod is unschedulable on the Node itself.

So the preemption scenario is faster: the preemption plugin finds candidates very quickly, and the time to preempt them (sending API calls synchronously) is actually smaller than the time spent going through all the remaining candidates.

In the context of #126858, in cases where there are thousands of pods running, making API calls asynchronously for the unschedulable pods may not bring the expected improvements, as going through all pod candidates may be more expensive than the API call itself. I can imagine high-priority unschedulable pods blocking scheduling for some time.
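
A back-of-the-envelope illustration of that trade-off (all numbers are guesses, purely for illustration, not measurements):

    // All numbers below are assumptions chosen only to illustrate the scaling,
    // not measured values from the benchmark.
    package main

    import "fmt"

    func main() {
        const (
            nodes       = 5000  // cluster size in the benchmark
            podsPerNode = 10    // assumed density of already-running pods
            perPodCost  = 2e-6  // assumed seconds to evaluate one victim candidate
            apiCallCost = 50e-3 // assumed seconds for one preemption API call
        )

        dryRunCost := float64(nodes*podsPerNode) * perPodCost
        fmt.Printf("candidate scan ~%.0f ms vs one API call ~%.0f ms\n",
            dryRunCost*1000, apiCallCost*1000)
        // With numbers in this ballpark the candidate scan alone costs ~100 ms,
        // so making the API call asynchronous saves relatively little when the
        // pod turns out to be unschedulable anyway.
    }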
