Introduce a churnOp to scheduler perf testing framework #98900
Conversation
/assign @adtac
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Huang-Wei
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Why does creating a service generate requeuing? Is this because of a non-default plugin? If so, this requeuing event won't be very helpful in the future. Rather, we can test the most critical paths: node creation, tolerations change, pod affinity, etc.
Our current logic moves pods anyway upon service events, so a Service generator can be an easy way to simulate churn.
Probably, esp. if we remove the service event handlers when ServiceAffinity gets deprecated.
We definitely should introduce those events as well as Volume events eventually, but again, a Service generator can be a good start.
Not just that, which is happening soon regardless. With your change, since the default plugins don't include ServiceAffinity, we would effectively remove any reactions to service events. So, what I'm saying is that this test is not very future proof.
@alculquicondor I updated the PR to support adding/deleting API objects in an interleaved manner. API objects include Nodes/Pods/Services for now.
@@ -43,6 +44,7 @@ const (
    configFile        = "config/performance-config.yaml"
    createNodesOpcode = "createNodes"
    createPodsOpcode  = "createPods"
    churnOpcode       = "churn"
let's name this in a way that test configurations are more readable. Perhaps continuousRecreates or noiseRecreates.
I prefer churn too. Maybe "backgroundChurn" to emphasise that this is non-blocking? That would also somewhat cover @alculquicondor's suggestion.
- opcode: churn
  serviceTemplatePath: config/churn/service-default.yaml
  intervalMilliseconds: 1000
this doesn't make sense to me. It almost feels like networking churn. Maybe recreatesChurn.
The pure-service churn is an example of evaluating how Service events may or may not churn the cluster. I removed it to postpone the debate about how practical it could be - I'd like to reach a consensus on what a churn op looks like first, and in follow-up PRs we can add/discuss test cases on demand, as well as how aggressively a test case can churn the cluster.
The current logic is to create & delete customizable API objects in an interleaved manner. Other options are:
- create & delete N API objects in an interleaved manner
- keep creating API objects (without deleting them)
@alculquicondor @adtac WDYT?
can we have something like this:
- opcode: apiChurn
  mode: recreate
  serviceTemplatePath: config/churn/service-default.yaml
  intervalMilliseconds: 1000
Note that I'm not referring to the specific test case, just how the test description and configuration look.
SG. So we have different churn modes/patterns to choose from.
- recreate (or continuous-create)
- ephemeral-create (create & delete a single object)
- round-robin-create (create N objects, and then delete N objects, N can be configurable)
- and probably incorporate update events
BTW: when you say recreate, do you mean continuously creating API objects?
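For illustration, here is a rough sketch of how the two directions above might look in a workload config; the op and field names are assumptions taken from this discussion, not the final API:
- opcode: apiChurn
  mode: recreate                # create N objects, delete them, and repeat
  number: 100                   # N, assumed to be configurable
  serviceTemplatePath: config/churn/service-default.yaml
  intervalMilliseconds: 1000
- opcode: apiChurn
  mode: create                  # keep creating objects without deleting them
  serviceTemplatePath: config/churn/service-default.yaml
  intervalMilliseconds: 1000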
@@ -449,3 +449,32 @@
    initNodes: 5000
    initPods: 2000
    measurePods: 5000

- name: PendingPodsWithMixedChurn
I have another idea, which perhaps is more realistic and something we would like tested.
Create pods first (they are unschedulable). Then create Nodes (perhaps have a ticker for it). We would have to combine that with pending pods that are unschedulable for different reasons (like waiting for pod affinity).
@adtac didn't we have a test similar to what I'm describing?
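A rough sketch of what such a workload could look like is below (purely illustrative: the template path and op combination are assumptions, and as noted further down, the framework cannot yet collect metrics for ops that set skipWaitToCompletion):
- name: PendingPodsThenNodes      # hypothetical workload, not part of this PR
  workloadTemplate:
  - opcode: createPods
    countParam: $initPods
    podTemplatePath: config/pod-with-pod-affinity.yaml   # assumed template path
    skipWaitToCompletion: true
    collectMetrics: true          # not yet supported together with skipWaitToCompletion
  - opcode: createNodes
    countParam: $initNodes        # nodes arrive only after the pods, unblocking them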
Another very important one:
New nodes show up, but with not-ready taint, then the taint is removed.
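For reference, a node template for that scenario could look roughly like the following (a hypothetical template file; the taint key is the standard not-ready taint):
apiVersion: v1
kind: Node
metadata:
  generateName: churn-node-
spec:
  taints:
  - key: node.kubernetes.io/not-ready
    effect: NoSchedule
Removing the taint afterwards would still need an update-style op, which ties into the modifyNodes idea mentioned further down.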
Create pods first (they are unschedulable). Then create Nodes (perhaps have a ticker for it).
SG. However, in that case we need to enable both SkipWaitToCompletion and collectMetrics, which is not supported yet:
kubernetes/test/integration/scheduler_perf/scheduler_perf_test.go
Lines 379 to 381 in b5808c6
// CollectMetrics and SkipWaitToCompletion can never be true at the
// same time, so if we're here, it means that all pods have been
// scheduled.
didn't we have a test similar to what I'm describing?
no, I don't think we did. multiple createNodes in one workload was one of my motivations for the scheduler_perf rewrite, but I never got around to opening a PR with such a workload even though I tested it locally :/
However, in that case we need to enable both SkipWaitToCompletion and collectMetrics
True. Unfortunately, adding support for specifying both parameters seems to be quite difficult because of how metrics/throughput collection works.
The node taint one is a good idea 👍 but, of course, it requires a modifyNodes op or something like that.
/retest
    countParam: $initPods
    podTemplatePath: config/pod-high-priority-large-cpu.yaml
    skipWaitToCompletion: true
  - opcode: churn
should the churn pods be in the same namespace as the createPods op? I can't remember if that makes a difference in the scheduler
I assume the users can provide a namespace name in the configuration if they think the namespace is correlated.
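For example, something along these lines (the namespace field and template path here are assumptions about the configuration, not something confirmed in this PR):
- opcode: churn
  mode: recreate
  namespace: sched-churn                            # hypothetical value
  podTemplatePath: config/churn/pod-default.yaml    # assumed template path
  intervalMilliseconds: 500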
Follow-up of #98900 (comment): I rewrote this PR to provide different modes (currently recreate and create). @alculquicondor @adtac Please see the updated description section for more details.
    barrierOpcode = "barrier"

    Recreate = "recreate"
these ones deserve a comment
done.
        b.Fatalf("op %d: unable to parse the %v-th template path: %v", opIndex, i, err)
    }
    // Obtain GVR.
    mapping, err := restMapper.RESTMapping(gvk.GroupKind(), gvk.Version)
Interesting. I didn't know this was possible.
    client := clientset.NewForConfigOrDie(cfg)
    dynClient := dynamic.NewForConfigOrDie(cfg)
    cachedClient := cacheddiscovery.NewMemCacheClient(client.Discovery())
    restMapper := restmapper.NewDeferredDiscoveryRESTMapper(cachedClient)
super nit: calculate restMapper outside of this function, as only one test cares about it.
done.
good for squash to me.
Anything to add @adtac?
    barrierOpcode = "barrier"

    // Two modes supported in "churn" operator.
nit: add empty line, as this comment refers to more than one constant
done.
- support two modes: recreate and create
- use DynamicClient to create API objects
/retest
/priority important-soon
/lgtm
/hold for nit since this is approved, feel free to remove and merge
sorry I couldn't review this earlier, I was busy with the code freeze :) we're still within the test freeze, so this PR should be fine for 1.21
@@ -116,7 +116,7 @@ type testConfig struct {

// getBaseConfig returns baseConfig after initializing number of nodes and pods.
func getBaseConfig(nodes int, pods int) *testConfig {
-    destroyFunc, podInformer, clientset := mustSetupScheduler()
+    destroyFunc, podInformer, clientset, _ := mustSetupScheduler()
nit: client
IMO there is not much difference among the names "client", "cs" and "clientset" - they're usually used interchangeably. As this file is deprecated, I left the name as is. In other places that jointly used a clientset and a dynamic clientset, I renamed them to client and dynamicClient.
So I'm going to:
/hold cancel
/milestone v1.21
This PR improves testing, hence it is eligible for the milestone.
/retest
@Huang-Wei: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
What type of PR is this?
/kind feature
/sig scheduling
What this PR does / why we need it:
Introduce a churn operator to the scheduler perf testing framework. The churn operator aims to provide configurable options to continuously churn the cluster by creating/recreating API objects. Roughly, the new fields are:
- churn: the new opcode
- number: the number of API objects the op operates on
- mode: either keep creating API objects (for mode create) or run an infinite create-N-objs-delete-N-objs iteration (for mode recreate)

Which issue(s) this PR fixes:
Fix the 3rd issue of #98898.
Special notes for your reviewer:
Some examples:
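One representative sketch (field names are assumed from the snippets discussed in the review thread and may not match the merged configuration exactly):
- opcode: churn
  mode: recreate
  number: 1
  templatePaths:                        # assumed list-valued field for object templates
  - config/churn/node-default.yaml      # hypothetical template path
  - config/churn/service-default.yaml
  intervalMilliseconds: 1000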
Does this PR introduce a user-facing change?: