Introduce a churnOp to scheduler perf testing framework #98900
Conversation
/assign @adtac
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Huang-Wei
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Why does creating a service generate requeuing? Is this because of a non-default plugin? If so, this requeuing event won't be very helpful in the future. Rather, we can test the most critical paths: node creation, tolerations change, pod affinity, etc.
Our current logic moves pods anyway upon service events, so a Service generator can be an easy way to simulate churn.
Probably, esp. if we remove the service event handlers when ServiceAffinity gets deprecated.
We definitely should introduce those events as well as Volume events eventually, but again, a Service generator can be a good start.
Not just that, which is happening soon regardless. With your change, since the default plugins don't include ServiceAffinity, we would effectively remove any reactions to service events. So, what I'm saying is that this test is not very future proof.
@alculquicondor I updated the PR to support adding/deleting API objects in an interleaved manner. API objects include Nodes/Pods/Services for now.
@@ -43,6 +44,7 @@ const (
    configFile        = "config/performance-config.yaml"
    createNodesOpcode = "createNodes"
    createPodsOpcode  = "createPods"
    churnOpcode       = "churn"
let's name this in a way that test configurations are more readable. Perhaps continuousRecreates or noiseRecreates.
I prefer churn too. Maybe "backgroundChurn" to emphasise that this is non-blocking? That would also somewhat cover @alculquicondor's suggestion.
- opcode: churn
  serviceTemplatePath: config/churn/service-default.yaml
  intervalMilliseconds: 1000
this doesn't make sense to me. It almost feels like networking churn. Maybe recreatesChurn.
The pure-service churn is an example of evaluating how Service events may or may not churn the cluster. I removed it to postpone the debate about how practical it could be - I'd like to reach a consensus on what a churn op looks like first, and in follow-up PRs we can add/discuss test cases on demand, as well as how aggressively a test case can churn the cluster.
The current logic is to create & delete customizable API objects in an interleaved manner. Other options are:
- create & delete N API objects in an interleaved manner
- keep creating API objects (without deleting them)
@alculquicondor @adtac WDYT?
can we have something like this:
- opcode: apiChurn
  mode: recreate
  serviceTemplatePath: config/churn/service-default.yaml
  intervalMilliseconds: 1000
Note that I'm not referring to the specific test case, just how the test description and configuration look.
SG. So we have different churn modes/patterns to choose from.
- recreate (or continuous-create)
- ephemeral-create (create & delete a single object)
- round-robin-create (create N objects, and then delete N objects, N can be configurable)
- and probably incorporate update events
BTW: when you say recreate, do you mean continuously creating API objects?
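For illustration, here is a rough sketch of how the two directions above might look in a workload config; the op and field names are assumptions taken from this discussion, not the final API:
- opcode: apiChurn
  mode: recreate                # create N objects, delete them, and repeat
  number: 100                   # N, assumed to be configurable
  serviceTemplatePath: config/churn/service-default.yaml
  intervalMilliseconds: 1000
- opcode: apiChurn
  mode: create                  # keep creating objects without deleting them
  serviceTemplatePath: config/churn/service-default.yaml
  intervalMilliseconds: 1000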
@@ -449,3 +449,32 @@
    initNodes: 5000
    initPods: 2000
    measurePods: 5000

- name: PendingPodsWithMixedChurn
I have another idea, which perhaps is more realistic and something we would like tested.
Create pods first (they are unschedulable). Then create Nodes (perhaps have a ticker for it). We would have to combine that with pending pods that are unschedulable for different reasons (like waiting for pod affinity).
@adtac didn't we have a test similar to what I'm describing?
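A rough sketch of what such a workload could look like is below (purely illustrative: the template path and op combination are assumptions, and as noted further down, the framework cannot yet collect metrics for ops that set skipWaitToCompletion):
- name: PendingPodsThenNodes      # hypothetical workload, not part of this PR
  workloadTemplate:
  - opcode: createPods
    countParam: $initPods
    podTemplatePath: config/pod-with-pod-affinity.yaml   # assumed template path
    skipWaitToCompletion: true
    collectMetrics: true          # not yet supported together with skipWaitToCompletion
  - opcode: createNodes
    countParam: $initNodes        # nodes arrive only after the pods, unblocking them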
Another very important one:
New nodes show up, but with not-ready taint, then the taint is removed.
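For reference, a node template for that scenario could look roughly like the following (a hypothetical template file; the taint key is the standard not-ready taint):
apiVersion: v1
kind: Node
metadata:
  generateName: churn-node-
spec:
  taints:
  - key: node.kubernetes.io/not-ready
    effect: NoSchedule
Removing the taint afterwards would still need an update-style op, which ties into the modifyNodes idea mentioned further down.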
Create pods first (they are unschedulable). Then create Nodes (perhaps have a ticker for it).
SG. However, in that case we need to enable both SkipWaitToCompletion and collectMetrics, which is not supported yet:
kubernetes/test/integration/scheduler_perf/scheduler_perf_test.go
Lines 379 to 381 in b5808c6
// CollectMetrics and SkipWaitToCompletion can never be true at the
// same time, so if we're here, it means that all pods have been
// scheduled.
didn't we have a test similar to what I'm describing?
no, I don't think we did. multiple createNodes in one workload was one of my motivations for the scheduler_perf rewrite, but I never got around to opening a PR with such a workload even though I tested it locally :/
However, in that case we need to enable both SkipWaitToCompletion and collectMetrics
True. Unfortunately, adding support for specifying both parameters seems to be quite difficult because of how metrics/throughput collection works.
The node taint one is a good idea 👍 but, of course, it requires a modifyNodes op or something like that.
/retest
    countParam: $initPods
    podTemplatePath: config/pod-high-priority-large-cpu.yaml
    skipWaitToCompletion: true
  - opcode: churn
should the churn pods be in the same namespace as the createPods op? I can't remember if that makes a difference in the scheduler
I assume the users can provide a namespace name in the configuration if they think the namespace is correlated.
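For example, something along these lines (the namespace field and template path here are assumptions about the configuration, not something confirmed in this PR):
- opcode: churn
  mode: recreate
  namespace: sched-churn                            # hypothetical value
  podTemplatePath: config/churn/pod-default.yaml    # assumed template path
  intervalMilliseconds: 500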
Follow-up of #98900 (comment): I rewrote this PR to provide different modes (currently recreate and create). @alculquicondor @adtac Please see the updated description section for more details.
    barrierOpcode = "barrier"

    Recreate = "recreate"
these ones deserve a comment
done.
        b.Fatalf("op %d: unable to parse the %v-th template path: %v", opIndex, i, err)
    }
    // Obtain GVR.
    mapping, err := restMapper.RESTMapping(gvk.GroupKind(), gvk.Version)
Interesting. I didn't know this was possible.
    client := clientset.NewForConfigOrDie(cfg)
    dynClient := dynamic.NewForConfigOrDie(cfg)
    cachedClient := cacheddiscovery.NewMemCacheClient(client.Discovery())
    restMapper := restmapper.NewDeferredDiscoveryRESTMapper(cachedClient)
super nit: calculate restMapper outside of this function, as only one test cares about it.
done.
good for squash to me.
Anything to add @adtac?
    barrierOpcode = "barrier"

    // Two modes supported in "churn" operator.
nit: add empty line, as this comment refers to more than one constant
done.
- support two modes: recreate and create
- use DynamicClient to create API objects
/retest
/priority important-soon
/lgtm
/hold for nit since this is approved, feel free to remove and merge
sorry I couldn't review this earlier, I was busy with the code freeze :) we're still within the test freeze, so this PR should be fine for 1.21
@@ -116,7 +116,7 @@ type testConfig struct {

// getBaseConfig returns baseConfig after initializing number of nodes and pods.
func getBaseConfig(nodes int, pods int) *testConfig {
-    destroyFunc, podInformer, clientset := mustSetupScheduler()
+    destroyFunc, podInformer, clientset, _ := mustSetupScheduler()
nit: client
IMO there is not much difference among the names "client", "cs" and "clientset" - they're usually used interchangeably. As this file is deprecated, I left the name as is. In other places that jointly used a clientset and a dynamic clientset, I renamed them to client and dynamicClient.
So I'm going to:
/hold cancel
/milestone v1.21
This PR improves testing, hence it is eligible for the milestone.
/retest
@Huang-Wei: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
What type of PR is this?
/kind feature
/sig scheduling
What this PR does / why we need it:
Introduce a churn operator to the scheduler perf testing framework. The churn operator aims to provide configurable options to continuously churn the cluster by creating/recreating API objects. Roughly, the new fields are:
- churn: the new opcode
- number: the number of API objects the op operates on
- mode: either keep creating API objects (for mode create) or run an infinite create-N-objs-delete-N-objs iteration (for mode recreate)

Which issue(s) this PR fixes:
Fix the 3rd issue of #98898.
Special notes for your reviewer:
Some examples:
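One representative sketch (field names are assumed from the snippets discussed in the review thread and may not match the merged configuration exactly):
- opcode: churn
  mode: recreate
  number: 1
  templatePaths:                        # assumed list-valued field for object templates
  - config/churn/node-default.yaml      # hypothetical template path
  - config/churn/service-default.yaml
  intervalMilliseconds: 1000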
Does this PR introduce a user-facing change?: