
Add scheduler plugin execution duration metric. #84522

Merged: 1 commit merged into kubernetes:master on Nov 12, 2019

Conversation

@liu-cong (Contributor) commented Oct 29, 2019

What type of PR is this?

/kind feature

What this PR does / why we need it:
Helps us understand feature adoption, as well as the performance of each plugin.

Which issue(s) this PR fixes:

Fixes #83416

Special notes for your reviewer:
I compared benchmark results for:
a) the master branch,
b) the master branch with extension point metrics removed,
c) this change, and
d) this change with sampling every 15 scheduling cycles.

a) vs b): b) improves slightly over a) in many cases, but the difference is almost negligible given the noise.
c) vs d): Sampling improves performance in some cases, but not all.

It's surprising that c) is consistently better than a)/b) in test cases #1, #3, and #8, while worse in #2, #4, #5, and #6 and very close in #7. I don't know how these numbers map to real use cases, but given the value of visibility into each plugin, I prefer to get this change out and gather feedback from users.

Benchmark results (ns/op):

| Test # | Test | Master | No metric | Periodic flush | Periodic flush with 1/15 sample |
| ------ | ---- | ------ | --------- | -------------- | ------------------------------- |
| 1 | BenchmarkSchedulingV/5000Nodes/1000Pods-12 | 9237142 | 9229039 | 7761435 | 6754970 |
| 2 | BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12 | 9114743 | 8970976 | 10910383 | 9660827 |
| 3 | BenchmarkSchedulingSecrets/5000Nodes/1000Pods-12 | 8656598 | 10362103 | 8220864 | 7031266 |
| 4 | BenchmarkSchedulingInTreePVs/5000Nodes/1000Pods-12 | 14089110 | 13623460 | 15478250 | 14093959 |
| 5 | BenchmarkSchedulingMigratedInTreePVs/5000Nodes/1000Pods-12 | 12615123 | 12240668 | 14704521 | 13194460 |
| 6 | BenchmarkSchedulingCSIPVs/5000Nodes/1000Pods-12 | 11570522 | 11356818 | 13410923 | 11998329 |
| 7 | BenchmarkSchedulingPodAffinity/5000Nodes/1000Pods-12 | 24290660 | 25244090 | 24926082 | 25813202 |
| 8 | BenchmarkSchedulingNodeAffinity/5000Nodes/1000Pods-12 | 19159768 | 18998646 | 17467853 | 18523843 |

Does this PR introduce a user-facing change?:

Add plugin_execution_duration_seconds metric for scheduler framework plugins.
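For context, a minimal sketch of how a histogram of this shape could be registered and observed with the Prometheus client_golang library. The subsystem, buckets, and label names here are assumptions for illustration, not necessarily the PR's actual registration code.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Sketch of the new histogram; the buckets and label set are assumptions.
var pluginExecutionDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Subsystem: "scheduler",
		Name:      "plugin_execution_duration_seconds",
		Help:      "Duration for running a plugin at a specific extension point.",
		Buckets:   prometheus.ExponentialBuckets(0.00001, 1.5, 20),
	},
	[]string{"plugin", "extension_point", "status"},
)

func main() {
	prometheus.MustRegister(pluginExecutionDuration)

	start := time.Now()
	// ... run a plugin at some extension point ...
	pluginExecutionDuration.
		WithLabelValues("NodeAffinity", "Filter", "Success").
		Observe(time.Since(start).Seconds())
}
```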

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the size/M, do-not-merge/work-in-progress, do-not-merge/release-note-label-needed, cncf-cla: yes, needs-kind, needs-sig, and needs-priority labels on Oct 29, 2019
@k8s-ci-robot k8s-ci-robot requested review from hex108 and k82cn October 29, 2019 14:58
@liu-cong (Contributor, Author) commented:

@ahg-g
Added a latency metric for the Filter and Score plugins for perf testing. I couldn't run the perf tests locally for some reason. While I am trying to fix that, do you mind checking out this diff and running it on your machine?

@k8s-ci-robot added the sig/scheduling label and removed the needs-sig label on Oct 29, 2019
@liu-cong force-pushed the plugin branch 3 times, most recently from aa4f6a8 to 1372a16 on November 11, 2019 14:10
@k8s-ci-robot added the size/L, release-note, and kind/feature labels and removed the size/M, do-not-merge/release-note-label-needed, and needs-kind labels on Nov 11, 2019
@liu-cong (Contributor, Author) commented:

/assign @ahg-g

@liu-cong liu-cong marked this pull request as ready for review November 11, 2019 14:37
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Nov 11, 2019
pkg/scheduler/framework/v1alpha1/framework.go (outdated review thread, resolved)
pkg/scheduler/framework/v1alpha1/metrics_recorder.go (outdated review thread, resolved)
```go
}

// metricRecorder records framework metrics in a separate goroutine to avoid overhead in the critical path.
type realMetricsRecorder struct {
```
Review comment from a Member on the hunk above:

unit tests?

Reply from the author (@liu-cong):

The behavior is tested in framework_test.go, as we verify that metrics are recorded successfully. Or are you asking about specific test cases for this struct only?
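To make the design in the quoted snippet concrete, here is a self-contained sketch of a channel-based recorder along those lines; the names (pluginSample, metricsRecorder, newMetricsRecorder) are illustrative assumptions, not the PR's actual identifiers.

```go
package main

import (
	"fmt"
	"time"
)

// pluginSample is one observed plugin execution (illustrative).
type pluginSample struct {
	plugin         string
	extensionPoint string
	duration       time.Duration
}

// metricsRecorder buffers samples in a channel and flushes them from a
// dedicated goroutine, keeping metric bookkeeping off the scheduling hot path.
type metricsRecorder struct {
	samples chan pluginSample
}

func newMetricsRecorder(bufferSize int) *metricsRecorder {
	r := &metricsRecorder{samples: make(chan pluginSample, bufferSize)}
	go r.run()
	return r
}

// observe enqueues a sample without ever blocking the caller; if the buffer
// is full, the sample is dropped rather than stalling a scheduling cycle.
func (r *metricsRecorder) observe(s pluginSample) {
	select {
	case r.samples <- s:
	default: // buffer full: drop the sample
	}
}

// run drains the channel; a real recorder would call a Prometheus
// histogram's Observe here instead of printing.
func (r *metricsRecorder) run() {
	for s := range r.samples {
		fmt.Printf("%s/%s: %v\n", s.extensionPoint, s.plugin, s.duration)
	}
}

func main() {
	rec := newMetricsRecorder(100)
	rec.observe(pluginSample{"NodeAffinity", "Filter", 42 * time.Microsecond})
	time.Sleep(10 * time.Millisecond) // give the goroutine time to flush
}
```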

@ahg-g (Member) commented Nov 11, 2019

Here are the benchmark numbers I am getting. The first test is likely one of the most common workload scenarios, and we observe a roughly 20% perf drop:

Without the change:

BenchmarkSchedulingV/5000Nodes/1000Pods-12          1000           2237820 ns/op
BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12                    1000           5520113 ns/op
BenchmarkSchedulingSecrets/5000Nodes/1000Pods-12                            1000           2506520 ns/op
BenchmarkSchedulingInTreePVs/5000Nodes/1000Pods-12                          1000           6115268 ns/op
BenchmarkSchedulingMigratedInTreePVs/5000Nodes/1000Pods-12                  1000           6025317 ns/op
BenchmarkSchedulingCSIPVs/5000Nodes/1000Pods-12                             1000           6197104 ns/op
BenchmarkSchedulingPodAffinity/5000Nodes/1000Pods-12                        1000           7251515 ns/op
BenchmarkSchedulingNodeAffinity/5000Nodes/1000Pods-12                       1000           3317381 ns/op

With the change:

BenchmarkSchedulingV/5000Nodes/1000Pods-12          1000           2724880 ns/op
BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12                    1000           5885653 ns/op
BenchmarkSchedulingSecrets/5000Nodes/1000Pods-12                            1000           2858365 ns/op
BenchmarkSchedulingInTreePVs/5000Nodes/1000Pods-12                          1000           6258707 ns/op
BenchmarkSchedulingMigratedInTreePVs/5000Nodes/1000Pods-12                  1000           6306560 ns/op
BenchmarkSchedulingCSIPVs/5000Nodes/1000Pods-12                             1000           6346709 ns/op
BenchmarkSchedulingPodAffinity/5000Nodes/1000Pods-12                        1000           7600392 ns/op
BenchmarkSchedulingNodeAffinity/5000Nodes/1000Pods-12                       1000           3556736 ns/op

To reduce the overhead further, we can sample reading the time and pushing to the metricsRecorder channel: either we measure all plugins in a specific Run* invocation, or we do nothing.
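As an illustration of that sampling idea, a minimal Go sketch assuming a hypothetical 1-in-15 rate; the names here (durationSampleRate, shouldRecordCycleMetrics) are invented for the example, though the final commit message below does mention using rand.Intn.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Roughly 1 in 15 scheduling cycles records plugin durations.
const durationSampleRate = 15

// Decided once at the start of a scheduling cycle, so a single Run*
// invocation either measures every plugin or skips measurement entirely.
func shouldRecordCycleMetrics() bool {
	return rand.Intn(durationSampleRate) == 0
}

func main() {
	for cycle := 0; cycle < 5; cycle++ {
		record := shouldRecordCycleMetrics()
		start := time.Now()
		// ... run all Filter/Score plugins for this cycle ...
		if record {
			fmt.Printf("cycle %d: plugins took %v\n", cycle, time.Since(start))
		}
	}
}
```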

@liu-cong (Contributor, Author) commented:

I updated to the latest master and here are the new results:

make test-integration WHAT=./test/integration/scheduler_perf KUBE_TEST_VMODULE="''" KUBE_TEST_ARGS="-alsologtostderr=false -logtostderr=false -run=^$$ -bench=BenchmarkScheduling/5000Nodes/1000Pods -cpuprofile ~/cpu-master.out" |grep Benchmark

Master:
BenchmarkSchedulingV/5000Nodes/1000Pods-12 1000 4900572 ns/op
BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12 1000 5321680 ns/op
BenchmarkSchedulingSecrets/5000Nodes/1000Pods-12 1000 4943341 ns/op
BenchmarkSchedulingInTreePVs/5000Nodes/1000Pods-12 1000 4078854 ns/op
BenchmarkSchedulingMigratedInTreePVs/5000Nodes/1000Pods-12 1000 4590515 ns/op
BenchmarkSchedulingCSIPVs/5000Nodes/1000Pods-12 1000 5134987 ns/op
BenchmarkSchedulingPodAffinity/5000Nodes/1000Pods-12 1000 12269336 ns/op
BenchmarkSchedulingNodeAffinity/5000Nodes/1000Pods-12 1000 6553250 ns/op

With the change:
BenchmarkSchedulingV/5000Nodes/1000Pods-12 1000 4119963 ns/op
BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12 1000 6134407 ns/op
BenchmarkSchedulingSecrets/5000Nodes/1000Pods-12 1000 4521283 ns/op
BenchmarkSchedulingInTreePVs/5000Nodes/1000Pods-12 1000 4110121 ns/op
BenchmarkSchedulingMigratedInTreePVs/5000Nodes/1000Pods-12 1000 4280762 ns/op
BenchmarkSchedulingCSIPVs/5000Nodes/1000Pods-12 1000 3930512 ns/op
BenchmarkSchedulingPodAffinity/5000Nodes/1000Pods-12 1000 12414046 ns/op
BenchmarkSchedulingNodeAffinity/5000Nodes/1000Pods-12 1000 6739703 ns/op

I am still seeing better results with this change for the first test case, which is strange. At this point I trust your results more than mine... @ahg-g, can you please take the latest commit with sampling and run it again?

@ahg-g (Member) commented Nov 11, 2019

With the latest patch:

BenchmarkSchedulingV/5000Nodes/1000Pods-12          1000           2417580 ns/op
BenchmarkSchedulingPodAntiAffinity/5000Nodes/1000Pods-12                    1000           5496006 ns/op
BenchmarkSchedulingSecrets/5000Nodes/1000Pods-12                            1000           2535986 ns/op
BenchmarkSchedulingInTreePVs/5000Nodes/1000Pods-12                          1000           6165100 ns/op
BenchmarkSchedulingMigratedInTreePVs/5000Nodes/1000Pods-12                  1000           6099270 ns/op
BenchmarkSchedulingCSIPVs/5000Nodes/1000Pods-12                             1000           6150523 ns/op
BenchmarkSchedulingPodAffinity/5000Nodes/1000Pods-12                        1000           7309418 ns/op
BenchmarkSchedulingNodeAffinity/5000Nodes/1000Pods-12                       1000           3344050 ns/op

@k8s-ci-robot added the size/XL label and removed the size/L label on Nov 12, 2019
@ahg-g (Member) left a review:

This is much better; the overhead is negligible.

pkg/scheduler/framework/v1alpha1/framework.go (outdated review thread, resolved)
@ahg-g (Member) commented Nov 12, 2019

/approve
please squash

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, liu-cong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Nov 12, 2019
@ahg-g (Member) commented Nov 12, 2019

/retest

Commits:
- Address comments
- Sample metrics
- Use rand.Intn and some cleanup
@ahg-g (Member) commented Nov 12, 2019

/lgtm

@k8s-ci-robot added the lgtm label on Nov 12, 2019
@liu-cong (Contributor, Author) commented:

/retest

(2 similar /retest comments followed)

@k8s-ci-robot k8s-ci-robot merged commit ce11622 into kubernetes:master Nov 12, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Nov 12, 2019
Linked issue closed by this PR: Add metrics to track requests and durations for the scheduler framework (#83416)