Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add density (batch pods creation latency and resource) and resource performance tests to `test-e2e-node' #29764

Merged
merged 4 commits into from
Aug 3, 2016
Merged

Conversation

coufon
Copy link
Contributor

@coufon coufon commented Jul 29, 2016

This PR contains two new tests (migrate from e2e test):

  1. Density test: verify startup latency and resource usage when create a batch of pod with throughput control. Throughput control is done by sleep for an interval between firing concurrently create pod operations.
    It tests both batch creation and sequential (back-to-back) creation and report the throughputs.
  2. Verify resource usage of steady state kubelet.

The test creates a new resource controller for `test-node-e2e' (resource_controller.go) which monitors resource through a standalone Cadvisor pod (port 8090) with 1s housekeeping interval.

@coufon
Copy link
Contributor Author

coufon commented Jul 29, 2016

@Random-Liu
Copy link
Member

/cc @kubernetes/sig-node

@yujuhong
Copy link
Contributor

@coufon, there are many commits in your PR. Could you squash them and only leave functionally meaningful commits? Thanks!

@yujuhong yujuhong added sig/node Categorizes an issue or PR as relevant to SIG Node. area/test labels Jul 29, 2016
@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Jul 29, 2016
@Random-Liu Random-Liu added release-note Denotes a PR that will be considered when it comes time to generate release notes. area/node-e2e and removed release-note-label-needed labels Jul 29, 2016
@Random-Liu Random-Liu assigned dchen1107 and yujuhong and unassigned timstclair Jul 29, 2016
@timothysc
Copy link
Member

/cc @rrati @mffiedler @jeremyeder - add to node-vertical.

@@ -251,6 +251,9 @@ func (es *e2eService) startKubeletServer() (*killCmd, error) {
"--file-check-frequency", "10s", // Check file frequently so tests won't wait too long
"--v", LOG_VERBOSITY_LEVEL, "--logtostderr",
"--pod-cidr=10.180.0.0/24", // Assign a fixed CIDR to the node because there is no node controller.
"--cgroup-root=/",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work with systemd node.

cc/ @yifan-gu on CoreOS configuration, and cc/ @derekwaynecarr on the rest based on "standardized" systemd node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag (and the other cgroup related flags) should not be explicitly set. Just take default kubelet behaviors.

@dchen1107
Copy link
Member

Looks good to me overall. We need to figure out how to pass those different cgroup configurations though.

Copyright 2016 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boilerplate is wrong. If you compare this with other files in the repository, you'll notice that an empty line missing before an after the URL. They probably got lost during copy/paste. Need to fix that to pass the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

densityTests := []DensityTest{
{
podsNr: 10,
interval: 0 * time.Millisecond,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer defining the pod startup throughput (pods per second) and convert it into an interval. Pod throughput is more readable.

Copy link
Contributor Author

@coufon coufon Jul 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the `throughput' in configuration may be different from the throughput measured from result. For example, if the interval is zero, the throughput perhaps is infinite. In this case how to define it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was a naming issue, just name it more precisely. E.g., targetThroughput

I am not sure we should have a 0 interval anyway...

@dchen1107
Copy link
Member

dchen1107 commented Aug 2, 2016

The build is failed with the error:

# k8s.io/kubernetes/test/e2e_node
test/e2e_node/resource_controller.go:372: undefined: "k8s.io/kubernetes/pkg/util".NewUUID
!!! Error in /go/src/k8s.io/kubernetes/hack/lib/golang.sh:542

Forgot uploading some changes?

@dchen1107
Copy link
Member

LGTM

@dchen1107 dchen1107 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2016
@k8s-bot
Copy link

k8s-bot commented Aug 3, 2016

GCE e2e build/test passed for commit 04f83c7.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 371c396 into kubernetes:master Aug 3, 2016
@gmarek
Copy link
Contributor

gmarek commented Aug 3, 2016

@lavalamp @fejta - it seems that tests weren't run even though retest-not-required label is not present (or I'm missing something)

@gmarek
Copy link
Contributor

gmarek commented Aug 3, 2016

I believe it broke k8s build. Reverting - I'll re-revert if it won't help. What's strange is that the build works locally for me... @fejta - it may be some problem with our build system

@gmarek
Copy link
Contributor

gmarek commented Aug 3, 2016

Revert fixed the build. @coufon @dchen1107 PTAL.

@lavalamp
Copy link
Member

lavalamp commented Aug 3, 2016

@apelisse or @foxish can one of you investigate what happened here with the queue?

@foxish
Copy link
Contributor

foxish commented Aug 3, 2016

04:43:13.000
I0803 11:43:13.600169 1 submit-queue.go:1149] Skipping retest since head and base sha match previous attempt!
04:43:14.000
I0803 11:43:14.087789 1 github.go:1425] Merging PR# 29764

Looks like it skipped re-test because tests already succeeded at that SHA.

@ixdy
Copy link
Member

ixdy commented Aug 3, 2016

I think the issue is that this PR only builds on linux. e.g. https://github.com/opencontainers/runc/blob/519529febe2e0acb14e39cf54112c292ccb2eabe/libcontainer/cgroups/utils.go, which defines ParseCgroupFile, tags +build linux.

To make SQ faster, we only build for Linux in PR Jenkins. So it passed there.

k8s-github-robot pushed a commit that referenced this pull request Aug 4, 2016
Automatic merge from submit-queue

Resolve docker-daemon cgroup issue for both systemd and non-systemd node for node e2e tests

Fixed #29827

cc/ @coufon this should unblock your pr: #29764

I validated both containervm image and coreos image, and works as expected. 

This is also required for adding gci image to node e2e test infrastructure.
k8s-github-robot pushed a commit that referenced this pull request Aug 5, 2016
Automatic merge from submit-queue

Add density (batch pods creation latency and resource) and resource performance tests to `test-e2e-node' built for Linux only

This PR adds `+build linux' to density_test.go, resource_usage.go and resource_collector.go to last PR #29764. 

#29764 fails build because it depends on cgroup which can not be built for os other than Linux.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/node-e2e area/test lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.