Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node Conformance Test: Containerize the node e2e test #31093

Merged

Conversation

Random-Liu
Copy link
Member

@Random-Liu Random-Liu commented Aug 22, 2016

For #30122, #30174.
Based on #32427, #32454.

Please only review the last 3 commits.

This PR packages the node e2e test into a docker image:

  • 1st commit: Add NodeConformance flag in the node e2e framework to avoid starting kubelet and collecting system logs. We do this because:
    • There are all kinds of ways to manage kubelet and system logs, for different situation we need to mount different things into the container, run different commands. It is hard and unnecessary to handle the complexity inside the test suite.
  • 2nd commit: Remove all sudo in the test container. We do this because:
  • 3rd commit: Package the test into a docker container with corresponding Makefile and Dockerfile. We also added a run_test.sh script to start kubelet and run the test container. The script is only for demonstration purpose and we'll also use the script in our node e2e framework. In the future, we should update the script to start kubelet in production way (maybe with systemd or supervisord).

@dchen1107 @vishh
/cc @kubernetes/sig-node @kubernetes/sig-testing


This change is Reviewable

Release note:

Release alpha version node test container gcr.io/google_containers/node-test-ARCH:0.2 for users to verify their node setup.

@Random-Liu Random-Liu added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. area/node-e2e labels Aug 22, 2016
@Random-Liu Random-Liu added this to the v1.4 milestone Aug 22, 2016
@Random-Liu Random-Liu added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Aug 22, 2016
@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. release-note-label-needed labels Aug 22, 2016

# The following environment variables can be override when starting the container.
# FOCUS is regex matching test to run. By default run all conformance test.
ENV FOCUS="\[Conformance\]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To save some layers, you can combine all ENV statements

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luxas Thanks, good to know that! Will do.

@Random-Liu Random-Liu force-pushed the containerize-node-e2e-test branch from fd046a8 to c87e7f0 Compare August 22, 2016 21:32
@Random-Liu
Copy link
Member Author

@luxas Thanks for reviewing! Addressed your comments. :)

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 23, 2016
@Random-Liu Random-Liu force-pushed the containerize-node-e2e-test branch from c87e7f0 to 4d162b0 Compare August 25, 2016 08:14
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2016
@Random-Liu Random-Liu force-pushed the containerize-node-e2e-test branch from 4d162b0 to ca928aa Compare August 25, 2016 21:13
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 25, 2016
@Random-Liu
Copy link
Member Author

Random-Liu commented Aug 26, 2016

@timstclair @dchen1107 @vishh Can any of you take a look at this PR? :)

@vishh
Copy link
Contributor

vishh commented Aug 26, 2016

Reviewing now

@Random-Liu Random-Liu force-pushed the containerize-node-e2e-test branch from 434bbda to 255eb61 Compare November 7, 2016 04:18
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 7, 2016
# MANIFEST_PATH is the kubelet manifest path in the container.
# FLAKE_ATTEMPTS is the time to retry when there is a test failure. By default 2.
# TEST_ARGS is the test arguments passed into the test.
ENV FOCUS="\[Conformance\]" SKIP="" PARALLELISM=8 REPORT_PATH="/var/result" MANIFEST_PATH="/etc/manifest" FLAKE_ATTEMPTS=2 TEST_ARGS=""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There aren't currently any serial conformance tests, but I think we should either skip serial, or run serially. Since flakes are more likely with parallel tests, I wonder if running serially would be better anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only problem is that running serially is too slow. :(
I'll make it skip serial and flaky test now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW there are some [Conformance] tests under SchedulerPredicates [Serial], so the only way to catch everything conformance-related in one single e2e run is running in serial

But I agree in practice parallel gives signal much faster, and the serial tests are really just refinement

# MANIFEST_PATH is the kubelet manifest path in the container.
# FLAKE_ATTEMPTS is the time to retry when there is a test failure. By default 2.
# TEST_ARGS is the test arguments passed into the test.
ENV FOCUS="\[Conformance\]" SKIP="" PARALLELISM=8 REPORT_PATH="/var/result" MANIFEST_PATH="/etc/manifest" FLAKE_ATTEMPTS=2 TEST_ARGS=""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: style seems to put each var on a separate ENV line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timstclair Each ENV is a separate image layer, I put them in one line intentionally to avoid extra layers #31093 (comment).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good to know, thanks. Consider putting 1 var per line then? E.g.

ENV FOCUS=... \
    SKIP=... \
    ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

# TEST_ARGS is the test arguments passed into the test.
ENV FOCUS="\[Conformance\]" SKIP="" PARALLELISM=8 REPORT_PATH="/var/result" MANIFEST_PATH="/etc/manifest" FLAKE_ATTEMPTS=2 TEST_ARGS=""

ENTRYPOINT ginkgo --focus="$FOCUS" --skip="$SKIP" --nodes=$PARALLELISM --flakeAttempts=$FLAKE_ATTEMPTS /usr/local/bin/e2e_node.test -- --conformance=true --prepull-images=false --manifest-path=$MANIFEST_PATH --report-dir=$REPORT_PATH $TEST_ARGS

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: quote $MANIFEST_PATH and $REPORT_PATH

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

endif
ifeq ($(ARCH),ppc64le)
BASEIMAGE?=ppc64le/debian:jessie
endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: Cleaner to do:

BASEIMAGE_amd64=debian:jessie
BASEIMAGE_arm=armel/debian:jessie
BASEIMAGE_arm64=aarch64/debian:jessie
BASEIMAGE_ppc64le=ppc64le/debian:jessie

BASEIMAGE?=${BASEIMAGE_${ARCH}}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Thanks!

ARCH=${ARCH:-"amd64"}

# VERSION is the version of the test container image.
VERSION=${VERSION:-"v0.1"}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider dropping the v from the version (it doesn't conform to the semantic versioning standard)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

# * kubelet manifest path is mounted to /etc/manifest;
# * log collect directory is mounted to /var/result;
# * root file system is mounted to /rootfs.
sudo docker run -it --rm --privileged=true --net=host -v /:/rootfs \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why -it ? Is the test interactive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need -it so that we can use Ctrl-C to stop the test at any test.

pod_cidr=10.180.0.0/24
log_level=4
start_kubelet --api-servers $apiserver \
--hostname-override=$(hostname) \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Isn't this the default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

@@ -172,12 +186,11 @@ func validateSystem() error {
if err != nil {
return fmt.Errorf("can't get current binary: %v", err)
}
// TODO(random-liu): Remove sudo in containerize PR.
output, err := exec.Command("sudo", testBin, "--system-validate-mode").CombinedOutput()
output, err := exec.Command(testBin, append([]string{"--system-validate-mode"}, os.Args[1:]...)...).CombinedOutput()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you intentionally including all the flags here? If yes, add a comment. Otherwise, use flag.Args()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I want to make sure that all processes see the same flag set, or else it confuses people sometimes.

@@ -352,8 +358,20 @@ func getSshCommand(sep string, args ...string) string {
return fmt.Sprintf("'%s'", strings.Join(args, sep))
}

// Ssh executes ssh command with runSshCommand as root. The `sudo` makes sure that all commands
// are executed by root, so that there won't be permission mismatch between different commands.
func Ssh(host string, cmd ...string) (string, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/Ssh/SSH/ (same below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

"--logtostderr",
"--vmodule=*="+LOG_VERBOSITY_LEVEL,
)
startCmd := exec.Command(testBin, append([]string{"--run-services-mode"}, os.Args[1:]...)...)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto re:flags

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

@Random-Liu Random-Liu force-pushed the containerize-node-e2e-test branch 2 times, most recently from d5f75a4 to 488e260 Compare November 7, 2016 22:59
@Random-Liu
Copy link
Member Author

@timstclair Addressed comments. Will squash before merging.

@timstclair
Copy link

LGTM. You can self-apply after squash.

@timstclair
Copy link

Remove v from container version in release note.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GKE smoke e2e failed for commit 488e260c236703054d05031cfe3086bd383c06bb. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@Random-Liu
Copy link
Member Author

Remove v from container version in release note.

Good catch. And I also need to update the node conformance test document http://kubernetes.io/docs/admin/node-conformance/.

@Random-Liu Random-Liu force-pushed the containerize-node-e2e-test branch from 488e260 to 9345e12 Compare November 7, 2016 23:28
@Random-Liu
Copy link
Member Author

Random-Liu commented Nov 7, 2016

Squashed, will self apply LGTM based on #31093 (comment), after all tests pass.

@Random-Liu Random-Liu added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2016
@Random-Liu
Copy link
Member Author

Apply LGTM based on #31093 (comment).

@luxas
Copy link
Member

luxas commented Nov 8, 2016

Great to see this is getting in 👍

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE e2e failed for commit 9345e12. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@Random-Liu
Copy link
Member Author

@k8s-bot cvm gce e2e test this kubernetes/test-infra#937.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE etcd3 e2e failed for commit 9345e12. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@Random-Liu
Copy link
Member Author

@k8s-bot gce etcd3 e2e test this

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 0df6384 into kubernetes:master Nov 8, 2016
@Random-Liu Random-Liu deleted the containerize-node-e2e-test branch November 8, 2016 08:16
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…r-node-e2e

Automatic merge from submit-queue

Add separate build process for node test.

This PR is part of kubernetes#31093. However, because currently node e2e is built on `KUBE_TEST_PLATFORMS`, which includes linux/amd64, darwin/amd64, windows/amd64 and linux/arm, it caused kubernetes#32251 to fail.

In fact, node e2e is running on the same node with kubelet, and it also has built-in apiserver, etcd and namespace controller. All of them are only built on `KUBE_SERVER_PLATFORMS`, so node e2e should also only be built on those platforms.
```
KUBE_SERVER_PLATFORMS=(
     linux/amd64
     linux/arm
     linux/arm64
   )
```

This PR added a separate build process for node e2e to address this.

@vishh Do you need this for v1.4? because this blocks your kubernetes#32251. /cc @dchen1107
(cherry picked from commit dae3bdd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/node-e2e area/test lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants