-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Node Conformance Test: Containerize the node e2e test #31093
Node Conformance Test: Containerize the node e2e test #31093
Conversation
|
||
# The following environment variables can be override when starting the container. | ||
# FOCUS is regex matching test to run. By default run all conformance test. | ||
ENV FOCUS="\[Conformance\]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To save some layers, you can combine all ENV statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luxas Thanks, good to know that! Will do.
fd046a8
to
c87e7f0
Compare
@luxas Thanks for reviewing! Addressed your comments. :) |
c87e7f0
to
4d162b0
Compare
4d162b0
to
ca928aa
Compare
@timstclair @dchen1107 @vishh Can any of you take a look at this PR? :) |
Reviewing now |
434bbda
to
255eb61
Compare
# MANIFEST_PATH is the kubelet manifest path in the container. | ||
# FLAKE_ATTEMPTS is the time to retry when there is a test failure. By default 2. | ||
# TEST_ARGS is the test arguments passed into the test. | ||
ENV FOCUS="\[Conformance\]" SKIP="" PARALLELISM=8 REPORT_PATH="/var/result" MANIFEST_PATH="/etc/manifest" FLAKE_ATTEMPTS=2 TEST_ARGS="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There aren't currently any serial conformance tests, but I think we should either skip serial, or run serially. Since flakes are more likely with parallel tests, I wonder if running serially would be better anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only problem is that running serially is too slow. :(
I'll make it skip serial and flaky test now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW there are some [Conformance]
tests under SchedulerPredicates [Serial]
, so the only way to catch everything conformance-related in one single e2e run is running in serial
But I agree in practice parallel gives signal much faster, and the serial tests are really just refinement
# MANIFEST_PATH is the kubelet manifest path in the container. | ||
# FLAKE_ATTEMPTS is the time to retry when there is a test failure. By default 2. | ||
# TEST_ARGS is the test arguments passed into the test. | ||
ENV FOCUS="\[Conformance\]" SKIP="" PARALLELISM=8 REPORT_PATH="/var/result" MANIFEST_PATH="/etc/manifest" FLAKE_ATTEMPTS=2 TEST_ARGS="" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: style seems to put each var on a separate ENV
line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timstclair Each ENV
is a separate image layer, I put them in one line intentionally to avoid extra layers #31093 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good to know, thanks. Consider putting 1 var per line then? E.g.
ENV FOCUS=... \
SKIP=... \
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
# TEST_ARGS is the test arguments passed into the test. | ||
ENV FOCUS="\[Conformance\]" SKIP="" PARALLELISM=8 REPORT_PATH="/var/result" MANIFEST_PATH="/etc/manifest" FLAKE_ATTEMPTS=2 TEST_ARGS="" | ||
|
||
ENTRYPOINT ginkgo --focus="$FOCUS" --skip="$SKIP" --nodes=$PARALLELISM --flakeAttempts=$FLAKE_ATTEMPTS /usr/local/bin/e2e_node.test -- --conformance=true --prepull-images=false --manifest-path=$MANIFEST_PATH --report-dir=$REPORT_PATH $TEST_ARGS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: quote $MANIFEST_PATH
and $REPORT_PATH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
endif | ||
ifeq ($(ARCH),ppc64le) | ||
BASEIMAGE?=ppc64le/debian:jessie | ||
endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional: Cleaner to do:
BASEIMAGE_amd64=debian:jessie
BASEIMAGE_arm=armel/debian:jessie
BASEIMAGE_arm64=aarch64/debian:jessie
BASEIMAGE_ppc64le=ppc64le/debian:jessie
BASEIMAGE?=${BASEIMAGE_${ARCH}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Thanks!
ARCH=${ARCH:-"amd64"} | ||
|
||
# VERSION is the version of the test container image. | ||
VERSION=${VERSION:-"v0.1"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider dropping the v
from the version (it doesn't conform to the semantic versioning standard)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
# * kubelet manifest path is mounted to /etc/manifest; | ||
# * log collect directory is mounted to /var/result; | ||
# * root file system is mounted to /rootfs. | ||
sudo docker run -it --rm --privileged=true --net=host -v /:/rootfs \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why -it
? Is the test interactive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need -it
so that we can use Ctrl-C
to stop the test at any test.
pod_cidr=10.180.0.0/24 | ||
log_level=4 | ||
start_kubelet --api-servers $apiserver \ | ||
--hostname-override=$(hostname) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Isn't this the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
@@ -172,12 +186,11 @@ func validateSystem() error { | |||
if err != nil { | |||
return fmt.Errorf("can't get current binary: %v", err) | |||
} | |||
// TODO(random-liu): Remove sudo in containerize PR. | |||
output, err := exec.Command("sudo", testBin, "--system-validate-mode").CombinedOutput() | |||
output, err := exec.Command(testBin, append([]string{"--system-validate-mode"}, os.Args[1:]...)...).CombinedOutput() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you intentionally including all the flags here? If yes, add a comment. Otherwise, use flag.Args()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I want to make sure that all processes see the same flag set, or else it confuses people sometimes.
@@ -352,8 +358,20 @@ func getSshCommand(sep string, args ...string) string { | |||
return fmt.Sprintf("'%s'", strings.Join(args, sep)) | |||
} | |||
|
|||
// Ssh executes ssh command with runSshCommand as root. The `sudo` makes sure that all commands | |||
// are executed by root, so that there won't be permission mismatch between different commands. | |||
func Ssh(host string, cmd ...string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/Ssh/SSH/ (same below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
"--logtostderr", | ||
"--vmodule=*="+LOG_VERBOSITY_LEVEL, | ||
) | ||
startCmd := exec.Command(testBin, append([]string{"--run-services-mode"}, os.Args[1:]...)...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto re:flags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
d5f75a4
to
488e260
Compare
@timstclair Addressed comments. Will squash before merging. |
LGTM. You can self-apply after squash. |
Remove |
Jenkins GCI GKE smoke e2e failed for commit 488e260c236703054d05031cfe3086bd383c06bb. Full PR test history. The magic incantation to run this job again is |
Good catch. And I also need to update the node conformance test document http://kubernetes.io/docs/admin/node-conformance/. |
488e260
to
9345e12
Compare
Squashed, will self apply LGTM based on #31093 (comment), after all tests pass. |
Apply LGTM based on #31093 (comment). |
Great to see this is getting in 👍 |
Jenkins GCE e2e failed for commit 9345e12. Full PR test history. The magic incantation to run this job again is |
@k8s-bot cvm gce e2e test this kubernetes/test-infra#937. |
Jenkins GCE etcd3 e2e failed for commit 9345e12. Full PR test history. The magic incantation to run this job again is |
@k8s-bot gce etcd3 e2e test this |
Automatic merge from submit-queue |
…r-node-e2e Automatic merge from submit-queue Add separate build process for node test. This PR is part of kubernetes#31093. However, because currently node e2e is built on `KUBE_TEST_PLATFORMS`, which includes linux/amd64, darwin/amd64, windows/amd64 and linux/arm, it caused kubernetes#32251 to fail. In fact, node e2e is running on the same node with kubelet, and it also has built-in apiserver, etcd and namespace controller. All of them are only built on `KUBE_SERVER_PLATFORMS`, so node e2e should also only be built on those platforms. ``` KUBE_SERVER_PLATFORMS=( linux/amd64 linux/arm linux/arm64 ) ``` This PR added a separate build process for node e2e to address this. @vishh Do you need this for v1.4? because this blocks your kubernetes#32251. /cc @dchen1107 (cherry picked from commit dae3bdd)
For #30122, #30174.
Based on #32427, #32454.
Please only review the last 3 commits.
This PR packages the node e2e test into a docker image:
NodeConformance
flag in the node e2e framework to avoid starting kubelet and collecting system logs. We do this because:sudo
in the test container. We do this because:sudo
command, and there is no need to usesudo
inside the container.sudo
inside the test. (e2e_node tests leak kube-apiserver processes #29211, Node e2e framework killing issues #26748) In fact we just need to run the test suite withsudo
.Makefile
andDockerfile
. We also added arun_test.sh
script to start kubelet and run the test container. The script is only for demonstration purpose and we'll also use the script in our node e2e framework. In the future, we should update the script to start kubelet in production way (maybe withsystemd
orsupervisord
).@dchen1107 @vishh
/cc @kubernetes/sig-node @kubernetes/sig-testing
This change is
Release note: