Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node Conformance Test: Add system verification #32427

Merged
merged 3 commits into from
Nov 7, 2016

Conversation

Random-Liu
Copy link
Member

@Random-Liu Random-Liu commented Sep 9, 2016

For #30122 and #29081.

This PR introduces system verification test in node e2e and conformance test. It will run before the real test. Once the system verification fails, the test will just fail. The output of the system verification is like this:

I0909 23:33:20.622122    2717 validators.go:45] Validating os...
OS: Linux
I0909 23:33:20.623274    2717 validators.go:45] Validating kernel...
I0909 23:33:20.624037    2717 kernel_validator.go:79] Validating kernel version
KERNEL_VERSION: 3.16.0-4-amd64
I0909 23:33:20.624146    2717 kernel_validator.go:93] Validating kernel config
CONFIG_NAMESPACES: enabled
CONFIG_NET_NS: enabled
CONFIG_PID_NS: enabled
CONFIG_IPC_NS: enabled
CONFIG_UTS_NS: enabled
CONFIG_CGROUPS: enabled
CONFIG_CGROUP_CPUACCT: enabled
CONFIG_CGROUP_DEVICE: enabled
CONFIG_CGROUP_FREEZER: enabled
CONFIG_CGROUP_SCHED: enabled
CONFIG_CPUSETS: enabled
CONFIG_MEMCG: enabled
I0909 23:33:20.679328    2717 validators.go:45] Validating cgroups...
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
I0909 23:33:20.679454    2717 validators.go:45] Validating docker...
DOCKER_GRAPH_DRIVER: aufs

It verifies the system following a predefined SysSpec:

// DefaultSysSpec is the default SysSpec.
 var DefaultSysSpec = SysSpec{
    OS:            "Linux",
    KernelVersion: []string{`3\.[1-9][0-9].*`, `4\..*`}, // Requires 3.10+ or 4+
    // TODO(random-liu): Add more config
    KernelConfig: KernelConfig{
        Required: []string{
            "NAMESPACES", "NET_NS", "PID_NS", "IPC_NS", "UTS_NS",
            "CGROUPS", "CGROUP_CPUACCT", "CGROUP_DEVICE", "CGROUP_FREEZER",
            "CGROUP_SCHED", "CPUSETS", "MEMCG",
        },
        Forbidden: []string{},
    },
    Cgroups: []string{"cpu", "cpuacct", "cpuset", "devices", "freezer", "memory"},
    RuntimeSpec: RuntimeSpec{
        DockerSpec: &DockerSpec{
            Version: []string{`1\.(9|\d{2,})\..*`}, // Requires 1.9+
            GraphDriver: []string{"aufs", "overlay", "devicemapper"},
        },
    },
 }

Currently, it only supports:

  • Kernel validation: version validation and kernel configuration validation
  • Cgroup validation: validating whether required cgroups subsystems are enabled.
  • Runtime Validation: currently, only validates docker graph driver.

The validating framework is ready. The specific validation items could be added over time.

@dchen1107
/cc @kubernetes/sig-node


This change is Reviewable

@Random-Liu Random-Liu added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. area/node-e2e and removed cla: yes labels Sep 9, 2016
@Random-Liu Random-Liu added this to the v1.5 milestone Sep 9, 2016
@Random-Liu Random-Liu added the release-note-none Denotes a PR that doesn't merit a release note. label Sep 10, 2016
@dchen1107
Copy link
Member

Please remove the dependency of #31093 from this one. I think this one should be in first before the packaging solution.

@k8s-github-robot k8s-github-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 10, 2016
@Random-Liu
Copy link
Member Author

@dchen1107 Removed the dependency.

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 10, 2016
@Random-Liu Random-Liu force-pushed the system-verification branch 4 times, most recently from 106128f to ee5aa9e Compare September 11, 2016 06:13
@Random-Liu
Copy link
Member Author

@k8s-bot test this issue #IGNORE.

Cgroups: []string{"cpu", "cpuacct", "cpuset", "devices", "freezer", "memory"},
RuntimeSpec: RuntimeSpec{
DockerSpec: &DockerSpec{
GraphDriver: []string{"aufs", "overlay", "overlay2", "devicemapper"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we validate overlay2 in any format. It might work, but we need to kick out the qualification process before we claim we support that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just realized that when overlayfs moved to kernel tree, the upstream called it overlay2. What you are referring to graphdrivers specifically named by docker.

From https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/ it says "For the overlay2 driver, the version of your kernel must be 4.0 or newer" which should be fine on GCI & CoreOS cases (cc/ @euank @justinsb @philips). But should we tight our validation here against kernel version too?

Also I just realized that we don't know how the docker upgrade will work from 'overlay' to 'overlay2'. We didn't validate that part when we validate docker 1.12.

@Random-Liu
Copy link
Member Author

Random-Liu commented Sep 13, 2016

@dchen1107 Addressed comments.

  • Added docker version check (>= 1.9.*)
  • Removed overlay2 from graph driver list and add a TODO to validate overlay2.

for _, v := range validators {
glog.Infof("Validating %s...", v.Name())
err := v.Validate(spec)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: errors.NewAggregate() actually checks to see if the errors in the list are nil or not, so you don't have to check here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

@errordeveloper
Copy link
Member

As we'd like to use this in kubeadm, it'd be great if output was a little prettier! I'd be happy to help, if you want 👍

@dchen1107
Copy link
Member

Here are couple of things missing in current validation tests as the minimal requirements for kernel:

  1. Network:
  • CONFIG_INET
  • CONFIG_IP_NF_TARGET_REDIRECT or CONFIG_NETFILTER_XT_TARGET_REDIRECT, and
    CONFIG_NETFILTER_XT_MATCH_COMMENT: Otherwise kube-proxy won't work fine.
  1. Storage and filesystem:
  • CONFIG_EXT4_FS, CONFIG_PROC_FS

  • CONFIG_OVERLAY or CONFIG_AUFS or devicemapper (? @derekwaynecarr on depended kernel module & config)

    1. Resource isolation and management:
    • I saw you already includes all required namespace and cgroups. checked.

There are more required based on the feature set. For example, if the user uses Gluster volume, Kernel has to configured with CONFIG_FUSE; if the user uses Calico networking, CONFIG_IP_SET_HASH_IP has to be set. We can extend this in next version for those as a warning with detail information, not as the minimal requirement here.

@Random-Liu
Copy link
Member Author

Random-Liu commented Nov 3, 2016

@errordeveloper @luxas Sorry for the delay. We'll finish this this week.

@errordeveloper Feel free to send PR to make this prettier! That's really appreciated! :)

@luxas
Copy link
Member

luxas commented Nov 3, 2016

It would be great if we could get this PR in before the code freeze. Thanks!

@Random-Liu
Copy link
Member Author

Random-Liu commented Nov 4, 2016

@dchen1107 Addressed comments. Here is the new output:
screenshot from 2016-11-03 17 58 55

@k8s-ci-robot
Copy link
Contributor

Jenkins verification failed for commit 0ac3e2a5d3e86fcdd19f0aac544e67d3931c4fe1. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@Random-Liu
Copy link
Member Author

@dchen1107 I've included unit test from #32454 into this PR so as to get the change merged soon.

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 4, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GCE e2e failed for commit f9b50f0. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@dchen1107
Copy link
Member

@k8s-bot gci gce e2e test this

@dchen1107 dchen1107 added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Nov 4, 2016
@dchen1107
Copy link
Member

LGTM

@Random-Liu
Copy link
Member Author

Apply LGTM based on #32427 (comment).

@Random-Liu Random-Liu added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2016
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 9534c4f into kubernetes:master Nov 7, 2016
@Random-Liu Random-Liu deleted the system-verification branch November 7, 2016 03:53
k8s-github-robot pushed a commit that referenced this pull request Nov 8, 2016
Automatic merge from submit-queue

Node Conformance Test: Containerize the node e2e test

For #30122, #30174.
Based on #32427, #32454.

**Please only review the last 3 commits.**

This PR packages the node e2e test into a docker image:
- 1st commit: Add `NodeConformance` flag in the node e2e framework to avoid starting kubelet and collecting system logs. We do this because:
  - There are all kinds of ways to manage kubelet and system logs, for different situation we need to mount different things into the container, run different commands. It is hard and unnecessary to handle the complexity inside the test suite.
- 2nd commit: Remove all `sudo` in the test container. We do this because:
  - In most container, there is no `sudo` command, and there is no need to use `sudo` inside the container.
  - It introduces some complexity to use `sudo` inside the test. (#29211, #26748) In fact we just need to run the test suite with `sudo`.
- 3rd commit: Package the test into a docker container with corresponding `Makefile` and `Dockerfile`. We also added a `run_test.sh` script to start kubelet and run the test container. The script is only for demonstration purpose and we'll also use the script in our node e2e framework. In the future, we should update the script to start kubelet in production way (maybe with `systemd` or `supervisord`).

@dchen1107 @vishh 
/cc @kubernetes/sig-node @kubernetes/sig-testing



**Release note**:

<!--  Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access) 
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`. 
-->

``` release-note
Release alpha version node test container gcr.io/google_containers/node-test-ARCH:0.1 for users to verify their node setup.
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/node-e2e area/test lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants