Add local PV negative scheduling tests to integration testing #57570
Conversation
/sig storage

    }
}

func setupNodes(t *testing.T, nsName string, numberOfNodes int) *testConfig {
Are you able to reuse the setup function in the volume binding test?
If you agree, I will modify the setup functions so they can start multiple nodes. In that case I will call it in your tests with the number of nodes set to 1.
done
When I use the new setupNodes in your volume binding test, both tests fail; when I use the new setupNodes in my PR and the original setup in your test, both tests pass. Mystery...
That's odd. Are the node names, node labels, etc. exactly the same between the two?
If you can, I would look more deeply into why the setup functions cannot be shared. I don't see any reason why they can't be.
    t.Fatalf("Failed to create Pod %q: %v", pod.Name, err)
}

if err := waitForPodToSchedule(config.client, pod); err != nil {
Can you validate that the pod failed to schedule?
It would be good if we could validate the Pod events and look for the predicate filters. You may need to add some logic in the setup to start the scheduler with an event recorder.
done
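A minimal sketch of what such a check could look like: list the FailedScheduling events recorded for the pod and look for a predicate name in the message. The helper name and client wiring are assumptions, not the exact code in this PR, and the client-go calls use the context-free signatures of that era.

package scheduler

import (
    "strings"
    "testing"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/fields"
    clientset "k8s.io/client-go/kubernetes"
)

// validatePodSchedulingFailed is a hypothetical helper: it asserts that a
// FailedScheduling event mentioning the given predicate was recorded for the pod.
func validatePodSchedulingFailed(t *testing.T, client clientset.Interface, pod *v1.Pod, predicate string) {
    // Select only FailedScheduling events emitted for this pod.
    selector := fields.Set{
        "involvedObject.kind":      "Pod",
        "involvedObject.name":      pod.Name,
        "involvedObject.namespace": pod.Namespace,
        "reason":                   "FailedScheduling",
    }.AsSelector().String()
    events, err := client.CoreV1().Events(pod.Namespace).List(metav1.ListOptions{FieldSelector: selector})
    if err != nil {
        t.Fatalf("Failed to list events: %v", err)
    }
    for _, e := range events.Items {
        if strings.Contains(e.Message, predicate) {
            return
        }
    }
    t.Errorf("Pod %q has no FailedScheduling event mentioning %q", pod.Name, predicate)
}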
nodeMarkers := []interface{}{
    markNodeAffinity,
    markNodeSelector,
    markNodeName,
This test case won't work here because it actually completely bypasses the scheduler, and it will fail on the kubelet side. So this last test case needs to remain in the e2e test for now.
Can you also delete the first two test cases from the e2e test suite?
will do
/test pull-kubernetes-unit
    markNodeAffinity,
    markNodeSelector,
}
podName := ""
Does this need to be defined outside?
Not anymore; I had some debug code outside of the for loop. Fixed.
    t.Fatalf("Failed to create Pod %q: %v", pod.Name, err)
}
// Give the scheduler time to attempt to schedule the pod
if err := waitForPodToSchedule(config.client, pod); err == nil {
How long does this wait?
30 seconds, defined here: https://github.com/kubernetes/kubernetes/blob/master/test/integration/scheduler/util.go#L359
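For reference, a wait helper with that 30-second timeout can be built on wait.Poll roughly like the sketch below. Names and intervals are illustrative, not the exact upstream util.go code, and the client calls again use the era's context-free signatures.

package scheduler

import (
    "time"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/wait"
    clientset "k8s.io/client-go/kubernetes"
)

// waitForPodScheduled polls once a second, for up to 30 seconds, until the
// scheduler has bound the pod to a node.
func waitForPodScheduled(client clientset.Interface, pod *v1.Pod) error {
    return wait.Poll(time.Second, 30*time.Second, func() (bool, error) {
        p, err := client.CoreV1().Pods(pod.Namespace).Get(pod.Name, metav1.GetOptions{})
        if err != nil {
            return false, err
        }
        // A non-empty NodeName means the scheduler has bound the pod.
        return p.Spec.NodeName != "", nil
    })
}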
options := metav1.ListOptions{FieldSelector: selector}
events, err := config.client.CoreV1().Events(config.ns).List(options)
if err != nil {
    t.Errorf("Failed to list events with error: %v", err)
this should probably be Fatalf
done
}
found := false
for _, e := range events.Items {
    if strings.Contains(e.Message, "MatchNodeSelector") ||
Does it have to contain both of them, or just at least one?
At least one
Should it be both?
Well, "and" works too; I just wanted to be more flexible. It was just strange that both mismatches generate the same scheduler error. I thought it would be fixed in the future.
We're testing that the combination of both predicate failures causes a scheduling failure.
OK, changed it to "and".
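For reference, the "and" variant is just a conjunction inside the loop shown above; the second string below is a placeholder, since the exact predicate message is not visible in this view.

// Drop-in variant of the condition above: require both predicate failures in
// the same event message. The second string is a placeholder, not the real
// predicate message used by the test.
if strings.Contains(e.Message, "MatchNodeSelector") &&
    strings.Contains(e.Message, "<volume node affinity predicate>") {
    found = true
}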
@@ -461,7 +461,7 @@ func makePod(name, ns string, pvcs []string) *v1.Pod {
 			Containers: []v1.Container{
 				{
 					Name:  "write-pod",
-					Image: "k8s.gcr.io/busybox:1.24",
+					Image: "gcr.io/google_containers/busybox:1.24",
Why did you need to change this?
See this PR:
e9dd8a6#diff-c0800dd83834b3847acb7efd347d6d34
Odd that the revert did not roll this back too?
@msau42 the issue is/was because of labels. You did not apply the kubernetes.io/hostname label I needed for scheduling to work, and I missed the labels needed for your test to work. I suggest modifying the setup: 1 - unconditionally assign the "kubernetes.io/hostname" label to node(s) on creation, 2 - add an extra label parameter that will be added to all nodes in addition to the "standard labels". What do you think? The only downside I see in this approach is if somebody needs to apply different labels to different nodes... Another possibility would be to add a new func that assigns user-defined labels to a specific node selected by name, as in your test case. Let me know what you think.
I think changing my test to use the "kubernetes.io/hostname" label is fine.
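A rough sketch of the setup change being discussed: every node unconditionally gets the kubernetes.io/hostname label on creation, and the caller can pass extra labels applied to all nodes. Function and field names are illustrative, not the code that was merged.

package scheduler

import (
    "fmt"
    "testing"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    clientset "k8s.io/client-go/kubernetes"
)

// createTestNodes creates numberOfNodes fake nodes, each labeled with its own
// hostname plus any extra labels shared by all nodes.
func createTestNodes(t *testing.T, client clientset.Interface, numberOfNodes int, extraLabels map[string]string) []*v1.Node {
    nodes := make([]*v1.Node, 0, numberOfNodes)
    for i := 0; i < numberOfNodes; i++ {
        name := fmt.Sprintf("node-%d", i+1)
        // Every node unconditionally gets the hostname label needed for
        // local PV node affinity scheduling.
        labels := map[string]string{"kubernetes.io/hostname": name}
        for k, val := range extraLabels {
            labels[k] = val
        }
        node := &v1.Node{
            ObjectMeta: metav1.ObjectMeta{Name: name, Labels: labels},
            Status: v1.NodeStatus{
                Conditions: []v1.NodeCondition{
                    {Type: v1.NodeReady, Status: v1.ConditionTrue},
                },
            },
        }
        created, err := client.CoreV1().Nodes().Create(node)
        if err != nil {
            t.Fatalf("Failed to create node %q: %v", name, err)
        }
        nodes = append(nodes, created)
    }
    return nodes
}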
/lgtm
/assign @timothysc
/retest
/retest
/lgtm
What is the primary motivation for moving this from an e2e to an integration test? Did it routinely flake, and if so, why?
This test is beyond the typical simple integration tests; you are standing up an apiserver, a controller, and fake nodes, PVs, and PVCs... which sounds an awful lot like an e2e.
"reason": "FailedScheduling", | ||
}.AsSelector().String() | ||
options := metav1.ListOptions{FieldSelector: selector} | ||
events, err := config.client.CoreV1().Events(config.ns).List(options) |
Why are you relying on events (which are subject to change) here? If the data is not encoded in the pod status, then that is a bug.
@timothysc thanks for your review. I do not think the Pod status provides a very granular failure. In this test case we test a very specific scenario and the scheduler's reaction to it. @msau42 I'd appreciate your input here.
The pod status would just be "pending", which I think is too generic an error for what we want to be testing.
There are errors that should be encoded into the status.message - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#podstatus-v1-core - that indicate the failure. I'm OK with the event for now, but please add a // TODO about reevaluating or removing the event requirement on the test.
@timothysc The reason this test was moved from e2e was the inability to build a multi-node environment. This test was always skipped during e2e runs because it needed at least 2 nodes. With the integration framework we can easily run it without any extra compute nodes/VMs.
The motivation behind moving this to an integration test is that you only need three components to run: the scheduler, the apiserver, and the PV controller. You do not actually need a real multi-node cluster with running kubelets to test this functionality. The benefits are that this functionality gets tested more frequently, it uses fewer resources, and it's faster. It's a big benefit for local development.
"Only..." I'm ok with adding an integration for faster verification, but not ok with removing the e2e test. |
In the scheduler code here, I see that it does record an event and also updates the pod status with the error message. So the test can be changed to look at the Pod status message instead.
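A rough sketch of what checking the Pod status could look like: the scheduler sets the PodScheduled condition to False with reason Unschedulable and records the predicate failures in the condition message. The helper name is an assumption, and the client calls use the era's context-free signatures.

package scheduler

import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    clientset "k8s.io/client-go/kubernetes"
)

// podUnschedulableMessage returns the message of the pod's unschedulable
// PodScheduled condition, or "" if the scheduler has not recorded one yet.
func podUnschedulableMessage(client clientset.Interface, pod *v1.Pod) (string, error) {
    p, err := client.CoreV1().Pods(pod.Namespace).Get(pod.Name, metav1.GetOptions{})
    if err != nil {
        return "", err
    }
    for _, cond := range p.Status.Conditions {
        if cond.Type == v1.PodScheduled && cond.Status == v1.ConditionFalse &&
            cond.Reason == v1.PodReasonUnschedulable {
            return cond.Message, nil
        }
    }
    return "", nil
}

The test can then assert that the returned message contains the expected predicate string, for example with strings.Contains.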
@timothysc Please check the latest. I restored the e2e tests and reworked the check to use the Pod status messages.
/test pull-kubernetes-node-e2e
/lgtm
/approve
Just a note - there will likely be a reduction over time in integration startup routines.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: msau42, sbezverk, timothysc
Associated issue: #56088
The full list of commands accepted by this bot can be found here.
Automatic merge from submit-queue (batch tested with PRs 56971, 57570, 57830, 57742). If you want to cherry-pick this change to another branch, please follow the instructions here.
Closes: #56088