
Add local PV negative scheduling tests to integration testing #57570

Merged: 1 commit merged into kubernetes:master on Jan 4, 2018
Conversation

@sbezverk (Contributor) commented on Dec 22, 2017

Closes: #56088

Move local PV negative scheduling tests to integration

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 22, 2017
@sbezverk (Contributor Author):

/sig storage
/assign @msau42

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Dec 22, 2017
}
}

func setupNodes(t *testing.T, nsName string, numberOfNodes int) *testConfig {
Member:

Are you able to reuse the setup function in the volume binding test?

Contributor Author:

If you agree, I will modify the setup functions to be able to start multiple nodes. In that case, in your tests I will call it with number of nodes == 1.

Contributor Author:

done

Contributor Author:

When I use the new setupNodes in your volume binding test, both tests fail; when I use the new setupNodes in my PR and your setup in your test, both tests pass. Mystery...

Member:

That's odd. Are the node names, node labels, etc exactly the same between the two?

Member:

If you can, I would look more deeply into why the setup functions cannot be shared. I don't see any reason why they can't be.
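
For illustration, here is a rough sketch of what a shared, node-count-parameterized setup helper could look like. The setupNodes and testConfig names follow the snippets in this thread; the getTestConfig helper, the node spec, and the client wiring are assumptions for the sketch, not the PR's actual code.

```go
// Sketch: create numberOfNodes fake nodes through the shared test client.
// Assumed imports: "fmt", "testing", v1 "k8s.io/api/core/v1",
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1".
func setupNodes(t *testing.T, nsName string, numberOfNodes int) *testConfig {
	// getTestConfig is a hypothetical helper that stands up the API server,
	// scheduler, and PV controller and returns a testConfig with a client.
	config := getTestConfig(t, nsName)
	for i := 0; i < numberOfNodes; i++ {
		name := fmt.Sprintf("node-%d", i+1)
		node := &v1.Node{
			ObjectMeta: metav1.ObjectMeta{Name: name},
			Status: v1.NodeStatus{
				Conditions: []v1.NodeCondition{
					{Type: v1.NodeReady, Status: v1.ConditionTrue},
				},
			},
		}
		if _, err := config.client.CoreV1().Nodes().Create(node); err != nil {
			t.Fatalf("Failed to create Node %q: %v", name, err)
		}
	}
	return config
}
```

A single-node caller would simply pass number of nodes == 1, as proposed above.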

t.Fatalf("Failed to create Pod %q: %v", pod.Name, err)
}

if err := waitForPodToSchedule(config.client, pod); err != nil {
Member:

Can you validate that the pod failed to schedule?

It would be good if we could validate the Pod events and look for the predicate filters. You may need to add some logic in the setup to start the scheduler with an event recorder.

Contributor Author:

done
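
As a hedged illustration of the negative check being asked for here (not the PR's code: waitForPodToSchedule is replaced by an explicit poll, and config, pod, and the timeout are assumptions based on the surrounding snippets), the test could poll for a bounded period and require that the pod is never bound to a node:

```go
// Sketch: assert that the pod stays unscheduled for a bounded period.
// Assumed imports: "time", metav1 "k8s.io/apimachinery/pkg/apis/meta/v1",
// "k8s.io/apimachinery/pkg/util/wait".
err := wait.Poll(time.Second, 10*time.Second, func() (bool, error) {
	p, err := config.client.CoreV1().Pods(config.ns).Get(pod.Name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	// A non-empty NodeName means the scheduler bound the pod, which must not happen here.
	return p.Spec.NodeName != "", nil
})
if err != wait.ErrWaitTimeout {
	t.Errorf("expected Pod %q to remain unscheduled, got: %v", pod.Name, err)
}
```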

nodeMarkers := []interface{}{
markNodeAffinity,
markNodeSelector,
markNodeName,
Member:

This test case won't work here because it actually completely bypasses the scheduler, and it will fail on the kubelet side. So this last test case needs to remain in the e2e test for now.

Can you also delete the first two test cases from the e2e test suite?

Contributor Author:

will do

@sbezverk (Contributor Author):

/test pull-kubernetes-unit

1 similar comment
@sbezverk (Contributor Author):

/test pull-kubernetes-unit

markNodeAffinity,
markNodeSelector,
}
podName := ""
Member:

Does this need to be defined outside?

Contributor Author:

Not anymore; I had some debug code outside of the for loop. Fixed.

t.Fatalf("Failed to create Pod %q: %v", pod.Name, err)
}
// Give the scheduler time to attempt to schedule the pod
if err := waitForPodToSchedule(config.client, pod); err == nil {
Member:

How long does this wait?

options := metav1.ListOptions{FieldSelector: selector}
events, err := config.client.CoreV1().Events(config.ns).List(options)
if err != nil {
t.Errorf("Failed to list events with error: %v", err)
Member:

this should probably be Fatalf

Contributor Author:

done

}
found := false
for _, e := range events.Items {
if strings.Contains(e.Message, "MatchNodeSelector") ||
Member:

Does it have to contain both of them, or just at least one?

Contributor Author:

At least one

Member:

Should it be both?

Contributor Author:

Well, "and" works too; I just wanted to be more flexible. It was just strange that both mismatches generate the same scheduler error. I thought that would be fixed in the future.

Member:

We're testing that the combination of both predicate failures causes a scheduling failure.

Contributor Author:

OK, changed it to "and".
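
For reference, a minimal sketch of what the agreed-upon event check might look like, built around the snippets shown in this thread. The involvedObject selector keys and the second expected substring are not visible above, so volumePredicateMsg is a labeled placeholder rather than the PR's actual string:

```go
// Sketch: list FailedScheduling events for the pod and require that a single
// event message mentions both failed predicates.
// Assumed imports: "strings", "k8s.io/apimachinery/pkg/fields",
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1".
selector := fields.Set{
	"involvedObject.kind": "Pod",
	"involvedObject.name": pod.Name,
	"reason":              "FailedScheduling",
}.AsSelector().String()
options := metav1.ListOptions{FieldSelector: selector}
events, err := config.client.CoreV1().Events(config.ns).List(options)
if err != nil {
	t.Fatalf("Failed to list events with error: %v", err)
}
volumePredicateMsg := "..." // placeholder for the volume node-affinity predicate message the test checks
found := false
for _, e := range events.Items {
	if strings.Contains(e.Message, "MatchNodeSelector") && strings.Contains(e.Message, volumePredicateMsg) {
		found = true
		break
	}
}
if !found {
	t.Errorf("expected a FailedScheduling event mentioning both failed predicates for Pod %q", pod.Name)
}
```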


@@ -461,7 +461,7 @@ func makePod(name, ns string, pvcs []string) *v1.Pod {
 		Containers: []v1.Container{
 			{
 				Name:  "write-pod",
-				Image: "k8s.gcr.io/busybox:1.24",
+				Image: "gcr.io/google_containers/busybox:1.24",
Member:

Why did you need to change this?

Member:

Odd that the revert did not roll this back too?

@sbezverk (Contributor Author):

@msau42 the issue is/was because of labels. You did not apply the kubernetes.io/hostname label I needed for scheduling to work, and I missed the labels needed for your test to work. I suggest modifying the setup to: 1) unconditionally assign the "kubernetes.io/hostname" label to node(s) on creation, and 2) add an extra labels parameter whose labels are added to all nodes in addition to the "standard" labels. What do you think? The only drawback I see in this approach is if somebody needs to apply different labels to different nodes... Another possibility would be to add a new func that assigns user-defined labels, as in your test case, to a specific node selected by name. Let me know what you think.

@msau42 (Member) commented on Dec 27, 2017

I think changing my test to use the "kubernetes.io/hostname" label is fine.
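
Following the proposal above, a small sketch of how node creation could unconditionally set the hostname label and merge in caller-supplied labels; the helper name and signature are illustrative assumptions, not the PR's code:

```go
// Sketch: build the label set for a test node, always including the
// kubernetes.io/hostname label and merging in any extra labels the
// caller wants applied to every node.
func nodeLabels(nodeName string, extraLabels map[string]string) map[string]string {
	labels := map[string]string{"kubernetes.io/hostname": nodeName}
	for k, v := range extraLabels {
		labels[k] = v
	}
	return labels
}
```

A setup helper like the one sketched earlier could then pass nodeLabels(name, extraLabels) into each node's ObjectMeta.Labels.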

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 28, 2017
@msau42 (Member) commented on Dec 28, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 28, 2017
@sbezverk (Contributor Author):

/assign @timothysc

@sbezverk (Contributor Author) commented on Jan 1, 2018

/retest

1 similar comment
@sbezverk (Contributor Author) commented on Jan 2, 2018

/retest

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 2, 2018
@msau42 (Member) commented on Jan 2, 2018

/retest

@msau42 (Member) commented on Jan 2, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 2, 2018
@timothysc (Member) left a comment:

What is the primary motivation for moving this from an e2e test to an integration test? Did it routinely flake, and if so, why?

This test is beyond the typical simple integration test: you are standing up an apiserver, a controller, and fake nodes, PVs, and PVCs... which sounds an awful lot like an e2e test.

"reason": "FailedScheduling",
}.AsSelector().String()
options := metav1.ListOptions{FieldSelector: selector}
events, err := config.client.CoreV1().Events(config.ns).List(options)
Member:

Why are you relying on events (which are subject to change) here? If the data is not encoded in the pod status, then that is a bug.

Contributor Author:

@timothysc thanks for your review. I do not think the Pod status provides a very granular failure reason. In this test case we test a very specific scenario and the scheduler's reaction to it. @msau42, I'd appreciate your input here.

Member:

The pod status would just be "pending", which I think is too generic an error for what we want to be testing.

Member:

These errors should be encoded into status.message (https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#podstatus-v1-core) to indicate the failure. I'm OK with the event for now, but please add a // TODO about reevaluating or removing the event requirement in the test.

@sbezverk (Contributor Author) commented on Jan 2, 2018

@timothysc The reason this test was moved from e2e was the inability to build a multi-node environment. This test was always skipped during e2e runs because it needed at least 2 nodes. With the integration framework we can easily have it without any extra compute nodes/VMs.

@msau42 (Member) commented on Jan 2, 2018

The motivation behind moving this to an integration test is that you only need three components to run: the scheduler, the API server, and the PV controller. You do not actually need a real multi-node cluster with running kubelets to test this functionality. The benefits are that this functionality gets tested more frequently, it uses fewer resources, and it's faster. It's a big benefit for local development.

@timothysc (Member):

> The motivation behind moving this to an integration test is that you only need three components to run: the scheduler, the API server, and the PV controller. You do not actually need a real multi-node cluster with running kubelets to test this functionality. The benefits are that this functionality gets tested more frequently, it uses fewer resources, and it's faster. It's a big benefit for local development.

"Only..."
I'm OK with adding an integration test, but you will absolutely want to verify this behavior on a live cluster, and in the future I could also see it being a conformance test.

I'm OK with adding an integration test for faster verification, but not OK with removing the e2e test.

@msau42 (Member) commented on Jan 3, 2018

In the scheduler code here, I see that it does record an event and also updates the pod status with the error message. So the test can be changed to look at the Pod status message instead.
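
For reference, a hedged sketch of what checking the Pod status instead of events might look like. The PodScheduled condition type, ConditionFalse status, and Unschedulable reason are standard core/v1 values; the surrounding wiring and the expected message substring are assumptions based on this thread:

```go
// Sketch: assert that the pod's PodScheduled condition reports Unschedulable
// and that its message mentions a failed predicate.
// Assumed imports: "strings", v1 "k8s.io/api/core/v1",
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1".
p, err := config.client.CoreV1().Pods(config.ns).Get(pod.Name, metav1.GetOptions{})
if err != nil {
	t.Fatalf("Failed to get Pod %q: %v", pod.Name, err)
}
foundCond := false
for _, cond := range p.Status.Conditions {
	if cond.Type != v1.PodScheduled {
		continue
	}
	foundCond = true
	if cond.Status != v1.ConditionFalse || cond.Reason != v1.PodReasonUnschedulable {
		t.Errorf("expected PodScheduled=False with reason %q, got status %q reason %q",
			v1.PodReasonUnschedulable, cond.Status, cond.Reason)
	}
	if !strings.Contains(cond.Message, "MatchNodeSelector") {
		t.Errorf("expected condition message to mention the failed predicate, got: %q", cond.Message)
	}
}
if !foundCond {
	t.Errorf("Pod %q has no PodScheduled condition", pod.Name)
}
```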

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 4, 2018
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 4, 2018
@sbezverk (Contributor Author) commented on Jan 4, 2018

@timothysc Please check the latest. I restored the e2e tests and reworked the check to use Pod status messages.

@sbezverk sbezverk changed the title Move local PV negative scheduling tests to integration Add local PV negative scheduling tests to integration testing Jan 4, 2018
@sbezverk (Contributor Author) commented on Jan 4, 2018

/test pull-kubernetes-node-e2e

@timothysc (Member) left a comment:

/lgtm
/approve

Just a note: there will likely be a reduction over time in integration startup routines.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 4, 2018
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, sbezverk, timothysc

Associated issue: #56088

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 4, 2018
@k8s-github-robot:

Automatic merge from submit-queue (batch tested with PRs 56971, 57570, 57830, 57742). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 8c6f770 into kubernetes:master Jan 4, 2018