Add local PV negative scheduling tests to integration testing #57570
Conversation
/sig storage

    }
}

func setupNodes(t *testing.T, nsName string, numberOfNodes int) *testConfig {
Are you able to reuse the setup function in the volume binding test?
If you agree, I will modify the setup functions so they can start multiple nodes. In that case I will call it in your tests with the number of nodes set to 1.
done
When I use the new setupNodes in your volume binding test, both tests fail; when I use the new setupNodes in my PR and the original setup in your test, both tests pass. Mystery...
That's odd. Are the node names, node labels, etc. exactly the same between the two?
If you can, I would look more deeply into why the setup functions cannot be shared. I don't see any reason why they can't be.
    t.Fatalf("Failed to create Pod %q: %v", pod.Name, err)
}

if err := waitForPodToSchedule(config.client, pod); err != nil {
Can you validate that the pod failed to schedule?
It would be good if we could validate the Pod events and look for the predicate filters. You may need to add some logic in the setup to start the scheduler with an event recorder.
done
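A minimal sketch of what such a check could look like: list the FailedScheduling events recorded for the pod and look for a predicate name in the message. The helper name and client wiring are assumptions, not the exact code in this PR, and the client-go calls use the context-free signatures of that era.

package scheduler

import (
    "strings"
    "testing"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/fields"
    clientset "k8s.io/client-go/kubernetes"
)

// validatePodSchedulingFailed is a hypothetical helper: it asserts that a
// FailedScheduling event mentioning the given predicate was recorded for the pod.
func validatePodSchedulingFailed(t *testing.T, client clientset.Interface, pod *v1.Pod, predicate string) {
    // Select only FailedScheduling events emitted for this pod.
    selector := fields.Set{
        "involvedObject.kind":      "Pod",
        "involvedObject.name":      pod.Name,
        "involvedObject.namespace": pod.Namespace,
        "reason":                   "FailedScheduling",
    }.AsSelector().String()
    events, err := client.CoreV1().Events(pod.Namespace).List(metav1.ListOptions{FieldSelector: selector})
    if err != nil {
        t.Fatalf("Failed to list events: %v", err)
    }
    for _, e := range events.Items {
        if strings.Contains(e.Message, predicate) {
            return
        }
    }
    t.Errorf("Pod %q has no FailedScheduling event mentioning %q", pod.Name, predicate)
}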
nodeMarkers := []interface{}{
    markNodeAffinity,
    markNodeSelector,
    markNodeName,
This test case won't work here because it actually completely bypasses the scheduler, and it will fail on the kubelet side. So this last test case needs to remain in the e2e test for now.
Can you also delete the first two test cases from the e2e test suite?
will do
/test pull-kubernetes-unit
    markNodeAffinity,
    markNodeSelector,
}
podName := ""
Does this need to be defined outside?
Not anymore; I had some debug code outside of the for loop. Fixed.
    t.Fatalf("Failed to create Pod %q: %v", pod.Name, err)
}
// Give the scheduler time to attempt to schedule the pod
if err := waitForPodToSchedule(config.client, pod); err == nil {
How long does this wait?
30 seconds, defined here: https://github.com/kubernetes/kubernetes/blob/master/test/integration/scheduler/util.go#L359
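For reference, a wait helper with that 30-second timeout can be built on wait.Poll roughly like the sketch below. Names and intervals are illustrative, not the exact upstream util.go code, and the client calls again use the era's context-free signatures.

package scheduler

import (
    "time"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/wait"
    clientset "k8s.io/client-go/kubernetes"
)

// waitForPodScheduled polls once a second, for up to 30 seconds, until the
// scheduler has bound the pod to a node.
func waitForPodScheduled(client clientset.Interface, pod *v1.Pod) error {
    return wait.Poll(time.Second, 30*time.Second, func() (bool, error) {
        p, err := client.CoreV1().Pods(pod.Namespace).Get(pod.Name, metav1.GetOptions{})
        if err != nil {
            return false, err
        }
        // A non-empty NodeName means the scheduler has bound the pod.
        return p.Spec.NodeName != "", nil
    })
}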
options := metav1.ListOptions{FieldSelector: selector}
events, err := config.client.CoreV1().Events(config.ns).List(options)
if err != nil {
    t.Errorf("Failed to list events with error: %v", err)
this should probably be Fatalf
done
}
found := false
for _, e := range events.Items {
    if strings.Contains(e.Message, "MatchNodeSelector") ||
Does it have to contain both of them, or just at least one?
At least one
Should it be both?
Well, "and" works too; I just wanted to be more flexible. It was just strange that both mismatches generate the same scheduler error. I thought it would be fixed in the future.
We're testing that the combination of both predicate failures causes a scheduling failure.
OK, changed it to "and".
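For reference, the "and" variant is just a conjunction inside the loop shown above; the second string below is a placeholder, since the exact predicate message is not visible in this view.

// Drop-in variant of the condition above: require both predicate failures in
// the same event message. The second string is a placeholder, not the real
// predicate message used by the test.
if strings.Contains(e.Message, "MatchNodeSelector") &&
    strings.Contains(e.Message, "<volume node affinity predicate>") {
    found = true
}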
@@ -461,7 +461,7 @@ func makePod(name, ns string, pvcs []string) *v1.Pod {
 			Containers: []v1.Container{
 				{
 					Name:  "write-pod",
-					Image: "k8s.gcr.io/busybox:1.24",
+					Image: "gcr.io/google_containers/busybox:1.24",
Why did you need to change this?
See this PR:
e9dd8a6#diff-c0800dd83834b3847acb7efd347d6d34
Odd that the revert did not roll this back too?
@msau42 the issue is/was because of labels. You did not apply the kubernetes.io/hostname label I needed for scheduling to work, and I missed the labels needed for your test to work. I suggest modifying the setup: 1 - unconditionally assign the "kubernetes.io/hostname" label to node(s) on creation, 2 - add an extra label parameter that will be added to all nodes in addition to the "standard labels". What do you think? The only downside I see in this approach is if somebody needs to apply different labels to different nodes... Another possibility would be to add a new func that assigns user-defined labels to a specific node selected by name, as in your test case. Let me know what you think.
I think changing my test to use the "kubernetes.io/hostname" label is fine.
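A rough sketch of the setup change being discussed: every node unconditionally gets the kubernetes.io/hostname label on creation, and the caller can pass extra labels applied to all nodes. Function and field names are illustrative, not the code that was merged.

package scheduler

import (
    "fmt"
    "testing"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    clientset "k8s.io/client-go/kubernetes"
)

// createTestNodes creates numberOfNodes fake nodes, each labeled with its own
// hostname plus any extra labels shared by all nodes.
func createTestNodes(t *testing.T, client clientset.Interface, numberOfNodes int, extraLabels map[string]string) []*v1.Node {
    nodes := make([]*v1.Node, 0, numberOfNodes)
    for i := 0; i < numberOfNodes; i++ {
        name := fmt.Sprintf("node-%d", i+1)
        // Every node unconditionally gets the hostname label needed for
        // local PV node affinity scheduling.
        labels := map[string]string{"kubernetes.io/hostname": name}
        for k, val := range extraLabels {
            labels[k] = val
        }
        node := &v1.Node{
            ObjectMeta: metav1.ObjectMeta{Name: name, Labels: labels},
            Status: v1.NodeStatus{
                Conditions: []v1.NodeCondition{
                    {Type: v1.NodeReady, Status: v1.ConditionTrue},
                },
            },
        }
        created, err := client.CoreV1().Nodes().Create(node)
        if err != nil {
            t.Fatalf("Failed to create node %q: %v", name, err)
        }
        nodes = append(nodes, created)
    }
    return nodes
}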
/lgtm
/assign @timothysc
/retest
/retest
/lgtm
What is the primary motivation for moving this from an e2e to an integration test? Did it routinely flake, and if so, why?
This test is beyond the typical simple integration tests; you are standing up an apiserver, a controller, and fake nodes, PVs, and PVCs... which sounds an awful lot like an e2e.
"reason": "FailedScheduling", | ||
}.AsSelector().String() | ||
options := metav1.ListOptions{FieldSelector: selector} | ||
events, err := config.client.CoreV1().Events(config.ns).List(options) |
Why are you relying on events (which are subject to change) here? If the data is not encoded in the pod status, then that is a bug.
@timothysc thanks for your review. I do not think the Pod status provides a very granular failure. In this test case we test a very specific scenario and the scheduler's reaction to it. @msau42 I'd appreciate your input here.
The pod status would just be "pending", which I think is too generic an error for what we want to be testing.
There are errors that should be encoded into the status.message - https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#podstatus-v1-core - that indicate the failure. I'm OK with the event for now, but please add a // TODO about reevaluating or removing the event requirement on the test.
@timothysc The reason this test was moved from e2e was the inability to build a multi-node environment. This test was always skipped during e2e runs because it needed at least 2 nodes. With the integration framework we can easily run it without any extra compute nodes/VMs.
The motivation behind moving this to an integration test is that you only need three components to run: the scheduler, the apiserver, and the PV controller. You do not actually need a real multi-node cluster with running kubelets to test this functionality. The benefits are that this functionality gets tested more frequently, it uses fewer resources, and it's faster. It's a big benefit for local development.
"Only..." I'm ok with adding an integration for faster verification, but not ok with removing the e2e test. |
In the scheduler code here, I see that it does record an event and also updates the pod status with the error message. So the test can be changed to look at the Pod status message instead.
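A rough sketch of what checking the Pod status could look like: the scheduler sets the PodScheduled condition to False with reason Unschedulable and records the predicate failures in the condition message. The helper name is an assumption, and the client calls use the era's context-free signatures.

package scheduler

import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    clientset "k8s.io/client-go/kubernetes"
)

// podUnschedulableMessage returns the message of the pod's unschedulable
// PodScheduled condition, or "" if the scheduler has not recorded one yet.
func podUnschedulableMessage(client clientset.Interface, pod *v1.Pod) (string, error) {
    p, err := client.CoreV1().Pods(pod.Namespace).Get(pod.Name, metav1.GetOptions{})
    if err != nil {
        return "", err
    }
    for _, cond := range p.Status.Conditions {
        if cond.Type == v1.PodScheduled && cond.Status == v1.ConditionFalse &&
            cond.Reason == v1.PodReasonUnschedulable {
            return cond.Message, nil
        }
    }
    return "", nil
}

The test can then assert that the returned message contains the expected predicate string, for example with strings.Contains.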
@timothysc Please check the latest. I restored the e2e tests and reworked the check to use the Pod status messages.
/test pull-kubernetes-node-e2e
/lgtm
/approve
Just a note - there will likely be a reduction over time in integration startup routines.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: msau42, sbezverk, timothysc
Associated issue: #56088
The full list of commands accepted by this bot can be found here.
Automatic merge from submit-queue (batch tested with PRs 56971, 57570, 57830, 57742). If you want to cherry-pick this change to another branch, please follow the instructions here.
Closes: #56088