Implementation of e2e test that re-sizes cluster #8243
Conversation
// Context for checking pod responses by issuing GETs to them and verifying that the answer matches the pod name.
type podResponseChecker struct {
	c  *client.Client
	ns string
I don't like using abbreviations, especially one for namespace.
"ns" is a commonly used abbreviation for namespace in kubernetes, and, AFAK, golang standard in general encourages short variable names.
OK, I just wanted to mention that I don't like it :) It took me ~1 sec to figure out what it stands for.
It took me a minute the first time I saw one too, but I think at this point ns is pretty common.
Yeah. If it's not only me, maybe we should start using the full 'namespace' for this...
I agree with @gmarek. Feel free to do a holistic cleanup of the e2e tests in this regard.
Now that I'm used to it I kind of like the brevity... but regardless, bikeshedding for a separate PR :-)
I'm ... ugh. I wish I knew a better way to write this test, because the invasiveness of the variables at the top level, the grossness of shelling out to gcloud in the test - there's a lot that I see wrong and potentially very brittle here. cc @cjcullen
@mbforbes was looking into something similar and may have some suggestions.
@@ -55,6 +55,8 @@ func init() {
 	flag.StringVar(&cloudConfig.MasterName, "kube-master", "", "Name of the kubernetes master. Only required if provider is gce or gke")
 	flag.StringVar(&cloudConfig.ProjectID, "gce-project", "", "The GCE project being used, if applicable")
 	flag.StringVar(&cloudConfig.Zone, "gce-zone", "", "GCE zone being used, if applicable")
+	flag.StringVar(&cloudConfig.NodeInstanceGroup, "node-instance-group", "", "Name of the manage instance group for nodes. Valid only for gce and gke")
nit: s/manage/managed
Also note this is not valid for GKE
Done.
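For context, the test acts on this flag by shelling out to gcloud, as noted above. A minimal sketch of what that could look like, assuming a hypothetical `resizeNodeInstanceGroup` helper and `os/exec`, `strconv`, and `fmt` imports; the exact gcloud subcommand has changed across releases, so the spelling below is the modern form, not necessarily the PR's:

```go
// Hypothetical helper, not the PR's exact code: resize the managed
// instance group by shelling out to gcloud, using the flags wired up
// above via testContext.CloudConfig.
func resizeNodeInstanceGroup(size int) error {
	output, err := exec.Command("gcloud", "compute", "instance-groups", "managed", "resize",
		testContext.CloudConfig.NodeInstanceGroup,
		"--size", strconv.Itoa(size),
		"--project", testContext.CloudConfig.ProjectID,
		"--zone", testContext.CloudConfig.Zone).CombinedOutput()
	if err != nil {
		return fmt.Errorf("failed to resize node instance group: %v (output: %q)", err, output)
	}
	return nil
}
```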
@mbforbes
	}
}

func verifyPodsListining(c *client.Client, ns, name string, pods *api.PodList) {
verifyPodsResponding? By "listing" I understand calling 'get pods' on the API server.
same comment here—please change to return an error
Please either change to verifyPodsResponding as @gmarek suggested, or s/Listining/Listening
Done
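A minimal sketch of the renamed, error-returning helper the reviewers are asking for. The proxy GET path mirrors the request builder of the client library of the time, but the exact shape here is an assumption, not the PR's final code:

```go
// Sketch only: dial each pod through the apiserver proxy and expect it
// to answer with its own name (the serve_hostname behavior), returning
// an error rather than failing the test directly.
func verifyPodsResponding(c *client.Client, ns, name string, pods *api.PodList) error {
	for _, pod := range pods.Items {
		body, err := c.Get().
			Namespace(ns).
			Prefix("proxy").
			Resource("pods").
			Name(pod.Name).
			Do().
			Raw()
		if err != nil {
			return fmt.Errorf("failed to GET from pod %s: %v", pod.Name, err)
		}
		if !strings.Contains(string(body), pod.Name) {
			return fmt.Errorf("pod %s responded with %q instead of its name", pod.Name, string(body))
		}
	}
	return nil
}
```

Returning an error lets the caller decide between Failf and a retry, which is why the reviewers prefer it over failing inside the helper.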
Failf("Couldn't restore the original node instance group size: %s", err) | ||
} | ||
if err := waitForNodeInstanceGroupSize(testContext.CloudConfig.NumNodes); err != nil { | ||
Failf("Couldn't restore the original node instance group size: %s", err) |
nit: I think errors are typically logged with %v
Done
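To illustrate both points at once (polling for the group size and logging errors with %v), here is a hedged sketch of what waitForNodeInstanceGroupSize might look like; `currentNodeInstanceGroupSize`, the timeout, and the poll interval are assumptions, not the PR's actual values:

```go
// Sketch only: poll until the managed instance group reaches the
// desired size, logging intermediate errors with %v as suggested.
func waitForNodeInstanceGroupSize(size int) error {
	timeout := 10 * time.Minute
	for start := time.Now(); time.Since(start) < timeout; time.Sleep(20 * time.Second) {
		currentSize, err := currentNodeInstanceGroupSize() // hypothetical helper
		if err != nil {
			Logf("Failed to get instance group size: %v", err)
			continue
		}
		if currentSize == size {
			Logf("Instance group reached the desired size %d", size)
			return nil
		}
		Logf("Waiting for instance group size %d, current size %d", size, currentSize)
	}
	return fmt.Errorf("timed out waiting for node instance group size %d", size)
}
```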
OK, I think I've thrown enough comments at you for the time being :-) Please ping and I'll be happy to look again. The one thing I wanted to ask explicitly (so it doesn't get lost in my syntactic comments) is whether the pods you're creating ("serve hostname") are guaranteed to run only one per node? I think that's what this test is intending, and what is implied, but I wanted to make sure that it's true. I think this can be done by ensuring that the port is locked so it'll conflict if two of the same pod try to get scheduled on a single node. Is the
@mbforbes Regarding distributing a single pod per node: I achieve it by creating a service spanning all the pods (our scheduler spreads pods from the same service across different nodes).
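A hedged sketch of that spreading trick, using the pre-1.0 client and API packages of the era; the function name, labels, and the serve_hostname port 9376 are illustrative, not the PR's exact code:

```go
// Sketch only: create a Service selecting the serve_hostname pods so
// the scheduler's service anti-affinity spreads them across nodes.
func createSpreadingService(c *client.Client, ns, name string) error {
	svc := &api.Service{
		ObjectMeta: api.ObjectMeta{Name: name},
		Spec: api.ServiceSpec{
			Selector: map[string]string{"name": name},
			Ports: []api.ServicePort{{
				Port:       9376,
				TargetPort: util.NewIntOrStringFromInt(9376),
			}},
		},
	}
	_, err := c.Services(ns).Create(svc)
	return err
}
```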
		return -1, err
	}
	pattern := "currentSize: "
	i := bytes.Index(output, []byte(pattern))
Sorry if I missed this in the giant comment stream, but is there a reason not to use strings methods (https://golang.org/pkg/strings/) so you don't have to convert to []byte?
Sorry, I missed it. Done
Done
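The suggested change, roughly: convert gcloud's output to a string once and let the strings package do the work. The error message is illustrative:

```go
// Sketch of the suggestion: use strings.Index on a string instead of
// bytes.Index on a []byte.
out := string(output)
pattern := "currentSize: "
i := strings.Index(out, pattern)
if i == -1 {
	return -1, fmt.Errorf("could not find %q in output %q", pattern, out)
}
```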
Awesome, this is looking gorgeous; thanks so much for your patience @jszczepkowski. I've left a handful more minor comments, but this LGTM after they're resolved.
ginkgo_args=()
if [[ ${GINKGO_PARALLEL} =~ ^[yY]$ ]]; then
  ginkgo_args+=("-p")
fi

# Use the kubectl binary from the same directory as the e2e binary.
Is this an intentionally added line? It looks like it might be a merge artifact?
It seems to be an artifact, removed.
@mbforbes PTAL
Looks good, thanks @jszczepkowski! I'm still a little unclear on the pod rescheduling: because we decrease the number of nodes and ensure the pods are still running, it seems like the pods must not run with a hard requirement of one per node. That means that the resize-up test could pass without any work being scheduled on the new node. But regardless, it'd be good to get this test in; in the off chance I'm not confused about this, we can always tweak it a bit later.
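If the one-pod-per-node property ever needs to be asserted directly, a hedged sketch of such a check is below. It is hypothetical, not part of this PR; note the pod's node was exposed as pod.Spec.Host at the time (pod.Spec.NodeName in later releases):

```go
// Hypothetical check: assert that no two pods from the list landed on
// the same node.
func verifyPodsOnDistinctNodes(pods *api.PodList) error {
	podOnNode := map[string]string{} // node name -> pod name
	for _, pod := range pods.Items {
		if other, found := podOnNode[pod.Spec.Host]; found {
			return fmt.Errorf("pods %s and %s are both on node %s", pod.Name, other, pod.Spec.Host)
		}
		podOnNode[pod.Spec.Host] = pod.Name
	}
	return nil
}
```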
Implementation of e2e test that re-sizes cluster; currently works on GCE.