Add test to detach a pd whose node was deleted #36009

rkouj · 2016-11-01T21:36:05Z

What this PR does / why we need it:
A test for the following issue:
If a node with a GCE PD attached is deleted (before the volume is detached), subsequent attempts by the attach/detach controller to detach it should not fail.

Bonus: Added additional code to ensure that the PD can still be attached to a different node.
Edit: Removed it, as it was making the test much slower.

#29358

k8s-ci-robot · 2016-11-02T19:20:51Z

Jenkins unit/integration failed for commit 42191d723cc89bcca6cd79527a1fac8707d68149. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-02T19:21:02Z

Jenkins GCI GCE e2e failed for commit 42191d723cc89bcca6cd79527a1fac8707d68149. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-02T19:21:42Z

Jenkins GKE smoke e2e failed for commit 42191d723cc89bcca6cd79527a1fac8707d68149. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-02T19:21:52Z

Jenkins GCE etcd3 e2e failed for commit 42191d723cc89bcca6cd79527a1fac8707d68149. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-10T06:08:04Z

Jenkins verification failed for commit 571d3cfb15831eb00cd21d7e2b778f5dfde27a94. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

saad-ali

Try to reuse existing code instead of writing your own resize routine.

Also run the test back to back for a couple of hours to see how stable it is. You can use this shell script (just replace with your test name); it will repeatedly run the same test until failure.
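The script itself is not reproduced in this thread. Purely as an illustration of the run-until-failure idea, a Go sketch might look like the one below. The names runUntilFailure and errSimulated, and the fake test function in main, are hypothetical and are not part of the e2e framework; a real loop would invoke the actual test command instead.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a hypothetical error standing in for a real test failure.
var errSimulated = errors.New("simulated flake")

// runUntilFailure calls run repeatedly until it returns an error or until
// maxRuns successful runs have completed. It returns the number of
// successful runs and the first error seen (nil if every run passed).
func runUntilFailure(run func() error, maxRuns int) (int, error) {
	passed := 0
	for passed < maxRuns {
		if err := run(); err != nil {
			return passed, err
		}
		passed++
	}
	return passed, nil
}

func main() {
	// Stand-in for invoking the real e2e test (for example via os/exec).
	// This fake passes three times and fails on the fourth attempt.
	attempt := 0
	flaky := func() error {
		attempt++
		if attempt == 4 {
			return errSimulated
		}
		return nil
	}

	passed, err := runUntilFailure(flaky, 512)
	fmt.Printf("passed=%d err=%v\n", passed, err)
}
```

Capping the loop at a maximum run count keeps the sketch from spinning forever when the test is stable.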

saad-ali · 2016-11-10T23:58:03Z

test/e2e/pd.go

+
+		By("deleting host0")
+
+		output, err = exec.Command("gcloud", "compute", "instances", "delete", string(host0Name), "--project="+framework.TestContext.CloudConfig.ProjectID, "--zone="+framework.TestContext.CloudConfig.Zone).CombinedOutput()


Instead of writing your own cluster resizing routine, reuse the existing code in e2e/resize_nodes.go:

By(fmt.Sprintf("decreasing cluster size to %d", initialGroupSize-1))
err = ResizeGroup(group, initialGroupSize-1)
Expect(err).NotTo(HaveOccurred())
err = WaitForGroupSize(group, initialGroupSize-1)
Expect(err).NotTo(HaveOccurred())
err = framework.WaitForClusterSize(c, int(initialGroupSize-1), 10*time.Minute)
Expect(err).NotTo(HaveOccurred())

Maybe stick it in a common method ResizeCluster(newSize int).

Also, immediately prior to resizing, make sure to call defer... that reverts the cluster to the original size (even if there is a failure). Again, you can probably reuse the code from AfterEach in ResizeCluster(newSize int).
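A minimal sketch of the suggested pattern, assuming the resize helpers can be hidden behind a small interface. Everything named here (groupResizer, resizeCluster, withClusterSize, fakeGroup) is hypothetical and only stands in for ResizeGroup, WaitForGroupSize, and framework.WaitForClusterSize; it is not the code that was merged.

```go
package main

import "fmt"

// groupResizer is a hypothetical stand-in for the e2e resize helpers.
type groupResizer interface {
	Resize(size int) error
	WaitForSize(size int) error
}

// resizeCluster requests newSize and waits until the group reports it.
func resizeCluster(r groupResizer, newSize int) error {
	if err := r.Resize(newSize); err != nil {
		return err
	}
	return r.WaitForSize(newSize)
}

// withClusterSize resizes to temporary, runs body, and always reverts to
// original, even when the resize or body fails.
func withClusterSize(r groupResizer, original, temporary int, body func() error) (err error) {
	// Register the revert before resizing so a partial resize is undone too.
	defer func() {
		if revertErr := resizeCluster(r, original); revertErr != nil && err == nil {
			err = revertErr
		}
	}()
	if err = resizeCluster(r, temporary); err != nil {
		return err
	}
	return body()
}

// fakeGroup is an in-memory group that records every requested size.
type fakeGroup struct {
	size    int
	history []int
}

func (f *fakeGroup) Resize(size int) error {
	f.size = size
	f.history = append(f.history, size)
	return nil
}

func (f *fakeGroup) WaitForSize(size int) error {
	if f.size != size {
		return fmt.Errorf("group has %d nodes, want %d", f.size, size)
	}
	return nil
}

func main() {
	g := &fakeGroup{size: 3}
	err := withClusterSize(g, 3, 2, func() error {
		return fmt.Errorf("test body failed")
	})
	// The body's error is reported, and the group is back at its original size.
	fmt.Println(err, g.size, g.history)
}
```

Registering the deferred revert before the first resize call means the cluster is restored even if the shrink itself only partially succeeds.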

ResizeCluster(newSize int) picks a random node to shut down, which may or may not help with my test.
What I wanted was to shut down the specific node on which the volume was mounted.

Ack. There is the potential for things not going quite as planned here since the managed-instance-group recreates the missing node. But I don't see a good way to have the MIG kill only the node that we want instead of a random one.

saad-ali · 2016-11-11T00:28:20Z

test/test_owners.csv

+Pet set recreate should recreate evicted petset,hurf,1
+PetSet Basic PetSet functionality should handle healthy pet restarts during scale,kevin-wangzefeng,1
+PetSet Basic PetSet functionality should provide basic identity,girishkalele,1
+PetSet Deploy clustered applications should creating a working mysql cluster,piosz,1


What's up with these other tests being added in the PR?

#36663 is fixing this; it looks like this file was out of date.

saad-ali

LGTM. Just run this in a loop for a couple of hours and see how flaky it is.

saad-ali · 2016-11-14T01:53:21Z

test/e2e/pd.go

+
+		By("deleting host0")
+
+		output, err = exec.Command("gcloud", "compute", "instances", "delete", string(host0Name), "--project="+framework.TestContext.CloudConfig.ProjectID, "--zone="+framework.TestContext.CloudConfig.Zone).CombinedOutput()


Ack. There is the potential for things not going quite as planned here since the managed-instance-group recreates the missing node. But I don't see a good way to have the MIG kill only the node that we want instead of a random one.

rkouj · 2016-11-28T18:53:51Z

Ran this 512 times

grep -i "1 passed" gce-pd-out.out | wc -l
512

rkouj · 2016-11-28T19:13:18Z

Initially the cluster wasn't able to return to its original size. I have increased the timeout in ResizeNodes() so that there is enough time for all the nodes to come back. I wasn't able to get to this earlier as there were other higher-priority tasks to deal with.
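To illustrate why the timeout matters, here is a hedged sketch of a poll-until-size loop. It is not the actual ResizeNodes() or framework.WaitForClusterSize code; waitForSize, simulatedCluster, and pollInterval are hypothetical names, and the simulated cluster simply reports the wanted size after a fixed number of polls.

```go
package main

import (
	"fmt"
	"time"
)

// pollInterval is a hypothetical polling period, kept short for the demo.
const pollInterval = time.Millisecond

// waitForSize polls getSize every interval until it returns want, or
// returns an error once timeout has elapsed without reaching want.
func waitForSize(getSize func() int, want int, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		got := getSize()
		if got == want {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out after %v: have %d nodes, want %d", timeout, got, want)
		}
		time.Sleep(interval)
	}
}

// simulatedCluster returns a size function that reports want-1 nodes until
// it has been polled pollsNeeded times, and want nodes from then on.
func simulatedCluster(pollsNeeded, want int) func() int {
	polls := 0
	return func() int {
		polls++
		if polls >= pollsNeeded {
			return want
		}
		return want - 1
	}
}

func main() {
	// A timeout that is too short gives up before the node comes back.
	short := waitForSize(simulatedCluster(50, 3), 3, pollInterval, 2*pollInterval)
	fmt.Println("short timeout error:", short)

	// A longer timeout leaves room for the replacement node to register.
	long := waitForSize(simulatedCluster(50, 3), 3, pollInterval, 5000*pollInterval)
	fmt.Println("long timeout error:", long)
}
```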

k8s-ci-robot · 2016-11-28T20:54:35Z

Jenkins Kubemark GCE e2e failed for commit 1eaaa5897f33c5e30ea67f2a6860c51867d8de6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-28T20:59:46Z

Jenkins GCI GKE smoke e2e failed for commit 1eaaa5897f33c5e30ea67f2a6860c51867d8de6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-28T21:00:26Z

Jenkins GCE Node e2e failed for commit 1eaaa5897f33c5e30ea67f2a6860c51867d8de6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot node e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

k8s-ci-robot · 2016-11-28T21:01:58Z

Jenkins kops AWS e2e failed for commit 1eaaa5897f33c5e30ea67f2a6860c51867d8de6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot kops aws e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

saad-ali · 2016-12-19T23:40:45Z

/lgtm

k8s-ci-robot · 2016-12-19T23:43:16Z

Jenkins GCE e2e failed for commit f67b495. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

rkouj · 2016-12-20T00:49:39Z

@k8s-bot cvm gce e2e test this

k8s-github-robot · 2016-12-20T03:45:48Z

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

k8s-github-robot · 2016-12-20T04:24:02Z

Automatic merge from submit-queue

googlebot added the cla: yes label Nov 1, 2016

k8s-github-robot assigned ixdy Nov 1, 2016

k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Nov 1, 2016

saad-ali assigned saad-ali and unassigned ixdy Nov 1, 2016

rmmh closed this Nov 2, 2016

rmmh reopened this Nov 2, 2016

rkouj force-pushed the GCE-PD-test branch from 42191d7 to 571d3cf Compare November 10, 2016 05:35

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 10, 2016

rkouj force-pushed the GCE-PD-test branch from 571d3cf to 99f52ee Compare November 10, 2016 06:37

k8s-github-robot added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Nov 11, 2016

saad-ali suggested changes Nov 11, 2016

apelisse removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Nov 11, 2016

saad-ali approved these changes Nov 14, 2016

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 15, 2016

rkouj force-pushed the GCE-PD-test branch from 99f52ee to 5e33c33 Compare November 28, 2016 19:05

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 28, 2016

rkouj force-pushed the GCE-PD-test branch 2 times, most recently from f1f905f to 6827be6 Compare November 28, 2016 19:08

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2016

rkouj force-pushed the GCE-PD-test branch 4 times, most recently from 1eaaa58 to 2545243 Compare November 28, 2016 20:44

k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 5, 2016

Add test to detach a pd whose node was deleted

f67b495

rkouj force-pushed the GCE-PD-test branch from 2545243 to f67b495 Compare December 19, 2016 22:58

k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 19, 2016

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 19, 2016

k8s-github-robot added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Dec 19, 2016

saad-ali added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. release-note-label-needed labels Dec 20, 2016

k8s-github-robot merged commit b3e5725 into kubernetes:master Dec 20, 2016
