Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test to detach a pd whose node was deleted #36009

Merged
merged 1 commit into from
Dec 20, 2016

Conversation

rkouj
Copy link
Contributor

@rkouj rkouj commented Nov 1, 2016

What this PR does / why we need it:
A test for the following issue :
If a node with a GCE PD attached is deleted (before the volume is detached), subsequent attempts by the attach/detach controller to detach it should not fail.

Bonus :Added additional code to ensure that the pd can still be attached to a different node.
Edit : Removed it as it was making the test much slower.

#29358


This change is Reviewable

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Nov 1, 2016
@saad-ali saad-ali assigned saad-ali and unassigned ixdy Nov 1, 2016
@rmmh rmmh closed this Nov 2, 2016
@rmmh rmmh reopened this Nov 2, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins unit/integration failed for commit 42191d723cc89bcca6cd79527a1fac8707d68149. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GCE e2e failed for commit 42191d723cc89bcca6cd79527a1fac8707d68149. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GKE smoke e2e failed for commit 42191d723cc89bcca6cd79527a1fac8707d68149. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE etcd3 e2e failed for commit 42191d723cc89bcca6cd79527a1fac8707d68149. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 10, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins verification failed for commit 571d3cfb15831eb00cd21d7e2b778f5dfde27a94. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot k8s-github-robot added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Nov 11, 2016
Copy link
Member

@saad-ali saad-ali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to reuse existing code instead of writing your own resize routine.

Also run the test back to back for a couple hours and to test how stable it is. You can use this shell script (just replace with your test name)--it will repeatedly run the same test until failure.


By("deleting host0")

output, err = exec.Command("gcloud", "compute", "instances", "delete", string(host0Name), "--project="+framework.TestContext.CloudConfig.ProjectID, "--zone="+framework.TestContext.CloudConfig.Zone).CombinedOutput()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing your own cluster resizing routine, reuse the existing code in e2e/resize_nodes.go:

By(fmt.Sprintf("decreasing cluster size to %d", initialGroupSize-1))
err = ResizeGroup(group, initialGroupSize-1)
Expect(err).NotTo(HaveOccurred())
err = WaitForGroupSize(group, initialGroupSize-1)
Expect(err).NotTo(HaveOccurred())
err = framework.WaitForClusterSize(c, int(initialGroupSize-1), 10*time.Minute)
Expect(err).NotTo(HaveOccurred())

Maybe stick it in a common method ResizeCluster(newSize int).

Also immediately prior to resizing make sure to call defer... that reverts the cluster to the original size (even if there is a failure). Again you can probably reuse the code from AfterEach in ResizeCluster(newSize int)

Copy link
Contributor Author

@rkouj rkouj Nov 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ResizeCluster(newsize int) picks a random node to shut down which may or may not help me with my test.
What I wanted to do was to shut down a specific node on which the volume was mounted on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. There is the potential for things not going quite as planned here since the managed-instance-group recreates the missing node. But I don't see a good way to have the MIG kill only the node that we want instead of a random one.

Pet set recreate should recreate evicted petset,hurf,1
PetSet Basic PetSet functionality should handle healthy pet restarts during scale,kevin-wangzefeng,1
PetSet Basic PetSet functionality should provide basic identity,girishkalele,1
PetSet Deploy clustered applications should creating a working mysql cluster,piosz,1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with these other tests being added in the PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#36663 is fixing this looks like this file was out of date.

@apelisse apelisse removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Nov 11, 2016
Copy link
Member

@saad-ali saad-ali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just run this in a loop for a couple hours and see how flaky it is.


By("deleting host0")

output, err = exec.Command("gcloud", "compute", "instances", "delete", string(host0Name), "--project="+framework.TestContext.CloudConfig.ProjectID, "--zone="+framework.TestContext.CloudConfig.Zone).CombinedOutput()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. There is the potential for things not going quite as planned here since the managed-instance-group recreates the missing node. But I don't see a good way to have the MIG kill only the node that we want instead of a random one.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 15, 2016
@rkouj
Copy link
Contributor Author

rkouj commented Nov 28, 2016

Ran this 512 times

grep -i "1 passed" gce-pd-out.out | wc -l
512

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 28, 2016
@rkouj rkouj force-pushed the GCE-PD-test branch 2 times, most recently from f1f905f to 6827be6 Compare November 28, 2016 19:08
@rkouj
Copy link
Contributor Author

rkouj commented Nov 28, 2016

Initially the cluster size wasn't able to come back to the original size. I have increased the timeout in ResizeNodes() so that there is enough time for the all nodes to come back. Wasn't able to get to this earlier as there were other higher priority tasks to deal with.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2016
@rkouj rkouj force-pushed the GCE-PD-test branch 4 times, most recently from 1eaaa58 to 2545243 Compare November 28, 2016 20:44
@k8s-ci-robot
Copy link
Contributor

Jenkins Kubemark GCE e2e failed for commit 1eaaa5897f33c5e30ea67f2a6860c51867d8de6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCI GKE smoke e2e failed for commit 1eaaa5897f33c5e30ea67f2a6860c51867d8de6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE Node e2e failed for commit 1eaaa5897f33c5e30ea67f2a6860c51867d8de6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot node e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins kops AWS e2e failed for commit 1eaaa5897f33c5e30ea67f2a6860c51867d8de6c. Full PR test history.

The magic incantation to run this job again is @k8s-bot kops aws e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 5, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 19, 2016
@saad-ali
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 19, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCE e2e failed for commit f67b495. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot k8s-github-robot added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Dec 19, 2016
@rkouj
Copy link
Contributor Author

rkouj commented Dec 20, 2016

@k8s-bot cvm gce e2e test this

@saad-ali saad-ali added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. release-note-label-needed labels Dec 20, 2016
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit b3e5725 into kubernetes:master Dec 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants