flake e2e node conformance test failure #25169

Closed
pwittrock opened this issue May 4, 2016 · 7 comments
Labels
kind/flake: Categorizes issue or PR as related to a flaky test.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@pwittrock
Member

23:49:35 Expected error:
23:49:35 <*errors.StatusError | 0xc82057fe00>: {
23:49:35 ErrStatus: {
23:49:35 TypeMeta: {Kind: "Status", APIVersion: "v1"},
23:49:35 ListMeta: {SelfLink: "", ResourceVersion: ""},
23:49:35 Status: "Failure",
23:49:35 Message: "pods "busybox" not found",
23:49:35 Reason: "NotFound",
23:49:35 Details: {Name: "busybox", Group: "", Kind: "pods", Causes: nil, RetryAfterSeconds: 0},
23:49:35 Code: 404,
23:49:35 },
23:49:35 }
23:49:35 pods "busybox" not found
23:49:35 not to have occurred

logs

pwittrock added the kind/flake label May 4, 2016
@pwittrock
Member Author

cc @Random-Liu

@Random-Liu
Member

Random-Liu commented May 4, 2016

@pwittrock @liangchenye
I discussed this offline with @caesarxuchao; it is a known issue, see #19403 (comment).

In short, even though the pod has been deleted on the apiserver, the kubelet may still try to delete it again. If we create a pod with the same name and namespace soon after deleting the old one, the kubelet will accidentally delete the new pod.

This is the same problem as #24937. It has probably caused quite a few flakes, so I'm bumping up the priority.

/cc @yujuhong

@pwittrock
Member Author

Great. Seems like we have a quick fix then.

Random-Liu added the priority/important-soon label May 4, 2016
@yujuhong
Contributor

yujuhong commented May 4, 2016

> In short, even though the pod has been deleted on the apiserver, the kubelet may still try to delete it again. If we create a pod with the same name and namespace soon after deleting the old one, the kubelet will accidentally delete the new pod.

Don't we check the pod UID before deleting it?

@Random-Liu
Member

Random-Liu commented May 4, 2016

@yujuhong But there is still a race between kubelet and apiserver.

For example:

  • Kubelet gets the old pod, and updates the status.
  • Apiserver deletes the pod because its deletion grace period has expired.
  • Test creates the new pod with the same name and namespace.
  • Kubelet deletes the new pod.

I'm not sure whether this is the main cause of the flake, but we haven't found any other plausible cause yet. :)

@yujuhong
Contributor

yujuhong commented May 4, 2016

Ah...yeah...that jogged my memory. Thanks for the explanation. Deleting by UID would be great.

@caesarxuchao
Member

> Don't we check the pod UID before deleting it?

This is not enough: the pod may be deleted and recreated after the check in the kubelet and before the deletion request reaches the API server. We need to use #22965, which technically is still "deleting by name", but lets you set the UID as a precondition.

k8s-github-robot pushed a commit that referenced this issue May 6, 2016
Automatic merge from submit-queue

Delete pod with uid as precondition.

Addressed #25169 (comment).

Fix #25169 
Fix #24937

This PR changes the status manager to delete pods with the UID as a precondition, so that the kubelet won't accidentally delete a pod that has the same name and namespace but a different UID.

/cc @yujuhong
5 participants