Kubelet to fail pods that have hostPort conflicts, etc #4623

Closed
erictune opened this issue Feb 19, 2015 · 16 comments · Fixed by #5019

@erictune (Member)

Although the scheduler tries not to coschedule pods with hostPort conflicts, there could still be a conflict:

  1. once #4619 (Remove HostPort conflict checking) goes in, the kubelet could still receive pods with conflicting host ports.
  2. file-based pods could conflict with apiserver-based pods.

Right now, the kubelet can detect a conflict via the code in syncLoop in pkg/kubelet/kubelet.go, where it calls filterHostPortConflicts(). However, two things need to change in that loop.

First, instead of just ignoring one of the two conflicting pods, the kubelet should immediately set the pod's status to Failed and record an event with a useful Reason. It should do this even for pods that requested restarting.

Second, it should do this in a deterministic way. Right now, the choice depends on the order of the items as they come out of the config code. Perhaps it should iterate through the pods by creation timestamp.
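
A rough sketch of what that could look like is below. This is not the actual kubelet code: the Pod struct, failPod, and the host port bookkeeping are simplified stand-ins, just to illustrate "sort by creation timestamp, keep the first claimant of each host port, fail the rest".

```go
// Rough sketch only: Pod, failPod, and the port bookkeeping are simplified
// stand-ins for the real types in pkg/api and pkg/kubelet, not actual code.
package main

import (
	"fmt"
	"sort"
	"time"
)

type Pod struct {
	Name              string
	CreationTimestamp time.Time
	HostPorts         []int
}

// filterHostPortConflicts walks the pods in creation-timestamp order so the
// outcome is deterministic, keeps the first claimant of each host port, and
// fails any later pod that re-claims a port instead of silently dropping it.
func filterHostPortConflicts(pods []Pod) []Pod {
	sort.Slice(pods, func(i, j int) bool {
		return pods[i].CreationTimestamp.Before(pods[j].CreationTimestamp)
	})

	claimed := map[int]string{} // host port -> name of the pod that owns it
	var accepted []Pod
	for _, pod := range pods {
		conflict := ""
		for _, port := range pod.HostPorts {
			if owner, ok := claimed[port]; ok {
				conflict = fmt.Sprintf("host port %d already in use by %s", port, owner)
				break
			}
		}
		if conflict != "" {
			// Proposed change: fail the pod with a useful Reason, even if it
			// asked to be restarted, instead of just ignoring it.
			failPod(pod, conflict)
			continue
		}
		for _, port := range pod.HostPorts {
			claimed[port] = pod.Name
		}
		accepted = append(accepted, pod)
	}
	return accepted
}

// failPod stands in for setting status.Phase to Failed and recording an event.
func failPod(pod Pod, reason string) {
	fmt.Printf("failing pod %s: %s\n", pod.Name, reason)
}

func main() {
	pods := []Pod{
		{Name: "api-pod", CreationTimestamp: time.Unix(200, 0), HostPorts: []int{4194}},
		{Name: "file-cadvisor", CreationTimestamp: time.Unix(100, 0), HostPorts: []int{4194}},
	}
	for _, p := range filterHostPortConflicts(pods) {
		fmt.Println("kept:", p.Name)
	}
}
```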

@erictune (Member Author)

This would be a good project for someone who wants to get started in kubelet.

erictune added the area/kubelet and sig/node labels on Feb 19, 2015
@erictune (Member Author)

You could e2e test this as follows. Create a pod in the API that requests the same hostPort as is already used by the file-based cadvisor container. The pod should fail and there should be an appropriate reason.
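
For illustration, the e2e check might be shaped roughly like the sketch below. Everything in it is hypothetical: the 4194 port value and both helpers are stand-in stubs, not real e2e utilities.

```go
// Hypothetical shape of the e2e check described above. The helpers and the
// cadvisor port value are illustrative stubs, not real e2e framework code.
package e2e

import "testing"

// assumed host port of the file-based cadvisor container
const cadvisorHostPort = 4194

type podStatus struct {
	Phase  string
	Reason string
}

// Stubs so the sketch compiles; a real test would create the pod through the
// apiserver and poll its status until it settles.
func createPodWithHostPort(t *testing.T, name string, hostPort int) {
	t.Logf("would create pod %q requesting hostPort %d", name, hostPort)
}

func waitForPodStatus(t *testing.T, name string) podStatus {
	return podStatus{Phase: "Failed", Reason: "HostPortConflict"} // placeholder values
}

func TestHostPortConflictFailsPod(t *testing.T) {
	createPodWithHostPort(t, "conflict-pod", cadvisorHostPort)

	// The kubelet should fail the pod rather than run it, and the status
	// should carry a reason that points at the host port conflict.
	status := waitForPodStatus(t, "conflict-pod")
	if status.Phase != "Failed" || status.Reason == "" {
		t.Errorf("expected pod to fail with a host port conflict reason, got %+v", status)
	}
}
```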

@erictune (Member Author)

@dchen1107 we just talked about this.

@erictune (Member Author)

In the future, we might use this to fail pods due to lack of resources or other setup errors.

@thockin (Member) commented Feb 19, 2015

Might need some way to poison the node from future scheduling decisions - tricky.

@pmorie (Member) commented Feb 19, 2015

+1 to this, do want

dchen1107 added the priority/important-soon label on Feb 20, 2015
dchen1107 added this to the v1.0 milestone on Feb 20, 2015
@erictune (Member Author)

@thockin
there are two cases:

  1. two schedulers racing to start a pod on the same node
    • one pod will get to the node first,
    • the second will be rejected by the kubelet
    • the rc for the second pod makes a new pod
    • there should not be a race the second time because all the schedulers know about the first pod now.
  2. a file-based pod versus an apiserver-based pod on the same node.

@thockin (Member) commented Feb 20, 2015

Oh, I thought this was getting rid of HostPort conflict checking entirely?
Is that ALSO happening somewhere else?


@bgrant0607 (Member)

This is needed in order to eliminate BoundPods.

@yujuhong (Contributor)

The kubelet now sorts the pods before filtering, and also records an event on host port conflict. To actually reject such pods, we will need the kubelet to post pod status back to the apiserver (#4561).

@erictune (Member Author)

The scheduler still checks. This handles parallel scheduler races and file-based pods.

@brendandburns (Contributor)

There should be no parallel scheduler races; we use resource versions and optimistic concurrency to ensure correctness.

@bgrant0607 (Member)

Re. atomicity -- I don't buy that we need it. I commented here: #2483 (comment)

@yujuhong (Contributor) commented Mar 2, 2015

@erictune, do you mean that we should still set the pod status to "failed" so that when the scheduler checks, it can get the accurate pod status back? I had thought about it, but the pod statuses are computed on the fly (by checking the containers) as of now. Without the ability to directly set the status, we need to cache this information somewhere. Since there are people touching the same code/file, I'd rather wait until #4561 is fixed.

@erictune (Member Author) commented Mar 2, 2015

@brendandburns
Perhaps "parallel" was the wrong word, but a scheduler race appears possible: the cache of pods the scheduler uses is not updated atomically with pod bindings. It appears the scheduler could do the following (a rough sketch follows the list):

  1. successfully do a /binding of pod A with hostPort N to host H
  2. read the cache and see A not bound
  3. consider binding B with hostPort N to host H, and not see a conflict.
  4. successfully do a /binding of pod B with hostPort N to host H
  5. later, the cache notices that A is bound to H
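
Here is a toy illustration (not scheduler code) of how a stale cache allows steps 2-4: the binding made in step 1 is not visible to the fit check until the cache is refreshed.

```go
// Hypothetical toy model of the race described above; it is not scheduler
// code, just an illustration of a fit check running against a stale cache.
package main

import "fmt"

type binding struct {
	pod, host string
	hostPort  int
}

type scheduler struct {
	cache  []binding // possibly stale view of what is bound where
	actual []binding // what the apiserver has actually accepted
}

// fitsOnHost runs the hostPort predicate against the (possibly stale) cache.
func (s *scheduler) fitsOnHost(host string, hostPort int) bool {
	for _, b := range s.cache {
		if b.host == host && b.hostPort == hostPort {
			return false
		}
	}
	return true
}

// bind succeeds at the apiserver immediately, but the cache only learns
// about it later, when refreshCache runs.
func (s *scheduler) bind(pod, host string, hostPort int) {
	s.actual = append(s.actual, binding{pod, host, hostPort})
}

func (s *scheduler) refreshCache() {
	s.cache = append([]binding(nil), s.actual...)
}

func main() {
	s := &scheduler{}

	// 1. Successfully bind pod A with hostPort N to host H.
	s.bind("A", "H", 80)

	// 2-3. The cache has not caught up, so B still appears to fit on H.
	if s.fitsOnHost("H", 80) {
		// 4. Successfully bind pod B with the same hostPort to the same host.
		s.bind("B", "H", 80)
	}

	// 5. Only now does the cache notice that A (and B) are bound to H.
	s.refreshCache()
	fmt.Println("bindings accepted for H:", s.actual) // both A and B claim port 80
}
```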

@yujuhong (Contributor) commented Mar 3, 2015

@erictune and I discussed offline, and we decided to create a map to store the rejected pod->status, so that the failed status will be correctly reported when polling happens. With this, the issue is no longer blocked by any other issues.

I will write up a PR to implement this.
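
A minimal sketch of that approach, assuming simplified stand-ins for the kubelet types (the map, rejectPod, and GetPodStatus below are illustrative, not the real pkg/kubelet API):

```go
// Minimal sketch of the map-of-rejected-pods approach; Kubelet, PodStatus,
// and the method names are simplified stand-ins, not the real kubelet code.
package main

import (
	"fmt"
	"sync"
)

type PodPhase string

const Failed PodPhase = "Failed"

type PodStatus struct {
	Phase   PodPhase
	Message string
}

type Kubelet struct {
	mu             sync.Mutex
	rejectedStatus map[string]PodStatus // pod full name -> status of a rejected pod
}

// rejectPod records a Failed status for a pod the kubelet refuses to run,
// e.g. because of a host port conflict.
func (kl *Kubelet) rejectPod(podFullName, reason string) {
	kl.mu.Lock()
	defer kl.mu.Unlock()
	kl.rejectedStatus[podFullName] = PodStatus{Phase: Failed, Message: reason}
}

// GetPodStatus is consulted when the pod status is polled. Rejected pods are
// answered from the map; everything else would fall through to the existing
// on-the-fly computation from container state.
func (kl *Kubelet) GetPodStatus(podFullName string) (PodStatus, bool) {
	kl.mu.Lock()
	defer kl.mu.Unlock()
	status, ok := kl.rejectedStatus[podFullName]
	return status, ok
}

func main() {
	kl := &Kubelet{rejectedStatus: map[string]PodStatus{}}
	kl.rejectPod("default/api-pod", "host port 4194 already in use")
	if status, ok := kl.GetPodStatus("default/api-pod"); ok {
		fmt.Println(status.Phase, "-", status.Message)
	}
}
```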
