Refactor kubelet syncPod method and fix behavior of Infra Containers for RestartPolicy::Never. #5022
Conversation
@yujuhong do you want to take a first look at this? Feel free to reassign if you need to.
glog.Errorf("Couldn't make a ref to pod %q: '%v'", podFullName, err) | ||
} | ||
|
||
func (kl *Kubelet) MakeDataDirs(uid types.UID) error { |
this probably doesn't have to be public
One overall comment I would add is that it may be worthwhile to split some of these changes into separate, smaller PRs. This code is being changed by ~3 people concurrently and we've had enough trouble with changes to it that I err on the side of caution :)
if err != nil {
	glog.Errorf("Failed to introspect pod infra container: %v; Skipping pod %q", err, podFullName)
	return err

func (kl *Kubelet) ShouldContainerBeRestarted(pod *api.BoundPod, containerName, dockerContainerName string, uid types.UID) bool {
Same here about being private
#5032 is also small enough and may be useful in this refactoring. That will probably get merged in tomorrow, it only has minor comments.
Sorry, somehow I missed this issue. I have not touched the pod<->container syncing code, so I will re-assign to @vmarmol, who has already commented on the PR (thanks!).
I was already reviewing the other one so figured I should take a look :)
podStatus, podInfraContainerID, found := kl.getPodInfraContainer(podFullName, uid, dockerContainers)
if found {
	containersToKeep[podInfraContainerID] = nil
We don't need this case I believe
I think we do. Otherwise we'll delete the infra container when deleting everything we don't want to keep.
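A minimal sketch of the point above (hypothetical names like killUnwantedContainers and killContainer, not the PR's actual code): the cleanup pass kills every running container that is missing from the keep map, so the infra container needs an entry there to survive it.

```go
package main

import "fmt"

type dockerID string

func killContainer(id dockerID) { fmt.Println("killing", id) }

// killUnwantedContainers kills every running container that has no entry in
// containersToKeep.
func killUnwantedContainers(running []dockerID, containersToKeep map[dockerID]struct{}) {
	for _, id := range running {
		if _, keep := containersToKeep[id]; !keep {
			killContainer(id)
		}
	}
}

func main() {
	infraID := dockerID("infra-0123")
	running := []dockerID{infraID, dockerID("stale-app-4567")}

	// The infra container is explicitly marked as "keep"; only the stale
	// application container is killed by the loop above.
	keep := map[dockerID]struct{}{infraID: {}}
	killUnwantedContainers(running, keep)
}
```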
Overall I like the direction and the changes. I would urge smaller changes with a promise of faster review. That is almost certainly me being wary of change in this path though :) I have a few more comments around simplifying some of the flow, but I'll hold until you let me know it is ready for more review.
Force-pushed from e0035af to 494446f.
I managed to write a new WIP version of this PR. As previously, I didn't fix the tests, but this time everything seems to be working (I played with a cluster a little bit). This one is big, because it splits the logic into analysis and application parts AND changes how we handle Infra Containers (they'll die if they're the only thing that's alive). As the split already pretty much rewrites the whole logic there, I think there's no reason to do it in two separate steps. PTAL. I find the logic there hard to follow (those 'continues'...) so I'm pretty sure I missed some corner case. Two cases to consider:
Friendly ping?
StartInfraContainer bool
InfraContainerId dockertools.DockerID
ContainersToStart map[int]empty
ContainersToKeep map[dockertools.DockerID]int
What does the "int" in this map represent?
The index of the container in pod.Spec.Containers. It was a pointer before, but that doesn't work.
Looks like this can be a map to empty. I don't see us use the index.
We do, when we find a container which has changed, and thus we restart the infra container. In this case we move 'containersToKeep' to 'containersToStart', which, sadly, have to have different types (if we want to even pretend to be efficient-ish). containersToKeep holds running Docker containers, while containersToStart holds specs of containers to start. Yuck...
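To make the type asymmetry concrete, here is a small illustrative sketch (all names hypothetical, not the PR's actual code) of the keep-to-start move: containersToKeep maps a Docker ID to the container's index in pod.Spec.Containers, and that index is exactly what a containersToStart entry needs.

```go
package main

import "fmt"

type dockerID string
type empty struct{}

// moveKeepToStart converts every "keep" entry into a "start" entry. Restarting
// the infra container restarts the whole pod, so containers we planned to keep
// now have to be started again from their specs.
func moveKeepToStart(containersToKeep map[dockerID]int, containersToStart map[int]empty) {
	for id, specIndex := range containersToKeep {
		containersToStart[specIndex] = empty{}
		delete(containersToKeep, id)
	}
}

func main() {
	keep := map[dockerID]int{"app-abc": 0, "sidecar-def": 1}
	start := map[int]empty{}
	moveKeepToStart(keep, start)
	fmt.Println(start) // indices 0 and 1 are now scheduled to start
}
```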
Sorry, I missed this yesterday somehow. I'll review. Needs a rebase btw.
@@ -1152,120 +1152,183 @@ func (kl *Kubelet) pullImageAndRunContainer(pod *api.BoundPod, container *api.Co
	return containerID, nil
}

func (kl *Kubelet) syncPod(pod *api.BoundPod, containersInPod dockertools.DockerContainers) error {
type podContainerChangesSpec struct {
StartInfraContainer bool
nit: these should be lowercase since the type itself is not publicly exposed
Done.
}

if containerChanges.StartInfraContainer || (len(containerChanges.ContainersToKeep) == 0 && len(containerChanges.ContainersToStart) == 0) {
	glog.V(4).Infof("Killing Infra Container for %q", podFullName)
Let's log which of these two cases happened so we know why we're killing it.
Done.
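For reference, a minimal sketch of what that logging could look like, assuming hypothetical field and variable names rather than the PR's actual code:

```go
package main

import "github.com/golang/glog"

type podContainerChanges struct {
	startInfraContainer bool
	containersToKeep    map[string]int
	containersToStart   map[int]struct{}
}

// maybeKillInfraContainer kills the infra container when it must be restarted
// or when there is nothing left to keep or start, logging which case applied.
func maybeKillInfraContainer(changes podContainerChanges, podFullName string) {
	if changes.startInfraContainer ||
		(len(changes.containersToKeep) == 0 && len(changes.containersToStart) == 0) {
		if changes.startInfraContainer {
			glog.V(4).Infof("Killing Infra Container for %q, will start a new one", podFullName)
		} else {
			glog.V(4).Infof("Killing Infra Container for %q: no containers left to keep or start", podFullName)
		}
		// ... actually kill the infra container here ...
	}
}

func main() {
	maybeKillInfraContainer(podContainerChanges{startInfraContainer: true}, "mypod_default")
}
```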
Mainly minor comments, looks good overall. Let me know how the e2e tests do; I think you said they all passed.
I ran e2e on the other PR. For this one they fail, as they fail on head right now...
Force-pushed from 4c1efa1 to e15eb6e.
With these changes, please squash the commits and I think we're ready. We're only waiting on e2e.
After fixing the e2e tests I have 25 green and 2 pending.
@gmarek sorry to be a pain, but can we remove the "WIP" from the commit message :)
Sorry. I should have done it myself. Done.
Thanks for putting up with the nits! Merging.
Refactor kubelet syncPod method and fix behavior of Infra Containers for RestartPolicy::Never.
WIP version of the refactoring needed by #4731. Tests obviously don't pass, as this PR changes the order and number of methods called. I'll fix them when we agree on the final version of the flow. I'd be grateful if someone who knows more about what's going on on the node would take a look and see if this change even remotely makes sense. I tried to keep the old semantics of things I don't understand, but I changed some in places where I thought I did.
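As a rough illustration of the RestartPolicy::Never behavior this PR is concerned with (not the actual implementation; all names below are made up), the decision a helper like ShouldContainerBeRestarted has to make boils down to something like the following: a container that has already run and exited must not be started again under "Never", which is also why an infra container with nothing left to run should be torn down.

```go
package main

import "fmt"

type restartPolicy string

const (
	restartPolicyAlways    restartPolicy = "Always"
	restartPolicyOnFailure restartPolicy = "OnFailure"
	restartPolicyNever     restartPolicy = "Never"
)

// shouldStart reports whether a container should be (re)started given the
// pod's restart policy, whether it has already terminated, and its exit code.
func shouldStart(policy restartPolicy, terminated bool, exitCode int) bool {
	if !terminated {
		return true // no terminated instance found; the container should be running
	}
	switch policy {
	case restartPolicyNever:
		return false // ran once, never restart
	case restartPolicyOnFailure:
		return exitCode != 0 // restart only if it failed
	default: // Always
		return true
	}
}

func main() {
	fmt.Println(shouldStart(restartPolicyNever, true, 1))     // false
	fmt.Println(shouldStart(restartPolicyOnFailure, true, 1)) // true
	fmt.Println(shouldStart(restartPolicyAlways, true, 0))    // true
}
```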