Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor kubelet syncPod method and fix behavior of Infra Containers for RestartPolicy::Never. #5022

Merged
merged 1 commit into from
Mar 13, 2015

Conversation

gmarek
Copy link
Contributor

@gmarek gmarek commented Mar 4, 2015

WIP version of refactoring needed by #4731. Tests obviously don't pass, as this PR changes order and number of methods called. I'll fix them when we agree on the finial version of the flow. I'll be grateful if someone who knows more what's going on on the node will take a look, and see if this change even remotely makes sense. I tried to keep old semantics of things that I don't understand, but I changed some in places I thought I do.

@erictune
Copy link
Member

erictune commented Mar 4, 2015

@yujuhong do you want to take a first look at this? Feel free to reassign if you need to.

glog.Errorf("Couldn't make a ref to pod %q: '%v'", podFullName, err)
}

func (kl *Kubelet) MakeDataDirs(uid types.UID) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably doesn't have to be public

@vmarmol
Copy link
Contributor

vmarmol commented Mar 5, 2015

One overall comment I would add is that it may be worthwhile to make some of these changes separate and smaller PRs. This code is being changed by ~3 people concurrently and we've had enough trouble with changes to it that I err on the side of caution :)

if err != nil {
glog.Errorf("Failed to introspect pod infra container: %v; Skipping pod %q", err, podFullName)
return err
func (kl *Kubelet) ShouldContainerBeRestarted(pod *api.BoundPod, containerName, dockerContainerName string, uid types.UID) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here about being private

@vmarmol
Copy link
Contributor

vmarmol commented Mar 5, 2015

#5032 is also small enough and may be useful in this refactoring. That will probably get merged in tomorrow, it only has minor comments.

@yujuhong
Copy link
Contributor

yujuhong commented Mar 5, 2015

Sorry, somehow I missed this issue. I have not touched the pod<->container syncing code, so I will re-assign to @vmarmol, who has already commented on the PR (thanks!).

@yujuhong yujuhong assigned vmarmol and unassigned yujuhong Mar 5, 2015
@vmarmol
Copy link
Contributor

vmarmol commented Mar 5, 2015

I was already reviewing the other one so figured I should take a look :)


podStatus, podInfraContainerID, found := kl.getPodInfraContainer(podFullName, uid, dockerContainers)
if found {
containersToKeep[podInfraContainerID] = nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this case I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do. Otherwise we'll delete infra container when deleting everything we don't want to keep.

@vmarmol
Copy link
Contributor

vmarmol commented Mar 5, 2015

Overall I like the direction and the changes. I would urge for smaller changes with a promise of faster review. That is almost certainly me being weary of change in this path though :)

I have a few more comments around simplifying some of the flow, but I'll hold until you let me know it is ready for more review.

@gmarek
Copy link
Contributor Author

gmarek commented Mar 10, 2015

I managed to write new WIP version of this PR. As previously I didn't fix tests, but this time everything seems to be working (played with a cluster a little bit).

This one is big, because it splits the logic into analysis and application parts AND changes how we handle Infra Containers (they'll die if they're only thing that's alive). As the split already pretty much rewrites the whole logic there, I think there's no reason to do it in two separate steps.

PTAL. I find the logic there hard to follow (those 'continues'...) so I'm pretty sure I missed some corner case there.

Two cases to consider:

  • differentiating starting fresh pod from the situation when Infra Pod died when RestartPolicy is Never, and we wan't to kill all containers ("shouldContainerBeRestarted" will return true, as it won't be able to find any past failures).
  • case of 'paused' containers: I believe we completely ignore them right now.

@gmarek
Copy link
Contributor Author

gmarek commented Mar 11, 2015

Friendly ping?

StartInfraContainer bool
InfraContainerId dockertools.DockerID
ContainersToStart map[int]empty
ContainersToKeep map[dockertools.DockerID]int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What represents the "int" in this map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index of container for pod.Spec.Containers. It was the pointer before but it doesn't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this can be a map to empty. I don't see us use the index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, when we find a container which is changed, and thus we restart infra container. In this case we move 'containersToKeep' to 'containersToStart', which, sadly, has to have different types (if we wan't to even pretend to be efficient-ish). containersToKeep are running Docker containers, while containersToStart are specs of containers to start. Yuck...

@vmarmol
Copy link
Contributor

vmarmol commented Mar 11, 2015

Sorry, I missed this yesterday somehow. I'll review. Needs a rebase btw.

@@ -1152,120 +1152,183 @@ func (kl *Kubelet) pullImageAndRunContainer(pod *api.BoundPod, container *api.Co
return containerID, nil
}

func (kl *Kubelet) syncPod(pod *api.BoundPod, containersInPod dockertools.DockerContainers) error {
type podContainerChangesSpec struct {
StartInfraContainer bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these should be lowercase since the type itself is not publicly exposed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if containerChanges.StartInfraContainer || (len(containerChanges.ContainersToKeep) == 0 && len(containerChanges.ContainersToStart) == 0) {
glog.V(4).Infof("Killing Infra Container for %q", podFullName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets log which of these two cases happened so we know why we're killing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@vmarmol
Copy link
Contributor

vmarmol commented Mar 12, 2015

Mainly minor comments, looks good overall. Let me know how the e2e does, I think you said they all passed.

@gmarek
Copy link
Contributor Author

gmarek commented Mar 12, 2015

I run e2e on other PR. For this one they fail, as they fail on the head right now...

@gmarek gmarek force-pushed the client3 branch 6 times, most recently from 4c1efa1 to e15eb6e Compare March 12, 2015 14:55
@vmarmol
Copy link
Contributor

vmarmol commented Mar 12, 2015

With these changes please squash commits and I think we're ready. We're only waiting on e2e.

@gmarek gmarek changed the title WIP: refactor kubelet syncPod method Refactor kubelet syncPod method and fix behavior of Infra Containers for RestartPolicy::Never. Mar 13, 2015
@gmarek
Copy link
Contributor Author

gmarek commented Mar 13, 2015

After fixing e2e tests I have 25 green and 2 pending.

@vmarmol
Copy link
Contributor

vmarmol commented Mar 13, 2015

@gmarek sorry to be a pain, but can we remove the "WIP" from the commit message :)

@gmarek
Copy link
Contributor Author

gmarek commented Mar 13, 2015

Sorry. I should have done it myself. Done.

@vmarmol
Copy link
Contributor

vmarmol commented Mar 13, 2015

Thanks for putting up with the nits! Merging.

vmarmol added a commit that referenced this pull request Mar 13, 2015
Refactor kubelet syncPod method and fix behavior of Infra Containers for RestartPolicy::Never.
@vmarmol vmarmol merged commit 505dbc5 into kubernetes:master Mar 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants