Update pod status only when it changes. #5714
Conversation
Shippable is complaining about gofmt.
}

-func (kl *Kubelet) generatePodStatus(podFullName string, uid types.UID) (api.PodStatus, error) {
+func (kl *Kubelet) generatePodStatus(podFullName string) (api.PodStatus, error) {
Eventually we should subsume this into the new status object too. Not for this PR though :) there is some more refactoring needed before that I think.
Force-pushed d48fa4e to 415733d.
Force-pushed 415733d to f068d61.
e2e tests pass (liveness.sh flaked - successful 3/4 times).
Force-pushed f068d61 to 6af068a.
	status api.PodStatus
}

type statusManager struct {
Can we comment that the implementation is thread-safe?
Done.
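The thread-safety requirement discussed above can be sketched as follows. This is a hypothetical, simplified stand-in for the PR's statusManager (the real type lives in the kubelet and uses api.PodStatus and a kubeClient); it shows the core idea of guarding the status cache with a mutex and only treating an update as "changed" when the cached value differs:

```go
package main

import (
	"fmt"
	"sync"
)

// PodStatus is a hypothetical stand-in for api.PodStatus.
type PodStatus struct {
	Phase string
}

// statusManager caches the last status per pod and reports whether an
// update would need to be sent to the apiserver. All methods are
// thread-safe: the map is only touched under the mutex.
type statusManager struct {
	mu          sync.Mutex
	podStatuses map[string]PodStatus
}

func newStatusManager() *statusManager {
	return &statusManager{podStatuses: make(map[string]PodStatus)}
}

// SetPodStatus records the status and returns true if it changed,
// i.e. if an apiserver update would be triggered.
func (s *statusManager) SetPodStatus(pod string, status PodStatus) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	old, found := s.podStatuses[pod]
	if found && old == status {
		return false // unchanged: skip the apiserver call
	}
	s.podStatuses[pod] = status
	return true
}

func main() {
	m := newStatusManager()
	fmt.Println(m.SetPodStatus("web-1", PodStatus{Phase: "Running"})) // true
	fmt.Println(m.SetPodStatus("web-1", PodStatus{Phase: "Running"})) // false
	fmt.Println(m.SetPodStatus("web-1", PodStatus{Phase: "Failed"}))  // true
}
```

Because the struct comparison happens under the lock, concurrent SetPodStatus calls for the same pod cannot race on the cached value.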
Looks good, just two comments. Also needs a rebase, it seems.
}
_, err = s.kubeClient.Pods(namespace).UpdateStatus(name, &status)
if err != nil {
	glog.Warningf("Error updating status for pod %q: %v", name, err)
If the apiserver request fails, would we lose this status transition completely?
That's a good point; we should retry on failure. An easy way is to remove the status from our map so that SetStatus() will re-send it. We won't be able to lock around the send in SetStatus(), though.
Wow, that's a good catch! Thanks :) Fixed. I'm deleting the cached value to make sure it is retried next time.
Force-pushed 6af068a to f0615c1.
Rebased.
For some reason the integration test is flaky. I'm still not sure whether it's related to this PR. I'll investigate tomorrow.
It's fixed in #5775. We were always terminating the podRunning loop early for mirror pods.
Force-pushed f0615c1 to c88c703.
Thanks. This helped!
if found {
	glog.V(3).Infof("Returning cached status for %q", podFullName)
	return cachedPodStatus, nil
}
-return kl.generatePodStatus(podFullName, uid)
+return kl.generatePodStatus(podFullName)
If the Kubelet fails to retrieve the cached PodStatus from the statusManager, it regenerates it from scratch. Shouldn't this update the cached one in the statusManager too?
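The reviewer's suggestion could look roughly like this. This is a hypothetical sketch, not the PR's actual code: the `kubelet` type, its `cache` map, and `PodStatus` are simplified stand-ins; the point is that a cache miss regenerates the status and then writes it back so subsequent reads hit the cache:

```go
package main

import "fmt"

// PodStatus is a hypothetical stand-in for api.PodStatus.
type PodStatus struct{ Phase string }

type kubelet struct {
	cache map[string]PodStatus // stand-in for the statusManager's cache
}

// generatePodStatus stands in for the expensive regeneration path
// (in the real kubelet, this inspects the pod's containers).
func (kl *kubelet) generatePodStatus(podFullName string) (PodStatus, error) {
	return PodStatus{Phase: "Running"}, nil
}

// GetPodStatus prefers the cached status; on a cache miss it regenerates
// the status and, per the review suggestion, refreshes the cache too.
func (kl *kubelet) GetPodStatus(podFullName string) (PodStatus, error) {
	if cached, found := kl.cache[podFullName]; found {
		return cached, nil
	}
	status, err := kl.generatePodStatus(podFullName)
	if err == nil {
		kl.cache[podFullName] = status // write back so the next read is cached
	}
	return status, err
}

func main() {
	kl := &kubelet{cache: make(map[string]PodStatus)}
	status, _ := kl.GetPodStatus("default/web-1")
	fmt.Println(status.Phase, len(kl.cache)) // Running 1
}
```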
LGTM overall except one small nit / question.
It seems that after my last rebase, e2e stopped working. It's pretty late here, so I'll investigate tomorrow.
FYI - they were passing before the rebase.
There were issues with the e2e tests; it may be worth re-running after a rebase.
You have to rebase first before merging.
I thought I proposed having podWorker periodically update PodStatus before, and we agreed to have that in a separate PR later. But now, with statusManager introduced to populate PodStatus for all pods, it looks like we are taking a different approach. Why do we prefer this way?
@dchen1107 I don't see how it's different. The only thing that has changed is that the syncing logic is outside of the kubelet itself. We still populate pod status from the pod worker via the SyncPod method that is called in PodWorker.
Rebased. It helped: the e2e tests pass now. @vmarmol Let's merge before I need to rebase again :)
Oooh man, somehow it seems we'll need to rebase again! Sorry @fgrzadkowski :( I'm here waiting by the merge button.
Force-pushed c88c703 to 0dd7743.
Rebased :)
* Refactor syncing logic into a separate struct
Force-pushed 0dd7743 to 632ca50.
And rebased again. This time it looks OK.
@fgrzadkowski Here is the reason why: currently PodWorker creates and runs containers, which might fail for all kinds of reasons. The sync status logic uses docker inspect to retrieve the reason for a failure, but there is no easy way to find out why a container failed on creation. If PodWorker did both, we could propagate creation failures much more cleanly. I am OK with what we have here for now.
@dchen1107 I think we do what you'd expect today. We set the pod status in the syncWorker, but we actually send it in an async thread. We will propagate the creation failure.
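The "set in the sync worker, send asynchronously" pattern described above can be sketched with a buffered channel. This is a hypothetical illustration (the `update` struct and the printed message are made up for the example), showing why the sync worker never blocks on the apiserver call:

```go
package main

import (
	"fmt"
	"sync"
)

// PodStatus is a hypothetical stand-in for api.PodStatus.
type PodStatus struct{ Phase string }

// update pairs a pod name with its new status.
type update struct {
	pod    string
	status PodStatus
}

func main() {
	updates := make(chan update, 16)
	var wg sync.WaitGroup
	wg.Add(1)

	// Async sender: drains the channel and would talk to the apiserver,
	// so the sync worker never blocks on the network.
	go func() {
		defer wg.Done()
		for u := range updates {
			fmt.Printf("updating status for %s: %s\n", u.pod, u.status.Phase)
		}
	}()

	// The sync worker just enqueues; sending happens asynchronously.
	updates <- update{pod: "default/web-1", status: PodStatus{Phase: "Running"}}
	close(updates)
	wg.Wait()
}
```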
LGTM, will merge on green. Thanks @fgrzadkowski! You get the "most rebased" award :)
Update pod status only when it changes.
@fgrzadkowski and @vmarmol Ahh, I misread that code since it is outside both PodWorker and Kubelet, but the setter logic is actually invoked by PodWorker. Thanks! LGTM
This PR is ready for review, but I haven't run e2e tests. I'm sending it now so that I can address comments on Monday (CET).
This fixes #5624 and #5693.
I plan to apply the same logic (update upon change) for node status updates.
@vmarmol @bgrant0607 @dchen1107 @smarterclayton