Kubelet to POST pod status to apiserver #4561
In the long run, we want to do a bulk POST to the apiserver, but that is not a v1 blocker.
api/$VERSION/namespaces/$NS/pods/$NAME/status for each pod.
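For illustration, a minimal sketch in Go (not actual kubelet or client code; the version, namespace, and pod name are placeholder values) of building the per-pod status path suggested above:

```go
package main

import "fmt"

// statusPath is a hypothetical helper that builds the per-pod status endpoint
// discussed in this thread: /api/$VERSION/namespaces/$NS/pods/$NAME/status.
func statusPath(version, namespace, pod string) string {
	return fmt.Sprintf("/api/%s/namespaces/%s/pods/%s/status", version, namespace, pod)
}

func main() {
	// Prints: /api/v1beta3/namespaces/default/pods/mypod/status
	fmt.Println(statusPath("v1beta3", "default", "mypod"))
}
```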
As suggested by @dchen1107, I'll work on this. (For some reason I can't assign myself to this issue.)
Out of curiosity, why can't we roll up Node and Pod status into a single status update?
That is the plan (a bulk status update), but it is not strictly required at this moment.
@timothysc, what URL do you propose for doing a node + pod update?
@fgrzadkowski, FYI, my PR #5019 modifies the kubelet to reject (set the pod status to Failed) pods that have a port conflict. The status is stored in a map and gets reported back later via status polling.
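For context, a minimal sketch (in Go, and not the code from #5019; the type and key format are assumptions) of the kind of mutex-guarded status map described above, where the kubelet records a Failed status for a rejected pod and serves it back when status is next polled:

```go
package main

import (
	"fmt"
	"sync"
)

// PodStatus is a stand-in for the real API status type.
type PodStatus struct {
	Phase   string
	Message string
}

// statusCache holds the last known status per pod, keyed by a full pod name.
type statusCache struct {
	mu       sync.Mutex
	statuses map[string]PodStatus
}

func newStatusCache() *statusCache {
	return &statusCache{statuses: make(map[string]PodStatus)}
}

// setFailed records a terminal status, e.g. for a pod rejected due to a port conflict.
func (c *statusCache) setFailed(pod, reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.statuses[pod] = PodStatus{Phase: "Failed", Message: reason}
}

// get returns the stored status, if any, when status is polled later.
func (c *statusCache) get(pod string) (PodStatus, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.statuses[pod]
	return s, ok
}

func main() {
	cache := newStatusCache()
	cache.setFailed("default/mypod", "host port 8080 already in use")
	if s, ok := cache.get("default/mypod"); ok {
		fmt.Println(s.Phase, "-", s.Message)
	}
}
```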
#5085 is quasi-blocked on this, in that if we do graceful deletion with a TTL (the optimal way), then we won't be able to clear the binding at the point the pod is actually deleted. We could still delete the binding at the point the TTL starts, which is somewhat reasonable (since you can't stop or delay a deletion as I've implemented it so far) because it will trigger the kubelet to remove the pod gracefully. However, truly graceful deletion would mean sending SIGTERM to Docker with the remaining TTL window as soon as the pod sees it, then SIGKILL when the delete happens, and that is harder to do if the pod disappears from the binding.
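As an aside, a rough sketch (in Go, independent of the Docker integration and TTL plumbing discussed above) of the SIGTERM-then-SIGKILL pattern being described: give the process the grace window to exit cleanly, then force-kill it when the window expires:

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// gracefulKill asks the process to stop with SIGTERM, waits up to grace
// (the TTL window in the discussion above), then falls back to SIGKILL.
func gracefulKill(cmd *exec.Cmd, grace time.Duration) error {
	if err := cmd.Process.Signal(syscall.SIGTERM); err != nil {
		return err
	}
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()
	select {
	case err := <-done:
		return err // exited within the grace period
	case <-time.After(grace):
		return cmd.Process.Kill() // grace period expired; force the kill
	}
}

func main() {
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println(gracefulKill(cmd, 2*time.Second))
}
```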
I have an almost-ready PR for this (some tests are failing). I will send it on Monday.
I had to revert PR #5305 due to bugs. Reopening the issue; I will send a fixed version soon.
So in the case of deletion, we're now seeing gobs of traffic trying to send updated status while the apiserver responds with NOT FOUND. Easy repro:
results in what appears to be a death spiral that we cannot exit without a hard cluster reboot. More details: numerous 'kubectl get pods' calls return no result and no error, but I'm guessing the apiserver is returning ~429s.
Sent out #5619 to lower and spread that load. It lowers the QPS from 100 to ~9.
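To make the "lower and spread" idea concrete, here is a rough sketch in Go; it uses golang.org/x/time/rate as a stand-in, not whatever limiter #5619 actually adds, and the QPS and burst values are only illustrative:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Allow ~9 status POSTs per second with a small burst, so a node with many
	// pods paces its updates instead of hammering the apiserver all at once.
	limiter := rate.NewLimiter(rate.Limit(9), 2)

	pods := []string{"pod-a", "pod-b", "pod-c", "pod-d"}
	start := time.Now()
	for _, p := range pods {
		if err := limiter.Wait(context.Background()); err != nil {
			fmt.Println("rate limiter aborted:", err)
			return
		}
		// postStatus(p) would go here; we only log the pacing.
		fmt.Printf("%6.3fs: posting status for %s\n", time.Since(start).Seconds(), p)
	}
}
```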
After discussions with @bgrant0607 and @dchen1107, the suggestion is to only update the status when it changes and on startup. The heartbeat will be handled by the node controller rather than per pod. I'll file a separate issue for that; #5619 will go in for now.
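A minimal sketch (in Go; the types, names, and use of reflect.DeepEqual are assumptions, not the eventual implementation) of the "only POST when the status changes, plus once on startup" behavior described above:

```go
package main

import (
	"fmt"
	"reflect"
)

// PodStatus stands in for the real API status type.
type PodStatus struct {
	Phase string
	Ready bool
}

// statusManager remembers the last status sent per pod.
type statusManager struct {
	lastSent map[string]PodStatus
}

// maybePost sends the status only if it differs from the last one sent,
// or if nothing has been sent yet (e.g. on kubelet startup).
func (m *statusManager) maybePost(pod string, status PodStatus) {
	if prev, ok := m.lastSent[pod]; ok && reflect.DeepEqual(prev, status) {
		return // unchanged, skip the POST
	}
	// postStatus(pod, status) would do the real HTTP call.
	fmt.Printf("POST status for %s: %+v\n", pod, status)
	m.lastSent[pod] = status
}

func main() {
	m := &statusManager{lastSent: make(map[string]PodStatus)}
	m.maybePost("default/mypod", PodStatus{Phase: "Running", Ready: true}) // sent (startup)
	m.maybePost("default/mypod", PodStatus{Phase: "Running", Ready: true}) // skipped (unchanged)
	m.maybePost("default/mypod", PodStatus{Phase: "Failed", Ready: false}) // sent (changed)
}
```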
Split off from #156 as a smaller, more specific work item.
In the kubelet, once every N sync loops, it should POST
/api/$VERSION/namespaces/$NS/pods/$NAME/status
for each pod. The kubelet would do this if enabled by a flag, and emit a warning if it failed to POST the update. The kubelet would ideally handle a 429 by retrying after the Retry-After header.
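Putting the requirements above together, a rough sketch in Go of the intended behavior, not the actual kubelet implementation; the base URL, API version, retry count, and fallback delay are assumptions:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"strconv"
	"time"
)

// postPodStatus POSTs a pod's status, warns (rather than failing) on errors,
// and on a 429 waits for the Retry-After interval before trying again.
func postPodStatus(baseURL, version, namespace, pod string, statusJSON []byte) {
	url := fmt.Sprintf("%s/api/%s/namespaces/%s/pods/%s/status", baseURL, version, namespace, pod)
	for attempt := 0; attempt < 3; attempt++ {
		resp, err := http.Post(url, "application/json", bytes.NewReader(statusJSON))
		if err != nil {
			log.Printf("warning: failed to POST status for %s/%s: %v", namespace, pod, err)
			return
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusTooManyRequests {
			// Honor the Retry-After header (in seconds) before retrying.
			delay := 5 * time.Second // assumed fallback when the header is absent
			if s, err := strconv.Atoi(resp.Header.Get("Retry-After")); err == nil {
				delay = time.Duration(s) * time.Second
			}
			time.Sleep(delay)
			continue
		}
		if resp.StatusCode >= 300 {
			log.Printf("warning: apiserver returned %s for %s/%s status update", resp.Status, namespace, pod)
		}
		return
	}
	log.Printf("warning: giving up on status update for %s/%s after repeated 429s", namespace, pod)
}

func main() {
	postPodStatus("http://localhost:8080", "v1beta3", "default", "mypod",
		[]byte(`{"status":{"phase":"Running"}}`))
}
```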