Compute container hash based on API content, not go type #57741
Conversation
Force-pushed from 9ddc712 to 16dce31.
/cc @yujuhong
This wouldn't help much. If you read the old issue #23104 completely, you'd realize that if you introduce a new field in the container spec with a non-nil default value, the hash would still be different. I do think this behavior (container restarts) should be called out in the release note.
I don't understand how the serialization/deserialization helps with the issue.
@liggitt By serialization/deserialization, the new obj drops nil pointers and empty slices from the struct, which keeps the to-be-hashed obj unchanged even when new pointer and slice fields are added. Currently the container hash gets changed whenever struct Container changes. This is why we should omit nil or empty fields when calculating the container hash value.
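A minimal sketch of the content-based idea being described here (illustrative only, not the exact kubelet code): hash the JSON-serialized container instead of the in-memory Go value, so nil pointers and empty slices, which are dropped by `omitempty` tags, never reach the hash.

```go
// Illustrative sketch only; not the actual kubelet implementation.
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"

	v1 "k8s.io/api/core/v1"
)

// hashContainerByContent hashes the serialized API content of a container.
// Nil pointers and empty slices are dropped by the `omitempty` JSON tags,
// so adding a new optional field to the Go type does not change the hash
// of an existing container that never sets it.
func hashContainerByContent(c *v1.Container) (uint64, error) {
	data, err := json.Marshal(c)
	if err != nil {
		return 0, err
	}
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64(), nil
}

func main() {
	c := &v1.Container{Name: "nginx", Image: "nginx:1.15"}
	sum, _ := hashContainerByContent(c)
	fmt.Printf("container hash: %#x\n", sum)
}
```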
@yujuhong Yes, right. IMO, if we change the default behavior, we should let the container get restarted; in that case the hash change is desired. What this PR does is omit nil or empty new/old fields when calculating the container hash. For a newly added non-nil or non-empty field, a hash change is inevitable.
I still don't understand. All the pods the kubelet deals with should be obtained from deserializing API responses or manifests from disk. How does reserializing and deserializing again help change the container the kubelet deals with?
Right.
@liggitt Every time kubelet gets restarted, the container hash is recalculated to identify whether the container needs to be restarted. What this PR is trying to fix is to keep the hash value as consistent as possible to avoid unneeded restarts. This PR won't change the pod spec or container spec. By reserializing and deserializing again into a new obj, nil pointers and empty slices are dropped before the hash is computed.
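The restart decision described here boils down to a comparison like the following fragment (a hypothetical helper reusing hashContainerByContent from the sketch above; the real kubelet logic is not shown in this thread):

```go
// Hypothetical helper, not kubelet code: the hash computed at container
// creation time is stored (e.g. in a container label) and compared with
// the hash recomputed from the current spec when kubelet syncs the pod.
func containerChanged(c *v1.Container, storedHash uint64) bool {
	currentHash, err := hashContainerByContent(c)
	if err != nil {
		// If the hash cannot be computed, err on the side of recreating
		// the container from the current spec.
		return true
	}
	return currentHash != storedHash
}
```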
Can you describe how the kubelet loaded this v1 container into memory from an api response or a manifest in a way that would result in different content than if it were roundtripped back to json and then back into memory?
It seems like the issue is less with nil fields and more with the way the hash is being computed. This change will not prevent a new defaulted field from changing the hash, and it seems like it would guarantee that hashes would change between 1.9 and 1.10, which isn't great. Do we not save the serialized container spec we had when we launched the container? If we did, we could load it, apply defaults, and do a semantic deep-equal to see if there was a diff that required restarting. Working to detect actual differences seems better than trying to improve the hashing approach, which still leaves many unnecessary restart scenarios in place.
Right.
We do save the hash value in container labels, but not the serialized container spec.
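If the serialized spec were saved alongside the container, the alternative described above might look roughly like this. A sketch under that assumption: needsRestart and the defaulting step are illustrative, not existing kubelet code.

```go
// Illustrative sketch of the "semantic deep-equal" alternative.
package kubeletcheck

import (
	"encoding/json"

	v1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
)

// needsRestart decodes the container spec saved when the container was
// started, then compares it semantically against the container from the
// current pod spec instead of comparing hashes.
func needsRestart(savedSpec []byte, current *v1.Container) (bool, error) {
	var saved v1.Container
	if err := json.Unmarshal(savedSpec, &saved); err != nil {
		return true, err
	}
	// In a real implementation, versioned defaulting would be applied to
	// `saved` here so that a newly introduced defaulted field does not
	// register as a user-visible change.
	return !apiequality.Semantic.DeepEqual(&saved, current), nil
}
```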
@liggitt Makes sense. But this is not going to be a small change.
I prefer the second option. @liggitt Any suggestions? ping @kubernetes/sig-node-api-reviews, we need your opinions. Thanks.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
/assign @yujuhong
@xmudrii Yes, still targeted to v1.16.
This PR may require API review. If so, when the changes are ready, complete the pre-review checklist and request an API review. Status of requested reviews is tracked in the API Review project. |
Ping @liggitt for review. Thanks.
Rather than committing the copies of types.go files, I'd just record the computed hash for a given pod spec in 1.14 and 1.15 and test against the same HEAD pod spec.
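A self-contained sketch of such a regression test (the package name, the stand-in hash function, and the recorded values are all placeholders, not the real kubelet code or real recorded hashes):

```go
package kuberuntime

import (
	"encoding/json"
	"hash/fnv"
	"testing"

	v1 "k8s.io/api/core/v1"
)

// hashContainer stands in for the kubelet's real content-based hash.
func hashContainer(c *v1.Container) uint64 {
	data, _ := json.Marshal(c)
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// TestContainerHashStableAcrossReleases pins the hash values that older
// releases produced for a fixed container spec and verifies that HEAD
// still produces the same values for the same spec.
func TestContainerHashStableAcrossReleases(t *testing.T) {
	c := v1.Container{Name: "nginx", Image: "nginx:1.15"}
	got := hashContainer(&c)

	// Placeholder values; record the real hashes by running this spec
	// through the 1.14 and 1.15 hash functions.
	recorded := map[string]uint64{
		"v1.14": 0x0,
		"v1.15": 0x0,
	}
	for release, want := range recorded {
		if got != want {
			t.Errorf("hash differs from %s: got %#x, want %#x", release, got, want)
		}
	}
}
```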
@liggitt Please take another review. Thanks.
@dixudx: The following tests failed, say /retest to rerun them:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dixudx, liggitt. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What this PR does / why we need it:
Currently, when upgrading the cluster, docker containers may get restarted if struct Container gets changed. What we expect is to keep the containers unchanged, since they did not change at all.
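To make the failure mode concrete, here is a rough illustration (assumption: the current hash is derived from the in-memory Go value via the kubelet's own deep-hash utility; this stand-in only demonstrates why that is fragile):

```go
// Illustrative stand-in for a type-based hash, not the kubelet's code.
package main

import (
	"fmt"
	"hash/fnv"

	v1 "k8s.io/api/core/v1"
)

// hashGoValue hashes a textual dump of the in-memory Go value. The dump
// contains every struct field, including nil pointers and empty slices,
// so merely adding a new optional field to v1.Container changes the hash
// of existing containers that never set it, triggering restarts on
// upgrade. Hashing the serialized API content (as sketched earlier in
// the conversation) avoids this.
func hashGoValue(c *v1.Container) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%#v", *c)
	return h.Sum64()
}

func main() {
	c := &v1.Container{Name: "nginx", Image: "nginx:1.15"}
	fmt.Printf("type-based hash: %#x\n", hashGoValue(c))
}
```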
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #53644
Special notes for your reviewer:
/assign @tallclair @vishh
WARNING: This change will still cause the container spec hash to change (and containers to be restarted) once on upgrade.
But it will keep the hash value consistent across future releases.
Release note: