Implement Daemonset history #45924
Conversation
Force-pushed from 2950cc8 to 41708f9
Force-pushed from 425f07a to c50dbb5
Force-pushed from e533574 to 1709a25
Force-pushed from 7613856 to 43e6b50
Force-pushed from 43e6b50 to 82c3627
		toUpdate.Labels = make(map[string]string)
	}
	toUpdate.Labels[extensions.DefaultDaemonSetUniqueLabelKey] = cur.Labels[extensions.DefaultDaemonSetUniqueLabelKey]
	_, err = dsc.kubeClient.ExtensionsV1beta1().DaemonSets(ds.Namespace).UpdateStatus(toUpdate)
We are going to invalidate label updates during status updates at some point in the future and this will break.
Will follow up
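If that restriction lands, the hash label would have to be persisted through the regular update endpoint before the status write. Below is a minimal sketch of that ordering; the types and client interface are hypothetical stand-ins, not the real client-go API.

```go
package daemonsetsketch

import "fmt"

// Hypothetical stand-ins for the real DaemonSet type and typed client; they
// only exist to illustrate splitting the label write from the status write.
type DaemonSet struct {
	Name   string
	Labels map[string]string
}

type daemonSetClient interface {
	Update(ds *DaemonSet) (*DaemonSet, error)       // persists spec and metadata (labels)
	UpdateStatus(ds *DaemonSet) (*DaemonSet, error) // persists status only
}

// updateLabelThenStatus writes the unique hash label via Update first, then
// writes status via UpdateStatus, so the label change never rides along with
// a status update.
func updateLabelThenStatus(c daemonSetClient, ds *DaemonSet, labelKey, hash string) error {
	if ds.Labels == nil {
		ds.Labels = make(map[string]string)
	}
	ds.Labels[labelKey] = hash
	updated, err := c.Update(ds)
	if err != nil {
		return fmt.Errorf("couldn't update labels of daemon set %q: %v", ds.Name, err)
	}
	if _, err := c.UpdateStatus(updated); err != nil {
		return fmt.Errorf("couldn't update status of daemon set %q: %v", ds.Name, err)
	}
	return nil
}
```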
	if err != nil {
		return fmt.Errorf("couldn't get node to daemon pod mapping for daemon set %q: %v", ds.Name, err)
	}
	_, old, err := dsc.constructHistory(ds)
I am wondering if it's going to be cleaner to construct the history once and then pass the references into manage and cleanupHistory, instead of reconstructing it for each one of them.
Sounds good, will follow up
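For readers following along, a sketch of the suggested shape; the function signatures and stub types here are hypothetical, not the actual controller API.

```go
package daemonsetsketch

// Hypothetical types standing in for the real DaemonSet and ControllerRevision;
// the point is only the call structure suggested in the review.
type DaemonSet struct{ Name string }
type ControllerRevision struct{ Revision int64 }

type controller struct{}

// constructHistory is assumed to return the current revision plus the older ones.
func (c *controller) constructHistory(ds *DaemonSet) (cur *ControllerRevision, old []*ControllerRevision, err error) {
	// ... build or look up ControllerRevisions for ds ...
	return &ControllerRevision{Revision: 2}, []*ControllerRevision{{Revision: 1}}, nil
}

func (c *controller) manage(ds *DaemonSet, cur *ControllerRevision, old []*ControllerRevision) error {
	return nil // placeholder body
}

func (c *controller) cleanupHistory(ds *DaemonSet, old []*ControllerRevision) error {
	return nil // placeholder body
}

// syncDaemonSet builds the history once and threads the same references
// through manage and cleanupHistory instead of reconstructing it twice.
func (c *controller) syncDaemonSet(ds *DaemonSet) error {
	cur, old, err := c.constructHistory(ds)
	if err != nil {
		return err
	}
	if err := c.manage(ds, cur, old); err != nil {
		return err
	}
	return c.cleanupHistory(ds, old)
}
```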
			}
		}
	}
	// Label ds with current history's unique label as well
Why do we need this actually?
It's easier for us to check whether pods are up-to-date without looking through all history again when calculating DS status.
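A rough illustration of why the label helps; the label key and the simplified object type below are assumptions, not the actual constants.

```go
package daemonsetsketch

// Hypothetical hash label key; the real constant is DefaultDaemonSetUniqueLabelKey
// in the extensions package.
const uniqueLabelKey = "daemonset-controller-hash"

type object struct{ Labels map[string]string }

// podUpToDate checks whether a pod belongs to the DaemonSet's current revision
// by comparing hash labels directly, so status calculation does not need to
// walk the whole ControllerRevision history again.
func podUpToDate(ds, pod object) bool {
	want, ok := ds.Labels[uniqueLabelKey]
	if !ok || want == "" {
		return false // DS not labeled yet; treat the pod as not up to date
	}
	return pod.Labels[uniqueLabelKey] == want
}
```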
		}
		if err != nil {
			return err
		}
	}

	err = dsc.cleanupHistory(ds)
Has rollingUpdate finished the whole update at this point?
Yes, rollingUpdate calls syncNodes, which creates and deletes pods and waits for those operations to complete.
Does this mean that a stuck update will hold a worker forever?
Creating and deleting pods won't block forever. If any creation or deletion fails, it collects the error and returns all errors.
Successful creates and deletes don't mean a complete rollout, so in some cases rollingUpdate hasn't actually finished the whole rollout at this point.
I see. Is the concern about not killing old history when it has a pod that is terminating?
I don't see this as an issue for now, but if we standardize on executing cleanup at the end of a rollout, we should fix this. As I said elsewhere, I am going to roll back the Deployment case now that we have collisionCount (remember that cleanup was executed at the end before, and I think it's strictly correct to do so).
As an example, Jenkins cleans up old builds only after the new builds are complete.
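To summarize the cleanup policy being discussed, here is a small sketch of trimming only dead revisions down to a history limit; the types and field names are illustrative, not the controller's real ones.

```go
package daemonsetsketch

import "sort"

type revision struct {
	Number int64
	Live   bool // true if some pod still carries this revision's hash label
}

// revisionsToDelete never selects a revision that still owns pods, and among
// the dead ones keeps only the newest historyLimit entries, returning the
// oldest surplus for deletion.
func revisionsToDelete(history []revision, historyLimit int) []revision {
	var dead []revision
	for _, r := range history {
		if !r.Live {
			dead = append(dead, r)
		}
	}
	sort.Slice(dead, func(i, j int) bool { return dead[i].Number < dead[j].Number })
	if len(dead) <= historyLimit {
		return nil
	}
	return dead[:len(dead)-historyLimit] // oldest dead revisions beyond the limit
}
```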
		DeleteFunc: dsc.deleteHistory,
	})
	dsc.historyLister = historyInformer.Lister()
	dsc.historyStoreSynced = historyInformer.Informer().HasSynced
Don't you want to wait for the history cache to be synced?
Yes, will add it in WaitForCacheSync.
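For context, the usual pattern is to include the new informer's HasSynced in the controller's startup wait. A sketch with illustrative field and method names, using client-go's cache.WaitForCacheSync:

```go
package daemonsetsketch

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// Field names are illustrative; the point is that the history informer's
// HasSynced must be part of the initial WaitForCacheSync call in Run.
type daemonSetsController struct {
	podStoreSynced     cache.InformerSynced
	nodeStoreSynced    cache.InformerSynced
	dsStoreSynced      cache.InformerSynced
	historyStoreSynced cache.InformerSynced
}

// waitForCaches blocks until all listed caches are synced or stopCh closes.
func (dsc *daemonSetsController) waitForCaches(stopCh <-chan struct{}) error {
	if !cache.WaitForCacheSync(stopCh,
		dsc.podStoreSynced,
		dsc.nodeStoreSynced,
		dsc.dsStoreSynced,
		dsc.historyStoreSynced, // the addition discussed above
	) {
		return fmt.Errorf("timed out waiting for caches to sync")
	}
	return nil
}
```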
1. Add revisionHistoryLimit (default 10), collisionCount, and validation code
2. Add daemonset-controller-hash label, and deprecate templateGeneration

1. Create controllerrevisions (history) and label pods with the template hash for both the RollingUpdate and OnDelete update strategies
2. Clean up old, non-live history based on revisionHistoryLimit
3. Remove duplicate controllerrevisions (the ones with the same template) and relabel their pods
4. Update RBAC to allow the DaemonSet controller to manage controllerrevisions
5. In DaemonSet controller unit tests, create new pods with hash labels
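As a rough illustration of the hash-label idea in these commits (not the PR's actual hashing code), a template hash can be derived from the serialized pod template and stamped onto pods and ControllerRevisions created from it; the label key and function names below are assumptions.

```go
package daemonsetsketch

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// Illustrative label key; the commits above introduce a daemonset-controller-hash label.
const templateHashLabel = "daemonset-controller-hash"

// templateHash hashes a serialized pod template, so pods and ControllerRevisions
// built from the same template get the same label value and a template change
// produces a new revision.
func templateHash(template interface{}) (string, error) {
	data, err := json.Marshal(template)
	if err != nil {
		return "", err
	}
	h := fnv.New32a()
	if _, err := h.Write(data); err != nil {
		return "", err
	}
	return fmt.Sprint(h.Sum32()), nil
}

// labelWithHash copies the given labels and adds the hash, leaving the input map untouched.
func labelWithHash(labels map[string]string, hash string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[templateHashLabel] = hash
	return out
}
```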
Force-pushed from 997f8b9 to 85ec49c
Rebased.
This blocks #46144
@k8s-bot pull-kubernetes-unit test this
Automatic merge from submit-queue
Automatic merge from submit-queue (batch tested with PRs 45871, 46498, 46729, 46144, 46804)

Implement kubectl rollout undo and history for DaemonSet

~Depends on #45924, only the 2nd commit needs review~ (merged)

Ref kubernetes/community#527

TODOs:
- [x] kubectl rollout history
  - [x] sort controller history, print overview (with revision number and change cause)
  - [x] print detail view (content of a history)
    - [x] print template
    - [x] ~(do we need to?) print labels and annotations~
- [x] kubectl rollout undo:
  - [x] list controller history, figure out which revision to roll back to: if toRevision == 0, roll back to the latest revision, otherwise choose the history with the matching revision
  - [x] update the ds using the history to roll back to
    - [x] replace the ds template with history's
    - [x] ~(do we need to?) replace the ds labels and annotations with history's~
- [x] test-cmd.sh

@kubernetes/sig-apps-pr-reviews @erictune @kow3ns @lukaszo @Kargakis @kubernetes/sig-cli-maintainers

---

**Release note**:
```release-note
```
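A sketch of the revision-selection rule described in the checklist above; the type and function name are hypothetical, not kubectl's actual implementation.

```go
package daemonsetsketch

import "fmt"

type controllerRevision struct {
	Revision int64
}

// revisionToRollbackTo picks a revision as described in the checklist:
// toRevision == 0 means the latest revision in history, otherwise the entry
// whose revision number matches exactly.
func revisionToRollbackTo(history []controllerRevision, toRevision int64) (*controllerRevision, error) {
	if len(history) == 0 {
		return nil, fmt.Errorf("no rollout history found")
	}
	if toRevision == 0 {
		latest := 0
		for i := range history {
			if history[i].Revision > history[latest].Revision {
				latest = i
			}
		}
		return &history[latest], nil
	}
	for i := range history {
		if history[i].Revision == toRevision {
			return &history[i], nil
		}
	}
	return nil, fmt.Errorf("unable to find revision %d", toRevision)
}
```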
Depends on #45867 (the 1st commit, ignore it when reviewing) (already merged)

Ref kubernetes/community#527 and kubernetes/community#594
@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews @erictune @kow3ns @lukaszo @Kargakis
TODOs:
- (maybe) storage_test.go // if we do server side rollback

kubectl part is in #46144
Release note: