Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Daemonset history #45924

Merged
merged 5 commits into from
Jun 3, 2017

Conversation

janetkuo
Copy link
Member

@janetkuo janetkuo commented May 17, 2017

Depends on #45867 (the 1st commit, ignore it when reviewing) (already merged)

  1. Update DaemonSet API for rollback and history
    • Add revisionHistoryLimit (default 10), collisionCount, and validation code
    • Add daemonset-controller-hash label, and deprecate templateGeneration
  2. Implement DaemonSet history logic in controller
    • Create controllerrevisions (history) and label pods with template hash for both RollingUpdate and OnDelete update strategy
    • Clean up old, non-live history based on revisionHistoryLimit
    • Remove duplicate controllerrevisions (the ones with the same template) and relabel their pods
    • Update RBAC to allow DaemonSet controller to manage controllerrevisions
    • In DaemonSet controller unit tests, create new pods with hash labels

Ref kubernetes/community#527 and kubernetes/community#594

@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews @erictune @kow3ns @lukaszo @Kargakis


TODOs:

  • API changes
    • (maybe) Remove rollback subresource if we decide to do client-side rollback
  • deployment controller
    • controller revision
      • owner ref (claim & adoption)
      • history reconstruct (put revision number, hash collision avoidance)
      • de-dup history and relabel pods
      • compare ds template with history
    • hash labels (put it in controller revision, pods, and maybe deployment)
    • clean up old history
    • Rename status.uniquifier when we reach consensus in Switch Deployments to new hashing algo w/ collision avoidance mechanism #44774
  • e2e tests
  • unit tests
    • daemoncontroller_test.go
    • update_test.go
    • (maybe) storage_test.go // if we do server side rollback

kubectl part is in #46144


Release note:

@janetkuo janetkuo added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 17, 2017
@janetkuo janetkuo added this to the v1.7 milestone May 17, 2017
@janetkuo janetkuo self-assigned this May 17, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 17, 2017
@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/new-api size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 17, 2017
@janetkuo janetkuo force-pushed the daemonset-history branch from 2950cc8 to 41708f9 Compare May 17, 2017 21:35
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 17, 2017
@janetkuo janetkuo force-pushed the daemonset-history branch 5 times, most recently from 425f07a to c50dbb5 Compare May 18, 2017 22:03
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2017
@janetkuo janetkuo force-pushed the daemonset-history branch 2 times, most recently from e533574 to 1709a25 Compare May 18, 2017 23:16
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2017
@janetkuo janetkuo force-pushed the daemonset-history branch 3 times, most recently from 7613856 to 43e6b50 Compare May 19, 2017 22:10
@janetkuo janetkuo force-pushed the daemonset-history branch from 43e6b50 to 82c3627 Compare May 19, 2017 22:15
@janetkuo janetkuo changed the title [WIP] Implement Daemonset history Implement Daemonset history May 19, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2017
toUpdate.Labels = make(map[string]string)
}
toUpdate.Labels[extensions.DefaultDaemonSetUniqueLabelKey] = cur.Labels[extensions.DefaultDaemonSetUniqueLabelKey]
_, err = dsc.kubeClient.ExtensionsV1beta1().DaemonSets(ds.Namespace).UpdateStatus(toUpdate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to invalidate label updates during status updates at some point in the future and this will break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will follow up

if err != nil {
return fmt.Errorf("couldn't get node to daemon pod mapping for daemon set %q: %v", ds.Name, err)
}
_, old, err := dsc.constructHistory(ds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if it's going to be cleaner to construct the history once and then pass the references in manage and cleanupHistory, instead of reconstructing it for each one of them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will follow up

}
}
}
// Label ds with current history's unique label as well
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this actually?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's easier for us to check if pods are up-to-date without looking though all history again when calculating DS status

}
if err != nil {
return err
}
}

err = dsc.cleanupHistory(ds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has rollingUpdate finished the whole update at this point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, rollingUpdate calls syncNodes which creates & deletes pods and waits for those operations to complete

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that a stuck update will hold a worker forever?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create & delete pods won't wait forever. If any creation/deletion fails, it collects the error, and returns all errors.

Copy link
Contributor

@0xmichalis 0xmichalis Jun 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Successful creates / deletes doesn't mean complete rollout so rollingUpdate hasn't actually finished the whole rollout at this point in some cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Is the concern about not killing old history when it has a pod that is terminating?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this as an issue for now but if we standarize on having cleanup being executed at the end of a rollout, we should fix this. As I said elsewhere, I am going to rollback the Deployment case now that we have collisionCount (remember that cleanup was executed at the end before and I think it's strictly correct to do so).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, Jenkins is cleaning up old builds only after the new builds are complete.

DeleteFunc: dsc.deleteHistory,
})
dsc.historyLister = historyInformer.Lister()
dsc.historyStoreSynced = historyInformer.Informer().HasSynced
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to wait for the history cache to be synced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will add in WaitForCacheSync

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 3, 2017
janetkuo added 5 commits June 3, 2017 00:43
1. Add revisionHistoryLimit (default 10), collisionCount, and validation code
2. Add daemonset-controller-hash label, and deprecate templateGeneration
1. Create controllerrevisions (history) and label pods with template
   hash for both RollingUpdate and OnDelete update strategy
2. Clean up old, non-live history based on revisionHistoryLimit
3. Remove duplicate controllerrevisions (the ones with the same template)
   and relabel their pods
4. Update RBAC to allow DaemonSet controller to manage
   controllerrevisions
5. In DaemonSet controller unit tests, create new pods with hash labels
@janetkuo janetkuo force-pushed the daemonset-history branch from 997f8b9 to 85ec49c Compare June 3, 2017 07:53
@janetkuo
Copy link
Member Author

janetkuo commented Jun 3, 2017

Rebased.

@janetkuo janetkuo added priority/P2 approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. priority/P2 labels Jun 3, 2017
@janetkuo
Copy link
Member Author

janetkuo commented Jun 3, 2017

This blocks #46144

@janetkuo
Copy link
Member Author

janetkuo commented Jun 3, 2017

@k8s-bot pull-kubernetes-unit test this

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit dbd1503 into kubernetes:master Jun 3, 2017
k8s-github-robot pushed a commit that referenced this pull request Jun 5, 2017
Automatic merge from submit-queue (batch tested with PRs 45871, 46498, 46729, 46144, 46804)

Implement kubectl rollout undo and history for DaemonSet

~Depends on #45924, only the 2nd commit needs review~ (merged)

Ref kubernetes/community#527

TODOs:
- [x] kubectl rollout history
  - [x] sort controller history, print overview (with revision number and change cause)
  - [x] print detail view (content of a history) 
    - [x] print template 
    - [x] ~(do we need to?) print labels and annotations~
- [x] kubectl rollout undo: 
  - [x] list controller history, figure out which revision to rollback to
    - if toRevision == 0, rollback to the latest revision, otherwise choose the history with matching revision
  - [x] update the ds using the history to rollback to 
    - [x] replace the ds template with history's
    - [x] ~(do we need to?) replace the ds labels and annotations with history's~
- [x] test-cmd.sh 

@kubernetes/sig-apps-pr-reviews @erictune @kow3ns @lukaszo @Kargakis @kubernetes/sig-cli-maintainers 

--- 

**Release note**:

```release-note
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants