
DaemonSet: requirements for graduation to beta and then to v1 #15310

Closed · 3 of 4 tasks
bgrant0607 opened this issue Oct 8, 2015 · 36 comments

@bgrant0607 (Member) commented Oct 8, 2015

Forked from #14326

Before DaemonSet graduates from experimental:

Before DaemonSet reaches v1beta1, I'd like to see:

Before DaemonSet reaches v1, I'd like to see:

  • Either integration with the node controller, to tie into node lifecycle (e.g., creation of daemon pods before other pods can be scheduled, and implicit forgiveness behavior (Proposal: Forgiveness #1574)), or use of initializers and finalizers (Places for hooks #3585) to achieve the same

@davidopp @mikedanese

bgrant0607 added the priority/important-soon, area/api, and team/control-plane labels Oct 8, 2015
@erictune (Member) commented Oct 8, 2015

@polvi as mentioned last week, DaemonSet can be used to start a process on every machine in a cluster, thus simplifying setup of things like monitoring and local-storage-daemons. Documented at https://github.com/kubernetes/kubernetes/blob/ba89c98fc7e892e816751a95ae0ee22f4266ffa5/docs/admin/daemons.md
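
For readers landing here later, a minimal sketch of creating such a per-node daemon programmatically, using client-go and the apps/v1 API that DaemonSet eventually graduated into (at the time of this thread the resource lived under extensions/v1beta1); the image and names are made up for illustration:

```go
package main

import (
	"context"
	"log"
	"path/filepath"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Build a client from the default kubeconfig.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	labels := map[string]string{"app": "node-monitor"}
	ds := &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "node-monitor", Namespace: "kube-system"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "monitor",
						Image: "example.com/node-monitor:1.0", // hypothetical image
					}},
				},
			},
		},
	}

	// The DaemonSet controller schedules one copy of the pod template
	// onto every eligible node in the cluster.
	if _, err := clientset.AppsV1().DaemonSets("kube-system").Create(
		context.TODO(), ds, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
}
```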

@erictune (Member) commented Oct 8, 2015

That is a good list of changes that you have thought of.

The best way to find changes that we haven't thought of yet is to use the feature ourselves.
I propose it doesn't graduate from beta until we are using it as part of the kube-up process.
I'll file an issue to that effect.

@erictune (Member) commented Oct 8, 2015

Filed #15324

@brendandburns (Contributor)

I disagree. If we think it is feasible for kube-up, then it should definitely be in beta. If it's useful for us, it's useful for customers.

@erictune (Member) commented Oct 8, 2015

You have said that before it is beta, we need to be pretty sure we aren't going to change anything. I don't think that we get to that level of sureness without actually using the interface ourselves.

@erictune (Member) commented Oct 8, 2015

Also, see description in #15324 about how to stage the work.

@erictune (Member) commented Oct 8, 2015

I might be misquoting @brendandburns, but we have definitely talked about a bar for beta stability internally, and I don't see how we reach that bar without using the feature in earnest.

@derekwaynecarr (Member)

cc @kubernetes/rh-cluster-infra

@bgrant0607 (Member, Author)

DaemonSet won't be ready in 1.1.

In addition to the other issues mentioned, we need to think about interplay between DaemonSet updates and node upgrades.

bgrant0607 added this to the v1.2-candidate milestone Oct 13, 2015
erictune changed the title from "DaemonSet graduation requirements" to "DaemonSet: requirements for graduation to v1" Oct 20, 2015
erictune changed the title to "DaemonSet: requirements for graduation to beta and then to v1" Oct 20, 2015
bgrant0607 modified the milestones: v1.2-candidate → v1.2 Nov 19, 2015
@mikedanese (Member)

We need to revert and fix #17318 before graduation.

@davidopp (Member)

We need to fix #16967 (via #12744) before graduation.

@davidopp (Member) commented Jan 5, 2016

> Either integration with the node controller, to tie into node lifecycle (e.g., creation of daemon pods before other pods can be scheduled, and implicit forgiveness behavior (#1574)), or use of initializers and finalizers (#3585) to achieve the same

Proposal for how to do this (depends on #18263 being implemented)

  1. Admission controller intercepts Node object creation and adds a taint "node being configured"
  2. DaemonSet controller automatically adds toleration for "node being configured" taint to all pods it creates
  3. When all DaemonSets that should schedule onto the node have started, the taint is removed

(3) is the only tricky step. One idea: have NodeController iterate through all DaemonSet objects, figure out which of them should produce a pod on the given node (because of label selectors, not every daemon is supposed to run on every node), and remove the taint once all of those pods are running; see the sketch below.

This doesn't require moving DaemonSet controller into NodeController.

Of course this is all related to #3885 (comment)
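
To make the moving parts concrete, here is a sketch in Go of steps 2 and 3, written against the taint and toleration types that eventually shipped in the core API; the taint key and helper are hypothetical, not the implementation that was merged:

```go
// Sketch of the "node being configured" proposal above. Nothing here is
// the merged implementation; the taint key is invented for illustration.
package nodeconfig

import corev1 "k8s.io/api/core/v1"

// Hypothetical taint added by the admission controller when a Node
// object is created (step 1).
const beingConfiguredKey = "example.com/node-being-configured"

// Step 2: the toleration the DaemonSet controller would inject into every
// pod it creates, so daemon pods can land on a node that is still being
// set up while all other pods are repelled.
var beingConfiguredToleration = corev1.Toleration{
	Key:      beingConfiguredKey,
	Operator: corev1.TolerationOpExists,
	Effect:   corev1.TaintEffectNoSchedule,
}

// Step 3: compute the node's taints with the "being configured" taint
// removed once every daemon pod that should run on this node is actually
// running. The caller passes the daemon pods whose DaemonSet selectors
// matched the node.
func taintsAfterConfiguration(node *corev1.Node, daemonPods []corev1.Pod) []corev1.Taint {
	for _, p := range daemonPods {
		if p.Status.Phase != corev1.PodRunning {
			return node.Spec.Taints // not all daemons up yet; keep the taint
		}
	}
	kept := make([]corev1.Taint, 0, len(node.Spec.Taints))
	for _, t := range node.Spec.Taints {
		if t.Key != beingConfiguredKey {
			kept = append(kept, t)
		}
	}
	return kept
}
```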

@mml (Contributor) commented Jan 6, 2016

For the implicit forgiveness part, the idea is that all DaemonSet-managed pods implicitly forgive the node going unreachable, but not any other type of node deletion or cascading delete event.

Correct?

@davidopp (Member) commented Jan 6, 2016

> For the implicit forgiveness part, the idea is that all DaemonSet-managed pods implicitly forgive the node going unreachable, but not any other type of node deletion or cascading delete event.

Yeah, this is just addressing what happens when NodeController deletes pods because the node's Ready condition became False or Unknown. But did you have something specific in mind when you said "other type of node deletion or cascading delete event"?

@mml (Contributor) commented Jan 6, 2016

I can't think of a cascading delete that would result in deleting a node, but if the admin manually deletes a node with kubectl, we'd still delete the pod, right?

Also to be clear, I'm not planning to do this by exposing the Forgiveness concept anywhere in the API (it's not well-enough formed yet), but just by hardcoding this behavior into the NodeController when the pod in question is DaemonSet-managed.
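
A sketch of what that hardcoded check could look like; detection here uses owner references, which did not exist yet at the time of this thread (the original code used the kubernetes.io/created-by annotation), so treat it as an illustration rather than the actual NodeController change:

```go
// Sketch of implicit forgiveness in the node controller: daemon pods are
// skipped when evicting pods from a node whose Ready condition is False
// or Unknown. Illustrative only, not the merged code.
package forgiveness

import corev1 "k8s.io/api/core/v1"

// managedByDaemonSet reports whether a pod is controlled by a DaemonSet.
func managedByDaemonSet(pod *corev1.Pod) bool {
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "DaemonSet" && ref.Controller != nil && *ref.Controller {
			return true
		}
	}
	return false
}

// shouldEvictOnUnreachable decides whether the node controller should
// delete a pod from an unreachable node. Daemon pods stay put: they are
// bound to this node and would not be rescheduled elsewhere anyway.
func shouldEvictOnUnreachable(pod *corev1.Pod) bool {
	return !managedByDaemonSet(pod)
}
```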

@mikedanese (Member)

I feel like there were two kubectl-side and one server-side. Server-side one here: #19627

@piosz (Member) commented Dec 14, 2016

@bgrant0607 @erictune can we make sure that before going to GA someone (fluentd or node-problem-detector) will actually use this for a quarter or two, to make sure that, for example, upgrades are handled properly?

@bgrant0607 (Member, Author)

@piosz I agree. I don't want to rush any of the controller APIs to GA. I wish Job hadn't been rushed to GA.

k8s-github-robot added the needs-sig label May 31, 2017
@0xmichalis (Contributor)

/sig apps

k8s-ci-robot added the sig/apps label Jun 10, 2017
k8s-github-robot removed the needs-sig label Jun 10, 2017
@liggitt (Member) commented Nov 25, 2017

cc @kow3ns

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Feb 23, 2018
@mikedanese (Member)

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Feb 23, 2018
janetkuo self-assigned this Mar 9, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Jun 7, 2018
@bgrant0607 (Member, Author)

DaemonSet is v1.
cc @kow3ns @janetkuo
