Add node status to API object. #2315
Conversation
Add it to v1beta3 (in the new form) under Node. I would recommend not taking on any more of the v1beta3 changes in this pull - we should do the internal refactor of Node next, but that can build on top of this.
I think this looks ok otherwise - I haven't followed the other discussions about node conditions. Is the scheduler expected to only schedule on nodes with a certain condition? In another issue we discussed the admin being explicitly able to mark a node as not accepting newly scheduled pods - is that something that would be reflected here?
What are the cases where we would schedule onto a vanished/unhealthy node? I think at first we can have the apiserver filter out unhealthy/vanished nodes. Then we can move to a selector-based list, so the scheduler can list only healthy nodes, kubectl can list all nodes, etc.?
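A rough sketch of the server-side filtering idea above; the Node type, condition values, and node names here are placeholders for illustration, not the project's actual API:

```go
package main

import "fmt"

// NodeCondition and the constants below mirror the values discussed in this
// PR; the Node struct and node names are placeholders for illustration.
type NodeCondition string

const (
	NodeHealthy   NodeCondition = "Healthy"
	NodeUnhealthy NodeCondition = "Unhealthy"
	NodeVanished  NodeCondition = "Vanished"
)

type Node struct {
	Name      string
	Condition NodeCondition
}

// filterSchedulable keeps only the nodes a scheduler should see, dropping
// unhealthy and vanished nodes before the list is returned.
func filterSchedulable(nodes []Node) []Node {
	out := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Condition == NodeHealthy {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "node-a", Condition: NodeHealthy},
		{Name: "node-b", Condition: NodeVanished},
	}
	fmt.Println(filterSchedulable(nodes)) // only node-a is returned
}
```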
I will take a look at this this morning.
// NodeUnhealthy means the Node is running but unhealthy
NodeUnhealthy NodeCondition = "Unhealthy"
// NodeVanished means the Node is not reachable
NodeVanished NodeCondition = "Vanished"
How do you plan to define "Vanished", and what behavior do you plan to attach to it?
Here "Vanished" is defined as not reachable from the cluster, i.e. connection refused on the node's kubelet port. Occasional network failures or machine restarts can make a node unreachable, so we'll need a policy around the definition, like N out of M tries, no connection within X minutes, etc. I plan to enforce this in the node controller.
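A minimal sketch of what such an "N out of M tries / no connection within X minutes" policy could look like in the node controller; the type names, thresholds, and function names are assumptions for illustration only:

```go
package nodecontroller

import "time"

// probeResult records one attempt to reach a node's kubelet port.
type probeResult struct {
	reachable bool
	when      time.Time
}

// vanishedPolicy marks a node Vanished after maxFailures failed probes within
// the last window probes, or when no probe has succeeded for gracePeriod.
type vanishedPolicy struct {
	window      int
	maxFailures int
	gracePeriod time.Duration
}

func (p vanishedPolicy) vanished(history []probeResult, now time.Time) bool {
	start := len(history) - p.window
	if start < 0 {
		start = 0
	}
	failures := 0
	var lastSuccess time.Time
	for _, r := range history[start:] {
		if r.reachable {
			lastSuccess = r.when
		} else {
			failures++
		}
	}
	if failures >= p.maxFailures {
		return true
	}
	return !lastSuccess.IsZero() && now.Sub(lastSuccess) > p.gracePeriod
}
```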
Besides the question on Vanished raised by others, should we define a condition for a node in a staging state? Healthy, but not ready to provide any services yet? Or does this belong to the kubelet readiness state?
@@ -576,6 +594,8 @@ type Minion struct {
HostIP string `json:"hostIP,omitempty" yaml:"hostIP,omitempty"`
// Resources available on the node
NodeResources NodeResources `json:"resources,omitempty" yaml:"resources,omitempty"`
// Status describes the current status of a Node
Status NodeStatus `json:"status,omitempty" yaml:"status,omitempty"`
This new status field is fine, but FYI:
We're trying to move our internal representation towards v1beta3.
There should eventually be just TypeMeta, ObjectMeta, NodeSpec, and NodeStatus in Minion/Node.
NodeResources was added to the wrong place in v1beta3 and HostIP was apparently dropped (not sure whether that was intentional or not).
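For reference, a rough sketch (not the actual v1beta3 source) of the target shape described above, with only TypeMeta, ObjectMeta, NodeSpec, and NodeStatus on the node object:

```go
package api

type TypeMeta struct {
	Kind       string `json:"kind,omitempty"`
	APIVersion string `json:"apiVersion,omitempty"`
}

type ObjectMeta struct {
	Name string `json:"name,omitempty"`
}

// NodeSpec holds the desired/declared configuration of the node.
type NodeSpec struct{}

// NodeStatus holds the most recently observed status of the node.
type NodeStatus struct{}

// Node is the target shape: metadata plus a spec/status pair, with
// NodeResources and HostIP folded into spec or status rather than top level.
type Node struct {
	TypeMeta   `json:",inline"`
	ObjectMeta `json:"metadata,omitempty"`
	Spec       NodeSpec   `json:"spec,omitempty"`
	Status     NodeStatus `json:"status,omitempty"`
}
```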
Thanks!
Is there any particular reason we still leave NodeResources here? Or is it just not cleaned up yet?
I would assume whoever does the internal refactor should clean it up.
SG. I can investigate this, but I don't fully understand how we organize our API versions. @smarterclayton
For the NodeCondition change, I need to update all 3 versions plus the internal version, but for dropping HostIP it seems we only did that in v1beta3. What's the rationale behind this?
@bgrant0607 Should NodeResources be split into Spec and Status (Capacity in Spec, whatever represents the current state in Status)?
Short answer is yes.
See resourceCapacitySpec in https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/resources.md. I updated it this week.
Status is more complicated. See Usage Data in the Appendix of resources.md.
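A hedged sketch of that split, with capacity in the spec and observed usage in the status; the field names are assumptions drawn from docs/resources.md, not the final API:

```go
package api

// ResourceName identifies a resource such as "cpu" or "memory".
type ResourceName string

// ResourceList maps resource names to amounts; int64 stands in for whatever
// quantity type the real API settles on.
type ResourceList map[ResourceName]int64

type NodeSpec struct {
	// Capacity is the total amount of each resource the node can provide.
	Capacity ResourceList `json:"capacity,omitempty"`
}

type NodeStatus struct {
	// Usage would report recently observed consumption; its exact shape is
	// still open per the "Usage Data" appendix of resources.md.
	Usage ResourceList `json:"usage,omitempty"`
}
```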
Sorry, I haven't had time to follow node controller discussions, either. Vanished probably comes from #1366. We do eventually need a rigorous definition of vanished. There will be a standard definition, though it may be configurable per k8s cluster. As discussed in #1366, once a node is considered vanished, I'd like the node controller to kill the pods on that node. If we need more flexibility, we'd use forgiveness to override the default kill policy. Schedulers should only schedule on Healthy nodes. I expect to add more flavors of not schedulable in the future, such as Lame (node controller draining running pods gracefully -- deliberately unschedulable via API call to apiserver and/or to Kubelet), Disabled (deliberately kill pods and make unschedulable via API call to apiserver and/or to Kubelet), etc.
@dchen1107 Good point. Not sure we want healthy, unhealthy, vanished to be the values of NodeCondition. Will think about this a bit.
@smarterclayton Agree. For explicit disabling/laming, a NodeSpec field would be required.
@dchen1107 I believe we will eventually need such a staging state, especially when Kubernetes can scale nodes dynamically. At that point, we definitely need a rigorous definition, as the cutoff between staging and unhealthy can be subtle. @bgrant0607 Thanks for reviewing, some comments: Regarding @dchen1107's suggestion, why do you think healthy, unhealthy and vanished may not be wanted? The current NodeCondition is basically a rework of the minion health check. Is the Node lifecycle well enough understood in k8s that we are able to give a rigorous definition? If so, then I can get on board with it; if not, then we can first finish the rework?
I think we want to distinguish lifecycle from recently observed behavior. By analogy with pods, NodeCondition should reflect lifecycle. Proposed conditions could be Pending, Running, Shutdown. Then, similar to ContainerState, we could provide more detailed information, including liveness, readiness, and reachability, each with the time of the last transition (e.g., live -> not live). Any condition (in the general meaning of the word) that could have a fairly open-ended number of causes should have an associated Reason and possibly Message.
It's beyond the initial scope for NodeCondition, but I'll extend the idea here since it's desirable to have lifecycle management for nodes. Pending and Running look good to me, but I'm not sure Shutdown is appropriate here - how could the controller know whether a node is shut down? I'd prefer Stopped or Vanished, which covers more possibilities. Also, I think we need an unhealthy state for node status to be complete, though we could fold it into the stopped state. A brief summary (see the sketch after this list):

- Pending
  - Transition
  - Meta Data
- Running
  - Transition
  - Meta Data
- Disabled
  - Transition
  - Meta Data
- Unhealthy
  - Transition
  - Meta Data
- Stopped (Shutdown, Vanished)
  - Transition
  - Meta Data
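For illustration, the stages listed above expressed as Go constants; this is only a sketch of the proposal in this comment, not what was eventually merged:

```go
package api

// NodePhase enumerates the stages proposed in the summary above.
type NodePhase string

const (
	NodePending   NodePhase = "Pending"
	NodeRunning   NodePhase = "Running"
	NodeDisabled  NodePhase = "Disabled"
	NodeUnhealthy NodePhase = "Unhealthy"
	// NodeStopped covers both Shutdown and Vanished in this proposal.
	NodeStopped NodePhase = "Stopped"
)
```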
Lifecycle stages should be similar to pods. When created/added to the cluster, they start in a pending state, progress to an active state, and then end in a terminated state when the node is deleted or potentially when it disappears from the host provider (e.g., the VM is deleted from GCE). It shouldn't be possible to go back to a prior stage without re-creating the node object. Everything else should be current status.
I think we should have an array of status info, each with a kind of health, starting with Reachable, Live, Ready. Additionally, in the spec, there should be a Schedulable field and a Runnable field. We'll want to make this fancier in various ways in the future. Will address the rest in a bit.
We should also represent Schedulable and Runnable in status, similar to the other properties, recording time of transition, reason, and message.
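A minimal sketch, assuming the field names suggested in these two comments, of what the spec half could look like; the status conditions are elaborated further below in the thread:

```go
package api

type NodeSpec struct {
	// Schedulable indicates whether new pods may be placed on this node.
	Schedulable bool `json:"schedulable,omitempty"`
	// Runnable indicates whether pods already on this node may keep running.
	Runnable bool `json:"runnable,omitempty"`
}
```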
Sigh. This discussion makes it clear what a mistake it was to use "Condition" for what is, essentially, a lifecycle phase (definition: a distinct period or stage in a process of change or forming part of something's development; or stage: a point, period, or step in a process or development). At least we got Status right. @smarterclayton @markturansky I feel bad for even considering this, but how would you feel about replacing PodCondition with PodPhase?
@bgrant0607 yes, that PodCondition refactor was easy. I am happy to change it to PodPhase. I'll have a new pull for that change and I'll be sure to reference this issue.
@bgrant0607 I'm not sure how Schedulable/Runnable fit into Reachable/Live/Ready in NodeStatus. We could also include each kind of unhealth in the status info. So my interpretation of your suggestion is something like the following (I didn't follow the Pod lifecycle discussion, so correct me if this is wrong):

type NodeStatus struct {
// NodePhase is the current phase of the Node, one of "Pending", "Running" and "Terminated". It
// is populated based on the value of NodeCondition.
Phase NodePhase
Condition NodeCondition
}
// NodeCondition is a detailed condition of Node. Only one of the fields can be set at a given time.
type NodeCondition struct {
UnReachable *NodeConditionUnReachable
Reachable *NodeConditionReachable
Schedulable *NodeConditionSchedulable
Live *NodeConditionLive
Ready *NodeConditionReady
NotReady *NodeConditionNotReady
...
}
type NodeConditionUnReachable struct {
Reason string
LastReachableTime Time
}
...

For example, when a Node is unreachable and its LastReachableTime is not set, the node is Pending; otherwise, it's Terminated. The tricky part is: how do we define the different node conditions?
The running phase may be populated based on condition, but the terminated phase should not be. My proposal:
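A reconstruction of what that proposal presumably looked like, based on the NodeConditionKind and NodeConditionStatus types that appear later in this PR; this is an inference for illustration, not the original comment:

```go
package api

import "time"

// NodeConditionKind names the kind of health being reported, e.g. "Ready".
type NodeConditionKind string

// NodeConditionStatus reports whether the node is in the condition.
type NodeConditionStatus string

const (
	ConditionTrue    NodeConditionStatus = "True"
	ConditionFalse   NodeConditionStatus = "False"
	ConditionUnknown NodeConditionStatus = "Unknown"
)

// NodeCondition is one entry in a slice of observed conditions.
type NodeCondition struct {
	Kind               NodeConditionKind   `json:"kind"`
	Status             NodeConditionStatus `json:"status"`
	LastTransitionTime time.Time           `json:"lastTransitionTime,omitempty"`
	Reason             string              `json:"reason,omitempty"`
	Message            string              `json:"message,omitempty"`
}
```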
Ok, I think the idea is the same here: with NodeCondition.Status, the unhealthy condition is also included in Node status. Currently, kubelet healthz is pretty weak; we cannot gather much information from it. The only information we can get is Reachable/Unreachable (connection succeeded or not) and Ready/NotReady (status ok or not), so we'll start with a subset of the conditions. I'm not sure what "Live" means here, or, in general, the difference between Live/Reachable/Ready and the reason we include all of them in node conditions. We'll want to make sure that at any given time all of the conditions exist for the node, so I'd say changing the slice to a map would be easier to implement. Lastly, I think we still need to populate NodePhase based on NodeCondition, even for the Terminated phase. NodePhase is really just a summary of NodeCondition, combined with our policy (which defines which combination of conditions belongs to which phase).
Can one of the admins verify this patch?
Discussed IRL. Summary: NodePhase should not be derived from NodeCondition -- I'm deliberately trying to keep them orthogonal. NodePhase is about provisioning status. Pending means the node has been created/added but not configured. (We need some mechanism to configure nodes on demand, to accommodate things like auto-scaled clusters.) Running means that it has been configured and has Kubelet running. Terminated means the node has been removed from the cluster (e.g., its VM has been deleted). I'm fine if we just start with one NodeCondition -- say, "Ready". We can add the others when we have more information. Re. why an array of conditions rather than a map, see #2004. I'm not completely opposed to aggregating default schedulability and runnability from NodeConditions, but as separate fields and not as part of NodePhase. However, such summary fields would become confusing later when we allow exceptions to the default policies, such as forgiveness (#1574).
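Putting that summary together, a sketch of the resulting shape: Phase tracks provisioning only, Conditions is an array rather than a map, and "Ready" is the only condition kind to start. Names follow the discussion but are illustrative, not the merged code:

```go
package api

// NodePhase tracks provisioning status only and is not derived from conditions.
type NodePhase string

const (
	NodePending    NodePhase = "Pending"    // created/added but not configured
	NodeRunning    NodePhase = "Running"    // configured, Kubelet running
	NodeTerminated NodePhase = "Terminated" // removed from the cluster
)

// NodeCondition is a minimal stub of the per-kind entry sketched earlier,
// repeated here only so this snippet stands alone.
type NodeCondition struct {
	Kind   string `json:"kind"` // starting with just "Ready"
	Status string `json:"status"`
}

type NodeStatus struct {
	// Phase is kept orthogonal to Conditions.
	Phase NodePhase `json:"phase,omitempty"`
	// Conditions is an array rather than a map (see #2004).
	Conditions []NodeCondition `json:"conditions,omitempty"`
}
```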
Updated PR based on discussion. Two changes made after @bgrant0607 's last comment:
Now that we want to keep node phase separate from condition, we definitely won't include schedulability and runnability in NodePhase. The two properties can be summarized from NodeConditions (I think), but we need to persist the properties to support forgiveness, e.g. we need to know when a node becomes unschedulable. I actually like the idea of having a separate field, but that probably should be discussed in another issue.
@bgrant0607 What's the status for this? Are we going to merge it or have some more discussion?
@ddysher Was busy all last week. Will try to look it over soon.
type NodeConditionStatus string

// These are valid condition status. "ConditionTrue" means node is in the condition;
// "ConditionFalse" means node is not in the condition; "ConditionUnknown" means kuernetes
typo: Kubernetes
Looks pretty good. Thanks! Just a few minor comments.
Comment addressed. Thanks!
)

type NodeCondition struct {
	Kind NodeConditionKind
Please add json and description tags to all the fields.
Done.
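Presumably the result looked something like the following; the description strings are invented for this example and are not the merged wording:

```go
package v1beta1

type NodeConditionKind string
type NodeConditionStatus string

// NodeCondition with json and description tags added; the description text is
// illustrative only.
type NodeCondition struct {
	Kind   NodeConditionKind   `json:"kind" description:"kind of the condition, currently only Ready"`
	Status NodeConditionStatus `json:"status" description:"status of the condition, one of True, False, Unknown"`
}
```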
Add node status to API object.
@smarterclayton @lavalamp I'm not sure if this is enough to get it to work. Beat me up.
@smarterclayton Should I also take care of v1beta3?