New Job resource #7380
Conversation
@@ -747,6 +747,27 @@ type ReplicationController struct {
    Labels map[string]string `json:"labels,omitempty" description:"map of string keys and values that can be used to organize and categorize replication controllers"`
}

// JobState is the state of a job, either input (create, update) or as output (list, get).
type JobState struct {
    Completions int `json:"replicas" description:"number of replicas (desired or observed, as appropriate)"`
json and internal name have a mismatch, is that intentional?
fixed
Force-pushed from d4a794b to af960c1.
The v1beta3 API LGTM. One issue is that we'll need to figure out how to make Job kill off all its pods when it is killed (#1535). We shouldn't do that in the client the way it was done for RC, though it should also be the default behavior (--cascade=true). Any volunteers to do a more thorough review?
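For illustration only (none of this is in the PR): a server-side cascade could amount to listing the Job's pods by its label selector and deleting them when the Job itself is deleted. The interface and function names below are made up for the sketch.

```go
// Hypothetical sketch of server-side cascading deletion; not part of this PR.
type podClient interface {
    // ListPodNames returns the names of pods matching the label selector.
    ListPodNames(selector map[string]string) ([]string, error)
    // DeletePod deletes a single pod by name.
    DeletePod(name string) error
}

// cascadeDeleteJobPods removes every pod matched by the Job's selector,
// which is roughly what a controller-side --cascade=true default would do.
func cascadeDeleteJobPods(c podClient, selector map[string]string) error {
    names, err := c.ListPodNames(selector)
    if err != nil {
        return err
    }
    for _, name := range names {
        if err := c.DeletePod(name); err != nil {
            return err
        }
    }
    return nil
}
```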
// JobSpec is the specification of a job.
type JobSpec struct {
    // Completions is the number of desired completions.
    Completions int `json:"completions" description:"number of times the job has completed successfully (desired or observed, as appropriate)"`
Please make this field optional (omitempty), default this value to 1, and make it *int.
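Presumably the requested shape is something like the following (a sketch of the suggestion, not the actual diff; the description text is illustrative):

```go
type JobSpec struct {
    // Completions is the number of desired successful completions.
    // Optional: when nil it would be defaulted to 1 (the defaulting itself
    // belongs in the versioned defaulting code, not in the type definition).
    Completions *int `json:"completions,omitempty" description:"number of desired successful completions, defaults to 1"`
    // ... other fields elided ...
}
```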
Should this read: "number of times a pod has completed successfully"?
Will look tomorrow.
@smarterclayton Thanks, but not super-urgent if other things are more critical.
    Completions int `json:"completions"`

    // Selector is a label query over pods that should match the completed count.
    Selector map[string]string `json:"selector"`
Is any guarantee given if half the count runs, the nodes are lost, and so the count isn't tracked? Or is the job controller expected to cache the pending completions until the job passes the threshold? Can completed pods not be deleted until enough job pods have run? Maybe I missed the deeper discussion on this...
I can see it working both ways. The simpler version is that if the kubelet doesn't get a chance to report the completed status of the job, it could run multiple times. There was some discussion on #1624 about this case. The more involved fix would be to stash the expected completions locally, update it every time a pod status matching the selectors changes to completed, and have the job controller monitor something like observedCompleted + active < spec.completions.
I think I'm going for the simpler version first, thoughts?
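As a rough sketch of the "more involved" bookkeeping described above (illustrative only, not code from this PR), the controller's reconcile step would boil down to something like:

```go
// podsToCreate returns how many new pods the job controller should start,
// given the completions it has already observed, the pods currently active,
// and the desired completion count from spec.completions. New pods are only
// created while observedCompleted + active < desired.
func podsToCreate(observedCompleted, active, desired int) int {
    missing := desired - (observedCompleted + active)
    if missing < 0 {
        return 0
    }
    return missing
}
```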
It was discussed here:
#1624 (comment)
There's no way to absolutely prevent multiple execution, since containers could always fail just before exiting with 0. I think the initial version could make a pretty good effort at tracking completions in status. I had suggested persisting them in an annotation as a kind of private checkpoint.
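The annotation-checkpoint idea might look roughly like this (a sketch; the annotation key is invented for the example):

```go
import "strconv"

// observedCompletionsAnnotation is a made-up key for this sketch; the real
// name, if this approach is taken, would be decided in the implementation.
const observedCompletionsAnnotation = "job.kubernetes.io/observed-completions"

// checkpointCompletions records the number of completions the controller has
// observed so far in the Job's annotations, so the count survives a controller restart.
func checkpointCompletions(annotations map[string]string, observed int) map[string]string {
    if annotations == nil {
        annotations = map[string]string{}
    }
    annotations[observedCompletionsAnnotation] = strconv.Itoa(observed)
    return annotations
}
```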
There's been some discussion around whether we really want the job controller for 1.0, and if we don't, there's no point adding just a Job resource, so as @bgrant0607 said, no hurry for this :) I'm really waiting for #1624 to get either the 1.0 or post-1.0 label; Brian or @davidopp might be able to help with prioritizing that.
I can do a more thorough review.
// JobStatus represents the current status of a job.
type JobStatus struct {
    // Completions is the number of actual completions.
    Completions int `json:"completions" description:"most recently observed number of completions"`
A UI would want to show a user a list of running, successful, and failed pods. Is this something that can be reconstructed from `get pods` and an appropriate label match, or is it something the JobStatus needs to contain?
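For what it's worth, the reconstruct-from-pods approach would amount to bucketing the pods returned by the label query by phase, along these lines (a sketch; the api import path and field names are assumed from roughly this era of the codebase):

```go
import api "github.com/GoogleCloudPlatform/kubernetes/pkg/api" // path is an assumption; it has moved over time

// countPodPhases buckets a Job's pods (already listed via the Job's label
// selector) by phase, which is the breakdown a UI would want to display.
// Assumes the Pod type with Status.Phase and the standard phase constants.
func countPodPhases(pods []api.Pod) (running, succeeded, failed int) {
    for i := range pods {
        switch pods[i].Status.Phase {
        case api.PodRunning:
            running++
        case api.PodSucceeded:
            succeeded++
        case api.PodFailed:
            failed++
        }
    }
    return running, succeeded, failed
}
```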
I've commented in #1624. Please don't merge this until we reach a conclusion on that comment.
// JobSpec is the specification of a job.
// As the internal representation of a job, it may have either a TemplateRef or a Template set.
type JobSpec struct {
    // Completions is the number of desired successful completions of this job.
"of this job" -> "of pods of this job" (the "job" only runs to completion once)
GCE e2e build/test failed for commit af960c1.
GCE e2e build/test failed for commit 640d4625ebd7b1b524e4dbb6d6fb83bd0ae88fe4.
Rebased. cc @mikedanese as (I think) he's going to be picking up the job controller. I've only lightly tested this post-rebase (job unit tests and basic kubectl resource operations using the example job.json).
GCE e2e build/test failed for commit 071d313.
I think we should hold off on further work on this PR until #11746 settles down. Afterwards I'll be happy to pick it up and rework it to match the designed API.
GCE e2e build/test passed for commit 071d313.
I've picked up the changes but had to rework this to place job in
Plumbing required to create a new job resource (storage, kubectl, rest client, validation).
@davidopp, @bgrant0607, @lavalamp (not sure if you're interested, but fyi anyway).