New Job resource #7380

Closed
wants to merge 2 commits into from

Conversation

bprashanth
Contributor

Plumbing required to create a new job resource (storage, kubectl, rest client, validation).
@davidopp, @bgrant0607, @lavalamp (not sure if you're interested, but fyi anyway).

@@ -747,6 +747,27 @@ type ReplicationController struct {
Labels map[string]string `json:"labels,omitempty" description:"map of string keys and values that can be used to organize and categorize replication controllers"`
}

// JobState is the state of a job, either input (create, update) or as output (list, get).
type JobState struct {
Completions int `json:"replicas" description:"number of replicas (desired or observed, as appropriate)"`
Member


The json and internal names have a mismatch; is that intentional?

Contributor Author


fixed

@bgrant0607
Member

The v1beta3 API LGTM.

One issue is that we'll need to figure out how to make Job kill off all its pods when it is killed (#1535). We shouldn't do that in the client the way it was done for RC, though it should also be the default behavior (--cascade=true).

Any volunteers to do a more thorough review?

cc @smarterclayton

// JobSpec is the specification of a job.
type JobSpec struct {
// Completions is the number of desired completions.
Completions int `json:"completions" description:"number of times the job has completed successfully (desired or observed, as appropriate)"`
Member


Please make this field optional (omitempty), default this value to 1, and make it *int.

Member


Should this read: number of times a pod has completed successfully

@bgrant0607 bgrant0607 self-assigned this Apr 28, 2015
@smarterclayton
Contributor

Will look tomorrow.

@bgrant0607
Member

@smarterclayton Thanks, but not super-urgent if other things are more critical.

Completions int `json:"completions"`

// Selector is a label query over pods that should match the completed count.
Selector map[string]string `json:"selector"`
Contributor


Is any guarantee given if half the count runs, the nodes are lost, and so the count isn't tracked? Or is the job controller expected to cache the pending completions until the job passes the threshold? Can completed pods not be deleted until enough job pods have run? Maybe I missed the deeper discussion on this...

Contributor Author


I can see it working both ways. The simpler version: if the kubelet doesn't have a chance to report the completed status of the job, it could run multiple times. There was some discussion on #1624 about this case. The more involved fix would be to stash the expected completions locally, update it every time a pod status matching the selectors changes to completed, and have the job controller monitor something like observedCompleted + active < spec.completions.

I think I'm going for the simpler version first, thoughts?
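The "more involved" check described in the comment above could look roughly like this; all names here are illustrative, not from the PR:

```go
package main

import "fmt"

// needsMorePods implements the condition sketched in the discussion:
// the controller keeps creating pods while the completions it has
// observed plus the pods still active fall short of the desired count.
func needsMorePods(observedCompleted, active, desiredCompletions int) bool {
	return observedCompleted+active < desiredCompletions
}

func main() {
	// 3 of 5 completions observed, 1 pod still running: one more pod needed.
	fmt.Println(needsMorePods(3, 1, 5)) // true
	// 3 observed, 2 active already covers the remaining work.
	fmt.Println(needsMorePods(3, 2, 5)) // false
}
```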

Member


It was discussed here:
#1624 (comment)

There's no way to absolutely prevent multiple execution, since containers could always fail just before exiting with 0. I think the initial version could make a pretty good effort at tracking completions in status. I had suggested persisting them in an annotation as a kind of private checkpoint.
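The annotation-as-checkpoint idea mentioned above could be sketched like this; the annotation key and helper names are hypothetical, as the discussion only suggests the approach:

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical annotation key; the PR discussion only says "persisting
// them in an annotation as a kind of private checkpoint".
const completionsCheckpointKey = "job.example/observed-completions"

// checkpointCompletions records the observed completion count in the
// job's annotations so it survives a controller restart.
func checkpointCompletions(annotations map[string]string, completed int) {
	annotations[completionsCheckpointKey] = strconv.Itoa(completed)
}

// restoreCompletions reads the checkpoint back, treating a missing or
// malformed annotation as zero observed completions.
func restoreCompletions(annotations map[string]string) int {
	n, err := strconv.Atoi(annotations[completionsCheckpointKey])
	if err != nil {
		return 0
	}
	return n
}

func main() {
	ann := map[string]string{}
	checkpointCompletions(ann, 4)
	fmt.Println(restoreCompletions(ann)) // 4
}
```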

@bprashanth
Contributor Author

There's been some discussion around whether we really want the job controller for 1.0, and if we don't, there's no point in adding just a Job resource, so as @bgrant0607 said, no hurry on this :)

I'm really waiting for #1624 to be labeled either post-1.0 or 1.0; Brian or @davidopp might be able to help with prioritizing that.

@davidopp
Member

I can do a more thorough review.

// JobStatus represents the current status of a job.
type JobStatus struct {
// Completions is the number of actual completions.
Completions int `json:"completions" description:"most recently observed number of completions"`
Member


A UI would want to show a user a list of running, successful, and failed pods. Is this something that can be reconstructed from get pods and an appropriate label match, or is it something the JobStatus needs to contain?
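Reconstructing those tallies from a pod list and a label match, as the question above suggests, could look roughly like this; the types are minimal stand-ins for the real API objects:

```go
package main

import "fmt"

// Pod is a minimal stand-in for the real API type; only the fields
// needed for this sketch are included.
type Pod struct {
	Labels map[string]string
	Phase  string // "Running", "Succeeded", or "Failed"
}

// matches reports whether labels satisfy every key/value in the selector.
func matches(labels, selector map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// countByPhase reconstructs the running/successful/failed tallies a UI
// would want, from a pod list and the job's label selector.
func countByPhase(pods []Pod, selector map[string]string) (running, succeeded, failed int) {
	for _, p := range pods {
		if !matches(p.Labels, selector) {
			continue
		}
		switch p.Phase {
		case "Running":
			running++
		case "Succeeded":
			succeeded++
		case "Failed":
			failed++
		}
	}
	return
}

func main() {
	pods := []Pod{
		{Labels: map[string]string{"job": "pi"}, Phase: "Succeeded"},
		{Labels: map[string]string{"job": "pi"}, Phase: "Running"},
		{Labels: map[string]string{"job": "other"}, Phase: "Failed"},
	}
	r, s, f := countByPhase(pods, map[string]string{"job": "pi"})
	fmt.Println(r, s, f) // 1 1 0
}
```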

@erictune erictune mentioned this pull request Apr 29, 2015
@erictune
Member

I've commented in #1624. Please don't merge this until we reach a conclusion on that comment.

// JobSpec is the specification of a job.
// As the internal representation of a job, it may have either a TemplateRef or a Template set.
type JobSpec struct {
// Completions is the number of desired successful completions of this job.
Member


"of this job" -> "of pods of this job" (the "job" only runs to completion once)

@bgrant0607 bgrant0607 added this to the v1.0-post milestone May 22, 2015
@k8s-bot

k8s-bot commented Jun 19, 2015

GCE e2e build/test failed for commit af960c1.

@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@k8s-bot

k8s-bot commented Jul 27, 2015

GCE e2e build/test failed for commit 640d4625ebd7b1b524e4dbb6d6fb83bd0ae88fe4.

@bprashanth
Contributor Author

Rebased. cc @mikedanese, as (I think) he's going to be picking up the job controller. I've only lightly tested this post-rebase (job unit tests and basic kubectl resource operations using the example job.json).

@k8s-bot

k8s-bot commented Jul 27, 2015

GCE e2e build/test failed for commit 071d313.

@smarterclayton
Contributor

@soltysh

@soltysh
Contributor

soltysh commented Jul 28, 2015

I think we should hold off on further work on this PR until #11746 settles down. Afterwards I'll be happy to pick it up and rework it to match the designed API.

@k8s-bot

k8s-bot commented Aug 7, 2015

GCE e2e build/test passed for commit 071d313.

@soltysh soltysh mentioned this pull request Aug 19, 2015
@soltysh
Contributor

soltysh commented Aug 19, 2015

I've picked up the changes but had to rework them to place Job in expapi; the current work is in #12910.

@bgrant0607 bgrant0607 closed this Aug 25, 2015
@bprashanth bprashanth deleted the job_resource branch October 26, 2015 00:37