Add KEP for Indexed Job in implementable status #2245
Conversation
> annotation. In the event of a kube-controller-manager upgrade, some running or
> succeeded Pods might exist without a completion index.
>
> <<[UNRESOLVED handling Jobs with existing Pods without completion index ]>>
Introducing the annotation shouldn't impact existing workloads unless they explicitly use it. I don't expect that users would create an indexed Job that is tolerant of not having the index, so I don't really see the need for addressing this scenario.
This is more about the controller itself. It needs to do something with the existing Pods in terms of completion index tracking.
Looks good, Aldo. Added some optional suggestions.
/assign @janetkuo
/label api-review
/approve
/lgtm
@wojtek-t friendly reminder for PRR
> ### Job scaling
>
> Jobs can be scaled up and down by varying `.spec.parallelism` (note that
> `.spec.completions` is an immutable field).
Please avoid using "scale" here, as Jobs don't follow the normal semantics for scaling. Ref kubernetes/kubernetes#60139
Fixed
/lgtm
I added some comments, but mostly fairly minor ones.
> The Job controller doesn't add the environment variable if there is a name
> conflict with an existing environment variable in the Job's Pod template or if
> the user already specified another environment variable for the same annotation.
Why? What's the problem with having the value referenced by two separate env vars?
I think it makes sense to give the user control over the names of the variables. A binary could always have some behavior defined for JOB_COMPLETION_INDEX which is different from the one provided by k8s.
I guess I wasn't clear. I wasn't asking to touch the customer-defined variable.
I was just asking why we don't want to provide `JOB_COMPLETION_INDEX` unconditionally (either with the user-provided value if set by the user, or as above if not).
I don't see a problem with having two env vars, say `FOO` and `JOB_COMPLETION_INDEX`, both set to the value of that annotation.
I understood you correctly :)
What I'm saying is: what if a JOB_COMPLETION_INDEX variable causes an undesirable effect on the user binary?
I guess in this case they could override that environment variable with another value.
Make the env name unique, `KUBERNETES_JOB_INDEX` for example. The user can always map that annotation to another pre-defined name, but we should always inject it.
I don't have a strong opinion about the name. But I'm still not convinced we should always inject the value if the user is already adding it.
We should always inject, unless the user explicitly overwrites that name.
The behavior should be consistent: if your job is indexed, var X (I don't have a strong opinion about the name) will always be set. In your version, the var may or may not be set, which is super confusing.
Sounds good. Removed the second condition.
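For illustration, a minimal sketch of the behavior this thread settles on: always inject the index env var via the downward API, unless the user already defines a variable with that name. The annotation key, variable name, and helper here are assumptions for the sketch, not the final implementation:

```go
package controller

import (
	corev1 "k8s.io/api/core/v1"
)

// Assumed annotation key; the KEP defines the authoritative name.
const completionIndexAnnotation = "batch.kubernetes.io/job-completion-index"

// addCompletionIndexEnvVar injects JOB_COMPLETION_INDEX via the downward API,
// unless the Pod template already defines a variable with that name, in which
// case the user's definition wins.
func addCompletionIndexEnvVar(c *corev1.Container) {
	for _, env := range c.Env {
		if env.Name == "JOB_COMPLETION_INDEX" {
			return // explicit user override
		}
	}
	c.Env = append(c.Env, corev1.EnvVar{
		Name: "JOB_COMPLETION_INDEX",
		ValueFrom: &corev1.EnvVarSource{
			FieldRef: &corev1.ObjectFieldSelector{
				FieldPath: "metadata.annotations['" + completionIndexAnnotation + "']",
			},
		},
	})
}
```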
> is resolved:
>
> The Job controller keeps track of completed indexes in the Job status in a
> base64 encoded bitmask.
Can you explicitly describe how JobStatus is updated then (for completeness)?
Another question: with the bitmask field and the proposal above, will we end up with two separate fields for that?
Ohh, or are you saying that we don't solve it for Alpha at all and just rely on the existence of all the pods for now?
Correct. This would be a beta graduation criterion, pending #2307.
OK - if you're not touching JobStatus for now at all, that's fine.
I agree that tracking completed pods is orthogonal to this proposal.
> // The Job is considered complete when there is one successful Pod for each
> // index in the range 0 to (.spec.completions - 1).
> // When value is `Indexed`, .spec.completions must have a non-zero positive
> // value and `.spec.parallelism` must be less than or equal to 10^6.
I'm assuming that the reasoning behind this 10^6 is the underlying bitmask, right?
i.e. we're saying that tracking 10^6 items in a bitmask takes O(125 kB) and this is fine, right?
That sounds fine to me, but I would like to ensure that somewhere in the design section we add a requirement that such big objects won't be updated frequently (e.g. not more often than once every couple of seconds or so).
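For context, the O(125 kB) figure follows from one bit per completion index; a back-of-the-envelope sketch, assuming the bitmask is stored base64-encoded in a string field:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	const completions = 1_000_000 // one bit per completion index
	rawBytes := completions / 8   // 125,000 bytes of raw bitmask

	// base64 expands the payload by roughly 4/3 when stored in a string field.
	encodedLen := base64.StdEncoding.EncodedLen(rawBytes)

	// Prints: raw=125000 bytes, base64=166668 chars (roughly 163 KiB of status).
	fmt.Printf("raw=%d bytes, base64=%d chars\n", rawBytes, encodedLen)
}
```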
Yes, this is about being able to maintain the bitmask. Although, if we are going to use it to track completion, we actually need to cap `.spec.completions` too. The idea is that in the future we don't depend on lingering pods. @erictune thoughts?

> e.g. not more often than once every couple of seconds or so

Should we be doing some kind of client-side throttling? In the current design, Job status updates would come after removing finalizers from finishing Pods, which already throttles Job updates in a way. But if one Pod finishes every 100ms (for example), we might still do more Job updates than what you are suggesting.
I changed the limit to be on completions instead of parallelism.
> If, instead of a downgrade, the feature is disabled:
>
> - kube-apiserver sets `.spec.completionMode=NonIndexed` for new Jobs.
I don't understand how this is done (and how you distinguish downgrade from disabling exactly). Can you be more specific?
This is standard logic in API strategies for disabled feature gates. Reworded a bit.
I don't exactly distinguish a downgrade, as I cannot control the code of version N-1. So the paragraphs above describe how the code behaves in that scenario.
But what I can control is a feature being disabled, in which case I prefer to ignore Indexed Jobs that were already created.
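To make "standard logic in API strategies" concrete, here is a sketch of what the Job strategy could do on create when the gate is off. The feature-gate, field, and constant names follow the KEP draft and are assumptions, not the merged code:

```go
package job

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"

	"k8s.io/kubernetes/pkg/apis/batch"
	"k8s.io/kubernetes/pkg/features"
)

// dropDisabledCompletionMode runs in PrepareForCreate: with the IndexedJob
// gate disabled, any client-supplied completion mode is overridden to
// NonIndexed, so no new Indexed Jobs appear while the feature is off.
func dropDisabledCompletionMode(job *batch.Job) {
	if !utilfeature.DefaultFeatureGate.Enabled(features.IndexedJob) {
		job.Spec.CompletionMode = batch.NonIndexedCompletion
	}
}
```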
> * **Are there any tests for feature enablement/disablement?**
>
>   Yes, unit and integration test for the feature enabled and disabled.
If we're going to have something fancy for feature disablement, then we should have (at least) a unit test for disablement.
You can see an example of doing that here (under review, but the idea is there):
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246
That was the intention of my text here. Reworded.
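A sketch of the kind of unit test referenced above, toggling the gate per case. The feature-gate name is an assumption, and the exact signature of the test helper has varied across releases:

```go
package job

import (
	"fmt"
	"testing"

	utilfeature "k8s.io/apiserver/pkg/util/feature"
	featuregatetesting "k8s.io/component-base/featuregate/testing"

	"k8s.io/kubernetes/pkg/features"
)

func TestCompletionModeFeatureEnablement(t *testing.T) {
	for _, enabled := range []bool{true, false} {
		t.Run(fmt.Sprintf("gate=%v", enabled), func(t *testing.T) {
			// Flip the gate for the duration of this subtest only.
			defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.IndexedJob, enabled)()

			// Run the strategy's PrepareForCreate on an Indexed Job and assert
			// that completionMode is preserved when the gate is on and reset
			// to NonIndexed when it is off.
		})
	}
}
```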
> // The Job is considered complete when there is one successful Pod for each
> // index in the range 0 to (.spec.completions - 1).
> // When value is `Indexed`, .spec.completions must have a value between 1
> // and 10^6.
> Yes, this is about being able to maintain the bitmask. Although, if we are going to use it to track completion, we actually need to cap `.spec.completions` too. The idea is that in the future we don't depend on lingering pods. @erictune thoughts?

I don't think you need to limit completions. The whole point Eric made in one of the other comments is that it's enough to limit parallelism: if you store not a bitmask but an `[a-b,c-d,e-f,...]` format, there will be at most `parallelism` such pairs, so the length of that is actually bounded by parallelism.
Bounding completions is much more limiting than limiting parallelism to something as high as 10^6.

> Should we be doing some kind of client-side throttling? In the current design, Job status updates would come after removing finalizers from finishing Pods, which already throttles Job updates in a way. But if one Pod finishes every 100ms (for example), we might still do more Job updates than what you are suggesting.

Yes, we should batch those and send 1 update per X seconds. We can tune X later.
> I don't think you need to limit completions. The whole point Eric made in one of the other comments is that it's enough to limit parallelism: if you store not a bitmask but an `[a-b,c-d,e-f,...]` format, there will be at most `parallelism` such pairs, so the length of that is actually bounded by parallelism.

So it depends on the implementation. If we use a bitmask, then we have to limit completions. If we limit parallelism, we would have to use the compressed list format, but then the limit might have to be lower than 10^6, as we require more characters. We don't have to be concerned about display space, as we can just limit `kubectl describe` to print fewer characters.
@soltysh any thoughts?
Running 10^6 pods at the same time is far from supported at this point.
Having more than 10^6 pods within a single job sounds like a valid use case to me.
So I would be much more comfortable saying "parallelism <= 10^5" (so decreasing that by an order of magnitude even) than setting a limit on completions.
Moved back to limiting parallelism and using a compressed list format instead of a bitmask.
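To make the compressed list format concrete, here is a minimal sketch of rendering a sorted set of completed indexes as intervals; per the argument cited above, the number of disjoint intervals (and so the field's size) is bounded by parallelism rather than completions. The helper name is hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatIntervals renders sorted completed indexes as "a-b,c,d-e".
func formatIntervals(indexes []int) string {
	if len(indexes) == 0 {
		return ""
	}
	var parts []string
	start, prev := indexes[0], indexes[0]
	flush := func() {
		if start == prev {
			parts = append(parts, strconv.Itoa(start))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", start, prev))
		}
	}
	for _, i := range indexes[1:] {
		if i == prev+1 { // extend the current interval
			prev = i
			continue
		}
		flush() // close the interval and start a new one
		start, prev = i, i
	}
	flush()
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(formatIntervals([]int{0, 1, 2, 3, 5, 7, 8, 9})) // "0-3,5,7-9"
}
```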
> - kube-apiserver sets `.spec.completionMode=NonIndexed` for new Jobs at
>   creation time.
> - kube-controller-manager ignores existing Indexed Jobs and emits a warning
>   event.
How exactly are those ignored? Are we killing the existing pods? Are we just not starting new ones? Are we stacking pods being finished?
Added specificity.
> // More completion modes can be added in the future. If a Job controller
> // observes a mode that it doesn't recognize, it manages the Job as in
> // `NonIndexed`.
> CompletionMode string
I'd propose creating a new `CompletionModeType` with pre-defined constants, like we do in other places such as this.
Maybe something more generic like `JobType`, which we could re-use in the future if we decide to implement different behaviour of the job controller. I'm worried that `CompletionMode` might be limiting. Other ideas: `WorkMode`, `JobMode`.
I think `CompletionMode` has a good balance of specificity and generality. We already have one in mind for the future (see Alternatives): `IndexedAndUnique`.
The point is that all of them refer to how completion is handled. If there is a mode that refers to something other than completion, it should have its own field.
I think I agree with @soltysh on making a dedicated type, but I think this will get figured out during API review.
I already gave it a type :)
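For reference, the typed-string shape this thread converges on could look like the following sketch; the constant names are illustrative, with the final ones settled during API review:

```go
// CompletionMode specifies how Pod completions of a Job are tracked.
type CompletionMode string

const (
	// NonIndexedCompletion: the Job is complete when there have been
	// .spec.completions successfully completed Pods, in no particular order.
	NonIndexedCompletion CompletionMode = "NonIndexed"

	// IndexedCompletion: Pods get an associated completion index and the Job
	// is complete when there is one succeeded Pod for each index in
	// 0..(.spec.completions - 1).
	IndexedCompletion CompletionMode = "Indexed"
)
```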
> Completed Indexes: [1-25,28,30-32]
>
> The command crops the list of indexes once it reaches 200 characters.
200 seems like a lot. I don't think we have a hard limit in kubectl, but I'm positive this is too much. Also, it's not relevant to this proposal; I'd drop it from here. It's an implementation detail.
Removed specificity.
> - kube-apiserver sets `.spec.completionMode=NonIndexed` for new Jobs at
>   creation time.
> - kube-controller-manager ignores existing Indexed Jobs and emits a warning
>   event.
Warn and run as non-indexed? Warn and ignore? Can you be more specific?
Added specificity.
> If, instead of a downgrade, the cluster administrator disables the feature gate:
>
> - kube-apiserver sets `.spec.completionMode=NonIndexed` for new Jobs at
>   creation time.
Will there be validation in kube-apiserver not to allow setting indexed jobs when this feature is disabled?
The value is overridden. Reworded for clarity.
@alculquicondor - this LGTM, please squash the commits
squashed
/approve
I will give @soltysh a chance to take another look.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor, erictune, janetkuo, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
sorry, I missed that it was already approved :)
/lgtm since I reviewed it earlier.
Yeah - I realized after writing this comment :) I think that was the last non-addressed one. So let me cancel the hold - it's in very good shape now.
/hold cancel
Was one day late, but 👍 thanks for that 🎉
Ref #2214 kubernetes/kubernetes#97169
/sig apps