Make indices of IndexedJob pods configurable #109131
@isibeni: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.
/wg batch
/sig apps
/area batch
cc @soltysh
The field probably needs to be a string, and it should have some limit on the number of characters. As for the status, it might be harder to limit. In Indexed Jobs we already limit the size of the job: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/job-v1/#lifecycle
IMO, the feature would be a nice addition to Indexed Jobs. Let's see what others say.
Agree. I would propose that it should have the same format as the `completedIndexes` status field.
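For reference, a minimal sketch of how the existing `completedIndexes` field appears in a Job status. The field and its comma-separated range format are real; the surrounding values are made up for illustration:

```yaml
# Status of an Indexed Job where indexes 1, 3, 4, 5 and 7 succeeded.
# completedIndexes compresses consecutive indexes into ranges.
status:
  succeeded: 5
  completedIndexes: "1,3-5,7"
```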
@alculquicondor Could you elaborate a bit on what the problem with that is? Performance and memory consumption problems for scenarios where we can't simplify the string (e.g. all pods with even indexes completing first)?
Not just performance; the etcd database itself has limits. You have identified the correct challenging scenario.
And with the new field, the challenging scenario would be when every other index fails directly (e.g. indices 1, 3, 5, …), so the list cannot be compressed into ranges. Based on that we would need to check whether we need to reduce the max value for `completions`.
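A rough back-of-the-envelope for that worst case (my numbers, not from the thread): with `completions` at the 10^5 ceiling that Indexed Jobs currently enforce and every odd index failing, the uncompressible list `1,3,5,…,99999` holds 50,000 entries, which works out to roughly 300 KB including separators. That still fits under etcd's default request-size limit of about 1.5 MiB, but it is a very large value for a single status field.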
If an index fails, but succeeds with a follow-up pod, we shouldn't list it in `failedIndices`.
Sorry, I just figured out that I had a small misconception about the behavior of `backoffLimit` on Indexed Jobs. Or am I missing something? Maybe we can consider introducing a behaviour like the one I just described. This behaviour would also help with this issue: #107891
The `backoffLimit` is for the entire Job. I think the feature request still makes sense. The Job overall is marked as Failed, but some indexes might have completed.
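To make the semantics concrete, here is a minimal Indexed Job manifest using only fields that exist today (the names and values are illustrative, not from the thread); `backoffLimit` counts pod failures across all indexes, not per index:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-example
spec:
  completionMode: Indexed  # pods get indexes 0..completions-1
  completions: 5           # indexes 0 through 4
  parallelism: 5
  backoffLimit: 4          # total retries for the whole Job, shared by all indexes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo $JOB_COMPLETION_INDEX"]
```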
That would be a separate feature request. And we would need a new field, as we cannot change the behavior of existing fields, for backwards compatibility.
Agree on that.
Yeah, you are right. But then I would propose that we don't need the `failedIndices` status field.
This was brought to my attention during the last wg-batch call. A few ideas worth considering came up.
I created #109712 to track the idea of a job execution mode where every index is allowed to execute.
Not necessarily. As of now it would be just for convenience. For some, it might not be so straightforward to implement the calculation. If we provided it, clients/users wouldn't need to reimplement it on their own. But for this feature request it is not super urgent. I could also live with keeping this out for now.
2 things on that:
/lifecycle stale
/remove-lifecycle stale
We will have to tackle this in 1.27 or later.
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
/lifecycle stale
/lifecycle frozen
/wg batch
@alculquicondor what specifically is blocking? Why couldn't it be the case that you can configure the indices of the job, but not allow every index to execute?
In a way, the features are independent. But doing #109712 first makes this feature more useful, so we started with #109712. The idea is that #109712 will include a field …
Gotcha - that does make sense. I'm very interested in this feature, and if they can be separate (and eventually integrated in the way you described), would it be OK if I started on it? I haven't contributed to Kubernetes proper and would like to try that out (so I'm more empowered to in the future). But if you have someone in mind and/or absolutely want to wait, I can respect that too! I was able to get a prototype for bursting with the Flux Operator, albeit I used a hack to get around this particular issue of not being able to control the indices. Let me know what you think, and if it's something I could help with, I would want to ask to be pointed to step 1. :)
I don't think there were any takers previously. Note that for a feature like this, you need to follow the KEP process: https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/0000-kep-process. Ideally, before starting, it's helpful to introduce yourself to SIG Apps (https://github.com/kubernetes/community/tree/master/sig-apps#meetings), announcing your intent and high-level ideas, so that you can clear possible questions and roadblocks in the future.
What would you like to be added?
As a user, I would like to be able to configure with which index the pods of an IndexedJob are created.
By introducing a new field to the Job spec API (possible name: `indices`, TBD) we can influence with which index the controller would create pods. This field then has a mutually exclusive relation to the `completions` field. Ideally, we could also have a new status field called `failedIndices` which contains the indices of the failed pods.
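To make the proposal concrete, a hypothetical sketch of what such a spec could look like. The `indices` field does not exist in Kubernetes; its name and string format (the same compressed ranges as `completedIndexes`) are assumptions based on this discussion:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: partial-indexed-job
spec:
  completionMode: Indexed
  indices: "1,3-5,7"  # hypothetical field: only create pods for these indexes
  parallelism: 5      # completions is omitted: mutually exclusive per the proposal
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo $JOB_COMPLETION_INDEX"]
```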
Why is this needed?
This is especially helpful when running large batch jobs where some of the pods are failing.
The user would then have an easy interface to recreate the failed job and rerun only the failed indices by specifying them via the newly introduced field.
If we also introduce the status field `failedIndices`, the user could easily pick the failed indices from that status and recreate the job with those indices. Introducing this status field also allows automating this workflow.
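A hypothetical end-to-end sketch of that workflow, again assuming both proposed fields exist with the compressed-range string format (nothing here is current Kubernetes API except `completedIndexes`):

```yaml
# 1. Status of the first run: indexes 0, 2 and 4 succeeded, 1 and 3 failed.
status:
  completedIndexes: "0,2,4"
  failedIndices: "1,3"  # hypothetical status field from this proposal
---
# 2. A follow-up Job that reruns only the failed indexes.
apiVersion: batch/v1
kind: Job
metadata:
  name: rerun-failed-indexes
spec:
  completionMode: Indexed
  indices: "1,3"        # hypothetical spec field: copied from failedIndices above
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo retrying index $JOB_COMPLETION_INDEX"]
```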