Support Argo/Tekton workflows #74
FYI @terrytangyuan. Also, extracted from a comment in https://bit.ly/kueue-apis (can't find the person's GitHub):
> The idea is to allow for a dependent job to jump to the head of the queue when the dependencies are met.
Yes, but it essentially only jumps to the head of the line if it already was at the head of the line.
I guess I'll have to read through the design doc for the queue APIs in order to understand the use case better here. Any thoughts on what the integration looks like and how the two interoperate with each other?
Consider there to be two components: a queue and a scheduler. Sometimes in the real world, it's a family waiting in line. One member goes off to use the bathroom. If they are not back by the time it's their turn, they usually say, "let the next folks go, we're not ready yet". The scheduler in this case just ignores that entry and goes to the next entry in the queue. The option to allow jobs to be "not ready yet, don't schedule me, but still queue me" could be interesting to various workflow managers.
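A loose sketch of what "queued but not ready" could look like on a queued batch Job. The `kueue.x-k8s.io/queue-name` label and the `suspend` field are existing mechanisms; the `hold` annotation is purely hypothetical, standing in for whatever signal a workflow manager would use to tell the queue "keep my place, but skip me for now".

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: step-b
  labels:
    kueue.x-k8s.io/queue-name: team-queue   # existing Kueue label: enqueue into this LocalQueue
  annotations:
    example.workflows/hold: "true"          # hypothetical "keep my place, skip me for now" marker
spec:
  suspend: true                             # stays suspended until admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "echo running step-b"]
```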
Would a similar integration like Argo and Volcano work in this case? https://github.com/volcano-sh/volcano/blob/master/example/integrations/argo/20-job-DAG.yaml
Not really. That seems to be creating a different job for each step of the workflow. Then, each job enters the queue only after the previous step has finished. This can already be accomplished with Kueue and batch/v1.Job. We would like to enhance the experience roughly as described here: #74 (comment)
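For illustration, a rough, untested sketch of the per-step pattern described above: each Argo DAG task creates a suspended batch/v1 Job pointed at a Kueue LocalQueue, so a step's Job only enters the queue after its dependencies have finished. The queue name and images are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: kueue-dag-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: step-a
        template: queued-job
        arguments:
          parameters:
          - {name: step, value: a}
      - name: step-b
        dependencies: [step-a]
        template: queued-job
        arguments:
          parameters:
          - {name: step, value: b}
  - name: queued-job
    inputs:
      parameters:
      - name: step
    resource:
      action: create
      successCondition: status.succeeded > 0
      failureCondition: status.failed > 0
      manifest: |
        apiVersion: batch/v1
        kind: Job
        metadata:
          generateName: step-{{inputs.parameters.step}}-
          labels:
            kueue.x-k8s.io/queue-name: user-queue   # assumed LocalQueue name
        spec:
          suspend: true                             # Kueue flips this to false on admission
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: main
                image: busybox
                command: ["sh", "-c", "echo step {{inputs.parameters.step}}"]
```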
Hi, I am trying to figure out if I could use Kueue for queueing Tekton PipelineRuns (more info on Tekton at tekton.dev/docs). From reading bit.ly/kueue-apis, it seems like Kueue is going to have separate controllers that create Workload objects for different types of workloads (although I'm not sure if that's the case yet). Would it be reasonable to write a separate controller that creates Workload objects for pending PipelineRuns and starts the PipelineRuns when the workload is admitted by the queue?

I'm not sure if this is possible, because it seems like Kueue mutates the workloads' node affinity directly, and the relationship between PipelineRuns and pod specs doesn't work in quite the same way as between Jobs and pod specs. I'm also curious whether it's possible to create a queue that is based simply on the count of running objects rather than their compute resource requirements.

More details on what I'm trying to do: https://github.com/tektoncd/community/blob/main/teps/0132-queueing-concurrent-runs.md
These controllers can live in the Kueue repo, the Tekton repo, or a new repo altogether.
It depends on what you want. When talking about workflows, there are two possibilities: (a) queue the entire workflow or (b) queue the steps.
Injecting node affinities is the mechanism to support fungibility (example: this job can run on ARM or x86; let Kueue decide to run it where there is still quota). If this is not something that matters to you, you can simply not create flavors.
Kueue is a quota-based system. Currently it uses pod resource requests, and we plan to add the number of pods (#485). I'll comment more when I finish reading the doc above. Thanks for sharing :) cc @kerthcet
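To make the idea concrete, here is a sketch of the kind of Workload such a hypothetical Tekton controller could create for a pending PipelineRun, following the kueue.x-k8s.io/v1beta1 Workload layout. The names, counts, and resource figures are invented for illustration, and a real integration would need to decide how to map TaskRuns onto podSets.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: pipelinerun-build-and-test     # hypothetical, derived from the PipelineRun name
  namespace: ci
spec:
  queueName: ci-queue                  # LocalQueue to submit to
  podSets:
  - name: taskruns
    count: 3                           # e.g. the PipelineRun would fan out into 3 TaskRun pods
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: step
          image: busybox
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```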
Thanks for your response!
Still in the early exploration phase, but looking forward to discussing more what would work!
Tekton uses PipelineRuns, which are DAGs of TaskRuns, and each TaskRun corresponds to a pod. One of our use cases is basically just to avoid overwhelming a kube cluster, in which case queueing based on resource requirements would be useful. However, there are some wrinkles with how we handle resource requirements: since we have containers running sequentially in a pod rather than in parallel, the default k8s assumption that pod resource requirements are the sum of container resource requirements doesn't apply. For this reason, queueing based on TaskRun or PipelineRun count may be simpler for us. Since TaskRuns correspond to pods, queueing based on pod count would solve the TaskRun use case at least.

We also have some use cases that would probably need to be met in Tekton with a wrapper API (e.g. "I want to have only 5 PipelineRuns at a time of X Pipeline that communicates with a rate-limited service"; "I want to have only one deployment PipelineRun running at a time"; etc.). If we could use Kueue to create a queue of at most X TaskRuns, we'd be in good shape to design something in Tekton meeting these needs.
Yes, the pod count would help. But I would encourage users to also add pod requests. This is particularly important for HPC workflows. You might want dedicated CPUs and accelerators. I agree that it wouldn't make sense to queue at a lower level than TaskRuns.
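A minimal sketch of combining a pod-count cap with resource requests on the Kueue side, assuming the `pods` resource name that came out of #485. The flavor, quota values, and queue names are placeholders.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: tekton-cq
spec:
  namespaceSelector: {}                # admit Workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 32
      - name: memory
        nominalQuota: 64Gi
      - name: pods
        nominalQuota: 50               # at most 50 TaskRun pods admitted at once
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: ci-queue
  namespace: ci
spec:
  clusterQueue: tekton-cq
```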
You are welcome to add a topic to our WG Batch meetings if you want to show your design proposals for queuing workflows. https://docs.google.com/document/d/1XOeUN-K0aKmJJNq7H07r74n-mGgSFyiEDQ3ecwsGhec/edit |
If the user wants to run a step that contains multiple pods only when all of the pods can run, we may need some way to know which pods should be in the same workload, so a pods-only integration may not be enough.
cc @Zhuzhenghao Discussion about integrating Kueue with Tekton.
argoproj/argo-workflows#12363 has 22 upvotes. We just need someone to drive this.
@terrytangyuan Hi, is there any conclusion about what exactly to suspend (the entire workflow or the layer)? |
We developed two different ways to help with the queuing of workflows in our environment.
Suspending plain Pods is also an available method.
@KunWuLuan is the mentioned controller open-sourced? Thanks :)
@KunWuLuan Thank you for tackling this issue.
As a first step, it would be a great improvement if you could provide documentation and examples for the plain Pod integration + Argo Workflows.
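In the meantime, a rough, untested sketch of what such an example might look like: Argo's `spec.podMetadata` propagates the `kueue.x-k8s.io/queue-name` label to every workflow pod, and Kueue's plain Pod integration (which has to be enabled in the Kueue configuration) then gates each pod until it is admitted. The queue name and image are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: queued-steps-
spec:
  entrypoint: main
  podMetadata:
    labels:
      kueue.x-k8s.io/queue-name: user-queue   # each step's pod is queued and gated individually
  templates:
  - name: main
    steps:
    - - name: hello
        template: echo
  - name: echo
    container:
      image: busybox
      command: ["sh", "-c", "echo hello from a kueue-gated pod"]
```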
What does the layer mean here? One step? If so, I think it may be possible to create workloads for all the steps (parallel steps as one workload) and suspend them all. Once a workload finishes, allow the next one; I think the controller knows the dependencies. However, how can we distinguish them from the injected ones? I think approach 1) can be a simple start. Anyway, glad to see the KEP.
Note that someone started a PR to document how to use the plain pods integration with Argo (#1545), but they abandoned it. Regardless, I would be interested in more robust support at the layer level. See this comment for my high-level proposal: argoproj/argo-workflows#12363 (comment)
Plain pods... could that work with GitLab runner jobs too? The lack of scheduling there has been a pain.
The controller is not open-sourced yet.
Yes, we introduced a specific key in the workflow's annotations like
Yes, we deployed a Job integration controller, which contains a controller that creates a CR like Workload and a controller that injects a suspend template into the original workflow.
No problem, working on it. :)
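To illustrate the injection approach described above, here is a rough sketch of what a rewritten workflow might look like: the (hypothetical) integration controller adds an Argo `suspend` step in front of each layer and resumes it once the layer's Workload is admitted. The gate template names and the controller behaviour are assumptions, not an existing API.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: layered-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: layer-1-gate
        template: wait-for-admission      # injected by the integration controller
      - name: layer-1
        dependencies: [layer-1-gate]
        template: work
      - name: layer-2-gate
        dependencies: [layer-1]
        template: wait-for-admission      # injected before the next layer
      - name: layer-2
        dependencies: [layer-2-gate]
        template: work
  - name: wait-for-admission
    suspend: {}                           # resumed externally once the layer's Workload is admitted
  - name: work
    container:
      image: busybox
      command: ["sh", "-c", "echo running a layer"]
```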
That seems useful, but annotations are not a sustainable API. Argo folks were in favor of doing a proper integration, so we can probably change their API to accommodate the needs of the integration. But again, something at the layer level is probably better.
I think that we want to support the creation of Workloads at the layer level as well, and we want to push all Workloads sequentially. This layer-level approach allows us to avoid wasting resources on the entire workflow. But I think that we can evaluate the layer-level approach during the KEP (#2976).
@alculquicondor @tenzen-y |
@terrytangyuan If you have time, please also have a look, thanks very much. |
Awesome! I'll share the proposal around the Argo Workflows community as well. |
Can someone remove "Tekton" from the title of this issue? |
Ideally, the mechanism should be extensible to Tekton and any other workflow manager. But certainly, we can start with just Argo. |
+1, this should be aligned with other workflow tools as well, from the Kueue side.
I understand. In that case, this component should aim to minimize its dependencies on modifications to other workflow managers. |
Not necessarily. But it should aim at modifications that would be feasible in other projects. Just like we did with the
This is lower priority than #65, but it would be good to have an integration with a workflow framework.
Argo supports the suspend flag; the tricky part is that suspend is for the whole workflow, meaning a QueuedWorkload would need to represent the resources of the whole workflow all at once.
Ideally Argo should create jobs per sequential step, and then resource reservation happens one step at a time.
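For reference, a sketch of that whole-workflow option: the Workflow is created with Argo's existing `spec.suspend` flag set, and an integration would build a single Workload covering every step, unsuspending the Workflow only once that Workload is admitted. The queue label on the Workflow is an assumption about what such an integration might key on.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: whole-workflow-
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # assumed hook for a future integration
spec:
  suspend: true            # Argo's existing flag; pauses the whole workflow until cleared
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: step-a
        template: work
    - - name: step-b
        template: work
  - name: work
    container:
      image: busybox
      command: ["sh", "-c", "echo step"]
```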