
Support Argo/Tekton workflows #74

Open
ahg-g opened this issue Feb 25, 2022 · 57 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Comments

@ahg-g
Contributor

ahg-g commented Feb 25, 2022

This is lower priority than #65, but it would be good to have an integration with a workflow framework.

Argo supports the suspend flag; the tricky part is that suspend applies to the whole workflow, meaning a QueuedWorkload would need to represent the resources of the entire workflow at once.

Ideally, Argo would create Jobs per sequential step, so that resource reservation happens one step at a time.
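For reference, suspending a whole Argo Workflow today looks roughly like this (a minimal sketch; the queue-name label is only an assumption about how a Workflow might eventually be pointed at a Kueue queue, not an existing integration):

```yaml
# Sketch: an Argo Workflow created in a suspended state; something like Kueue
# could flip spec.suspend to false once quota for the whole workflow is available.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: train-
  labels:
    kueue.x-k8s.io/queue-name: user-queue  # hypothetical wiring to a queue
spec:
  suspend: true        # existing Argo field; suspends the entire workflow
  entrypoint: main
  templates:
  - name: main
    container:
      image: busybox
      command: ["echo", "hello"]
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
```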

@ahg-g ahg-g added kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Feb 25, 2022
@alculquicondor
Contributor

alculquicondor commented Feb 25, 2022

FYI @terrytangyuan

Also, extracted from a comment in https://bit.ly/kueue-apis (can't find the person's github)

A compromise might be a way of submitting a job, but have it "paused" so that the workflow manager can unpause it after its deps have been met, but the job still can wait in line in the queue so it doesn't add a lot of wall clock time. The scheduler would ignore any paused jobs until they are unpaused?

The idea is to allow a dependent job to jump to the head of the queue when its dependencies are met.
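For comparison, batch/v1 Jobs already expose this kind of pause via the suspend field; a minimal example (the workflow manager would flip suspend to false once the dependencies are met):

```yaml
# A Job created suspended: no pods are created until spec.suspend is flipped
# back to false, so it can wait in line without running.
apiVersion: batch/v1
kind: Job
metadata:
  name: dependent-step   # hypothetical name
spec:
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["echo", "run after deps are met"]
```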

@kfox1111

Yes, but it essentially only jumps to the head of the line if it already was at the head of the line.

@terrytangyuan
Member

terrytangyuan commented Mar 1, 2022

I guess I'll have to read through the design doc for queue APIs in order to understand the use case better here. Any thoughts on what the integration looks like and how the two interoperate with each other?

@kfox1111

kfox1111 commented Mar 2, 2022

Consider there to be two components: a queue and a scheduler.
The queue is where jobs wait in line. A scheduler picks entries to work on at the head of the line.

Sometimes in the real world, it's a family waiting in line. One member goes off to use the bathroom. If they are not back by the time it's their turn, they usually say, "let the next folks go, we're not ready yet". The scheduler in this case just ignores that entry and goes to the next entry in the queue. The option to allow jobs to be "not ready yet, don't schedule me, but still queue me" could be interesting to various workflow managers.

@alculquicondor alculquicondor changed the title Support Argo workflows Support Argo/Tekton workflows Mar 17, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 15, 2022
@kerthcet
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 16, 2022
@kannon92
Contributor

kannon92 commented Sep 4, 2022

Would an integration similar to the one between Argo and Volcano work in this case?

https://github.com/volcano-sh/volcano/blob/master/example/integrations/argo/20-job-DAG.yaml

@alculquicondor
Contributor

Not really. That seems to be creating a different job for each step of the workflow. Then, each job enters the queue only after the previous step has finished. This can already be accomplished with Kueue and batch/v1.Job.

We would like to enhance the experience roughly as described here: #74 (comment)
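For concreteness, here is roughly what that already-possible pattern looks like: each Argo DAG task is a resource template that creates a batch/v1 Job pointed at a Kueue LocalQueue (a sketch; the queue and step names are made up):

```yaml
# Sketch: each DAG task creates a batch/v1 Job labeled for a Kueue LocalQueue,
# so every step queues independently, and only after its dependencies finished.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-kueue-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: step-a
        template: queued-job
      - name: step-b
        template: queued-job
        dependencies: [step-a]
  - name: queued-job
    resource:
      action: create
      successCondition: status.succeeded > 0
      failureCondition: status.failed > 0
      manifest: |
        apiVersion: batch/v1
        kind: Job
        metadata:
          generateName: wf-step-
          labels:
            kueue.x-k8s.io/queue-name: user-queue
        spec:
          suspend: true  # Kueue unsuspends the Job when quota is available
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: main
                image: busybox
                command: ["echo", "work"]
                resources:
                  requests:
                    cpu: "1"
                    memory: 1Gi
```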

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2022
@kerthcet
Contributor

kerthcet commented Dec 6, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 6, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2023
@tenzen-y
Member

tenzen-y commented Mar 6, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2023
@lbernick

Hi, I am trying to figure out if I could use Kueue for queueing Tekton PipelineRuns (more info on tekton at tekton.dev/docs). From reading bit.ly/kueue-apis, it seems like Kueue is going to have separate controllers that create Workload objects for different types of workloads (although I'm not sure if that's the case yet).

Would it be reasonable to write a separate controller that creates Workload objects for pending PipelineRuns, and starts the PipelineRuns when the workload is admitted by the queue? I'm not sure if this is possible because it seems like kueue somehow mutates the workloads' node affinity directly, and the relationship between PipelineRuns and pod specs doesn't work in quite the same way as between Jobs and pod specs.

I'm also curious if it's possible to create a queue that is just based on count of running objects rather than their compute resource requirements.

More details on what I'm trying to do: https://github.com/tektoncd/community/blob/main/teps/0132-queueing-concurrent-runs.md
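For what it's worth, the object such a controller would create is roughly this shape under the current Kueue API (a sketch; the names and the PipelineRun-to-podSet mapping are assumptions):

```yaml
# Sketch: a Workload that a hypothetical PipelineRun controller could create,
# with one podSet per pending TaskRun pod. It waits in the referenced LocalQueue
# and the controller starts the PipelineRun once the Workload is admitted.
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: pipelinerun-build-abc   # hypothetical
  namespace: ci
spec:
  queueName: ci-queue           # LocalQueue the PipelineRun should wait in
  podSets:
  - name: taskrun-build
    count: 1
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: step
          image: busybox
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```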

@alculquicondor
Contributor

it seems like Kueue is going to have separate controllers that create Workload objects for different types of workloads (although I'm not sure if that's the case yet).

These controllers can live in the Kueue repo, the Tekton repo, or a new repo altogether.
We currently have a controller for the Kubeflow MPIJob in the Kueue repo. If the Tekton community is open to having this integration, we can discuss where the best place to put it is.

Would it be reasonable to write a separate controller that creates Workload objects for pending PipelineRuns, and starts the PipelineRuns when the workload is admitted by the queue?

Depends on what you want. When talking about workflows, there are two possibilities: (a) queue the entire workflow or (b) queue the steps.

I'm not sure if this is possible because it seems like kueue somehow mutates the workloads' node affinity directly, and the relationship between PipelineRuns and pod specs doesn't work in quite the same way as between Jobs and pod specs.

Injecting node affinities is the mechanism to support fungibility (example: this job can run on ARM or x86; let Kueue decide to run it where there is still quota). If this is not something that matters to you, you can simply not create any flavors.
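For illustration, fungibility is expressed with ResourceFlavors that Kueue maps back to node labels/affinities; a minimal sketch (label values and quotas are made up):

```yaml
# Two flavors backing the same ClusterQueue quota: Kueue picks a flavor with
# remaining quota and injects the matching node affinity into the admitted pods.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: arm64
spec:
  nodeLabels:
    kubernetes.io/arch: arm64
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: amd64
spec:
  nodeLabels:
    kubernetes.io/arch: amd64
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: main
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: arm64
      resources:
      - name: cpu
        nominalQuota: 100
    - name: amd64
      resources:
      - name: cpu
        nominalQuota: 100
```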

I'm also curious if it's possible to create a queue that is just based on count of running objects rather than their compute resource requirements.

Kueue is a quota-based system. Currently it uses pod resource requests, and we plan to add the number of pods as a countable resource (#485).
What kind of object would make sense to count in Tekton? I would expect that there should be resource requests somewhere.

I'll comment more when I finish reading the doc above. Thanks for sharing :)

cc @kerthcet

@lbernick

Thanks for your response!

These controllers can live in the Kueue repo, the Tekton repo, or a new repo altogether. We currently have a controller for the Kubeflow MPIJob in the Kueue repo. If the Tekton community is open to having this integration, we can discuss where the best place to put it is.

Still in the early exploration phase, but looking forward to discussing more what would work!

Kueue is a quota-based system. Currently it uses pod resource requests, and we plan to add the number of pods as a countable resource (#485). What kind of object would make sense to count in Tekton? I would expect that there should be resource requests somewhere.

Tekton uses PipelineRuns, which are DAGs of TaskRuns, and each TaskRun corresponds to a pod. One of our use cases is basically just to avoid overwhelming a kube cluster, in which case queueing based on resource requirements would be useful. However, there are some wrinkles with how we handle resource requirements, since we have containers running sequentially in a pod rather than in parallel, so the default k8s assumption that pod resource requirements are the sum of container resource requirements doesn't apply. For this reason, queueing based on TaskRun or PipelineRun count may be simpler for us. Since TaskRuns correspond to pods, queueing based on pod count would solve the TaskRun use case at least.

We also have some use cases that would probably need to be met in Tekton with a wrapper API (e.g. "I want to have only 5 PipelineRuns of Pipeline X at a time because it communicates with a rate-limited service"; "I want to have only one deployment PipelineRun running at a time", etc.). If we could use Kueue to create a queue of at most X TaskRuns, we'd be in good shape to design something in Tekton meeting these needs.
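If the pod-count quota from #485 lands, the "at most X running TaskRuns" case might be expressible with just a quota on pods. Purely a sketch of that idea, not a shipped API:

```yaml
# Hypothetical sketch: cap a ClusterQueue at 5 concurrently admitted pods, which
# for Tekton would translate to at most 5 TaskRuns running at a time.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: rate-limited
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: pods
        nominalQuota: 5
```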

@alculquicondor
Contributor

Since TaskRuns correspond to pods, queueing based on pod count would solve the TaskRun use case at least.

Yes, the pod count would help. But I would encourage users to also add pod requests. This is particularly important for HPC workflows. You might want dedicated CPUs and accelerators.
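For example, a TaskRun pod asking for dedicated CPUs and an accelerator would carry ordinary requests that Kueue can count against quota (standard Kubernetes resources; the GPU resource name is the usual device-plugin one):

```yaml
# Per-container requests/limits that Kueue would charge against the ClusterQueue.
resources:
  requests:
    cpu: "4"
    memory: 16Gi
    nvidia.com/gpu: "1"
  limits:
    nvidia.com/gpu: "1"
```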

I agree that it wouldn't make sense to queue at a lower level than TaskRuns.

@alculquicondor
Contributor

You are welcome to add a topic to our WG Batch meetings if you want to show your design proposals for queuing workflows.

https://docs.google.com/document/d/1XOeUN-K0aKmJJNq7H07r74n-mGgSFyiEDQ3ecwsGhec/edit

@KunWuLuan
Member

KunWuLuan commented Apr 26, 2024

If the user wants to run a step that contains multiple pods only when all of those pods can run, we need some way to know which pods belong to the same workload. So the plain Pod integration alone may not be enough.

@kerthcet
Contributor

kerthcet commented Jun 3, 2024

cc @Zhuzhenghao: discussion about integrating Kueue with Tekton.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2024
@kannon92
Contributor

kannon92 commented Sep 1, 2024

/remove-lifecycle stale

@terrytangyuan
Member

argoproj/argo-workflows#12363 has 22 upvotes. We just need someone to drive this.

@KunWuLuan
Member

@terrytangyuan Hi, is there any conclusion on what exactly to suspend (the entire workflow or each layer)?

@KunWuLuan
Member

We developed two different ways to support queuing of workflows in our environment.

  1. Users can define the maximum resources for the entire workflow. During the execution of the workflow, the total resources in use cannot exceed what was admitted.
  2. A controller mutates the workflow to insert a suspend template before each layer, and creates a Workload for each suspended layer.

Suspending plain Pods is also an available method.
I can write a simple KEP describing the advantages and disadvantages of each method and track the discussion.
@alculquicondor @tenzen-y Hi, is anyone working on this?
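As a rough illustration of approach 2, the injected gate before a layer would look something like this in the mutated Workflow (a sketch; the controller would create a Workload for the layer and resume the suspended node once it is admitted):

```yaml
# Sketch of approach 2: a suspend template injected ahead of a layer of steps.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: layered-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: wait-for-layer-1       # injected gate (illustrative name)
        template: quota-gate
    - - name: layer-1-step-a         # the actual layer, runs only after the gate
        template: work
      - name: layer-1-step-b
        template: work
  - name: quota-gate
    suspend: {}                      # Argo suspend template; resumed by the controller
  - name: work
    container:
      image: busybox
      command: ["echo", "working"]
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
```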

@lukasmrtvy

@KunWuLuan is the mentioned controller opensourced? Thanks :)

@tenzen-y
Member

tenzen-y commented Sep 3, 2024

We developed two different ways to support queuing of workflows in our environment.

  1. Users can define the maximum resources for the entire workflow. During the execution of the workflow, the total resources in use cannot exceed what was admitted.
  2. A controller mutates the workflow to insert a suspend template before each layer, and creates a Workload for each suspended layer.

Suspending plain Pods is also an available method. I can write a simple KEP describing the advantages and disadvantages of each method and track the discussion. @alculquicondor @tenzen-y Hi, is anyone working on this?

@KunWuLuan Thank you for tackling this issue.

  1. Does this involve a new API object or field, or does it reuse existing API objects or fields?
  2. Is this a Job integration controller implementing the GenericJob interface, similar to batch/v1 Job and the other integrations?

As a first step, it would be a great improvement if you could provide documentation and examples for the plain Pod integration + Argo Workflows.
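For what such documentation could start from, the plain Pod route is essentially just labeling the pods that Argo creates (a sketch, assuming the pod integration is enabled in the Kueue configuration and a LocalQueue named user-queue already exists):

```yaml
# Sketch: with the plain Pod integration enabled, every pod the Workflow creates
# carries the queue-name label and is therefore queued and admitted individually by Kueue.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pod-integration-
spec:
  entrypoint: main
  podMetadata:
    labels:
      kueue.x-k8s.io/queue-name: user-queue   # assumed existing LocalQueue
  templates:
  - name: main
    container:
      image: busybox
      command: ["echo", "hello"]
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
```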

@kerthcet
Contributor

kerthcet commented Sep 3, 2024

What does "layer" mean here? One step?

If so, I think it may be possible to create Workloads for all the steps (parallel steps as one Workload) and suspend them all. Once a Workload finishes, allow the next one; the controller knows the dependencies.

However, how can we distinguish the injected suspend from a user-configured suspend?

I think approach 1) can be a simple start. Anyway, glad to see the KEP.

@alculquicondor
Contributor

Note that someone started a PR to document how to use the plain Pods integration with Argo (#1545), but it was abandoned.

Regardless, I would be interested in more robust support at the layer level. See argoproj/argo-workflows#12363 (comment) for my high-level proposal.

@kfox1111

kfox1111 commented Sep 3, 2024

Plain Pod... could that work with GitLab runner jobs too? The lack of scheduling there has been a pain.

@KunWuLuan
Member

@KunWuLuan is the mentioned controller opensourced? Thanks :)

The controller is not open-sourced yet.

Does this indicate a new API object or field? or Reusing existing API objects or fields?

Yes, we introduced a specific key in the workflow's annotations, like:

 annotations:
   min-resources: |
     cpu: 5
     memory: 5G

Does this indicate a part of Job integration controller implemented GenericJob interface similar to batch/v1 Job and other Jobs.

Yes, we deployed a Job integration controller that contains a controller to create the Workload CR and a controller to inject the suspend template into the original workflow.

As a first step, it would be a great improvement if you could provide documents and examples for Plain Pod Integration + ArgoWorkflows.

No problem, working on it. :)

@alculquicondor
Contributor

Yes, we deployed a Job integration controller that contains a controller to create the Workload CR and a controller to inject the suspend template into the original workflow.

That seems useful, but annotations are not a sustainable API. Argo folks were in favor of doing a proper integration, so we can probably change their API to accommodate the needs of the integration.

But again, something at the layer level is probably better.

@tenzen-y
Member

tenzen-y commented Sep 6, 2024

Yes, we deployed a Job integration controller that contains a controller to create the Workload CR and a controller to inject the suspend template into the original workflow.

That seems useful, but annotations are not a sustainable API. Argo folks were in favor of doing a proper integration, so we can probably change their API to accommodate the needs of the integration.

But again, something at the layer level is probably better.

I think that we want to support the creation of Workloads at the layer level as well, and to admit those Workloads sequentially. This layer-level approach avoids wasting resources by reserving them for the entire workflow at once.

But I think we can evaluate the layer-level approach during the KEP (#2976).

@KunWuLuan
Member

@alculquicondor @tenzen-y
I introduced a KEP to discuss the advantages and constraints of three different granularity levels for supporting workflows; it also proposes three approaches for supporting workflows at the layer level.

@KunWuLuan
Member

@terrytangyuan If you have time, please also have a look, thanks very much.

@terrytangyuan
Member

Awesome! I'll share the proposal around the Argo Workflows community as well.

@terrytangyuan
Member

Can someone remove "Tekton" from the title of this issue?

@alculquicondor
Contributor

Ideally, the mechanism should be extensible to Tekton and any other workflow manager. But certainly, we can start with just Argo.

@kerthcet
Contributor

Ideally, the mechanism should be extensible to Tekton and any other workflow manager. But certainly, we can start with just Argo.

+1, from the Kueue side this should be aligned with other workflow tools as well.

@KunWuLuan
Member

Ideally, the mechanism should be extensible to Tekton and any other workflow manager. But certainly, we can start with just Argo.

I understand. In that case, this component should aim to minimize its dependencies on modifications to other workflow managers.

@alculquicondor
Contributor

Not necessarily. But it should aim for modifications that would be feasible in other projects as well, just like the suspend field we added to Job, which could be replicated in projects such as Kubeflow and KubeRay.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 11, 2024
@KunWuLuan
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 12, 2024