Skip to content

Support dynamically sized (elastic) jobs #77

Open
@ahg-g

Description

We should have a clear path towards support spark and other dynamically sized jobs. Another example of this is Ray.

One related aspect is to support dynamically updating the resource requirements of a workload, we can probably limit that to support changing the count of a PodSet in QueuedWorkload (in Spark, the number of workers could change during the runtime of the job, but not the resource requirements of a worker).

One idea is to model it in a way similar to "in-place update to pod resources" [1], but in our case it would be the count that is mutable. The driver pod in spark would be watching for the corresponding QueuedWorkload instance and adjusts the number of workers when the new count is admitted.

[1] https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources

Metadata

Labels

kind/featureCategorizes issue or PR as related to a new feature.kind/grand-featurelifecycle/frozenIndicates that an issue or PR should not be auto-closed due to staleness.priority/important-longtermImportant over the long term, but may not be staffed and/or may need multiple releases to complete.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions