DRA: competition between schedulers + allocators #128980
Labels
kind/feature
Categorizes issue or PR as related to a new feature.
needs-triage
Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/scheduling
Categorizes an issue or PR as relevant to SIG Scheduling.
wg/device-management
Categorizes an issue or PR as relevant to WG Device Management.
What would you like to be added?
At the moment, DRA uses the approach that one scheduler instance "owns" all resources on a node or available to a node (in the case of network-attached devices). This is the same approach that is used for other resources. It enables faster scheduling because allocation can happen without coordination with other entities.
This approach breaks down when there are multiple schedulers in the cluster such that each scheduler instance is responsible for its own subset of the nodes ("sharding") and there are network-attached devices that are available for more than one set of nodes.
Also, sometimes users run additional schedulers for the same nodes as the system scheduler. While that is already problematic regarding CPU and memory, with devices it might be even worse.
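To make the sharding case concrete, here is a minimal sketch of how two scheduler instances that each assume ownership can hand out the same network-attached device. It is not DRA code: the `shard` type, the `allocate` and `conflicts` helpers, and the device and claim names are all hypothetical and exist only for illustration.

```go
package main

import "fmt"

// shard is a hypothetical scheduler instance. It assumes that it "owns"
// every device it can see, so it tracks allocations only in local memory.
type shard struct {
	name      string
	allocated map[string]string // device name -> claim name
}

func newShard(name string) *shard {
	return &shard{name: name, allocated: map[string]string{}}
}

// allocate picks the first device that this shard believes is free.
// No other entity is consulted, which is what makes allocation fast.
func (s *shard) allocate(claim string, devices []string) (string, bool) {
	for _, d := range devices {
		if _, used := s.allocated[d]; !used {
			s.allocated[d] = claim
			return d, true
		}
	}
	return "", false
}

// conflicts returns the devices that both shards have handed out.
func conflicts(a, b *shard) []string {
	var out []string
	for d := range a.allocated {
		if _, ok := b.allocated[d]; ok {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	// One network-attached device that is reachable from the nodes of
	// both shards (hypothetical name).
	shared := []string{"net-device-0"}

	a, b := newShard("shard-a"), newShard("shard-b")
	a.allocate("claim-1", shared)
	b.allocate("claim-2", shared)

	// Both shards succeeded locally, so the device is double-allocated.
	fmt.Println("double-allocated:", conflicts(a, b))
}
```

Each shard's local view is consistent on its own; the conflict only becomes visible when the two views are compared, which is exactly the coordination the current approach avoids.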
/sig scheduling
/wg device-management
Why is this needed?
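This is needed for more advanced cluster setups: sharded schedulers whose node sets share network-attached devices, and additional schedulers that run for the same nodes as the system scheduler.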