-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deviceplugin jiayingz #51209
Deviceplugin jiayingz #51209
Conversation
Hi @jiayingz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
if d, ok := devsMap[dev.ContainerPath]; ok { | ||
glog.V(3).Infof("skip existing device %s %s", dev.ContainerPath, dev.HostPath) | ||
if (d != dev.HostPath) { | ||
glog.Errorf("Container device %s has conflicting mapping host devices: %s and %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we fail the pod / return an error at this point ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question for Mounts and Envs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My feeling is this kind of misconfiguration may not be fatal. E.g., say the same device is accessible through multiple paths and either is fine. Since we log error here, I think we can make this information accessible through npm for debugging purpose.
} | ||
|
||
// NewDevicePluginHandler create a DevicePluginHandler | ||
func NewDevicePluginHandler() (*DevicePluginHandlerImpl, error) { | ||
func NewDevicePluginHandler(enabled bool, updateCapacity updateCapacityCallback) (*DevicePluginHandlerImpl, error) { | ||
if !enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if should use the same pattern as for the GPU manager. As in:
type DevicePluginHandler interface
type DevicePluginHandlerImpl struct
type MockDevicePluginHandler struct
And when calling NewDevicePluginHandler instead of passing a bool:
if enabled {
cm.handler = NewDevicePluginHandlerImpl()
} else {
cm.handler = NewMockDevicePluginHandler()
}
That would avoid the if at the start of every methods and is a lot cleaner :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. Done.
|
||
mgr, err := deviceplugin.NewManagerImpl(pluginapi.DevicePluginPath, | ||
func(r string, a, u, d []*pluginapi.Device) {}) | ||
deviceManagerMonitorCallback := func(resourceName string, added, updated, deleted []*pluginapi.Device) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this would fit more as a method of the handler
?
You would then be able to call NewManagerImpl:
deviceplugin.NewManagerImpl(pluginapi.KubeletSocket, handler.deviceManagerMonitorCallback)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is to be called by manager only. In that sense, there doesn't seem to be much value to store it in deviceplugin.
pkg/kubelet/kubelet_node_status.go
Outdated
@@ -572,6 +571,19 @@ func (kl *Kubelet) setNodeStatusMachineInfo(node *v1.Node) { | |||
} | |||
} | |||
} | |||
|
|||
if utilfeature.DefaultFeatureGate.Enabled(features.DevicePlugins) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mock structure would also remove this if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably safe to skip this feature checking anyway, but I am following the current model as LocalStorageCapacityIsolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiayingz Local storage does not have a stub concept. We do not need this check here as @RenaudWasTaken pointed out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed now.
/ok-to-test |
} | ||
} | ||
|
||
glog.Infof("Creating device plugin handler: %t", devicePluginEnabled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should log level be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a one-time log, so probably ok to stay as info.
handler := &DevicePluginHandlerImpl{ | ||
allDevices: make(map[string]sets.String), | ||
allocated: devicesInUse(), | ||
updateCapacity: updateCapacity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/updateCapacity/updateCapacityFunc or something similar to hint a callback function variable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -544,6 +572,13 @@ func (cm *containerManagerImpl) Start(node *v1.Node, activePods ActivePodsFunc) | |||
} | |||
close(stopChan) | |||
}, time.Second, stopChan) | |||
|
|||
// Starts device plugin manager. | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be safe to remove this part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes.
func (cm *containerManagerImpl) GetResources(pod *v1.Pod, container *v1.Container, activePods []*v1.Pod) (*kubecontainer.RunContainerOptions, error) { | ||
opts := &kubecontainer.RunContainerOptions{} | ||
// Gets devices, mounts, and envs from device plugin handler. | ||
glog.V(3).Infof("Calling device_plugin AllocateDevices") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/device_plugin/device_plugin handler ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
} | ||
} | ||
|
||
// updateAllocatedGPUs updates the list of GPUs in use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: updateAllocatedGPUs/updateAllocatedDevices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
devicePluginManager *deviceplugin.ManagerImpl | ||
// allDevices and allocated are keyed by resource_name. | ||
allDevices map[string]sets.String | ||
allocated map[string]*podDevices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/allocated/allocatedDevices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
pkg/kubelet/deviceplugin/manager.go
Outdated
func (m *ManagerImpl) Devices() map[string][]*pluginapi.Device { | ||
glog.V(2).Infof("Devices called") | ||
|
||
m.mutex.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are already using locks in device plugin handler, then what race condition this second level of locking is avoiding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this should be read from a top down perspective @vikaschoudhary16 but instead bottom up especially for locks :)
Anyways to answer your question the lock here is to prevent Endpoints
from changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is from Renaud's commit but I think in general, each layer would have its own lock to protect its internal data structure. This prevents future breakage when we move things around.
available := h.allDevices[key].Difference(devicesInUse) | ||
glog.V(2).Infof("devices available: %v", available.List()) | ||
if int(available.Len()) < needed { | ||
return nil, fmt.Errorf("requested number of devices unavailable. Requested: %d, Available: %d", v, available.Len()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be needed
instead of v
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
for _, resp := range allocResps { | ||
for _, devRuntime := range resp.Spec { | ||
for _, dev := range devRuntime.Devices { | ||
if d, ok := devsMap[dev.ContainerPath]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering how can this if
condition get pass ever. At L#618, devMaps
is being initialized to an empty map, how can devsMap[dev.ContainerPath]
be non-empty in any case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed. Note to myself, should add a unit test to cover this :).
pkg/kubelet/deviceplugin/utils.go
Outdated
|
||
// Returns whether the ResourceName points to a device plugin name. | ||
func IsDeviceName(k v1.ResourceName) bool { | ||
return v1helper.IsExtendedResourceName(k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsExtendedResourceName
definition is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a helper function added by Connor's PR. Should already be merged.
a1708ad
to
6edf47b
Compare
/assign @vishh FYI, this PR is ready for review. |
return nil, fmt.Errorf("requested number of devices unavailable. Requested: %d, Available: %d", needed, available.Len()) | ||
} | ||
allocated := available.UnsortedList()[:needed] | ||
resp, err := h.devicePluginManager.Allocate(key, append(devices.UnsortedList(), allocated...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RenaudWasTaken @jiayingz one suggestion, I think we should invoke h.devicePluginManager.Allocate
outside of mutex lock because plugin manager is already using locks. Mutex of device plugin handler should be used only to access handler's internal data structures like h.allocatedDevices
for example. This looks to me performance efficient than existing implementation. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. A remote RPC with a lock is probably not a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/kubelet/deviceplugin/endpoint.go
Outdated
|
||
} | ||
|
||
func (e *endpoint) allocate(devs []string) (*pluginapi.AllocateResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RenaudWasTaken @jiayingz Pod/container details should also be passed along with device ids to the plugin. One use case for this requirement is special NIC resources such as Solarflare and Infiniband. With current API, it is possible to mount device files and libs inside container but the corresponding network interface will be available to use inside container only if --net=host
is used. This is kinda very strict limitation.
Instead, if pod details are also passed to the device plugin, device plugin can pass the allocated interface from the host net namespace to container net namespace and thus making it possible to use such pass through network interfaces inside non net=host containers as well.
NICs are one such use case, in future there could be many more. We should start with a wider scope and pass pod details as well to the device plugins and depending on requirement, plugins may or may not use these details. Thoughts?
cc @vishh @jeremyeder @sjenning @cmluciano
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vikaschoudhary16 the unofficial, unvetted plan is to have device plugins drop a CNI config along with necessary CNI utility binaries, and have device plugins return additional interface via CNI constructs. That way the device plugin API can be invoked during pod creation rather than during container creation.
In the future, if the networking SIG ends up having a similar RPC based CNI API, then the workflow will become simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unofficial plan sounds good to me on the surface. I think @3XX0 and I determined that an CNI plugin for Infiniband would probably be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I think the discussion on CNI integration will probably wait after 1.8.
return handler, nil | ||
} | ||
|
||
func (h *DevicePluginHandlerImpl) Start() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find from where this is being invoked. Should it be invoked from within NewDevicePluginHandlerImpl()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Object creation is different from logical initialization of an object. The latter may have transitive dependencies on state of other objects and hence requires careful co-ordination initialization.
|
||
// NewDevicePluginHandler create a DevicePluginHandler | ||
func NewDevicePluginHandlerImpl(updateCapacityFunc updateCapacityCallback) (*DevicePluginHandlerImpl, error) { | ||
glog.V(2).Infof("Starting Device Plugin Handler") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not invoking Start
of device plugin handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling to Start now moves to containerManagerImpl.Start().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the log should be updated to say that it's "creating" device plugin handler or maybe move the log to Start()
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
cbc13fe
to
bc942f5
Compare
Lost lgtm after rebase. @dchen1107 and @vishh could you help lgtm this PR again? Thanks! |
bc942f5
to
02001af
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dchen1107, jiayingz, vishh Associated issue: 695 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 51590, 48217, 51209, 51575, 48627) |
} | ||
time.Sleep(1 * time.Second) | ||
} | ||
log.Println("Starting to serve on", m.socket) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiayingz Why use log.Println()
instead of glog
? Is this a separate binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @resouer this is fixed in a follow up PR.
device_plugin_stub.go
is invoked from tests only though :)
Thanks for the review!
What this PR does / why we need it:
This PR implements the kubelet Device Plugin Manager.
It includes four commits implemented by @RenaudWasTaken and a commit that supports allocation.
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #Design document: kubernetes/community#695
PR tracking: kubernetes/enhancements#368
Special notes for your reviewer:
Release note:
Extending Kubelet to support device plugin