Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deviceplugin jiayingz #51209

Merged
merged 5 commits into from
Sep 2, 2017

Conversation

jiayingz
Copy link
Contributor

What this PR does / why we need it:
This PR implements the kubelet Device Plugin Manager.
It includes four commits implemented by @RenaudWasTaken and a commit that supports allocation.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #
Design document: kubernetes/community#695
PR tracking: kubernetes/enhancements#368

Special notes for your reviewer:

Release note:
Extending Kubelet to support device plugin

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 23, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @jiayingz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 23, 2017
@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Aug 23, 2017
if d, ok := devsMap[dev.ContainerPath]; ok {
glog.V(3).Infof("skip existing device %s %s", dev.ContainerPath, dev.HostPath)
if (d != dev.HostPath) {
glog.Errorf("Container device %s has conflicting mapping host devices: %s and %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fail the pod / return an error at this point ?

Copy link
Contributor

@RenaudWasTaken RenaudWasTaken Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question for Mounts and Envs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is this kind of misconfiguration may not be fatal. E.g., say the same device is accessible through multiple paths and either is fine. Since we log error here, I think we can make this information accessible through npm for debugging purpose.

}

// NewDevicePluginHandler create a DevicePluginHandler
func NewDevicePluginHandler() (*DevicePluginHandlerImpl, error) {
func NewDevicePluginHandler(enabled bool, updateCapacity updateCapacityCallback) (*DevicePluginHandlerImpl, error) {
if !enabled {
Copy link
Contributor

@RenaudWasTaken RenaudWasTaken Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if should use the same pattern as for the GPU manager. As in:
type DevicePluginHandler interface
type DevicePluginHandlerImpl struct
type MockDevicePluginHandler struct

And when calling NewDevicePluginHandler instead of passing a bool:

if enabled {
     cm.handler = NewDevicePluginHandlerImpl()
} else {
     cm.handler = NewMockDevicePluginHandler()
}

That would avoid the if at the start of every methods and is a lot cleaner :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Done.


mgr, err := deviceplugin.NewManagerImpl(pluginapi.DevicePluginPath,
func(r string, a, u, d []*pluginapi.Device) {})
deviceManagerMonitorCallback := func(resourceName string, added, updated, deleted []*pluginapi.Device) {
Copy link
Contributor

@RenaudWasTaken RenaudWasTaken Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this would fit more as a method of the handler ?
You would then be able to call NewManagerImpl:

deviceplugin.NewManagerImpl(pluginapi.KubeletSocket, handler.deviceManagerMonitorCallback)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is to be called by manager only. In that sense, there doesn't seem to be much value to store it in deviceplugin.

@@ -572,6 +571,19 @@ func (kl *Kubelet) setNodeStatusMachineInfo(node *v1.Node) {
}
}
}

if utilfeature.DefaultFeatureGate.Enabled(features.DevicePlugins) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mock structure would also remove this if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably safe to skip this feature checking anyway, but I am following the current model as LocalStorageCapacityIsolation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiayingz Local storage does not have a stub concept. We do not need this check here as @RenaudWasTaken pointed out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed now.

@cmluciano
Copy link

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 23, 2017
}
}

glog.Infof("Creating device plugin handler: %t", devicePluginEnabled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should log level be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a one-time log, so probably ok to stay as info.

handler := &DevicePluginHandlerImpl{
allDevices: make(map[string]sets.String),
allocated: devicesInUse(),
updateCapacity: updateCapacity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/updateCapacity/updateCapacityFunc or something similar to hint a callback function variable name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -544,6 +572,13 @@ func (cm *containerManagerImpl) Start(node *v1.Node, activePods ActivePodsFunc)
}
close(stopChan)
}, time.Second, stopChan)

// Starts device plugin manager.
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be safe to remove this part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

func (cm *containerManagerImpl) GetResources(pod *v1.Pod, container *v1.Container, activePods []*v1.Pod) (*kubecontainer.RunContainerOptions, error) {
opts := &kubecontainer.RunContainerOptions{}
// Gets devices, mounts, and envs from device plugin handler.
glog.V(3).Infof("Calling device_plugin AllocateDevices")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/device_plugin/device_plugin handler ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}
}

// updateAllocatedGPUs updates the list of GPUs in use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: updateAllocatedGPUs/updateAllocatedDevices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

devicePluginManager *deviceplugin.ManagerImpl
// allDevices and allocated are keyed by resource_name.
allDevices map[string]sets.String
allocated map[string]*podDevices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/allocated/allocatedDevices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

func (m *ManagerImpl) Devices() map[string][]*pluginapi.Device {
glog.V(2).Infof("Devices called")

m.mutex.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are already using locks in device plugin handler, then what race condition this second level of locking is avoiding?

Copy link
Contributor

@RenaudWasTaken RenaudWasTaken Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this should be read from a top down perspective @vikaschoudhary16 but instead bottom up especially for locks :)

Anyways to answer your question the lock here is to prevent Endpoints from changing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is from Renaud's commit but I think in general, each layer would have its own lock to protect its internal data structure. This prevents future breakage when we move things around.

available := h.allDevices[key].Difference(devicesInUse)
glog.V(2).Infof("devices available: %v", available.List())
if int(available.Len()) < needed {
return nil, fmt.Errorf("requested number of devices unavailable. Requested: %d, Available: %d", v, available.Len())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be needed instead of v?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

for _, resp := range allocResps {
for _, devRuntime := range resp.Spec {
for _, dev := range devRuntime.Devices {
if d, ok := devsMap[dev.ContainerPath]; ok {
Copy link
Contributor

@vikaschoudhary16 vikaschoudhary16 Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering how can this if condition get pass ever. At L#618, devMaps is being initialized to an empty map, how can devsMap[dev.ContainerPath] be non-empty in any case?

Copy link
Contributor Author

@jiayingz jiayingz Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed. Note to myself, should add a unit test to cover this :).


// Returns whether the ResourceName points to a device plugin name.
func IsDeviceName(k v1.ResourceName) bool {
return v1helper.IsExtendedResourceName(k)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsExtendedResourceName definition is missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a helper function added by Connor's PR. Should already be merged.

@jiayingz jiayingz force-pushed the deviceplugin-jiayingz branch 10 times, most recently from a1708ad to 6edf47b Compare August 25, 2017 17:55
@jiayingz
Copy link
Contributor Author

/assign @vishh

FYI, this PR is ready for review.

return nil, fmt.Errorf("requested number of devices unavailable. Requested: %d, Available: %d", needed, available.Len())
}
allocated := available.UnsortedList()[:needed]
resp, err := h.devicePluginManager.Allocate(key, append(devices.UnsortedList(), allocated...))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RenaudWasTaken @jiayingz one suggestion, I think we should invoke h.devicePluginManager.Allocate outside of mutex lock because plugin manager is already using locks. Mutex of device plugin handler should be used only to access handler's internal data structures like h.allocatedDevices for example. This looks to me performance efficient than existing implementation. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. A remote RPC with a lock is probably not a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


}

func (e *endpoint) allocate(devs []string) (*pluginapi.AllocateResponse, error) {
Copy link
Contributor

@vikaschoudhary16 vikaschoudhary16 Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RenaudWasTaken @jiayingz Pod/container details should also be passed along with device ids to the plugin. One use case for this requirement is special NIC resources such as Solarflare and Infiniband. With current API, it is possible to mount device files and libs inside container but the corresponding network interface will be available to use inside container only if --net=host is used. This is kinda very strict limitation.
Instead, if pod details are also passed to the device plugin, device plugin can pass the allocated interface from the host net namespace to container net namespace and thus making it possible to use such pass through network interfaces inside non net=host containers as well.
NICs are one such use case, in future there could be many more. We should start with a wider scope and pass pod details as well to the device plugins and depending on requirement, plugins may or may not use these details. Thoughts?
cc @vishh @jeremyeder @sjenning @cmluciano

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vikaschoudhary16 the unofficial, unvetted plan is to have device plugins drop a CNI config along with necessary CNI utility binaries, and have device plugins return additional interface via CNI constructs. That way the device plugin API can be invoked during pod creation rather than during container creation.
In the future, if the networking SIG ends up having a similar RPC based CNI API, then the workflow will become simpler.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unofficial plan sounds good to me on the surface. I think @3XX0 and I determined that an CNI plugin for Infiniband would probably be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I think the discussion on CNI integration will probably wait after 1.8.

return handler, nil
}

func (h *DevicePluginHandlerImpl) Start() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find from where this is being invoked. Should it be invoked from within NewDevicePluginHandlerImpl()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Object creation is different from logical initialization of an object. The latter may have transitive dependencies on state of other objects and hence requires careful co-ordination initialization.


// NewDevicePluginHandler create a DevicePluginHandler
func NewDevicePluginHandlerImpl(updateCapacityFunc updateCapacityCallback) (*DevicePluginHandlerImpl, error) {
glog.V(2).Infof("Starting Device Plugin Handler")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not invoking Start of device plugin handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling to Start now moves to containerManagerImpl.Start().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the log should be updated to say that it's "creating" device plugin handler or maybe move the log to Start() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2017
@jiayingz jiayingz force-pushed the deviceplugin-jiayingz branch from cbc13fe to bc942f5 Compare September 1, 2017 17:58
@k8s-github-robot k8s-github-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 1, 2017
@jiayingz
Copy link
Contributor Author

jiayingz commented Sep 1, 2017

Lost lgtm after rebase. @dchen1107 and @vishh could you help lgtm this PR again? Thanks!

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2017
@jiayingz jiayingz force-pushed the deviceplugin-jiayingz branch from bc942f5 to 02001af Compare September 1, 2017 18:56
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 1, 2017
@vishh
Copy link
Contributor

vishh commented Sep 1, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 1, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, jiayingz, vishh

Associated issue: 695

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@vishh vishh mentioned this pull request Sep 2, 2017
@abgworrall abgworrall modified the milestone: v1.8 Sep 2, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 51590, 48217, 51209, 51575, 48627)

}
time.Sleep(1 * time.Second)
}
log.Println("Starting to serve on", m.socket)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiayingz Why use log.Println() instead of glog? Is this a separate binary?

Copy link
Contributor

@RenaudWasTaken RenaudWasTaken Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @resouer this is fixed in a follow up PR.
device_plugin_stub.go is invoked from tests only though :)

Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.