Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Device Monitoring Documentation #9945

Merged
merged 1 commit into from
Nov 27, 2018
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
add device monitoring documentation
  • Loading branch information
dashpole committed Nov 21, 2018
commit 31791028ad75839fc1bfb0627c27d66810aaafb1
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,36 @@ a Kubernetes release with a newer device plugin API version, upgrade your device
to support both versions before upgrading these nodes to
ensure the continuous functioning of the device allocations during the upgrade.

## Monitoring Device Plugin Resources

In order to monitor resources provided by device plugins, monitoring agents need to be able to
discover the set of devices that are in-use on the node and obtain metadata to describe which
container the metric should be associated with. Prometheus metrics exposed by device monitoring
agents should follow the
[Kubernetes Instrumentation Guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/instrumentation.md),
which requires identifying containers using `pod`, `namespace`, and `container` prometheus labels.
The kubelet provides a gRPC service to enable discovery of in-use devices, and to provide metadata
for these devices:

```gRPC
// PodResources is a service provided by the kubelet that provides information about the
// node resources consumed by pods and containers on the node
service PodResources {
rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
}
```

The gRPC service is served over a unix socket at `/var/lib/kubelet/pod-resources/kubelet.sock`.
Monitoring agents for device plugin resources can be deployed as a daemon, or as a DaemonSet.
The cannonical directory `/var/lib/kubelet/pod-resources` requires privileged access, so monitoring
agents must run in a privileged security context. If a device monitoring agent is running as a
DaemonSet, `/var/lib/kubelet/pod-resources` must be mounted as a
[Volume](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#volume-v1-core)
in the plugin's
[PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).

Support for the "PodResources service" is still in alpha.

## Examples

For examples of device plugin implementations, see:
Expand Down