Kubernetes Device Plugins
I’m debugging an issue with the Akri Zeroconf protocol in which Instance environment variables are no longer (!) being surfaced within the Broker pods. In my adventures, it seemed useful to better understand how Akri works and, specifically, how Akri uses Kubernetes Device Plugins.
IIUC plugins register with the Kubelet (!) via a gRPC service (Registration) that the Kubelet exposes on a UNIX socket at /var/lib/kubelet/device-plugins/kubelet.sock. Then (!), if registration succeeds, the devices should be reported in the Node’s status and be available to be bound to Pods.
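As a rough illustration of that first step, here’s a Bash sketch that prints the JSON body of a v1beta1 RegisterRequest as a plugin might send it to Registration/Register. The field names follow the RegisterRequest message (version, endpoint, resource_name, written here in protobuf’s lowerCamelCase JSON form), and the endpoint is the plugin’s socket file name relative to /var/lib/kubelet/device-plugins/. The helper register_request is hypothetical and the values are borrowed from an Akri Instance that appears later in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body of a v1beta1 RegisterRequest.
# Field names follow the RegisterRequest message; protobuf's JSON mapping
# accepts the lowerCamelCase "resourceName" for the resource_name field.
register_request() {
  local endpoint=$1 resource_name=$2
  printf '{"version": "v1beta1", "endpoint": "%s", "resourceName": "%s"}' \
    "${endpoint}" "${resource_name}"
}

# Illustrative values only; the endpoint is relative to
# /var/lib/kubelet/device-plugins/
register_request "zeroconf-218f68-1608227293.sock" "akri.sh/zeroconf-218f68"
echo
```

In practice the plugin (here, the Akri Agent) sends this itself; after a successful Register, the Kubelet dials the plugin’s endpoint and calls ListAndWatch to learn about its devices.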
Here’s the directory listing once a kubelet is running:
ls -l /var/lib/kubelet/device-plugins/
-rw-r--r-- 1 root root 0 Dec 17 17:39 DEPRECATION
srwxr-xr-x 1 root root 0 Dec 17 17:39 kubelet.sock
-rw------- 1 root root 2860 Dec 17 17:44 kubelet_internal_checkpoint
Curious DEPRECATION notice?|warning? aside, the kubelet’s gRPC endpoint is on kubelet.sock. I should be able to interact with it using gRPCurl if necessary, but I haven’t tried this (yet).
After running Akri and applying a (Zeroconf) Configuration, 4 Instances are created. Each of these matches a service that is being published using avahi-publish --service ${NAME} ... on the Node’s host:
kubectl apply --filename=./zeroconf.yaml
kubectl get instances
NAME CONFIG SHARED NODES AGE
zeroconf-218f68 zeroconf true [akri] 13s
zeroconf-2b9223 zeroconf true [akri] 13s
zeroconf-ddec04 zeroconf true [akri] 13s
zeroconf-ef5d4a zeroconf true [akri] 13s
Rechecking the Kubelet’s list of sockets, we can see that 4 new sockets have been created:
ls -l /var/lib/kubelet/device-plugins/
-rw-r--r-- 1 root root 0 Dec 17 17:39 DEPRECATION
srwxr-xr-x 1 root root 0 Dec 17 17:39 kubelet.sock
-rw------- 1 root root 3051 Dec 17 17:48 kubelet_internal_checkpoint
srwxr-xr-x 1 root root 0 Dec 17 17:48 zeroconf-218f68-1608227293.sock
srwxr-xr-x 1 root root 0 Dec 17 17:48 zeroconf-2b9223-1608227293.sock
srwxr-xr-x 1 root root 0 Dec 17 17:48 zeroconf-ddec04-1608227293.sock
srwxr-xr-x 1 root root 0 Dec 17 17:48 zeroconf-ef5d4a-1608227293.sock
You can eyeball the Instance names (zeroconf-XXXXXX) to see the matching sockets (zeroconf-XXXXXX-1608227293.sock). Each of these zeroconf-XXXXXX-* sockets is also a gRPC endpoint, exposing a service called DevicePlugin.
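The socket names look like the Instance name plus a numeric suffix. 1608227293 decodes as a Unix timestamp to 2020-12-17 17:48:13 UTC, which lines up with the 17:48 in the listing, so I’m assuming (not confirmed from Akri’s source) that the suffix records when the Agent created the socket. A small Bash sketch, with hypothetical helper names, that splits a socket file name into those two parts:

```shell
#!/usr/bin/env bash
# Hypothetical helpers, assuming socket names of the form <instance>-<epoch>.sock
socket_instance() {
  local name=${1%.sock}      # drop the .sock extension
  printf '%s' "${name%-*}"   # drop the trailing -<epoch>
}

socket_epoch() {
  local name=${1%.sock}
  printf '%s' "${name##*-}"  # keep only the text after the last '-'
}

for SOCK in zeroconf-218f68-1608227293.sock zeroconf-ef5d4a-1608227293.sock
do
  printf '%s created at epoch %s\n' \
    "$(socket_instance "${SOCK}")" "$(socket_epoch "${SOCK}")"
done
```

With GNU coreutils, date -u -d @1608227293 turns the suffix back into a readable time.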
The Kubelet then reports these devices in the Node’s status.
NODE="akri"
HANDLER="zeroconf"
FILTER="\
| with_entries(select(.key|contains(\"${HANDLER}\")))
| with_entries(select(.value==\"1\"))"
# Capacity
kubectl get node/${NODE} --output=json \
| jq -r ".status.capacity ${FILTER}"
{
"akri.sh/zeroconf-218f68": "1",
"akri.sh/zeroconf-2b9223": "1",
"akri.sh/zeroconf-ddec04": "1",
"akri.sh/zeroconf-ef5d4a": "1"
}
# Allocatable
kubectl get node/${NODE} --output=json \
| jq -r ".status.allocatable ${FILTER}"
{
"akri.sh/zeroconf-218f68": "1",
"akri.sh/zeroconf-2b9223": "1",
"akri.sh/zeroconf-ddec04": "1",
"akri.sh/zeroconf-ef5d4a": "1"
}
Again, you can see that each device is now mapped to an entry in the Node’s .status.allocatable and .status.capacity values.
Because the Akri Agent registers the devices using Kubernetes’ Device Plugins, whose resource names must follow the extended resource naming scheme (vendor-domain/resourcetype), Akri names the devices it manages using the prefix akri.sh/.
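The jq FILTER above keeps capacity entries whose key contains the handler name and whose value is "1". For readers without jq, here’s a rough pure-Bash equivalent that works on name=value lines; the input format and the helper name akri_devices are my own assumptions for illustration, and it additionally requires the akri.sh/ prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name=value" lines on stdin and print the names
# that carry the akri.sh/ prefix, contain HANDLER, and have a value of "1"
akri_devices() {
  local handler=$1 name value
  while IFS='=' read -r name value
  do
    if [[ ${name} == akri.sh/* && ${name} == *"${handler}"* && ${value} == "1" ]]
    then
      printf '%s\n' "${name}"
    fi
  done
}

# Sample input shaped like entries of the Node's .status.capacity
printf '%s\n' \
  "akri.sh/zeroconf-218f68=1" \
  "akri.sh/zeroconf-2b9223=1" \
  "akri.sh/nessie-64ebdb=0" \
  "cpu=2" \
  | akri_devices zeroconf
```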
Ultimately, when Akri creates Broker Pods for these device Instances, the resulting Kubernetes spec includes, e.g.:
spec:
  containers:
  - image: ghcr.io/dazwilkin/zeroconf-broker@sha256:993e5b8d...
    imagePullPolicy: IfNotPresent
    name: zeroconf-broker
    resources:
      limits:
        akri.sh/akri-zeroconf-218f68: "1"
      requests:
        akri.sh/akri-zeroconf-218f68: "1"
And this maps against the Node’s allocatable|capacity values such that, once taken (by a Broker Pod), the device (Instance) is no longer (unless shared) available to be bound to a different Pod.
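The bookkeeping is integer arithmetic: a Pod fits only if its request, plus what Pods already on the Node request, doesn’t exceed the Node’s allocatable quantity (extended resources can’t be overcommitted). The Node’s .status.allocatable itself doesn’t change; the scheduler does the subtraction. A minimal sketch of that check, using a hypothetical fits function:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a request for an extended resource fits.
#   allocatable: the Node's .status.allocatable quantity for the resource
#   in_use:      sum of that resource requested by Pods already on the Node
#   request:     quantity the new Pod asks for
fits() {
  local allocatable=$1 in_use=$2 request=$3
  (( in_use + request <= allocatable ))
}

# e.g. akri.sh/zeroconf-218f68 has an allocatable quantity of "1"
fits 1 0 1 && echo "first Broker Pod fits"
fits 1 1 1 || echo "a second Pod requesting it does not fit"
```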
It’s important to know that, as far as the Device Plugins functionality is concerned, advertising resource (device) availability and capacity to a Node is the bulk of its function; beyond that, a plugin’s Allocate response can additionally ask the Kubelet to inject environment variables, mounts and device nodes into the container. Any Pod that wishes to use the resources (per above) will make a claim for the resource type (e.g. akri.sh/akri-zeroconf-218f68 above) and some discrete (integral) quantity (e.g. "1") but, once this is provided to the Pod, the Pod must then, e.g., load relevant device drivers and communicate with the device as normal (as if there were no Kubernetes infrastructure).
What Akri provides beyond this is that handlers can be configured to surface environment variables and to mount devices and volumes. Further, Akri’s model of proposing (these aren’t required) Broker Pods is somewhat akin to the idea of device twins. Broker Pods could be standardized in a deployment to, e.g., always surface gRPC services, and these services could then be provided to application developers as a standard mechanism for device interaction.
Allocatable|Capacity Tidying
I’d been unable to determine how to PATCH a Node’s .status.allocatable and .status.capacity. Then I stumbled upon Advertise Extended Resources for a Node and it provided the solution.
I had:
# Kubernetes Node name
NODE="akri"
kubectl get node/${NODE} \
--output=json \
| jq ".status.capacity"
{
"akri.sh/nessie-64ebdb": "0",
"akri.sh/zeroconf-074bbf": "0",
"akri.sh/zeroconf-129a69": "0",
"akri.sh/zeroconf-218f68": "0",
"akri.sh/zeroconf-2b9223": "0",
"akri.sh/zeroconf-320a0c": "0",
"akri.sh/zeroconf-3ada00": "0",
"akri.sh/zeroconf-4e7c97": "0",
"akri.sh/zeroconf-591f22": "0",
"akri.sh/zeroconf-7548ec": "0",
"akri.sh/zeroconf-8e12f9": "0",
"akri.sh/zeroconf-90d0fb": "0",
"akri.sh/zeroconf-9e1938": "0",
"akri.sh/zeroconf-bc5064": "0",
"akri.sh/zeroconf-ddec04": "0",
"akri.sh/zeroconf-e7f45d": "0",
"akri.sh/zeroconf-f1acf2": "0",
"akri.sh/zeroconf-f27f70": "0",
"cpu": "2",
"ephemeral-storage": "8934656Ki",
"example.com/dongle": "4",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "8050104Ki",
"pods": "110"
}
NOTE And a matching set in .status.allocatable.
And:
# Kubernetes Node name
NODE="akri"
# Query the list for these items
FILTER="akri.sh/zeroconf"
DEVICES=$(\
kubectl get node/${NODE} \
--output=json \
| jq -r ".status.capacity | with_entries(select(.key|contains(\"${FILTER}\"))) | keys[]")
# Iterate over the list of devices
for DEVICE in ${DEVICES}
do
  # JSON Pointer (RFC 6901) escapes `/` as `~1`, e.g. `akri.sh/` becomes `akri.sh~1`
  # (`\~` stops Bash from attempting tilde expansion on the replacement)
  DEVICE=${DEVICE//\//\~1}
  # PATCH the Node's `.status.capacity` and remove the device
  # Assumes the API server is reachable on localhost:8888, e.g. via `kubectl proxy --port=8888`
  curl \
  --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data "[{\"op\": \"remove\", \"path\": \"/status/capacity/${DEVICE}\"}]" \
  http://localhost:8888/api/v1/nodes/${NODE}/status
done
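The DEVICE substitution in the loop is JSON Pointer (RFC 6901) escaping: inside a path, ~ must be written ~0 and / must be written ~1, with ~ handled first. Here’s a slightly more general sketch that escapes every occurrence and builds the whole JSON Patch (RFC 6902) document; the helper names escape_token and remove_capacity_patch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape one JSON Pointer (RFC 6901) reference token.
# "~" must become "~0" before "/" becomes "~1"; the literals live in quoted
# variables so Bash doesn't apply tilde expansion to them.
escape_token() {
  local token=$1
  local tilde='~' slash='/' esc_tilde='~0' esc_slash='~1'
  token=${token//"${tilde}"/"${esc_tilde}"}
  token=${token//"${slash}"/"${esc_slash}"}
  printf '%s' "${token}"
}

# Hypothetical helper: JSON Patch (RFC 6902) body removing one capacity entry
remove_capacity_patch() {
  printf '[{"op": "remove", "path": "/status/capacity/%s"}]' "$(escape_token "$1")"
}

remove_capacity_patch "akri.sh/zeroconf-218f68"
echo
```

Its output could be passed to the same curl command as --data "$(remove_capacity_patch "${DEVICE}")" in place of the hand-built string.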
Then:
kubectl get node/${NODE} \
--output=json \
| jq ".status.capacity"
{
"akri.sh/nessie-64ebdb": "0",
"cpu": "2",
"ephemeral-storage": "9983232Ki",
"example.com/dongle": "4",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "8152504Ki",
"pods": "110"
}
The system deletes the matching entries in .status.allocatable automatically:
kubectl get node/${NODE} \
--output=json \
| jq ".status.allocatable"
{
"akri.sh/nessie-64ebdb": "0",
"cpu": "2",
"ephemeral-storage": "8934656Ki",
"example.com/dongle": "4",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "8050104Ki",
"pods": "110"
}
gRPCurl
The protobuf listed in the Kubernetes Device Plugin documentation is incomplete, but I grabbed a copy of the v1beta1.Registration service here and hacked its reference to gogoproto/gogo.proto.
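The exact hack isn’t shown here. As one possible approach, the gogoproto dependency can be dropped by deleting its import and file-level options and stripping bracketed field options. This sed sketch assumes the import path github.com/gogo/protobuf/gogoproto/gogo.proto and that each of these constructs sits on a single line; the function name strip_gogoproto is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove gogoproto references from a .proto read on stdin.
# Assumes the import, each file-level gogoproto option, and each field's
# [(gogoproto....)] options sit on a single line.
strip_gogoproto() {
  sed -E \
    -e '/^[[:space:]]*import "github\.com\/gogo\/protobuf\/gogoproto\/gogo\.proto";/d' \
    -e '/^[[:space:]]*option \(gogoproto\./d' \
    -e 's/[[:space:]]*\[\(gogoproto\.[^]]*\]//g'
}

# A few lines in the style of a gogoproto-annotated api.proto
printf '%s\n' \
  'syntax = "proto3";' \
  'import "github.com/gogo/protobuf/gogoproto/gogo.proto";' \
  'option (gogoproto.marshaler_all) = true;' \
  'message ContainerAllocateRequest {' \
  '  repeated string devicesIDs = 1 [(gogoproto.customname) = "DevicesIDs"];' \
  '}' \
  | strip_gogoproto
```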
Then, assuming:
ls -l /var/lib/kubelet/device-plugins/
-rw-r--r-- 1 root root 0 Dec 17 17:39 DEPRECATION
srwxr-xr-x 1 root root 0 Dec 17 17:39 kubelet.sock
-rw------- 1 root root 3051 Dec 17 17:48 kubelet_internal_checkpoint
srwxr-xr-x 1 root root 0 Dec 17 17:48 zeroconf-218f68-1608227293.sock
srwxr-xr-x 1 root root 0 Dec 17 17:48 zeroconf-2b9223-1608227293.sock
srwxr-xr-x 1 root root 0 Dec 17 17:48 zeroconf-ddec04-1608227293.sock
srwxr-xr-x 1 root root 0 Dec 17 17:48 zeroconf-ef5d4a-1608227293.sock
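The Kubelet socket only serves the Registration service (plugins call its Register method to announce themselves), so there’s not much to see with it. Bear in mind that, when given -proto, grpcurl’s list reports the services defined in deviceplugin.proto rather than asking the socket (via server reflection) what it serves: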
SOCK="/var/lib/kubelet/device-plugins/kubelet.sock"
sudo ./grpcurl \
-plaintext \
-unix \
-proto \
./deviceplugin.proto ${SOCK} list
v1beta1.DevicePlugin
v1beta1.Registration
But, slightly more interestingly, there’s some interaction possible with device sockets:
SOCK="/var/lib/kubelet/device-plugins/zeroconf-3ada00-1608315686.sock"
sudo ./grpcurl \
-plaintext \
-unix \
-proto \
./deviceplugin.proto ${SOCK} list
v1beta1.DevicePlugin
v1beta1.Registration
sudo ./grpcurl \
-plaintext \
-unix \
-proto \
./deviceplugin.proto ${SOCK} list v1beta1.DevicePlugin
v1beta1.DevicePlugin.Allocate
v1beta1.DevicePlugin.GetDevicePluginOptions
v1beta1.DevicePlugin.ListAndWatch
v1beta1.DevicePlugin.PreStartContainer
sudo ./grpcurl \
-plaintext \
-unix \
-proto \
./deviceplugin.proto ${SOCK} v1beta1.DevicePlugin.ListAndWatch
{
"devices": [
{
"ID": "zeroconf-3ada00-0",
"health": "Healthy"
}
]
}
sudo ./grpcurl \
-plaintext \
-unix \
-proto \
./deviceplugin.proto ${SOCK} v1beta1.DevicePlugin.GetDevicePluginOptions
{
"preStartRequired": true
}
That’s all!