cdi: inject mount UID/GID mappings if user NS is in use. #288

klihub · 2025-10-18T17:55:18Z

When injecting mounts to a container with user namespaces in use, inject the mounts with UID and GID mappings taken from the OCI Spec.

Fixes #287.

@elezar @haircommander Although this simplistic fix works for the reported case, it might be a bit too simplistic (in the long run).

In particular, I wonder if we should eventually provide a way for the one requesting CDI injection (typically a DRA driver) to also provide UID-/GID-mappings for injected mounts, and only let CDI itself pick and inject UID-/GID-mappings on its own, if the necessary preconditions apply but mappings are not provided.

IOW, should we provide a way for a DRA driver itself to pick and inject the mappings ? If we are of the opinion that we should, then I think we have a bit of a problem with APIs and how things are currently wired up with CDI/CRI. The straightforward way to allow a driver to pick mappings would be to simply extend the CDI notion of a mount with UID- and GID-mappings. However, that would imply that only dynamically generated CDI Specs would allow injecting non-static mappings, since there is no way to pass on any additionald/dynamic parameters along a CDI device injection request over CRI (or in the CDI/runtime API).

elezar · 2025-10-20T09:10:45Z

pkg/cdi/oci.go

+type extraOCIMountOption func(*spec.Mount)
+
+// withMountIDMappings adds UID and GID mappings for the given mount.
+func withMountIDMappings(uid, gid []spec.LinuxIDMapping) extraOCIMountOption {


Does it make sense to have uid and gid settable by separate options.

I am fine with anything. Separate options, combined options, even with simply doing it like this

func (m *Mount) toOCI(uids, gids []spec.LinuxIDMapping) spec.Mount { om := spec.Mount{ Source: m.HostPath, Destination: m.ContainerPath, Options: m.Options, Type: m.Type, UIDMappings: uids, GIDMappings: gids, } return om

pkg/cdi/oci.go

elezar · 2025-10-20T09:11:27Z

pkg/cdi/container-edits_test.go

+						Destination: "/dest/path/a",
+						UIDMappings: []oci.LinuxIDMapping{
+							{
+								ContainerID: 0,


When is ContainerID non-zero?

Do you mean in practice ? In principle, one could set up arbitrary mappings with multiple ranges which is possible, but in practice I think you'll only see a single range and starting (in the container) with 0, because that's what this is primarily used for. But from the OCI Spec point of view, nothing dictates any range or sub-range to start (in the container) at 0.

elezar · 2025-10-20T09:16:09Z

Is something similar required for devices since the uid and gid in the spec are the IDs in the container (https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#devices)?

klihub · 2025-10-20T09:20:52Z

Is something similar required for devices since the uid and gid in the spec are the IDs in the container (https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#devices)?

At least nothing identical we can do for devices, since they don't have UID/GID mappings, only mounts have them. Also AFAICT, device access control is a bit simpler (when compared with access control for a full bind-mounted subdirectory), just device node permissions and cgroup access rules.

When injecting mounts to a container with user namespaces in use, inject the mounts with UID and GID mappings taken from the OCI Spec. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>

kad · 2025-10-21T16:37:34Z

Device permissions are handled inside the containerd/cri-o based on runAsUser in non-root containers.

rata · 2025-10-27T14:40:41Z

Hi! userns KEP author here. I'm not very familiar with the CDI injection process to provide more input. do you have any links I should read?

My current opinion without much education on CDI injection is that we should allow drivers to set some mappings, but we should complete with sane defaults (as this PR does) if userns are used and those fields are not set. Wouldn't that solve the "However, that would imply that only dynamically generated CDI Specs would allow injecting non-static mappings" issue you pointed in the description?

The k8s end user currently can't select mappings for mounts, as the kubelet will set them to what makes sense in that context. However, if CDI might inject devices or things that are not just bind-mounts (like a device node), setting the UID/GID for it will be critical to work with user-namespaces (uidmappings only work for bind-mounts). I wonder if the CDI drivers have access to the mapping used for the userns, as you basically need to set the UID/GID of the inode to the hostUID of the userns mapping.

Sorry I took some time to reply, I was on PTO and just coming back. I'm happy to help in any way that I can with this :)

klihub · 2025-11-04T09:34:10Z

b> Hi! userns KEP author here. I'm not very familiar with the CDI injection process to provide more input. do you have any links I should read?

@rata No, unfortunatly I don't have, but I think that it's enough to take a look at the CDI Spec (Mount) definition and the minimalist CRI CDI device injection request/message which is then also reflected in the library entry point signature for device injection to show what I meant.

My current opinion without much education on CDI injection is that we should allow drivers to set some mappings, but we should complete with sane defaults (as this PR does) if userns are used and those fields are not set. Wouldn't that solve the "However, that would imply that only dynamically generated CDI Specs would allow injecting non-static mappings" issue you pointed in the description?

Yes, it would, but given the above linked setup it requires more changes than just adding new data to the CDI Spec. And that's why I brought the whole thing up, because my gut feeling is/was the same: the requester of a CDI device injection, typically a DRA driver, should eventually be able to provide mappings at will. However, ATM if we add UID-/GID-mapping attributes to CDI Spec Mounts without other changes, the only way for a driver to provide mappings would be to dynamically generate a per-claim CDI Spec with the desired mappings and then reference that by name the the CDI-specific CRI injection request/message bits.

This would be far from ideal. But anything else will require further changes. To CRI, to be able to pass extra parameters for the injection, and to CDI, to allow such parameters to be passed in for injection to the library and used. And this last bit (constructing injection data from a CDI Spec and extra information provided by the injection request itself) is something new that we don't have in place at the moment. So it doesn't come as easily as the rest of just adding some new data to some entity to be injected.

The k8s end user currently can't select mappings for mounts, as the kubelet will set them to what makes sense in that context. However, if CDI might inject devices or things that are not just bind-mounts (like a device node), setting the UID/GID for it will be critical to work with user-namespaces (uidmappings only work for bind-mounts).

I wonder if the CDI drivers have access to the mapping used for the userns, as you basically need to set the UID/GID of the inode to the hostUID of the userns mapping.

If you mean DRA drivers, then I think a DRA driver, purely in the 'K8s/kubelet DRA driver' context, does not have access to that information.

Currently if an injected device node does not come with UID/GID's or permission bits specified in the CDI Spec, IIRC we take a look at the corresponding device node on the host side and if it's there we take those missing bits of information from there. Do you think we should change this logic and, for instance, if there are UID/GID-mappings present in the container try setting the device node UID/GID based on that mapping (for root/UID/GID 0) ?

Sorry I took some time to reply, I was on PTO and just coming back. I'm happy to help in any way that I can with this :)

No problemo. Happy that you can chime in and help.

rata · 2025-11-04T10:57:06Z

Currently if an injected device node does not come with UID/GID's or permission bits specified in the CDI Spec, IIRC we take a look at the corresponding device node on the host side and if it's there we take those missing bits of information from there. Do you think we should change this logic and, for instance, if there are UID/GID-mappings present in the container try setting the device node UID/GID based on that mapping (for root/UID/GID 0) ?

Yes, I was thinking just that too as an alternative :)

What kind of mount are you injecting here? Is it only bind-mounts or do you also inject inodes (like a device, char device, block device or some else)?

For bind-mounts we should inject the uid/gid mappings here (as otherwise we need to chown, it's a complete mess, we can't share, etc.).

For devices (char device, block device, etc.), I think we should used the mapped UID/GID. In other words, if the inode on the host has UID 0, then we should use the hostUID that is mapped to 0 inside the container.

@klihub what do you think? Do you think it's doable?

elezar · 2025-11-04T11:30:17Z

For devices (char device, block device, etc.), I think we should used the mapped UID/GID. In other words, if the inode on the host has UID 0, then we should use the hostUID that is mapped to 0 inside the container.

@rata could you describe what you are proposing using the OCI Runtime Spec fields?

rata · 2025-11-04T11:38:39Z

@elezar sure.

The devices object in the runtime-spec has a uid/gid: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#devices. I'm proposing to set those when devices are injected.

I'm proposing to set it to the user mapped inside the userns. The mapping in the OCI runtime spec is here: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#user-namespace-mappings. Using the example from the spec:

"uidMappings": [
    {
        "containerID": 0,
        "hostID": 1000,
        "size": 32000
    }
],
"gidMappings": [
    {
        "containerID": 0,
        "hostID": 1000,
        "size": 32000
    }
]

If the device we want (

container-device-interface/specs-go/config.go

Lines 49 to 54 in ffd52dc

    
           type Mount struct { 
        
           	HostPath      string   `json:"hostPath"          yaml:"hostPath"` 
        
           	ContainerPath string   `json:"containerPath"     yaml:"containerPath"` 
        
           	Options       []string `json:"options,omitempty" yaml:"options,omitempty"` 
        
           	Type          string   `json:"type,omitempty"    yaml:"type,omitempty"` // Added in v0.4.0 
        
           }

) the hostPath is owned by UID 0 and GID 0, then using this examples we want to assign it to "0 inside the container". Then, we check what is the mapped ID for 0 inside the container.

The UID and GID 0 inside the container are mapped to the ID 1000 outside the container. Then, the device will have the uid and gid set to 1000 in this example.

Is this example clear? Don't hesitate to ask otherwise :)

klihub · 2025-11-04T12:56:21Z

Currently if an injected device node does not come with UID/GID's or permission bits specified in the CDI Spec, IIRC we take a look at the corresponding device node on the host side and if it's there we take those missing bits of information from there. Do you think we should change this logic and, for instance, if there are UID/GID-mappings present in the container try setting the device node UID/GID based on that mapping (for root/UID/GID 0) ?

Yes, I was thinking just that too as an alternative :)

What kind of mount are you injecting here? Is it only bind-mounts or do you also inject inodes (like a device, char device, block device or some else)?

For bind-mounts we should inject the uid/gid mappings here (as otherwise we need to chown, it's a complete mess, we can't share, etc.).

For devices (char device, block device, etc.), I think we should used the mapped UID/GID. In other words, if the inode on the host has UID 0, then we should use the hostUID that is mapped to 0 inside the container.

@klihub what do you think? Do you think it's doable?

@rata Yes, I was thinking much along the same lines... to add to this PR for devices. Slightly differently though as I was thinking that instead of the host side UID/GID 0 check, what I'd do is that if the CDI Spec does not explicitly specify a UID/GID for the device node (which now causes them to be looked for in the corresponding host device node), we'd check if there are UID/GID-mappings set up for root/UID 0 in the container. And if there are, we'd use the mapped UID/GID for UID/GID 0 as the device node UID/GID. But your version might be better...

rata · 2025-11-04T14:48:47Z

Cool. I'd add to that, to handle all the cases, that if the UID/GID is not mapped, then we set uid/gid as 0 inside the container.

Just curious, completely out of topic to understand the big picture, how are people using GPUs using this? Is there any example workflow I can run locally or something?

everzakov · 2025-11-04T18:36:29Z

Sorry for interrupting. However, only setting necessary uid/gid for the device won't work (if container has user ns) because now runc don't do anything with devices if container has a user namespace (only set root uid in device config state). It uses mount bind command instead of mknod that's why the device will have the same uid/gid as on the host. So, if the device has a 0660 permission and we will try to use not the default uid/gid mapping then we will have EPERM error.

I have made an example to show the problem. CDI commit: everzakov@429161b Containerd everzakov/containerd@52d45ca My simple dra plugin https://github.com/everzakov/dra-plugin-health-example/tree/userns-with-devices

So, in host i have created the /dev/another_urandom device using mknod with 1,9 major/minor, root owner and 0660 permissions.

I have created the new container but i have got EPERM error. Also, you can see that all devices have nobody, nogroup:

vboxuser@vboxuser:~/dra-plugin-health-example$ cat /run/cdi/dev0-dra-example-com.json 
{"cdiVersion":"1.0.0","kind":"example.com/device","devices":[{"name":"dev0-dra-example-com","containerEdits":{"env":["Check=True"],"deviceNodes":[{"path":"/dev/another_urandom","hostPath":"/dev/another_urandom","type":"c","major":1,"minor":9,"permissions":"rwm","uid":0,"gid":0}]}}],"containerEdits":{}}vboxuser@vboxuser:~/dra-plugin-health-example$ kubectl get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE   NOMINATED NODE   READINESS GATES
dra-example-kubeletplugin-s7f45   1/1     Running   0          30s   10.244.0.98   node   <none>           <none>
example-pod                       3/3     Running   0          30s   10.244.0.99   node   <none>           <none>
vboxuser@vboxuser:~/dra-plugin-health-example$ kubectl  exec -ti example-pod -- /bin/bash
Defaulted container "container1" out of: container1, container2, container3
root@example-pod:/# ls -la /dev/
total 4
drwxr-xr-x 5 root   root     380 Nov  4 18:06 .
drwxr-xr-x 1 root   root    4096 Nov  4 18:06 ..
crw-rw---- 1 nobody nogroup 1, 9 Nov  4 15:52 another_urandom
lrwxrwxrwx 1 root   root      11 Nov  4 18:06 core -> /proc/kcore
lrwxrwxrwx 1 root   root      13 Nov  4 18:06 fd -> /proc/self/fd
crw-rw-rw- 1 nobody nogroup 1, 7 Nov  4 15:19 full
drwxrwxrwt 2 root   nogroup   40 Nov  4 18:06 mqueue
crw-rw-rw- 1 nobody nogroup 1, 3 Nov  4 15:19 null
lrwxrwxrwx 1 root   root       8 Nov  4 18:06 ptmx -> pts/ptmx
drwxr-xr-x 2 root   root       0 Nov  4 18:06 pts
crw-rw-rw- 1 nobody nogroup 1, 8 Nov  4 15:19 random
drwxrwxrwt 2 nobody nogroup   40 Nov  4 18:06 shm
lrwxrwxrwx 1 root   root      15 Nov  4 18:06 stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root   root      15 Nov  4 18:06 stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root   root      15 Nov  4 18:06 stdout -> /proc/self/fd/1
-rw-rw-rw- 1 root   root       0 Nov  4 18:06 termination-log
crw-rw-rw- 1 nobody nogroup 5, 0 Nov  4 18:10 tty
crw-rw-rw- 1 nobody nogroup 1, 9 Nov  4 15:19 urandom
crw-rw-rw- 1 nobody nogroup 1, 5 Nov  4 15:19 zero
root@example-pod:/# ls -la /dev/another_urandom 
crw-rw---- 1 nobody nogroup 1, 9 Nov  4 15:52 /dev/another_urandom
root@example-pod:/# head -n 1 /dev/another_urandom 
head: cannot open '/dev/another_urandom' for reading: Permission denied

My CDI commit have set the right uid/gid for a device:

vboxuser@vboxuser:~/container-device-interface$ sudo crictl inspect c4b49517e67c4 | jq .info.config.CDI_devices
[
  {
    "name": "example.com/device=dev0-dra-example-com"
  }
]

vboxuser@vboxuser:~/container-device-interface$ sudo ctr -n k8s.io c info c4b49517e67c4342355f8825fcbe8c65c8e7c4abc165fd4ca7f11699c2f8aa3b  | jq .Spec.linux.gidMappings
[
  {
    "containerID": 0,
    "hostID": 2460155904,
    "size": 65536
  }
]

vboxuser@vboxuser:~/container-device-interface$ sudo ctr -n k8s.io c info c4b49517e67c4342355f8825fcbe8c65c8e7c4abc165fd4ca7f11699c2f8aa3b  | jq .Spec.linux.devices
[
  {
    "path": "/dev/another_urandom",
    "type": "c",
    "major": 1,
    "minor": 9,
    "uid": 2460155904,
    "gid": 2460155904
  }
]

vboxuser@vboxuser:~/container-device-interface$ sudo python3 -m json.tool /run/containerd/runc/k8s.io/c4b49517e67c4342355f8825fcbe8c65c8e7c4abc165fd4ca7f11699c2f8aa3b/state.json  | jq .config.devices
[
  {
    "type": 99,
    "major": 1,
    "minor": 3,
    "permissions": "rwm",
    "allow": true,
    "path": "/dev/null",
    "file_mode": 438,
    "uid": 2460155904,
    "gid": 2460155904
  },
  {
    "type": 99,
    "major": 1,
    "minor": 8,
    "permissions": "rwm",
    "allow": true,
    "path": "/dev/random",
    "file_mode": 438,
    "uid": 2460155904,
    "gid": 2460155904
  },
  {
    "type": 99,
    "major": 1,
    "minor": 7,
    "permissions": "rwm",
    "allow": true,
    "path": "/dev/full",
    "file_mode": 438,
    "uid": 2460155904,
    "gid": 2460155904
  },
  {
    "type": 99,
    "major": 5,
    "minor": 0,
    "permissions": "rwm",
    "allow": true,
    "path": "/dev/tty",
    "file_mode": 438,
    "uid": 2460155904,
    "gid": 2460155904
  },
  {
    "type": 99,
    "major": 1,
    "minor": 5,
    "permissions": "rwm",
    "allow": true,
    "path": "/dev/zero",
    "file_mode": 438,
    "uid": 2460155904,
    "gid": 2460155904
  },
  {
    "type": 99,
    "major": 1,
    "minor": 9,
    "permissions": "rwm",
    "allow": true,
    "path": "/dev/urandom",
    "file_mode": 438,
    "uid": 2460155904,
    "gid": 2460155904
  },
  {
    "type": 99,
    "major": 1,
    "minor": 9,
    "permissions": "",
    "allow": false,
    "path": "/dev/another_urandom",
    "file_mode": 438,
    "uid": 2460155904,
    "gid": 2460155904
  }
]

The initial device can be chowned to the right user. However, we wan't be able to use this device with several containers.

vboxuser@vboxuser:~/container-device-interface$ sudo python3 -m json.tool /run/containerd/runc/k8s.io/c4b49517e67c4342355f8825fcbe8c65c8e7c4abc165fd4ca7f11699c2f8aa3b/state.json  | grep pid
    "init_process_pid": 74042

vboxuser@vboxuser:~/container-device-interface$ sudo  chown 2460155904.2460155904 /proc/74042/root/dev/another_urandom
vboxuser@vboxuser:~/container-device-interface$ ls -la /dev/another_urandom 
crw-rw---- 1 2460155904 2460155904 1, 9 Nov  4 15:52 /dev/another_urandom

root@example-pod:/# head -n 1 /dev/another_urandom 
����x\��{����9���jDJl�������,Ȭ��O9 �3=�J��Hjo]��7��_�[��4oN���ؐ0�zWE�;��J�.y,�}��.��!n�|����z�<�slJ�&1�p
root@example-pod:/# ls -la /dev/another_urandom 
crw-rw---- 1 root root 1, 9 Nov  4 15:52 /dev/another_urandom

PS. The default devices have 0666 permissions that's why we can use them in the container.

rata · 2025-11-05T09:45:38Z

@everzakov ohh, thanks for catching this! Okay, so runc is already setting the uid/gid to root inside the container and that doesn't work :(

So, for bind-mounts of filesystems (a file or a directory), what this PR is doing is fine.

For other devices, we need to see what we can do. Because the option that should work, isn't working and as @everzakov was saying maybe it can't work without changing the uid/gid of the device on the host, that might be problematic.

klihub force-pushed the fixes/uid-gid-mappings branch from 87acc6c to 19dae60 Compare October 18, 2025 18:44

klihub marked this pull request as ready for review October 19, 2025 16:30

klihub requested a review from elezar October 20, 2025 07:21

klihub force-pushed the fixes/uid-gid-mappings branch from 19dae60 to 0c5ec3a Compare October 20, 2025 08:46

elezar reviewed Oct 20, 2025

View reviewed changes

pkg/cdi/oci.go Outdated Show resolved Hide resolved

elezar reviewed Oct 20, 2025

View reviewed changes

klihub force-pushed the fixes/uid-gid-mappings branch 2 times, most recently from 069d16f to 7832481 Compare October 20, 2025 09:34

cdi: inject mount UID/GID mappings if user NS is in use.

7116a1d

When injecting mounts to a container with user namespaces in use, inject the mounts with UID and GID mappings taken from the OCI Spec. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>

klihub force-pushed the fixes/uid-gid-mappings branch from 7832481 to 7116a1d Compare October 20, 2025 13:40

cdi: inject mount UID/GID mappings if user NS is in use. #288

Are you sure you want to change the base?

cdi: inject mount UID/GID mappings if user NS is in use. #288

Conversation

klihub commented Oct 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elezar Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

klihub Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

elezar Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

klihub Oct 20, 2025

Choose a reason for hiding this comment

Uh oh!

elezar commented Oct 20, 2025

Uh oh!

klihub commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kad commented Oct 21, 2025

Uh oh!

rata commented Oct 27, 2025

Uh oh!

klihub commented Nov 4, 2025

Uh oh!

rata commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elezar commented Nov 4, 2025

Uh oh!

rata commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

klihub commented Nov 4, 2025

Uh oh!

rata commented Nov 4, 2025

Uh oh!

everzakov commented Nov 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rata commented Nov 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

klihub commented Oct 18, 2025 •

edited

Loading

klihub commented Oct 20, 2025 •

edited

Loading

rata commented Nov 4, 2025 •

edited

Loading

rata commented Nov 4, 2025 •

edited

Loading

everzakov commented Nov 4, 2025 •

edited

Loading