`guix shell -CN' failed to access GPU

Status: Open
Participants: dan, Ludovic Courtès
Owner: unassigned
Submitted by: dan
Severity: normal
Merged with: 59166
(address . bug-guix@gnu.org)
87zgd41syf.fsf@dan.games

I was trying to run some GUI software in a Guix container and would like it to have GPU access. However, I later found that if I give the container network access, it seems unable to find the GPU properly. The following are the commands I ran and the output I got:

------------------------------without-network-access------------------------------

$ guix shell -C mesa-utils --expose=/tmp/.X11-unix --expose=$XAUTHORITY --expose=/dev/dri --expose=/etc/udev -E "DISPLAY|XAUTHORITY" -- glxinfo -B

name of display: :1
display: :1 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: AMD (0x1002)
Device: AMD RENOIR (DRM 3.47.0, 5.19.15, LLVM 11.0.0) (0x1638)
Version: 21.3.8
Accelerated: yes
Video memory: 1024MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.6
Max compat profile version: 4.6
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
VBO free memory - total: 655 MB, largest block: 655 MB
VBO free aux. memory - total: 15305 MB, largest block: 15305 MB
Texture free memory - total: 655 MB, largest block: 655 MB
Texture free aux. memory - total: 15305 MB, largest block: 15305 MB
Renderbuffer free memory - total: 655 MB, largest block: 655 MB
Renderbuffer free aux. memory - total: 15305 MB, largest block: 15305 MB
Memory info (GL_NVX_gpu_memory_info):
Dedicated video memory: 1024 MB
Total available memory: 16487 MB
Currently available dedicated video memory: 655 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD RENOIR (DRM 3.47.0, 5.19.15, LLVM 11.0.0)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 21.3.8
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 21.3.8
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.3.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
------------------------------with-network-access------------------------------

$ guix shell -CN mesa-utils --expose=/tmp/.X11-unix --expose=$XAUTHORITY --expose=/dev/dri --expose=/etc/udev -E "DISPLAY|XAUTHORITY" -- glxinfo -B

name of display: :1
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: MESA-LOADER: failed to open amdgpu: /gnu/store/83kzrpczis5s8hn3ly9y89mij7ngq4bw-mesa-21.3.8/lib/dri/amdgpu_dri.so: cannot open shared object file: No such file or directory (search paths /gnu/store/83kzrpczis5s8hn3ly9y89mij7ngq4bw-mesa-21.3.8/lib/dri, suffix _dri)
libGL error: failed to load driver: amdgpu
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: MESA-LOADER: failed to open amdgpu: /gnu/store/83kzrpczis5s8hn3ly9y89mij7ngq4bw-mesa-21.3.8/lib/dri/amdgpu_dri.so: cannot open shared object file: No such file or directory (search paths /gnu/store/83kzrpczis5s8hn3ly9y89mij7ngq4bw-mesa-21.3.8/lib/dri, suffix _dri)
libGL error: failed to load driver: amdgpu
display: :1 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Mesa/X.org (0xffffffff)
Device: llvmpipe (LLVM 11.0.0, 256 bits) (0xffffffff)
Version: 21.3.8
Accelerated: no
Video memory: 30926MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.5
Max compat profile version: 4.5
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 11.0.0, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 21.3.8
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.5 (Compatibility Profile) Mesa 21.3.8
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.3.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
------------------------------paste-ends-here------------------------------

The only difference between these two executions is the `-N' flag. I also had a look at the related Guile code, and it seems the `-N' flag only does two things:
1. bind several network-related files into /etc
2. share the host's network namespace with the container

I had a few other Guix users test the commands, and they reported similar results.

Some info about my environment:
kernel version: 6.0.7
mesa version: 21.3.8
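
The two glxinfo runs above differ in the "Accelerated:" field (yes with amdgpu, no with llvmpipe). As a minimal sketch for repeating the comparison, the helper below reads `glxinfo -B` output on stdin and reports which kind of renderer is in use. The function name renderer_kind is hypothetical and not part of Guix or Mesa.

```shell
#!/bin/sh
# Read `glxinfo -B` output on stdin and report which renderer is in use.
# Prints "hardware" when the "Accelerated:" field says yes, "software"
# when it says anything else, and "unknown" when the field is absent.
renderer_kind() {
    awk -F': *' '
        /^[[:space:]]*Accelerated:/ {
            kind = ($2 == "yes") ? "hardware" : "software"
            print kind
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}
```

Piping either of the glxinfo commands above into renderer_kind should give a one-word answer that is easy to compare between the `-C' and `-CN' runs.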

--
dan
Ludovic Courtès wrote on 10 Nov 2022 10:45
(name . dan)(address . i@dan.games)(address . 59069@debbugs.gnu.org)
87fserkusb.fsf@gnu.org
Hi,

dan <i@dan.games> writes:

> I was trying to run some GUI software in a guix container, and would like to have GPU access in it. However, I later found out that if I gave network access to the container, it seems like unable to properly find the GPU. The following are the commands that I run and the output I got:

Could you check with strace what it’s trying to access, both with and
without ‘-N’?

guix shell mesa-utils strace … -C -- strace -o /tmp/log.strace glxinfo

It might be a /dev node, or it might be simply talking to the X server,
which requires network access.

Thanks,
Ludo’.
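
Once a log such as /tmp/log.strace exists, one way to narrow it down is to list the paths whose system calls failed with ENOENT. The following is only a sketch: it assumes strace's default output format, where the path is the first double-quoted argument on the line, and the function name missing_paths is hypothetical.

```shell
#!/bin/sh
# Print the unique paths whose system calls failed with ENOENT in a strace log.
# Usage: missing_paths /tmp/log.strace
missing_paths() {
    grep -F ' ENOENT ' "$1" |
        sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' |   # keep the first quoted argument
        sort -u
}
```

Running it on the logs from the run with `-N' and the run without it, then comparing the two lists, should show which files are only missing in one of the containers.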
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 59069@debbugs.gnu.org)
87zgcyu99s.fsf@dan.games
Ludovic Courtès <ludo@gnu.org> writes:

> Could you check with strace what it’s trying to access, both
> with and
> without ‘-N’?
>
> guix shell mesa-utils strace … -C -- strace -o /tmp/log.strace
> glxinfo

I looked into the strace logs and found that it's actually
having trouble accessing /sys, which is not available in a '-N'
container. I ran the following commands to test:
> $ guix shell -C coreutils -- ls /
> bin dev etc gnu home proc sys tmp
while with the '-N' flag:
> $ guix shell -CN coreutils -- ls /
> bin dev etc gnu home proc tmp
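
The difference between the two listings can also be computed rather than spotted by eye. This is a small sketch; the function name entries_missing_from is hypothetical, and the two listings are copied from the output above.

```shell
#!/bin/sh
# Print the entries present in the first listing but absent from the second.
# Each argument is a whitespace-separated list of names, as printed by `ls /`.
entries_missing_from() {
    # Normalize the second listing to single spaces, padded at both ends.
    second=" $(printf '%s' "$2" | tr -s '[:space:]' ' ') "
    for entry in $1; do
        case "$second" in
            *" $entry "*) ;;                  # present in both listings
            *) printf '%s\n' "$entry" ;;      # only in the first listing
        esac
    done
}

# Listings copied from the two runs above.
without_n="bin dev etc gnu home proc sys tmp"
with_n="bin dev etc gnu home proc tmp"
entries_missing_from "$without_n" "$with_n"   # prints: sys
```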

Ludovic Courtès wrote on 10 Nov 2022 16:49
(name . dan)(address . i@dan.games)
87o7teg68j.fsf@gnu.org
Hi!

(Cc: Dave Thompson, the original author of this code.)

As you pointed out on IRC, the problem is that ‘guix shell -C’ provides
/sys whereas ‘guix shell -CN’ doesn’t.

This stems from this call in (gnu build linux-container), which has
always been there:

  (mount-file-systems root mounts
                      #:mount-/proc? (memq 'pid namespaces)
                      #:mount-/sys? (memq 'net namespaces))

This is explained a few lines above:

  ;; A sysfs mount requires the user to have the CAP_SYS_ADMIN capability in
  ;; the current network namespace.
  (when mount-/sys?
    (mount* "none" (scope "/sys") "sysfs"
            (logior MS_NOEXEC MS_NOSUID MS_NODEV MS_RDONLY)))

As you noticed with ‘--expose=/sys’, bind-mounting /sys doesn’t work
either (‘mount’ fails with EINVAL).

Not sure what to do. Thoughts?

Ludo’.
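
The effect of `#:mount-/sys? (memq 'net namespaces)' can be restated as a small shell sketch: /sys is mounted only when "net" is among the namespaces the container unshares. The function name mounts_sys and the exact namespace lists below are assumptions for illustration; the only thing taken from the thread is that `-N' shares the host's network namespace, which is assumed here to mean "net" is left out of the list.

```shell
#!/bin/sh
# Sketch of the decision made by `#:mount-/sys? (memq 'net namespaces)':
# succeed only when "net" is among the namespace names given as arguments.
mounts_sys() {
    for ns in "$@"; do
        if [ "$ns" = net ]; then
            return 0
        fi
    done
    return 1
}

# Assumed namespace lists: `-C' unshares "net"; `-CN' leaves it out so the
# container shares the host's network namespace.
mounts_sys mnt pid ipc uts user net && echo "-C:  /sys mounted"
mounts_sys mnt pid ipc uts user     || echo "-CN: /sys not mounted"
```

Under these assumptions the second call fails, which matches the missing /sys seen with `guix shell -CN'.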
Ludovic Courtès wrote on 12 Nov 2022 18:24
control message for bug #59069
(address . control@debbugs.gnu.org)
87h6z4cchg.fsf@gnu.org
merge 59069 59166
quit