`guix shell -CN' failed to access GPU

Status: Open
Participants: dan, Ludovic Courtès
Owner: unassigned
Submitted by: dan
Severity: normal
Merged with: 59166
(address . bug-guix@gnu.org)
87zgd41syf.fsf@dan.games

I was trying to run some GUI software in a Guix container and would like it to have GPU access. However, I found that when I give the container network access, it is unable to find the GPU properly. These are the commands I ran and the output I got:

------------------------------without-network-access------------------------------

$ guix shell -C mesa-utils --expose=/tmp/.X11-unix --expose=$XAUTHORITY --expose=/dev/dri --expose=/etc/udev -E "DISPLAY|XAUTHORITY" -- glxinfo -B

name of display: :1
display: :1 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: AMD (0x1002)
Device: AMD RENOIR (DRM 3.47.0, 5.19.15, LLVM 11.0.0) (0x1638)
Version: 21.3.8
Accelerated: yes
Video memory: 1024MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.6
Max compat profile version: 4.6
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
VBO free memory - total: 655 MB, largest block: 655 MB
VBO free aux. memory - total: 15305 MB, largest block: 15305 MB
Texture free memory - total: 655 MB, largest block: 655 MB
Texture free aux. memory - total: 15305 MB, largest block: 15305 MB
Renderbuffer free memory - total: 655 MB, largest block: 655 MB
Renderbuffer free aux. memory - total: 15305 MB, largest block: 15305 MB
Memory info (GL_NVX_gpu_memory_info):
Dedicated video memory: 1024 MB
Total available memory: 16487 MB
Currently available dedicated video memory: 655 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD RENOIR (DRM 3.47.0, 5.19.15, LLVM 11.0.0)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 21.3.8
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 21.3.8
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.3.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
------------------------------with-network-access------------------------------

$ guix shell -CN mesa-utils --expose=/tmp/.X11-unix --expose=$XAUTHORITY --expose=/dev/dri --expose=/etc/udev -E "DISPLAY|XAUTHORITY" -- glxinfo -B

name of display: :1
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: MESA-LOADER: failed to open amdgpu: /gnu/store/83kzrpczis5s8hn3ly9y89mij7ngq4bw-mesa-21.3.8/lib/dri/amdgpu_dri.so: cannot open shared object file: No such file or directory (search paths /gnu/store/83kzrpczis5s8hn3ly9y89mij7ngq4bw-mesa-21.3.8/lib/dri, suffix _dri)
libGL error: failed to load driver: amdgpu
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: MESA-LOADER: failed to open amdgpu: /gnu/store/83kzrpczis5s8hn3ly9y89mij7ngq4bw-mesa-21.3.8/lib/dri/amdgpu_dri.so: cannot open shared object file: No such file or directory (search paths /gnu/store/83kzrpczis5s8hn3ly9y89mij7ngq4bw-mesa-21.3.8/lib/dri, suffix _dri)
libGL error: failed to load driver: amdgpu
display: :1 screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
Vendor: Mesa/X.org (0xffffffff)
Device: llvmpipe (LLVM 11.0.0, 256 bits) (0xffffffff)
Version: 21.3.8
Accelerated: no
Video memory: 30926MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.5
Max compat profile version: 4.5
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 11.0.0, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 21.3.8
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.5 (Compatibility Profile) Mesa 21.3.8
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.3.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
------------------------------paste-ends-here------------------------------

The only difference between these two invocations is the `-N' flag. I also looked at the relevant Guile code, and the `-N' flag appears to do only two things:
1. bind several network-related files into /etc
2. share the host's network namespace with the container

I had a few other Guix users test the commands, and they reported similar results.

Some info about my environment:
kernel version: 6.0.7
mesa version: 21.3.8

--
dan
Ludovic Courtès wrote on 10 Nov 2022 10:45
(name . dan)(address . i@dan.games)(address . 59069@debbugs.gnu.org)
87fserkusb.fsf@gnu.org
Hi,

dan <i@dan.games> skribis:

> I was trying to run some GUI software in a guix container, and would like to have GPU access in it. However, I later found out that if I gave network access to the container, it seems like unable to properly find the GPU. The following are the commands that I run and the output I got:

Could you check with strace what it’s trying to access, both with and
without ‘-N’?

guix shell mesa-utils strace … -C -- strace -o /tmp/log.strace glxinfo

It might be a /dev node, or it might be simply talking to the X server,
which requires network access.
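
One way to compare the two runs is to extract the paths that failed with ENOENT from each strace log and diff the results. The following is a minimal sketch rather than anything from this thread: `failed_paths` is a hypothetical helper, and it assumes the default strace output format, where the path is the first double-quoted argument of the call.

```shell
# Hypothetical helper: print the unique quoted paths from strace log lines
# that ended in ENOENT.  Assumes the default strace output format, where
# the path is the first double-quoted argument of the call.
failed_paths() {
    grep 'ENOENT' "$1" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | sort -u
}
```

With the two traces saved under placeholder names such as log-C.strace and log-CN.strace, running `failed_paths` on each and diffing the two lists would show which paths are missing only in the `-N' case.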

Thanks,
Ludo’.
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 59069@debbugs.gnu.org)
87zgcyu99s.fsf@dan.games
Ludovic Courtès <ludo@gnu.org> writes:

> Could you check with strace what it’s trying to access, both
> with and
> without ‘-N’?
>
> guix shell mesa-utils strace … -C -- strace -o /tmp/log.strace
> glxinfo

I looked into the strace logs and found that it is actually
having trouble accessing /sys, which is not available in a '-N'
container. I ran the following commands to test:
> $ guix shell -C coreutils -- ls /
> bin dev etc gnu home proc sys tmp
whereas with the '-N' flag:
> $ guix shell -CN coreutils -- ls /
> bin dev etc gnu home proc tmp
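
The same check can be scripted so that it is easy to repeat on other setups. This is only a sketch: `check_sysfs` is a hypothetical helper, not part of Guix, and the idea that entries under /sys/class/drm are what the GPU lookup needs is an assumption based on the strace finding above, not something confirmed in this thread.

```shell
# Hypothetical helper: report what is visible under ROOT/sys.
# ROOT defaults to the empty string, i.e. the real root directory.
check_sysfs() {
    root="${1:-}"
    if [ -d "$root/sys/class/drm" ]; then
        echo "sysfs with DRM entries"
    elif [ -d "$root/sys" ]; then
        echo "sysfs without DRM entries"
    else
        echo "no /sys"
    fi
}
```

Called with no argument inside a container, it inspects that container's own /sys; the optional root argument exists only so the function can be tried against a scratch directory.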

Ludovic Courtès wrote on 10 Nov 2022 16:49
(name . dan)(address . i@dan.games)
87o7teg68j.fsf@gnu.org
Hi!

(Cc: Dave Thompson, the original author of this code.)

As you pointed out on IRC, the problem is that ‘guix shell -C’ provides
/sys whereas ‘guix shell -CN’ doesn’t.

This stems from this call in (gnu build linux-container), which has
always been there:

  (mount-file-systems root mounts
                      #:mount-/proc? (memq 'pid namespaces)
                      #:mount-/sys? (memq 'net namespaces))

This is explained a few lines above:

  ;; A sysfs mount requires the user to have the CAP_SYS_ADMIN capability in
  ;; the current network namespace.
  (when mount-/sys?
    (mount* "none" (scope "/sys") "sysfs"
            (logior MS_NOEXEC MS_NOSUID MS_NODEV MS_RDONLY)))

As you noticed with ‘--expose=/sys’, bind-mounting /sys doesn’t work
either (‘mount’ fails with EINVAL).

Not sure what to do. Thoughts?

Ludo’.
Ludovic Courtès wrote on 12 Nov 2022 18:24
control message for bug #59069
(address . control@debbugs.gnu.org)
87h6z4cchg.fsf@gnu.org
merge 59069 59166
quit