video acceleration/libva segfaults caused by stale mesa shader cache

Status: Open
Participants: Giovanni Biscuolo, Maxim Cournoyer
Owner: unassigned
Submitted by: Maxim Cournoyer
Severity: normal

Maxim Cournoyer wrote on 1 May 2023 04:42
(name . bug-guix)(address . bug-guix@gnu.org)
875y9c7owu.fsf@gmail.com
Hi,

After reinstalling someone's desktop which has support for VA-API,
'vainfo' from 'libva-utils' would consume all the memory then crash.
Other applications relying on libva would crash as well, e.g. ffmpeg (or
its users, such as vlc/jami). Here's a sample output from VLC:

vlc received_605209834855384.mp4
VLC media player 3.0.18 Vetinari (revision 3.0.13-8-g41878ff4f2)
[000000000109d770] main libvlc: Lancement de vlc avec l'interface par défaut. Utiliser « cvlc » pour démarrer VLC sans interface.
libva info: VA-API version 1.17.0
libva info: Trying to open /gnu/store/9pypr3c3y379shbwm9ilb4pik9mkfd83-mesa-22.2.4/lib/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_17
Erreur de segmentation

After tracing the process, I noticed that the last thing it did was
loading its mesa shader cache, stored under:

~/.cache/mesa_shader_cache

Deleting that directory resolved the issue.

It seems that'd be a bug in Mesa (for failing to determine that it
should have invalidated its cache going from version 21 to 22 post
core-updates merge).
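
For anyone hitting this, here is a minimal sketch of a more cautious variant of the workaround: move the cache aside rather than deleting it, so the stale files stay available for a bug report. The function name, the ".stale-<timestamp>" suffix and the XDG_CACHE_HOME fallback are assumptions of this sketch, not something Mesa or Guix provides.

```shell
#!/bin/sh
# Move the Mesa shader cache aside instead of deleting it.
# Usage: stash_mesa_cache [CACHE_ROOT]
# CACHE_ROOT defaults to $XDG_CACHE_HOME, or ~/.cache when that is unset
# (assumed default location; the report above only mentions ~/.cache).
stash_mesa_cache() {
    cache_root=${1:-${XDG_CACHE_HOME:-$HOME/.cache}}
    cache_dir=$cache_root/mesa_shader_cache
    if [ ! -d "$cache_dir" ]; then
        echo "no shader cache at $cache_dir" >&2
        return 1
    fi
    # Hypothetical naming scheme for the stashed copy.
    stash=$cache_dir.stale-$(date +%Y%m%d%H%M%S)
    # Print the new location on success so it can be attached to a report.
    mv -- "$cache_dir" "$stash" && printf '%s\n' "$stash"
}
```

Calling stash_mesa_cache with no argument acts on the default location and prints where the old cache went; the directory can be removed once it is no longer needed.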

--
Thanks,
Maxim
Maxim Cournoyer wrote on 1 May 2023 04:58
(address . 63197@debbugs.gnu.org)
871qk07o7a.fsf@gmail.com
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> After reinstalling someone's desktop which has support for VA-API,
> 'vainfo' from 'libva-utils' would consume all the memory then crash.
> Other applications relying on libva would crash as well, e.g. ffmpeg (or
> its users, such as vlc/jami). Here's a sample output from VLC:
>
> vlc received_605209834855384.mp4
> VLC media player 3.0.18 Vetinari (revision 3.0.13-8-g41878ff4f2)
> [000000000109d770] main libvlc: Lancement de vlc avec l'interface par défaut. Utiliser « cvlc » pour démarrer VLC sans interface.
> libva info: VA-API version 1.17.0
> libva info: Trying to open /gnu/store/9pypr3c3y379shbwm9ilb4pik9mkfd83-mesa-22.2.4/lib/dri/radeonsi_drv_video.so
> libva info: Found init function __vaDriverInit_1_17
> Erreur de segmentation
>
>
> After tracing the process, I noticed that the last thing it did was
> loading its mesa shader cache, stored under:
>
> ~/.cache/mesa_shader_cache
>
> Deleting that directory resolved the issue.
>
> It seems that'd be a bug in Mesa (for failing to determine that it
> should have invalidated its cache going from version 21 to 22 post
> core-updates merge).

I've forwarded this report upstream here:

--
Thanks,
Maxim
Giovanni Biscuolo wrote on 15 Jun 2023 15:49
87pm5w3ke9.fsf@xelera.eu
Hi Maxim,

I learned about this issue today

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

[...]

>> After tracing the process, I noticed that the last thing it did was
>> loading its mesa shader cache, stored under:
>>
>> ~/.cache/mesa_shader_cache
>>
>> Deleting that directory resolved the issue.
>>
>> It seems that'd be a bug in Mesa (for failing to determine that it
>> should have invalidated its cache going from version 21 to 22 post
>> core-updates merge).

AFAIU this issue is still present using mesa 23 since Guillaume Le
Vaillant had to use this workaround yesterday [1] and reported his
backtrace upstream [2]

If I'm not wrong (i.e. vlc et al are now using mesa 23) this should also
be reported upstream (I can do it if needed).

AFAIU the only thing we can do to fix this bug is to disable the shader
cache (MESA_SHADER_CACHE_DISABLE=true) until a proper fix is found
upstream.

...or apply a patch to rename "~/.cache/mesa_shader_cache" to
"~/.cache/mesa<version>_shader_cache"

Alternatively, we should find a way to make Guix users aware of this
kind of problem and possible workarounds they can apply (it's not
related to this specific bug)


WDYT?

Thanks! Gio'


[1] id:871qify1i8.fsf@kitej


--
Giovanni Biscuolo

Xelera IT Infrastructures

Maxim Cournoyer wrote on 17 Jun 2023 02:36
(name . Giovanni Biscuolo)(address . g@xelera.eu)(address . 63197@debbugs.gnu.org)
878rcivs9q.fsf@gmail.com
Hello,

Giovanni Biscuolo <g@xelera.eu> writes:

> Hi Maxim,
>
> I learned about this issue today
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
> [...]
>
>>> After tracing the process, I noticed that the last thing it did was
>>> loading its mesa shader cache, stored under:
>>>
>>> ~/.cache/mesa_shader_cache
>>>
>>> Deleting that directory resolved the issue.
>>>
>>> It seems that'd be a bug in Mesa (for failing to determine that it
>>> should have invalidated its cache going from version 21 to 22 post
>>> core-updates merge).
>
> AFAIU this issue is still present using mesa 23 since Guillaume Le
> Vaillant had to use this workaround yesterday [1] and reported his
> backtrace upstream [2]
>
> If I'm not wrong (i.e. vlc et al are now using mesa 23) this should also
> be reported upstream (I can do it if needed).

Which upstream are you thinking about? My understanding is that this
problem is a Mesa problem, and it's already reported there (the issue
linked in [2]).

> AFAIU the only thing we can do to fix this bug is to disable the shader
> cache (MESA_SHADER_CACHE_DISABLE=true) until a proper fix is found
> upstream.

Disabling the shader cache sounds like a decent workaround or even
definitive solution. One less stale cache to worry about... If it's
like the Qt shader cache, the performance hit is probably too small to
be noticeable (maybe just slower startup times of complicated opengl
applications such as games?).
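
To try this without rebuilding anything, the variable can be set for a single process. A minimal sketch follows; the wrapper name is hypothetical, and only the MESA_SHADER_CACHE_DISABLE variable itself comes from Mesa.

```shell
#!/bin/sh
# Run one command with Mesa's on-disk shader cache disabled.
# The variable is set only in that command's environment, so the
# calling shell and other applications are left untouched.
# Usage: without_shader_cache COMMAND [ARGS...]
#   e.g. without_shader_cache vainfo
#        without_shader_cache vlc received_605209834855384.mp4
without_shader_cache() {
    if [ "$#" -eq 0 ]; then
        echo "usage: without_shader_cache COMMAND [ARGS...]" >&2
        return 2
    fi
    env MESA_SHADER_CACHE_DISABLE=true "$@"
}
```

If the crash goes away under the wrapper but comes back without it, that points at the cache rather than at the driver.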

> ...or apply a patch to rename "~/.cache/mesa_shader_cache" to
> "~/.cache/mesa<version>_shader_cache"

That's another good idea.

> Alternatively, we should find a way to make Guix users aware of this
> kind of problem and possible workarounds they can apply (it's not
> related to this specific bug)

I would rather pursue the other above options you suggest, so that it
doesn't happen in the first place!

Thank you for sharing these ideas.

--
Maxim
Giovanni Biscuolo wrote on 17 Jun 2023 12:14
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 63197@debbugs.gnu.org)
878rci2y6u.fsf@xelera.eu
Hi Maxim

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

[...]

>> AFAIU this issue is still present using mesa 23 since Guillaume Le
>> Vaillant had to use this workaround yesterday [1] and reported his
>> backtrace upstream [2]
>>
>> If I'm not wrong (i.e. vlc et al are now using mesa 23) this should also
>> be reported upstream (I can do it if needed).
>
> Which upstream are you thinking about?

mesa

> My understanding is that this problem is a Mesa problem, and it's
> already reported there (the issue linked in [2]).

yes but the original bug report mentions Mesa 22.2.4 and M. Briar asked:

Mesa 22.2.x is already end-of-life and won't receive any fixes
anymore. Does this also happen on newer versions?


IMHO there is no clear answer to that question in the bug thread, maybe
mesa developers still think it's just 22.2.X related

Now we have Mesa 23.0.3 in Guix, probably the one used by vlc when
Guillaume reported his issue upstream (mesa) on June 15

>> AFAIU the only thing we can do to fix this bug is to disable the shader
>> cache (MESA_SHADER_CACHE_DISABLE=true) until a proper fix is found
>> upstream.
>
> Disabling the shader cache sounds like a decent workaround or even
> definitive solution. One less stale cache to worry about...

oh yes! Unfortunately cache management is not so robust... sometimes :-(

> If it's like the Qt shader cache, the performance hit is probably too
> small to be noticeable (maybe just slower startup times of complicated
> opengl applications such as games?).

I don't know the cost in terms of performance, I'm not a 3D expert at
all; from what I read on the web about shader caches I guess it's a
real problem almost only for games and I guess it's not a problem at
all for media players like vlc et al: I'm just brainstorming but what
about having a mesa-with-cache-enabled version just for the games, if it
is really needed?

I should be able to propose a patch to disable the mesa shader cache,
but since I'm not an expert in this field I prefer to leave this
decision (to disable the cache, I mean) to someone else

>> ...or apply a patch to rename "~/.cache/mesa_shader_cache" to
>> "~/.cache/mesa<version>_shader_cache"
>
> That's another good idea.

I was just doing guesswork but the bug caused by this mesa upgrade
smells like a binary incompatibility between two versions (or just major
versions)... so a versioned shader cache makes sense to me

I'm not able to propose (I mean to code) such a patch, anyway
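
Short of patching Mesa, the versioned-directory idea can be approximated from the outside with the MESA_SHADER_CACHE_DIR environment variable, which Mesa documents as the parent directory of its on-disk cache. This is a minimal sketch, assuming the caller already knows the Mesa version string; the function names and the "mesa-<version>" layout are made up for illustration.

```shell
#!/bin/sh
# Print a per-version cache root such as ~/.cache/mesa-23.0.3.
# Usage: versioned_cache_dir MESA_VERSION [CACHE_ROOT]
# How MESA_VERSION is discovered is left to the caller (assumption).
versioned_cache_dir() {
    if [ -z "${1:-}" ]; then
        echo "usage: versioned_cache_dir MESA_VERSION [CACHE_ROOT]" >&2
        return 2
    fi
    cache_root=${2:-${XDG_CACHE_HOME:-$HOME/.cache}}
    printf '%s/mesa-%s\n' "$cache_root" "$1"
}

# Run a command with MESA_SHADER_CACHE_DIR pointing at that directory,
# so each Mesa version keeps its shader cache in a separate place.
# Usage: with_versioned_cache MESA_VERSION COMMAND [ARGS...]
with_versioned_cache() {
    if [ "$#" -lt 2 ]; then
        echo "usage: with_versioned_cache MESA_VERSION COMMAND [ARGS...]" >&2
        return 2
    fi
    dir=$(versioned_cache_dir "$1") || return
    shift
    mkdir -p "$dir" || return
    env MESA_SHADER_CACHE_DIR="$dir" "$@"
}
```

For example, "with_versioned_cache 23.0.3 vlc file.mp4" would keep that run's cache under ~/.cache/mesa-23.0.3. Whether this avoids the crash in every case has not been verified here; it only keeps caches written by different versions apart.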

Anyway, users should know that they have to periodically clean unused
shader caches, since from what I read on the net the shader cache tends
to really /explode/ in terms of size, in some cases
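
As a rough illustration of such housekeeping, the sketch below reports the cache size and clears the whole directory once it grows past a limit. The function names and the default path are assumptions; removing the entire directory is the same operation as the workaround reported in this thread.

```shell
#!/bin/sh
# Print the size of the shader cache in KiB (0 if it does not exist).
# Usage: cache_size_kib [CACHE_DIR]
cache_size_kib() {
    dir=${1:-${XDG_CACHE_HOME:-$HOME/.cache}/mesa_shader_cache}
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    # du -sk prints "<KiB><TAB><path>"; keep only the number.
    du -sk "$dir" | cut -f1
}

# Remove the whole cache directory when it is larger than LIMIT_KIB.
# Usage: clear_cache_if_larger_than LIMIT_KIB [CACHE_DIR]
clear_cache_if_larger_than() {
    limit_kib=$1
    target=${2:-${XDG_CACHE_HOME:-$HOME/.cache}/mesa_shader_cache}
    size=$(cache_size_kib "$target") || return
    if [ "$size" -gt "$limit_kib" ]; then
        rm -rf -- "$target"
    fi
}
```

Mesa also documents a MESA_SHADER_CACHE_MAX_SIZE variable that caps the size of the on-disk cache, which may make a manual check like this unnecessary.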

>> Alternatively, we should find a way to make Guix users aware of this
>> kind of problem and possible workarounds they can apply (it's not
>> related to this specific bug)
>
> I would rather pursue the other above options you suggest, so that it
> doesn't happen in the first place!

I agree

> Thank you for sharing these ideas.

Thank you for your attention!

Happy hacking, Gio'

--
Giovanni Biscuolo

Xelera IT Infrastructures

Maxim Cournoyer wrote on 26 Jun 2023 18:21
(name . Giovanni Biscuolo)(address . g@xelera.eu)(address . 63197@debbugs.gnu.org)
87r0py19fj.fsf@gmail.com
Hi Giovanni,

Giovanni Biscuolo <g@xelera.eu> writes:

[...]

>>> ...or apply a patch to rename "~/.cache/mesa_shader_cache" to
>>> "~/.cache/mesa<version>_shader_cache"
>>
>> That's another good idea.
>
> I was just doing guesswork but the bug caused by this mesa upgrade
> smells like a binary incompatibility between two versions (or just major
> versions)... so a versioned shader cache makes sense to me
>
> I'm not able to propose (I mean to code) such a patch, anyway
>
> Anyway, users should know that they have to periodically clean unused
> shader caches, since from what I read on the net the shader cache tends
> to really /explode/ in terms of size, in some cases
>
>>> Alternatively, we should find a way to make Guix users aware of this
>>> kind of problem and possible workarounds they can apply (it's not
>>> related to this specific bug)
>>
>> I would rather pursue the other above options you suggest, so that it
>> doesn't happen in the first place!

I've ping'd upstream with
Let's see what they say!

--
Thanks,
Maxim
Maxim Cournoyer wrote 4 days ago
(address . 63197@debbugs.gnu.org)
875xw5ffdz.fsf@gmail.com
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> After reinstalling someone's desktop which has support for VA-API,
> 'vainfo' from 'libva-utils' would consume all the memory then crash.
> Other applications relying on libva would crash as well, e.g. ffmpeg (or
> its users, such as vlc/jami). Here's a sample output from VLC:
>
> vlc received_605209834855384.mp4
> VLC media player 3.0.18 Vetinari (revision 3.0.13-8-g41878ff4f2)
> [000000000109d770] main libvlc: Lancement de vlc avec l'interface par défaut. Utiliser « cvlc » pour démarrer VLC sans interface.
> libva info: VA-API version 1.17.0
> libva info: Trying to open /gnu/store/9pypr3c3y379shbwm9ilb4pik9mkfd83-mesa-22.2.4/lib/dri/radeonsi_drv_video.so
> libva info: Found init function __vaDriverInit_1_17
> Erreur de segmentation

The same issue was reproduced with vlc and Totem after the latest
upgrades, which brought mesa from version 23 to 24. The same solution:

$ rm -rf ~/.cache/mesa_shader_cache

still works.

I've sent upstream a fresh snapshot of the 23 -> 24 corrupted cache [0].
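
For others who want to attach a similar snapshot to the upstream report, here is a minimal sketch of archiving the cache before removing it. The function name and the archive name are hypothetical, and the cache is assumed to sit in its default location.

```shell
#!/bin/sh
# Archive the shader cache into a gzip-compressed tarball.
# Usage: snapshot_mesa_cache OUTPUT.tar.gz [CACHE_ROOT]
# CACHE_ROOT defaults to $XDG_CACHE_HOME, or ~/.cache when unset.
snapshot_mesa_cache() {
    out=$1
    cache_root=${2:-${XDG_CACHE_HOME:-$HOME/.cache}}
    if [ ! -d "$cache_root/mesa_shader_cache" ]; then
        echo "no shader cache under $cache_root" >&2
        return 1
    fi
    # -C keeps paths in the archive relative to the cache root.
    tar -C "$cache_root" -czf "$out" mesa_shader_cache
}
```

Running "snapshot_mesa_cache ~/mesa-cache-snapshot.tar.gz" first and deleting the directory afterwards keeps a copy of the broken state for the report.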


--
Thanks,
Maxim