GDM, GNOME Shell, etc. break when there are stale caches

Open
Submitted by Ricardo Wurmus.
Details
7 participants
  • Andreas Enge
  • Efraim Flashner
  • L p R n d n
  • Ludovic Courtès
  • Mark H Weaver
  • Ricardo Wurmus
  • Timothy Sample
Owner
unassigned
Severity
important
Ricardo Wurmus wrote on 4 Aug 2019 23:00
fixing GDM + GNOME Shell
(address . guix-devel@gnu.org)(address . bug-guix@gnu.org)
87tvawzlvq.fsf@elephly.net
Hi Guix,

Today I again couldn’t log into my workstation after upgrading the
system. I’m using GDM + GNOME Shell.

At first GDM wouldn’t start. I knew what to do: remove /var/lib/gdm,
because some state must have accumulated there.

GDM came up after a reboot, but I still couldn’t log in. Instead I was
thrown back to the login screen without any error message. I looked in
~/.cache/gdm/session.log for information, but it only told me that
gnome-shell was killed. Thanks.

After moving both .local/share and .cache out of the way I could log
in again.
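
For readers who want to try the same workaround, here is a minimal sketch
that moves the two directories aside rather than deleting them, so they can
be restored later. The function name `stash_user_state` and the `.stale-`
suffix are made up for illustration; nothing like this ships with Guix.

```shell
#!/bin/sh
# Hypothetical helper: move ~/.cache and ~/.local/share out of the way.
# Usage: stash_user_state HOME_DIR [SUFFIX]
stash_user_state() {
    home_dir=$1
    # Default suffix is a timestamp so repeated runs do not collide.
    suffix=${2:-$(date +%Y%m%d%H%M%S)}
    for dir in "$home_dir/.cache" "$home_dir/.local/share"; do
        if [ -d "$dir" ]; then
            # Rename instead of deleting, so the old state can be inspected.
            mv -- "$dir" "$dir.stale-$suffix"
        fi
    done
}
```

It would be run from a text console while logged out of the graphical
session, for example `stash_user_state "$HOME"`.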

This happens whenever I upgrade the system. This makes the system
rather frustrating to use. I don’t know if booting into an older system
generation would result in the same problem, but my guess is that it
would because both GDM and GNOME Shell appear to be leaving some binary
files behind that cause different versions to crash unceremoniously.

What can we do to make GDM and GNOME Shell more reliable?

--
Ricardo
Efraim Flashner wrote on 5 Aug 2019 09:17
(name . Ricardo Wurmus)(address . rekado@elephly.net)
20190805071719.GB15819@E2140
On Sun, Aug 04, 2019 at 11:00:41PM +0200, Ricardo Wurmus wrote:
> Hi Guix,
>
> Today I again couldn’t log into my workstation after upgrading the
> system. I’m using GDM + GNOME Shell.
>
> At first GDM wouldn’t start. I knew what to do: remove /var/lib/gdm,
> because some state must have accumulated there.

For this one can we create a single-shot service that, on reconfigure or
boot, removes this directory and recreates it? In fact, it seems this is
basically what Debian does¹.
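
The "remove and recreate" step could look roughly like the sketch below.
It is only the shell logic such a one-shot service might run, not an actual
Guix service definition; the function name `reset_state_dir` and the default
mode of 700 are assumptions, and a real service would also have to restore
the ownership GDM expects.

```shell
#!/bin/sh
# Hypothetical helper: wipe a state directory and recreate it empty.
# Usage: reset_state_dir DIR [MODE]
reset_state_dir() {
    dir=$1
    mode=${2:-700}
    # Guard against an empty argument or the root directory.
    case $dir in
        ''|/) echo "refusing to reset '$dir'" >&2; return 1 ;;
    esac
    rm -rf -- "$dir"
    mkdir -p -- "$dir"
    chmod "$mode" "$dir"
}
```

On a real system the call would be something like
`reset_state_dir /var/lib/gdm`, run as root before GDM starts.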

>
> GDM came up after a reboot, but I still couldn’t log in. Instead I was
> thrown back to the login screen without any error message. I looked in
> ~/.cache/gdm/session.log for information, but it only told me that
> gnome-shell was killed. Thanks.
>
> After moving both .local/share and .cache out of the way I could log
> in again.

This part seems a little harder to automate. /etc/skel is only sourced
when a user is created, so it's hard to make sweeping changes to help
people in this case, if they even want automated help. I'm guessing
making .cache/gdm(?) read-only would create other issues.

>
> This happens whenever I upgrade the system. This makes the system
> rather frustrating to use. I don’t know if booting into an older system
> generation would result in the same problem, but my guess is that it
> would because both GDM and GNOME Shell appear to be leaving some binary
> files behind that cause different versions to crash unceremoniously.
>
> What can we do to make GDM and GNOME Shell more reliable?

Modifying the logout scripts to remove a user's .cache directory seems extreme.
Some of the other options, such as removing and recreating directories
would address other issues we've had (such as /var/cache/fontconfig).



--
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted


Ricardo Wurmus wrote on 5 Aug 2019 16:36
(name . Efraim Flashner)(address . efraim@flashner.co.il)
87h86vznkt.fsf@elephly.net
Efraim Flashner <efraim@flashner.co.il> writes:

> On Sun, Aug 04, 2019 at 11:00:41PM +0200, Ricardo Wurmus wrote:
>> Hi Guix,
>>
>> Today I again couldn’t log into my workstation after upgrading the
>> system. I’m using GDM + GNOME Shell.
>>
>> At first GDM wouldn’t start. I knew what to do: remove /var/lib/gdm,
>> because some state must have accumulated there.
>
> For this one can we create a single-shot service that, on reconfigure or
> boot, removes this directory and recreates it? In fact, it seems this is
> basically what Debian does¹.

I suggested as much earlier, but it seems like a hack. Is this how
GNOME expects this state directory to be handled? The fact that Debian
does this is reassuring (or not…), but I would very much like to avoid
adding even more hacks.

>> GDM came up after a reboot, but I still couldn’t log in. Instead I was
>> thrown back to the login screen without any error message. I looked in
>> ~/.cache/gdm/session.log for information, but it only told me that
>> gnome-shell was killed. Thanks.
>>
>> After moving both .local/share and .cache out of the way I could log
>> in again.
>
> This part seems a little harder to automate. /etc/skel is only sourced
> when a user is created, so it's hard to make sweeping changes to help
> people in this case, if they even want automated help. I'm guessing
> making .cache/gdm(?) read-only would create other issues.

Does anyone know why this happens at all? What are the cached data?
Can we do without?

>> What can we do to make GDM and GNOME Shell more reliable?
>
> Modifying the logout scripts to remove a user's .cache directory seems extreme.
> Some of the other options, such as removing and recreating directories
> would address other issues we've had (such as /var/cache/fontconfig).

In my opinion generating a global /var/cache/fontconfig should be
prevented; removing it seems again like an avoidable hack.

--
Ricardo
Mark H Weaver wrote on 6 Aug 2019 18:12
(name . Ricardo Wurmus)(address . rekado@elephly.net)
87k1bqgtn1.fsf@netris.org
Hi Ricardo,

Ricardo Wurmus <rekado@elephly.net> writes:

> Today I again couldn’t log into my workstation after upgrading the
> system. I’m using GDM + GNOME Shell.
>
> At first GDM wouldn’t start. I knew what to do: remove /var/lib/gdm,
> because some state must have accumulated there.
>
> GDM came up after a reboot, but I still couldn’t log in. Instead I was
> thrown back to the login screen without any error message. I looked in
> ~/.cache/gdm/session.log for information, but it only told me that
> gnome-shell was killed. Thanks.
>
> After moving both .local/share and .cache out of the way I could log
> in again.
>
> This happens whenever I upgrade the system. This makes the system
> rather frustrating to use. I don’t know if booting into an older system
> generation would result in the same problem, but my guess is that it
> would because both GDM and GNOME Shell appear to be leaving some binary
> files behind that cause different versions to crash unceremoniously.
>
> What can we do to make GDM and GNOME Shell more reliable?

It's interesting that I've never run into this problem, not even once,
in all my years of running GNOME on Guix systems. Since recently
reverting to mostly using GNOME under X and GDM (whereas for a while I
was mostly launching GNOME manually under Wayland), I've run into some
other problems, e.g. GDM suspending my system automatically, sometimes
immediately after logging out, but I've *never* had to remove my caches.

I wonder if this is related to my use of Btrfs instead of Ext4. Whereas
system crashes cause file system corruptions under Ext4 (usually in the
form of some files being left empty after a crash), I've never seen any
evidence of corruption from crashes under Btrfs.

Mark
Ricardo Wurmus wrote on 6 Aug 2019 20:08
(name . Mark H Weaver)(address . mhw@netris.org)
87sgqexj2k.fsf@elephly.net
Mark H Weaver <mhw@netris.org> writes:

> It's interesting that I've never run into this problem, not even once,
> in all my years of running GNOME on Guix systems. Since recently
> reverting to mostly using GNOME under X and GDM (whereas for a while I
> was mostly launching GNOME manually under Wayland), I've run into some
> other problems, e.g. GDM suspending my system automatically, sometimes
> immediately after logging out, but I've *never* had to remove my caches.

Interesting.

> I wonder if this is related to my use of Btrfs instead of Ext4. Whereas
> system crashes cause file system corruptions under Ext4 (usually in the
> form of some files being left empty after a crash), I've never seen any
> evidence of corruption from crashes under Btrfs.

I haven’t had a system crash on this machine. I didn’t use it for a
month, upgraded, rebooted, and then had GDM + GNOME Shell problems.

--
Ricardo
Timothy Sample wrote on 8 Aug 2019 04:59
(name . Ricardo Wurmus)(address . rekado@elephly.net)
87d0hgpdjl.fsf@ngyro.com
Hello,

Ricardo Wurmus <rekado@elephly.net> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>> It's interesting that I've never run into this problem, not even once,
>> in all my years of running GNOME on Guix systems. Since recently
>> reverting to mostly using GNOME under X and GDM (whereas for a while I
>> was mostly launching GNOME manually under Wayland), I've run into some
>> other problems, e.g. GDM suspending my system automatically, sometimes
>> immediately after logging out, but I've *never* had to remove my caches.
>
> Interesting.

FWIW, I’m having the same good luck as Mark. Other than the upgrade
from 3.24 to 3.28, I’ve never had this kind of trouble with GNOME and
GDM. And even then, IIRC, the issue wasn’t really with the state files
(deleting them just happened to serve as a temporary work-around).


-- Tim
Ludovic Courtès wrote on 12 Sep 2019 10:42
control message for bug #36924
(address . control@debbugs.gnu.org)
874l1hrjm6.fsf@gnu.org
severity 36924 important
quit
Ludovic Courtès wrote on 12 Sep 2019 10:43
(address . control@debbugs.gnu.org)
8736h1rjjz.fsf@gnu.org
retitle 36924 GDM, GNOME Shell, etc. break when there are stale caches
quit
Andreas Enge wrote on 12 Sep 2019 11:54
Mesa/GDM/XFCE
(address . 36924@debbugs.gnu.org)
20190912095430.GA1559@jurong
Hello,

now it is my turn to experience a problem in this area. I newly installed a
machine with the graphical installer of Guix 1.0.1 (where I could use xfce
without problem), then I issued a "guix pull" and a "guix system reconfigure".

Now logging into XFCE poses problems, which I can reproduce as follows
(after removing /var/lib/gdm, $HOME/{.config,.cache,.local} once):
- When I remove $HOME/.config/xfce4/xfconf/xfce-perchannel-xml/xfwm4.xml
I can log into XFCE once.
- The second time, various problems may occur: the terminal that was open
before reopens but does not receive focus, so I cannot type, and the window
decorations for closing it are absent; or the Xfce panel is absent.
- Then I remove $HOME/.config/xfce4/xfconf/xfce-perchannel-xml/xfwm4.xml
again, and can log in once more.
And so on.

The following lines in $HOME/.cache/gdm/session.log appear when there is
a problem:

xfwm4: ../mesa-19.1.4/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1293: intel_miptree_match_image: Zusicherung »image->TexObject->Target == mt->target« nicht erfüllt.
(nm-applet:8046): nm-applet-WARNING **: 11:23:52.630: GDBus.Error:org.freedesktop.NetworkManager.AgentManager.PermissionDenied: An agent with this ID is already registered for this user.
xfwm4: ../mesa-19.1.4/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1293: intel_miptree_match_image: Zusicherung »image->TexObject->Target == mt->target« nicht erfüllt.
(nm-applet:8046): Gdk-CRITICAL **: 11:23:53.035: gdk_window_thaw_toplevel_updates: assertion 'window->update_and_descendants_freeze_count > 0' failed
xfwm4: ../mesa-19.1.4/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1293: intel_miptree_match_image: Zusicherung »image->TexObject->Target == mt->target« nicht erfüllt.
(nm-applet:8046): Gdk-CRITICAL **: 11:23:53.357: gdk_window_thaw_toplevel_updates: assertion 'window->update_and_descendants_freeze_count > 0' failed
xfwm4: ../mesa-19.1.4/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1293: intel_miptree_match_image: Zusicherung »image->TexObject->Target == mt->target« nicht erfüllt.
(nm-applet:8046): Gdk-CRITICAL **: 11:23:53.441: gdk_window_thaw_toplevel_updates: assertion 'window->update_and_descendants_freeze_count > 0' failed
(xfconfd:7966): xfconfd-CRITICAL **: 11:23:58.040: Name org.xfce.Xfconf lost on the message dbus, exiting.
(Thunar:8030): thunar-WARNING **: 11:23:58.041: Name »org.xfce.FileManager« auf dem Nachrichten-dbus verloren.
(tumblerd:8023): tumblerd-CRITICAL **: 11:23:58.041: Name org.freedesktop.thumbnails.Cache1 lost on the message dbus, exiting.
(Thunar:8030): thunar-WARNING **: 11:23:58.041: Name »org.freedesktop.FileManager1« auf dem Nachrichten-dbus verloren.
(tumblerd:8023): tumblerd-CRITICAL **: 11:23:58.041: Name org.freedesktop.thumbnails.Manager1 lost on the message dbus, exiting.
(tumblerd:8023): tumblerd-CRITICAL **: 11:23:58.041: Name org.freedesktop.thumbnails.Thumbnailer1 lost on the message dbus, exiting.

Sorry for the German, but you also have the translation:
"Zusicherung ... nicht erfüllt" = "assertion ... failed"
"auf dem Nachrichten-dbus verloren" = "lost on the message dbus"

Andreas
Ludovic Courtès wrote on 12 Sep 2019 13:40
(name . Andreas Enge)(address . andreas@enge.fr)(address . 36924@debbugs.gnu.org)
874l1hpwti.fsf@gnu.org
Hallo!

Thanks for the report, Andreas!

Andreas Enge <andreas@enge.fr> skribis:

> xfwm4: ../mesa-19.1.4/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1293: intel_miptree_match_image: Zusicherung »image->TexObject->Target == mt->target« nicht erfüllt.

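That’s the likely root cause to me (in which case it may be unrelated to
this <https://issues.guix.gnu.org/issue/36924>, after all.)

I found these bug reports:

  https://bugs.freedesktop.org/show_bug.cgi?id=107117
  https://bugzilla.redhat.com/show_bug.cgi?id=1678334
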
In both cases, Xfce and Mesa’s i965 drivers are involved, as is the case
on your machine. The 2nd bug report includes an xfwm4 patch, even.

I wonder if Xfce before the recent updates (so before
8549e0ca6fd68a57253471436de49b88b2d47e64) works better.

Andreas, if you feel like it, could you try:

guix pull --commit=97ce5964fb5d52cf2151fea685e28fa23a98b264
sudo guix system reconfigure …

?

Thanks,
Ludo’.
Andreas Enge wrote on 16 Sep 2019 11:44
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190916094454.GA1265@jurong
Hello,

On Thu, Sep 12, 2019 at 01:40:09PM +0200, Ludovic Courtès wrote:
> Thanks for the report, Andreas!

and thanks for the time spent putting me on the good track!

> Andreas Enge <andreas@enge.fr> skribis:
> > xfwm4: ../mesa-19.1.4/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1293: intel_miptree_match_image: Zusicherung »image->TexObject->Target == mt->target« nicht erfüllt.
> That’s the likely root cause to me (in which case it may be unrelated to
> this <https://issues.guix.gnu.org/issue/36924>, after all.)
> I found these bug reports:
> https://bugs.freedesktop.org/show_bug.cgi?id=107117
> https://bugzilla.redhat.com/show_bug.cgi?id=1678334
>
> In both cases, Xfce and Mesa’s i965 drivers are involved, as is the case
> on your machine. The 2nd bug report includes an xfwm4 patch, even.

The first one also contains a patch, but it has been integrated into later
mesa releases, in particular the one we are using.

> I wonder if Xfce before the recent updates (so before
> 8549e0ca6fd68a57253471436de49b88b2d47e64) works better.
> Andreas, if you feel like it, could you try:
> guix pull --commit=97ce5964fb5d52cf2151fea685e28fa23a98b264
> sudo guix system reconfigure …

Indeed, the problem disappears with this commit; I can log in and out
and in again with xfce working. So I am cc-ing the author of the commits
updating xfce, maybe they have an answer!

And I will try to look at the patch in the second report you reference
above.

Thanks!

Andreas
L p R n d n wrote on 16 Sep 2019 16:57
(name . Andreas Enge)(address . andreas@enge.fr)
878sqoe1bi.fsf@lprndn.info
Hello,

Andreas Enge <andreas@enge.fr> writes:

> Hello,
>
> On Thu, Sep 12, 2019 at 01:40:09PM +0200, Ludovic Courtès wrote:
>> Thanks for the report, Andreas!
>
> and thanks for the time spent putting me on the good track!
>
>> Andreas Enge <andreas@enge.fr> skribis:
>> > xfwm4: ../mesa-19.1.4/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1293:
>> intel_miptree_match_image: Zusicherung »image->TexObject->Target ==
>> mt->target« nicht erfüllt.
>> That’s the likely root cause to me (in which case it may be unrelated to
>> this <https://issues.guix.gnu.org/issue/36924>, after all.)
>> I found these bug reports:
>> https://bugs.freedesktop.org/show_bug.cgi?id=107117
>> https://bugzilla.redhat.com/show_bug.cgi?id=1678334
>>
>> In both cases, Xfce and Mesa’s i965 drivers are involved, as is the case
>> on your machine. The 2nd bug report includes an xfwm4 patch, even.
>
> The first one also contains a patch, but it has been integrated into later
> mesa releases, in particular the one we are using.
>
>> I wonder if Xfce before the recent updates (so before
>> 8549e0ca6fd68a57253471436de49b88b2d47e64) works better.
>> Andreas, if you feel like it, could you try:
>> guix pull --commit=97ce5964fb5d52cf2151fea685e28fa23a98b264
>> sudo guix system reconfigure …
>
> Indeed, the problem disappears with this commit; I can log in and out
> and in again with xfce working. So I am cc-ing the author of the commits
> updating xfce, maybe they have an answer!

It seems some bugs have been introduced in xfwm4 between 4.12 and 4.14.
(All issues previously linked are for >=4.13, which was the dev version of
4.14.)
interesting. Please let us know if it changes anything.

I don't know what would be the correct way to deal with the problem in
guix though.

> And I will try to look at the patch in the second report you reference
> above.
>
> Thanks!
>
> Andreas

Have a nice day,

L p r n d n
Andreas Enge wrote on 27 Dec 2019 08:25
Xfce not starting
(address . 36924@debbugs.gnu.org)
20191227072541.GA842@jurong
Hello,

after trying to reconfigure with commit 02b6382169192367e97a2d1bc72f8eb3ed38b0dc
of December 9, I am now running into a problem where I cannot log into my xfce
session under gdm any more: According to the first tty, a session is opened
and closed immediately again, and the gdm login screen reappears.

It is not enough to delete /var/lib/gdm, ~/.cache and ~/.local. Could I try
anything else?

Luckily, there is "guix system rollback" to my working configuration of
September, but I am not very comfortable with running such an old system...

Andreas
Ludovic Courtès wrote on 30 Dec 2019 20:00
(name . Andreas Enge)(address . andreas@enge.fr)(address . 36924@debbugs.gnu.org)
87o8vp63m0.fsf@gnu.org
Hi Andreas,

Andreas Enge <andreas@enge.fr> skribis:

> after trying to reconfigure with commit 02b6382169192367e97a2d1bc72f8eb3ed38b0dc
> of December 9, I am now running into a problem where I cannot log into my xfce
> session under gdm any more: According to the first tty, a session is opened
> and closed immediately again, and the gdm login screen reappears.

Did you try your config with the same commit in ‘guix system vm’? Does
it reproduce the problem?

If it does not, that means the problem has to do with state, things like
/var/lib/gdm as you mentioned.

Thanks,
Ludo’.

To comment on this conversation send email to 36924@debbugs.gnu.org