Stopping gdm-service results in an unresponsive system

  • Open
Details
2 participants
  • Mark H Weaver
  • Timothy Sample
Owner
unassigned
Submitted by
Mark H Weaver
Severity
normal
Mark H Weaver wrote on 30 Apr 2019 22:42
(address . bug-guix@gnu.org)
877ebbmdhc.fsf@netris.org
On my x86_64-linux system running the Guix system, when I include
gdm-service in my system services, 'herd stop xorg-server' results in a
state where I seemingly cannot recover except by rebooting. I'm left in
what appears to be an empty Linux text console with a cursor in the top
left corner, but the keyboard is unresponsive, and I'm not able to
switch VTs. Perhaps there is some SysRq key combination that could be
used to recover, but I haven't yet tried.

Since I prefer to use Wayland, and would rather not have a separate Xorg
session running that I never use, this means that currently I must avoid
using 'gdm-service' entirely.

Note that this is on a system running fairly recent 'master', but before
'staging' was merged. I'll try again and report back after I've
finished rebuilding my post-staging-merge system.

Regards,
Mark
Timothy Sample wrote on 1 May 2019 14:36
(name . Mark H Weaver)(address . mhw@netris.org)(address . 35509@debbugs.gnu.org)
87imuupd0h.fsf@ngyro.com
Hi Mark,

Mark H Weaver <mhw@netris.org> writes:

> On my x86_64-linux system running the Guix system, when I include
> gdm-service in my system services, 'herd stop xorg-server' results in a
> state where I seemingly cannot recover except by rebooting. I'm left in
> what appears to be an empty Linux text console with a cursor in the top
> left corner, but the keyboard is unresponsive, and I'm not able to
> switch VTs. Perhaps there is some SysRq key combination that could be
> used to recover, but I haven't yet tried.

This has been an issue with GDM since I started working on it. IIRC,
it’s not entirely deterministic (but it fails most of the time). I
don’t have any leads on this yet.

> Since I prefer to use Wayland, and would rather not have a separate Xorg
> session running that I never use, this means that currently I must avoid
> using 'gdm-service' entirely.

Yes. The service does not currently support Wayland. I believe that
Wayland support will require a few modifications to GDM itself. At
least, we ended up modifying the X session startup a little bit, and I
would guess that the Wayland session startup would need similar changes.

> Note that this is on a system running fairly recent 'master', but before
> 'staging' was merged. I'll try again and report back after I've
> finished rebuilding my post-staging-merge system.

Unfortunately, I wouldn’t expect the changes from staging to help here.


-- Tim
Timothy Sample wrote on 2 May 2019 21:45
(name . Mark H Weaver)(address . mhw@netris.org)(address . 35509@debbugs.gnu.org)
87ef5gprmy.fsf@ngyro.com
Hi again,

Timothy Sample <samplet@ngyro.com> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>> On my x86_64-linux system running the Guix system, when I include
>> gdm-service in my system services, 'herd stop xorg-server' results in a
>> state where I seemingly cannot recover except by rebooting. I'm left in
>> what appears to be an empty Linux text console with a cursor in the top
>> left corner, but the keyboard is unresponsive, and I'm not able to
>> switch VTs. Perhaps there is some SysRq key combination that could be
>> used to recover, but I haven't yet tried.
>
> This has been an issue with GDM since I started working on it. IIRC,
> it’s not entirely deterministic (but it fails most of the time). I
> don’t have any leads on this yet.

I have a lead now! At least, I have a way to stop GDM and return to a
working TTY. Assuming that you are working on a TTY with elogind
session “c1”, you can run

herd stop xorg-server & (sleep 5; loginctl activate c1)

When GDM exits, it leaves the display in a non-working state. It turns
out elogind knows how to fix this. I’m guessing it does some magic with
the VT_* set of ioctl requests (see “src/basic/terminal-util.c” from
elogind). I’m not sure how to get GDM to clean up after itself, though.
It might be expecting things of elogind that it doesn’t provide (since
it is not exactly like the original logind from systemd).
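
The workaround above can be wrapped in a small script. This is only a sketch: the function name stop_xorg_and_reactivate, the run helper, and the DRY_RUN variable are made up for illustration, and the session name and delay default to the "c1" and 5 seconds used in the command above. With DRY_RUN=1 it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative, not part of Guix,
# the Shepherd, or elogind.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Stop xorg-server in the background, wait DELAY seconds, then ask elogind
# to activate SESSION (same shape as the one-liner in the message).
stop_xorg_and_reactivate() {
    session=${1:-c1}
    delay=${2:-5}
    run herd stop xorg-server &
    run sleep "$delay"
    run loginctl activate "$session"
    wait    # collect the backgrounded 'herd stop'
}
```

For example, "stop_xorg_and_reactivate c1 5" would be run from a TTY whose elogind session is c1. Whether 5 seconds is always long enough is an open question.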


-- Tim
Mark H Weaver wrote on 2 May 2019 23:46
(name . Timothy Sample)(address . samplet@ngyro.com)(address . 35509@debbugs.gnu.org)
877eb8o7fx.fsf@netris.org
Hi Timothy,

Timothy Sample <samplet@ngyro.com> writes:

> I have a lead now! At least, I have a way to stop GDM and return to a
> working TTY. Assuming that you are working on a TTY with elogind
> session “c1”, you can run
>
> herd stop xorg-server & (sleep 5; loginctl activate c1)
>
> When GDM exits, it leaves the display in a non-working state. It turns
> out elogind knows how to fix this. I’m guessing it does some magic with
> the VT_* set of ioctl requests (see “src/basic/terminal-util.c” from
> elogind). I’m not sure how to get GDM to clean up after itself, though.
> It might be expecting things of elogind that it doesn’t provide (since
> it is not exactly like the original logind from systemd).

Thanks for investigating!

My first guess is that when GDM is killed, it's leaving the keyboard
in RAW mode. Running "kbd_mode -a" might be another way to recover.
"Alt + SysRq + r" might be another way. I'll try again after I finish
building my post-staging-merge system.

https://www.tldp.org/HOWTO/Keyboard-and-Console-HOWTO-9.html
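
A rough sketch of the kbd_mode idea, untested against this bug. It assumes that kbd_mode from the kbd package is on PATH, that it is run with enough privileges against the affected console, and that its status line contains the word "raw" when the keyboard is in raw or mediumraw mode. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names are illustrative.

# Succeed if a kbd_mode status line reports raw or mediumraw mode.
# Assumption: the status text contains "raw" in those modes.
keyboard_is_raw() {
    case "$1" in
        *raw*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Put the keyboard back into ASCII (XLATE) mode if it looks raw.
reset_keyboard_if_raw() {
    status=$(kbd_mode) || return 1
    if keyboard_is_raw "$status"; then
        kbd_mode -a
    fi
}
```

Since the keyboard is unusable once the problem occurs, something like reset_keyboard_if_raw would have to be scheduled beforehand or run from a remote login.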

I notice that in Debian's start script for gdm3, it runs activate_logind
just before launching GDM, where activate_logind is the following Bash
function:

activate_logind() {
    # Try to dbus activate logind to avoid a race conditions if we are not
    # running systemd as PID1 and we have systemd << 204 package installed (see:
    # #747292)
    if [ ! -d /run/systemd/system ] && [ -x /lib/systemd/systemd-logind-launch ]; then
        dbus-send --system --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus \
            org.freedesktop.DBus.StartServiceByName string:org.freedesktop.login1 uint32:0 2>&1 > /dev/null
    fi
}

The Debian start script is debian/gdm3.init in
<http://deb.debian.org/debian/pool/main/g/gdm3/gdm3_3.22.3-3+deb9u2.debian.tar.xz>.

The Debian bug referenced above is https://bugs.debian.org/747292.

Might be worth a try, but admittedly I'm grasping at straws here :)

Mark
Timothy Sample wrote on 3 May 2019 04:15
(name . Mark H Weaver)(address . mhw@netris.org)(address . 35509@debbugs.gnu.org)
87bm0kz3kf.fsf@ngyro.com
Hi Mark,

Mark H Weaver <mhw@netris.org> writes:

> Timothy Sample <samplet@ngyro.com> writes:
>
>> I have a lead now! At least, I have a way to stop GDM and return to a
>> working TTY. Assuming that you are working on a TTY with elogind
>> session “c1”, you can run
>>
>> herd stop xorg-server & (sleep 5; loginctl activate c1)
>>
>> When GDM exits, it leaves the display in a non-working state. It turns
>> out elogind knows how to fix this. I’m guessing it does some magic with
>> the VT_* set of ioctl requests (see “src/basic/terminal-util.c” from
>> elogind). I’m not sure how to get GDM to clean up after itself, though.
>> It might be expecting things of elogind that it doesn’t provide (since
>> it is not exactly like the original logind from systemd).
>
> Thanks for investigating!
>
> My first guess is that when GDM is killed, it's leaving the keyboard
> in RAW mode. Running "kbd_mode -a" might be another way to recover.
> "Alt + SysRq + r" might be another way. I'll try again after I finish
> building my post-staging-merge system.
>
> https://www.tldp.org/HOWTO/Keyboard-and-Console-HOWTO-9.html

Indeed. I saw this earlier today. I looked at the source for elogind,
and all it does is “VT_ACTIVATE” – no magic there. The loginctl command
can be replaced with “chvt 1”. The “SysRq + r” trick works too. In
fact, I saw this in the X.org logs:

--------------------------------
(II) UnloadModule: "libinput"
(II) systemd-logind: releasing fd for 13:67
(EE) systemd-logind: failed to release device: Unknown object '/org/freedesktop/login1/session/c4'.
(II) UnloadModule: "libinput"
(II) systemd-logind: releasing fd for 13:68
(EE) systemd-logind: failed to release device: Unknown object '/org/freedesktop/login1/session/c4'.
(II) UnloadModule: "libinput"
(II) systemd-logind: releasing fd for 13:65
(EE) systemd-logind: failed to release device: Unknown object '/org/freedesktop/login1/session/c4'.
(II) UnloadModule: "libinput"
(II) systemd-logind: releasing fd for 13:64
(EE) systemd-logind: failed to release device: Unknown object '/org/freedesktop/login1/session/c4'.
(EE) systemd-logind: ReleaseControl failed: Unknown object '/org/freedesktop/login1/session/c4'.
(II) Server terminated successfully (0). Closing log file.
--------------------------------

I wonder if GDM is destroying the session before X can call its
“ReleaseControl” method. Maybe this keeps X from restoring the terminal
properly.
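
To check for the same symptom on another machine, the release failures in the X server log can be counted. A minimal sketch: the function name is made up, and the log path is left as an argument because where the log lives depends on how X is started.

```shell
#!/bin/sh
# Sketch only: the function name is illustrative.

# Print how many "failed to release device" errors LOGFILE contains.
# Note: grep -c exits with status 1 when the count is 0.
count_release_failures() {
    grep -c 'systemd-logind: failed to release device' "$1"
}
```

A non-zero count right before "Server terminated successfully" would match the log excerpt above.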

> I notice that in Debian's start script for gdm3, it runs activate_logind
> just before launching GDM, where activate_logind is the following Bash
> function:
>
> activate_logind() {
>     # Try to dbus activate logind to avoid a race conditions if we are not
>     # running systemd as PID1 and we have systemd << 204 package installed (see:
>     # #747292)
>     if [ ! -d /run/systemd/system ] && [ -x /lib/systemd/systemd-logind-launch ]; then
>         dbus-send --system --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus \
>             org.freedesktop.DBus.StartServiceByName string:org.freedesktop.login1 uint32:0 2>&1 > /dev/null
>     fi
> }
>
> The Debian start script is debian/gdm3.init in
> <http://deb.debian.org/debian/pool/main/g/gdm3/gdm3_3.22.3-3+deb9u2.debian.tar.xz>.
>
> The Debian bug referenced above is <https://bugs.debian.org/747292>.
>
> Might be worth a try, but admittedly I'm grasping at straws here :)

I gave this a try and... it didn’t help. :(

Looking a little closer at the systemd source, I found out that they
have logic to reset terminal settings when a service becomes “dead” (see
“exec_context_revert_tty” as called from “service_enter_dead” in the
file “src/core/service.c”). I wonder if GDM relies on that.


-- Tim