shepherd exits for no good reason

  • Done
Details
3 participants
  • Ludovic Courtès
  • Mathieu Othacehe
  • Mathieu Othacehe
Owner: unassigned
Submitted by: Ludovic Courtès
Severity: important
Ludovic Courtès wrote on 7 May 2020 12:32
(address . bug-guix@gnu.org)
87bln0doh2.fsf@inria.fr
Hello,

I witnessed a case with Shepherd 0.8.0 on ‘core-updates’
(7b07852ddb334c92bcef69666f21c599f1f0fa79) where shepherd exited all by
itself, all of a sudden. Here’s what /var/log/messages shows:

May 7 09:36:23 localhost vmunix: [ 20.316829] shepherd[1]: Service user-homes has been started.
May 7 09:36:23 localhost vmunix: [ 21.319625] shepherd[1]: Service nscd has been started.
May 7 09:36:23 localhost vmunix: [ 21.321029] shepherd[1]: Service guix-daemon has been started.

[…]

May 7 09:36:52 localhost shepherd[1]: Exiting shepherd...
May 7 09:36:52 localhost shepherd[1]: Service xorg-server has been stopped.
May 7 09:36:52 localhost shepherd[1]: Service console-font-tty2 has been stopped.
May 7 09:36:52 localhost shepherd[1]: Service term-tty2 has been stopped.
May 7 09:36:52 localhost shepherd[1]: Service upower-daemon has been stopped.
May 7 09:36:52 localhost shepherd[1]: Service elogind has been stopped.
May 7 09:36:52 localhost ntpd[482]: ntpd exiting on signal 15 (Terminated)
May 7 09:36:52 localhost syslogd: exiting on signal 15

The end result was a kernel panic with exitcode=0x100 (meaning exited
with 1).
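For readers unfamiliar with the encoding, here is a minimal sketch of how a value such as 0x100 maps to "exited with 1", assuming the conventional Linux wait-status layout (exit status in bits 8-15, terminating signal in the low 7 bits). The function name `decode_wait_status` is made up for illustration and is not part of the Shepherd or the kernel.

```python
def decode_wait_status(status):
    """Decode a Linux-style wait status into (kind, value).

    Assumed layout (the conventional Linux encoding):
      - low byte == 0x7f  -> process stopped; signal in bits 8-15
      - low 7 bits == 0   -> normal exit; exit status in bits 8-15
      - otherwise         -> killed by the signal in the low 7 bits
    """
    if status & 0xFF == 0x7F:
        return ("stopped", (status >> 8) & 0xFF)
    signal = status & 0x7F
    if signal == 0:
        return ("exited", (status >> 8) & 0xFF)
    return ("signaled", signal)


# The value reported in the panic message above.
print(decode_wait_status(0x100))  # ('exited', 1)
```

With the same layout, a process killed by signal 15 (as ntpd and syslogd were in the log) would have a status of 15, which decodes to ("signaled", 15).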

It looks as though one had run ‘herd stop root’.

Ludo’.
Mathieu Othacehe wrote on 7 May 2020 18:55
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 41123@debbugs.gnu.org)
87imh7elch.fsf@gmail.com
Hey Ludo,

> May 7 09:36:52 localhost shepherd[1]: Exiting shepherd...
> May 7 09:36:52 localhost shepherd[1]: Service xorg-server has been stopped.
> May 7 09:36:52 localhost shepherd[1]: Service console-font-tty2 has been stopped.
> May 7 09:36:52 localhost shepherd[1]: Service term-tty2 has been stopped.
> May 7 09:36:52 localhost shepherd[1]: Service upower-daemon has been stopped.
> May 7 09:36:52 localhost shepherd[1]: Service elogind has been stopped.
> May 7 09:36:52 localhost ntpd[482]: ntpd exiting on signal 15 (Terminated)
> May 7 09:36:52 localhost syslogd: exiting on signal 15
>
> The end result was a kernel panic with exitcode=0x100 (meaning exited
> with 1).
>
> It looks as though one had run ‘herd stop root’.

It could be related to this bug[1]. The problem is that on 0.8.0 a
process restart can cause a root-service stop.

On your log, I can't see a process being restarted, so it might also be
unrelated.
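
As a rough way to check this against the excerpt, the sketch below sorts the shepherd[1] lines quoted in this thread into "started", "stopped", and everything else. It only looks at the lines shown here (the part elided with […] is not covered), and the names `classify`, `SERVICE_EVENT`, and `SHEPHERD_LINE` are hypothetical helpers, not anything provided by the Shepherd.

```python
import re

# Log lines copied from the excerpt quoted in this thread
# (the elided "[…]" portion is not available and is not included).
LOG = """\
May 7 09:36:23 localhost vmunix: [ 20.316829] shepherd[1]: Service user-homes has been started.
May 7 09:36:23 localhost vmunix: [ 21.319625] shepherd[1]: Service nscd has been started.
May 7 09:36:23 localhost vmunix: [ 21.321029] shepherd[1]: Service guix-daemon has been started.
May 7 09:36:52 localhost shepherd[1]: Exiting shepherd...
May 7 09:36:52 localhost shepherd[1]: Service xorg-server has been stopped.
May 7 09:36:52 localhost shepherd[1]: Service console-font-tty2 has been stopped.
May 7 09:36:52 localhost shepherd[1]: Service term-tty2 has been stopped.
May 7 09:36:52 localhost shepherd[1]: Service upower-daemon has been stopped.
May 7 09:36:52 localhost shepherd[1]: Service elogind has been stopped.
May 7 09:36:52 localhost ntpd[482]: ntpd exiting on signal 15 (Terminated)
May 7 09:36:52 localhost syslogd: exiting on signal 15
"""

# Any message emitted by PID 1.
SHEPHERD_LINE = re.compile(r"shepherd\[1\]: (.*)$")
# The two message shapes that appear in the excerpt.
SERVICE_EVENT = re.compile(
    r"shepherd\[1\]: Service (\S+) has been (started|stopped)\.$")


def classify(lines):
    """Return (started, stopped, other) for shepherd[1] log lines."""
    started, stopped, other = [], [], []
    for line in lines:
        message = SHEPHERD_LINE.search(line)
        if message is None:
            continue  # not a shepherd[1] line (e.g. ntpd, syslogd)
        event = SERVICE_EVENT.search(line)
        if event is None:
            other.append(message.group(1))
        elif event.group(2) == "started":
            started.append(event.group(1))
        else:
            stopped.append(event.group(1))
    return started, stopped, other


started, stopped, other = classify(LOG.splitlines())
print("started:", started)
print("stopped:", stopped)
print("other:  ", other)
```

On the quoted lines, the only shepherd[1] message that is neither a start nor a stop is "Exiting shepherd...", which matches the observation that no restart is visible in the excerpt.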

Mathieu

Ludovic Courtès wrote on 10 May 2020 12:38
(name . Mathieu Othacehe)(address . m.othacehe@gmail.com)(address . 41123@debbugs.gnu.org)
87lfm09isl.fsf@gnu.org
Hi,

Mathieu Othacehe <m.othacehe@gmail.com> skribis:

>> May 7 09:36:52 localhost shepherd[1]: Exiting shepherd...
>> May 7 09:36:52 localhost shepherd[1]: Service xorg-server has been stopped.
>> May 7 09:36:52 localhost shepherd[1]: Service console-font-tty2 has been stopped.
>> May 7 09:36:52 localhost shepherd[1]: Service term-tty2 has been stopped.
>> May 7 09:36:52 localhost shepherd[1]: Service upower-daemon has been stopped.
>> May 7 09:36:52 localhost shepherd[1]: Service elogind has been stopped.
>> May 7 09:36:52 localhost ntpd[482]: ntpd exiting on signal 15 (Terminated)
>> May 7 09:36:52 localhost syslogd: exiting on signal 15
>>
>> The end result was a kernel panic with exitcode=0x100 (meaning exited
>> with 1).
>>
>> It looks as though one had run ‘herd stop root’.
>
> It could be related to this bug[1]. The problem is that on 0.8.0 a
> process restart can cause a root-service stop.
>
> On your log, I can't see a process being restarted, so it might also be
> unrelated.

It looks very much the same though: it’s stopping itself, which most
likely happens as a result of killing itself. I’ve merged them, we’ll
see!

Ludo’.
Ludovic Courtès wrote on 14 May 2020 14:19
control message for bug #41123
(address . control@debbugs.gnu.org)
87zhaavhds.fsf@gnu.org
severity 41123 important
quit
Mathieu Othacehe wrote on 26 Jul 2020 22:28
Re: bug#41123: shepherd exits for no good reason
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 41123-done@debbugs.gnu.org)
875zaa81ii.fsf@gnu.org
Hey,

> It looks very much the same though: it’s stopping itself, which most
> likely happens as a result of killing itself. I’ve merged them, we’ll
> see!

This is probably fixed in Shepherd 0.8.1. Closing this one.

Thanks,

Mathieu
Closed