Can't restart/halt system with shepherd 0.9.3 after upgrading

  • Done
Details
3 participants
  • david larsson
  • Ludovic Courtès
  • Christopher Baines
Owner: unassigned
Submitted by: Christopher Baines
Severity: normal
Christopher Baines wrote on 24 May 2023 12:27
(address . bug-guix@gnu.org)
87h6s2m38h.fsf@cbaines.net
Hey!

On a system running shepherd 0.9.3 [1], I've reconfigured, but now can't
reboot or halt.

root@hamal ~# halt
Service root is not running.

1: /gnu/store/y6w0xix15cq08qasmq75f04yzgbl98jx-shepherd-0.9.3

Ludovic Courtès wrote on 25 May 2023 15:13
(name . Christopher Baines)(address . mail@cbaines.net)(address . 63678@debbugs.gnu.org)
878rdceeq5.fsf@gnu.org
Hi,

Christopher Baines <mail@cbaines.net> skribis:

> On a system running shepherd 0.9.3 [1], I've reconfigured, but now can't
> reboot or halt.
>
> root@hamal ~# halt
> Service root is not running.

Hey, why halt it if it’s not running?

Seriously though, any insight from /var/log/messages? I upgraded a
bunch of machines and didn’t hit this particular problem. Bruno
reported a similar problem with 0.9.3, but this had nothing to do with
the upgrade:

https://issues.guix.gnu.org/62619

Could it be the same problem? Do you see:

Assertion (eq? (canonical-name new) (canonical-name old)) failed.

in /var/log/messages?

Ludo’.
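Checking for the assertion Ludovic mentions (and for the reported symptom) does not require paging through the whole log. Below is a minimal Python sketch. It assumes the default /var/log/messages path and plain-text syslog lines. The helpers find_matches and scan_log are hypothetical and belong to neither shepherd nor Guix; the two search strings are copied verbatim from this thread.

```python
# Strings to look for, copied verbatim from this thread.
ASSERTION = "Assertion (eq? (canonical-name new) (canonical-name old)) failed."
ROOT_NOT_RUNNING = "Service root is not running."


def find_matches(lines, needle):
    """Return (line_number, text) pairs for every line containing needle."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


def scan_log(path="/var/log/messages"):
    """Scan a syslog file for the assertion and the 'root not running' symptom.

    Assumes a plain-text syslog file; errors="replace" keeps stray
    non-UTF-8 bytes from aborting the scan.
    """
    with open(path, encoding="utf-8", errors="replace") as log:
        lines = log.readlines()
    return {
        "assertion": find_matches(lines, ASSERTION),
        "root-not-running": find_matches(lines, ROOT_NOT_RUNNING),
    }
```

Run it as root, for example print(scan_log()). An empty "assertion" list means only that this particular message was not logged; it does not by itself rule out a related failure.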
Christopher Baines wrote on 25 May 2023 15:20
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 63678@debbugs.gnu.org)
87fs7k4kc1.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> On a system running shepherd 0.9.3 [1], I've reconfigured, but now can't
>> reboot or halt.
>>
>> root@hamal ~# halt
>> Service root is not running.
>
> Hey, why halt it if it’s not running?
>
> Seriously though, any insight from /var/log/messages? I upgraded a
> bunch of machines and didn’t hit this particular problem. Bruno
> reported a similar problem with 0.9.3, but this had nothing to do with
> the upgrade:
>
> https://issues.guix.gnu.org/62619
>
> Could it be the same problem? Do you see:
>
> Assertion (eq? (canonical-name new) (canonical-name old)) failed.
>
> in /var/log/messages?

I don't see that, but I think these are the relevant log messages:

May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (and (defined? (quote transient?)) (map (# ?) ?)).
May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (register-services (primitive-load "/gnu/st?") ?).
May 24 11:17:03 localhost shepherd[1]: Service host-name has been started.
May 24 11:17:03 localhost shepherd[1]: Service user-homes has been started.
May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_hardlinks = 1
May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_symlinks = 1
May 24 11:18:41 localhost shepherd[1]: Exiting shepherd...
May 24 11:18:46 localhost shepherd[1]: Grace period of 5 seconds is over; sending -337 SIGKILL.
May 24 11:23:55 localhost shepherd[1]: Service root is not running.
May 24 11:24:16 localhost last message repeated 2 times
May 24 11:30:49 localhost syslogd (GNU inetutils 2.3): restart
May 24 11:30:49 localhost vmunix: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
May 24 11:30:49 localhost vmunix: [ 0.000000] Linux version 6.3.3-arm64-generic (guix@guix) (gcc (GCC) 11.3.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT 1

Ludovic Courtès wrote on 27 May 2023 19:04
(name . Christopher Baines)(address . mail@cbaines.net)(address . 63678@debbugs.gnu.org)
87pm6lhfiy.fsf@gnu.org
Hi,

Christopher Baines <mail@cbaines.net> skribis:

> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (and (defined? (quote transient?)) (map (# ?) ?)).
> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (register-services (primitive-load "/gnu/st?") ?).
> May 24 11:17:03 localhost shepherd[1]: Service host-name has been started.
> May 24 11:17:03 localhost shepherd[1]: Service user-homes has been started.
> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_hardlinks = 1
> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_symlinks = 1
> May 24 11:18:41 localhost shepherd[1]: Exiting shepherd...
> May 24 11:18:46 localhost shepherd[1]: Grace period of 5 seconds is over; sending -337 SIGKILL.
> May 24 11:23:55 localhost shepherd[1]: Service root is not running.

The grace period expiration thing is probably due to the fact that
shepherd is no longer processing signals, as I described here:

https://issues.guix.gnu.org/63736

Could you share all of /var/log/messages (possibly privately, and
limiting to “shepherd” lines) starting from when the machine booted?
I’d like to see if there are hints of something that went wrong.

Ludo’.
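The extract Ludovic asks for (only shepherd lines, starting from a given boot) can be sketched in Python. This sketch assumes that each boot begins with the syslogd restart line seen in the excerpt quoted earlier ("syslogd (GNU inetutils 2.3): restart") and that shepherd's lines contain "shepherd[PID]:". The function name shepherd_lines_since_boot and its boots_back parameter are hypothetical.

```python
import re

# Assumption: every boot starts with syslogd's restart line, as in the
# excerpt from this thread: "syslogd (GNU inetutils 2.3): restart".
BOOT_MARKER = re.compile(r"syslogd \(GNU inetutils [^)]*\): restart")
# shepherd's lines look like "localhost shepherd[1]: ...".
SHEPHERD = re.compile(r"\bshepherd\[\d+\]:")


def shepherd_lines_since_boot(lines, boots_back=0):
    """Return the shepherd lines logged during one boot.

    boots_back=0 selects the most recent boot, 1 the boot before it, etc.
    (values past the oldest boot are clamped to it). The selected range
    ends where the next boot begins. If no boot marker is found, the whole
    input is scanned.
    """
    boot_starts = [i for i, line in enumerate(lines) if BOOT_MARKER.search(line)]
    if not boot_starts:
        segment = lines
    else:
        idx = max(len(boot_starts) - 1 - boots_back, 0)
        end = boot_starts[idx + 1] if idx + 1 < len(boot_starts) else len(lines)
        segment = lines[boot_starts[idx]:end]
    return [line.rstrip("\n") for line in segment if SHEPHERD.search(line)]
```

After a reboot, the failing session is the previous boot, so boots_back=1 is the interesting case, for example with open("/var/log/messages", errors="replace") as f: print("\n".join(shepherd_lines_since_boot(f.readlines(), boots_back=1))). Rotated logs (messages.1 and so on) would have to be concatenated first.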
Christopher Baines wrote on 29 May 2023 20:33
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 63678@debbugs.gnu.org)
87leh7oufa.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (and (defined? (quote transient?)) (map (# ?) ?)).
>> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (register-services (primitive-load "/gnu/st?") ?).
>> May 24 11:17:03 localhost shepherd[1]: Service host-name has been started.
>> May 24 11:17:03 localhost shepherd[1]: Service user-homes has been started.
>> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_hardlinks = 1
>> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_symlinks = 1
>> May 24 11:18:41 localhost shepherd[1]: Exiting shepherd...
>> May 24 11:18:46 localhost shepherd[1]: Grace period of 5 seconds is over; sending -337 SIGKILL.
>> May 24 11:23:55 localhost shepherd[1]: Service root is not running.
>
> The grace period expiration thing is probably due to the fact that
> shepherd is no longer processing signals, as I described here:
>
> https://issues.guix.gnu.org/63736
>
> Could you share all of /var/log/messages (possibly privately, and
> limiting to “shepherd” lines) starting from when the machine booted?
> I’d like to see if there are hints of something that went wrong.

The machine is hamal (one of the HoneyCombs) and I've added a user for
you now and added the SSH key from maintenance.git.

So you should be able to: ssh ludo@hamal.cbaines.net

Your user's password is also in your home directory.

david larsson wrote on 29 May 2023 21:19
(name . Christopher Baines)(address . mail@cbaines.net)
a527fa97a7c7b9cf6a846b4923c47555@selfhosted.xyz
On 2023-05-24 12:27, Christopher Baines wrote:
> Hey!
>
> On a system running shepherd 0.9.3 [1], I've reconfigured, but now
> can't
> reboot or halt.
>
> root@hamal ~# halt
> Service root is not running.
>
> 1: /gnu/store/y6w0xix15cq08qasmq75f04yzgbl98jx-shepherd-0.9.3

FWIW, this has happened to me a bunch of times; I just never reported
it. Sometimes I was able to log in as root and run herd start root
to fix it.

From the "bunch of times" I've run into this, I have the impression that
the root service stops working for some reason other than the system
reconfigure.


Best regards,
David
Ludovic Courtès wrote on 6 Jun 2023 17:06
(name . Christopher Baines)(address . mail@cbaines.net)(address . 63678@debbugs.gnu.org)
877csg8wbp.fsf@gnu.org
Hi,

Christopher Baines <mail@cbaines.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hi,
>>
>> Christopher Baines <mail@cbaines.net> skribis:
>>
>>> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (and (defined? (quote transient?)) (map (# ?) ?)).
>>> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (register-services (primitive-load "/gnu/st?") ?).
>>> May 24 11:17:03 localhost shepherd[1]: Service host-name has been started.
>>> May 24 11:17:03 localhost shepherd[1]: Service user-homes has been started.
>>> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_hardlinks = 1
>>> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_symlinks = 1
>>> May 24 11:18:41 localhost shepherd[1]: Exiting shepherd...
>>> May 24 11:18:46 localhost shepherd[1]: Grace period of 5 seconds is over; sending -337 SIGKILL.
>>> May 24 11:23:55 localhost shepherd[1]: Service root is not running.
>>
>> The grace period expiration thing is probably due to the fact that
>> shepherd is no longer processing signals, as I described here:
>>
>> https://issues.guix.gnu.org/63736
>>
>> Could you share all of /var/log/messages (possibly privately, and
>> limiting to “shepherd” lines) starting from when the machine booted?
>> I’d like to see if there are hints of something that went wrong.
>
> The machine is hamal (one of the HoneyCombs) and I've added a user for
> you now and added the SSH key from maintenance.git.
>
> So you should be able to: ssh ludo@hamal.cbaines.net

Doesn’t work right now; anything in the logs?

Ludo’.
Christopher Baines wrote on 7 Jun 2023 16:09
(address . 63678@debbugs.gnu.org)
877csf1hyp.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>>
>>> Hi,
>>>
>>> Christopher Baines <mail@cbaines.net> skribis:
>>>
>>>> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (and (defined? (quote transient?)) (map (# ?) ?)).
>>>> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (register-services (primitive-load "/gnu/st?") ?).
>>>> May 24 11:17:03 localhost shepherd[1]: Service host-name has been started.
>>>> May 24 11:17:03 localhost shepherd[1]: Service user-homes has been started.
>>>> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_hardlinks = 1
>>>> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_symlinks = 1
>>>> May 24 11:18:41 localhost shepherd[1]: Exiting shepherd...
>>>> May 24 11:18:46 localhost shepherd[1]: Grace period of 5 seconds is over; sending -337 SIGKILL.
>>>> May 24 11:23:55 localhost shepherd[1]: Service root is not running.
>>>
>>> The grace period expiration thing is probably due to the fact that
>>> shepherd is no longer processing signals, as I described here:
>>>
>>> https://issues.guix.gnu.org/63736
>>>
>>> Could you share all of /var/log/messages (possibly privately, and
>>> limiting to “shepherd” lines) starting from when the machine booted?
>>> I’d like to see if there are hints of something that went wrong.
>>
>> The machine is hamal (one of the HoneyCombs) and I've added a user for
>> you now and added the SSH key from maintenance.git.
>>
>> So you should be able to: ssh ludo@hamal.cbaines.net
>
> Doesn’t work right now; anything in the logs?

I believe I sorted access for Ludo, but nothing was found when looking
at the logs.

Ludovic Courtès wrote on 20 Mar 19:09 +0100
(name . Christopher Baines)(address . mail@cbaines.net)(address . 63678-done@debbugs.gnu.org)
87y1acydu8.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:

> I believe I sorted access for Ludo, but nothing was found when looking
> at the logs.

I’m closing it. Let’s reopen if we stumble upon a similar issue.

Ludo’.
Closed