Can't restart/halt system with shepherd 0.9.3 after upgrading

  • Done
Details
3 participants
  • david larsson
  • Ludovic Courtès
  • Christopher Baines
Owner: unassigned
Submitted by: Christopher Baines
Severity: normal
Christopher Baines wrote on 24 May 2023 12:27
(address . bug-guix@gnu.org)
87h6s2m38h.fsf@cbaines.net
Hey!

On a system running shepherd 0.9.3 [1], I've reconfigured, but now can't
reboot or halt.

root@hamal ~# halt
Service root is not running.

1: /gnu/store/y6w0xix15cq08qasmq75f04yzgbl98jx-shepherd-0.9.3
Ludovic Courtès wrote on 25 May 2023 15:13
(name . Christopher Baines)(address . mail@cbaines.net)(address . 63678@debbugs.gnu.org)
878rdceeq5.fsf@gnu.org
Hi,

Christopher Baines <mail@cbaines.net> skribis:

> On a system running shepherd 0.9.3 [1], I've reconfigured, but now can't
> reboot or halt.
>
> root@hamal ~# halt
> Service root is not running.

Hey, why halt it if it’s not running?

Seriously though, any insight from /var/log/messages? I upgraded a
bunch of machines and didn’t hit this particular problem. Bruno
reported a similar problem with 0.9.3, but this had nothing to do with
the upgrade:

https://issues.guix.gnu.org/62619

Could it be the same problem? Do you see:

Assertion (eq? (canonical-name new) (canonical-name old)) failed.

in /var/log/messages?

Ludo’.
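To check for the assertion failure Ludo mentions, something like the following may help. This is a minimal sketch, not part of the original thread: the function name check_canonical_name_assertion is hypothetical, and it assumes the syslog file is /var/log/messages unless another path is passed as the first argument.

```shell
# Hypothetical helper: report whether the canonical-name assertion failure
# quoted in the thread appears in a syslog file.
# Assumption: the log is /var/log/messages unless a path is given.
check_canonical_name_assertion() {
  log_file="${1:-/var/log/messages}"
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 2
  fi
  # -F: match the message as a fixed string (it contains '?' and parens);
  # -q: stay quiet and stop at the first match.
  if grep -F -q 'Assertion (eq? (canonical-name new) (canonical-name old)) failed.' "$log_file"; then
    echo "assertion found"
  else
    echo "assertion not found"
  fi
}
```

Running it as root with no argument checks /var/log/messages; passing a path checks a copied or rotated log instead.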
Christopher Baines wrote on 25 May 2023 15:20
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 63678@debbugs.gnu.org)
87fs7k4kc1.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> On a system running shepherd 0.9.3 [1], I've reconfigured, but now can't
>> reboot or halt.
>>
>> root@hamal ~# halt
>> Service root is not running.
>
> Hey, why halt it if it’s not running?
>
> Seriously though, any insight from /var/log/messages? I upgraded a
> bunch of machines and didn’t hit this particular problem. Bruno
> reported a similar problem with 0.9.3, but this had nothing to do with
> the upgrade:
>
> https://issues.guix.gnu.org/62619
>
> Could it be the same problem? Do you see:
>
> Assertion (eq? (canonical-name new) (canonical-name old)) failed.
>
> in /var/log/messages?

I don't see that, but I think these are the relevant log messages:

May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (and (defined? (quote transient?)) (map (# ?) ?)).
May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (register-services (primitive-load "/gnu/st?") ?).
May 24 11:17:03 localhost shepherd[1]: Service host-name has been started.
May 24 11:17:03 localhost shepherd[1]: Service user-homes has been started.
May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_hardlinks = 1
May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_symlinks = 1
May 24 11:18:41 localhost shepherd[1]: Exiting shepherd...
May 24 11:18:46 localhost shepherd[1]: Grace period of 5 seconds is over; sending -337 SIGKILL.
May 24 11:23:55 localhost shepherd[1]: Service root is not running.
May 24 11:24:16 localhost last message repeated 2 times
May 24 11:30:49 localhost syslogd (GNU inetutils 2.3): restart
May 24 11:30:49 localhost vmunix: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
May 24 11:30:49 localhost vmunix: [ 0.000000] Linux version 6.3.3-arm64-generic (guix@guix) (gcc (GCC) 11.3.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT 1
Ludovic Courtès wrote on 27 May 2023 19:04
(name . Christopher Baines)(address . mail@cbaines.net)(address . 63678@debbugs.gnu.org)
87pm6lhfiy.fsf@gnu.org
Hi,

Christopher Baines <mail@cbaines.net> skribis:

> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (and (defined? (quote transient?)) (map (# ?) ?)).
> May 24 11:17:02 localhost shepherd[1]: Evaluating user expression (register-services (primitive-load "/gnu/st?") ?).
> May 24 11:17:03 localhost shepherd[1]: Service host-name has been started.
> May 24 11:17:03 localhost shepherd[1]: Service user-homes has been started.
> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_hardlinks = 1
> May 24 11:17:03 localhost shepherd[1]: [sysctl] fs.protected_symlinks = 1
> May 24 11:18:41 localhost shepherd[1]: Exiting shepherd...
> May 24 11:18:46 localhost shepherd[1]: Grace period of 5 seconds is over; sending -337 SIGKILL.
> May 24 11:23:55 localhost shepherd[1]: Service root is not running.

The grace period expiration thing is probably due to the fact that
shepherd is no longer processing signals, as I described here:

https://issues.guix.gnu.org/63736

Could you share all of /var/log/messages (possibly privately, and
limiting to “shepherd” lines) starting from when the machine booted?
I’d like to see if there are hints of something that went wrong.

Ludo’.
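A minimal sketch for pulling out the lines Ludo asks for, assuming the traditional syslog line format shown in the log excerpt above. The function name shepherd_log_lines is hypothetical; it keeps lines logged by shepherd[1] plus the kernel's "Booting Linux" lines, so each boot's starting point stays visible and the relevant range can be picked out by hand.

```shell
# Hypothetical helper: keep only shepherd (PID 1) lines, plus the kernel's
# "Booting Linux" lines so the boundary of each boot remains visible.
# Assumption: traditional syslog format, log at /var/log/messages by default.
shepherd_log_lines() {
  log_file="${1:-/var/log/messages}"
  # -F with several -e patterns: print lines matching any fixed string.
  grep -F -e 'shepherd[1]:' -e 'Booting Linux' "$log_file"
}
```

For example, shepherd_log_lines > shepherd-lines.txt produces a file that can be trimmed to the boot in question and shared privately.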
Christopher Baines wrote on 29 May 2023 20:33
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 63678@debbugs.gnu.org)
87leh7oufa.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> The grace period expiration thing is probably due to the fact that
> shepherd is no longer processing signals, as I described here:
>
> https://issues.guix.gnu.org/63736
>
> Could you share all of /var/log/messages (possibly privately, and
> limiting to “shepherd” lines) starting from when the machine booted?
> I’d like to see if there are hints of something that went wrong.

The machine is hamal (one of the HoneyCombs), and I've now added a user
for you with the SSH key from maintenance.git.

So you should be able to log in with: ssh ludo@hamal.cbaines.net

Your user's password is also in your home directory.
david larsson wrote on 29 May 2023 21:19
(name . Christopher Baines)(address . mail@cbaines.net)
a527fa97a7c7b9cf6a846b4923c47555@selfhosted.xyz
On 2023-05-24 12:27, Christopher Baines wrote:
> Hey!
>
> On a system running shepherd 0.9.3 [1], I've reconfigured, but now
> can't
> reboot or halt.
>
> root@hamal ~# halt
> Service root is not running.
>
> 1: /gnu/store/y6w0xix15cq08qasmq75f04yzgbl98jx-shepherd-0.9.3

FWIW, this has happened to me a bunch of times; I just never reported
it. Sometimes I was able to fix it by logging in as root and running
herd start root.

My impression, from the times I've experienced it, is that the root
service stops working not because of the system reconfigure but for
some other reason.


Best regards,
David
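David's workaround can be sketched as a small shell function, to be run as root. This is an illustration under stated assumptions rather than a documented procedure: the function name recover_and_halt and the HERD/HALT override variables are hypothetical, it assumes herd exits with a non-zero status when starting the service fails, and it assumes halt works again once root is running, which the thread suggests but does not confirm.

```shell
# Hypothetical helper sketching the workaround reported in the thread.
# HERD and HALT default to the real commands but can be overridden,
# e.g. HALT=some_function to skip the actual halt while trying it out.
recover_and_halt() {
  herd_cmd="${HERD:-herd}"
  halt_cmd="${HALT:-halt}"
  # Assumption: herd returns a non-zero exit status if the start fails.
  if ! "$herd_cmd" start root; then
    echo "could not start the root service" >&2
    return 1
  fi
  # With root running again, halting is expected (not guaranteed) to work.
  "$halt_cmd"
}
```

Keeping the commands overridable makes it possible to run just the first step, leaving the final halt to be done by hand.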
Ludovic Courtès wrote on 6 Jun 2023 17:06
(name . Christopher Baines)(address . mail@cbaines.net)(address . 63678@debbugs.gnu.org)
877csg8wbp.fsf@gnu.org
Hi,

Christopher Baines <mail@cbaines.net> skribis:

> The machine is hamal (one of the HoneyComb's) and I've added a user for
> you now and added the SSH key from maintenance.git.
>
> So you should be able to: ssh ludo@hamal.cbaines.net

Doesn’t work right now; anything in the logs?

Ludo’.
Christopher Baines wrote on 7 Jun 2023 16:09
(address . 63678@debbugs.gnu.org)
877csf1hyp.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Doesn’t work right now; anything in the logs?

I believe I sorted access for Ludo, but nothing was found when looking
at the logs.
Ludovic Courtès wrote on 20 Mar 19:09 +0100
(name . Christopher Baines)(address . mail@cbaines.net)(address . 63678-done@debbugs.gnu.org)
87y1acydu8.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:

> I believe I sorted access for Ludo, but nothing was found when looking
> at the logs.

I’m closing it. Let’s reopen if we stumble upon a similar issue.

Ludo’.
Closed

This issue is archived.

To comment on this conversation send an email to 63678@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 63678
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch