System does not boot after switching to system-log service

Status: Open
Participants: Ludovic Courtès, Tomas Volf
Owner: unassigned
Submitted by: Tomas Volf
Severity: important

Tomas Volf wrote 6 days ago
(address . bug-guix@gnu.org)
87bjv267qp.fsf@wolfsden.cz
Hello,

after pulling recent Guix, I got this error during guix deploy:

guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>>

After rebooting, the system got stuck during startup. No error message
was visible; it was just hanging.

Booting to previous generation did work.

Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

Tomas Volf wrote 5 days ago
(address . 76315@debbugs.gnu.org)
87a5am3r5b.fsf@wolfsden.cz
I have put together a reproducer in a VM:

1. Install Guix system using 1.4.0 installer
--> Include sshd, openbox

2. Reboot
3. Copy the /run/current-system/configuration.scm out of the VM
4. Adjust the configuration.scm (full file attached)
4.1 Allow NOPASSWD sudo
    (sudoers-file
     (plain-file "sudoers"
                 (string-append (plain-file-content %sudoers-specification)
                                (format #f "x ALL = NOPASSWD: ALL~%"))))
4.2 Use %base-services, delete set-xorg-configuration service
4.3 Add dhcp-client-service-type service.
4.4 Authorize your key
    (simple-service
     'extra-authorized-keys guix-service-type
     (guix-extension
      (authorized-keys (list
                        (local-file "/etc/guix/signing-key.pub")))))

5. Manually tweak /etc/sudoers to support NOPASSWD for user x
6. Create machine configuration (full file attached)

7. Guix deploy the machine using b99df83c591104655a6b387817d8f7bb3c50204c
8. Reboot

9. Guix deploy the machine using 1afbf48b250f667ce45de40a6c275e3e42ade67c
--> See the following error:
building path(s) `/gnu/store/zdknxv3knkkxx52nwfbz120p32z4j2aa-upgrade-shepherd-services.scm'
building path(s) `/gnu/store/x7bzglpc0vvr5ak24k3i33ikq5ph8sfx-remote-exp.scm'
guix deploy: warning: an error occurred while upgrading services on 'localhost':
%exception #<inferior-object #<&service-not-found-error service: system-log>>

A. Reboot
--> The system does not come up (I gave it ~10 minutes).

;; This is an operating system configuration generated
;; by the graphical installer.
;;
;; Once installation is complete, you can learn and modify
;; this file to tweak the system configuration, and pass it
;; to the 'guix system reconfigure' command to effect your
;; changes.

;; Indicate which modules to import to access the variables
;; used in this configuration.
(use-modules (gnu))
(use-service-modules cups desktop networking ssh xorg)

(operating-system
  (locale "en_US.utf8")
  (timezone "Europe/Prague")
  (keyboard-layout (keyboard-layout "us"))
  (host-name "x")

  ;; The list of user accounts ('root' is implicit).
  (users (cons* (user-account
                  (name "x")
                  (comment "X")
                  (group "users")
                  (home-directory "/home/x")
                  (supplementary-groups '("wheel" "netdev" "audio" "video")))
                %base-user-accounts))

  ;; Packages installed system-wide. Users can also install packages
  ;; under their own account: use 'guix search KEYWORD' to search
  ;; for packages and 'guix install PACKAGE' to install a package.
  (packages (append (list (specification->package "openbox")
                          (specification->package "nss-certs"))
                    %base-packages))

  (sudoers-file
   (plain-file "sudoers"
               (string-append (plain-file-content %sudoers-specification)
                              (format #f "x ALL = NOPASSWD: ALL~%"))))

  ;; Below is the list of system services. To search for available
  ;; services, run 'guix system search KEYWORD' in a terminal.
  (services
   (append (list (service dhcp-client-service-type)

                 ;; To configure OpenSSH, pass an 'openssh-configuration'
                 ;; record as a second argument to 'service' below.
                 (service openssh-service-type)

                 (simple-service
                  'extra-authorized-keys guix-service-type
                  (guix-extension
                   (authorized-keys
                    (list (local-file "/etc/guix/signing-key.pub"))))))

           ;; This is the default list of services we
           ;; are appending to.
           %base-services))

  (bootloader (bootloader-configuration
                (bootloader grub-efi-bootloader)
                (targets (list "/boot/efi"))
                (keyboard-layout keyboard-layout)))
  (swap-devices (list (swap-space
                        (target (uuid "aa8dee07-5bf4-4ad2-8db7-8ee6139d6fc5")))))

  ;; The list of file systems that get "mounted". The unique
  ;; file system identifiers there ("UUIDs") can be obtained
  ;; by running 'blkid' in a terminal.
  (file-systems (cons* (file-system
                         (mount-point "/boot/efi")
                         (device (uuid "79EB-4D57" 'fat32))
                         (type "vfat"))
                       (file-system
                         (mount-point "/")
                         (device (uuid "11d0a98d-7200-4a9b-ae0a-0cb4db3e808d"
                                       'ext4))
                         (type "ext4"))
                       %base-file-systems)))

(use-modules (gnu))
(use-service-modules networking ssh)
(use-package-modules bootloaders)

(list (machine
       (operating-system (primitive-load "config.scm"))
       (environment managed-host-environment-type)
       (configuration (machine-ssh-configuration
                       (build-locally? #f)
                       (host-name "localhost")
                       (system "x86_64-linux")
                       (user "x")
                       (port 8888)))))

Ludovic Courtès wrote 5 days ago
control message for bug #76315
(address . control@debbugs.gnu.org)
87zfilok9k.fsf@gnu.org
severity 76315 important
quit
Ludovic Courtès wrote 5 days ago
Re: bug#76315: System does not boot after switching to system-log service
(name . Tomas Volf)(address . ~@wolfsden.cz)(address . 76315@debbugs.gnu.org)
87seodo9w5.fsf@gnu.org
Hi,

Tomas Volf <~@wolfsden.cz> skribis:

> A. Reboot
> --> The system does not come up (I gave it ~10 minutes).

I tried the config file you gave with:

./pre-inst-env guix system vm /tmp/config.scm

and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
since June, and “make check-system TESTS=basic” & co. pass).

I’ll keep investigating and probably revert the change in the interim.

Ludo’.
Ludovic Courtès wrote 5 days ago
(name . Tomas Volf)(address . ~@wolfsden.cz)(address . 76315@debbugs.gnu.org)
87o6z1o7jq.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> I’ll keep investigating and probably revert the change in the interim.

Reverted in 8c483c12e94bcf43e4c44170f1d5fea5fbba4970.

Ludo'.
Ludovic Courtès wrote 40 hours ago
(name . Tomas Volf)(address . ~@wolfsden.cz)(address . 76315@debbugs.gnu.org)
87ikp5hciz.fsf@gnu.org
Hey Tomas,

Ludovic Courtès <ludo@gnu.org> skribis:

> I tried the config file you gave with:
>
> ./pre-inst-env guix system vm /tmp/config.scm
>
> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
> since June, and “make check-system TESTS=basic” & co. pass).

After spending hours on this and fixing improbable issues in the
Shepherd (will push shortly), I found that the root of the problem is
exactly what I feared and which led to the patches at
https://issues.guix.gnu.org/76262.

Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
it loses the race and waits forever. (I’m using
‘network-manager-service-type’ on my laptop, which is why I did not
stumble upon this bug.)

Could you try your config with the patch at
https://issues.guix.gnu.org/76262#2, at least in a VM and ideally on
the metal?

Thanks in advance,
Ludo’.
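
A minimal sketch of the underlying POSIX behaviour behind the race described above, written in plain Guile and run outside of shepherd. It is not shepherd's or the DHCP client service's actual code, and 'reap-twice' is a hypothetical helper made up for this illustration. A child's exit status can be collected only once, so when two callers wait for the same PID, the one that arrives second has nothing left to collect. In plain Guile that late call fails with ECHILD; inside shepherd, according to the message above, the losing call instead waits forever.

```scheme
;; Sketch only: plain Guile, outside of shepherd.  'reap-twice' is a
;; hypothetical helper written for this illustration.
(define (reap-twice)
  "Fork a child that exits immediately, then wait for it twice.  Return the
errno of the second 'waitpid' call, or #f if that call succeeded."
  (let ((pid (primitive-fork)))
    (if (zero? pid)
        (primitive-exit 0)               ;child: terminate right away
        (begin
          ;; First waiter: stands in for whichever caller wins the race.
          (waitpid pid)
          ;; Second waiter: the child has already been reaped, so there
          ;; is no status left to collect for this PID.
          (catch 'system-error
            (lambda ()
              (waitpid pid)
              #f)
            (lambda args
              (system-error-errno args)))))))

(let ((errno (reap-twice)))
  (format #t "second waitpid: ~a~%"
          (if errno (strerror errno) "succeeded")))
```

The point of the sketch is only that the second waiter cannot observe the child's termination through 'waitpid' once another caller has reaped it.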
Ludovic Courtès wrote 40 hours ago
(name . Tomas Volf)(address . ~@wolfsden.cz)(address . 76315@debbugs.gnu.org)
87eczthcd2.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?

You need to do that on top of a pre-revert commit, such as
eba8c08b1bfc7ac333a0eda658a0be5acac7f151.
Tomas Volf wrote 15 hours ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 76315@debbugs.gnu.org)
87pljcwbe4.fsf@wolfsden.cz
Ludovic Courtès <ludo@gnu.org> writes:

> Hey Tomas,
>
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> I tried the config file you gave with:
>>
>> ./pre-inst-env guix system vm /tmp/config.scm
>>
>> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
>> since June, and “make check-system TESTS=basic” & co. pass).
>
> After spending hours on this and fixing improbable issues in the
> Shepherd (will push shortly), I found that the root of the problem is
> exactly what I feared and which led to the patches at
> <https://issues.guix.gnu.org/76262>.
>
> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
> it loses the race and waits forever.

Observation here. While yes, based on the description I agree that it
is (bad) luck based, in practice it seems to be extremely reliable to
reproduce.

At first I struggled to reproduce it again: it did not hang even a single
time (out of 5 tries) on the bad commit. But once I reverted my
configuration to what it was back then (== removed a few shepherd timers),
the hang started happening every single time.

So, while in theory it should be a probabilistic problem, in practice it
does not seem to be the case. Not sure where I am going with this, I
just think it is interesting.

>
> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?

I have reverted your revert and applied the patch 2 on top of that.

Steps I took (both in VM and on a spare laptop):

1. Reconfigure from commit 1.
2. Ensure it still hangs (5x).
3. Reconfigure from commit 2.
4. Ensure it no longer hangs (5x).

I can confirm the patch 2 fixes the issue for me, both in the VM and on
physical machine.

Only thing I have noticed that even when deploying the "good" commit, I
see the following error in the log:

guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>>

The system comes up fine after reboot though.

>
> Thanks in advance,
> Ludo’.

Thank you for figuring this one out. :)

Tomas


Ludovic Courtès wrote 100 minutes ago
(name . Tomas Volf)(address . ~@wolfsden.cz)(address . 76315@debbugs.gnu.org)
8734g7a6o3.fsf@gnu.org
Hi,

Tomas Volf <~@wolfsden.cz> skribis:

>> After spending hours on this and fixing improbable issues in the
>> Shepherd (will push shortly), I found that the root of the problem is
>> exactly what I feared and which led to the patches at
>> <https://issues.guix.gnu.org/76262>.
>>
>> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
>> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
>> it loses the race and waits forever.
>
> Observation here. While yes, based on the description I agree that it
> is (bad) luck based, in practice it seems to be extremely reliable to
> reproduce.

Yes, I could reproduce it 100% with just ‘bare-bones.tmpl’. Thing is,
as soon as you would change something non-trivial, for instance the
‘message-destination’ procedure of shepherd so that it writes everything
to /dev/console, the problem would go away. Even just commenting out
some of the parameters passed to ‘system-log’ could make the problem
disappear (!), which is why it took me a lot of time to figure it out.

>> Could you try your config with the patch at
>> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
>> the metal?

[...]

> I can confirm the patch 2 fixes the issue for me, both in the VM and on
> physical machine.

Yay!

> Only thing I have noticed that even when deploying the "good" commit, I
> see the following error in the log:
>
> guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
> %exception #<inferior-object #<&service-not-found-error service: system-log>>

I think I understood this one now.

The old service has only one name: syslogd. The new one, which upgrades
it, has two names: system-log and syslogd (system-log is its “canonical
name”).

The service upgrade machinery gets confused because it uses the
canonical name in one place.

I’ll investigate.

Ludo’.
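
The name mismatch described above can be illustrated with a small sketch in plain Guile. This is not the actual service upgrade code: the data layout and the 'lookup-running' procedure are hypothetical, and treating the first provided name as the canonical one is an assumption made for this example. It only shows why a lookup keyed on the canonical name alone would miss a running service that is known solely as syslogd.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical data for illustration; not the actual upgrade code.
;; Each running service is represented by the list of names it provides.
(define running-services '((syslogd)))

;; Names provided by the new service; the first one is assumed here to be
;; its canonical name.
(define new-service-names '(system-log syslogd))

(define (lookup-running name)
  "Return the running service that provides NAME, or #f if there is none."
  (find (lambda (names) (memq name names))
        running-services))

;; Looking up only the canonical name misses the old service...
(lookup-running (first new-service-names))   ;=> #f

;; ...whereas trying every provided name finds it.
(any lookup-running new-service-names)       ;=> (syslogd)
```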