Interactive prompt opened upon shepherd config file error

Status: Done
Participants: Ludovic Courtès (two addresses), Christopher Baines
Owner: unassigned
Submitted by: Ludovic Courtès
Severity: important
Ludovic Courtès wrote on 23 May 2024 12:59
(address . bug-guix@gnu.org)
87sey894kj.fsf@inria.fr
Hello,

One problem we noticed in the analysis of the boot problem of bayfront
after the recent downtime¹ is that an interactive REPL would be opened
after an unbound variable was found in the shepherd config file:

[ 13.098907] shepherd[1]: Service root started.
[ 13.100711] shepherd[1]: Service root running with value #t.
[ 13.103824] shepherd[1]: Service root has been started.
[ 13.426102] shepherd[1]: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
[ 13.428099] shepherd[1]: Unbound variable: make-forkexec-constructor/container
[ 13.429912] shepherd[1]:
[ 13.431108] shepherd[1]: Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue.
[ 13.441983] shepherd[1]: GNU Guile 3.0.9
[ 13.442728] shepherd[1]: Copyright (C) 1995-2023 Free Software Foundation, Inc.
[ 13.443947] shepherd[1]:
[ 13.444427] shepherd[1]: Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
[ 13.445679] shepherd[1]: This program is free software, and you are welcome to redistribute it
[ 13.446919] shepherd[1]: under certain conditions; type `,show c' for
[ 13.447072] shepherd[1]: details.
[ 13.448737] shepherd[1]:
[ 13.449239] shepherd[1]: Enter `,help' for help.

This was unhelpful because we couldn’t interact with that REPL remotely
(no IPMI). Even when you can interact, it’s of limited use; in this
case, if you type “,q”, it tries to continue and fails:

Uncaught exception in task:
In fibers.scm:
172:8 7 (_)
In ice-9/exceptions.scm:
406:15 6 (_)
In ice-9/boot-9.scm:
1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
In shepherd/service.scm:
824:39 4 (_)


This is because we’re effectively adding #f in the middle of the list
passed to ‘register-services’ (see below).

This REPL-on-error “feature” comes from Guix System, not Shepherd, in
the config file generated from (gnu services shepherd):

;; Arrange to spawn a REPL if something goes wrong. This is better
;; than a kernel panic.
(call-with-error-handling
  (lambda ()
    (register-services
      (parameterize ((current-warning-port
                      (%make-void-port "w")))
        (map (lambda (file)
               (save-module-excursion
                 (lambda ()
                   (set-current-module (make-user-module))
                   (load-compiled file))))
             '#$(map scm->go files))))))
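To make the “#f in the middle of the list” point concrete, here is a simplified, standalone Guile sketch. It does not model the REPL itself; it only assumes that recovering from the failed file yields #f for that file. `load-one` and `load-or-false` are hypothetical stand-ins, not Guix or Shepherd procedures. Because the config above uses `map`, the failed file still occupies a slot in the resulting list:

```scheme
(use-modules (ice-9 exceptions))

;; Hypothetical stand-in for loading one compiled service file:
;; the symbol 'bad simulates a file that references an unbound variable.
(define (load-one name)
  (if (eq? name 'bad)
      (raise-exception
       (make-exception (make-exception-with-message "Unbound variable: ~S")
                       (make-exception-with-irritants '(bad))))
      name))

;; Simplified model (assumption): recovery yields #f for the failed file.
(define (load-or-false name)
  (with-exception-handler
   (lambda (exception) #f)             ;handler result replaces the value
   (lambda () (load-one name))
   #:unwind? #t))                      ;unwind before calling the handler

;; With 'map', the failed file keeps its slot, now holding #f.
(define loaded (map load-or-false '(a bad c)))
;; loaded => (a #f c)
```

Any code downstream that expects every element to be a service object then receives #f.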

The rationale mentioned in the comment no longer holds: starting from
Shepherd 0.10.2, the config file is loaded in the background; if its
evaluation fails, shepherd keeps running (see
‘tests/config-failure.sh’, which tests this behavior).

I think we should change the above to log and gracefully handle failure
to load an individual service file.
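The proposal can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the actual patch: `load-one`, `load-or-log`, and `exception->string` are hypothetical helpers, and the failing load is simulated with a hand-built exception carrying a message and irritants. It pairs `with-exception-handler` using `#:unwind? #t`, so that the handler’s return value becomes the result for that file, with SRFI-1 `filter-map`, which drops the #f entries so that only successfully loaded values reach the caller:

```scheme
(use-modules (ice-9 exceptions)
             (srfi srfi-1))                  ;filter-map

;; Hypothetical stand-in for loading one compiled service file.
(define (load-one name)
  (if (eq? name 'bad)
      (raise-exception
       (make-exception (make-exception-with-message "Unbound variable: ~S")
                       (make-exception-with-irritants '(bad))))
      name))

;; Hypothetical helper: render an exception as a readable message,
;; falling back to the raw object when it carries no message.
(define (exception->string exception)
  (if (exception-with-message? exception)
      (apply format #f (exception-message exception)
             (if (exception-with-irritants? exception)
                 (exception-irritants exception)
                 '()))
      (format #f "~s" exception)))

;; Log the failure and return #f for that file instead of aborting.
(define (load-or-log name)
  (with-exception-handler
   (lambda (exception)
     (format #t "failed to load '~a': ~a~%"
             name (exception->string exception))
     #f)
   (lambda () (load-one name))
   #:unwind? #t))

;; 'filter-map' drops the #f, so only loaded values are kept.
(define loaded (filter-map load-or-log '(a bad c)))
;; prints: failed to load 'bad': Unbound variable: bad
;; loaded => (a c)
```

Formatting the message together with its irritants gives a shorter log line than printing the raw exception object with `~s`; which form is preferable in boot logs is a judgment call.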

Ludo’.

¹ https://lists.gnu.org/archive/html/info-guix/2024-05/msg00000.html
Ludovic Courtès wrote on 25 May 2024 10:56
control message for bug #71144
(address . control@debbugs.gnu.org)
87jzji46e5.fsf@gnu.org
severity 71144 important
quit
Ludovic Courtès wrote on 25 May 2024 11:11
Re: bug#71144: Interactive prompt opened upon shepherd config file error
(address . 71144@debbugs.gnu.org)
878qzy45p3.fsf@gnu.org
Ludovic Courtès <ludovic.courtes@inria.fr> skribis:

> I think we should change the above to log and gracefully handle failure
> to load an individual service file.

With the change below, every service except the offending one is loaded
and started as expected:

[ 22.450515] shepherd[1]: Service root running with value #t.
[ 22.454624] shepherd[1]: Service root has been started.
[ 22.711738] shepherd[1]: Exception caught while loading '/gnu/store/fjis6iqpjfcnr90fy8rsg9v4j828jslv-shepherd-gwl-web.go': #<&compound-exception components: (#<&undefined-variable> #<&origin origin: #f> #<&message message: "Unbound variable: ~S"> #<&irritants irri
[ 22.711839] tants: (make-forkexec-constructor/container)> #<&exception-with-kind-and-args kind: unbound-variable args: (#f "Unbound variable: ~S" (make-forkexec-constructor/container) #f)>)>
[ 22.755146] shepherd[1]: starting services...
[ 22.756491] shepherd[1]: Configuration successfully loaded from '/gnu/store/mq7y31xnjcjwjkyf6w7qiaq61g6n9f5x-shepherd.conf'.
Uncaught exception in task:
In fibers.scm:
172:8 7 (_)
In ice-9/exceptions.scm:
406:15 6 (_)
In ice-9/boot-9.scm:
1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
In shepherd/service.scm:
824:39 4 (_)
In oop/goops.scm:
1567:11 3 (cache-miss #f)
1585:2 2 (_ _ _)
In ice-9/boot-9.scm:
1685:16 1 (raise-exception _ #:continuable? _)
1683:16 0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1683:16: In procedure raise-exception:
No applicable method for #<<generic> one-shot-service? (1)> in call (one-shot-service? #f)
[ 22.798737] shepherd[1]: Starting service user-file-systems...
[ 22.800361] shepherd[1]: Starting service root-file-system...
[ 22.802015] shepherd[1]: Starting service host-name...
[ 22.803688] shepherd[1]: Starting service pam...
[ 22.805372] shepherd[1]: Starting service sysctl...
[ 22.806926] shepherd[1]: Starting service loopback...
[ 22.808225] shepherd[1]: Starting service firewall...

(There’s still this scary-looking but harmless backtrace in the middle:
that’s because (start-in-the-background '(something-that-does-not-exist))
throws like that as of 0.10.4.)

Once booted, shepherd is fine and you can interact normally with it; the
only thing missing is, in this case, the ‘gwl-web’ service, which we
failed to load.

I think that’s a significant improvement.

Thoughts?

Ludo’.
diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index 455e972535d..f13c52c37ba 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -380,8 +380,7 @@ (define (shepherd-configuration-file services shepherd)
(scm->go (cute scm->go <> shepherd)))
(define config
#~(begin
- (use-modules (srfi srfi-34)
- (system repl error-handling))
+ (use-modules (srfi srfi-1))
(define (make-user-module)
;; Copied from (shepherd support), where it's private.
@@ -417,17 +416,22 @@ (define (shepherd-configuration-file services shepherd)
;; Arrange to spawn a REPL if something goes wrong. This is better
;; than a kernel panic.
- (call-with-error-handling
- (lambda ()
- (register-services
- (parameterize ((current-warning-port
- (%make-void-port "w")))
- (map (lambda (file)
- (save-module-excursion
- (lambda ()
- (set-current-module (make-user-module))
- (load-compiled file))))
- '#$(map scm->go files))))))
+ (register-services
+ (parameterize ((current-warning-port (%make-void-port "w")))
+ (filter-map (lambda (file)
+ (with-exception-handler
+ (lambda (exception)
+ (format #t "Exception caught \
+while loading '~a': ~s~%"
+ file exception)
+ #f)
+ (lambda ()
+ (save-module-excursion
+ (lambda ()
+ (set-current-module (make-user-module))
+ (load-compiled file))))
+ #:unwind? #t))
+ '#$(map scm->go files))))
(format #t "starting services...~%")
(let ((services-to-start
Ludovic Courtès wrote on 25 May 2024 11:33
control message for bug #71144
(address . control@debbugs.gnu.org)
87ttim2q2s.fsf@gnu.org
tags 71144 + patch
quit
Christopher Baines wrote on 25 May 2024 14:02
Re: bug#71144: Interactive prompt opened upon shepherd config file error
(name . Ludovic Courtès)(address . ludo@gnu.org)
87y17yf6bm.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludovic.courtes@inria.fr> skribis:
>
> [...]
>
> I think that’s a significant improvement.
>
> Thoughts?

That looks good to me, the "Arrange to spawn a REPL if something goes
wrong" comment needs removing/updating, but that's the only thing I
spotted.

Ludovic Courtès wrote on 25 May 2024 16:59
(name . Christopher Baines)(address . mail@cbaines.net)
87sey6ym2j.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:

> That looks good to me, the "Arrange to spawn a REPL if something goes
> wrong" comment needs removing/updating, but that's the only thing I
> spotted.

Cool. I updated the comment and pushed it as
cca25a67693bb68a1884a081b415a43fad1e8641.

Thanks!

Ludo’.
Closed

This issue is archived.

To comment on this conversation send an email to 71144@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 71144
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch