[Shepherd] Nested calls lead to a hang

  • Done
  • quality assurance status badge
Details
3 participants
  • Brian Cully
  • Ludovic Courtès
  • Bruno Victal
Owner
unassigned
Submitted by
Bruno Victal
Severity
normal
B
B
Bruno Victal wrote on 30 Apr 2023 17:21
(name . bug-guix)(address . bug-guix@gnu.org)
f25fee95-7b68-88e5-85ec-77322afa29ac@makinata.eu


Minimal example (annotated):

test-system.scm:
Toggle snippet (52 lines)
(use-modules (gnu)
(gnu tests)
(gnu packages)
(gnu packages base) ; coreutils/sleep
(gnu packages admin) ; shepherd
(gnu services shepherd))

;; Some dummy service whose start action simply waits for some seconds,
;; about enough to check with herd status before it exits.
(define dummy-service-type
(shepherd-service-type
'dummy
(lambda (cfg)
(shepherd-service
(documentation "Dummy action to start service.")
(provision '(dummy-service))
(respawn? #f) ; <<<<<< note, this disables the service!
(modules (cons* '(gnu services herd)
%default-modules))
(start #~(lambda _
(format #t "Starting a delay on dummy service.~%")
(fork+exec-command (list #$(file-append coreutils "/bin/sleep")
"30"))))
(stop #~(make-kill-destructor))
(actions
(list (shepherd-action
(name 'my-action)
(documentation "lorem ipsum")
(procedure
#~(lambda (x)
;; Scenario 1: using code from (gnu services herd), this hangs shepherd
#;(start-service 'dummy) ; hangs shepherd
;; Scenario 2: this doesn't hang shepherd but do note that the service has to be re-enabled either manually or automatically here
#;(system* #$(file-append shepherd "/bin/herd") "start" "dummy-service")
;; Scenario 3: use the already imported (shepherd service) module, doesn't hang shepherd
;; Like Scenario 2, the service must be re-enabled since (respawn? #f) disabled this.
;; Comment: Won't re-enabling mean that this service will relaunch once it quits?
;; That means the service has to disable itself on a successful exit, perhaps within the (stop ...) field?
(start 'dummy-service))))))))
#f ; no config
(description "lorem ipsum.")))

(operating-system
(inherit %simple-os)
(services
(cons*
(service dummy-service-type)
%base-services)))



Required modifications to gnu/services/shepherd.scm for scenario 1:

Toggle snippet (15 lines)
diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index b2601c0128..158806f421 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -282,7 +282,8 @@ (define (shepherd-service-file-name service)
(define (shepherd-service-file service)
"Return a file defining SERVICE."
(scheme-file (shepherd-service-file-name service)
- (with-imported-modules %default-imported-modules
+ (with-imported-modules (cons '(gnu services herd)
+ %default-imported-modules)
#~(begin
(use-modules #$@(shepherd-service-modules service))
L
L
Ludovic Courtès wrote on 6 May 2023 19:26
(name . Bruno Victal)(address . mirai@makinata.eu)
87wn1lid69.fsf@gnu.org
Hi,

Bruno Victal <mirai@makinata.eu> skribis:

Toggle quote (2 lines)
[...]

Toggle quote (5 lines)
> (procedure
> #~(lambda (x)
> ;; Scenario 1: using code from (gnu services herd), this hangs shepherd
> #;(start-service 'dummy) ; hangs shepherd

(gnu services herd) provides a client to talk to the shepherd process.
However, the code of actions runs in the shepherd process itself, so
there’s no need to use the client library. Don’t do that. :-)

(Whether that leads to a deadlock depends; at first sight, I’d say
there’s no reason for this to deadlock in general, but you can of course
end up with a logic bug like A starts B, which spawns a client to start
A, which doesn’t start because it’s waiting for B.)

Toggle quote (3 lines)
> ;; Scenario 2: this doesn't hang shepherd but do note that the service has to be re-enabled either manually or automatically here
> #;(system* #$(file-append shepherd "/bin/herd") "start" "dummy-service")

This is equivalent to the one above.

Toggle quote (6 lines)
> ;; Scenario 3: use the already imported (shepherd service) module, doesn't hang shepherd
> ;; Like Scenario 2, the service must be re-enabled since (respawn? #f) disabled this.
> ;; Comment: Won't re-enabling mean that this service will relaunch once it quits?
> ;; That means the service has to disable itself on a successful exit, perhaps within the (stop ...) field?
> (start 'dummy-service))))))))

This should work without blocking.

However, starting a service from another one doesn’t sound great.

Could you give more context?

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 8 May 2023 12:27
(name . Bruno Victal)(address . mirai@makinata.eu)
87fs87dson.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (15 lines)
> Bruno Victal <mirai@makinata.eu> skribis:
>
>> Original discussion (IRC): <https://logs.guix.gnu.org/guix/2023-04-29.log#180735>
>
> [...]
>
>> (procedure
>> #~(lambda (x)
>> ;; Scenario 1: using code from (gnu services herd), this hangs shepherd
>> #;(start-service 'dummy) ; hangs shepherd
>
> (gnu services herd) provides a client to talk to the shepherd process.
> However, the code of actions runs in the shepherd process itself, so
> there’s no need to use the client library. Don’t do that. :-)

Also, the socket created in (gnu services herd) lacks SOCK_NONBLOCK so
the code above is bound to block forever.

Ludo’.
B
B
Brian Cully wrote on 13 May 2023 01:01
(name . Ludovic Courtès)(address . ludo@gnu.org)
87h6shduaw.fsf@psyduck.jhoto.kublai.com
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (8 lines)
> (Whether that leads to a deadlock depends; at first sight, I’d
> say
> there’s no reason for this to deadlock in general, but you can
> of course
> end up with a logic bug like A starts B, which spawns a client
> to start
> A, which doesn’t start because it’s waiting for B.)

It's been a while since I looked at this, but my rough
recollection is the deadlock occurs because shepherd can only
process one request over its socket at a time. If that request
happens to *also* try to talk over the same socket, it'll hang
indefinitely waiting for its turn to come off the accept queue.

I'm not sure there's much to be done in the 0.9 version of
shepherd about it. I'm hoping that 0.10 and up will be able to
cope with situations like this without completely deadlocking the
shepherd itself. It's obviously pretty bad if pid 1 hangs for any
reason at all, even user error.

-bjc
L
L
Ludovic Courtès wrote on 13 May 2023 11:45
(name . Brian Cully)(address . bjc@spork.org)
87h6sg1s4w.fsf@gnu.org
Brian Cully <bjc@spork.org> skribis:

Toggle quote (13 lines)
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> (Whether that leads to a deadlock depends; at first sight, I’d say
>> there’s no reason for this to deadlock in general, but you can of
>> course
>> end up with a logic bug like A starts B, which spawns a client to
>> start
>> A, which doesn’t start because it’s waiting for B.)
>
> It's been a while since I looked at this, but my rough recollection is
> the deadlock occurs because shepherd can only process one request over
> its socket at a time.

That’s not the case in 0.9: it can process several requests
concurrently. However, as I wrote in a followup message, the client
socket created by (gnu services herd) lacks SOCK_NONBLOCK, which can
thus block the process on reads and writes.

Ludo’.
L
L
Ludovic Courtès wrote on 13 May 2023 11:46
control message for bug #63190
(address . control@debbugs.gnu.org)
87fs801s4h.fsf@gnu.org
tags 63190 notabug
close 63190
quit
?