Shepherd does not respect ordering for one-shot? services

  • Done
Details
2 participants
  • Ludovic Courtès
  • Tomas Volf
Owner
unassigned
Submitted by
Tomas Volf
Severity
normal
Tomas Volf wrote on 9 Nov 17:53 +0100
(address . bug-guix@gnu.org)
87ses073i9.fsf@wolfsden.cz
Hello,

I think I found a bug in the GNU Shepherd. Dependencies between
one-shot? #t services do not seem to be respected.

Documentation for #:requirement says the following (emphasis mine):

#:requirement is, like provision, a list of symbols that specify
services. In this case, they name what this service depends on: before
the service can be started, services that provide those symbols *must be
started*.

Note that every name listed in #:requirement must be registered so it
can be resolved (see Service Registry).

Documentation for #:one-shot? says the following:

Whether the service is a one-shot service. A one-shot service is a
service that, as soon as it has been successfully started, is marked as
“stopped.” Other services can nonetheless require one-shot
services. One-shot services are useful to trigger an action before other
services are started, such as a cleanup or an initialization action.

As for other services, the start method of a one-shot service must
return a truth value to indicate success, and false to indicate failure.

Nothing in there seems to mention that one-shot? services do not
actually wait on each other. To reproduce I wrote a simple
configuration file:

(define %one-shot #f)

(use-modules (srfi srfi-1))

(define (make-waiting-service name wait requirement)
  (service (list name)
           #:requirement requirement
           #:start (λ _
                     (sleep wait)
                     (format #t "~a\n" name)
                     #t)
           #:one-shot? %one-shot))

(let ((svcs (pair-fold (λ (names waits svcs)
                         (cons (make-waiting-service (car names)
                                                     (car waits)
                                                     (cdr names))
                               svcs))
                       '()
                       '(a b c d)
                       '(1 2 3 4))))
  (register-services svcs)
  (start-in-the-background (map service-canonical-name svcs)))

Each service sleeps for `wait' seconds to simulate some slow work being
done. In effect, each service takes a different amount of time to start
up.

Now, when we run it as it is, we get the following (correct) output:

$ shepherd -c conf.scm
Starting service root...
Service root started.
Service root running with value #t.
Service root has been started.
Configuration successfully loaded from 'conf.scm'.
Starting service d...
d
Service d has been started.
Service d started.
Service d running with value #t.
Starting service c...
c
Service c has been started.
Service c started.
Service c running with value #t.
Starting service b...
b
Service b has been started.
Service b started.
Service b running with value #t.
Starting service a...
a
Service a has been started.
Service a started.
Successfully started 4 services in the background.
Service a running with value #t.

Notice the start-up order (d c b a). If you run it, you will also
notice that `d' takes 4 seconds to start up, `c' 3 seconds etc.

However, if we change the definition at the top of the configuration
file to #t:

(define %one-shot #t)

The behavior changes:

$ shepherd -c conf.scm
Starting service root...
Service root started.
Service root running with value #t.
Service root has been started.
Configuration successfully loaded from 'conf.scm'.
Starting service d...
Starting service c...
Starting service b...
Starting service a...
a
Service a has been started.
Service a started.
Service a running with value #t.
b
Service b has been started.
Service b started.
Service b running with value #t.
c
Service c has been started.
Service c started.
Service c running with value #t.
d
Service d has been started.
Service d started.
Successfully started 4 services in the background.
Service d running with value #t.

Notice that the order changed to (a b c d), which matches the increasing
wait time, and that the initial messages all appear together:

Starting service d...
Starting service c...
Starting service b...
Starting service a...

and the whole start-up takes 4 seconds (the wait time of `d'). That
seems to indicate that all 4 services are actually starting at the same
time without waiting as they should per the #:requirement argument.
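
The ordering violation can also be made explicit instead of being inferred
from timing. The following is only a sketch of a variant of the reproducer
above, not something from the original test run: `%finished' and
`make-checking-service' are hypothetical names introduced here, and the
message text is arbitrary. Each service checks, on entry to its start
procedure, whether the start procedures of all its requirements have already
returned.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper state (not part of the original reproducer):
;; names of services whose start procedure has already returned.
(define %finished '())

(define (make-checking-service name wait requirement)
  (service (list name)
           #:requirement requirement
           #:start (λ _
                     ;; Collect requirements that have not finished
                     ;; starting yet and report them, if any.
                     (let ((pending (remove (λ (r) (memq r %finished))
                                            requirement)))
                       (unless (null? pending)
                         (format #t "~a: started before ~a finished\n"
                                 name pending)))
                     (sleep wait)            ;simulate slow start-up work
                     (set! %finished (cons name %finished))
                     #t)
           #:one-shot? #t))

;; Same dependency chain as above: a requires (b c d), b requires (c d), etc.
(let ((svcs (pair-fold (λ (names waits svcs)
                         (cons (make-checking-service (car names)
                                                      (car waits)
                                                      (cdr names))
                               svcs))
                       '()
                       '(a b c d)
                       '(1 2 3 4))))
  (register-services svcs)
  (start-in-the-background (map service-canonical-name svcs)))
```

If requirements are honored, no "started before" line should appear; if the
services are launched concurrently as described above, a line would be
expected for each service that has requirements.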



Have a nice day,
Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
Ludovic Courtès wrote 2 days ago
(address . 74284@debbugs.gnu.org)(name . Dariqq)(address . dariqq@posteo.net)
87ed33qqpq.fsf@gnu.org
Hi Tomas,

(+ Dariqq since we briefly discussed it on IRC yesterday.)

Tomas Volf <~@wolfsden.cz> skribis:

> Notice that the order changed to (a b c d), which matches the increasing
> wait time, and that the initial messages all appear together:
>
> Starting service d...
> Starting service c...
> Starting service b...
> Starting service a...
>
> and the whole start-up takes 4 seconds (the wait time of `d'). That
> seems to indicate that all 4 services are actually starting at the same
> time without waiting as they should per the #:requirement argument.

Indeed. As Dariqq found out, the problem was that we’d mark one-shot
services in ‘%one-shot-services-started’ as soon as we’ve started them,
effectively acting as if “started” were synonymous with “running”.

This is fixed with 550c0370985022c5c90a7b477a5e0b84f6faf5d7.

Let me know if you find anything fishy!

Thanks,
Ludo’.
Ludovic Courtès wrote 2 days ago
control message for bug #74284
(address . control@debbugs.gnu.org)
87cyinqqpg.fsf@gnu.org
close 74284
quit
Tomas Volf wrote 2 days ago
Re: bug#74284: Shepherd does not respect ordering for one-shot? services
(name . Ludovic Courtès)(address . ludo@gnu.org)
87ttbzf43l.fsf@wolfsden.cz
Hi Ludo',

Ludovic Courtès <ludo@gnu.org> writes:

> Indeed. As Dariqq found out, the problem was that we’d mark one-shot
> services in ‘%one-shot-services-started’ as soon as we’ve started them,
> effectively acting as if “started” were synonymous with “running”.
>
> This is fixed with 550c0370985022c5c90a7b477a5e0b84f6faf5d7.

I have checked out the commit and verified it with my original
reproducer. Everything seems to work as it should, thank you for fixing
it :)

> Let me know if you find anything fishy!

Did not notice anything, so once 1.0.0 lands in Guix we can just close
this bug.

Have a nice day,
Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

