Shepherd doesn't restart previously running dependent services

DoneSubmitted by Thompson, David.
Details
4 participants
  • Carlo Zancanaro
  • Clément Lassieur
  • Thompson, David
  • Ludovic Courtès
Owner
unassigned
Severity
normal
T
T
Thompson, David wrote on 31 Mar 2016 15:23
(address . bug-guix@gnu.org)
CAJ=RwfZdqc6iqsRsay95oPZDd-hrqkaERYA5PYiXp_FwOO1-Qg@mail.gmail.com
If service 'foo' is depended on by service 'bar', and both arerunning, and one runs 'herd restart foo', both 'foo' and 'bar' arestopped, but then only 'foo' is restarted. I think that the dependentservices that were stopped should also be restarted, otherwise theuser has to manually start them back up one at a time.
- Dave
C
C
Carlo Zancanaro wrote on 25 Aug 2018 13:33
[PATCH shepherd] Restart dependent services on service restart
(address . 23170@debbugs.gnu.org)
874lfi65rv.fsf@zancanaro.id.au
I've written a patch to fix this. It's not super smart, but it should do the job.
It currently targets the branch after my patch in #32408[1], but it's technically an independent change.
[1]: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=32408
From 50dd3ef4888b04ea3b869da893b23ad69fad8971 Mon Sep 17 00:00:00 2001From: Carlo Zancanaro <carlo@zancanaro.id.au>Date: Sat, 25 Aug 2018 20:32:11 +1000Subject: [PATCH] service: Restart dependent services on service restart
* modules/shepherd/service.scm (required-by?): New procedure.(stop): Return a list of canonical-names for stopped dependent services,including transitive dependencies.(action)[restart]: Start services based on the return value of stop.(fold-services): New procedure.* tests/restart.sh: New file.* Makefile.am (TESTS): Add tests/restart.sh.--- Makefile.am | 1 + modules/shepherd/service.scm | 90 ++++++++++++++++++++++-------------- tests/restart.sh | 77 ++++++++++++++++++++++++++++++ 3 files changed, 133 insertions(+), 35 deletions(-) create mode 100644 tests/restart.sh
Toggle diff (236 lines)diff --git a/Makefile.am b/Makefile.amindex 4322d7f..d9e21e9 100644--- a/Makefile.am+++ b/Makefile.am@@ -187,6 +187,7 @@ TESTS = \ tests/replacement.sh \ tests/respawn.sh \ tests/respawn-throttling.sh \+ tests/restart.sh \ tests/misbehaved-client.sh \ tests/no-home.sh \ tests/pid-file.sh \diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scmindex 006309c..510a5ea 100644--- a/modules/shepherd/service.scm+++ b/modules/shepherd/service.scm@@ -358,61 +358,72 @@ NEW-SERVICE." (for-each remove-service (provided-by old-service)) (register-services new-service))) +(define (required-by? service dependent)+ "Returns #t if DEPENDENT directly requires SERVICE in order to run. Returns+#f otherwise."+ (and (find (lambda (dependency)+ (memq dependency (provided-by service)))+ (required-by dependent))+ #t))+ ;; Stop the service, including services that depend on it. If the ;; latter fails, continue anyway. Return `#f' if it could be stopped.-(define-method (stop (obj <service>) . args)+(define-method (stop (service <service>) . args)+ "Stop SERVICE, and any services which depend on it. Returns a list of+canonical names for all of the services which have been stopped (including+transitive dependent services). This method will print a warning if SERVICE+is not already running, and will return SERVICE's canonical name in a list." ;; Block asyncs so the SIGCHLD handler doesn't execute concurrently.- ;; Notably, that makes sure the handler processes the SIGCHLD for OBJ's- ;; process once we're done; otherwise, it could end up respawning OBJ.+ ;; Notably, that makes sure the handler processes the SIGCHLD for SERVICE's+ ;; process once we're done; otherwise, it could end up respawning SERVICE. (call-with-blocked-asyncs (lambda ()- (if (not (running? obj))- (local-output "Service ~a is not running." (canonical-name obj))- (if (slot-ref obj 'stop-delay?)+ (if (not (running? service))+ (begin+ (local-output "Service ~a is not running." (canonical-name service))+ (list (canonical-name service)))+ (if (slot-ref service 'stop-delay?) (begin- (slot-set! obj 'waiting-for-termination? #t)+ (slot-set! service 'waiting-for-termination? #t) (local-output "Service ~a pending to be stopped."- (canonical-name obj)))- (begin- ;; Stop services that depend on it.- (for-each-service- (lambda (serv)- (and (running? serv)- (for-each (lambda (sym)- (and (memq sym (provided-by obj))- (stop serv)))- (required-by serv)))))-+ (canonical-name service))+ (list (canonical-name service)))+ (let ((name (canonical-name service))+ (stopped-dependents (fold-services (lambda (other acc)+ (if (and (running? other)+ (required-by? service other))+ (append (stop other) acc)+ acc))+ '()))) ;; Stop the service itself. (catch #t (lambda ()- (apply (slot-ref obj 'stop)- (slot-ref obj 'running)+ (apply (slot-ref service 'stop)+ (slot-ref service 'running) args)) (lambda (key . args) ;; Special case: 'root' may quit.- (and (eq? root-service obj)+ (and (eq? root-service service) (eq? key 'quit) (apply quit args)) (caught-error key args))) - ;; OBJ is no longer running.- (slot-set! obj 'running #f)+ ;; SERVICE is no longer running.+ (slot-set! service 'running #f) ;; Reset the list of respawns.- (slot-set! obj 'last-respawns '())+ (slot-set! service 'last-respawns '()) ;; Replace the service with its replacement, if it has one- (let ((replacement (slot-ref obj 'replacement)))+ (let ((replacement (slot-ref service 'replacement))) (when replacement- (replace-service obj replacement)))+ (replace-service service replacement))) ;; Status message.- (let ((name (canonical-name obj)))- (if (running? obj)- (local-output "Service ~a could not be stopped." name)- (local-output "Service ~a has been stopped." name))))))- (slot-ref obj 'running))))+ (if (running? service)+ (local-output "Service ~a could not be stopped." name)+ (local-output "Service ~a has been stopped." name))+ (cons name stopped-dependents))))))) ;; Call action THE-ACTION with ARGS. (define-method (action (obj <service>) the-action . args)@@ -423,10 +434,9 @@ NEW-SERVICE." ;; Restarting is done in the obvious way. ((restart) (lambda (running . args)- (if running- (stop obj)- (local-output "~a was not running." (canonical-name obj)))- (apply start obj args)))+ (let ((stopped-services (stop obj)))+ (for-each start stopped-services)+ #t))) ((status) ;; Return the service itself. It is automatically converted to an sexp ;; via 'result->sexp' and sent to the client.@@ -953,6 +963,16 @@ Return #f if service is not found." (eq? name (canonical-name service))) services)) +(define (fold-services proc init)+ "Apply PROC to the registered services to build a result, and return that+result. Works in a manner akin to `fold' from SRFI-1."+ (hash-fold (lambda (name services acc)+ (let ((service (lookup-canonical-service name services)))+ (if service+ (proc service acc)+ acc)))+ init %services))+ (define (for-each-service proc) "Call PROC for each registered service." (hash-for-each (lambda (name services)diff --git a/tests/restart.sh b/tests/restart.shnew file mode 100644index 0000000..92a1f79--- /dev/null+++ b/tests/restart.sh@@ -0,0 +1,77 @@+# GNU Shepherd --- Test restarting services.+# Copyright © 2013, 2014, 2016 Ludovic Courtès <ludo@gnu.org>+# Copyright © 2018 Carlo Zancanaro <carlo@zancanaro.id.au>+#+# This file is part of the GNU Shepherd.+#+# The GNU Shepherd is free software; you can redistribute it and/or modify it+# under the terms of the GNU General Public License as published by+# the Free Software Foundation; either version 3 of the License, or (at+# your option) any later version.+#+# The GNU Shepherd is distributed in the hope that it will be useful, but+# WITHOUT ANY WARRANTY; without even the implied warranty of+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the+# GNU General Public License for more details.+#+# You should have received a copy of the GNU General Public License+# along with the GNU Shepherd. If not, see <http://www.gnu.org/licenses/>.++shepherd --version+herd --version++socket="t-socket-$$"+conf="t-conf-$$"+log="t-log-$$"+pid="t-pid-$$"++herd="herd -s $socket"++trap "cat $log || true ;+ rm -f $socket $conf $log;+ test -f $pid && kill \`cat $pid\` || true ; rm -f $pid" EXIT++cat > "$conf"<<EOF+(register-services+ (make <service>+ #:provides '(test1)+ #:start (const #t)+ #:stop (const #t))+ (make <service>+ #:provides '(test2)+ #:requires '(test1)+ #:start (const #t)+ #:stop (const #t))+ (make <service>+ #:provides '(test3)+ #:requires '(test2)+ #:start (const #t)+ #:stop (const #t)))+EOF++rm -f "$pid"+shepherd -I -s "$socket" -c "$conf" -l "$log" --pid="$pid" &++while ! test -f "$pid" ; do sleep 0.3 ; done++# Start some test services, and make sure they behave how we expect+$herd start test1+$herd start test2+$herd status test1 | grep started+$herd status test2 | grep started++# Restart test1 and make sure that both services are still running (ie. that+# test2 hasn't been stopped)+$herd restart test1+$herd status test1 | grep started+$herd status test2 | grep started++# Now let's test with a transitive dependency+$herd start test3+$herd status test3 | grep started++# After restarting test1 we want test3 to still be running+$herd restart test1+$herd status test1 | grep started+$herd status test2 | grep started+$herd status test3 | grep started-- 2.18.0
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCAAdFiEE1lpncq7JnOkt+LaeqdyPv9awIbwFAluBPpsACgkQqdyPv9awIbyeQw/7B/16oHu0N3vvfGPx1TJWfByvfjnU9jlo8p0psZqR8/fbGpqax2+/VdEPzoL4XSUZMYTAeSOqFsFshHl9oyTnMInHN3+XyDCKRg8UiOj8FjQkb74ppxGdjTbjmosb2gMl5I5cIRauH3aAq64DTeZx9PZRzN0OXXAGK0B8RrKUVu8Z0AC9q9kMZ4V+Jp5d0hOG20WwV66thte/ctzhC7xF2/WgRzG/fwxnF2B3UEL2VdKanBwNq8rLBcvVy4tW1OCWUCrdIZZWHxf6EoXckjiEgQBgUkmCOmqvur+AkD/eGtFzwMkrP0juJoMzNiMuOwhj+SYLrhgaHcnwWRnOxiap/jzY6MDoxhbw0WFtitCHhm5Mz9oFJdw6vJhtfsrmcC2X4MfzLxRW4GwC3UqbGVKDC9SllFGRJVjwEQh0emShLXYGv88fVDGF3zXXnmuQe1RHJYruNVjbzqWA+3KyBSHBPild/47dLmiFc6e+1VMYBpTUKdExOMXikEAp3uZSATATO0Ziqcw+jzROo3IHwqhz0QsBYYjFWgV/cDdeHJn0r7BAkqWIi2j7cNSQY2rQB41ZV7XlFMFg2RwSwXr/8hpMnrXsl0JlYKCaSwA+CMDSGBkDrYPU400Gn7c5J1ghS+XPWYs/dVO4ssK9zpy4hl/c9dINjd3a/d6yBMFb9xspSD4==z5ll-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 25 Aug 2018 16:41
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 23170@debbugs.gnu.org)
87bm9qbjb4.fsf@gnu.org
Hi again! :-)
Carlo Zancanaro <carlo@zancanaro.id.au> skribis:
Toggle quote (3 lines)> I've written a patch to fix this. It's not super smart, but it should> do the job.
I wonder if there are cases where one might want to restart a servicewithout restarting its dependent services. We can probably ignore itfor now, but perhaps we’ll need to add a flag or a separate action later.
Thoughts?
Toggle quote (13 lines)> From 50dd3ef4888b04ea3b869da893b23ad69fad8971 Mon Sep 17 00:00:00 2001> From: Carlo Zancanaro <carlo@zancanaro.id.au>> Date: Sat, 25 Aug 2018 20:32:11 +1000> Subject: [PATCH] service: Restart dependent services on service restart>> * modules/shepherd/service.scm (required-by?): New procedure.> (stop): Return a list of canonical-names for stopped dependent services,> including transitive dependencies.> (action)[restart]: Start services based on the return value of stop.> (fold-services): New procedure.> * tests/restart.sh: New file.> * Makefile.am (TESTS): Add tests/restart.sh.
[...]
Toggle quote (16 lines)> +# Restart test1 and make sure that both services are still running (ie. that> +# test2 hasn't been stopped)> +$herd restart test1> +$herd status test1 | grep started> +$herd status test2 | grep started> +> +# Now let's test with a transitive dependency> +$herd start test3> +$herd status test3 | grep started> +> +# After restarting test1 we want test3 to still be running> +$herd restart test1> +$herd status test1 | grep started> +$herd status test2 | grep started> +$herd status test3 | grep started
For clarity, should we do an explicit “herd stop test1” followed by“herd start test1”? I know it’s currently equivalent under the hood,but it might be slightly clearer. WDYT?
Otherwise it LGTM. Thanks for addressing these longstanding issues!
Ludo’.
C
C
Carlo Zancanaro wrote on 26 Aug 2018 00:48
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 23170@debbugs.gnu.org)
871samqcz4.fsf@zancanaro.id.au
On Sun, Aug 26 2018, Ludovic Courtès wrote:
Toggle quote (7 lines)> I wonder if there are cases where one might want to restart a > service without restarting its dependent services. We can > probably ignore it for now, but perhaps we’ll need to add a flag > or a separate action later.>> Thoughts?
I think this is best served by 'herd stop', followed by 'herd start'. This patch just special-cases the 'restart' action, so manually stopping then starting a service will behave as the old restart used to.
Toggle quote (4 lines)> For clarity, should we do an explicit “herd stop test1” followed > by “herd start test1”? I know it’s currently equivalent under > the hood, but it might be slightly clearer. WDYT?
Hopefully the above also answers this, too. I did consider whether it was worth adding a test for 'herd stop' to make sure it still stops dependent services, and 'herd start' to make sure it doesn't start dependent services, but in the end I decided not to. I'm happy to send through another patch to test these cases, though, if you think it would be worthwhile.
Carlo
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCAAdFiEE1lpncq7JnOkt+LaeqdyPv9awIbwFAluB3M8ACgkQqdyPv9awIbzlkRAAuuV6Uvb+pDrPisIMqTsunJKVOkurgKSSw7mMe2QCurxTGENCkohtniNuLSXR7M8J3TsmJgUxC3go4vc2koKDzBQj00mnFveRXI3LG4ZO3dPVLwtEt8OjxzRWoRBMqb1jSn1qdsW/sEo5qZ7cJcSmJb1lChvzQEg3IOgRmm+7VWNZbfWCp9BmThFpnBzTwZ7O/9wekWZuxaQ8xuWg3kFVf9CK2HsNs5gT2/QLqp9xC2mhlh6rx7pRmsBlxT8qK5rF1V/FgVcdAufwNN7bx58pURpv11VTTySQNXytszoXUp/khOg7UJpLKnHAbkkySPlnvIP3OHQlUuuR3TKpF3gs8AobJFqJmEPDGwcxDtIoZza4AOc7I2mfMo5ajKXaBLsmSY79Eou0ySF3LU/wTyJmg2tcv8khTGIcLMeOGEW7wm6mI0v0iReW6VVoUA5Ah8NI46WsXF1kT/lGnglrHTkmaiUqyhFkihxxiBlud6Q4Hq+ZZVKUBQvRbpZh0p7MOPBlxRWDJ50oDxCwuxF7SZKMbfVzTsj1fH9nM1RSMWohOXVchQ7tlU2NayhhU1cqhbnEcwozAiSX0cdVPvO71SZ4i2mGoKuzDFmXVhZhDt2thypzuu2m6qckRBtKJpDyXIaOGZaRf3VGyIT8pGGsvA8oB8LwF8ofnTHmAKXUW7tOh5E==jJMT-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 26 Aug 2018 23:08
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 23170@debbugs.gnu.org)
87ftz07s4t.fsf@gnu.org
Carlo Zancanaro <carlo@zancanaro.id.au> skribis:
Toggle quote (13 lines)> On Sun, Aug 26 2018, Ludovic Courtès wrote:>> I wonder if there are cases where one might want to restart a>> service without restarting its dependent services. We can probably>> ignore it for now, but perhaps we’ll need to add a flag or a>> separate action later.>>>> Thoughts?>> I think this is best served by 'herd stop', followed by 'herd> start'. This patch just special-cases the 'restart' action, so> manually stopping then starting a service will behave as the old> restart used to.
Great, I had overlooked this.
Toggle quote (6 lines)>> For clarity, should we do an explicit “herd stop test1” followed by>> “herd start test1”? I know it’s currently equivalent under the>> hood, but it might be slightly clearer. WDYT?>> Hopefully the above also answers this, too.
It does, thanks!
Toggle quote (6 lines)> I did consider whether it was worth adding a test for 'herd stop' to> make sure it still stops dependent services, and 'herd start' to make> sure it doesn't start dependent services, but in the end I decided not> to. I'm happy to send through another patch to test these cases,> though, if you think it would be worthwhile.
No, that’s fine.
I forgot if this was already done, but perhaps you can add a bit in themanual to insist that ‘restart’ is not quite the same as ‘stop’ +‘start’.
Anyway, it all LGTM, thanks!
Ludo’.
C
C
Carlo Zancanaro wrote on 27 Aug 2018 00:05
(address . ludo@gnu.org)(address . 23170@debbugs.gnu.org)
7A4A9C8A-E937-44A9-B941-B8263B00093A@zancanaro.id.au
Hey Ludo’!

On 27 August 2018 7:08:34 am AEST, ludo@gnu.org wrote:
Toggle quote (2 lines)>I forgot if this was already done, but perhaps you can add a bit in the manual to insist that ‘restart’ is not quite the same as ‘stop’ + ‘start’.
I hadn't done that, but I have now. There aren't many mentions of restart in the manual, but I changed the one that seemed most relevant.
Toggle quote (2 lines)>Anyway, it all LGTM, thanks!
Pushed! Thanks for the review.
Carlo
Attachment: file
C
C
Carlo Zancanaro wrote on 27 Aug 2018 00:05
(address . ludo@gnu.org)(address . 23170@debbugs.gnu.org)
BC9EC36A-3916-4DB8-B70F-785B692BC4A4@zancanaro.id.au
Hey Ludo’!

On 27 August 2018 7:08:34 am AEST, ludo@gnu.org wrote:
Toggle quote (2 lines)>I forgot if this was already done, but perhaps you can add a bit in the manual to insist that ‘restart’ is not quite the same as ‘stop’ + ‘start’.
I hadn't done that, but I have now. There aren't many mentions of restart in the manual, but I changed the one that seemed most relevant.
Toggle quote (2 lines)>Anyway, it all LGTM, thanks!
Pushed! Thanks for the review.
Carlo
L
L
Ludovic Courtès wrote on 27 Aug 2018 13:05
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 23170@debbugs.gnu.org)
87o9docboc.fsf@gnu.org
Hi,
Carlo Zancanaro <carlo@zancanaro.id.au> skribis:
Toggle quote (2 lines)> Pushed! Thanks for the review.
Great!
I see that you also reverted the patch that removed the ‘EINTR-safe’workaround. Could you explain why that was necessary? (It should notbe necessary with current Guile versions.)
Thank you,Ludo’.
C
C
Carlo Zancanaro wrote on 27 Aug 2018 14:42
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 23170@debbugs.gnu.org)
87mut8dlq7.fsf@zancanaro.id.au
On Mon, Aug 27 2018, Ludovic Courtès wrote:
Toggle quote (5 lines)> I see that you also reverted the patch that removed the > ‘EINTR-safe’ workaround. Could you explain why that was > necessary? (It should not be necessary with current Guile > versions.)
I'm not really sure of the details, but as I mentioned on IRC, that commit stopped me from being able to boot. I grafted a new version of the Shepherd and reconfigured my system, and when it came up my screen was filled with messages complaining that it couldn't create things in /dev because they already existed. I tested the commits since the previous release and that was the one that caused my problem.
I'd like to investigate further, but I have no idea what could be causing it. All I have to go on so far is that I think the udev service is getting respawned repeatedly, but I don't know why that commit would cause that problem. After reverting that commit I could successfully boot back into my system and everything is working properly again.
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCAAdFiEE1lpncq7JnOkt+LaeqdyPv9awIbwFAluD8cEACgkQqdyPv9awIbzA0RAAjkqR5kU3kf4wTEwk3NsdWOQQxZ37efl4Q5Jm0Z7Uygxj17mYH1paAC84y9jVNVnOOMqdJz3OsfYyw73KyGo95dQJy9+bUNt3T4ruvVLSnZyNkmSSfQBlme0cKSar7fvf2wr4oLMHr3q/4YEL0OT48yOFajHI6zDw7PaepFir6xfRK/S6M45x+1FVQ9byDx+6siWwyH602cqj7nSl8wnhoZia8grbcV9zCz2bWCnR6OAtv1M12IQoKF63E8xn25s3qdPBcN1p8QpiA0rXdyOZL0L3pFdHELtkWWV4MTxhPyC78J/Yeiz6KQJgGMTTZ0vb/ikyFizo5yE1fUByKjIAVXYTefAMj/rLiz5uOW96y4UeDxGi4aTBWvyuC/Jo14UuZr83t9q6qVqR5IKK0qZssN4GH+MrzdMeEj+Pl7rONrq46BFGMerT6lgYhQdHabZkp934FGjQajfCqHjeMEcHFW8vNSgpIfUhtecVdq0ZN8cYDHZnLK8b2fU/OBfw2uM8di7384MopGyH7iY5b3XftkaLm3JQArl3FaNkynsRwhoqLFE13gshwe8J2URC21TO1eR0GBmTntgYg82BNNN9Lz6mPg2FbRKzHfXLKfJFXMhxjc1wY+Z/PpBH0JVRuqETm5wIcMAhmQo0cZEKvZHc4nqdjOBGSK5RpOositHyZy8==O6mm-----END PGP SIGNATURE-----
C
C
Clément Lassieur wrote on 27 Aug 2018 19:53
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)
8736uzbssi.fsf@lassieur.org
Carlo Zancanaro <carlo@zancanaro.id.au> writes:
Toggle quote (6 lines)>>Anyway, it all LGTM, thanks!>> Pushed! Thanks for the review.>> Carlo
Awesome! Thanks a lot Carlo for working on this :-)
L
L
Ludovic Courtès wrote on 5 Jan 2019 14:53
control message for bug #23170
(address . control@debbugs.gnu.org)
87y37z5hyn.fsf@gnu.org
tags 23170 fixedclose 23170
?
Your comment

This issue is archived.

To comment on this conversation send email to 23170@debbugs.gnu.org