[PATCH 0/3] [shepherd] improve race-free spawn+wait

  • Done
  • quality assurance status badge
Details
3 participants
  • Ludovic Courtès
  • Tobias Geerinckx-Rice
  • Ulf Herrman
Owner
unassigned
Submitted by
Ulf Herrman
Severity
normal
U
U
Ulf Herrman wrote on 25 Feb 2023 23:08
(address . guix-patches@gnu.org)
87cz5x1jr2.fsf@tilde.club
These patches fill out shepherd's procedures for running processes to
completion. They add a replacement for 'system' to complement the
existing replacement for 'system*', and add a 'fork+exec+wait-process'
procedure so that the flexibility of that family of procedures is
available for this use case as well. It also improves error handling in
the event that an exception occurs while spawning a process in the
process monitor, which would normally kill that essential fiber.

Note: I previously tried to send this to guix-devel, but it didn't seem
to make it (I didn't see it in the archives after half a day), and after
some consideration I recalled that guix-patches exists. Is this the
right place for shepherd patches?
From 64370a98dfc17f0531de7397a38362c03a1d89bc Mon Sep 17 00:00:00 2001
From: ulfvonbelow <striness@tilde.club>
Date: Sat, 25 Feb 2023 00:42:41 -0600
Subject: [PATCH 1/3] service: Propagate exceptions while spawning in process
monitor.

* modules/shepherd/service.scm (unboxed-errors): new procedure.
(boxed-errors): new syntax.
(process-monitor): use it to propagate exceptions from fork+exec-command via
reply channel.
(spawn-via-monitor): new procedure.
(spawn-command): use it.
---
modules/shepherd/service.scm | 47 ++++++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 10 deletions(-)

Toggle diff (77 lines)
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index fd2ef1b..196ee44 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -1825,6 +1825,24 @@ otherwise by updating its state."
;; loop so we don't miss any terminated child process.
(loop)))))
+(define-syntax-rule (boxed-errors exps ...)
+ (catch #t
+ (lambda ()
+ (call-with-values
+ (lambda ()
+ exps ...)
+ (lambda results
+ (list 'success results))))
+ (lambda args
+ (list 'exception args))))
+
+(define unboxed-errors
+ (match-lambda
+ (('success vals)
+ (apply values vals))
+ (('exception args)
+ (apply throw args))))
+
(define (process-monitor channel)
"Run a process monitor that handles requests received over @var{channel}."
(let loop ((waiters vlist-null))
@@ -1860,11 +1878,17 @@ otherwise by updating its state."
waiters)))
(('spawn command reply)
- ;; Spawn COMMAND; send its exit status to REPLY when it terminates.
- ;; This operation is atomic: the WAITERS table is updated before
- ;; termination of PID can possibly be handled.
- (let ((pid (fork+exec-command command)))
- (loop (vhash-consv pid reply waiters))))
+ ;; Spawn COMMAND; send the spawn result (pid or exception) to REPLY;
+ ;; send its exit status to REPLY when it terminates. This operation is
+ ;; atomic: the WAITERS table is updated before termination of PID can
+ ;; possibly be handled.
+ (let ((result (boxed-errors (fork+exec-command command))))
+ (put-message reply result)
+ (match result
+ (('exception . _)
+ (loop waiters))
+ (('success (pid))
+ (loop (vhash-consv pid reply waiters))))))
(('await pid reply)
;; Await the termination of PID and send its status on REPLY.
@@ -1900,14 +1924,17 @@ context. The process monitoring fiber is responsible for handling
@code{SIGCHLD} and generally dealing with process creation and termination."
(call-with-process-monitor (lambda () exp ...)))
+(define (spawn-via-monitor command)
+ (let ((reply (make-channel)))
+ (put-message (current-process-monitor)
+ `(spawn ,command ,reply))
+ (unboxed-errors (get-message reply))
+ (get-message reply)))
+
(define (spawn-command program . arguments)
"Like 'system*' but do not block while waiting for PROGRAM to terminate."
(if (current-process-monitor)
- (let ((reply (make-channel)))
- (put-message (current-process-monitor)
- `(spawn ,(cons program arguments)
- ,reply))
- (get-message reply))
+ (spawn-via-monitor (cons program arguments))
(apply system* program arguments)))
(define default-process-termination-grace-period
--
2.38.1
From 51ee63ace6f3f52eb196c990664cc6b9af3d3683 Mon Sep 17 00:00:00 2001
From: ulfvonbelow <striness@tilde.club>
Date: Sat, 25 Feb 2023 00:46:27 -0600
Subject: [PATCH 2/3] service: accept fork+exec-command argument list in
monitor.

Sometimes it's necessary to run startup / shutdown programs as a certain user,
in a certain directory, with certain environment variables, etc. Shepherd
currently provides a replacement for system* that won't race against the
child process auto-reaper, but this lacks the flexibility Shepherd users are
used to.

* modules/shepherd/service.scm (process-monitor): treat command instead as
argument list to fork+exec-command.
(spawn-via-monitor): update to new convention.
(fork+exec+wait-command): new procedure.
---
modules/shepherd/service.scm | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

Toggle diff (59 lines)
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index 196ee44..a36e486 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -94,6 +94,7 @@
default-process-termination-grace-period
exec-command
fork+exec-command
+ fork+exec+wait-command
default-pid-file-timeout
read-pid-file
make-system-constructor
@@ -1877,12 +1878,12 @@ otherwise by updating its state."
vlist-null
waiters)))
- (('spawn command reply)
+ (('spawn args reply)
;; Spawn COMMAND; send the spawn result (pid or exception) to REPLY;
;; send its exit status to REPLY when it terminates. This operation is
;; atomic: the WAITERS table is updated before termination of PID can
;; possibly be handled.
- (let ((result (boxed-errors (fork+exec-command command))))
+ (let ((result (boxed-errors (apply fork+exec-command args))))
(put-message reply result)
(match result
(('exception . _)
@@ -1924,19 +1925,26 @@ context. The process monitoring fiber is responsible for handling
@code{SIGCHLD} and generally dealing with process creation and termination."
(call-with-process-monitor (lambda () exp ...)))
-(define (spawn-via-monitor command)
+(define (spawn-via-monitor arguments)
(let ((reply (make-channel)))
(put-message (current-process-monitor)
- `(spawn ,command ,reply))
+ `(spawn ,arguments ,reply))
(unboxed-errors (get-message reply))
(get-message reply)))
(define (spawn-command program . arguments)
"Like 'system*' but do not block while waiting for PROGRAM to terminate."
(if (current-process-monitor)
- (spawn-via-monitor (cons program arguments))
+ (spawn-via-monitor (list (cons program arguments)))
(apply system* program arguments)))
+(define (fork+exec+wait-command command . arguments)
+ "Like 'fork+exec' but also wait for PROGRAM to terminate, giving its exit
+status."
+ (if (current-process-monitor)
+ (spawn-via-monitor (cons command arguments))
+ (waitpid (apply fork+exec-command command arguments))))
+
(define default-process-termination-grace-period
;; Default process termination "grace period" before we send SIGKILL.
(make-parameter 5))
--
2.38.1
From 177592ee9d4b7fc6dcc80e545e8ad615a1d6786c Mon Sep 17 00:00:00 2001
From: ulfvonbelow <striness@tilde.club>
Date: Sat, 25 Feb 2023 00:56:57 -0600
Subject: [PATCH 3/3] service: add spawn-shell-command replacement for
`system'.

We already have a replacement for `system*' that avoids racing, but not for
`system'.

* configure.ac (SHELL): new substitution variable.
* modules/shepherd/system.scm.in (%shell-filename): new variable.
* modules/shepherd/service.scm
(spawn-shell-command, real-system): new procedures.
* modules/shepherd.scm (main): replace `system' with `spawn-shell-command'.
---
configure.ac | 1 +
modules/shepherd.scm | 7 +++++--
modules/shepherd/service.scm | 13 +++++++++++++
modules/shepherd/system.scm.in | 5 ++++-
4 files changed, 23 insertions(+), 3 deletions(-)

Toggle diff (91 lines)
diff --git a/configure.ac b/configure.ac
index 6f681dc..19c177a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -32,6 +32,7 @@ guilemoduledir="${datarootdir}/guile/site/$GUILE_EFFECTIVE_VERSION"
guileobjectdir="${libdir}/guile/$GUILE_EFFECTIVE_VERSION/site-ccache"
AC_SUBST([guilemoduledir])
AC_SUBST([guileobjectdir])
+AC_SUBST([SHELL])
dnl Check for extra dependencies.
GUILE_MODULE_AVAILABLE([have_fibers], [(fibers)])
diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index cce0507..1f6342e 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -420,8 +420,10 @@ already ~a threads running, disabling 'signalfd' support")
;; Replace the default 'system*' binding with one that
;; cooperates instead of blocking on 'waitpid'.
- (let ((real-system* system*))
+ (let ((real-system* system*)
+ (real-system system))
(set! system* spawn-command)
+ (set! system spawn-shell-command)
;; Restore 'system*' after fork.
(set! primitive-fork
@@ -430,7 +432,8 @@ already ~a threads running, disabling 'signalfd' support")
(let ((result (real-fork)))
(when (zero? result)
(set! primitive-fork real-fork)
- (set! system* real-system*))
+ (set! system* real-system*)
+ (set! system real-system))
result)))))
(run-daemon #:socket-file socket-file
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index a36e486..f8df3a9 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -81,6 +81,7 @@
handle-SIGCHLD
with-process-monitor
spawn-command
+ spawn-shell-command
%precious-signals
register-services
provided-by
@@ -1938,6 +1939,18 @@ context. The process monitoring fiber is responsible for handling
(spawn-via-monitor (list (cons program arguments)))
(apply system* program arguments)))
+(define real-system system)
+
+(define* (spawn-shell-command #:optional command)
+ "Like 'system' but do not block while waiting for COMMAND to terminate."
+ (if (current-process-monitor)
+ (if command
+ (spawn-command %shell-filename "-c" command)
+ #t)
+ (if command
+ (real-system command)
+ (real-system))))
+
(define (fork+exec+wait-command command . arguments)
"Like 'fork+exec' but also wait for PROGRAM to terminate, giving its exit
status."
diff --git a/modules/shepherd/system.scm.in b/modules/shepherd/system.scm.in
index 29357aa..4646e81 100644
--- a/modules/shepherd/system.scm.in
+++ b/modules/shepherd/system.scm.in
@@ -41,7 +41,8 @@
unblock-signals
set-blocked-signals
with-blocked-signals
- without-automatic-finalization))
+ without-automatic-finalization
+ %shell-filename))
;; The <sys/reboot.h> constants.
(define RB_AUTOBOOT @RB_AUTOBOOT@)
@@ -328,3 +329,5 @@ Turning finalization off shuts down the finalization thread as a side effect."
exp ...)
(lambda ()
(%set-automatic-finalization-enabled?! enabled?)))))
+
+(define %shell-filename "@SHELL@")
--
2.38.1
-----BEGIN PGP SIGNATURE-----

iQHIBAEBCAAyFiEEn6BUn0yca1D9JsMa1lV76sJM9mgFAmP6hsEUHHN0cmluZXNz
QHRpbGRlLmNsdWIACgkQ1lV76sJM9mjtsQwAiDIHp3DBdkOB1nWU5iU9WgVZHnvg
NpMxQhI0trcEluiMYltryxtyv9y4L0x1O2hKto+KbDdvS6l6NHF6IfGX4aV7hfq+
PJBntv3KQjK5NTPN96+D179aLeb8iCY//rh2/qjdLALtqxGHM8kcf83se1SQ3mr/
UtChBtz5vNQKJx3s7PSOAhGvLupNJ6rWdgKdm+BwbsWFZ/7lrb3+gNLXPMtQlEzC
3jgo7TKVEMhLLVuoYRsvCx79a5l5kayaA516B+xuKLGY3/OZIU/VT5Ts5tzAlhDU
VjTI4Wg1aKyZJDehuFwbYuoPWwxnODOhZ05fZKlKDqMhBRWsFnsPR355dU2uOY4I
ku4HkuvpQBHj1TPp7JqxRZ+cwU76REjoEjqJPNfKriIZj97/+OrPz/hdX8GLwPRY
Uwc/IZyoa5HAYKKpPyeBaZAmEJ8hxrA576nEIetuXmfFvSeheuSYw+5Qe6NujX/M
wtuZEi1t3SsHHjbCwNf9LQXPehBRVmRwtC0j
=FusD
-----END PGP SIGNATURE-----

T
T
Tobias Geerinckx-Rice wrote on 26 Feb 2023 15:17
(name . Ulf Herrman)(address . striness@tilde.club)(address . 61803@debbugs.gnu.org)
87ttz8v715.fsf@nckx
Ulf,

Ulf Herrman ???
Toggle quote (4 lines)
> Note: I previously tried to send this to guix-devel, but it
> didn't seem
> to make it (I didn't see it in the archives after half a day),

Note: I only check the spam/moderation queue once a day, most days
;-)

Toggle quote (6 lines)
> and
> after
> some consideration I recalled that guix-patches exists. Is this
> the
> right place for shepherd patches?

It is!

If you like, you can configure ‘git send-email’ to default to
guix-patches@gnu.org (sendemail.to). It will send separate mails
for each patch rather than attachments.

Kind regards,

T G-R
-----BEGIN PGP SIGNATURE-----

iIIEARYKACsWIQT12iAyS4c9C3o4dnINsP+IT1VteQUCY/trpg0cbWVAdG9iaWFz
LmdyAAoJEA2w/4hPVW15JogA/jtYf1cvINbLokGuSZzrBpnborN35ZMdF7VbDbsH
nKh9APiy/eKBvrywSugykEx4Ao77226bwInDwi3qNchMk+UG
=hNjy
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 2 Mar 2023 23:16
Re: bug#61803: [PATCH 0/3] [shepherd] improve race-free spawn+wait
(name . Ulf Herrman)(address . striness@tilde.club)(address . 61803@debbugs.gnu.org)
87h6v23ike.fsf@gnu.org
Hi Ulf,

Ulf Herrman <striness@tilde.club> skribis:

Toggle quote (8 lines)
> These patches fill out shepherd's procedures for running processes to
> completion. They add a replacement for 'system' to complement the
> existing replacement for 'system*', and add a 'fork+exec+wait-process'
> procedure so that the flexibility of that family of procedures is
> available for this use case as well. It also improves error handling in
> the event that an exception occurs while spawning a process in the
> process monitor, which would normally kill that essential fiber.

Nice!

Toggle quote (5 lines)
> Note: I previously tried to send this to guix-devel, but it didn't seem
> to make it (I didn't see it in the archives after half a day), and after
> some consideration I recalled that guix-patches exists. Is this the
> right place for shepherd patches?

Yes, as Tobias confirmed already. :-)

Toggle quote (13 lines)
> From 64370a98dfc17f0531de7397a38362c03a1d89bc Mon Sep 17 00:00:00 2001
> From: ulfvonbelow <striness@tilde.club>
> Date: Sat, 25 Feb 2023 00:42:41 -0600
> Subject: [PATCH 1/3] service: Propagate exceptions while spawning in process
> monitor.
>
> * modules/shepherd/service.scm (unboxed-errors): new procedure.
> (boxed-errors): new syntax.
> (process-monitor): use it to propagate exceptions from fork+exec-command via
> reply channel.
> (spawn-via-monitor): new procedure.
> (spawn-command): use it.

Good catch! I added a test and a copyright line for you (let me know if
I got it wrong) and pushed as 18989f2fffa6ecdbd0f9b77834e1a54c9c45ee73.

Toggle quote (17 lines)
> From 51ee63ace6f3f52eb196c990664cc6b9af3d3683 Mon Sep 17 00:00:00 2001
> From: ulfvonbelow <striness@tilde.club>
> Date: Sat, 25 Feb 2023 00:46:27 -0600
> Subject: [PATCH 2/3] service: accept fork+exec-command argument list in
> monitor.
>
> Sometimes it's necessary to run startup / shutdown programs as a certain user,
> in a certain directory, with certain environment variables, etc. Shepherd
> currently provides a replacement for system* that won't race against the
> child process auto-reaper, but this lacks the flexibility Shepherd users are
> used to.
>
> * modules/shepherd/service.scm (process-monitor): treat command instead as
> argument list to fork+exec-command.
> (spawn-via-monitor): update to new convention.
> (fork+exec+wait-command): new procedure.

I’ll take a closer look to this one and report back soon.

Toggle quote (15 lines)
> From 177592ee9d4b7fc6dcc80e545e8ad615a1d6786c Mon Sep 17 00:00:00 2001
> From: ulfvonbelow <striness@tilde.club>
> Date: Sat, 25 Feb 2023 00:56:57 -0600
> Subject: [PATCH 3/3] service: add spawn-shell-command replacement for
> `system'.
>
> We already have a replacement for `system*' that avoids racing, but not for
> `system'.
>
> * configure.ac (SHELL): new substitution variable.
> * modules/shepherd/system.scm.in (%shell-filename): new variable.
> * modules/shepherd/service.scm
> (spawn-shell-command, real-system): new procedures.
> * modules/shepherd.scm (main): replace `system' with `spawn-shell-command'.

Out of curiosity, do you have a need for ‘system’? I’m inclined to
recommend against its use, in which case this patch is unnecessary.

Toggle quote (2 lines)
> +(define %shell-filename "@SHELL@")

This is the configure-time shell so it will be wrong when
cross-compiling.

I’d just do:

(define %shell (or (getenv "SHELL") "/bin/sh"))

Thanks!

Ludo’.
L
L
Ludovic Courtès wrote on 4 Mar 2023 23:09
(name . Ulf Herrman)(address . striness@tilde.club)
877cvwp3rm.fsf@gnu.org
Hi Ulf,

Ulf Herrman <striness@tilde.club> skribis:

Toggle quote (17 lines)
> From 51ee63ace6f3f52eb196c990664cc6b9af3d3683 Mon Sep 17 00:00:00 2001
> From: ulfvonbelow <striness@tilde.club>
> Date: Sat, 25 Feb 2023 00:46:27 -0600
> Subject: [PATCH 2/3] service: accept fork+exec-command argument list in
> monitor.
>
> Sometimes it's necessary to run startup / shutdown programs as a certain user,
> in a certain directory, with certain environment variables, etc. Shepherd
> currently provides a replacement for system* that won't race against the
> child process auto-reaper, but this lacks the flexibility Shepherd users are
> used to.
>
> * modules/shepherd/service.scm (process-monitor): treat command instead as
> argument list to fork+exec-command.
> (spawn-via-monitor): update to new convention.
> (fork+exec+wait-command): new procedure.

On this one I took a similar approach but chose to extend
‘spawn-command’ instead of introducing a new procedure—see commit
0f3276a9c3dafbef41b0aab88ba5dda1bb78dc99.

Another difference is explicitly listing keyword arguments so that their
default values are taken from the caller’s dynamic state and not from
that of the process monitoring fiber. This fixes

Let me know what you think!

Thanks,
Ludo’.
U
U
Ulf Herrman wrote on 9 Mar 2023 04:48
(name . Ludovic Courtès)(address . ludo@gnu.org)
878rg61t5l.fsf@tilde.club
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (16 lines)
>> From 177592ee9d4b7fc6dcc80e545e8ad615a1d6786c Mon Sep 17 00:00:00 2001
>> From: ulfvonbelow <striness@tilde.club>
>> Date: Sat, 25 Feb 2023 00:56:57 -0600
>> Subject: [PATCH 3/3] service: add spawn-shell-command replacement for
>> `system'.
>>
>> We already have a replacement for `system*' that avoids racing, but not for
>> `system'.
>>
>> * configure.ac (SHELL): new substitution variable.
>> * modules/shepherd/system.scm.in (%shell-filename): new variable.
>> * modules/shepherd/service.scm
>> (spawn-shell-command, real-system): new procedures.
>> * modules/shepherd.scm (main): replace `system' with `spawn-shell-command'.
> Out of curiosity, do you have a need for ‘system’?

I don't.

Toggle quote (3 lines)
> I’m inclined to recommend against its use, in which case this patch is
> unnecessary.

I tend to agree, but make-system-constructor and make-system-destructor
both use it and are documented in the manual, so we should either make
them work properly or remove them.

Toggle quote (10 lines)
>> +(define %shell-filename "@SHELL@")
>
> This is the configure-time shell so it will be wrong when
> cross-compiling.
>
> I’d just do:
>
> (define %shell (or (getenv "SHELL") "/bin/sh"))
>

The rationale behind not taking that straightforward approach was to
closely emulate the normal behavior of 'system' on guix, where the shell
path used is a hardcoded store path, though since guix's libc is likely
the only one where this is anything other than /bin/sh, I suppose it
does make a lot more sense to patch it in the guix package definition
(or accept the minor behavioral difference) than to try to automagically
figure it out at configure-time, which also has the problems you
mentioned.
-----BEGIN PGP SIGNATURE-----

iQHIBAEBCAAyFiEEn6BUn0yca1D9JsMa1lV76sJM9mgFAmQJVyYUHHN0cmluZXNz
QHRpbGRlLmNsdWIACgkQ1lV76sJM9mjyagv+LCgotxnqeNPwMzd6Fj42twBH6FAS
jczDRUd03kKnr0colb/1/4U/14E99MhbQhyrtKrjwMxhvNg2CfU4f4Gsj/McrqeG
Livn2lvciUB9psr8CXfs6eEhr1yVCb9khlGbEXzNoAjcCINldMIXf/qcMNtIfN5L
9sQC7c44bjnwqXqpLBttbcm6IltCgCOkFmMa5OKqgugWjxbyLV60LyWbXMGs7nVt
Uys6U3nDhu2gC8pcMm3Q25xt14KcsRKz4YmTzlK3M+JAkIW63VYMNm0ryqO2a+az
XQ+agSHLZSSPIGYMpfKpjKz5m4wQHExAjBM5bX0WedTXdqv5tpFK4ItQve9mYuu0
VFbyEzjztP6N6M+Fs4cmokOyE4hqGzoEFQDIV9l7qFsRWomMBD4AX6LK2LbPbwUU
+0IYIBCO9HZhpdueUFO/Cg/MPFby9jVxkvCHecTYyhNhNV3WCCu3G6xEFZE8KoPN
yjHr6f6dbgZImihf0cSg3e5u/3XQyDUpjet5
=2DeH
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 11 Mar 2023 15:57
(name . Ulf Herrman)(address . striness@tilde.club)(address . 61803@debbugs.gnu.org)
87v8j7wd20.fsf_-_@gnu.org
Hi Ulf,

Ulf Herrman <striness@tilde.club> skribis:

Toggle quote (4 lines)
> I tend to agree, but make-system-constructor and make-system-destructor
> both use it and are documented in the manual, so we should either make
> them work properly or remove them.

Yeah, you’re right. So I guess we need to support it for now and start
a deprecation cycle so we can eventually remove it.

For now, I’ve pushed a simplified version of your patch as
89dd3bb57fa3e3a23cf85385b0788046b7e45170.

Let me know if you notice something wrong!

Ludo’.
L
L
Ludovic Courtès wrote on 11 Mar 2023 15:58
control message for bug #61803
(address . control@debbugs.gnu.org)
87ttyrwd1h.fsf@gnu.org
close 61803
quit
?