[PATCH Shepherd 0/2] Use 'signalfd' on GNU/Linux

  • Done
  • quality assurance status badge
Details
4 participants
  • Jan Nieuwenhuizen
  • Jelle Licht
  • Ludovic Courtès
  • Mathieu Othacehe
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
L
L
Ludovic Courtès wrote on 24 May 2020 16:27
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20200524142700.6151-1-ludo@gnu.org
Hello!

This patch series allows shepherd to use ‘signalfd’ on GNU/Linux.
It allows us to avoid race conditions related to signal delivery,
which in turn means we can pass an infinite timeout to ‘select’,
and thus increase battery life.

More generally, it’s a way to structure the code around the event
loop. The next step will be to make the code entirely reactive,
so we can have things like socket activation, being able to start
services that don’t depend on one another concurrently, and all that.
(This is actually also be possible without ‘signalfd’ but it’s more
consistent and robust to have everything visible to ‘select’.)

Thoughts?

I guess the main question is: can it go into 0.8.1, which we outta
release soon due to https://issues.guix.gnu.org/40981. I’d say
“yes”, but I’ll be even more confident if others take a look. :-)

Ludo’.

Ludovic Courtès (2):
system: Add support for 'signalfd'.
shepherd: Use 'signalfd' when possible.

configure.ac | 18 +++++++++
modules/shepherd.scm | 73 ++++++++++++++++++++++++++++++----
modules/shepherd/service.scm | 19 +++++----
modules/shepherd/system.scm.in | 65 ++++++++++++++++++++++++++++++
4 files changed, 161 insertions(+), 14 deletions(-)

--
2.26.2
L
L
Ludovic Courtès wrote on 24 May 2020 16:37
[PATCH Shepherd 2/2] shepherd: Use 'signalfd' when possible.
(address . 41507@debbugs.gnu.org)
20200524143700.6378-2-ludo@gnu.org
This eliminates possible race conditions related to signal delivery and
'select', which in turn means we can pass 'select' an infinite timeout
without any risk of missing a signal. It also allows us to structure
the code around a single event loop.

* modules/shepherd.scm (maybe-signal-port): New procedure.
(main): Use it and define 'signal-port'. Add 'ports' argument to
'next-command'. Pass it to 'select'. Change the timeout argument of
'select' to #f when SIGNAL-PORT is true.
* modules/shepherd/service.scm (fork+exec-command): Call
'unblock-signals' in the child process, right before 'exec-command'.
---
modules/shepherd.scm | 73 ++++++++++++++++++++++++++++++++----
modules/shepherd/service.scm | 19 ++++++----
2 files changed, 78 insertions(+), 14 deletions(-)

Toggle diff (154 lines)
diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 46faab6..ac60070 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -24,6 +24,7 @@
#:use-module (ice-9 format)
#:use-module (ice-9 rdelim) ;; Line-based I/O.
#:autoload (ice-9 readline) (activate-readline) ;for interactive use
+ #:use-module ((ice-9 threads) #:select (all-threads))
#:use-module (oop goops) ;; Defining classes and methods.
#:use-module (srfi srfi-1) ;; List library.
#:use-module (srfi srfi-26)
@@ -60,6 +61,53 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC."
(close sock)
(catch-system-error (delete-file file-name))))))
+(define (maybe-signal-port signals)
+ "Return a signal port for SIGNALS, using 'signalfd' on GNU/Linux, or #f if
+that is not supported."
+ (catch 'system-error
+ (lambda ()
+ (let ((port (signalfd -1 signals)))
+ ;; As per the signalfd(2) man page, block SIGNALS. The tricky bit is
+ ;; that SIGNALS must be blocked for all the threads; new threads will
+ ;; inherit the signal mask, but we must ensure that neither Guile's
+ ;; signal delivery thread nor its finalization thread are already
+ ;; running, because if they do, they are not blocking SIGNALS. The
+ ;; signal delivery thread is started on the first call to 'sigaction'
+ ;; so we arrange to not call 'sigaction' beforehand; as for the
+ ;; finalization thread, use 'without-automatic-finalization' to
+ ;; temporarily stop it.
+ (without-automatic-finalization
+ (let ((count (length (all-threads))))
+ (if (= 1 count)
+ (begin
+ (block-signals signals)
+ port)
+ (begin
+ (local-output (l10n "warning: \
+already ~a threads running, disabling 'signalfd' support")
+ count)
+ (close-port port)
+ #f))))))
+ (lambda args
+ (if (= ENOSYS (system-error-errno args))
+ #f
+ (apply throw args)))))
+
+(define (handle-signal-port port)
+ "Read from PORT, a signalfd port, and handle the signal accordingly."
+ (let ((signal (consume-signalfd-siginfo port)))
+ (cond ((= signal SIGCHLD)
+ (handle-SIGCHLD))
+ ((= signal SIGINT)
+ (catch 'quit
+ (lambda ()
+ (stop root-service))
+ quit-exception-handler))
+ ((memv signal (list SIGTERM SIGHUP))
+ (stop root-service))
+ (else
+ #f))))
+
;; Main program.
(define (main . args)
@@ -82,6 +130,12 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC."
(= EINVAL errno) ;PR_SET_CHILD_SUBREAPER unavailable
(apply throw args)))))))
+ (define signal-port
+ ;; Attempt to create a "signal port" via 'signalfd'. This must be called
+ ;; before the 'sigaction' procedure is called, because 'sigaction' spawns
+ ;; the signal thread.
+ (maybe-signal-port (list SIGCHLD SIGINT SIGTERM SIGHUP)))
+
(initialize-cli)
(let ((config-file #f)
@@ -281,7 +335,9 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC."
;; "Failed to autoload handle-SIGCHLD in (ice-9 readline):"
(handle-SIGCHLD)
- (let next-command ()
+ (let next-command ((ports (if signal-port
+ (list signal-port sock)
+ (list sock))))
(define (read-from sock)
(match (accept sock)
((command-source . client-address)
@@ -289,14 +345,17 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC."
(process-connection command-source))
(_ #f)))
- ;; XXX: Until we use signalfd(2), there's always a time window
+ ;; When not using signalfd(2), there's always a time window
;; before 'select' during which a handler async can be queued
;; but not executed. Work around it by exiting 'select' every
;; few seconds.
- (match (select (list sock) (list) (list)
- (if poll-services? 0.5 30))
- (((sock) _ _)
- (read-from sock))
+ (match (select ports (list) (list)
+ (and (not signal-port)
+ (if poll-services? 0.5 30)))
+ (((port _ ...) _ _)
+ (if (and signal-port (eq? port signal-port))
+ (handle-signal-port port)
+ (read-from sock)))
(_
;; 'select' returned an empty set, probably due to EINTR.
;; Explicitly call the SIGCHLD handler because we cannot be
@@ -306,7 +365,7 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC."
(when poll-services?
(check-for-dead-services))
- (next-command))))))))
+ (next-command ports))))))))
;; Start all of SERVICES, which is a list of canonical names (FIXME?),
;; but in a order where all dependencies are fulfilled before we
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index 45fcf32..a202a98 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -898,13 +898,18 @@ its PID."
(pid (and (sigaction SIGTERM SIG_DFL)
(primitive-fork))))
(if (zero? pid)
- (exec-command command
- #:user user
- #:group group
- #:log-file log-file
- #:directory directory
- #:file-creation-mask file-creation-mask
- #:environment-variables environment-variables)
+ (begin
+ ;; Unblock any signals that might have been blocked by the parent
+ ;; process.
+ (unblock-signals (list SIGCHLD SIGINT SIGHUP SIGTERM))
+
+ (exec-command command
+ #:user user
+ #:group group
+ #:log-file log-file
+ #:directory directory
+ #:file-creation-mask file-creation-mask
+ #:environment-variables environment-variables))
(begin
;; Restore the initial SIGTERM handler.
(sigaction SIGTERM term-handler)
--
2.26.2
L
L
Ludovic Courtès wrote on 24 May 2020 16:36
[PATCH Shepherd 1/2] system: Add support for 'signalfd'.
(address . 41507@debbugs.gnu.org)
20200524143700.6378-1-ludo@gnu.org
* configure.ac: Add 'AC_CHECK_SIZEOF' calls for 'struct
signalfd_siginfo' and 'sigset_t'. Add 'AC_COMPUTE_INT' for
'SFD_CLOEXEC', 'SIG_BLOCK', and 'SIG_UNBLOCK'. Substitute the results.
* modules/shepherd/system.scm.in (allocate-sigset, sigemptyset)
(sigaddset, sigset): New procedures.
(%sizeof-struct-signalfd-siginfo, SFD_CLOEXEC): New variables.
(signalfd, consume-signalfd-siginfo): New procedures.
(SIG_BLOCK, SIG_UNBLOCK): New variables.
(sigprocmask, block-signals, unblock-signals): New procedures.
---
configure.ac | 18 ++++++++++
modules/shepherd/system.scm.in | 65 ++++++++++++++++++++++++++++++++++
2 files changed, 83 insertions(+)

Toggle diff (123 lines)
diff --git a/configure.ac b/configure.ac
index 5d11846..052a826 100644
--- a/configure.ac
+++ b/configure.ac
@@ -87,6 +87,24 @@ AC_COMPUTE_INT([PR_SET_CHILD_SUBREAPER], [PR_SET_CHILD_SUBREAPER], [#include <sy
AC_SUBST([PR_SET_CHILD_SUBREAPER])
AC_MSG_RESULT([done])
+dnl Check the size of 'signalfd_siginfo'. If it's undefined, returns zero.
+AC_CHECK_SIZEOF([struct signalfd_siginfo], [], [#include <sys/signalfd.h>])
+AC_CHECK_SIZEOF([sigset_t], [], [#include <signal.h>])
+
+AC_MSG_CHECKING([<sys/signalfd.h> and <sys/signal.h> constants])
+AC_COMPUTE_INT([SFD_CLOEXEC], [SFD_CLOEXEC], [#include <sys/signalfd.h>])
+AC_COMPUTE_INT([SIG_BLOCK], [SIG_BLOCK], [#include <sys/signal.h>])
+AC_COMPUTE_INT([SIG_UNBLOCK], [SIG_UNBLOCK], [#include <sys/signal.h>])
+AC_MSG_RESULT([done])
+
+SIZEOF_STRUCT_SIGNALFD_SIGINFO="$ac_cv_sizeof_struct_signalfd_siginfo"
+SIZEOF_SIGSET_T="$ac_cv_sizeof_sigset_t"
+AC_SUBST([SIZEOF_STRUCT_SIGNALFD_SIGINFO])
+AC_SUBST([SIZEOF_SIGSET_T])
+AC_SUBST([SFD_CLOEXEC])
+AC_SUBST([SIG_BLOCK])
+AC_SUBST([SIG_UNBLOCK])
+
AC_MSG_CHECKING([whether to build crash handler])
case "$host_os" in
linux-gnu*) build_crash_handler=yes;;
diff --git a/modules/shepherd/system.scm.in b/modules/shepherd/system.scm.in
index e5ecd1f..872fad4 100644
--- a/modules/shepherd/system.scm.in
+++ b/modules/shepherd/system.scm.in
@@ -20,8 +20,10 @@
(define-module (shepherd system)
#:use-module (system foreign)
+ #:use-module (ice-9 binary-ports)
#:use-module (rnrs bytevectors)
#:use-module (srfi srfi-11)
+ #:use-module (srfi srfi-26)
#:export (disable-reboot-on-ctrl-alt-del
reboot
halt
@@ -30,6 +32,11 @@
prctl
PR_SET_CHILD_SUBREAPER
getpgid
+ SFD_CLOEXEC
+ signalfd
+ consume-signalfd-siginfo
+ block-signals
+ unblock-signals
without-automatic-finalization))
;; The <sys/reboot.h> constants.
@@ -132,6 +139,64 @@ ctrlaltdel(8) and see kernel/reboot.c in Linux."
(list err))
result)))))
+(define (allocate-sigset)
+ (bytevector->pointer (make-bytevector @SIZEOF_SIGSET_T@)))
+
+(define sigemptyset
+ (syscall->procedure int "sigemptyset" '(*)))
+
+(define sigaddset
+ (syscall->procedure int "sigaddset" `(* ,int)))
+
+(define (sigset signals)
+ "Return a pointer to a fresh 'sigset_t' for SIGNALS."
+ (let ((set (allocate-sigset)))
+ (sigemptyset set)
+ (for-each (cut sigaddset set <>) signals)
+ set))
+
+(define %sizeof-struct-signalfd-siginfo
+ ;; Size of 'struct signalfd_siginfo' or zero if it doesn't exist, as is the
+ ;; case on GNU/Hurd.
+ @SIZEOF_STRUCT_SIGNALFD_SIGINFO@)
+
+(define SFD_CLOEXEC @SFD_CLOEXEC@)
+
+(define signalfd
+ (let ((proc (syscall->procedure int "signalfd" `(,int * ,int))))
+ (lambda* (fd signals #:optional (flags SFD_CLOEXEC))
+ "Return an open input port over a signal file descriptor for SIGNALS, a
+list of signal constants; if FD is -1, a new file descriptor is allocated,
+otherwise FD is returned and its associated state is updated. FLAGS must be a
+bitmask of SFD_CLOEXEC or SFD_NONBLOCK."
+ (fdopen (proc fd (sigset signals) flags) "r0"))))
+
+(define (consume-signalfd-siginfo port)
+ "Read a 'signalfd_siginfo' structure from PORT and discard it. Return the
+number of the signal received."
+ (let ((bv (get-bytevector-n port %sizeof-struct-signalfd-siginfo)))
+ ;; The first 'uint32_t' field of 'struct signalfd_siginfo' is the signal
+ ;; number.
+ (bytevector-u32-native-ref bv 0)))
+
+(define SIG_BLOCK @SIG_BLOCK@)
+(define SIG_UNBLOCK @SIG_UNBLOCK@)
+
+(define sigprocmask
+ (let ((proc (syscall->procedure int "pthread_sigmask" `(,int * *))))
+ (lambda (how signals)
+ "Add SIGNALS, a list of SIG* values, to the set of blocked signals if
+HOW is SIG_BLOCK, or unblock them if HOW is SIG_UNBLOCK."
+ (proc how (sigset signals) %null-pointer))))
+
+(define (block-signals signals)
+ "Block SIGNALS, a list of SIG* values, in the current thread."
+ (sigprocmask SIG_BLOCK signals))
+
+(define (unblock-signals signals)
+ "Unblock SIGNALS, a list of SIG* values, in the current thread."
+ (sigprocmask SIG_UNBLOCK signals))
+
;;;
;;; Guile shenanigans.
--
2.26.2
J
J
Jelle Licht wrote on 24 May 2020 17:13
Re: [bug#41507] [PATCH Shepherd 0/2] Use 'signalfd' on GNU/Linux
87o8qd2wos.fsf@jlicht.xyz
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (4 lines)
> Hello!
>
> This patch series allows shepherd to use ‘signalfd’ on GNU/Linux.

[...]

Toggle quote (2 lines)
> Thoughts?

I am extremely happy to see that you got this to work; I bashed my head
against a wall getting the signal-handling thread to block the right
signals. This does seem like something that should be in guile proper
rather than shepherd though.

As I read it, you need to set up a signalfd handlers for specific
signals before ever calling `sigaction'. Does this not have an impact on
being able to do some forms of REPL-driven development with shepherd?
Perhaps it makes sense to document this gotcha (and some of its
implications) in a location other than an inline comment in
`maybe-signal-port'.

Thanks for getting this to work nonetheless!
- Jelle
L
L
Ludovic Courtès wrote on 25 May 2020 09:37
(name . Jelle Licht)(address . jlicht@fsfe.org)(address . 41507@debbugs.gnu.org)
87tv04wjm9.fsf@gnu.org
Hi Jelle,

Jelle Licht <jlicht@fsfe.org> skribis:

Toggle quote (5 lines)
> I am extremely happy to see that you got this to work; I bashed my head
> against a wall getting the signal-handling thread to block the right
> signals. This does seem like something that should be in guile proper
> rather than shepherd though.

Oh right, you had first-hand experience fiddling with this!

Yeah ideally Guile would provide some ‘signalfd’ support. I’m not sure
exactly how that should work. It’s easier to have ad-hoc support in the
Shepherd in the meantime.

Toggle quote (4 lines)
> As I read it, you need to set up a signalfd handlers for specific
> signals before ever calling `sigaction'. Does this not have an impact on
> being able to do some forms of REPL-driven development with shepherd?

It depends, I don’t test things like SIGCHLD handling from the REPL
anyway. :-)

Toggle quote (4 lines)
> Perhaps it makes sense to document this gotcha (and some of its
> implications) in a location other than an inline comment in
> `maybe-signal-port'.

Where would you document that? Currently, one gets a warning if signals
cannot be properly blocked.

Thanks for your feedback!

Ludo’.
J
J
Jelle Licht wrote on 25 May 2020 10:02
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 41507@debbugs.gnu.org)
87mu5wv3wg.fsf@jlicht.xyz
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (14 lines)
>> As I read it, you need to set up a signalfd handlers for specific
>> signals before ever calling `sigaction'. Does this not have an impact on
>> being able to do some forms of REPL-driven development with shepherd?
>
> It depends, I don’t test things like SIGCHLD handling from the REPL
> anyway. :-)
>
>> Perhaps it makes sense to document this gotcha (and some of its
>> implications) in a location other than an inline comment in
>> `maybe-signal-port'.
>
> Where would you document that? Currently, one gets a warning if signals
> cannot be properly blocked.

Being able to use both sigaction-based handlers and signalfd is simply
weird, not forbidden.

When I read
Toggle snippet (3 lines)
"already X>1 threads running, disabling 'signalfd' support"

I would already have to understand why this is ("I didn't start any
threads, what is going on").

Your inline comment clearly indicates some possible approaches to debug
this warning. Knowing that I should call this before any sigaction stuff
seems relevant enough to be documented. The Texinfo manual could have a
(tiny) section on this. The _only_ way to learn this is by
already being a Guile-guru, or by reading the source.

- Jelle
M
M
Mathieu Othacehe wrote on 25 May 2020 14:31
Re: [bug#41507] [PATCH Shepherd 2/2] shepherd: Use 'signalfd' when possible.
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 41507@debbugs.gnu.org)
87a71ww5zx.fsf@gnu.org
Hello Ludo,

This is a big improvement, thank you!

Toggle quote (5 lines)
> + (lambda args
> + (if (= ENOSYS (system-error-errno args))
> + #f
> + (apply throw args)))))

Maybe:

(and (= ENOSYS (system-error-errno args))
(apply throw args))

Toggle quote (12 lines)
> +
> +(define (handle-signal-port port)
> + "Read from PORT, a signalfd port, and handle the signal accordingly."
> + (let ((signal (consume-signalfd-siginfo port)))
> + (cond ((= signal SIGCHLD)
> + (handle-SIGCHLD))
> + ((= signal SIGINT)
> + (catch 'quit
> + (lambda ()
> + (stop root-service))
> + quit-exception-handler))

Maybe we should create a handle-SIGINT, to make sure that the same code
is shared with the sigaction handler?

Toggle quote (5 lines)
> + (begin
> + ;; Unblock any signals that might have been blocked by the parent
> + ;; process.
> + (unblock-signals (list SIGCHLD SIGINT SIGHUP SIGTERM))

This made me realize that we may want to disable/reset SIGINT and
SIGHUP, in the same way that we do for SIGTERM? This is not related to
your patch anyway.

You could also extend the comment to say that it is only necessary if
using the signalfd mechanism (because signals are not blocked
otherwise).

In any case, this looks fine!

Thanks,

Mathieu
L
L
Ludovic Courtès wrote on 26 May 2020 23:32
Re: [bug#41507] [PATCH Shepherd 0/2] Use 'signalfd' on GNU/Linux
(name . Jelle Licht)(address . jlicht@fsfe.org)(address . 41507@debbugs.gnu.org)
87eer6mlfs.fsf@gnu.org
Hello,

Jelle Licht <jlicht@fsfe.org> skribis:

Toggle quote (19 lines)
> Ludovic Courtès <ludo@gnu.org> writes:
>
>>> As I read it, you need to set up a signalfd handlers for specific
>>> signals before ever calling `sigaction'. Does this not have an impact on
>>> being able to do some forms of REPL-driven development with shepherd?
>>
>> It depends, I don’t test things like SIGCHLD handling from the REPL
>> anyway. :-)
>>
>>> Perhaps it makes sense to document this gotcha (and some of its
>>> implications) in a location other than an inline comment in
>>> `maybe-signal-port'.
>>
>> Where would you document that? Currently, one gets a warning if signals
>> cannot be properly blocked.
>
> Being able to use both sigaction-based handlers and signalfd is simply
> weird, not forbidden.

It’s not really possible in practice. What happens occasionally if you
do that is:

1. thread A exits ‘select’ due to read event on the signal FD
2. thread B runs handler
3. thread A attempts to read from signal FD, which blocks indefinitely
because the signal is no longer “pending”

Toggle quote (7 lines)
> When I read
>
> "already X>1 threads running, disabling 'signalfd' support"
>
> I would already have to understand why this is ("I didn't start any
> threads, what is going on").

You mean you as a user? The message is meant as a hint for developers
if they have to debug the thing. It’s true that there’s nothing users
can do about it. (But hopefully nobody will see that message, it’s not
supposed to happen when running ‘shepherd’ directly!)

Toggle quote (6 lines)
> Your inline comment clearly indicates some possible approaches to debug
> this warning. Knowing that I should call this before any sigaction stuff
> seems relevant enough to be documented. The Texinfo manual could have a
> (tiny) section on this. The _only_ way to learn this is by
> already being a Guile-guru, or by reading the source.

I don’t think the *user* manual should document POSIX signal handling in
multi-threaded programs with Linux’ signalfd and with Guile. That’s a
pretty niche topic that users probably don’t want to know about. :-)

Those who do want to know about it are Shepherd developers, and I
thought they’d quickly stumble upon the comment if they have to fiddle
with this. Hopefully the explanation makes it look a bit less arcane!

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 27 May 2020 00:13
Re: [bug#41507] [PATCH Shepherd 2/2] shepherd: Use 'signalfd' when possible.
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
875zcimjj2.fsf@gnu.org
Hi!

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (10 lines)
>> + (lambda args
>> + (if (= ENOSYS (system-error-errno args))
>> + #f
>> + (apply throw args)))))
>
> Maybe:
>
> (and (= ENOSYS (system-error-errno args))
> (apply throw args))

It’s not equivalent. :-)

Toggle quote (14 lines)
>> +(define (handle-signal-port port)
>> + "Read from PORT, a signalfd port, and handle the signal accordingly."
>> + (let ((signal (consume-signalfd-siginfo port)))
>> + (cond ((= signal SIGCHLD)
>> + (handle-SIGCHLD))
>> + ((= signal SIGINT)
>> + (catch 'quit
>> + (lambda ()
>> + (stop root-service))
>> + quit-exception-handler))
>
> Maybe we should create a handle-SIGINT, to make sure that the same code
> is shared with the sigaction handler?

Good idea, done!

Toggle quote (9 lines)
>> + (begin
>> + ;; Unblock any signals that might have been blocked by the parent
>> + ;; process.
>> + (unblock-signals (list SIGCHLD SIGINT SIGHUP SIGTERM))
>
> This made me realize that we may want to disable/reset SIGINT and
> SIGHUP, in the same way that we do for SIGTERM? This is not related to
> your patch anyway.

Right, I’ll let you look into it if that’s fine with you.

Toggle quote (4 lines)
> You could also extend the comment to say that it is only necessary if
> using the signalfd mechanism (because signals are not blocked
> otherwise).

Done.

I did:

make dist
guix build shepherd --with-source=shepherd-0.8.0.tar.gz \
-s x86_64-linux -s i686-linux -s armhf-linux -s aarch64-linux
guix build shepherd --with-source=shepherd-0.8.0.tar.gz \
--target=i586-pc-gnu

and it all passes.

Janneke, could you check that it works on GNU/Hurd?

Ludo’.
Closed
J
J
Jan Nieuwenhuizen wrote on 27 May 2020 07:45
(name . Ludovic Courtès)(address . ludo@gnu.org)
87367mdj88.fsf@gnu.org
Ludovic Courtès writes:

Hello,

Toggle quote (2 lines)
> Janneke, could you check that it works on GNU/Hurd?

I haven't caught-up with the discussion, but I just verified that
upstream master~

399bea2 shepherd: Use 'signalfd' when possible.

works on latest wip-hurd-vm!

I did a make dist after finding out (well that was pretty clear from the
patches, but...) that applying the patches need autotools rebootstrap,
etc.

Janneke

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
Closed
L
L
Ludovic Courtès wrote on 27 May 2020 11:23
(name . Jan Nieuwenhuizen)(address . janneke@gnu.org)
87blm9loik.fsf@gnu.org
Hi,

Jan Nieuwenhuizen <janneke@gnu.org> skribis:

Toggle quote (13 lines)
> Ludovic Courtès writes:
>
> Hello,
>
>> Janneke, could you check that it works on GNU/Hurd?
>
> I haven't caught-up with the discussion, but I just verified that
> upstream master~
>
> 399bea2 shepherd: Use 'signalfd' when possible.
>
> works on latest wip-hurd-vm!

Yay, awesome, thanks for checking!

Ludo’.
Closed
L
L
Ludovic Courtès wrote on 30 May 2020 19:44
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 41507@debbugs.gnu.org)
87tuzxs4gj.fsf@gnu.org
Hello Mathieu!

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (9 lines)
>> + (begin
>> + ;; Unblock any signals that might have been blocked by the parent
>> + ;; process.
>> + (unblock-signals (list SIGCHLD SIGINT SIGHUP SIGTERM))
>
> This made me realize that we may want to disable/reset SIGINT and
> SIGHUP, in the same way that we do for SIGTERM? This is not related to
> your patch anyway.

I looked into this and came up with the patches below.

What it does is unconditionally block these four signals before forking.
Then the child process installs SIG_DFL handlers for them and unblocks
them.

That way, the child is guaranteed to never execute the original
handlers, and neither the parent nor the child misses any of these
signals (previously, the temporary (sigaction SIGTERM SIG_DFL)
introduced a window during which shepherd could be killed by a SIGTERM
instead of handling it gracefully.)

WDYT?

(I also thought about using ‘call-with-blocked-asyncs’ instead of
‘with-blocked-signals’. That would have prevented signal handling at
the Scheme level from happening, but the unhandled signals in the child
would be lost.)

If it works for you, I’ll do some more testing, and then hopefully we
can release 0.8.1 in the coming days!

Ludo’.
From bc74b5e33625a082ad0d44fe4409d459222aa295 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Sat, 30 May 2020 17:44:07 +0200
Subject: [PATCH 1/3] system: 'sigprocmask' returns the previous set of blocked
signals.

* modules/shepherd/system.scm.in (sigismember, sigset->list): New
procedures.
(sigprocmask): Return the old set of signals.
---
modules/shepherd/system.scm.in | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)

Toggle diff (60 lines)
diff --git a/modules/shepherd/system.scm.in b/modules/shepherd/system.scm.in
index ac822f8..9c55c69 100644
--- a/modules/shepherd/system.scm.in
+++ b/modules/shepherd/system.scm.in
@@ -148,6 +148,11 @@ ctrlaltdel(8) and see kernel/reboot.c in Linux."
(define sigaddset
(syscall->procedure int "sigaddset" `(* ,int)))
+(define sigismember
+ (let ((proc (syscall->procedure int "sigismember" `(* ,int))))
+ (lambda (set signal)
+ (not (zero? (proc set signal))))))
+
(define (sigset signals)
"Return a pointer to a fresh 'sigset_t' for SIGNALS."
(let ((set (allocate-sigset)))
@@ -155,6 +160,20 @@ ctrlaltdel(8) and see kernel/reboot.c in Linux."
(for-each (cut sigaddset set <>) signals)
set))
+(define sigset->list
+ (let ((all-signals
+ (filter integer?
+ (module-map (lambda (symbol variable)
+ (let ((str (symbol->string symbol)))
+ (and (string-prefix? "SIG" str)
+ (not (string-prefix? "SIG_" str))
+ (variable-ref variable))))
+ (resolve-interface '(guile))))))
+ (lambda (set)
+ "Return the list of integers (signal numbers) corresponding to SET, a
+sigset pointer."
+ (filter (cut sigismember set <>) all-signals))))
+
(define %sizeof-struct-signalfd-siginfo
;; Size of 'struct signalfd_siginfo' or zero if it doesn't exist, as is the
;; case on GNU/Hurd.
@@ -186,13 +205,17 @@ number of the signal received."
(let ((proc (syscall->procedure int "pthread_sigmask" `(,int * *))))
(lambda (how signals)
"Add SIGNALS, a list of SIG* values, to the set of blocked signals if
-HOW is SIG_BLOCK, or unblock them if HOW is SIG_UNBLOCK."
+HOW is SIG_BLOCK, or unblock them if HOW is SIG_UNBLOCK. Return the previous
+set of blocked signals as a list of SIG* values."
+ (define old
+ (allocate-sigset))
+
(let-values (((result err)
- (proc how (sigset signals) %null-pointer)))
+ (proc how (sigset signals) old)))
(if (= -1 result)
(throw 'system-error "sigprocmask" "~A"
(list (strerror err)) (list err))
- result)))))
+ (sigset->list old))))))
(define (block-signals signals)
"Block SIGNALS, a list of SIG* values, in the current thread."
--
2.26.2
From ec3631115f8ef070c939f392bb316ad44360a83c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Sat, 30 May 2020 19:28:35 +0200
Subject: [PATCH 2/3] system: Add 'with-blocked-signals'.

* configure.ac: Compute and substitute 'SIG_SETMASK'.
* modules/shepherd/system.scm.in (SIG_SETMASK): New variable.
(set-blocked-signals, call-with-blocked-signals): New procedures.
(with-blocked-signals): New macro.
---
.dir-locals.el | 4 +++-
configure.ac | 2 ++
modules/shepherd/system.scm.in | 21 +++++++++++++++++++++
3 files changed, 26 insertions(+), 1 deletion(-)

Toggle diff (82 lines)
diff --git a/.dir-locals.el b/.dir-locals.el
index 8361cb6..3e64a3e 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -9,6 +9,8 @@
(bug-reference-url-format . "http://bugs.gnu.org/%s")
(bug-reference-bug-regexp
. "<https?://\\(debbugs\\|bugs\\)\\.gnu\\.org/\\([0-9]+\\)>")))
- (scheme-mode . ((indent-tabs-mode . nil)))
+ (scheme-mode
+ . ((indent-tabs-mode . nil)
+ (eval . (put 'with-blocked-signals 'scheme-indent-function 1))))
(texinfo-mode . ((indent-tabs-mode . nil)
(fill-column . 72))))
diff --git a/configure.ac b/configure.ac
index 052a826..6c2c7b3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -95,6 +95,7 @@ AC_MSG_CHECKING([<sys/signalfd.h> and <sys/signal.h> constants])
AC_COMPUTE_INT([SFD_CLOEXEC], [SFD_CLOEXEC], [#include <sys/signalfd.h>])
AC_COMPUTE_INT([SIG_BLOCK], [SIG_BLOCK], [#include <sys/signal.h>])
AC_COMPUTE_INT([SIG_UNBLOCK], [SIG_UNBLOCK], [#include <sys/signal.h>])
+AC_COMPUTE_INT([SIG_SETMASK], [SIG_SETMASK], [#include <sys/signal.h>])
AC_MSG_RESULT([done])
SIZEOF_STRUCT_SIGNALFD_SIGINFO="$ac_cv_sizeof_struct_signalfd_siginfo"
@@ -104,6 +105,7 @@ AC_SUBST([SIZEOF_SIGSET_T])
AC_SUBST([SFD_CLOEXEC])
AC_SUBST([SIG_BLOCK])
AC_SUBST([SIG_UNBLOCK])
+AC_SUBST([SIG_SETMASK])
AC_MSG_CHECKING([whether to build crash handler])
case "$host_os" in
diff --git a/modules/shepherd/system.scm.in b/modules/shepherd/system.scm.in
index 9c55c69..d606573 100644
--- a/modules/shepherd/system.scm.in
+++ b/modules/shepherd/system.scm.in
@@ -37,6 +37,8 @@
consume-signalfd-siginfo
block-signals
unblock-signals
+ set-blocked-signals
+ with-blocked-signals
without-automatic-finalization))
;; The <sys/reboot.h> constants.
@@ -200,6 +202,7 @@ number of the signal received."
(define SIG_BLOCK @SIG_BLOCK@)
(define SIG_UNBLOCK @SIG_UNBLOCK@)
+(define SIG_SETMASK @SIG_SETMASK@)
(define sigprocmask
(let ((proc (syscall->procedure int "pthread_sigmask" `(,int * *))))
@@ -225,6 +228,24 @@ set of blocked signals as a list of SIG* values."
"Unblock SIGNALS, a list of SIG* values, in the current thread."
(sigprocmask SIG_UNBLOCK signals))
+(define (set-blocked-signals signals)
+ "Block exactly the signals listed in SIGNALS, a list of SIG* values, in the
+current thread."
+ (sigprocmask SIG_SETMASK signals))
+
+(define (call-with-blocked-signals signals thunk)
+ (let ((previous-set #f))
+ (dynamic-wind
+ (lambda ()
+ (set! previous-set (block-signals signals)))
+ thunk
+ (lambda ()
+ (set-blocked-signals previous-set)))))
+
+(define-syntax-rule (with-blocked-signals signals exp ...)
+ "Evaluate EXP... in a context where SIGNALS are blocked."
+ (call-with-blocked-signals signals (lambda () exp ...)))
+
;;;
;;; Guile shenanigans.
--
2.26.2
From 576ac6155dcabea47dfdf69fb9a8cc07cecf9695 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Sat, 30 May 2020 19:32:43 +0200
Subject: [PATCH 3/3] service: 'fork+exec-command' blocks handled signals
before forking.

This is a followup to d190773751ddeddbe0daa8e4a43f76b73c4fd7ac, which
addressed only SIGTERM instead of all of %PRECIOUS-SIGNALS.
Furthermore, setting the SIGTERM handler introduced a window in the
shepherd process during which SIGTERM instances would be lost.

* modules/shepherd/service.scm (%precious-signals): New variable.
(fork+exec-command): Remove calls to 'sigaction' for SIGTERM. Wrap
'let' in 'with-blocked-signals'. Restore default signal handlers in the
child before unblocking signals.
---
modules/shepherd/service.scm | 53 ++++++++++++++++++------------------
1 file changed, 26 insertions(+), 27 deletions(-)

Toggle diff (73 lines)
diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index f3ac32a..1bc77b1 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -870,6 +870,10 @@ false."
program (strerror (system-error-errno args)))
(primitive-exit 1))))))))
+(define %precious-signals
+ ;; Signals that the shepherd process handles.
+ (list SIGCHLD SIGINT SIGHUP SIGTERM))
+
(define* (fork+exec-command command
#:key
(user #f)
@@ -886,33 +890,28 @@ its PID."
(sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP)
(set! %sigchld-handler-installed? #t))
- ;; When forking a process, the signal handlers are inherited, until it
- ;; forks. If SIGTERM is received by the forked process, before it calls
- ;; execv, the installed SIGTERM handler, stopping Shepherd will be called.
- ;; To avoid this, save the SIGTERM handler, disable it, and restore it once,
- ;; the process has been forked. This way, the forked process will use the
- ;; default SIGTERM handler stopping the process.
- (let ((term-handler (match (sigaction SIGTERM)
- ((proc . _)
- proc)))
- (pid (and (sigaction SIGTERM SIG_DFL)
- (primitive-fork))))
- (if (zero? pid)
- (begin
- ;; Unblock any signals that might have been blocked by the parent
- ;; process if using 'signalfd'.
- (unblock-signals (list SIGCHLD SIGINT SIGHUP SIGTERM))
-
- (exec-command command
- #:user user
- #:group group
- #:log-file log-file
- #:directory directory
- #:file-creation-mask file-creation-mask
- #:environment-variables environment-variables))
- (begin
- ;; Restore the initial SIGTERM handler.
- (sigaction SIGTERM term-handler)
+ ;; Child processes inherit signal handlers until they exec. If one of
+ ;; %PRECIOUS-SIGNALS is received by the child before it execs, the installed
+ ;; handler, which stops shepherd, is called. To avoid this, block signals
+ ;; so that the child process never executes those handlers.
+ (with-blocked-signals %precious-signals
+ (let ((pid (primitive-fork)))
+ (if (zero? pid)
+ (begin
+ ;; First restore the default handlers.
+ (for-each (cut sigaction <> SIG_DFL) %precious-signals)
+
+ ;; Unblock any signals that have been blocked by the parent
+ ;; process.
+ (unblock-signals %precious-signals)
+
+ (exec-command command
+ #:user user
+ #:group group
+ #:log-file log-file
+ #:directory directory
+ #:file-creation-mask file-creation-mask
+ #:environment-variables environment-variables))
pid))))
(define* (make-forkexec-constructor command
--
2.26.2
M
M
Mathieu Othacehe wrote on 2 Jun 2020 09:00
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 41507@debbugs.gnu.org)
87imgax88r.fsf@gnu.org
Hey Ludo,

Toggle quote (8 lines)
> That way, the child is guaranteed to never execute the original
> handlers, and neither the parent nor the child misses any of these
> signals (previously, the temporary (sigaction SIGTERM SIG_DFL)
> introduced a window during which shepherd could be killed by a SIGTERM
> instead of handling it gracefully.)
>
> WDYT?

Yes, this is indeed much better this way!

Toggle quote (4 lines)
> +(define %precious-signals
> + ;; Signals that the shepherd process handles.
> + (list SIGCHLD SIGINT SIGHUP SIGTERM))

We could maybe factorize this list with the one in "signal-port" in the
"main" procedure. This way if we ever add an extra signal, we do not
forget to add it to blocked signals list.

Otherwise, this looks fine, thanks for taking care of that :)

Mathieu
L
L
Ludovic Courtès wrote on 2 Jun 2020 23:38
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 41507@debbugs.gnu.org)
87h7vtrvvo.fsf@gnu.org
Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

Toggle quote (18 lines)
>> That way, the child is guaranteed to never execute the original
>> handlers, and neither the parent nor the child misses any of these
>> signals (previously, the temporary (sigaction SIGTERM SIG_DFL)
>> introduced a window during which shepherd could be killed by a SIGTERM
>> instead of handling it gracefully.)
>>
>> WDYT?
>
> Yes, this is indeed much better this way!
>
>> +(define %precious-signals
>> + ;; Signals that the shepherd process handles.
>> + (list SIGCHLD SIGINT SIGHUP SIGTERM))
>
> We could maybe factorize this list with the one in "signal-port" in the
> "main" procedure. This way if we ever add an extra signal, we do not
> forget to add it to blocked signals list.

Yes, good idea.

I pushed the series with a followup commit to do that:

38e3589 shepherd: Factorize list of handled signals.
576ac61 service: 'fork+exec-command' blocks handled signals before forking.
ec36311 system: Add 'with-blocked-signals'.
bc74b5e system: 'sigprocmask' returns the previous set of blocked signals.

Let’s see if I can tag a release tomorrow…

Thanks for your feedback!

Ludo’.
?