Shepherd doesn't seem to correctly handle waitpid itself

  • Done
  • quality assurance status badge
Details
3 participants
  • Josselin Poiret
  • Ludovic Courtès
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal
M
M
Maxim Cournoyer wrote on 19 Sep 2022 06:29
(name . bug-guix)(address . bug-guix@gnu.org)
874jx4q953.fsf@gmail.com
Hi,

I've tried to determine why a workaround in the jami-service-type is
required in the 'stop' slot to avoid failures in 'herd restart jami',
and haven't quite found the culprit, but it appears to me that:

1. waipid is only called in one place in Shepherd, which is in the
handle-SIGCHLD procedure in (shepherd service), which does not
specifically wait for an exact PID but rather does:

(waitpid* WAIT_ANY WNOHANG), which is waitpid with some special handling
in the case a system-error exception is thrown with an ECHILD or EINTR
error number.

This doesn't strike me as a strong guarantee that waitpid occurs when
stop is called, because:

1. It requires to be installed in the signal handlers for each
processes, with something like:

Toggle snippet (5 lines)
(unless %sigchld-handler-installed?
(sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP)
(set! %sigchld-handler-installed? #t))

Done for fork+exec-command and make-inetd-forkexec-constructor, but not
for make-forkexec-constructor/container, AFAICT;

2. it has the WNOHANG flag, which means the stop simply does a kill the
the signal handling weakly (because of WNOHANG) waits on it, which means
the start may begin before the process was actually completely
terminated.

Here's a small reproducer to apply on our code base:

Toggle snippet (18 lines)
modified gnu/services/telephony.scm
@@ -685,13 +685,7 @@ (define (archive-name->username archive)
;; Finally, return the PID of the daemon process.
daemon-pid))
- (stop
- #~(lambda (pid . args)
- (kill pid SIGKILL)
- ;; Wait for the process to exit; this prevents overlapping
- ;; processes when issuing 'herd restart'.
- (waitpid pid)
- #f))))))))
+ (stop #~(make-kill-destructor))))))))
(define jami-service-type
(service-type

Then run 'make check-system TESTS=jami-provisioning' to see new
failures, or if you want to investigate manually the system:

Toggle snippet (24 lines)
$ ./pre-inst-env guix system vm --no-grafts --no-offload --no-graphic \
-e '(@@ (gnu tests telephony) %jami-os-provisioning)'

$ /gnu/store/rxi7c14hga62qslb0sr6nac9qnkxr0nn-run-vm.sh -m 1G -smp 4 \
-nic user,model=virtio-net-pci,hostfwd=tcp::10022-:22

# Connect to the QEMU VM:
$ ssh root@localhost -p10022

root@jami ~# herd restart jami
Service jami has been stopped.
herd: exception caught while executing 'start' on service 'jami':
dbus "method failed with error" "org.freedesktop.DBus.Error.NoReply" ("Message recipient disconnected from message bus without replying")
root@jami ~# herd status jami
Status of jami:
It is stopped.
It is enabled.
Provides (jami).
Requires (jami-dbus-session).
Conflicts with ().
Will be respawned.
root@jami ~# pgrep jami

Thanks,

Maxim
J
J
Josselin Poiret wrote on 20 Sep 2022 09:31
87o7va33iq.fsf@jpoiret.xyz
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (29 lines)
> Hi,
>
> I've tried to determine why a workaround in the jami-service-type is
> required in the 'stop' slot to avoid failures in 'herd restart jami',
> and haven't quite found the culprit, but it appears to me that:
>
> 1. waipid is only called in one place in Shepherd, which is in the
> handle-SIGCHLD procedure in (shepherd service), which does not
> specifically wait for an exact PID but rather does:
>
> (waitpid* WAIT_ANY WNOHANG), which is waitpid with some special handling
> in the case a system-error exception is thrown with an ECHILD or EINTR
> error number.
>
> This doesn't strike me as a strong guarantee that waitpid occurs when
> stop is called, because:
>
> 1. It requires to be installed in the signal handlers for each
> processes, with something like:
>
> --8<---------------cut here---------------start------------->8---
> (unless %sigchld-handler-installed?
> (sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP)
> (set! %sigchld-handler-installed? #t))
> --8<---------------cut here---------------end--------------->8---
>
> Done for fork+exec-command and make-inetd-forkexec-constructor, but not
> for make-forkexec-constructor/container, AFAICT;

The signal handler is only installed once in PID 1 (in fact, you haven't
forked yet here), since it's the one that receives the SIGCHLD.

What I don't understand that well is that this signal handler could be
installed only once when shepherd starts, right? That way, it wouldn't
need to depend on specific start actions being chosen.

Toggle quote (26 lines)
> 2. it has the WNOHANG flag, which means the stop simply does a kill the
> the signal handling weakly (because of WNOHANG) waits on it, which means
> the start may begin before the process was actually completely
> terminated.
>
> Here's a small reproducer to apply on our code base:
>
> --8<---------------cut here---------------start------------->8---
> modified gnu/services/telephony.scm
> @@ -685,13 +685,7 @@ (define (archive-name->username archive)
>
> ;; Finally, return the PID of the daemon process.
> daemon-pid))
> - (stop
> - #~(lambda (pid . args)
> - (kill pid SIGKILL)
> - ;; Wait for the process to exit; this prevents overlapping
> - ;; processes when issuing 'herd restart'.
> - (waitpid pid)
> - #f))))))))
> + (stop #~(make-kill-destructor))))))))
>
> (define jami-service-type
> (service-type
> --8<---------------cut here---------------end--------------->8---

The real problem here is not really the WNOHANG flag (you could remove
that and still get issues) but rather that the waitpid is run inside a
signal handler, which in Guile means that it's run through asyncs. You
have no guarantees wrt. when asyncs run, so they could run after or in
the middle of the next action. I also think make-kill-destructor should
waitpid the processes it's killing, as you're implying, and leave the
signal handler only for unexpected service crashes.

Best,
--
Josselin Poiret
L
L
Ludovic Courtès wrote on 23 Sep 2022 08:33
(name . Josselin Poiret)(address . dev@jpoiret.xyz)
87bkr6fvlz.fsf@gnu.org
Hi,

Josselin Poiret <dev@jpoiret.xyz> skribis:

Toggle quote (2 lines)
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

[...]

Toggle quote (15 lines)
>> 1. It requires to be installed in the signal handlers for each
>> processes, with something like:
>>
>> --8<---------------cut here---------------start------------->8---
>> (unless %sigchld-handler-installed?
>> (sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP)
>> (set! %sigchld-handler-installed? #t))
>> --8<---------------cut here---------------end--------------->8---
>>
>> Done for fork+exec-command and make-inetd-forkexec-constructor, but not
>> for make-forkexec-constructor/container, AFAICT;
>
> The signal handler is only installed once in PID 1 (in fact, you haven't
> forked yet here), since it's the one that receives the SIGCHLD.

Right.

Toggle quote (4 lines)
> What I don't understand that well is that this signal handler could be
> installed only once when shepherd starts, right? That way, it wouldn't
> need to depend on specific start actions being chosen.

The SIGCHLD handler is installed lazily since
f776de04e6702e18d95152072e78c43441d3ccc3. The rationale was discussed
here:


That said, on GNU/Linux, SIGCHLD is actually blocked and instead we rely
on signalfd(2). It’s from the main even loop in shepherd.scm that the
signal handler is called.

Toggle quote (17 lines)
>> Here's a small reproducer to apply on our code base:
>>
>> --8<---------------cut here---------------start------------->8---
>> modified gnu/services/telephony.scm
>> @@ -685,13 +685,7 @@ (define (archive-name->username archive)
>>
>> ;; Finally, return the PID of the daemon process.
>> daemon-pid))
>> - (stop
>> - #~(lambda (pid . args)
>> - (kill pid SIGKILL)
>> - ;; Wait for the process to exit; this prevents overlapping
>> - ;; processes when issuing 'herd restart'.
>> - (waitpid pid)
>> - #f))))))))
>> + (stop #~(make-kill-destructor))))))))

I think the main difference between these two is that the first one uses
SIGKILL while the second one uses SIGTERM.

You could try #~(make-kill-destructor SIGKILL) to get the same effect.

(Another difference is that ‘make-kill-destructor’ kills the process
group, not just the process itself.)

Anyway, the key point is that shepherd takes care of calling ‘waitpid’
for its child processes (services). If you call it yourself as in the
snippet above, you’re racing with shepherd; in the case above it
probably doesn’t make any difference though because it will consider
that the service is stopped in any case.

HTH!

Ludo’.
M
M
Maxim Cournoyer wrote on 23 Sep 2022 19:49
(name . Ludovic Courtès)(address . ludo@gnu.org)
875yhe9e1l.fsf@gmail.com
tags 57922 +notabug
thanks

Hi Ludo!

Ludovic Courtès <ludo@gnu.org> writes:

[...]

Toggle quote (14 lines)
>> What I don't understand that well is that this signal handler could be
>> installed only once when shepherd starts, right? That way, it wouldn't
>> need to depend on specific start actions being chosen.
>
> The SIGCHLD handler is installed lazily since
> f776de04e6702e18d95152072e78c43441d3ccc3. The rationale was discussed
> here:
>
> https://issues.guix.gnu.org/27553
>
> That said, on GNU/Linux, SIGCHLD is actually blocked and instead we rely
> on signalfd(2). It’s from the main even loop in shepherd.scm that the
> signal handler is called.

I had missed that, thanks for explaining.

Toggle quote (22 lines)
>>> Here's a small reproducer to apply on our code base:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> modified gnu/services/telephony.scm
>>> @@ -685,13 +685,7 @@ (define (archive-name->username archive)
>>>
>>> ;; Finally, return the PID of the daemon process.
>>> daemon-pid))
>>> - (stop
>>> - #~(lambda (pid . args)
>>> - (kill pid SIGKILL)
>>> - ;; Wait for the process to exit; this prevents overlapping
>>> - ;; processes when issuing 'herd restart'.
>>> - (waitpid pid)
>>> - #f))))))))
>>> + (stop #~(make-kill-destructor))))))))
>
> I think the main difference between these two is that the first one uses
> SIGKILL while the second one uses SIGTERM.
>
> You could try #~(make-kill-destructor SIGKILL) to get the same effect.

You are right, the important difference was SIGTERM vs SIGKILL. I
thought I had tried that. The problem only shows itself in the
'jami-provisioning' system test, not the 'jami' one.

Marking this one as notabug and closing.

Thanks again!

Maxim
Closed
M
M
Maxim Cournoyer wrote on 24 Sep 2022 05:32
(name . Ludovic Courtès)(address . ludo@gnu.org)
878rm98n17.fsf@gmail.com
reopen 57922
tags 57922 -notabug
thanks

Hi again,

[...]

Toggle quote (28 lines)
>>> Here's a small reproducer to apply on our code base:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> modified gnu/services/telephony.scm
>>> @@ -685,13 +685,7 @@ (define (archive-name->username archive)
>>>
>>> ;; Finally, return the PID of the daemon process.
>>> daemon-pid))
>>> - (stop
>>> - #~(lambda (pid . args)
>>> - (kill pid SIGKILL)
>>> - ;; Wait for the process to exit; this prevents overlapping
>>> - ;; processes when issuing 'herd restart'.
>>> - (waitpid pid)
>>> - #f))))))))
>>> + (stop #~(make-kill-destructor))))))))
>
> I think the main difference between these two is that the first one uses
> SIGKILL while the second one uses SIGTERM.
>
> You could try #~(make-kill-destructor SIGKILL) to get the same effect.

> You are right, the important difference was SIGTERM vs SIGKILL. I
> thought I had tried that. The problem only shows itself in the
> 'jami-provisioning' system test, not the 'jami' one.

> Marking this one as notabug and closing.

I think I spoke too soon. SIGKILL does fix the problem when *not* using
waitpid explicitly, but when using waitpid explicitly, SIGTERM can be
used just fine. In other words, this works:

Toggle snippet (11 lines)
@@ -687,7 +687,7 @@ (define (archive-name->username archive)
daemon-pid))
(stop
#~(lambda (pid . args)
- (kill pid SIGKILL)
+ (kill pid SIGTERM)
;; Wait for the process to exit; this prevents overlapping
;; processes when issuing 'herd restart'.
(waitpid pid)

but this doesn't:

Toggle snippet (16 lines)
@@ -685,13 +685,7 @@ (define (archive-name->username archive)
;; Finally, return the PID of the daemon process.
daemon-pid))
- (stop
- #~(lambda (pid . args)
- (kill pid SIGKILL)
- ;; Wait for the process to exit; this prevents overlapping
- ;; processes when issuing 'herd restart'.
- (waitpid pid)
- #f))))))))
+ (stop #~(make-kill-destructor))))))))
(define jami-service-type

when exercised with 'make check-system TESTS=jami-provisioning':

Toggle snippet (27 lines)
This is the GNU system. Welcome.
jami login: Jami Daemon 13.4.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

23:29:05.375 os_core_unix.c !pjlib 2.12.1 for POSIX initialized
shepherd: Service jami has been stopped.
Caught signal Terminated, terminating...

Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.
Jami Daemon 13.4.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

One does not simply initialize the client: Another daemon is detected
/gnu/store/2vcv1fyqfyym2zcyf5bvbj1pcgbcc515-shepherd-marionette.scm:1:1718: ERROR:
1. &action-exception-error:
service: jami
action: start
key: misc-error
args: (#f "~A ~S ~S ~S" (dbus "method failed with error" "org.freedesktop.DBus.Error.NoReply" ("Message recipient disconnected from message bus without replying")) #f)
or manually through the test VM:

Toggle snippet (5 lines)
$(./pre-inst-env guix system vm --no-graphic --no-grafts --no-offload \
-e '(@@ (gnu tests telephony) %jami-os-provisioning)') \
-m 1G -smp $(nproc) "-nic" user,model=virtio-net-pci,hostfwd=tcp::10022-:22

This leads me to believe that Shepherd does not block until the process
is actually dead to mark the process as stopped (it just waitpid on the
group pid with WNOHANG), which means it won't block if the child process
hasn't exited yet, if I'm correct.

When we are in the stop slot, we know for sure that the process should
terminate completely, hence it'd make sense to call 'waitpid' *without*
WNOHANG there, to avoid 'herd restart' from starting the service while
its stopped process is not done terminating.

jamid can take quite some time to terminate cleanly because of the
networking threads in the opendht library that needs to be finalized,
which is probably the reason this problem can be observed here.

Thoughts?

Maxim
J
J
Josselin Poiret wrote on 24 Sep 2022 10:09
(address . 57922@debbugs.gnu.org)
87sfkh8a8z.fsf@jpoiret.xyz
Hi everyone,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (16 lines)
> This leads me to believe that Shepherd does not block until the process
> is actually dead to mark the process as stopped (it just waitpid on the
> group pid with WNOHANG), which means it won't block if the child process
> hasn't exited yet, if I'm correct.
>
> When we are in the stop slot, we know for sure that the process should
> terminate completely, hence it'd make sense to call 'waitpid' *without*
> WNOHANG there, to avoid 'herd restart' from starting the service while
> its stopped process is not done terminating.
>
> jamid can take quite some time to terminate cleanly because of the
> networking threads in the opendht library that needs to be finalized,
> which is probably the reason this problem can be observed here.
>
> Thoughts?

I agree with you, make-kill-destructor should waitpid the processes it's
killing. There shouldn't be any issues waitpid'ing before the
shepherd's signal handler, since stop actions are run with asyncs
disabled. The signal handler will run once but won't get anything
because all the processes were already waitpid'd and it uses WNOHANG.

Best,
--
Josselin Poiret
L
L
Ludovic Courtès wrote on 24 Sep 2022 18:30
(name . Josselin Poiret)(address . dev@jpoiret.xyz)
87zgeo68hc.fsf@gnu.org
Hi,

Josselin Poiret <dev@jpoiret.xyz> skribis:

Toggle quote (7 lines)
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> This leads me to believe that Shepherd does not block until the process
>> is actually dead to mark the process as stopped (it just waitpid on the
>> group pid with WNOHANG), which means it won't block if the child process
>> hasn't exited yet, if I'm correct.

Correct: the service is marked as stopped as soon as ‘stop’ returns.

Toggle quote (17 lines)
>> When we are in the stop slot, we know for sure that the process should
>> terminate completely, hence it'd make sense to call 'waitpid' *without*
>> WNOHANG there, to avoid 'herd restart' from starting the service while
>> its stopped process is not done terminating.
>>
>> jamid can take quite some time to terminate cleanly because of the
>> networking threads in the opendht library that needs to be finalized,
>> which is probably the reason this problem can be observed here.
>>
>> Thoughts?
>
> I agree with you, make-kill-destructor should waitpid the processes it's
> killing. There shouldn't be any issues waitpid'ing before the
> shepherd's signal handler, since stop actions are run with asyncs
> disabled. The signal handler will run once but won't get anything
> because all the processes were already waitpid'd and it uses WNOHANG.

I think we need an extra “stopping” state for services. In general,
we’ll want to send SIGTERM, wait for some grace period or dead process
notification, then send SIGKILL, and finally change state to “stopped”.

This is not possible in 0.9 but is something I’d like to have in 0.10¹.

Ludo’.

M
M
Maxim Cournoyer wrote on 26 Sep 2022 02:12
(name . Ludovic Courtès)(address . ludo@gnu.org)
87leq76lk6.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (36 lines)
> Hi,
>
> Josselin Poiret <dev@jpoiret.xyz> skribis:
>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>>
>>> This leads me to believe that Shepherd does not block until the process
>>> is actually dead to mark the process as stopped (it just waitpid on the
>>> group pid with WNOHANG), which means it won't block if the child process
>>> hasn't exited yet, if I'm correct.
>
> Correct: the service is marked as stopped as soon as ‘stop’ returns.
>
>>> When we are in the stop slot, we know for sure that the process should
>>> terminate completely, hence it'd make sense to call 'waitpid' *without*
>>> WNOHANG there, to avoid 'herd restart' from starting the service while
>>> its stopped process is not done terminating.
>>>
>>> jamid can take quite some time to terminate cleanly because of the
>>> networking threads in the opendht library that needs to be finalized,
>>> which is probably the reason this problem can be observed here.
>>>
>>> Thoughts?
>>
>> I agree with you, make-kill-destructor should waitpid the processes it's
>> killing. There shouldn't be any issues waitpid'ing before the
>> shepherd's signal handler, since stop actions are run with asyncs
>> disabled. The signal handler will run once but won't get anything
>> because all the processes were already waitpid'd and it uses WNOHANG.
>
> I think we need an extra “stopping” state for services. In general,
> we’ll want to send SIGTERM, wait for some grace period or dead process
> notification, then send SIGKILL, and finally change state to “stopped”.
>
> This is not possible in 0.9 but is something I’d like to have in 0.10¹.

This sounds good. Let's keep this ticket open until this goodness
lands, as a reminder.

Thank you!

Maxim
?