[PATCH] services: nginx: Replace invoke with spawn-command.

  • Done
Details
Participants: Ludovic Courtès, Christopher Baines
Owner: unassigned
Submitted by: Christopher Baines
Severity: normal

Christopher Baines wrote 5 days ago
(address . guix-patches@gnu.org)
e944b76a55fdad935058ac12f2fd19ecfc17ebd5.1741352535.git.mail@cbaines.net
I'm not sure where invoke is coming from here, but it could be from (guix
build utils), that uses system* which uses waitpid, which might cause problems
with recent versions of the shepherd?

At least I'm seeing issues on multiple machines where attempting to restart
the nginx service sometimes causes the shepherd to hang.

* gnu/services/web.scm (nginx-shepherd-service): Replace invoke with
spawn-command.

Change-Id: Ie9ce4be9a4df121465b28148612b4fbc45fb5126
---
gnu/services/web.scm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gnu/services/web.scm b/gnu/services/web.scm
index 7593cd2eaa..b46a4db73f 100644
--- a/gnu/services/web.scm
+++ b/gnu/services/web.scm
@@ -870,7 +870,8 @@ (define (nginx-shepherd-service config)
          (nginx-action
           (lambda args
             #~(lambda _
-                (invoke #$nginx-binary "-c" #$config-file #$@args)
+                (spawn-command
+                 (list #$nginx-binary "-c" #$config-file #$@args))
                 (match '#$args
                   (("-s" . _) #t)
                   (_

base-commit: 9bc4c9f521caab8aa8d88aa948a650945bb55838
--
2.48.1
Ludovic Courtès wrote 5 days ago
(name . Christopher Baines)(address . mail@cbaines.net)(address . 76811@debbugs.gnu.org)
878qphvtak.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:

> I'm not sure where invoke is coming from here, but it could be from (guix
> build utils), that uses system* which uses waitpid, which might cause problems
> with recent versions of the shepherd?
>
> At least I'm seeing issues on multiple machines where attempting to restart
> the nginx service sometimes causes the shepherd to hang.
>
> * gnu/services/web.scm (nginx-shepherd-service): Replace invoke with
> spawn-command.
>
> Change-Id: Ie9ce4be9a4df121465b28148612b4fbc45fb5126

Hi! ‘invoke’ uses ‘system*’, which is an alias for ‘spawn-command’ (see
‘replace-core-bindings!’ in ‘shepherd.scm’) so the only effect of this
patch is that errors from “nginx -c nginx.conf …” would be ignored.

I think we need a reproducer for the hang so we can pinpoint the
problem because it’s a pretty serious bug!

Ludo’.
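
To make the point about ignored errors concrete, here is a minimal sketch in plain Guile, run outside of shepherd so that `system*` is the ordinary blocking one. `invoke/sketch` and `run-ignoring-status` are hypothetical names introduced for illustration only. The real `invoke` in (guix build utils) signals a dedicated error condition rather than calling `error`, and the sketch assumes `true` and `false` are available in PATH.

```scheme
;; Hypothetical stand-in for 'invoke': run PROGRAM with ARGS and raise
;; an error when the exit status is non-zero.
(define (invoke/sketch program . args)
  (let ((status (apply system* program args)))
    (unless (zero? status)
      (error "command failed" (cons program args) status))
    #t))

;; Roughly what the patched action does: run the command and discard
;; its status, so a failing command goes unnoticed.
(define (run-ignoring-status program . args)
  (apply system* program args)
  #t)

;; Turn a raised error into the symbol 'failed so both behaviours can
;; be compared side by side.
(define (failed-or-ok thunk)
  (catch #t
    (lambda () (thunk) 'ok)
    (lambda _ 'failed)))
```

With these definitions, `(failed-or-ok (lambda () (invoke/sketch "false")))` takes the error path, whereas `(failed-or-ok (lambda () (run-ignoring-status "false")))` does not, which is the behaviour change described above.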
Christopher Baines wrote 5 days ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 76811-close@debbugs.gnu.org)
87v7skn4hv.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> I'm not sure where invoke is coming from here, but it could be from (guix
>> build utils), that uses system* which uses waitpid, which might cause problems
>> with recent versions of the shepherd?
>>
>> At least I'm seeing issues on multiple machines where attempting to restart
>> the nginx service sometimes causes the shepherd to hang.
>>
>> * gnu/services/web.scm (nginx-shepherd-service): Replace invoke with
>> spawn-command.
>>
>> Change-Id: Ie9ce4be9a4df121465b28148612b4fbc45fb5126
>
> Hi! ‘invoke’ uses ‘system*’, which is an alias for ‘spawn-command’ (see
> ‘replace-core-bindings!’ in ‘shepherd.scm’) so the only effect of this
> patch is that errors from “nginx -c nginx.conf …” would be ignored.

Ah, yes, I see. I've tried to verify this, and it does seem that the
nginx service is using this system* replacement.

> I think we need a reproducer for the hang so we can pinpoint the
> problem because it’s a pretty serious bug!

I did try restarting nginx over and over again in the system test os,
but that seemed to work.

On a VM I have, though, it only takes a few restarts for it to hang; I'm
not sure why.

Ludovic Courtès wrote 5 days ago
(name . Christopher Baines)(address . mail@cbaines.net)(address . 76811@debbugs.gnu.org)
87y0xgv3pf.fsf@gnu.org
Hey,

> I did try restarting nginx over and over again in the system test os,
> but that seemed to work.
>
> On a VM I have, though, it only takes a few restarts for it to hang; I'm
> not sure why.

It could be that nginx alone would work well, but combining it with
another service that does unusual things triggers the bug.

One way to investigate would be to start from ‘bayfront.scm’ (which I
think has that problem) and boil it down until we have something that
can run in a VM and reproduces the problem.

I don’t see any ‘waitpid’ uses left in Shepherd services under
gnu/services/*.scm.

One thing I found that is risky is ‘guix-data-service-setup-database’:
it loads a bunch of (guix-data-service …) modules into PID 1 and runs
non-trivial code in there; I strongly recommend doing this in a separate
process, similar to how ‘bffe-shepherd-services’ does it.

Ludo’.
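
As a rough illustration of the "separate process" approach recommended above, a one-shot Shepherd service could hand its setup step to a script built with `program-file`, so that the script's modules are loaded in a child process rather than in PID 1. This is a sketch only: `example-setup-program`, `example-setup-service`, and the script body are hypothetical, and none of it is taken from `guix-data-service-setup-database` or `bffe-shepherd-services`.

```scheme
(use-modules (gnu services shepherd)    ; shepherd-service record
             (guix gexp))               ; program-file, #~ and #$

;; Hypothetical setup script.  Its body runs in its own Guile process,
;; so whatever modules it loads never end up in shepherd itself.
(define example-setup-program
  (program-file "example-setup"
                #~(begin
                    (display "running setup outside of PID 1\n")
                    (exit 0))))

;; Hypothetical one-shot service that only spawns the script and
;; reports whether it exited successfully.
(define example-setup-service
  (shepherd-service
   (documentation "Run a setup step in a separate process.")
   (provision '(example-setup))
   (one-shot? #t)
   (start #~(lambda _
              ;; Inside shepherd, 'system*' is the replacement
              ;; discussed earlier in this thread.
              (zero? (system* #$example-setup-program))))))
```

The design choice here is that PID 1 only launches a command and inspects its exit status; everything non-trivial lives in the script.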
Ludovic Courtès wrote 43 hours ago
(name . Christopher Baines)(address . mail@cbaines.net)(address . 76811@debbugs.gnu.org)
877c4wmtk7.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> One thing I found that is risky is ‘guix-data-service-setup-database’:
> it loads a bunch of (guix-data-service …) modules into PID 1 and runs
> non-trivial code in there; I strongly recommend doing this in a separate
> process, similar to how ‘bffe-shepherd-services’ does it.

The guix-data-service tests show that:


Namely:

[ 5.788985] shepherd[1]: Service loopback started.
[ 5.790001] shepherd[1]: Service loopback running with value #t.
Uncaught exception in task:
In fibers.scm:
172:8 6 (_)
In shepherd/service/system-log.scm:
180:10 5 (run-system-log #<<channel> getq: #<atomic-box 7fcc8e8?> ?)
In srfi/srfi-1.scm:
586:17 4 (map1 (#<input-output: socket 17> #<input: /proc/kmsg?>))
In shepherd/service/system-log.scm:
181:33 3 (_ #<input-output: socket 17>)
In fibers/io-wakeup.scm:
72:13 2 (make-wait-operation #<procedure 7fcc8e8b0300 at fiber?> ?)
72:13 1 (make-wait-operation #f #<procedure 7fcc8e8b0300 at fi?> ?)
In ice-9/boot-9.scm:
1685:16 0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Wrong type to apply: #<syntax-transformer make-base-operation>

Here bindings in (fibers io-wakeup) are likely “polluted” by loading
guile-fibers-next via the (guix-data-service …) modules.

Ludo’.