[PATCH core-updates] guix: Reap finished child processes in build containers.

Status: Done
Participants: Carlo Zancanaro, Ludovic Courtès, Maxim Cournoyer
Owner: unassigned
Submitted by: Carlo Zancanaro
Severity: normal

Carlo Zancanaro wrote on 26 Mar 2018 13:16
(address . guix-patches@gnu.org)
87muyvulwt.fsf@zancanaro.id.au
When working on the Shepherd, I found that in the build containers
processes don't get reaped by pid 1. See
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=30637#29. This
caused (and will cause) the Shepherd's tests to fail on some
systems.

Our guile-builder script should handle SIGCHLD and then use
waitpid to reap the child processes. Here's my attempt at a patch
to do that.

I haven't been able to build anything with it because the computer
I'm currently on is laughably slow. If someone else can check that
builds still work with it, I'd really appreciate it.
From 7c66818570a139fc4e7b11de34d07c76ebdc6bac Mon Sep 17 00:00:00 2001
From: Carlo Zancanaro <carlo@zancanaro.id.au>
Date: Mon, 26 Mar 2018 22:08:26 +1100
Subject: [PATCH] guix: Reap finished child processes in build containers.

* guix/derivations (build-expression->derivation)[prologue]: Handle SIGCHLD
and reap child processes when they finish.
---
guix/derivations.scm | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/guix/derivations.scm b/guix/derivations.scm
index da686e89e..80787e99e 100644
--- a/guix/derivations.scm
+++ b/guix/derivations.scm
@@ -1180,6 +1180,17 @@ ALLOWED-REFERENCES, DISALLOWED-REFERENCES, LOCAL-BUILD?, and SUBSTITUTABLE?."
(filter module-form? exp))
(_ `(,exp)))
+ ;; The root process in the build container should reap
+ ;; processes that die, so handle SIGCHLD.
+ (sigaction SIGCHLD
+ (lambda ()
+ (let loop ()
+ (match (waitpid WAIT_ANY WNOHANG)
+ ((0 . _) #f)
+ ((pid . _) (loop))
+ (_ #f))))
+ SA_NOCLDSTOP)
+
(define %output (getenv "out"))
(define %outputs
(map (lambda (o)
--
2.16.2

Carlo Zancanaro wrote on 27 Mar 2018 01:39
(address . 30948@debbugs.gnu.org)
878tae76f6.fsf@zancanaro.id.au
Okay, it turns out my previous patch was very wrong. I tried to
start a build and it broke pretty significantly.

I've attached a new patch that at least starts building. My
computer takes too long to actually build anything, but I'm
slightly more confident that my change won't break everything.
From c57b2fe19865afc21fd1fd9a7cad3286b05a9b22 Mon Sep 17 00:00:00 2001
From: Carlo Zancanaro <carlo@zancanaro.id.au>
Date: Mon, 26 Mar 2018 22:08:26 +1100
Subject: [PATCH] guix: Reap finished child processes in build containers.

* guix/derivations (build-expression->derivation)[prologue]: Handle SIGCHLD
and reap child processes when they finish.
---
guix/derivations.scm | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/guix/derivations.scm b/guix/derivations.scm
index da686e89e..27ab3e420 100644
--- a/guix/derivations.scm
+++ b/guix/derivations.scm
@@ -1201,6 +1201,21 @@ ALLOWED-REFERENCES, DISALLOWED-REFERENCES, LOCAL-BUILD?, and SUBSTITUTABLE?."
(else drv))))))
inputs))
+ ;; The root process in the build container should reap
+ ;; processes that die, so handle SIGCHLD.
+ (use-modules (ice-9 match))
+ (sigaction SIGCHLD
+ (lambda _
+ (let loop ()
+ (match (catch 'system-error
+ (lambda ()
+ (waitpid WAIT_ANY WNOHANG))
+ (lambda args
+ '(0 . -)))
+ ((0 . _) #f)
+ ((pid . _) (loop)))))
+ SA_NOCLDSTOP)
+
,@(if (null? modules)
'()
;; Remove our own settings.
--
2.16.2

Ludovic Courtès wrote on 29 Mar 2018 22:07
Re: [bug#30948] [PATCH core-updates] guix: Reap finished child processes in build containers.
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 30948@debbugs.gnu.org)
87bmf6ve6u.fsf@gnu.org
Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> When working on the Shepherd, I found that in the build containers
> processes don't get reaped by pid 1. See
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=30637#29. This caused
> (and will cause) the Shepherd's tests to fail on some systems.
>
> Our guile-builder script should handle SIGCHLD and then use waitpid to
> reap the child processes. Here's my attempt at a patch to do that.

I would rather install the handler as a phase in gnu-build-system: this
leaves ‘build-expression->derivation’ generic, and also gives us more
flexibility (e.g., we can disable that phase without doing a full
rebuild if needed.) See the patch below.

WDYT?

My first attempt, with:

./pre-inst-env guix build -e '(@@ (gnu packages commencement) findutils-boot0)'

quickly failed:

checking for vfork.h... no
checking for fork... yes
checking for vfork... yes
checking for working fork... Backtrace:
In ice-9/boot-9.scm:
yes
checking for working vfork... (cached) yes
checking for strcasecmp... 157: 13 [catch #t #<catch-closure c900a0> ...]
In unknown file:
?: 12 [apply-smob/1 #<catch-closure c900a0>]
In ice-9/boot-9.scm:
63: 11 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
432: 10 [eval # #]
In ice-9/boot-9.scm:
2320: 9 [save-module-excursion #<procedure cc1b80 at ice-9/boot-9.scm:3961:3 ()>]
3966: 8 [#<procedure cc1b80 at ice-9/boot-9.scm:3961:3 ()>]
1645: 7 [%start-stack load-stack #<procedure cbd2c0 at ice-9/boot-9.scm:3957:10 ()>]
1650: 6 [#<procedure cc3060 ()>]
In unknown file:
?: 5 [primitive-load "/gnu/store/pz3jy89ax5jg0j6fnp5n42x4vznga8s3-make-boot0-4.2.1-guile-builder"]
In ice-9/eval.scm:
387: 4 [eval # ()]
In srfi/srfi-1.scm:
619: 3 [for-each #<procedure 1217560 at /gnu/store/hf8xflikhgsd4hfy9h8s0cjzfqm8f3yb-module-import/guix/build/gnu-build-system.scm:815:12 (expr)> ...]
In /gnu/store/hf8xflikhgsd4hfy9h8s0cjzfqm8f3yb-module-import/guix/build/gnu-build-system.scm:
819: 2 [#<procedure 1217560 at /gnu/store/hf8xflikhgsd4hfy9h8s0cjzfqm8f3yb-module-import/guix/build/gnu-build-system.scm:815:12 (expr)> #]
In /gnu/store/hf8xflikhgsd4hfy9h8s0cjzfqm8f3yb-module-import/guix/build/utils.scm:
614: 1 [invoke "/gnu/store/g34swjqyw205d15pyra39j56qvyxq9w9-bootstrap-binaries-0/bin/bash" ...]
In unknown file:
?: 0 [system* "/gnu/store/g34swjqyw205d15pyra39j56qvyxq9w9-bootstrap-binaries-0/bin/bash" ...]

ERROR: In procedure system*:
ERROR: In procedure system*: Interrupted system call
builder for `/gnu/store/hc96d5dcshbdgavpp0j01qnsjf0yf9z5-make-boot0-4.2.1.drv' failed with exit code 1

This is why ‘install-SIGCHLD-handler’ in the patch does nothing on Guile
<= 2.0.9.

Now, we’d need to test it for real with Guile 2.2. I suppose one way to
test without rebuilding it all would be to add this phase explicitly in
a package and try building it with --rounds=10 or something. Would you
like to try that?

Note that we have only a couple of days left before the ‘core-updates’
freeze.

Thanks,
Ludo’.
diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index be5ad78b9..2c6cb4ad2 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -51,6 +51,28 @@
(define time-monotonic time-tai))
(else #t))
+(define* (install-SIGCHLD-handler #:rest _)
+ "Handle SIGCHLD signals. Since this code is usually running as PID 1 in the
+build daemon, it has to reap dead processes, hence this procedure."
+ ;; In Guile <= 2.0.9, syscalls could throw EINTR. With these versions,
+ ;; installing a SIGCHLD handler is not safe because we could have uncaught
+ ;; 'system-error' exceptions at any time.
+ (when (or (not (string=? (effective-version) "2.0"))
+ (> (string->number (micro-version)) 9))
+ (format #t "installing SIGCHLD handler in PID ~a\n" (getpid))
+ (sigaction SIGCHLD
+ (lambda _
+ (let loop ()
+ (match (catch 'system-error
+ (lambda ()
+ (waitpid WAIT_ANY WNOHANG))
+ (lambda args
+ '(0 . -)))
+ ((0 . _) #f)
+ ((pid . _) (loop)))))
+ SA_NOCLDSTOP))
+ #t)
+
(define* (set-SOURCE-DATE-EPOCH #:rest _)
"Set the 'SOURCE_DATE_EPOCH' environment variable. This is used by tools
that incorporate timestamps as a way to tell them to use a fixed timestamp.
@@ -758,7 +780,8 @@ which cannot be found~%"
;; Standard build phases, as a list of symbol/procedure pairs.
(let-syntax ((phases (syntax-rules ()
((_ p ...) `((p . ,p) ...)))))
- (phases set-SOURCE-DATE-EPOCH set-paths install-locale unpack
+ (phases install-SIGCHLD-handler
+ set-SOURCE-DATE-EPOCH set-paths install-locale unpack
bootstrap
patch-usr-bin-file
patch-source-shebangs configure patch-generated-file-shebangs
Carlo Zancanaro wrote on 29 Mar 2018 23:15
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 30948@debbugs.gnu.org)
87sh8id1mg.fsf@zancanaro.id.au
Hey Ludo,

On Thu, Mar 29 2018, Ludovic Courtès wrote:
> I would rather install the handler as a phase in
> gnu-build-system: this leaves ‘build-expression->derivation’
> generic, and also gives us more flexibility (e.g., we can
> disable that phase without doing a full rebuild if needed.) See
> the patch below.
>
> WDYT?

What do you mean by "generic"? From what I can understand it's one
of pid 1's responsibilities to reap child processes, so I would
expect this to be set up for every builder, before the builder is
run. Given it's not specific to the gnu-build-system, I don't
think it really fits there.

> On my first attempt with:
>
> ./pre-inst-env guix build -e '(@@ (gnu packages commencement)
> findutils-boot0)'
>
> quickly failed:
>
> ...
>
> This is why ‘install-SIGCHLD-handler’ in the patch does nothing
> on Guile <= 2.0.9.

From what I understand, Guix depends on Guile 2.0.13 or later, so
I didn't think it needed to work with 2.0.9. From my quick check,
though, our bootstrap binaries are Guile 2.0.9? I can see how that
might cause a problem. In what sense does Guix require 2.0.13 (as
the manual claims) rather than 2.0.9?

> Now, we’d need to test it for real with Guile 2.2. I suppose
> one way to
> test without rebuilding it all would be to add this phase
> explicitly in
> a package and try building it with --rounds=10 or something.
> Would you
> like to try that?

Yeah, I'll give it a go.

Carlo

Ludovic Courtès wrote on 30 Mar 2018 10:16
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 30948@debbugs.gnu.org)
87vadeou54.fsf@gnu.org
Heya,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> On Thu, Mar 29 2018, Ludovic Courtès wrote:
>> I would rather install the handler as a phase in gnu-build-system:
>> this leaves ‘build-expression->derivation’ generic, and also gives
>> us more flexibility (e.g., we can disable that phase without doing a
>> full rebuild if needed.) See the patch below.
>>
>> WDYT?
>
> What do you mean by "generic"?

I want as little magic as possible around the expression that’s passed
to ‘build-expression->derivation’.

> From what I can understand it's one of pid 1's responsibilities to reap
> child processes, so I would expect this to be set up for every
> builder, before the builder is run.

True, but for derivations it’s also “optional” because eventually
guix-daemon terminates all its child processes.

> Given it's not specific to the gnu-build-system, I don't think it
> really fits there.

Yes, but note that it would be inherited by all the build systems.

>> On my first attempt with:
>>
>> ./pre-inst-env guix build -e '(@@ (gnu packages commencement)
>> findutils-boot0)'
>>
>> quickly failed:
>>
>> ...
>>
>> This is why ‘install-SIGCHLD-handler’ in the patch does nothing on
>> Guile <= 2.0.9.
>
> From what I understand, Guix depends on Guile 2.0.13 or later, so I
> didn't think it needed to work with 2.0.9. From my quick check,
> though, our bootstrap binaries are Guile 2.0.9?

Exactly.

> I can see how that might cause a problem. In what sense does Guix
> require 2.0.13 (as the manual claims) rather than 2.0.9?

There’s the “host side” (the ‘guix’ commands and related modules), and
there’s the “build side” (code used in the build environment when
building derivations.)

The “build side” is fully specified: ‘guix graph’ shows exactly what
Guile is used where, and you can see with, say:

guix graph -t derivation \
-e '(@@ (gnu packages commencement) findutils-boot0)'

that the early derivations run on Guile 2.0.9.

For “host side” code, users can use any Guile >= 2.0.13.

See also

I hope this clarifies a bit!

Ludo’.
Carlo Zancanaro wrote on 30 Mar 2018 13:17
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 30948@debbugs.gnu.org)
87o9j5x1d4.fsf@zancanaro.id.au
Hey,

On Fri, Mar 30 2018, Ludovic Courtès wrote:
>> From what I can understand it's one of pid 1's responsibilities
>> to reap child processes, so I would expect this to be set up
>> for every builder, before the builder is run.
>
> True, but for derivations it’s also “optional” because
> eventually guix-daemon terminates all its child processes.

As long as the build process doesn't rely on behaviour that,
strictly speaking, it should be allowed to rely on. It's not an
issue of resource leaking, it's an issue of correctness.
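
To make the correctness point concrete, here is a minimal Guile sketch of one way a build could observe an unreaped child. It is an illustration under assumptions, not the Shepherd test itself: the helper `process-exists?` and the variable names are hypothetical, and it relies on signal 0 being accepted for a zombie until the parent calls `waitpid`.

```scheme
;; Illustrative sketch only: a terminated child that nobody has waited for
;; (a zombie) still answers signal 0, so it looks "alive" to this check.

(define (process-exists? pid)
  ;; Hypothetical helper.  Signal 0 delivers nothing; 'kill' only checks
  ;; that PID can be signalled and throws 'system-error' otherwise.
  (catch 'system-error
    (lambda () (kill pid 0) #t)
    (lambda args #f)))

(define child
  (let ((pid (primitive-fork)))
    (when (zero? pid)
      (primitive-exit 0))               ;child: terminate right away
    pid))

(sleep 1)                               ;give the child time to exit

;; Nobody has reaped the child yet, so the check is expected to succeed.
(define exists-before-reap? (process-exists? child))

(waitpid child)                         ;reap it, as PID 1 should for orphans

;; After reaping, the PID is expected to be gone.
(define exists-after-reap? (process-exists? child))

(format #t "before reap: ~a, after reap: ~a~%"
        exists-before-reap? exists-after-reap?)
```

In a build container the orphaned process would be re-parented to PID 1, so the `waitpid` call above is the part that only PID 1 can perform.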

>> Given it's not specific to the gnu-build-system, I don't think
>> it really fits there.
>
> Yes, but note that it would be inherited by all the build
> systems.

Except for trivial-build-system, which is probably fine. I still
don't think it fits in a specific build system, given it's a
behaviour that transcends the specific action happening within the
container.

Putting it in gnu-build-system will solve the problem in all
realistic cases, so that's probably fine. It's still subtly
incorrect, but will only be a problem if something using the
trivial build system relies on pid 1 to reap a process, or if we
make a new build system not deriving from gnu-build-system (which
seems unlikely, but not impossible).

> The “build side” is fully specified: ‘guix graph’ shows exactly
> what Guile is used where, and you can see with, say:
>
> guix graph -t derivation \
> -e '(@@ (gnu packages commencement) findutils-boot0)'
>
> that the early derivations run on Guile 2.0.9.
>
> For “host side” code, users can use any Guile >= 2.0.13.

Yeah, okay. That makes sense. I guess I just expected 2.0.13 to be
the minimum version throughout.

Carlo

Ludovic Courtès wrote on 30 Mar 2018 17:17
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 30948@debbugs.gnu.org)
874lkxoanq.fsf@gnu.org
Hello,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> On Fri, Mar 30 2018, Ludovic Courtès wrote:
>>> From what I can understand it's one of pid 1's responsibilities to
>>> reap child processes, so I would expect this to be set up for every
>>> builder, before the builder is run.
>>
>> True, but for derivations it’s also “optional” because eventually
>> guix-daemon terminates all its child processes.
>
> As long as the build process doesn't rely on behaviour that, strictly
> speaking, it should be allowed to rely on. It's not an issue of
> resource leaking, it's an issue of correctness.

Right.

>>> Given it's not specific to the gnu-build-system, I don't think it
>>> really fits there.
>>
>> Yes, but note that it would be inherited by all the build systems.
>
> Except for trivial-build-system, which is probably fine. I still don't
> think it fits in a specific build system, given it's a behaviour that
> transcends the specific action happening within the container.
>
> Putting it in gnu-build-system will solve the problem in all realistic
> cases, so that's probably fine. It's still subtly incorrect, but will
> only be a problem if something using the trivial build system relies
> on pid 1 to reap a process, or if we make a new build system not
> deriving from gnu-build-system (which seems unlikely, but not
> impossible).

I agree, every Guile process running as PID 1 should reap processes.

My view is just that this mechanism belongs in “user code”, not in the
low-level mechanisms such as ‘build-expression->derivation’ and
‘gexp->derivation’. It’s a matter of separation of concerns.

Of course we don’t want to duplicate that code every time, but the way
we should factorize it, IMO, is by putting it in a “normal” module that
people will use.

Putting it in gnu-build-system is an admittedly hacky but easy way to
have it widely shared.

Thanks,
Ludo’.
Maxim Cournoyer wrote on 24 Nov 2022 17:40
Re: bug#30948: [PATCH core-updates] guix: Reap finished child processes in build containers.
(address . 30948@debbugs.gnu.org)
87h6yonw5h.fsf_-_@gmail.com
reassign 30948 guix
thanks
--
Hi,

I'm moving this from 'guix-patches' to 'guix', so that it's more
discoverable as a *bug*. It still bites us every now and then (grep the
Guix source code for usages of tini to find some occurrences).

Thanks,

Maxim
Maxim Cournoyer wrote on 24 Nov 2022 17:44
(name . Ludovic Courtès)(address . ludo@gnu.org)
87cz9cnvys.fsf_-_@gmail.com
Hi,

ludo@gnu.org (Ludovic Courtès) writes:

> Hello,
>
> Carlo Zancanaro <carlo@zancanaro.id.au> skribis:
>
>> On Fri, Mar 30 2018, Ludovic Courtès wrote:
>>>> From what I can understand it's one of pid 1's responsibilities to
>>>> reap child processes, so I would expect this to be set up for every
>>>> builder, before the builder is run.
>>>
>>> True, but for derivations it’s also “optional” because eventually
>>> guix-daemon terminates all its child processes.
>>
>> As long as the build process doesn't rely on behaviour that, strictly
>> speaking, it should be allowed to rely on. It's not an issue of
>> resource leaking, it's an issue of correctness.
>
> Right.
>
>>>> Given it's not specific to the gnu-build-system, I don't think it
>>>> really fits there.

For what it's worth, I agree. The evaluation container should have the
correct signal handling configured for *any* code about to be evaluated,
not just on demand, if we want to fix this fully in a way that
won't come back to haunt us in some edge case.

>>> Yes, but note that it would be inherited by all the build systems.
>>
>> Except for trivial-build-system, which is probably fine. I still don't
>> think it fits in a specific build system, given it's a behaviour that
>> transcends the specific action happening within the container.
>>
>> Putting it in gnu-build-system will solve the problem in all realistic
>> cases, so that's probably fine. It's still subtly incorrect, but will
>> only be a problem if something using the trivial build system relies
>> on pid 1 to reap a process, or if we make a new build system not
>> deriving from gnu-build-system (which seems unlikely, but not
>> impossible).
>
> I agree, every Guile process running as PID 1 should reap processes.

Agreed too.

> My view is just that this mechanism belongs in “user code”, not in the
> low-level mechanisms such as ‘build-expression->derivation’ and
> ‘gexp->derivation’. It’s a matter of separation of concerns.

Why? On my Guix System, such signal handling is handled by Shepherd, if
I'm not mistaken. As a user, I can trust the foundation to be sane,
rather than having to provide the bits to make it so myself.

> Of course we don’t want to duplicate that code every time, but the way
> we should factorize it, IMO, is by putting it in a “normal” module that
> people will use.
>
> Putting it in gnu-build-system is an admittedly hacky but easy way to
> have it widely shared.

I think we can do better than hacky here :-)

--
Thanks,
Maxim
Ludovic Courtès wrote on 26 Nov 2022 16:11
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
875yf192en.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

>> My view is just that this mechanism belongs in “user code”, not in the
>> low-level mechanisms such as ‘build-expression->derivation’ and
>> ‘gexp->derivation’. It’s a matter of separation of concerns.
>
> Why? On my Guix System, such signal handling is handled by Shepherd, if
> I'm not mistaken. As a user, I can trust the foundation to be sane,
> rather than having to provide the bits to make it so myself.
>
>> Of course we don’t want to duplicate that code every time, but the way
>> we should factorize it, IMO, is by putting it in a “normal” module that
>> people will use.
>>
>> Putting it in gnu-build-system is an admittedly hacky but easy way to
>> have it widely shared.
>
> I think we can do better than hacky here :-)

I think the real issue here is semantic clarity when it comes to
derivation inputs.

If I write:

(gexp->derivation "foo" #~(mkdir #$output))

I can be sure that my derivation depends on nothing but (default-guile).
This is important for tests, but also to make sure we can use this
primitive everywhere—if it pulled in the Shepherd, I wouldn’t be able to
use it to build glibc, because there’d be a cycle.

In that sense, having child-reaping code in gnu-build-system.scm, just
like in (guix least-authority), doesn’t seem unreasonable to me.

That said, I’m open to other proposals so please unleash your
creativity! :-)

We’re touching core components though so this will require discussion.

Ludo’.
Maxim Cournoyer wrote on 27 Nov 2022 04:00
(name . Ludovic Courtès)(address . ludo@gnu.org)
87ilj1w18i.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>>> My view is just that this mechanism belongs in “user code”, not in the
>>> low-level mechanisms such as ‘build-expression->derivation’ and
>>> ‘gexp->derivation’. It’s a matter of separation of concerns.
>>
>> Why? On my Guix System, such signal handling is handled by Shepherd, if
>> I'm not mistaken. As a user, I can trust the foundation to be sane,
>> rather than having to provide the bits to make it so myself.
>>
>>> Of course we don’t want to duplicate that code every time, but the way
>>> we should factorize it, IMO, is by putting it in a “normal” module that
>>> people will use.
>>>
>>> Putting it in gnu-build-system is an admittedly hacky but easy way to
>>> have it widely shared.
>>
>> I think we can do better than hacky here :-)
>
> I think the real issue here is semantic clarity when it comes to
> derivation inputs.
>
> If I write:
>
> (gexp->derivation "foo" #~(mkdir #$output))
>
> I can be sure that my derivation depends on nothing but (default-guile).
> This is important for tests, but also to make sure we can use this
> primitive everywhere—if it pulled in the Shepherd, I wouldn’t be able to
> use it to build glibc, because there’d be a cycle.

I was not suggesting to pull in extra dependencies such as Shepherd, but
to weave the to-be-added signal handling logic at a much lower level.
One idea could be to arrange so that the correct signal handlers always
get installed for any Guile code running in the build side (I'm not sure
how, but perhaps by adjusting the gexp "compiler"?).

The handlers could be defined in (guix build signal-handling) or
similar. Users wouldn't need to explicitly import the module and
install its signal handlers, that'd be taken care of automatically, all
the time.

Does that sound feasible?

--
Thanks,
Maxim
Ludovic Courtès wrote on 28 Nov 2022 16:04
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87lenvce8r.fsf@gnu.org
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> If I write:
>>
>> (gexp->derivation "foo" #~(mkdir #$output))
>>
>> I can be sure that my derivation depends on nothing but (default-guile).
>> This is important for tests, but also to make sure we can use this
>> primitive everywhere—if it pulled in the Shepherd, I wouldn’t be able to
>> use it to build glibc, because there’d be a cycle.
>
> I was not suggesting to pull in extra dependencies such as Shepherd, but
> to weave the to-be-added signal handling logic at a much lower level.
> One idea could be to arrange so that the correct signal handlers always
> get installed for any Guile code running in the build side (I'm not sure
> how, but perhaps by adjusting the gexp "compiler"?).
>
> The handlers could be defined in (guix build signal-handling) or
> similar. Users wouldn't need to explicitly import the module and
> install its signal handlers, that'd be taken care of automatically, all
> the time.
>
> Does that sound feasible?

Not like this: the imported-modules derivation for (guix build
signal-handling) would be a dependency in itself.

Ludo’.
Maxim Cournoyer wrote on 28 Nov 2022 21:10
(name . Ludovic Courtès)(address . ludo@gnu.org)
87k03esuvs.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> If I write:
>>>
>>> (gexp->derivation "foo" #~(mkdir #$output))
>>>
>>> I can be sure that my derivation depends on nothing but (default-guile).
>>> This is important for tests, but also to make sure we can use this
>>> primitive everywhere—if it pulled in the Shepherd, I wouldn’t be able to
>>> use it to build glibc, because there’d be a cycle.
>>
>> I was not suggesting to pull in extra dependencies such as Shepherd, but
>> to weave the to-be-added signal handling logic at a much lower level.
>> One idea could be to arrange so that the correct signal handlers always
>> get installed for any Guile code running in the build side (I'm not sure
>> how, but perhaps by adjusting the gexp "compiler"?).
>>
>> The handlers could be defined in (guix build signal-handling) or
>> similar. Users wouldn't need to explicitly import the module and
>> install its signal handlers, that'd be taken care of automatically, all
>> the time.
>>
>> Does that sound feasible?
>
> Not like this: the imported-modules derivation for (guix build
> signal-handling) would be a dependency in itself.

Can we make it an implicit dependency, since we want it to *always* be
used?

It'd be useless/annoying boilerplate otherwise.

--
Thanks,
Maxim
Maxim Cournoyer wrote on 29 Nov 2022 03:07
(name . Ludovic Courtès)(address . ludo@gnu.org)
875yeysebw.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> If I write:
>>>
>>> (gexp->derivation "foo" #~(mkdir #$output))
>>>
>>> I can be sure that my derivation depends on nothing but (default-guile).
>>> This is important for tests, but also to make sure we can use this
>>> primitive everywhere—if it pulled in the Shepherd, I wouldn’t be able to
>>> use it to build glibc, because there’d be a cycle.
>>
>> I was not suggesting to pull in extra dependencies such as Shepherd, but
>> to weave the to-be-added signal handling logic at a much lower level.
>> One idea could be to arrange so that the correct signal handlers always
>> get installed for any Guile code running in the build side (I'm not sure
>> how, but perhaps by adjusting the gexp "compiler"?).
>>
>> The handlers could be defined in (guix build signal-handling) or
>> similar. Users wouldn't need to explicitly import the module and
>> install its signal handlers, that'd be taken care of automatically, all
>> the time.
>>
>> Does that sound feasible?
>
> Not like this: the imported-modules derivation for (guix build
> signal-handling) would be a dependency in itself.

I see a couple of options for the lowest place to inject the minimal
signal handling expected of PID 1.

1. In Guile itself. We could make it detect when it's running as PID 1
and then set up the required signal handling. This is apparently what
Bash does, a peculiarity exploited by NixOS (they launch their builder
scripts via Bash, which is PID 1 and takes care of reaping the dead
processes).

2. In a Guile wrapper. Instead of running Guile directly in the
container, guix-daemon would run it through a wrapper that acts as PID 1.
This would make it a tool comparable to dumb-init [0] or tini [1],
except written in Scheme.
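
As a rough illustration of option 2, a wrapper in the spirit of tini could look like the sketch below. This is a minimal sketch under assumptions, not an existing Guix or Guile facility: the name `run-as-init` is hypothetical, signal forwarding is left out, and outside a PID namespace it only ever sees its own direct children.

```scheme
(use-modules (ice-9 match))

(define (run-as-init command)
  "Hypothetical helper: fork, exec COMMAND (a non-empty list of strings) in
the child, and reap every child that terminates until COMMAND itself exits.
Return the wait status of COMMAND."
  (match (primitive-fork)
    (0
     ;; Child: become the wrapped program.  COMMAND doubles as argv, so its
     ;; first element is both the file to look up and argv[0].
     (catch #t
       (lambda () (apply execlp (car command) command))
       (lambda _ (primitive-exit 127))))
    (main
     ;; Parent: block in 'waitpid' and reap whatever terminates.  When
     ;; running as PID 1 this includes orphans re-parented to us.
     (let loop ()
       (match (waitpid WAIT_ANY)
         ((pid . status)
          (if (= pid main)
              status
              (loop))))))))
```

A caller could then propagate the result, for instance with `(status:exit-val (run-as-init '("sh" "-c" "exit 3")))`, which should evaluate to 3.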


If we implement 1, it'd make Guile potentially useful as a wrapper
itself to launch scripts in a containerized environment (the same as
tini), and it would alleviate any integration overhead for us, so I find it
attractive.

What do you think?

For further reading, see [2], which I found interesting.


--
Thanks,
Maxim
Ludovic Courtès wrote on 17 Dec 2023 21:23
[PATCH core-updates] build-system/gnu: Turn PID 1 into an “init”-style process by default.
(address . 30948@debbugs.gnu.org)
d5497ca9b8d069d31dc905ac2aedddcff2614792.1702844395.git.ludo@gnu.org

* guix/build/gnu-build-system.scm (separate-from-pid1): New procedure.
(%standard-phases): Add it.
* guix/build-system/gnu.scm (gnu-build): Add #:separate-from-pid1? and
honor it.
(gnu-cross-build): Likewise.

Reported-by: Carlo Zancanaro <carlo@zancanaro.id.au>
Change-Id: I6f3bc8d8186d1a571f983a38d5e3fd178ffa2678
---
guix/build-system/gnu.scm | 4 ++++
guix/build/gnu-build-system.scm | 39 ++++++++++++++++++++++++++++++++-
2 files changed, 42 insertions(+), 1 deletion(-)

Hi!

This is a second attempt I’m currently testing as part of an
initially unrelated ‘core-updates’ series:


The principle is simple: if the build process runs as PID 1, fork
so that PID 1 does nothing but call ‘waitpid’ in a loop while the
actual build process runs as PID 2.

This is simple and robust. The code is written in a defensive way
as an extra phase that can be disabled.
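
To illustrate the "can be disabled" part: since the patch adds a
`#:separate-from-pid1?` keyword to `gnu-build`, a package using
`gnu-build-system` could presumably opt out through its `arguments`
field. The sketch below only builds that argument list as plain data so
that it runs in bare Guile; `example-arguments` is a hypothetical name.

```scheme
;; Hypothetical argument list, as it might appear in the 'arguments'
;; field of a package that wants to keep its build process as PID 1:
;;
;;   (arguments '(#:separate-from-pid1? #f))
(define example-arguments
  '(#:separate-from-pid1? #f))

;; An alternative, assuming the phase name from the patch, would be to
;; drop the phase altogether:
;;
;;   #:phases (modify-phases %standard-phases
;;              (delete 'separate-from-pid1))
```

Either form leaves the rest of the standard phases untouched.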

Thoughts?

Ludo’.

diff --git a/guix/build-system/gnu.scm b/guix/build-system/gnu.scm
index 0f886fe21d..6a89bcc0d8 100644
--- a/guix/build-system/gnu.scm
+++ b/guix/build-system/gnu.scm
@@ -362,6 +362,7 @@ (define* (gnu-build name inputs
(license-file-regexp %license-file-regexp)
(phases '%standard-phases)
(locale "C.UTF-8")
+ (separate-from-pid1? #t)
(system (%current-system))
(build (nix-system->gnu-triplet system))
(imported-modules %default-gnu-imported-modules)
@@ -404,6 +405,7 @@ (define* (gnu-build name inputs
(sexp->gexp phases)
phases)
#:locale #$locale
+ #:separate-from-pid1? #$separate-from-pid1?
#:bootstrap-scripts #$bootstrap-scripts
#:configure-flags #$(if (pair? configure-flags)
(sexp->gexp configure-flags)
@@ -502,6 +504,7 @@ (define* (gnu-cross-build name
(license-file-regexp %license-file-regexp)
(phases '%standard-phases)
(locale "C.UTF-8")
+ (separate-from-pid1? #t)
(system (%current-system))
(build (nix-system->gnu-triplet system))
(imported-modules %default-gnu-imported-modules)
@@ -547,6 +550,7 @@ (define* (gnu-cross-build name
(sexp->gexp phases)
phases)
#:locale #$locale
+ #:separate-from-pid1? #$separate-from-pid1?
#:bootstrap-scripts #$bootstrap-scripts
#:configure-flags #$configure-flags
#:make-flags #$make-flags
diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index 39707e7ace..51b8f9acbf 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -72,6 +72,42 @@ (define (first-subdirectory directory)
((first . _) first)
(_ #f)))
+(define* (separate-from-pid1 #:key (separate-from-pid1? #t)
+                             #:allow-other-keys)
+  "When running as PID 1 and SEPARATE-FROM-PID1? is true, run build phases as
+a child process; PID 1 then becomes responsible for reaping child processes."
+  (if separate-from-pid1?
+      (if (= 1 (getpid))
+          (dynamic-wind
+            (const #t)
+            (lambda ()
+              (match (primitive-fork)
+                (0 #t)
+                (builder-pid
+                 (format (current-error-port)
+                         "build process now running as PID ~a~%"
+                         builder-pid)
+                 (let loop ()
+                   ;; Running as PID 1 so take responsibility for reaping
+                   ;; child processes.
+                   (match (waitpid WAIT_ANY)
+                     ((pid . status)
+                      (if (= pid builder-pid)
+                          (if (zero? status)
+                              (primitive-exit 0)
+                              (begin
+                                (format (current-error-port)
+                                        "build process ~a exited with status ~a~%"
+                                        pid status)
+                                (primitive-exit 1)))
+                          (loop))))))))
+            (const #t))
+          (format (current-error-port) "not running as PID 1 (PID: ~a)~%"
+                  (getpid)))
+      (format (current-error-port)
+              "build process running as PID ~a; not forking~%"
+              (getpid))))
+
(define* (set-paths #:key target inputs native-inputs
(search-paths '()) (native-search-paths '())
#:allow-other-keys)
@@ -872,7 +908,8 @@ (define %standard-phases
;; Standard build phases, as a list of symbol/procedure pairs.
(let-syntax ((phases (syntax-rules ()
((_ p ...) `((p . ,p) ...)))))
- (phases set-SOURCE-DATE-EPOCH set-paths install-locale unpack
+ (phases separate-from-pid1
+ set-SOURCE-DATE-EPOCH set-paths install-locale unpack
bootstrap
patch-usr-bin-file
patch-source-shebangs configure patch-generated-file-shebangs

base-commit: 2f3b64b9d967b4eea5cbdb32c859f4e3ac3b1a83
--
2.41.0
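
A side note on the status handling in the patch above: `waitpid` returns
the raw wait status, so the "exited with status ~a" message prints an
encoded value rather than the exit code itself. A small helper along
these lines could decode it using Guile's `status:exit-val`,
`status:term-sig`, and `status:stop-sig`. This is a sketch only:
`describe-status` is a hypothetical name and is not part of the patch.

```scheme
;; Hypothetical helper: turn a raw wait status, as returned by 'waitpid'
;; or 'system*', into a human-readable string.
(define (describe-status status)
  (cond ((status:exit-val status)
         => (lambda (code)
              (format #f "exited with code ~a" code)))
        ((status:term-sig status)
         => (lambda (signal)
              (format #f "killed by signal ~a" signal)))
        ((status:stop-sig status)
         => (lambda (signal)
              (format #f "stopped by signal ~a" signal)))
        (else
         (format #f "unrecognized status ~a" status))))

;; Example: 'system*' also returns a raw wait status.
(display (describe-status (system* "sh" "-c" "exit 3")))
(newline)
;; prints "exited with code 3"
```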
Maxim Cournoyer wrote on 17 Dec 2023 22:46
Re: bug#30948: [PATCH core-updates] build-system/gnu: Turn PID 1 into an “init”-style process by default.
(name . Ludovic Courtès)(address . ludo@gnu.org)
87v88w7b90.fsf@gmail.com
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

>
> * guix/build/gnu-build-system.scm (separate-from-pid1): New procedure.
> (%standard-phases): Add it.
> * guix/build-system/gnu.scm (gnu-build): Add #:separate-from-pid1? and
> honor it.
> (gnu-cross-build): Likewise.
>
> Reported-by: Carlo Zancanaro <carlo@zancanaro.id.au>
> Change-Id: I6f3bc8d8186d1a571f983a38d5e3fd178ffa2678
> ---
> guix/build-system/gnu.scm | 4 ++++
> guix/build/gnu-build-system.scm | 39 ++++++++++++++++++++++++++++++++-
> 2 files changed, 42 insertions(+), 1 deletion(-)
>
> Hi!
>
> This is a second attempt I’m currently testing as part of an
> initially unrelated ‘core-updates’ series:
>
> https://issues.guix.gnu.org/67824
>
> The principle is simple: if the build process runs as PID 1, fork
> so that PID 1 does nothing but call ‘waitpid’ in a loop while the
> actual build process runs as PID 2.
>
> This is simple and robust. The code is written in a defensive way
> as an extra phase that can be disabled.
>
> Thoughts?

I haven't yet looked at the code, but looking at the bigger picture,
wouldn't it be a useful behavior to have for Guile itself? Perhaps not,
as there already exists a Guile init manager (GNU Shepherd), but if it's
something relatively simple/compact to implement, perhaps it could find
its place in Guile itself, just like Bash correctly implements signal
handling when used as PID 1 (if I'm not mistaken).

--
Thanks,
Maxim
Ludovic Courtès wrote on 18 Dec 2023 18:46
Re: bug#30948: [PATCH core-updates] guix: Reap finished child processes in build containers.
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
874jgfjtdn.fsf_-_@gnu.org
Hey Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I haven't yet looked at the code, but looking at the bigger picture,
> wouldn't it be a useful behavior to have for Guile itself? Perhaps not,
> as there already exists a Guile init manager (GNU Shepherd), but if it's
> something relatively simple/compact to implement, perhaps it could find
> its place in Guile itself, just like Bash correctly implements signal
> handling when used as PID 1 (if I'm not mistaken).

Bash is a shell whereas Guile is a programming language, and to me that
makes a big difference: we want to be able to implement init systems in
Guile just like we implement them in C, and that means we need full
control over what to do when running as PID 1. That’s why I wouldn’t do
anything special in Guile itself (nor in libc).

The patch I submitted fixes our immediate problem with build processes,
so I’d like to have it in ‘core-updates’.

Hopeful as I am, I see us merging ‘core-updates’ in the first half of
January. Ambition! :-)

Ludo’.
Ludovic Courtès wrote on 19 Dec 2023 23:56
(address . 30948@debbugs.gnu.org)
871qbhiyxx.fsf_-_@gnu.org
Hello,

Ludovic Courtès <ludo@gnu.org> skribis:

>
> * guix/build/gnu-build-system.scm (separate-from-pid1): New procedure.
> (%standard-phases): Add it.
> * guix/build-system/gnu.scm (gnu-build): Add #:separate-from-pid1? and
> honor it.
> (gnu-cross-build): Likewise.
>
> Reported-by: Carlo Zancanaro <carlo@zancanaro.id.au>
> Change-Id: I6f3bc8d8186d1a571f983a38d5e3fd178ffa2678

I pushed this change to ‘core-updates’ as
7ebe4b72727632561ddbf8bb0c58527929682989 (together with other
world-rebuild changes).

I’m closing this bug but please do re-open if you think the issue needs
further work.

Thanks,
Ludo’.
Ludovic Courtès wrote on 19 Dec 2023 23:56
control message for bug #30948
(address . control@debbugs.gnu.org)
87zfy5hkd9.fsf@gnu.org
close 30948
quit
Maxim Cournoyer wrote on 30 Dec 2023 04:36
Re: bug#30948: [PATCH core-updates] guix: Reap finished child processes in build containers.
(name . Ludovic Courtès)(address . ludo@gnu.org)
87a5pss6nt.fsf@gmail.com
Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Hey Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> I haven't yet looked at the code, but looking at the bigger picture,
>> wouldn't it be a useful behavior to have for Guile itself? Perhaps not,
>> as there already exists a Guile init manager (GNU Shepherd), but if it's
>> something relatively simple/compact to implement, perhaps it could find
>> its place in Guile itself, just like Bash correctly implements signal
>> handling when used as PID 1 (if I'm not mistaken).
>
> Bash is a shell whereas Guile is a programming language, and to me that
> makes a big difference: we want to be able to implement init systems in
> Guile just like we implement them in C, and that means we need full
> control over what to do when running as PID 1. That’s why I wouldn’t do
> anything special in Guile itself (nor in libc).

That sounds reasonable.

> The patch I submitted fixes our immediate problem with build processes,
> so I’d like to have it in ‘core-updates’.
>
> Hopeful as I am, I see us merging ‘core-updates’ in the first half of
> January. Ambition! :-)

Cool, let's see if we can get it done!

--
Thanks,
Maxim