"guix build guix" halts after exhausting memory

  • Done
Details
2 participants
  • Julien Lepiller
  • Ludovic Courtès
Owner
unassigned
Submitted by
Julien Lepiller
Severity
important
Julien Lepiller wrote on 12 Jun 2021 00:23
(address . bug-guix@gnu.org)
20210612002330.64cd06c1@tachikoma.lepiller.eu
Hi Guix!

I tried updating my system on my armhf board (2GB of RAM), but during
"guix system reconfigure", guix tries to build itself (the guix package
from (gnu packages package-management)). This package uses too much
memory to build, and I start getting GC warnings like so:

GC Warning: Out of memory - trying to allocate requested amount (552
bytes)...
GC Warning: Header allocation failed: dropping block
GC Warning: Out of Memory! Heap size: 2571 MiB. Returning NULL!
Warning: Unwind-only out of memory exception; skipping pre-unwind
handler.

After some more progress and many more warnings, the build stalls (CPU
usage is at 2%), but the memory is not freed. The build stayed stuck for
a long time until I decided to cancel it.

As a workaround, I tried using the guix-daemon package instead (by
changing the guix field in guix-configuration), but "guix system" still
wants to build the guix package anyway. I tried finding usages of the
guix package (grepping for "package-management" yields a relatively
short list of files, and none of them seems to be used by my config
except (gnu services base), which always uses the guix from the
configuration). Why?

Can we instead break the build (at the Makefile level) into multiple
smaller chunks that require less memory, the way (guix self) does?
Julien Lepiller wrote on 12 Jun 2021 00:30
(address . 48963@debbugs.gnu.org)
20210612003014.6f9481de@tachikoma.lepiller.eu
Also note that this is not an OOM issue at the system level: Guile
doesn't seem to care that I have 8GB of free swap it could use. top
reports 1.8GB of resident memory used by the process. I tried stopping
all services at the time in the hope that it would free some memory,
but Guile just kept swallowing all the free memory I gave it.
Ludovic Courtès wrote on 15 Jun 2021 14:49
control message for bug #48963
(address . control@debbugs.gnu.org)
87mtrrs374.fsf@gnu.org
severity 48963 important
quit
Ludovic Courtès wrote on 16 Jun 2021 22:01
Re: bug#48963: "guix build guix" halts after exhausting memory
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 48963@debbugs.gnu.org)
87im2dpoj3.fsf@gnu.org
Hi,

Julien Lepiller <julien@lepiller.eu> wrote:

> I tried updating my system on my armhf board (2GB of RAM), but during
> "guix system reconfigure", guix tries to build itself (the guix package
> from (gnu packages package-management)). This package uses too much
> memory to build, and I start getting GC warnings like so:
>
> GC Warning: Out of memory - trying to allocate requested amount (552
> bytes)...
> GC Warning: Header allocation failed: dropping block
> GC Warning: Out of Memory! Heap size: 2571 MiB. Returning NULL!
> Warning: Unwind-only out of memory exception; skipping pre-unwind
> handler.
>
> after some more progress and a lot more warnings, the build stops (CPU
> is at 2%), but the memory is not freed. The build stayed stuck for a
> long time until I decided to cancel it.

This is ridiculous. :-/

> Can we instead break the build (at the Makefile level) into multiple
> smaller chunks, that require less memory, in the same way (guix self)
> works?

Yes, that’s a good idea.

Could you check the extent to which the attached patch helps on this
machine?

It doesn’t split as much as (guix self) does, only into three pieces,
but hopefully that helps a bit. A side effect is that the progress
report is now off, but we can fix it later.

Thanks,
Ludo’.
diff --git a/Makefile.am b/Makefile.am
index aa21b5383b..758d8b9b8a 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -663,7 +663,11 @@ CLEANFILES = \
# the whole thing. Likewise, set 'XDG_CACHE_HOME' to avoid loading possibly
# stale files from ~/.cache/guile/ccache.
%.go: make-go ; @:
-make-go: $(MODULES) guix/config.scm $(dist_noinst_DATA)
+make-go: make-core-go make-packages-go make-system-go
+
+define guile-compilation-rule
+
+$(1): $(2)
$(AM_V_at)echo "Compiling Scheme modules..." ; \
unset GUILE_LOAD_COMPILED_PATH ; \
XDG_CACHE_HOME=/nowhere \
@@ -671,7 +675,19 @@ make-go: $(MODULES) guix/config.scm $(dist_noinst_DATA)
$(top_builddir)/pre-inst-env \
$(GUILE) -L "$(top_builddir)" -L "$(top_srcdir)" \
--no-auto-compile \
- -s "$(top_srcdir)"/build-aux/compile-all.scm $^
+ -s "$(top_srcdir)"/build-aux/compile-all.scm $$(filter %.scm,$$^)
+
+.PHONY: $(1)
+
+endef
+
+MODULES_CORE = $(filter guix/%,$(MODULES))
+MODULES_PACKAGES = $(filter gnu/packages/%,$(MODULES))
+MODULES_SYSTEM = $(filter-out gnu/packages/%,$(filter gnu/%,$(MODULES)))
+
+$(eval $(call guile-compilation-rule,make-core-go,$(MODULES_CORE) guix/config.scm $(dist_noinst_DATA)))
+$(eval $(call guile-compilation-rule,make-packages-go,$(MODULES_PACKAGES) make-core-go))
+$(eval $(call guile-compilation-rule,make-system-go,$(MODULES_SYSTEM) make-packages-go make-core-go))
SUFFIXES = .go
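The three $(filter ...) expressions above decide which compilation step each module lands in. As a rough illustration only, the shell function below mirrors that classification with case patterns; the name module_set is hypothetical and not part of Guix, and the example paths are modules mentioned in this thread. Since case tries patterns top to bottom, gnu/packages/* has to come before gnu/*, which plays the role of the filter-out in MODULES_SYSTEM.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix): print which of the three module
# sets defined in the patch a given module path falls into.
#   guix/%                      -> core      (MODULES_CORE)
#   gnu/packages/%              -> packages  (MODULES_PACKAGES)
#   gnu/% minus gnu/packages/%  -> system    (MODULES_SYSTEM)
module_set() {
  case $1 in
    guix/*)         echo core ;;
    gnu/packages/*) echo packages ;;  # must precede gnu/* below
    gnu/*)          echo system ;;
    *)              echo none ;;      # matched by none of the three filters
  esac
}
```

For instance, module_set gnu/packages/package-management.scm prints packages and module_set gnu/services/base.scm prints system. Because make-packages-go depends on make-core-go, and make-system-go depends on both, make runs the three sets in that order, each in its own Guile process, so no single process compiles the whole tree.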
Julien Lepiller wrote on 17 Jun 2021 15:58
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48963@debbugs.gnu.org)
78EA8A84-D7DC-4506-91EB-1C304E5DB363@lepiller.eu
Hi Ludo,

I tried your patch by creating a variant of the guix package. My first attempt failed: the build still used almost all my memory, and I had forgotten to enable swap. In the second attempt, the build phase succeeded, but the build failed during the test phase (tests/inferior). I've started a new build in the hope that the failure is non-deterministic.

Attachment: file
Julien Lepiller wrote on 18 Jun 2021 00:17
(address . bug-guix@gnu.org)
20210618001737.105e2e5a@tachikoma.lepiller.eu
On Thu, 17 Jun 2021 09:58:09 -0400,
Julien Lepiller <julien@lepiller.eu> wrote:

> Hi Ludo,
>
> I tried your patch by creating a variant of the guix package. My
> first attempt was a failure because it's still using almost all my
> memory and I forgot to enable my swap. In the second attempt, the
> build phase succeeded, but the build failed during the test phase
> (test/inferior). I've started a new build hoping it's a
> non-deterministic failure.

I'm afraid that, after testing two more times, the test failure happens
consistently, at least on armhf.
Ludovic Courtès wrote on 18 Jun 2021 11:51
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 48963@debbugs.gnu.org)
87lf77h565.fsf@gnu.org
Hi,

Julien Lepiller <julien@lepiller.eu> wrote:

> On Thu, 17 Jun 2021 09:58:09 -0400,
> Julien Lepiller <julien@lepiller.eu> wrote:
>
>> Hi Ludo,
>>
>> I tried your patch by creating a variant of the guix package. My
>> first attempt was a failure because it's still using almost all my
>> memory and I forgot to enable my swap. In the second attempt, the
>> build phase succeeded, but the build failed during the test phase
>> (test/inferior). I've started a new build hoping it's a
>> non-deterministic failure.
>
> I'm afraid after testing 2 more times that the test failure is
> happening consistently, at least on armhf.

(The _build_ failure, right?)

Have you been able to see how this affects max RSS?

Does it go any further than without the patch?

Thanks,
Ludo’.
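One way to answer the max-RSS question, assuming GNU time is installed, is to run the build as command time -v -o rss.log make (command bypasses the shell's own time keyword) and read the "Maximum resident set size (kbytes)" line of the report. The helper below, whose name max_rss_kib is made up for this sketch, only extracts that number from the report file.

```shell
#!/bin/sh
# Hypothetical helper: print the peak resident set size, in KiB, recorded
# in a report written by GNU time in verbose mode, e.g. produced by:
#   command time -v -o rss.log make
max_rss_kib() {
  # GNU time -v emits an indented line such as
  #   Maximum resident set size (kbytes): 123456
  sed -n 's/^[[:space:]]*Maximum resident set size (kbytes): *//p' "$1"
}
```

Running max_rss_kib rss.log after a build with the patch and after one without it gives one number per run to compare.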
Julien Lepiller wrote on 18 Jun 2021 12:52
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48963@debbugs.gnu.org)
E61C5B68-6995-4CD3-B3A8-77B98FCC430F@lepiller.eu
No, the _test_ failure is consistent. The build itself now always completes, which is much better than before.

Attachment: file
Ludovic Courtès wrote on 20 Jun 2021 23:00
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 48963@debbugs.gnu.org)
87v9688d4q.fsf@gnu.org
Hi!

Julien Lepiller <julien@lepiller.eu> wrote:

> No, the _test_ failure is consistent. The build itself now always completes, which is much better than before.

Ah, that’s great! I can work on a variant of this patch that shows
correct completion numbers.

Could you share ‘tests/inferior.log’ if that’s the one that’s failing?
Is it also failing for you on x86_64?

Thanks,
Ludo’.
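To reproduce a single failing test from a Guix source checkout, make check accepts a TESTS variable, as in make check TESTS=tests/inferior.scm, and the Automake test harness leaves the log next to the test, here tests/inferior.log. The wrapper below is a hypothetical convenience (its name is made up) that runs one test, prints its log even when the test fails, and preserves make's exit status.

```shell
#!/bin/sh
# Hypothetical wrapper: run one test file of a Guix checkout, e.g.
#   rerun_guix_test inferior   # runs tests/inferior.scm
# then print the log the Automake test harness writes (tests/NAME.log).
rerun_guix_test() {
  make check TESTS="tests/$1.scm"
  status=$?
  # Show the log even if the test failed, so the backtrace is visible.
  if [ -f "tests/$1.log" ]; then
    cat "tests/$1.log"
  fi
  return "$status"
}
```

Running it once on armhf and once on x86_64 answers both questions above with the same command.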
Julien Lepiller wrote on 21 Jun 2021 13:05
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48963@debbugs.gnu.org)
20210621130510.0140bf8c@tachikoma.lepiller.eu
On Sun, 20 Jun 2021 23:00:53 +0200,
Ludovic Courtès <ludo@gnu.org> wrote:

> Hi!
>
> Julien Lepiller <julien@lepiller.eu> wrote:
>
> > No, the _test_ failure is consistent. The build itself now always
> > completes, which is much better than before.
>
> Ah, that’s great! I can work on a variant of this patch that shows
> correct completion numbers.
>
> Could you share ‘tests/inferior.log’ if that’s the one that’s
> failing? Is it also failing for you on x86_64?
>
> Thanks,
> Ludo’.

Here is the content of `tests/inferior.log`. The failure is the same
on armhf and x86_64:

Backtrace:
2 (primitive-load-path "tests/inferior.scm")
In ice-9/eval.scm:
626:19 1 (_ #<directory (test-inferior) 7ffff5d7ad20>)
In unknown file:
0 (dirname #f)

ERROR: In procedure dirname:
In procedure scm_to_utf8_stringn: Wrong type argument in position 1
(expecting string): #f
L
L
Ludovic Courtès wrote on 23 Jun 2021 23:43
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 48963-done@debbugs.gnu.org)
87zgvgtfxg.fsf@gnu.org
Hi,

Ludovic Courtès <ludo@gnu.org> wrote:

> Julien Lepiller <julien@lepiller.eu> wrote:

>> Can we instead break the build (at the Makefile level) into multiple
>> smaller chunks, that require less memory, in the same way (guix self)
>> works?
>
> Yes, that’s a good idea.
>
> Could you check the extent to which the attached patch helps on this
> machine?
>
> It doesn’t split as much as (guix self) does, only in three pieces, but
> hopefully that helps a bit. A side effect is that progress report is
> now off, but we can fix it later.

I pushed a variant of this patch, resorting to an ugly hack so that
compile-all.scm can still estimate progress. The estimate is correct if
you’re building all the .go files, not so much if you just do, say:

make
rm guix/cpio.go && make

because guix/cpio.scm is in the first module set, so it’ll think it’s
starting from scratch and display “0%”.
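The following is only a toy model of why a partial rebuild can show "0%", not the actual hack in compile-all.scm: assume each of the three compilation steps computes progress as if all earlier module sets had just been compiled in the same run, i.e. (size of earlier sets + modules done in this set) / total. The function name progress_percent and the set sizes used below are hypothetical.

```shell
#!/bin/sh
# Toy model (hypothetical, not Guix's compile-all.scm): integer percentage
# shown after FINISHED modules of set K (1 = core, 2 = packages,
# 3 = system), assuming every earlier set was compiled in the same run.
#   usage: progress_percent K FINISHED N_CORE N_PACKAGES N_SYSTEM
progress_percent() {
  k=$1 finished=$2 n1=$3 n2=$4 n3=$5
  total=$((n1 + n2 + n3))
  case $k in
    1) before=0 ;;
    2) before=$n1 ;;
    3) before=$((n1 + n2)) ;;
    *) echo "unknown set: $k" >&2; return 1 ;;
  esac
  echo $(( (before + finished) * 100 / total ))
}
```

With made-up set sizes of 300, 500 and 200 modules, a full build shows 30% when the second set starts (progress_percent 2 0 300 500 200), which is right. After rm guix/cpio.go, only one module of the first set is recompiled, so the model shows progress_percent 1 1 300 500 200, i.e. 0%, even though nearly everything is already built.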

Fixing it is left as an exercise to the reader. :-)

It seemed more important to me to fix the memory exhaustion issue first.

Thanks,
Ludo’.
Closed