guix-packages-base.drv leads to segfault on i686-linux

Open. Submitted by Ludovic Courtès.
Details
3 participants
  • Ludovic Courtès
  • Maxim Cournoyer
  • André Batista
Owner
unassigned
Severity
important
Ludovic Courtès wrote on 24 Jan 17:56 +0100
(address . bug-guix@gnu.org)
87o841qdh4.fsf@inria.fr
Hello,

This command fails:

guix pull -s i686-linux \
--commit=13b905bf28ec6309043bd61c5a92744b13352021 \
-p /tmp/test

‘guix-packages-base.drv’ fails to build due to a Guile segfault (!):

[653/656] compiling... 99.1% of 328 files
[654/656] compiling... 99.4% of 328 files
[655/656] compiling... 99.7% of 328 files
GC Warning: Failed to expand heap by 8388608 bytes
GC Warning: Failed to expand heap by 8388608 bytes
GC Warning: Failed to expand heap by 8388608 bytes

[...]

GC Warning: Failed to expand heap by 8388608 bytes
GC Warning: Failed to expand heap by 8388608 bytes
builder for `/gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv' failed due to signal 11 (Segmentation fault)
@ build-failed /gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv - 1 builder for `/gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv' failed due to signal 11 (Segmentation fault)

I wonder when it started failing; it may have been before the
core-updates merge.

Ricardo, do you have a known-good commit?

passes 25% of the time.

Thanks,
Ludo’.
Ludovic Courtès wrote on 24 Jan 18:04 +0100
control message for bug #53506
(address . control@debbugs.gnu.org)
87mtjlqd3r.fsf@gnu.org
severity 53506 important
quit
Ludovic Courtès wrote on 24 Jan 18:05 +0100
control message for bug #53214
(address . control@debbugs.gnu.org)
87lez5qd36.fsf@gnu.org
block 53214 by 53506
quit
Ludovic Courtès wrote on 24 Jan 18:15 +0100
Re: bug#53506: guix-packages-base.drv leads to segfault on i686-linux
(address . 53506@debbugs.gnu.org)
87czkhqcma.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> ‘guix-packages-base.drv’ fails to build due to a Guile segfault (!):
>
> [653/656] compiling... 99.1% of 328 files
> [654/656] compiling... 99.4% of 328 files
> [655/656] compiling... 99.7% of 328 files
> GC Warning: Failed to expand heap by 8388608 bytes
> GC Warning: Failed to expand heap by 8388608 bytes
> GC Warning: Failed to expand heap by 8388608 bytes
>
> [...]
>
> GC Warning: Failed to expand heap by 8388608 bytes
> GC Warning: Failed to expand heap by 8388608 bytes
> builder for `/gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv' failed due to signal 11 (Segmentation fault)
> @ build-failed /gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv - 1 builder for `/gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv' failed due to signal 11 (Segmentation fault)

On closer inspection, this is caused by OOM, with Guile peaking at 2.8G
resident (!) at that point, more than on x86_64.

I’m quite sure this is because the compiler resorts to bignums more than
on x86_64 (fixnums are smaller), thereby consuming more heap.
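
As a rough illustration of the fixnum point: assuming Guile reserves two tag bits per machine word (an assumption about the implementation, not something stated in this thread), the largest fixnum for each word size can be worked out with plain shell arithmetic. The helper name `max_fixnum` is made up for this sketch.

```shell
#!/bin/sh
# Largest fixnum for a given word size, ASSUMING two tag bits per word,
# so a fixnum has (word_bits - 2) bits, one of which is the sign bit.
# max_fixnum is a hypothetical helper, not part of Guile or Guix.
max_fixnum() {
    word_bits=$1
    echo $(( (1 << (word_bits - 3)) - 1 ))
}

echo "32-bit: $(max_fixnum 32)"
echo "64-bit: $(max_fixnum 64)"
```

Any integer above that limit has to be represented as a heap-allocated bignum. On a real installation, `guile -c '(display most-positive-fixnum)'` should print the actual limit for that build.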

Splitting crates-io.scm into two files might work as a temporary
workaround since the compiler creates a number of labels (integers)
roughly proportional to the number of lines in the file:

$ wc -l gnu/packages/*.scm|sort -k1 -n |tail
13977 gnu/packages/java.scm
15275 gnu/packages/bioconductor.scm
15929 gnu/packages/bioinformatics.scm
16086 gnu/packages/haskell-xyz.scm
20378 gnu/packages/lisp-xyz.scm
28770 gnu/packages/python-xyz.scm
29960 gnu/packages/emacs-xyz.scm
32071 gnu/packages/cran.scm
70442 gnu/packages/crates-io.scm
690662 total
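
When deciding where to split a large file, the number of package definitions is probably a more useful unit than raw line count. A minimal sketch, assuming each package starts with `(define-public` at the beginning of a line (the usual Guix convention); `count_definitions` is a hypothetical helper name:

```shell
#!/bin/sh
# Count top-level `(define-public` forms in a Scheme file.
# ASSUMPTION: each package definition starts at column 0 with
# "(define-public"; count_definitions is a hypothetical helper.
count_definitions() {
    grep -c '^(define-public' "$1"
}

# Self-contained demo on a throwaway file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
(define-public rust-foo-1
  (package (name "rust-foo")))

(define-public rust-bar-2
  (package (name "rust-bar")))
EOF
count_definitions "$demo"
rm -f "$demo"
```

In a Guix checkout this could be run as `count_definitions gnu/packages/crates-io.scm` to see how many definitions would need to be distributed across the new files.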

Ludovic Courtès wrote on 7 Feb 23:00 +0100
(address . 53506@debbugs.gnu.org)
8735kue3r8.fsf@gnu.org
Hi!

Ludovic Courtès <ludo@gnu.org> skribis:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> ‘guix-packages-base.drv’ fails to build due to a Guile segfault (!):
>>
>> [653/656] compiling... 99.1% of 328 files
>> [654/656] compiling... 99.4% of 328 files
>> [655/656] compiling... 99.7% of 328 files
>> GC Warning: Failed to expand heap by 8388608 bytes
>> GC Warning: Failed to expand heap by 8388608 bytes
>> GC Warning: Failed to expand heap by 8388608 bytes
>>
>> [...]
>>
>> GC Warning: Failed to expand heap by 8388608 bytes
>> GC Warning: Failed to expand heap by 8388608 bytes
>> builder for `/gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv' failed due to signal 11 (Segmentation fault)
>> @ build-failed /gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv - 1 builder for `/gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv' failed due to signal 11 (Segmentation fault)
>
> On closer inspection, this is caused by OOM, with Guile peaking at 2.8G
> resident (!) at that point, more than on x86_64.

An update: with changes made in Guile “main” over the last couple of
weeks, memory consumption is 20% lower and compilation is 20% faster
compared to 3.0.7 (on x86_64):

$ ./pre-inst-env time -f '%U seconds\n%M KiB' guile -c '(use-modules (system base compile)) (compile-file "gnu/packages/crates-io.scm" #:optimization-level 1)'
53.84 seconds
795972 KiB
$ ./pre-inst-env time -f '%U seconds\n%M KiB' /data/src/guile-3.0/meta/guile -c '(use-modules (system base compile)) (compile-file "gnu/packages/crates-io.scm" #:optimization-level 1 #:opts (list #:inlinable-exports? #f #:resolve-free-vars? #f))'
43.00 seconds
618724 KiB
$
$ guile --version
guile (GNU Guile) 3.0.7
Copyright (C) 2021 Free Software Foundation, Inc.

License LGPLv3+: GNU LGPL 3 or later <http://gnu.org/licenses/lgpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ (cd /data/src/guile-3.0; git log | head -1)
commit 2aed3c117c2d667ecca1e38a016f2cb4b524ab50
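
The "20%" figures can be checked against the two measurements above, where GNU time's `%U` is user CPU time and `%M` is the maximum resident set size. A small sketch (`pct_drop` is a hypothetical helper name):

```shell
#!/bin/sh
# Relative reduction from `before` to `after`, in percent, one decimal.
# pct_drop is a hypothetical helper name.
pct_drop() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", 100 * (before - after) / before }'
}

pct_drop 53.84 43.00     # user CPU time in seconds
pct_drop 795972 618724   # peak resident set size in KiB
```

This gives roughly 20.1% less CPU time and 22.3% less memory. Note that the second run also passes extra `#:opts`, so the two invocations are not strictly identical.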

To be continued…

Ludo’.
Maxim Cournoyer wrote on 8 Feb 15:03 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 53506@debbugs.gnu.org)
878rul317k.fsf@gmail.com
Hi Ludovic!

Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> skribis:
>>
>>> ‘guix-packages-base.drv’ fails to build due to a Guile segfault (!):
>>>
>>> [653/656] compiling... 99.1% of 328 files
>>> [654/656] compiling... 99.4% of 328 files
>>> [655/656] compiling... 99.7% of 328 files
>>> GC Warning: Failed to expand heap by 8388608 bytes
>>> GC Warning: Failed to expand heap by 8388608 bytes
>>> GC Warning: Failed to expand heap by 8388608 bytes
>>>
>>> [...]
>>>
>>> GC Warning: Failed to expand heap by 8388608 bytes
>>> GC Warning: Failed to expand heap by 8388608 bytes
>>> builder for
>>> `/gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv'
>>> failed due to signal 11 (Segmentation fault)
>>> @ build-failed
>>> /gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv
>>> - 1 builder for
>>> `/gnu/store/cnrmdbcyy8r9bs2gl2kgpnyplivrzf5c-guix-packages-base.drv'
>>> failed due to signal 11 (Segmentation fault)
>>
>> On closer inspection, this is caused by OOM, with Guile peaking at 2.8G
>> resident (!) at that point, more than on x86_64.
>
> An update: with changes made in Guile “main” over the last couple of
> weeks, memory consumption is 20% lower and compilation is 20% faster
> compared to 3.0.7 (on x86_64):
>
> $ ./pre-inst-env time -f '%U seconds\n%M KiB' guile -c '(use-modules (system base compile)) (compile-file "gnu/packages/crates-io.scm" #:optimization-level 1)'
> 53.84 seconds
> 795972 KiB
> $ ./pre-inst-env time -f '%U seconds\n%M KiB' /data/src/guile-3.0/meta/guile -c '(use-modules (system base compile)) (compile-file "gnu/packages/crates-io.scm" #:optimization-level 1 #:opts (list #:inlinable-exports? #f #:resolve-free-vars? #f))'
> 43.00 seconds
> 618724 KiB
> $
> $ guile --version
> guile (GNU Guile) 3.0.7
> Copyright (C) 2021 Free Software Foundation, Inc.
>
> License LGPLv3+: GNU LGPL 3 or later <http://gnu.org/licenses/lgpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> $ (cd /data/src/guile-3.0; git log | head -1)
> commit 2aed3c117c2d667ecca1e38a016f2cb4b524ab50

Impressive! Keep up the good work!

Thanks,

Maxim
André Batista wrote on 12 Feb 01:28 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 53506@debbugs.gnu.org)
Ygb9R3+eUGQsPHEi@andel
Hi Ludo!

Mon 07 Feb 2022 at 23:00:27 (1644285627), ludo@gnu.org wrote:
> Hi!
>
> An update: with changes made in Guile “main” over the last couple of
> weeks, memory consumption is 20% lower and compilation is 20% faster
> compared to 3.0.7 (on x86_64):
>
> To be continued…

I don't know if it is of any help, but after
'076e825dc5d585943ce820a279fffe4af09757fb' I could pull again after
a couple of weeks hitting this bug. Thanks a lot!

Previously I had tried to move rust-tokio* package definitions to
another file but it wasn't enough to work around it.

While parsing crates-io.scm, it also occurred to me to try to
create a 'crates-crypto.scm' and move all crypto-related definitions
there. Do you think this would be useful even after you get to the
solution you've been chasing? Is anyone in Guix currently
working on chopping crates-io.scm down?

Cheers!
Ludovic Courtès wrote on 12 Feb 15:24 +0100
(name . André Batista)(address . nandre@riseup.net)(address . 53506@debbugs.gnu.org)
87iltk17tc.fsf@gnu.org
Hi André,

André Batista <nandre@riseup.net> skribis:

> I don't know if it is of any help, but after
> '076e825dc5d585943ce820a279fffe4af09757fb' I could pull again after
> a couple of weeks hitting this bug. Thanks a lot!

Yes. I just tried ‘guix pull -s i686-linux’ for commit
e641d707e1ec8de2bfc658dcd1757360300aa509 and it passed!

This is certainly due to the reduced heap usage in Guile 3.0.8.
However, while building
/gnu/store/87mqnqwxqbcidbx5bpyrq9xpxmhw1035-guix-packages-base.drv we’re
still peaking at 2.6G resident—only 7% less than before (the packages
files have probably grown in the meantime), so we cannot claim victory
yet.

> Previously I had tried to move rust-tokio* package definitions to
> another file but it wasn't enough to work around it.
>
> While parsing crates-io.scm, it also occurred to me to try to
> create a 'crates-crypto.scm' and move all crypto-related definitions
> there. Do you think this would be useful even after you get to the
> solution you've been chasing? Is anyone in Guix currently
> working on chopping crates-io.scm down?

I think splitting the file would still be useful, yes; I don’t think
anyone is working on it.

Another thing to consider would be to balance things a bit better, by
arranging so that fewer modules are in ‘guix-packages-base’:

$ find /gnu/store/ry7fcdq7nwqaca6vanzc5d6z22njr92p-guix-packages-base |wc -l
331
$ find /gnu/store/45izww13rx5lll4pl0vj8xl0633bkzh7-guix-packages |wc -l
212
$ find /gnu/store/ry7fcdq7nwqaca6vanzc5d6z22njr92p-guix-packages-base -name crates\*go
/gnu/store/ry7fcdq7nwqaca6vanzc5d6z22njr92p-guix-packages-base/gnu/packages/crates-graphics.go
/gnu/store/ry7fcdq7nwqaca6vanzc5d6z22njr92p-guix-packages-base/gnu/packages/crates-io.go
/gnu/store/ry7fcdq7nwqaca6vanzc5d6z22njr92p-guix-packages-base/gnu/packages/crates-gtk.go

For the record, ‘guix-packages-base’ is computed in (guix self) as the
closure of (gnu packages base).
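
One caveat when comparing the two counts above: a bare `find DIR | wc -l` also counts directories, so it slightly overstates the number of modules. A sketch that counts only compiled `.go` files (`count_go_files` is a hypothetical helper name, and the demo tree is made up):

```shell
#!/bin/sh
# Count compiled modules (.go files) under a directory.
# count_go_files is a hypothetical helper; unlike a bare
# `find DIR | wc -l`, it skips directories and non-.go files.
count_go_files() {
    find "$1" -type f -name '*.go' | wc -l | tr -d ' '
}

# Self-contained demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/gnu/packages"
touch "$demo/gnu/packages/crates-io.go" "$demo/gnu/packages/base.go"
count_go_files "$demo"    # regular .go files only
find "$demo" | wc -l      # also counts the directories themselves
rm -rf "$demo"
```

Applied to the two store items above, this would give a like-for-like module count for ‘guix-packages-base’ and ‘guix-packages’.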

Thanks for your message!

Ludo’.