Cuirass crashes

  • Done
Details
3 participants
  • Christopher Baines
  • Mathieu Othacehe
  • Ricardo Wurmus
Owner
unassigned
Submitted by
Mathieu Othacehe
Severity
normal
Mathieu Othacehe wrote on 11 Sep 2020 13:59
(address . bug-guix@gnu.org)
87d02szfq5.fsf@gnu.org
Hello,

I've observed a few Cuirass crashes over the past few days. The log looks like this:

2020-09-11T12:55:35 next evaluation in 300 seconds
GC Warning: Repeated allocation of very large block (appr. size 28766208):
May lead to memory leak and poor performance
2020-09-11T12:58:52 heap: 942.38 MiB; threads: 110; file descriptors: 257
2020-09-11T13:00:35 fetching input 'core-updates' of spec 'core-updates-core-updates'
2020-09-11T13:00:54 fetched input 'core-updates' of spec 'core-updates-core-updates' (commit "1bec03df9b60f156c657a64a323ef27f4ed14b44")
2020-09-11T13:00:54 fetching input 'guix' of spec 'guix-master'
2020-09-11T13:01:13 fetched input 'guix' of spec 'guix-master' (commit "7daa99e52d94e409f05a874813bdf739709807a2")
2020-09-11T13:01:13 evaluating spec 'guix-master'
2020-09-11T13:01:13 fetching input 'guix-modular' of spec 'guix-modular-master'
2020-09-11T13:01:17 fetched input 'guix-modular' of spec 'guix-modular-master' (commit "7daa99e52d94e409f05a874813bdf739709807a2")
2020-09-11T13:01:17 evaluating spec 'guix-modular-master'
2020-09-11T13:01:17 fetching input 'kernel-updates' of spec 'kernel-updates'
2020-09-11T13:01:21 fetched input 'kernel-updates' of spec 'kernel-updates' (commit "1de80be489e443e7c0d8c79ea84762e1706e81ff")
2020-09-11T13:01:21 fetching input 'staging' of spec 'staging-staging'
2020-09-11T13:01:24 fetched input 'staging' of spec 'staging-staging' (commit "de3c03a47160dec355d9b19ad5ca210d90c15fd7")
2020-09-11T13:01:24 fetching input 'version-1.0.1' of spec 'version-1.0.1'
2020-09-11T13:01:27 fetched input 'version-1.0.1' of spec 'version-1.0.1' (commit "58d7909c97c1ab2457faee1d7af925ee32ad15c2")
2020-09-11T13:01:27 fetching input 'version-1.1.0' of spec 'version-1.1.0'
mmap(PROT_NONE) failed
WARNING: (guile-user): imported module (fibers) overrides core binding `sleep'
2020-09-11T13:01:30 performing database optimizations

It looks like a memory allocation failed, causing a Cuirass/Guile crash.
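
To track how the heap grows before such a crash, the periodic status line in the log can be extracted mechanically. The following is a minimal Python sketch, not part of Cuirass; the function name `parse_status` is hypothetical, and the pattern assumes the status line always has the exact shape shown in the log above, with the heap size reported in MiB.

```python
import re

# Pattern for the periodic status line seen in the log above, e.g.
# "2020-09-11T12:58:52 heap: 942.38 MiB; threads: 110; file descriptors: 257".
# Assumption: the unit is always MiB and the field order never changes.
STATUS_RE = re.compile(
    r"^(?P<ts>\S+) heap: (?P<heap>[\d.]+) MiB; "
    r"threads: (?P<threads>\d+); file descriptors: (?P<fds>\d+)$"
)


def parse_status(line):
    """Return (timestamp, heap_mib, threads, fds), or None for other lines."""
    m = STATUS_RE.match(line.strip())
    if m is None:
        return None
    return m["ts"], float(m["heap"]), int(m["threads"]), int(m["fds"])


sample = "2020-09-11T12:58:52 heap: 942.38 MiB; threads: 110; file descriptors: 257"
print(parse_status(sample))
```

Running this over a whole log file and keeping only the non-None results gives a time series of heap size, thread count and open file descriptors that can be compared against the crash timestamps.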

Thanks,

Mathieu
Ricardo Wurmus wrote on 11 Sep 2020 14:53
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 43334@debbugs.gnu.org)
87r1r81nld.fsf@elephly.net
Mathieu Othacehe <othacehe@gnu.org> writes:

> It looks like a memory allocation failed causing a Cuirass/Guile crash.

On ci.guix.gnu.org? We have 188GiB RAM there according to free.

--
Ricardo
Christopher Baines wrote on 11 Sep 2020 21:22
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 43334@debbugs.gnu.org)
87y2lgglt8.fsf@cbaines.net
Mathieu Othacehe <othacehe@gnu.org> writes:

> It looks like a memory allocation failed causing a Cuirass/Guile crash.

So, I've seen this before, but in a slightly different context [1]. To
summarise: with Guile built with libgc@8, the Guix Data Service couldn't
process Guix revisions, because the code it had Guile run consistently
crashed with this error. The workaround was to add a Guile variant built
with libgc@7 and use this for the guix package [2].

1: http://issues.guix.info/40525
2: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=40684

I'm not quite sure which Guile process is crashing here, but switching to
a Guile built with libgc@7 might help.

Mathieu Othacehe wrote on 12 Sep 2020 08:46
(name . Christopher Baines)(address . mail@cbaines.net)(address . 43334@debbugs.gnu.org)
87363nijaz.fsf@gnu.org
Hey Chris,

>> It looks like a memory allocation failed causing a Cuirass/Guile crash.
>
> So, I've seen this before, but in a slightly different context [1]. To
> summarise: with Guile built with libgc@8, the Guix Data Service couldn't
> process Guix revisions, because the code it had Guile run consistently
> crashed with this error. The workaround was to add a Guile variant built
> with libgc@7 and use this for the guix package [2].
>
> 1: http://issues.guix.info/40525
> 2: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=40684
>
> I'm not quite sure which Guile process is crashing here, but switching to
> a Guile built with libgc@7 might help.

Thanks for pointing this out, I somehow missed it at the time. I
collected an strace log, which indeed looks very similar:

[pid 49511] getdents64(271, 0x7f5374304930 /* 455 entries */, 32768) = 32760
[pid 42583] mmap(0x7f5361976000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 42583] write(2, "mmap(PROT_NONE) failed", 22) = 22
[pid 42583] write(2, "\n", 1) = 1
[pid 42583] rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
[pid 42583] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
[pid 42583] getpid() = 42562
[pid 42583] gettid() = 42583
[pid 42583] tgkill(42562, 42583, SIGABRT) = 0
[pid 42583] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 42583] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=42562, si_uid=997} ---
[pid 42738] <... read resumed> <unfinished ...>) = ?

The abort seems to be received by the finalizer thread. I can try
guile-3.0/libgc-7 to confirm this theory, but I guess we'll need to dig
deeper.
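
For longer traces, the failing calls can be picked out automatically. Below is a minimal Python sketch that scans strace output for mmap calls that returned ENOMEM and reports the thread, the requested length and the protection flags. The function name `failed_mmaps` is hypothetical, and the pattern assumes the `[pid N] call(args) = result` line format shown in the snippet above.

```python
import re

# Matches lines such as:
# [pid 42583] mmap(0x7f53..., 4096, PROT_NONE, MAP_PRIVATE|..., -1, 0) = -1 ENOMEM (...)
# Assumption: strace was run with -f, so each line starts with "[pid N]".
FAILED_MMAP_RE = re.compile(
    r"^\[pid\s+(?P<tid>\d+)\]\s+mmap\((?P<args>.*)\)\s+=\s+-1\s+ENOMEM\b"
)


def failed_mmaps(lines):
    """Yield (thread_id, length, prot) for each mmap that failed with ENOMEM."""
    for line in lines:
        m = FAILED_MMAP_RE.match(line)
        if m:
            # mmap arguments are: addr, length, prot, flags, fd, offset.
            args = [a.strip() for a in m["args"].split(",")]
            yield int(m["tid"]), int(args[1]), args[2]


trace = [
    "[pid 49511] getdents64(271, 0x7f5374304930 /* 455 entries */, 32768) = 32760",
    "[pid 42583] mmap(0x7f5361976000, 4096, PROT_NONE, "
    "MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)",
    '[pid 42583] write(2, "mmap(PROT_NONE) failed", 22) = 22',
]
print(list(failed_mmaps(trace)))
```

In the trace above, the only failing request is a single 4096-byte PROT_NONE mapping, issued by the thread that raises SIGABRT immediately afterwards.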

Thanks,

Mathieu
Mathieu Othacehe wrote on 25 Mar 2021 13:49
(address . 43334-done@debbugs.gnu.org)
87h7kz5rga.fsf@gnu.org
Hello,

Closing, as the Cuirass evaluation process now uses less memory.

Thanks,

Mathieu
Closed