grafts cause “guix_ environment” to get killed with OOM

OpenSubmitted by Ricardo Wurmus.
Details
4 participants
  • Sarah Morgensen
  • Julien Lepiller
  • Ludovic Courtès
  • Ricardo Wurmus
Owner
unassigned
Severity
important
R
R
Ricardo Wurmus wrote on 6 Jul 16:38 +0200
grafts cause “guix environment” to get killed with OOM
(address . bug-guix@gnu.org)
87sg0rse1n.fsf@elephly.net
With a recent version of Guix, “guix environment” will not terminate on its own, keeps the CPU busy, and gets killed when the system eventually runs out of memory.
$ guix describe -f channels
Toggle snippet (8 lines)(list (channel (name 'guix) (url "/home/rekado/dev/gx/branches/master") (commit "685cfdec94e5e48c4ad28de53466a28dfc258edb")))

$ guix environment pigx-scrnaseq[wait until it gets killed]
The problem disappears when grafts are disabled:
$ guix environment --no-grafts pigx-scrnaseq$ [env] yay!
-- Ricardo
L
L
Ludovic Courtès wrote on 9 Jul 11:31 +0200
control message for bug #49439
(address . control@debbugs.gnu.org)
87eec7j0k6.fsf@gnu.org
severity 49439 importantquit
S
S
Sarah Morgensen wrote on 23 Jul 06:59 +0200
Re: bug#49439: grafts cause “guix environment ” to get killed with OOM
(name . Ricardo Wurmus)(address . rekado@elephly.net)(address . 49439@debbugs.gnu.org)
86tuklr5g6.fsf@mgsn.dev
Hello,
Ricardo Wurmus <rekado@elephly.net> writes:
Toggle quote (16 lines)> With a recent version of Guix, “guix environment” will not > terminate on its own, keeps the CPU busy, and gets killed when the > system eventually runs out of memory.>> $ guix describe -f channels>> (list (channel> (name 'guix)> (url "/home/rekado/dev/gx/branches/master")> (commit> "685cfdec94e5e48c4ad28de53466a28dfc258edb")))>>> $ guix environment pigx-scrnaseq> [wait until it gets killed]
I can reproduce this with pigx-scrnaseq as well a number of otherpackages (listed below).
$ ./pre-inst-env guix describe -f channels(list (channel (name 'guix) (url "/home/sarah/guix") (commit "3217a04b0352c2dd13323257b369604eeabfccc3")))
Does not complete within 5 minutes:package # inputs # transitive inputs (from package-transitive-inputs)pigx-chipseq 48 338pigx-scrnaseq 41 321r-cellchat 34 110pigx-rnaseq 34 343pigx-bsseq 32 358pigx-sars-cov2-ww 25 261r-circus 16 134
Does complete:r-chipseq 6 37 completes in >2mr-shortread 17 36 completes in >1mpython-scanpy 25 113 completes in <15s
I suspect it has something to do with the number of transitive inputs,because it is so prevalent with these R packages, which all usepropagated inputs. However... python-scanpy succeeds in under 15seconds, and it has more transitive inputs than r-chipseq.
Can we reproduce this with a large number of low-transitivity packagesdirectly on the command line?
Toggle quote (6 lines)>> The problem disappears when grafts are disabled:>> $ guix environment --no-grafts pigx-scrnaseq> $ [env] yay!
--Sarah
J
J
Julien Lepiller wrote on 27 Jul 16:52 +0200
Re: bug#49439: grafts cause “guix environment” to get killed with OOM
(address . 49439@debbugs.gnu.org)
464151E3-FBF5-4B80-947E-0F1291FD879D@lepiller.eu
I have a similar issue with an ocaml package I use at work. It's not free software, but all its dependencies are. The dependencies are not all yeet in guix, so to reproduce you might have to import them first with "guix import opam -r foo" for ocaml-foo.
The package depends on ocaml-ounit, ocaml-lp, ocaml-apronext, menhir, ocaml-async, ocaml-core, ocaml-graph, ocaml-libsvm, ocaml-minisat, ocaml-ppx-deriving-yojson, ocaml-yojson, ocaml-z3, ocaml-zarith and z3.
In total, that's 118 transitive inputs. Building the profile takes 30 minutes for me, on an SSD. The builder takes 1.5g resident.
Other than that, I measured time and memory for creating the environment when the profile was already built (no more derivation to build):
`which time` -v ~/guix/pre-inst-env guix environment ocaml-dummy-package -- echo done
User time: 121.43sSystem time: 2.28sMaximum resident: 1803028kB (1.8 GB)
With a warning from GC:
Repeated allocation of very large block (approx. size 35606528)
Note that I get the same numbers with --no-grafts, so it might be a different issue.
"guix build" terminates quickly.
Le 23 juillet 2021 00:59:21 GMT-04:00, Sarah Morgensen <iskarian@mgsn.dev> a écrit :
Toggle quote (62 lines)>Hello,>>Ricardo Wurmus <rekado@elephly.net> writes:>>> With a recent version of Guix, “guix environment” will not >> terminate on its own, keeps the CPU busy, and gets killed when the >> system eventually runs out of memory.>>>> $ guix describe -f channels>>>> (list (channel>> (name 'guix)>> (url "/home/rekado/dev/gx/branches/master")>> (commit>> "685cfdec94e5e48c4ad28de53466a28dfc258edb")))>>>>>> $ guix environment pigx-scrnaseq>> [wait until it gets killed]>>I can reproduce this with pigx-scrnaseq as well a number of other>packages (listed below).>>$ ./pre-inst-env guix describe -f channels>(list (channel> (name 'guix)> (url "/home/sarah/guix")> (commit> "3217a04b0352c2dd13323257b369604eeabfccc3")))>>Does not complete within 5 minutes:>package # inputs # transitive inputs> (from package-transitive-inputs)>pigx-chipseq 48 338>pigx-scrnaseq 41 321>r-cellchat 34 110>pigx-rnaseq 34 343>pigx-bsseq 32 358>pigx-sars-cov2-ww 25 261>r-circus 16 134>>Does complete:>r-chipseq 6 37 completes in >2m>r-shortread 17 36 completes in >1m>python-scanpy 25 113 completes in <15s>>I suspect it has something to do with the number of transitive inputs,>because it is so prevalent with these R packages, which all use>propagated inputs. However... python-scanpy succeeds in under 15>seconds, and it has more transitive inputs than r-chipseq.>>Can we reproduce this with a large number of low-transitivity packages>directly on the command line?>>>>> The problem disappears when grafts are disabled:>>>> $ guix environment --no-grafts pigx-scrnaseq>> $ [env] yay!>>-->Sarah
Attachment: file
L
L
Ludovic Courtès wrote on 27 Jul 18:28 +0200
Re: bug#49439: grafts cause “guix environment ” to get killed with OOM
(name . Sarah Morgensen)(address . iskarian@mgsn.dev)
87y29r3enb.fsf@gnu.org
Hi,
Sarah Morgensen <iskarian@mgsn.dev> skribis:
Toggle quote (30 lines)> Ricardo Wurmus <rekado@elephly.net> writes:>>> With a recent version of Guix, “guix environment” will not >> terminate on its own, keeps the CPU busy, and gets killed when the >> system eventually runs out of memory.>>>> $ guix describe -f channels>>>> (list (channel>> (name 'guix)>> (url "/home/rekado/dev/gx/branches/master")>> (commit>> "685cfdec94e5e48c4ad28de53466a28dfc258edb")))>>>>>> $ guix environment pigx-scrnaseq>> [wait until it gets killed]>> I can reproduce this with pigx-scrnaseq as well a number of other> packages (listed below).>> $ ./pre-inst-env guix describe -f channels> (list (channel> (name 'guix)> (url "/home/sarah/guix")> (commit> "3217a04b0352c2dd13323257b369604eeabfccc3")))>> Does not complete within 5 minutes:
What hardware are you using?
Here’s what I observe (with everything already in store and on a hotcache, with an i7 CPU):
Toggle snippet (28 lines)$ guix describeGeneracio 188 Jul 25 2021 12:47:29 (nuna) guix a92dfbc repository URL: https://git.savannah.gnu.org/git/guix.git branch: master commit: a92dfbce30777de6ca05031e275410cf9f56c84c$ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths --no-grafts >/dev/nullGarbage collection statistics: heap size: 160.31 MiB allocated: 1440.70 MiB GC times: 39 time spent in GC: 4.51 seconds (46% of user time)
real 0m7.534suser 0m9.747ssys 0m0.167s$ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths >/dev/nullGarbage collection statistics: heap size: 168.31 MiB allocated: 2111.71 MiB GC times: 53 time spent in GC: 6.92 seconds (45% of user time)
real 0m12.100suser 0m15.517ssys 0m0.198s
Commit 78daf9e02e5bc51f91488d8237cab2050cc060cf optimizes‘coalesce-duplicate-inputs’, which was quadratic in the number of inputs (!).After that change, I get:
Toggle snippet (22 lines)$ time GUIX_PROFILING=gc ./pre-inst-env guix environment pigx-scrnaseq --search-paths --no-grafts >/dev/nullGarbage collection statistics: heap size: 168.31 MiB allocated: 716.58 MiB GC times: 24 time spent in GC: 2.65 seconds (40% of user time)
real 0m5.720suser 0m6.708ssys 0m0.149s$ time GUIX_PROFILING=gc ./pre-inst-env guix environment pigx-scrnaseq --search-paths >/dev/nullGarbage collection statistics: heap size: 160.31 MiB allocated: 1387.96 MiB GC times: 42 time spent in GC: 5.87 seconds (43% of user time)
real 0m10.821suser 0m13.513ssys 0m0.198s
Could you tell me if it improves the situation for you?
It’s not the end of the road, but it should be an improvement.
Thanks,Ludo’.
S
S
Sarah Morgensen wrote on 28 Jul 01:35 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
867dhbpbyt.fsf@mgsn.dev
Hi,
Ludovic Courtès <ludo@gnu.org> writes:
[...]
Toggle quote (12 lines)>> I can reproduce this with pigx-scrnaseq as well a number of other>> packages (listed below).>>>> $ ./pre-inst-env guix describe -f channels>> (list (channel>> (name 'guix)>> (url "/home/sarah/guix")>> (commit>> "3217a04b0352c2dd13323257b369604eeabfccc3")))>>>> Does not complete within 5 minutes:
Okay, so all of a sudden I can't reproduce this; even with the samecommit as above, it completes in ~20s.
guix time-machine --commit=3217a04 -- environment pigx-scrnaseq --search-paths >/dev/null
Toggle quote (2 lines)> What hardware are you using?
Virtualbox VM with VT-x etc. on a host i7-6700. The VM has 6GB of memory.
Toggle quote (59 lines)>> Here’s what I observe (with everything already in store and on a hot> cache, with an i7 CPU):>> $ guix describe> Generacio 188 Jul 25 2021 12:47:29 (nuna)> guix a92dfbc> repository URL: https://git.savannah.gnu.org/git/guix.git> branch: master> commit: a92dfbce30777de6ca05031e275410cf9f56c84c> $ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths --no-grafts >/dev/null> Garbage collection statistics:> heap size: 160.31 MiB> allocated: 1440.70 MiB> GC times: 39> time spent in GC: 4.51 seconds (46% of user time)>> real 0m7.534s> user 0m9.747s> sys 0m0.167s> $ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths >/dev/null> Garbage collection statistics:> heap size: 168.31 MiB> allocated: 2111.71 MiB> GC times: 53> time spent in GC: 6.92 seconds (45% of user time)>> real 0m12.100s> user 0m15.517s> sys 0m0.198s>>> Commit 78daf9e02e5bc51f91488d8237cab2050cc060cf optimizes> ‘coalesce-duplicate-inputs’, which was quadratic in the number of inputs (!).> After that change, I get:>> $ time GUIX_PROFILING=gc ./pre-inst-env guix environment pigx-scrnaseq --search-paths --no-grafts >/dev/null> Garbage collection statistics:> heap size: 168.31 MiB> allocated: 716.58 MiB> GC times: 24> time spent in GC: 2.65 seconds (40% of user time)>> real 0m5.720s> user 0m6.708s> sys 0m0.149s> $ time GUIX_PROFILING=gc ./pre-inst-env guix environment pigx-scrnaseq --search-paths >/dev/null> Garbage collection statistics:> heap size: 160.31 MiB> allocated: 1387.96 MiB> GC times: 42> time spent in GC: 5.87 seconds (43% of user time)>> real 0m10.821s> user 0m13.513s> sys 0m0.198s>> Could you tell me if it improves the situation for you?
*Now* my experience is like yours:
Toggle snippet (30 lines)$ guix describeGeneration 9 Jul 27 2021 12:35:05 (current) guix d0ec739 repository URL: https://git.savannah.gnu.org/git/guix.git branch: master commit: d0ec73907c2995034a34339f4a7c2c72c2e21fea
time GUIX_PROFILING=gc guix time-machine --commit=3217a04 -- environment pigx-scrnaseq --search-paths >/dev/nullGarbage collection statistics: heap size: 176.31 MiB allocated: 2107.82 MiB GC times: 52 time spent in GC: 5.26 seconds (23% of user time)
real 0m20.471suser 0m22.605ssys 0m0.372s
$ time GUIX_PROFILING=gc guix environment pigx-scrnaseq --search-paths >/dev/nullGarbage collection statistics: heap size: 152.31 MiB allocated: 1367.16 MiB GC times: 40 time spent in GC: 3.25 seconds (21% of user time)
real 0m14.701suser 0m15.698ssys 0m0.361s
But why was it occurring before? The only I thing I can think of is thatI didn't have everything in the store first. Is there a way I can prunejust the relevant items from the store to test this?
Toggle quote (6 lines)>> It’s not the end of the road, but it should be an improvement.>> Thanks,> Ludo’.
--Sarah
L
L
Ludovic Courtès wrote 7 days ago
(name . Sarah Morgensen)(address . iskarian@mgsn.dev)
87k0la3ghk.fsf@gnu.org
Hi,
Sarah Morgensen <iskarian@mgsn.dev> skribis:
Toggle quote (4 lines)> But why was it occurring before? The only I thing I can think of is that> I didn't have everything in the store first. Is there a way I can prune> just the relevant items from the store to test this?
You could try something like:
guix gc -D $(guix gc --references $(guix build pigx-scrnaseq)) \ $(guix gc --references $(guix build pigx-scrnaseq --no-grafts))
Thinking about it, the grafts code depends on what’s in the store: whennothing is in the store, it bounces to the “build handler”, whichaccumulates the list of missing store items, until it starts buildingthem.
Let’s see if I can reproduce that behavior, too…
Thanks,Ludo’.
L
L
Ludovic Courtès wrote 6 days ago
(name . Sarah Morgensen)(address . iskarian@mgsn.dev)
87wnpa14fx.fsf@gnu.org
Hi,
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (5 lines)> Thinking about it, the grafts code depends on what’s in the store: when> nothing is in the store, it bounces to the “build handler”, which> accumulates the list of missing store items, until it starts building> them.
So I can reproduce the problem Ricardo and you initially reported byrunning:
./pre-inst-env guix environment pigx-scrnaseq --search-paths
after removing some of the ungrafted store items with:
guix gc -D $(guix build r-rlang --no-grafts) \ $(guix gc --references $(guix build pigx-scrnaseq --no-grafts))
The seemingly endless CPU usage and unbound memory use comes from the‘build-accumulator’ build handler. Here, the graph of ‘pigx-scrnaseq’has many nodes, and many of them depend on, say, ‘r-rlang’. Thus, when‘r-rlang’ is not in the store, the grafting code keeps asking for it bycalling ‘build-derivations’, which aborts to the build handler; thebuild handler saves the .drv file name and the continuation and keepsgoing. But since the next package also depends on ‘r-langr’, we abortagain to the build handler, and so on.
The end result is a very long list of <unresolved> nodes, probably ofthis order in this case:
$ guix graph -t reverse-package r-rlang |grep 'label = "'|wc -l 594
Presumably, the captured continuations occupy quite a bit of memory,hence the quick growth.
I suppose one solution is to fire suspended builds when the buildhandler sees a build request for a given derivation for the second time.It needs more thought and testing…
Ludo’.
PS: Did you know ‘pigx-scrnaseq’ has twice as many nodes as ‘libreoffice’?
$ guix graph -t bag pigx-scrnaseq |grep 'label = "'|wc -l 1359 $ guix graph -t bag libreoffice |grep 'label = "'|wc -l 699
That makes it a great example to study and fix scalability issues!
S
S
Sarah Morgensen wrote 6 days ago
(name . Ludovic Courtès)(address . ludo@gnu.org)
861r7hpzzv.fsf@mgsn.dev
Hi,
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (19 lines)> Hi,>> Ludovic Courtès <ludo@gnu.org> skribis:>>> Thinking about it, the grafts code depends on what’s in the store: when>> nothing is in the store, it bounces to the “build handler”, which>> accumulates the list of missing store items, until it starts building>> them.>> So I can reproduce the problem Ricardo and you initially reported by> running:>> ./pre-inst-env guix environment pigx-scrnaseq --search-paths>> after removing some of the ungrafted store items with:>> guix gc -D $(guix build r-rlang --no-grafts) \> $(guix gc --references $(guix build pigx-scrnaseq --no-grafts))
Same here. I'm glad we were able to pinpoint this!
Toggle quote (23 lines)>> The seemingly endless CPU usage and unbound memory use comes from the> ‘build-accumulator’ build handler. Here, the graph of ‘pigx-scrnaseq’> has many nodes, and many of them depend on, say, ‘r-rlang’. Thus, when> ‘r-rlang’ is not in the store, the grafting code keeps asking for it by> calling ‘build-derivations’, which aborts to the build handler; the> build handler saves the .drv file name and the continuation and keeps> going. But since the next package also depends on ‘r-langr’, we abort> again to the build handler, and so on.>> The end result is a very long list of <unresolved> nodes, probably of> this order in this case:>> $ guix graph -t reverse-package r-rlang |grep 'label = "'|wc -l> 594>> Presumably, the captured continuations occupy quite a bit of memory,> hence the quick growth.>> I suppose one solution is to fire suspended builds when the build> handler sees a build request for a given derivation for the second time.> It needs more thought and testing…
I wonder if instead it's possible to reduce the memory taken by thecontinuations? As someone who has absolutely no experience with thebuild/derivation system, it seems like all we *should* need to save isthe derivation file name.
Toggle quote (13 lines)>> Ludo’.>> PS: Did you know ‘pigx-scrnaseq’ has twice as many nodes as> ‘libreoffice’?>> $ guix graph -t bag pigx-scrnaseq |grep 'label = "'|wc -l> 1359> $ guix graph -t bag libreoffice |grep 'label = "'|wc -l> 699>> That makes it a great example to study and fix scalability issues!
For those interested, I've compiled a list of the top 60highest-dependency packages, as reported by package-closure:
pigx 1630nextcloud-client 1539akregator 1486kmail 1484korganizer 1481kdepim-runtime 1480kincidenceeditor 1478keventviews 1477kmailcommon 1476kcalendarsupport 1475kmessagelib 1474knotes 1472kaddressbook 1469libksieve 1467kdepim-apps-libs 1465libgravatar 1463kpimcommon 1462akonadi-calendar 1453pigx-bsseq 1448elisa 1446kaffeine 1432kdenlive 1431kmailtransport 1431dolphin-plugins 1426k3b 1424libkgapi 1422dolphin 1421kopete 1403pigx-sars-cov2-ww 1401krdc 1398baloo-widgets 1397baloo 1396pigx-chipseq 1396krfb 1389ffmpegthumbs 1388kget 1382kmplayer 1381kdelibs4support 1375pigx-scrnaseq 1342kdevelop 1340kmailimporter 1326libkdepim 1325pigx-rnaseq 1324labplot 1316smb4k 1315kleopatra 1311kalarmcal 1311choqok 1311okular 1310ktnef 1310ktorrent 1310kate 1308akonadi-search 1308kcalutils 1307yakuake 1306khelpcenter 1305libksysguard 1305kdeconnect 1304konsole 1304libkleo 1304
There seem to be a lot of kde packages in there, so perhaps this issueisn't as rare as we might otherwise expect?
--Sarah
?