From debbugs-submit-bounces@debbugs.gnu.org Mon Aug 09 17:26:15 2021 Received: (at 49439) by debbugs.gnu.org; 9 Aug 2021 21:26:15 +0000 Received: from localhost ([127.0.0.1]:57941 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mDCmk-0007sh-Ss for submit@debbugs.gnu.org; Mon, 09 Aug 2021 17:26:15 -0400 Received: from eggs.gnu.org ([209.51.188.92]:45206) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mDCmi-0007sU-Qh for 49439@debbugs.gnu.org; Mon, 09 Aug 2021 17:26:13 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:56448) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mDCmc-0003hp-6K; Mon, 09 Aug 2021 17:26:06 -0400 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=48708 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mDCmb-0003QM-U2; Mon, 09 Aug 2021 17:26:06 -0400 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Sarah Morgensen Subject: Re: bug#49439: grafts cause =?utf-8?Q?=E2=80=9Cguix_environment?= =?utf-8?Q?=E2=80=9D?= to get killed with OOM References: <87sg0rse1n.fsf@elephly.net> <86tuklr5g6.fsf@mgsn.dev> <87y29r3enb.fsf@gnu.org> <867dhbpbyt.fsf@mgsn.dev> <87k0la3ghk.fsf@gnu.org> <87wnpa14fx.fsf@gnu.org> <861r7hpzzv.fsf@mgsn.dev> Date: Mon, 09 Aug 2021 23:26:03 +0200 Message-ID: <87czqmtims.fsf@inria.fr> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 49439 Cc: Ricardo Wurmus , 49439@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi! Sarah Morgensen skribis: > Ludovic Court=C3=A8s writes: [...] >> The seemingly endless CPU usage and unbound memory use comes from the >> =E2=80=98build-accumulator=E2=80=99 build handler. Here, the graph of = =E2=80=98pigx-scrnaseq=E2=80=99 >> has many nodes, and many of them depend on, say, =E2=80=98r-rlang=E2=80= =99. Thus, when >> =E2=80=98r-rlang=E2=80=99 is not in the store, the grafting code keeps a= sking for it by >> calling =E2=80=98build-derivations=E2=80=99, which aborts to the build h= andler; the >> build handler saves the .drv file name and the continuation and keeps >> going. But since the next package also depends on =E2=80=98r-langr=E2= =80=99, we abort >> again to the build handler, and so on. >> >> The end result is a very long list of nodes, probably of >> this order in this case: >> >> $ guix graph -t reverse-package r-rlang |grep 'label =3D "'|wc -l >> 594 >> >> Presumably, the captured continuations occupy quite a bit of memory, >> hence the quick growth. >> >> I suppose one solution is to fire suspended builds when the build >> handler sees a build request for a given derivation for the second time. >> It needs more thought and testing=E2=80=A6 > > I wonder if instead it's possible to reduce the memory taken by the > continuations? As someone who has absolutely no experience with the > build/derivation system, it seems like all we *should* need to save is > the derivation file name. Continuations may take a bit of room but the main problem is that we=E2=80= =99re storing too many of them. :-) Roughly, I think we keep one node per incoming edge into a package not in the store (=E2=80=98r-rlang=E2=80=99 in the example above). = In a case like =E2=80=98pigx-scrnaseq=E2=80=99, that=E2=80=99s a lot of items. I added counters and =E2=80=98pk=E2=80=99 calls in =E2=80=98map/accumulate-= builds=E2=80=99 & co. to see what happens. AFAICS, a cutoff as in the attached patch does the job. It=E2=80=99s a basic heuristic to avoid unbounded growth, at the expense of possibly reduced accumulation. For example, here=E2=80=99s the effect on my user profile with 300+ packages: --8<---------------cut here---------------start------------->8--- $ # with cutoff $ time GUIX_PROFILING=3Dgc ./pre-inst-env guix upgrade -n [...] 2,926.7 MB would be downloaded Garbage collection statistics: heap size: 119.37 MiB allocated: 849.56 MiB GC times: 30 time spent in GC: 2.88 seconds (30% of user time) real 0m8.987s user 0m9.482s sys 0m0.186s $=20 $ # without cutoff $ time GUIX_PROFILING=3Dgc ./pre-inst-env guix upgrade -n [...] 3,402.5 MB would be downloaded Garbage collection statistics: heap size: 127.37 MiB allocated: 1504.59 MiB GC times: 46 time spent in GC: 5.31 seconds (29% of user time) real 0m21.616s user 0m18.082s sys 0m0.255s --8<---------------cut here---------------end--------------->8--- You can tell that, without the cutoff (2nd run), the command more accurately finds out what it=E2=80=99s going to have to download since the =E2=80=9Cwould be downloaded=E2=80=9D figure is higher; with the cutoff (1s= t run), it =E2=80=9Csees less=E2=80=9D of what it=E2=80=99s going to end up downloadin= g. This could be annoying from a UI viewpoint in =E2=80=9Cextreme=E2=80=9D cases like this o= ne, but most likely the difference would be invisible in many cases. More importantly, having this cutoff halves the execution time, which is nice. The command initially given in this bug report now behaves like this: --8<---------------cut here---------------start------------->8--- $ time GUIX_PROFILING=3Dgc ./pre-inst-env guix environment pigx-scrnaseq -= -search-paths -n -v2 41.8 MB would be downloaded: /gnu/store/difgj9gf8l7y5bsilcl42x2vzgh493c6-r-utf8-1.2.2 /gnu/store/z4rpk1sn3jhy8brsr30zccln8mira3z5-r-purrr-0.3.4 /gnu/store/nkiga9rfd8a9gdfsp2rjcqz39s8p2hg3-r-magrittr-2.0.1 /gnu/store/c5y0xlw1fbcwa5p5qnk4xji286hd7cqk-r-vctrs-0.3.8 /gnu/store/86vz257dqy4vm6s7axq7zl2jqxacgngf-r-ellipsis-0.3.2 /gnu/store/m2m7h497g5n6vdrp5pxsnr2v8hsriq5f-r-glue-1.4.2 /gnu/store/0zv1sl91fz3ddq8namdfvf6yr0j1p2vx-texlive-bin-20190410 /gnu/store/7ks38sv5cpg7x76g2fdp9cra3x7dpbyq-texlive-union-51265 /gnu/store/vkdjhkl5s025jsjy7jrr9q7kxlsi2sc4-r-minimal-4.1.0 /gnu/store/pysc6ixb5fj1m2n50dac6aq5lgd5bnv4-r-rlang-0.4.11 Garbage collection statistics: heap size: 127.31 MiB allocated: 621.97 MiB GC times: 24 time spent in GC: 2.82 seconds (37% of user time) real 0m6.493s user 0m7.584s sys 0m0.117s --8<---------------cut here---------------end--------------->8--- This time it completes, which is already an improvement ;-). The 41.8=C2=A0MB download reported is underestimated, but again that=E2=80=99s = the tradeoff we=E2=80=99re making. Attached is the patch. I=E2=80=99ll go ahead with that if there are no objections. > For those interested, I've compiled a list of the top 60 > highest-dependency packages, as reported by package-closure: > > pigx 1630 > nextcloud-client 1539 > akregator 1486 > kmail 1484 > korganizer 1481 > kdepim-runtime 1480 > kincidenceeditor 1478 > keventviews 1477 > kmailcommon 1476 > kcalendarsupport 1475 Nice! I really need to stop taking =E2=80=98libreoffice=E2=80=99 as an exa= mple. Thanks, Ludo=E2=80=99. --=-=-= Content-Type: text/x-patch Content-Disposition: inline diff --git a/guix/store.scm b/guix/store.scm index 1ab2b08b47..340ad8bc10 100644 --- a/guix/store.scm +++ b/guix/store.scm @@ -1358,11 +1358,27 @@ on the build output of a previous derivation." (define (map/accumulate-builds store proc lst) "Apply PROC over each element of LST, accumulating 'build-things' calls and coalescing them into a single call." - (define result - (map (lambda (obj) - (with-build-handler build-accumulator - (proc obj))) - lst)) + (define accumulation-cutoff + ;; Threshold above which we stop accumulating unresolved nodes to avoid + ;; pessimal behavior. See . + 30) + + (define-values (result rest) + (let loop ((lst lst) + (result '()) + (unresolved 0)) + (match lst + ((head . tail) + (match (with-build-handler build-accumulator + (proc head)) + ((? unresolved? obj) + (if (> unresolved accumulation-cutoff) + (values (reverse (cons obj result)) tail) + (loop tail (cons obj result) (+ 1 unresolved)))) + (obj + (loop tail (cons obj result) unresolved)))) + (() + (values (reverse result) lst))))) (match (append-map (lambda (obj) (if (unresolved? obj) @@ -1370,6 +1386,7 @@ coalescing them into a single call." '())) result) (() + ;; REST is necessarily empty. result) (to-build ;; We've accumulated things TO-BUILD. Actually build them and resume the @@ -1382,7 +1399,7 @@ coalescing them into a single call." ;; unnecessary. ((unresolved-continuation obj) #f) obj)) - result)))) + (append result rest))))) (define build-things (let ((build (operation (build-things (string-list things) --=-=-=--