Performance of package input rewriting

DoneSubmitted by Lars-Dominik Braun.
Details
4 participants
  • Lars-Dominik Braun
  • Ludovic Courtès
  • Ricardo Wurmus
  • zimoun
Owner
unassigned
Severity
normal
L
L
Lars-Dominik Braun wrote on 27 Oct 2020 14:26
(address . bug-guix@gnu.org)
20201027132614.GB3081@zpidnp36
Hi,
this issue is similar to https://issues.guix.gnu.org/41702,but I’m not sureit’s exactly the same. For guix-science I’m trying to provide some packageslike python-jupyterlab, which depend on a mix of packages from guix proper andnewer versions of packages already included in guix proper. Thus I need torewrite inputs of the former to the latter. (Because Python only propagatesdependencies and thus collisions would occur.)
Previously I have been doing this using package-input-rewriting, but startingan environment containing python-jupyterlab alone took about 20s (warm caches,all derivations in the store). Manually rewriting inputs by inheriting andalist-delete’ing brings this down to 3s, which is pretty significant.--no-grafts has not much of an impact (15s vs 2s) here. Seehttps://github.com/guix-science/guix-science/commit/972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0for the exact changes.
My expectation would be that package-input-rewriting is the preferred, becauseeasier, solution to this problem and thus should have minimal impact onperformance.
Cheers,Lars
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl+YH/IACgkQQhN3ARo3hEYtWgv/ZeeJRbS39QGqBHhDiCkMrRAP2BxrkQeiIms/Vtp2JtOWEelrmdWj6ZNe+46a4EnqRX/kv6z+ZKZPHz5KBkEPvwnNoRQbdFOlKSFPCe5PrvNHbyhpscS+TMj/u/hGS8NKt1di24SfnyqJHeWYeK9QTQnxNwycsCnQx1SUB5BrWQx49jEVS1DmmwkYzcSnB0ztlhOu+Fa+D6CrkppC2BILPdQZc3sDYQ0Cj74VCp3FwSNJJ1spLeKXIMCrcLbzdL8mAMhr5b43+ePz0pBJ9FUz+ly5nV9Ml+d1oo+wpA6dn0REqLn+7Hu4uJJFvzkCIVJ8UiG8/7HCKPiX5YZd7ka5wgyGVbE9nIAFGJ58oB22dRaAcVWho0TzTITKoJOyQ1xuCN+RbjkpWSq89NqeNC5ajoa90p8yaW9ANRn4Z6jDMH26cEbCl/mNIolHKdF1Fsbcvyk55GYmPEGEMdzgTmSDOg31quMENSP2aYrMUntSu4LI0yQbnlhPnbzjz3oMKxDF=X55m-----END PGP SIGNATURE-----

Z
Z
zimoun wrote on 27 Oct 2020 15:14
86361z4uyp.fsf@gmail.com
Hi Lars,
On Tue, 27 Oct 2020 at 14:26, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:
Toggle quote (8 lines)> Previously I have been doing this using package-input-rewriting, but starting> an environment containing python-jupyterlab alone took about 20s (warm caches,> all derivations in the store). Manually rewriting inputs by inheriting and> alist-delete’ing brings this down to 3s, which is pretty significant.> --no-grafts has not much of an impact (15s vs 2s) here. See> https://github.com/guix-science/guix-science/commit/972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0> for the exact changes.
Is it not related to “#:deep? #t“ by default? The default was #f.
Well, using ’inherit’ only rewrites the direct explicit dependencies.However, ’package-input-rewriting’ traverse all the graph ofdependencies and replaces accordingly. Maybe I misunderstand.

All the best,simon
R
R
Ricardo Wurmus wrote on 27 Oct 2020 20:58
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)
87imav1lwp.fsf@elephly.net
Lars-Dominik Braun <ldb@leibniz-psychology.org> writes:
Toggle quote (12 lines)> this issue is similar to https://issues.guix.gnu.org/41702,but I’m not sure> it’s exactly the same. For guix-science I’m trying to provide some packages> like python-jupyterlab, which depend on a mix of packages from guix proper and> newer versions of packages already included in guix proper. Thus I need to> rewrite inputs of the former to the latter. (Because Python only propagates> dependencies and thus collisions would occur.)>> Previously I have been doing this using package-input-rewriting, but starting> an environment containing python-jupyterlab alone took about 20s (warm caches,> all derivations in the store). Manually rewriting inputs by inheriting and> alist-delete’ing brings this down to 3s, which is pretty significant.
Could you show us a concrete example? Input rewriting is recursive andwill traverse the whole package graph by default, even if you *know*that, say, GCC doesn’t need to be rewritten.
For the more generic “package-mapping” you can provide a “cut?”procedure to determine when to stop recursion. Perhaps this would makethings faster in your case?
-- Ricardo
L
L
Ludovic Courtès wrote on 28 Oct 2020 15:19
(name . zimoun)(address . zimon.toutoune@gmail.com)
87o8km301f.fsf@gnu.org
Hi,
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (12 lines)> On Tue, 27 Oct 2020 at 14:26, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:>>> Previously I have been doing this using package-input-rewriting, but starting>> an environment containing python-jupyterlab alone took about 20s (warm caches,>> all derivations in the store). Manually rewriting inputs by inheriting and>> alist-delete’ing brings this down to 3s, which is pretty significant.>> --no-grafts has not much of an impact (15s vs 2s) here. See>> https://github.com/guix-science/guix-science/commit/972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0>> for the exact changes.>> Is it not related to “#:deep? #t“ by default? The default was #f.
Yes, that’s a possible culprit. Try passing #:deep? #f if it works foryour use case.
Another thing to look at is the <package> object graph (as show by ‘guixgraph’). Input rewriting can duplicate parts of the graph, which inturn defeats package->derivation memoization. Just looking at thenumber of nodes in the graph can give hints.
Like Ricardo wrote, it’d be great it you could share a short reproducer.
Thanks,Ludo’.
L
L
Lars-Dominik Braun wrote on 30 Oct 2020 09:42
20201030084245.GB3128@zpidnp36
Hi,
Toggle quote (2 lines)> Yes, that’s a possible culprit. Try passing #:deep? #f if it works for> your use case.
Yeah, that brings it down to ~8s, which is still alot.
Toggle quote (4 lines)> Another thing to look at is the <package> object graph (as show by ‘guix> graph’). Input rewriting can duplicate parts of the graph, which in> turn defeats package->derivation memoization. Just looking at the> number of nodes in the graph can give hints.
Aha, it’s 913 nodes without rewriting, 13916 with rewriting (#:deep? #t) and4286 with rewriting (#:deep? #f) as determined by a rather ad-hoc `guix graph-L . -t package python-jupyterlab | grep 'shape = box' | wc -l`. That seems waytoo much. Does that mean I’m using package rewriting in the wrong way or isthat a bug?
Unfortunately I don’t have a short reproducer right now. I’ll look at the graphmore closely to figure out which parts are actually duplicated. Maybe I cancreate a reproducing testcase with more information.
Cheers,Lars
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl+b0gEACgkQQhN3ARo3hEZKNgv/XsWkY+fy4zRIPx/SK5XuAOjLzvKI9w4ahuvN8ewSGzpuuhRTyzU6TIN6AWgTJJjSkHkeFY8iDgHXwgJ9KIOBtNOz7Pwvg6yUG6+nEIVjxAmmL6T2q43FPHxzgLou6pWJjsNOx10gNsKduWDcwUdeYW9KDLN2HdY3RQHnzf93cK13YuF6DiCp63Hbsq454Q7MFHo7EMrGaNx6QwuSG4tqHW8wjs9er2ioECJ4DT9XJseUXEvNKjcfuUkWTpJWuMraO1Rzb4JZD5GkqzKzX32Dgf355b8MxR//XUwfgxI4hD57CS+xedrqCUFNXhJoQgHds3cAqO54Mv2b9004kDlqHc4tb/PFnsjVEjtBwG19ARTOwA+v+avHlGophYT8tnlrl9B+mufMqILVdq9eU/KvOHmv6wtxQycb/yX30gxecc7Bqqk0MaZ//B23hSSnReGdqmNtY/9hKnSva90iwLvZXoq9nBJo/BuBhcWcSEr2MrTwu7GIO3NAhnuHdYdKQhzq=Of5M-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 31 Oct 2020 11:27
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)
87tuuavgg5.fsf@gnu.org
Hi Lars,
Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:
Toggle quote (10 lines)>> Another thing to look at is the <package> object graph (as show by ‘guix>> graph’). Input rewriting can duplicate parts of the graph, which in>> turn defeats package->derivation memoization. Just looking at the>> number of nodes in the graph can give hints.> Aha, it’s 913 nodes without rewriting, 13916 with rewriting (#:deep? #t) and> 4286 with rewriting (#:deep? #f) as determined by a rather ad-hoc `guix graph> -L . -t package python-jupyterlab | grep 'shape = box' | wc -l`. That seems way> too much. Does that mean I’m using package rewriting in the wrong way or is> that a bug?
It could be a mixture thereof. :-)
I guess it’s easy to end up creating huge object graphs. Here’s anexample of an anti-pattern:
(define a ((package-input-rewriting x) ((package-input-rewriting y) p1)))
(define b ((package-input-rewriting x) ((package-input-rewriting y) p2)))
The correct use is:
(define transform (package-input-rewriting (append x y)))
(define a (transform p1)) (define b (transform p2))
That guarantees that ‘a’ and ‘b’ share most of the nodes of their objectgraph.
From a quick look, the code in Guix-Science seemed to be following thepattern above.
For example, there’s:
Toggle snippet (29 lines)(define python-ipykernel-5.3-bootstrap (let ((rewritten ((package-input-rewriting `((,python-jupyter-client . ,python-jupyter-client-6.1-bootstrap) ;; Indirect through IPython. (,python-testpath . ,python-testpath-0.4) (,python-nbformat . ,python-nbformat-5.0))) python-ipykernel-5.3-proper))) (package (inherit rewritten) (name "python-ipykernel-bootstrap"))))
(define-public python-jupyter-client-6.1 ((package-input-rewriting `((,python-ipykernel . ,python-ipykernel-5.3-bootstrap) (,python-jupyter-core . ,python-jupyter-core-4.6) ;; Indirect through IPython. (,python-testpath . ,python-testpath-0.4) (,python-nbformat . ,python-nbformat-5.0))) python-jupyter-client-6.1-proper))
(define-public python-ipykernel-5.3 ((package-input-rewriting `((,python-jupyter-client . ,python-jupyter-client-6.1) ;; Indirect through IPython. (,python-testpath . ,python-testpath-0.4) (,python-nbformat . ,python-nbformat-5.0))) python-ipykernel-5.3-proper))
It seems to me that you’re redefining a dependency graph, node by node.Thus, you probably don’t need ‘package-input-rewriting’ here. What youdid in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0looks more appropriate to me, in terms of style and semantics.
Does that make sense?
Thanks,Ludo’.
L
L
Lars-Dominik Braun wrote on 3 Nov 2020 09:23
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44254@debbugs.gnu.org)
20201103082343.GC3127@zpidnp36
Hi Ludo,
Toggle quote (16 lines)> I guess it’s easy to end up creating huge object graphs. Here’s an> example of an anti-pattern:> > (define a> ((package-input-rewriting x) ((package-input-rewriting y) p1))) > > (define b> ((package-input-rewriting x) ((package-input-rewriting y) p2)))> > The correct use is:> > (define transform> (package-input-rewriting (append x y)))> > (define a (transform p1))> (define b (transform p2))
that sounds like a section for the cookbook :)
Toggle quote (4 lines)> It seems to me that you’re redefining a dependency graph, node by node.> Thus, you probably don’t need ‘package-input-rewriting’ here. What you> did in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0> looks more appropriate to me, in terms of style and semantics.
Okay, got it. My initial concern was that rewriting the graph “by hand” (i.e.alist-delete) would be tedious and error-prone.
Thank you very much,Lars
-- Lars-Dominik BraunWissenschaftlicher Mitarbeiter/Research Associatewww.leibniz-psychology.orgZPID - Leibniz-Institut für Psychologie /ZPID - Leibniz Institute for PsychologyUniversitätsring 15D-54296 Trier - GermanyTel.: +49–651–201-4964
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl+hE4sACgkQQhN3ARo3hEZxzgwA0HnQCSUKVxjo9Q0EPYBRZR5OBV1AADcFQ+aLy48K2F/5oS6+Dm5fsK3oWpog7EbunDAub8ry3oAfqnAMDfmLL0mLFiWpIZpLp5UEMnRpnc9PxifWvJRA3SAqDAhtMhBkEjuJ48+y4IxzFK7BzWTdTTuwUy6fv0Um7Bzy6uh515D9r0pJZ33EJunUGufM1llTjre+9rDo+orXWy52TwlqbiNaZm58tJ5DmrLeoOS5l8DoFcCA4Y076KPD91O/TXQMwU2a9pRwrBu+p1i+q9mjbHhJJVXZNJjvKQPMo6tiuS7z9+1LD2+0I4t6SC3rkEyXOpx49nV/yBzo6+K902MzqPEy97SfQKmeNLkO5rTuIMK7Z8auu4mFP8Doo+cyPRC/tl2UqJPDAK11tXrDj+C5TGsDpGsVNvM6MkoeXoMQdjQzr/YdFT/uASjA1/Wf2UINl/Rlfu0tZtI4UY2csHlJjy21fDyx9X4w+arBQVFh5YyC+fBv4TtIikJ6jluKAbfR=jZax-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 3 Nov 2020 10:32
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)(address . 44254@debbugs.gnu.org)
87tuu6g50f.fsf@gnu.org
Hi,
Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:
Toggle quote (18 lines)>> I guess it’s easy to end up creating huge object graphs. Here’s an>> example of an anti-pattern:>> >> (define a>> ((package-input-rewriting x) ((package-input-rewriting y) p1))) >> >> (define b>> ((package-input-rewriting x) ((package-input-rewriting y) p2)))>> >> The correct use is:>> >> (define transform>> (package-input-rewriting (append x y)))>> >> (define a (transform p1))>> (define b (transform p2))> that sounds like a section for the cookbook :)
Note that there’s a new section in the manual on this topic:
https://guix.gnu.org/manual/devel/en/html_node/Defining-Package-Variants.html
Toggle quote (7 lines)>> It seems to me that you’re redefining a dependency graph, node by node.>> Thus, you probably don’t need ‘package-input-rewriting’ here. What you>> did in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0>> looks more appropriate to me, in terms of style and semantics.> Okay, got it. My initial concern was that rewriting the graph “by hand” (i.e.> alist-delete) would be tedious and error-prone.
I haven’t looked closely enough. If you can define a single procedurethat rewrites the graph, that’s of course better than rewriting nodesone by one. Maybe that’s possible, but you need to be careful aboutfactorizing the transformation procedure as I shown above.
Thanks,Ludo’.
L
L
Ludovic Courtès wrote on 16 Nov 2020 16:07
control message for bug #44254
(address . control@debbugs.gnu.org)
87y2j1uyqw.fsf@gnu.org
tags 44254 notabugclose 44254quit
?