Performance of package input rewriting

  • Done
  • quality assurance status badge
Details
4 participants
  • Lars-Dominik Braun
  • Ludovic Courtès
  • Ricardo Wurmus
  • zimoun
Owner
unassigned
Submitted by
Lars-Dominik Braun
Severity
normal
L
L
Lars-Dominik Braun wrote on 27 Oct 2020 14:26
(address . bug-guix@gnu.org)
20201027132614.GB3081@zpidnp36
Hi,

this issue is similar to https://issues.guix.gnu.org/41702,but I’m not sure
it’s exactly the same. For guix-science I’m trying to provide some packages
like python-jupyterlab, which depend on a mix of packages from guix proper and
newer versions of packages already included in guix proper. Thus I need to
rewrite inputs of the former to the latter. (Because Python only propagates
dependencies and thus collisions would occur.)

Previously I have been doing this using package-input-rewriting, but starting
an environment containing python-jupyterlab alone took about 20s (warm caches,
all derivations in the store). Manually rewriting inputs by inheriting and
alist-delete’ing brings this down to 3s, which is pretty significant.
--no-grafts has not much of an impact (15s vs 2s) here. See
for the exact changes.

My expectation would be that package-input-rewriting is the preferred, because
easier, solution to this problem and thus should have minimal impact on
performance.

Cheers,
Lars
-----BEGIN PGP SIGNATURE-----

iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl+YH/IACgkQQhN3ARo3
hEYtWgv/ZeeJRbS39QGqBHhDiCkMrRAP2BxrkQeiIms/Vtp2JtOWEelrmdWj6ZNe
+46a4EnqRX/kv6z+ZKZPHz5KBkEPvwnNoRQbdFOlKSFPCe5PrvNHbyhpscS+TMj/
u/hGS8NKt1di24SfnyqJHeWYeK9QTQnxNwycsCnQx1SUB5BrWQx49jEVS1DmmwkY
zcSnB0ztlhOu+Fa+D6CrkppC2BILPdQZc3sDYQ0Cj74VCp3FwSNJJ1spLeKXIMCr
cLbzdL8mAMhr5b43+ePz0pBJ9FUz+ly5nV9Ml+d1oo+wpA6dn0REqLn+7Hu4uJJF
vzkCIVJ8UiG8/7HCKPiX5YZd7ka5wgyGVbE9nIAFGJ58oB22dRaAcVWho0TzTITK
oJOyQ1xuCN+RbjkpWSq89NqeNC5ajoa90p8yaW9ANRn4Z6jDMH26cEbCl/mNIolH
KdF1Fsbcvyk55GYmPEGEMdzgTmSDOg31quMENSP2aYrMUntSu4LI0yQbnlhPnbzj
z3oMKxDF
=X55m
-----END PGP SIGNATURE-----


Z
Z
zimoun wrote on 27 Oct 2020 15:14
86361z4uyp.fsf@gmail.com
Hi Lars,

On Tue, 27 Oct 2020 at 14:26, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:

Toggle quote (8 lines)
> Previously I have been doing this using package-input-rewriting, but starting
> an environment containing python-jupyterlab alone took about 20s (warm caches,
> all derivations in the store). Manually rewriting inputs by inheriting and
> alist-delete’ing brings this down to 3s, which is pretty significant.
> --no-grafts has not much of an impact (15s vs 2s) here. See
> https://github.com/guix-science/guix-science/commit/972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
> for the exact changes.

Is it not related to “#:deep? #t“ by default? The default was #f.

Well, using ’inherit’ only rewrites the direct explicit dependencies.
However, ’package-input-rewriting’ traverse all the graph of
dependencies and replaces accordingly. Maybe I misunderstand.


All the best,
simon
R
R
Ricardo Wurmus wrote on 27 Oct 2020 20:58
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)
87imav1lwp.fsf@elephly.net
Lars-Dominik Braun <ldb@leibniz-psychology.org> writes:

Toggle quote (12 lines)
> this issue is similar to https://issues.guix.gnu.org/41702,but I’m not sure
> it’s exactly the same. For guix-science I’m trying to provide some packages
> like python-jupyterlab, which depend on a mix of packages from guix proper and
> newer versions of packages already included in guix proper. Thus I need to
> rewrite inputs of the former to the latter. (Because Python only propagates
> dependencies and thus collisions would occur.)
>
> Previously I have been doing this using package-input-rewriting, but starting
> an environment containing python-jupyterlab alone took about 20s (warm caches,
> all derivations in the store). Manually rewriting inputs by inheriting and
> alist-delete’ing brings this down to 3s, which is pretty significant.

Could you show us a concrete example? Input rewriting is recursive and
will traverse the whole package graph by default, even if you *know*
that, say, GCC doesn’t need to be rewritten.

For the more generic “package-mapping” you can provide a “cut?”
procedure to determine when to stop recursion. Perhaps this would make
things faster in your case?

--
Ricardo
L
L
Ludovic Courtès wrote on 28 Oct 2020 15:19
(name . zimoun)(address . zimon.toutoune@gmail.com)
87o8km301f.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

Toggle quote (12 lines)
> On Tue, 27 Oct 2020 at 14:26, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:
>
>> Previously I have been doing this using package-input-rewriting, but starting
>> an environment containing python-jupyterlab alone took about 20s (warm caches,
>> all derivations in the store). Manually rewriting inputs by inheriting and
>> alist-delete’ing brings this down to 3s, which is pretty significant.
>> --no-grafts has not much of an impact (15s vs 2s) here. See
>> https://github.com/guix-science/guix-science/commit/972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
>> for the exact changes.
>
> Is it not related to “#:deep? #t“ by default? The default was #f.

Yes, that’s a possible culprit. Try passing #:deep? #f if it works for
your use case.

Another thing to look at is the <package> object graph (as show by ‘guix
graph’). Input rewriting can duplicate parts of the graph, which in
turn defeats package->derivation memoization. Just looking at the
number of nodes in the graph can give hints.

Like Ricardo wrote, it’d be great it you could share a short reproducer.

Thanks,
Ludo’.
L
L
Lars-Dominik Braun wrote on 30 Oct 2020 09:42
20201030084245.GB3128@zpidnp36
Hi,

Toggle quote (2 lines)
> Yes, that’s a possible culprit. Try passing #:deep? #f if it works for
> your use case.
Yeah, that brings it down to ~8s, which is still alot.

Toggle quote (4 lines)
> Another thing to look at is the <package> object graph (as show by ‘guix
> graph’). Input rewriting can duplicate parts of the graph, which in
> turn defeats package->derivation memoization. Just looking at the
> number of nodes in the graph can give hints.
Aha, it’s 913 nodes without rewriting, 13916 with rewriting (#:deep? #t) and
4286 with rewriting (#:deep? #f) as determined by a rather ad-hoc `guix graph
-L . -t package python-jupyterlab | grep 'shape = box' | wc -l`. That seems way
too much. Does that mean I’m using package rewriting in the wrong way or is
that a bug?

Unfortunately I don’t have a short reproducer right now. I’ll look at the graph
more closely to figure out which parts are actually duplicated. Maybe I can
create a reproducing testcase with more information.

Cheers,
Lars
-----BEGIN PGP SIGNATURE-----

iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl+b0gEACgkQQhN3ARo3
hEZKNgv/XsWkY+fy4zRIPx/SK5XuAOjLzvKI9w4ahuvN8ewSGzpuuhRTyzU6TIN6
AWgTJJjSkHkeFY8iDgHXwgJ9KIOBtNOz7Pwvg6yUG6+nEIVjxAmmL6T2q43FPHxz
gLou6pWJjsNOx10gNsKduWDcwUdeYW9KDLN2HdY3RQHnzf93cK13YuF6DiCp63Hb
sq454Q7MFHo7EMrGaNx6QwuSG4tqHW8wjs9er2ioECJ4DT9XJseUXEvNKjcfuUkW
TpJWuMraO1Rzb4JZD5GkqzKzX32Dgf355b8MxR//XUwfgxI4hD57CS+xedrqCUFN
XhJoQgHds3cAqO54Mv2b9004kDlqHc4tb/PFnsjVEjtBwG19ARTOwA+v+avHlGop
hYT8tnlrl9B+mufMqILVdq9eU/KvOHmv6wtxQycb/yX30gxecc7Bqqk0MaZ//B23
hSSnReGdqmNtY/9hKnSva90iwLvZXoq9nBJo/BuBhcWcSEr2MrTwu7GIO3NAhnuH
dYdKQhzq
=Of5M
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 31 Oct 2020 11:27
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)
87tuuavgg5.fsf@gnu.org
Hi Lars,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

Toggle quote (10 lines)
>> Another thing to look at is the <package> object graph (as show by ‘guix
>> graph’). Input rewriting can duplicate parts of the graph, which in
>> turn defeats package->derivation memoization. Just looking at the
>> number of nodes in the graph can give hints.
> Aha, it’s 913 nodes without rewriting, 13916 with rewriting (#:deep? #t) and
> 4286 with rewriting (#:deep? #f) as determined by a rather ad-hoc `guix graph
> -L . -t package python-jupyterlab | grep 'shape = box' | wc -l`. That seems way
> too much. Does that mean I’m using package rewriting in the wrong way or is
> that a bug?

It could be a mixture thereof. :-)

I guess it’s easy to end up creating huge object graphs. Here’s an
example of an anti-pattern:

(define a
((package-input-rewriting x) ((package-input-rewriting y) p1)))

(define b
((package-input-rewriting x) ((package-input-rewriting y) p2)))

The correct use is:

(define transform
(package-input-rewriting (append x y)))

(define a (transform p1))
(define b (transform p2))

That guarantees that ‘a’ and ‘b’ share most of the nodes of their object
graph.

From a quick look, the code in Guix-Science seemed to be following the
pattern above.

For example, there’s:

Toggle snippet (29 lines)
(define python-ipykernel-5.3-bootstrap
(let ((rewritten ((package-input-rewriting
`((,python-jupyter-client . ,python-jupyter-client-6.1-bootstrap)
;; Indirect through IPython.
(,python-testpath . ,python-testpath-0.4)
(,python-nbformat . ,python-nbformat-5.0)))
python-ipykernel-5.3-proper)))
(package
(inherit rewritten)
(name "python-ipykernel-bootstrap"))))

(define-public python-jupyter-client-6.1
((package-input-rewriting
`((,python-ipykernel . ,python-ipykernel-5.3-bootstrap)
(,python-jupyter-core . ,python-jupyter-core-4.6)
;; Indirect through IPython.
(,python-testpath . ,python-testpath-0.4)
(,python-nbformat . ,python-nbformat-5.0)))
python-jupyter-client-6.1-proper))

(define-public python-ipykernel-5.3
((package-input-rewriting
`((,python-jupyter-client . ,python-jupyter-client-6.1)
;; Indirect through IPython.
(,python-testpath . ,python-testpath-0.4)
(,python-nbformat . ,python-nbformat-5.0)))
python-ipykernel-5.3-proper))

It seems to me that you’re redefining a dependency graph, node by node.
Thus, you probably don’t need ‘package-input-rewriting’ here. What you
did in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
looks more appropriate to me, in terms of style and semantics.

Does that make sense?

Thanks,
Ludo’.
L
L
Lars-Dominik Braun wrote on 3 Nov 2020 09:23
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44254@debbugs.gnu.org)
20201103082343.GC3127@zpidnp36
Hi Ludo,

Toggle quote (16 lines)
> I guess it’s easy to end up creating huge object graphs. Here’s an
> example of an anti-pattern:
>
> (define a
> ((package-input-rewriting x) ((package-input-rewriting y) p1)))
>
> (define b
> ((package-input-rewriting x) ((package-input-rewriting y) p2)))
>
> The correct use is:
>
> (define transform
> (package-input-rewriting (append x y)))
>
> (define a (transform p1))
> (define b (transform p2))
that sounds like a section for the cookbook :)

Toggle quote (4 lines)
> It seems to me that you’re redefining a dependency graph, node by node.
> Thus, you probably don’t need ‘package-input-rewriting’ here. What you
> did in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
> looks more appropriate to me, in terms of style and semantics.
Okay, got it. My initial concern was that rewriting the graph “by hand” (i.e.
alist-delete) would be tedious and error-prone.

Thank you very much,
Lars

--
Lars-Dominik Braun
Wissenschaftlicher Mitarbeiter/Research Associate
www.leibniz-psychology.org
ZPID - Leibniz-Institut für Psychologie /
ZPID - Leibniz Institute for Psychology
Universitätsring 15
D-54296 Trier - Germany
Tel.: +49–651–201-4964
-----BEGIN PGP SIGNATURE-----

iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl+hE4sACgkQQhN3ARo3
hEZxzgwA0HnQCSUKVxjo9Q0EPYBRZR5OBV1AADcFQ+aLy48K2F/5oS6+Dm5fsK3o
Wpog7EbunDAub8ry3oAfqnAMDfmLL0mLFiWpIZpLp5UEMnRpnc9PxifWvJRA3SAq
DAhtMhBkEjuJ48+y4IxzFK7BzWTdTTuwUy6fv0Um7Bzy6uh515D9r0pJZ33EJunU
GufM1llTjre+9rDo+orXWy52TwlqbiNaZm58tJ5DmrLeoOS5l8DoFcCA4Y076KPD
91O/TXQMwU2a9pRwrBu+p1i+q9mjbHhJJVXZNJjvKQPMo6tiuS7z9+1LD2+0I4t6
SC3rkEyXOpx49nV/yBzo6+K902MzqPEy97SfQKmeNLkO5rTuIMK7Z8auu4mFP8Do
o+cyPRC/tl2UqJPDAK11tXrDj+C5TGsDpGsVNvM6MkoeXoMQdjQzr/YdFT/uASjA
1/Wf2UINl/Rlfu0tZtI4UY2csHlJjy21fDyx9X4w+arBQVFh5YyC+fBv4TtIikJ6
jluKAbfR
=jZax
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 3 Nov 2020 10:32
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)(address . 44254@debbugs.gnu.org)
87tuu6g50f.fsf@gnu.org
Hi,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

Toggle quote (18 lines)
>> I guess it’s easy to end up creating huge object graphs. Here’s an
>> example of an anti-pattern:
>>
>> (define a
>> ((package-input-rewriting x) ((package-input-rewriting y) p1)))
>>
>> (define b
>> ((package-input-rewriting x) ((package-input-rewriting y) p2)))
>>
>> The correct use is:
>>
>> (define transform
>> (package-input-rewriting (append x y)))
>>
>> (define a (transform p1))
>> (define b (transform p2))
> that sounds like a section for the cookbook :)

Note that there’s a new section in the manual on this topic:


Toggle quote (7 lines)
>> It seems to me that you’re redefining a dependency graph, node by node.
>> Thus, you probably don’t need ‘package-input-rewriting’ here. What you
>> did in Guix-Science commit 972795a23cc9eb5a0bb1a2ffb5681d151fc4d4b0
>> looks more appropriate to me, in terms of style and semantics.
> Okay, got it. My initial concern was that rewriting the graph “by hand” (i.e.
> alist-delete) would be tedious and error-prone.

I haven’t looked closely enough. If you can define a single procedure
that rewrites the graph, that’s of course better than rewriting nodes
one by one. Maybe that’s possible, but you need to be careful about
factorizing the transformation procedure as I shown above.

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 16 Nov 2020 16:07
control message for bug #44254
(address . control@debbugs.gnu.org)
87y2j1uyqw.fsf@gnu.org
tags 44254 notabug
close 44254
quit
?