[PATCH 0/3] Make some deduplicating speedups.

  • Open
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Christopher Baines
Owner
unassigned
Submitted by
Christopher Baines
Severity
normal
C
C
Christopher Baines wrote on 5 Jan 21:50 +0100
(address . guix-patches@gnu.org)
87h6jrqzc7.fsf@cbaines.net
Christopher Baines (3):
guix: utils: Add delete-duplicates/sort.
guix: derivations: Use delete-duplicates/sort.
guix: packages: Speed up deduplicating inputs.

guix/derivations.scm | 10 ++++++----
guix/packages.scm | 23 +++++++++++++++++++----
guix/utils.scm | 28 ++++++++++++++++++++++++++++
3 files changed, 53 insertions(+), 8 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmWYa5hfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9Xd0ww//a5t0bHlFnCTiDMFqHSdw22JQ2UGs+HHb
Fu3n2KFmFBkmkcnRehpnm1AM3s5EvKPL54CIIgIa79feW3pvNPKov+tCLE8xszzg
lTjlEr8E3M8zr1kyfAgPwASZ1+DHmyIlfLeAViwnhLShx1Oj3tR5vIuLYdvrt4JA
5IRCSTnua7bleSm3zXOXaHFyblcau1EN64nD0z2lhBQXdzkyYfcr14+Qv/m7YlXH
NdaOTnpI51BkUTPEw1msH29/S2X2qGoju7hMEo6a5RCg/ZerCGR4kuKdyMqrueXP
XjhA4qTZZP/Bk4qx0SDq7vmGLG02bc03h4RUn0FUhK/6/bLbwU4VBJ3bppYPlSK5
kr8j0UmxgUUGBh6xcJ17ShkTc1F1+uIcjhpQKApQdH5It0IjTydMH+fabrW9ggOc
K3y/LWYRRp/ZO0vz9oMqq8YJ2ORYKWKEG0lLUCGkvGFzzDgsYdeWKGZB163+aujC
6IEwg5fMH950fW09LTaV+MZPDQgYGKgkRW8QfiLcknqJ4Ap46y2thusZE1euJtIk
zX0/FaBWQ9dGljdpG64RxU9e7KxCvBZs4Nr0C3ZUvqMTFPzMwLMdNte8ilF/5LD2
5TxTSsK3dHJsQFxz80p9I4IlreMw0ihg43ydN5cspKSMiyroF17xduBWsSbLLk61
VJ90VerJmbA=
=iBqK
-----END PGP SIGNATURE-----

C
C
Christopher Baines wrote on 5 Jan 21:53 +0100
[PATCH 1/3] guix: utils: Add delete-duplicates/sort.
(address . 68271@debbugs.gnu.org)
ce0156e7528f6a6a54f0469774b41604e7872a22.1704488002.git.mail@cbaines.net
Similar to delete-duplicates, but sorts the list first and then compares the
pairs in the list. This is faster than delete-duplicates for larger lists.

* guix/utils.scm (delete-duplicates): New procedure.

Change-Id: Ibc2897cdb7c76be0d0e099bc47fee005a88bea2e
---
guix/utils.scm | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

Toggle diff (50 lines)
diff --git a/guix/utils.scm b/guix/utils.scm
index e4e9d922e7..c1c967ee6c 100644
--- a/guix/utils.scm
+++ b/guix/utils.scm
@@ -146,6 +146,8 @@ (define-module (guix utils)
edit-expression
delete-expression
+ delete-duplicates/sort
+
filtered-port
decompressed-port
call-with-decompressed-port
@@ -502,6 +504,32 @@ (define (delete-expression source-properties)
"Delete the expression specified by SOURCE-PROPERTIES."
(edit-expression source-properties (const "") #:include-trailing-newline? #t))
+
+;;;
+;;; Lists.
+;;;
+
+(define* (delete-duplicates/sort unsorted-lst less #:optional (equal? eq?))
+ (if (null? unsorted-lst)
+ unsorted-lst
+ (let ((sorted-lst (sort unsorted-lst
+ ;; Sort in the reverse order
+ (lambda (a b) (eq? #f (less a b))))))
+ (let loop ((lst (cdr sorted-lst))
+ (last-element (car sorted-lst))
+ (result (list (car sorted-lst))))
+ (if (null? lst)
+ result
+ (let ((current-element (car lst)))
+ (if (equal? current-element last-element)
+ (loop (cdr lst)
+ last-element
+ result)
+ (loop (cdr lst)
+ current-element
+ (cons current-element
+ result)))))))))
+
;;;
;;; Keyword arguments.

base-commit: deeb7d1f53d7ddfa977b3eadd760312bbd0a2509
--
2.41.0
C
C
Christopher Baines wrote on 5 Jan 21:53 +0100
[PATCH 2/3] guix: derivations: Use delete-duplicates/sort.
(address . 68271@debbugs.gnu.org)
cb11dedad1ec9838e33af735b5176337334dc362.1704488002.git.mail@cbaines.net
As this seems to be a small speedup, as tested by computing derivations for
all packages targeting i586-pc-gnu.

* guix/derivations.scm (derivation/masked-inputs): Use delete-duplicates/sort.

Change-Id: I9ec963c10e67a525037c346f44c92a87376935c5
---
guix/derivations.scm | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

Toggle diff (23 lines)
diff --git a/guix/derivations.scm b/guix/derivations.scm
index 9fec7f4f0b..29c7ef9a5c 100644
--- a/guix/derivations.scm
+++ b/guix/derivations.scm
@@ -745,10 +745,12 @@ (define (derivation/masked-inputs drv)
(make-derivation-input hash sub-drvs))))
inputs)))
(make-derivation outputs
- (sort (delete-duplicates inputs)
- (lambda (drv1 drv2)
- (string<? (derivation-input-derivation drv1)
- (derivation-input-derivation drv2))))
+ (delete-duplicates/sort
+ inputs
+ (lambda (drv1 drv2)
+ (string<? (derivation-input-derivation drv1)
+ (derivation-input-derivation drv2)))
+ eq?)
sources
system builder args env-vars
#f)))))
--
2.41.0
C
C
Christopher Baines wrote on 5 Jan 21:53 +0100
[PATCH 3/3] guix: packages: Speed up deduplicating inputs.
(address . 68271@debbugs.gnu.org)
7bb6eeca77516fdd01e9b0b98eb9e21ac87c7509.1704488002.git.mail@cbaines.net
Use delete-duplicates/sort rather than delete-duplicates, as this seems to
perform a little better, at least when testing by computing derivations
targeting i586-pc-gnu for all packages.

* guix/packages.scm (input<?, deduplicate-inputs): New procedures.
(bag-derivation, bag->cross-derivation): Use deduplicate-inputs.

Change-Id: Ic47b50aa52f11d701e5aefa2a095219e3a98cfd1
---
guix/packages.scm | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)

Toggle diff (50 lines)
diff --git a/guix/packages.scm b/guix/packages.scm
index 930b1a3b0e..09dc88e2af 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -1889,6 +1889,21 @@ (define (input=? input1 input2)
(derivation=? obj1 obj2))
(equal? obj1 obj2))))))))
+(define (input<? input1 input2)
+ (let ((label1 (first input1))
+ (label2 (first input2)))
+ (if (string=? label1 label2)
+ (let ((obj1 (second input1))
+ (obj2 (second input2)))
+ (if (and (derivation? obj1) (derivation? obj2))
+ (string<? (derivation-file-name obj1)
+ (derivation-file-name obj2))
+ #f))
+ (string<? label1 label2))))
+
+(define-inlinable (deduplicate-inputs inputs)
+ (delete-duplicates/sort inputs input<? input=?))
+
(define* (bag->derivation bag #:optional context)
"Return the derivation to build BAG for SYSTEM. Optionally, CONTEXT can be
a package object describing the context in which the call occurs, for improved
@@ -1911,7 +1926,7 @@ (define* (bag->derivation bag #:optional context)
;; that lead to the same derivation. Delete those duplicates to avoid
;; issues down the road, such as duplicate entries in '%build-inputs'.
(apply (bag-build bag) (bag-name bag)
- (delete-duplicates input-drvs input=?)
+ (deduplicate-inputs input-drvs)
#:search-paths paths
#:outputs (bag-outputs bag) #:system system
(bag-arguments bag)))))
@@ -1951,9 +1966,9 @@ (define* (bag->cross-derivation bag #:optional context)
all))))
(apply (bag-build bag) (bag-name bag)
- #:build-inputs (delete-duplicates build-drvs input=?)
- #:host-inputs (delete-duplicates host-drvs input=?)
- #:target-inputs (delete-duplicates target-drvs input=?)
+ #:build-inputs (deduplicate-inputs build-drvs)
+ #:host-inputs (deduplicate-inputs host-drvs)
+ #:target-inputs (deduplicate-inputs target-drvs)
#:search-paths paths
#:native-search-paths npaths
#:outputs (bag-outputs bag)
--
2.41.0
L
L
Ludovic Courtès wrote on 12 Jan 14:53 +0100
Re: [bug#68271] [PATCH 1/3] guix: utils: Add delete-duplicates/sort.
(name . Christopher Baines)(address . mail@cbaines.net)
87h6jik68o.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:

Toggle quote (3 lines)
> Similar to delete-duplicates, but sorts the list first and then compares the
> pairs in the list. This is faster than delete-duplicates for larger lists.

We’re dealing with small lists here though (derivation inputs). Did you
try to time the benefit?

There’s a complexity/benefit tradeoff and I wonder if it cuts it.

Toggle quote (4 lines)
> +(define* (delete-duplicates/sort unsorted-lst less #:optional (equal? eq?))
> + (if (null? unsorted-lst)
> + unsorted-lst

This ‘if’ is unnecessary.

Toggle quote (4 lines)
> + (let ((sorted-lst (sort unsorted-lst
> + ;; Sort in the reverse order
> + (lambda (a b) (eq? #f (less a b))))))

Just: (negate less).

Toggle quote (7 lines)
> + (let loop ((lst (cdr sorted-lst))
> + (last-element (car sorted-lst))
> + (result (list (car sorted-lst))))
> + (if (null? lst)
> + result
> + (let ((current-element (car lst)))

Please follow the convention of using ‘match’ rather than car caddr
caddddar (info "(guix) Data Types and Pattern Matching"):

(match lst
(()
result)
((head . tail)
…))

Ludo’.
?
Your comment

Commenting via the web interface is currently disabled.

To comment on this conversation send an email to 68271@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 68271
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch