[PATCH 0/3] Make some deduplicating speedups.

  • Open
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Christopher Baines
Owner
unassigned
Submitted by
Christopher Baines
Severity
normal
C
C
Christopher Baines wrote on 5 Jan 21:50 +0100
(address . guix-patches@gnu.org)
87h6jrqzc7.fsf@cbaines.net
Christopher Baines (3):
guix: utils: Add delete-duplicates/sort.
guix: derivations: Use delete-duplicates/sort.
guix: packages: Speed up deduplicating inputs.

guix/derivations.scm | 10 ++++++----
guix/packages.scm | 23 +++++++++++++++++++----
guix/utils.scm | 28 ++++++++++++++++++++++++++++
3 files changed, 53 insertions(+), 8 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmWYa5hfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9Xd0ww//a5t0bHlFnCTiDMFqHSdw22JQ2UGs+HHb
Fu3n2KFmFBkmkcnRehpnm1AM3s5EvKPL54CIIgIa79feW3pvNPKov+tCLE8xszzg
lTjlEr8E3M8zr1kyfAgPwASZ1+DHmyIlfLeAViwnhLShx1Oj3tR5vIuLYdvrt4JA
5IRCSTnua7bleSm3zXOXaHFyblcau1EN64nD0z2lhBQXdzkyYfcr14+Qv/m7YlXH
NdaOTnpI51BkUTPEw1msH29/S2X2qGoju7hMEo6a5RCg/ZerCGR4kuKdyMqrueXP
XjhA4qTZZP/Bk4qx0SDq7vmGLG02bc03h4RUn0FUhK/6/bLbwU4VBJ3bppYPlSK5
kr8j0UmxgUUGBh6xcJ17ShkTc1F1+uIcjhpQKApQdH5It0IjTydMH+fabrW9ggOc
K3y/LWYRRp/ZO0vz9oMqq8YJ2ORYKWKEG0lLUCGkvGFzzDgsYdeWKGZB163+aujC
6IEwg5fMH950fW09LTaV+MZPDQgYGKgkRW8QfiLcknqJ4Ap46y2thusZE1euJtIk
zX0/FaBWQ9dGljdpG64RxU9e7KxCvBZs4Nr0C3ZUvqMTFPzMwLMdNte8ilF/5LD2
5TxTSsK3dHJsQFxz80p9I4IlreMw0ihg43ydN5cspKSMiyroF17xduBWsSbLLk61
VJ90VerJmbA=
=iBqK
-----END PGP SIGNATURE-----

C
C
Christopher Baines wrote on 5 Jan 21:53 +0100
[PATCH 1/3] guix: utils: Add delete-duplicates/sort.
(address . 68271@debbugs.gnu.org)
ce0156e7528f6a6a54f0469774b41604e7872a22.1704488002.git.mail@cbaines.net
Similar to delete-duplicates, but sorts the list first and then compares the
pairs in the list. This is faster than delete-duplicates for larger lists.

* guix/utils.scm (delete-duplicates): New procedure.

Change-Id: Ibc2897cdb7c76be0d0e099bc47fee005a88bea2e
---
guix/utils.scm | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

Toggle diff (50 lines)
diff --git a/guix/utils.scm b/guix/utils.scm
index e4e9d922e7..c1c967ee6c 100644
--- a/guix/utils.scm
+++ b/guix/utils.scm
@@ -146,6 +146,8 @@ (define-module (guix utils)
edit-expression
delete-expression
+ delete-duplicates/sort
+
filtered-port
decompressed-port
call-with-decompressed-port
@@ -502,6 +504,32 @@ (define (delete-expression source-properties)
"Delete the expression specified by SOURCE-PROPERTIES."
(edit-expression source-properties (const "") #:include-trailing-newline? #t))
+
+;;;
+;;; Lists.
+;;;
+
+(define* (delete-duplicates/sort unsorted-lst less #:optional (equal? eq?))
+ (if (null? unsorted-lst)
+ unsorted-lst
+ (let ((sorted-lst (sort unsorted-lst
+ ;; Sort in the reverse order
+ (lambda (a b) (eq? #f (less a b))))))
+ (let loop ((lst (cdr sorted-lst))
+ (last-element (car sorted-lst))
+ (result (list (car sorted-lst))))
+ (if (null? lst)
+ result
+ (let ((current-element (car lst)))
+ (if (equal? current-element last-element)
+ (loop (cdr lst)
+ last-element
+ result)
+ (loop (cdr lst)
+ current-element
+ (cons current-element
+ result)))))))))
+
;;;
;;; Keyword arguments.

base-commit: deeb7d1f53d7ddfa977b3eadd760312bbd0a2509
--
2.41.0
C
C
Christopher Baines wrote on 5 Jan 21:53 +0100
[PATCH 2/3] guix: derivations: Use delete-duplicates/sort.
(address . 68271@debbugs.gnu.org)
cb11dedad1ec9838e33af735b5176337334dc362.1704488002.git.mail@cbaines.net
As this seems to be a small speedup, as tested by computing derivations for
all packages targeting i586-pc-gnu.

* guix/derivations.scm (derivation/masked-inputs): Use delete-duplicates/sort.

Change-Id: I9ec963c10e67a525037c346f44c92a87376935c5
---
guix/derivations.scm | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

Toggle diff (23 lines)
diff --git a/guix/derivations.scm b/guix/derivations.scm
index 9fec7f4f0b..29c7ef9a5c 100644
--- a/guix/derivations.scm
+++ b/guix/derivations.scm
@@ -745,10 +745,12 @@ (define (derivation/masked-inputs drv)
(make-derivation-input hash sub-drvs))))
inputs)))
(make-derivation outputs
- (sort (delete-duplicates inputs)
- (lambda (drv1 drv2)
- (string<? (derivation-input-derivation drv1)
- (derivation-input-derivation drv2))))
+ (delete-duplicates/sort
+ inputs
+ (lambda (drv1 drv2)
+ (string<? (derivation-input-derivation drv1)
+ (derivation-input-derivation drv2)))
+ eq?)
sources
system builder args env-vars
#f)))))
--
2.41.0
C
C
Christopher Baines wrote on 5 Jan 21:53 +0100
[PATCH 3/3] guix: packages: Speed up deduplicating inputs.
(address . 68271@debbugs.gnu.org)
7bb6eeca77516fdd01e9b0b98eb9e21ac87c7509.1704488002.git.mail@cbaines.net
Use delete-duplicates/sort rather than delete-duplicates, as this seems to
perform a little better, at least when testing by computing derivations
targeting i586-pc-gnu for all packages.

* guix/packages.scm (input<?, deduplicate-inputs): New procedures.
(bag-derivation, bag->cross-derivation): Use deduplicate-inputs.

Change-Id: Ic47b50aa52f11d701e5aefa2a095219e3a98cfd1
---
guix/packages.scm | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)

Toggle diff (50 lines)
diff --git a/guix/packages.scm b/guix/packages.scm
index 930b1a3b0e..09dc88e2af 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -1889,6 +1889,21 @@ (define (input=? input1 input2)
(derivation=? obj1 obj2))
(equal? obj1 obj2))))))))
+(define (input<? input1 input2)
+ (let ((label1 (first input1))
+ (label2 (first input2)))
+ (if (string=? label1 label2)
+ (let ((obj1 (second input1))
+ (obj2 (second input2)))
+ (if (and (derivation? obj1) (derivation? obj2))
+ (string<? (derivation-file-name obj1)
+ (derivation-file-name obj2))
+ #f))
+ (string<? label1 label2))))
+
+(define-inlinable (deduplicate-inputs inputs)
+ (delete-duplicates/sort inputs input<? input=?))
+
(define* (bag->derivation bag #:optional context)
"Return the derivation to build BAG for SYSTEM. Optionally, CONTEXT can be
a package object describing the context in which the call occurs, for improved
@@ -1911,7 +1926,7 @@ (define* (bag->derivation bag #:optional context)
;; that lead to the same derivation. Delete those duplicates to avoid
;; issues down the road, such as duplicate entries in '%build-inputs'.
(apply (bag-build bag) (bag-name bag)
- (delete-duplicates input-drvs input=?)
+ (deduplicate-inputs input-drvs)
#:search-paths paths
#:outputs (bag-outputs bag) #:system system
(bag-arguments bag)))))
@@ -1951,9 +1966,9 @@ (define* (bag->cross-derivation bag #:optional context)
all))))
(apply (bag-build bag) (bag-name bag)
- #:build-inputs (delete-duplicates build-drvs input=?)
- #:host-inputs (delete-duplicates host-drvs input=?)
- #:target-inputs (delete-duplicates target-drvs input=?)
+ #:build-inputs (deduplicate-inputs build-drvs)
+ #:host-inputs (deduplicate-inputs host-drvs)
+ #:target-inputs (deduplicate-inputs target-drvs)
#:search-paths paths
#:native-search-paths npaths
#:outputs (bag-outputs bag)
--
2.41.0
L
L
Ludovic Courtès wrote on 12 Jan 14:53 +0100
Re: [bug#68271] [PATCH 1/3] guix: utils: Add delete-duplicates/sort.
(name . Christopher Baines)(address . mail@cbaines.net)
87h6jik68o.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:

Toggle quote (3 lines)
> Similar to delete-duplicates, but sorts the list first and then compares the
> pairs in the list. This is faster than delete-duplicates for larger lists.

We’re dealing with small lists here though (derivation inputs). Did you
try to time the benefit?

There’s a complexity/benefit tradeoff and I wonder if it cuts it.

Toggle quote (4 lines)
> +(define* (delete-duplicates/sort unsorted-lst less #:optional (equal? eq?))
> + (if (null? unsorted-lst)
> + unsorted-lst

This ‘if’ is unnecessary.

Toggle quote (4 lines)
> + (let ((sorted-lst (sort unsorted-lst
> + ;; Sort in the reverse order
> + (lambda (a b) (eq? #f (less a b))))))

Just: (negate less).

Toggle quote (7 lines)
> + (let loop ((lst (cdr sorted-lst))
> + (last-element (car sorted-lst))
> + (result (list (car sorted-lst))))
> + (if (null? lst)
> + result
> + (let ((current-element (car lst)))

Please follow the convention of using ‘match’ rather than car caddr
caddddar (info "(guix) Data Types and Pattern Matching"):

(match lst
(()
result)
((head . tail)
…))

Ludo’.
?