excessively large manifests due to propagation

OpenSubmitted by Ricardo Wurmus.
Details
3 participants
  • Ludovic Courtès
  • Maxime Devos
  • Ricardo Wurmus
Owner
unassigned
Severity
important
R
R
Ricardo Wurmus wrote on 18 May 15:53 +0200
(address . bug-guix@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
87sfp7kkim.fsf@elephly.net
Packages of some languages rely heavily on propagation. R is one
example. Since the generated “manifest” file of a Guix profile records
entries for all propagated packages, this can get really big really
quickly.

A profile consisting only of four R packages (r-seurat, r-cistopic,
r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
weighs 7.1MB. At the MDC I repeatedly encountered manifest files that
are exceeding 24MB.

Simply reading that big a file with

(call-with-input-file "huge-manifest" read)

takes several seconds. On the MDC cluster I observed a delay of about
27 seconds.

Disabling read positions with (read-disable 'positions) significantly
speeds this up (18s vs 27s), but it’s still very slow.

We may be able to speed things up by supporting definitions or
references in manifest files, so that we don’t need to repeat a sub-tree
when generating the file.

--
Ricardo
L
L
Ludovic Courtès wrote on 23 May 17:56 +0200
control message for bug #55499
(address . control@debbugs.gnu.org)
87o7zo2qg3.fsf@gnu.org
severity 55499 important
quit
L
L
Ludovic Courtès wrote on 24 May 17:30 +0200
Re: bug#55499: excessively large manifests due to propagation
(name . Ricardo Wurmus)(address . rekado@elephly.net)(address . 55499@debbugs.gnu.org)
87a6b7j6cm.fsf@gnu.org
Hi!

Ricardo Wurmus <rekado@elephly.net> skribis:

Toggle quote (5 lines)
> A profile consisting only of four R packages (r-seurat, r-cistopic,
> r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
> weighs 7.1MB. At the MDC I repeatedly encountered manifest files that
> are exceeding 24MB.

Commit 93f601d97ca2d9b82c41afeb86879ee37eae39e6 provides a 12% size
reduction on this example, and it’s backward-compatible and cheap.

I’ll try and follow up with changes along the lines you describe.

Ludo’.
R
R
Ricardo Wurmus wrote on 25 May 06:35 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 55499@debbugs.gnu.org)
87sfoygrfd.fsf@elephly.net
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (12 lines)
> Hi!
>
> Ricardo Wurmus <rekado@elephly.net> skribis:
>
>> A profile consisting only of four R packages (r-seurat, r-cistopic,
>> r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
>> weighs 7.1MB. At the MDC I repeatedly encountered manifest files that
>> are exceeding 24MB.
>
> Commit 93f601d97ca2d9b82c41afeb86879ee37eae39e6 provides a 12% size
> reduction on this example, and it’s backward-compatible and cheap.

Excellent! This is a great first step.

Toggle quote (2 lines)
> I’ll try and follow up with changes along the lines you describe.

Thank you!

--
Ricardo
L
L
Ludovic Courtès wrote on 31 May 18:09 +0200
[PATCH 0/3] Make 'manifest' files more compact
(address . 55499@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220531160916.21508-1-ludo@gnu.org
Hello,

These patches implement what you suggested on IRC: not repeating
entire manifest entries and their propagated inputs. This has a
dramatic impact on the size of the ‘manifest’ file and on the memory
and processing time to read it for the the use case you gave.

The second patch goes a tiny bit further by making the ‘search-paths’
and ‘propagated-inputs’ fields optional, shaving another ~10% on the
size of ‘manifest’ in this example.

The second patch should be squashed with the first one (so we don’t
bump version formats a second time and duplicate code). It’s kinda
optional because it doesn’t bring much compared to the first patch and
causes a bit of extra complexity, but maybe it’s still worth keeping?

Could you try this on your larger use cases and tell me how it goes?

Thanks,
Ludo’.

Ludovic Courtès (3):
tests: Augment profile collision test.
profiles: Do not repeat entries in 'manifest' file.
squash! profiles: Make all entry fields optional.

guix/build/profiles.scm | 32 ++++++++--
guix/profiles.scm | 137 ++++++++++++++++++++++++++++++++--------
tests/profiles.scm | 52 ++++++++++++++-
3 files changed, 187 insertions(+), 34 deletions(-)


base-commit: fed51b26141548a5bae349a5e1d8d6f681320f4f
--
2.36.1
L
L
Ludovic Courtès wrote on 31 May 18:09 +0200
[PATCH 2/3] profiles: Do not repeat entries in 'manifest' file.
(address . 55499@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220531160916.21508-3-ludo@gnu.org
Reported by Ricardo Wurmus <rekado@elephly.net>.

With this change, the manifest file created for:

guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat

goes from 5.6M to 192K. Likewise, on this profile, wall-clock time of:

GUIX_PROFILING=gc guix package -I

goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.

* guix/profiles.scm (manifest->gexp)[entry->gexp]: Turn into a monadic
procedure. Return a 'repeated' sexp if ENTRY was already visited
before.
Adjust caller accordingly. Bump manifest version.
(sexp->manifest)[sexp->manifest-entry]: Turn into a monadic procedure.
Add case for 'repeated' nodes. Add each entry to the current state
vhash.
Add clause for version 4 manifests.
* tests/profiles.scm ("deduplication of repeated entries"): New test.
* guix/build/profiles.scm (manifest-sexp->inputs+search-paths): Expect
version 4. Add clause for 'repeated' nodes.
---
guix/build/profiles.scm | 4 +-
guix/profiles.scm | 127 ++++++++++++++++++++++++++++------------
tests/profiles.scm | 42 +++++++++++++
3 files changed, 134 insertions(+), 39 deletions(-)

Toggle diff (236 lines)
diff --git a/guix/build/profiles.scm b/guix/build/profiles.scm
index f9875ca92e..c4460f624b 100644
--- a/guix/build/profiles.scm
+++ b/guix/build/profiles.scm
@@ -150,7 +150,7 @@ (define (manifest-sexp->inputs+search-paths manifest)
 values: the list of store items of its manifest entries, and the list of
 search path specifications."
   (match manifest                            ;this must match 'manifest->gexp'
-    (('manifest ('version 3)
+    (('manifest ('version 4)
                 ('packages (entries ...)))
      (let loop ((entries entries)
                 (inputs '())
@@ -162,6 +162,8 @@ (define (manifest-sexp->inputs+search-paths manifest)
           (loop (append rest deps)                ;breadth-first traversal
                 (cons item inputs)
                 (append paths search-paths)))
+         ((('repeated name version item) . rest)
+          (loop rest inputs search-paths))
          (()
           (values (reverse inputs)
                   (delete-duplicates
diff --git a/guix/profiles.scm b/guix/profiles.scm
index bf50c00a1e..44ff37e75b 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -455,31 +455,53 @@ (define (inferior->entry)
 (define (manifest->gexp manifest)
   "Return a representation of MANIFEST as a gexp."
   (define (entry->gexp entry)
-    (match entry
-      (($ <manifest-entry> name version output (? string? path)
-                           (deps ...) (search-paths ...) _ (properties ...))
-       #~(#$name #$version #$output #$path
-                 (propagated-inputs #$(map entry->gexp deps))
-                 (search-paths #$(map search-path-specification->sexp
-                                      search-paths))
-                 #$@(if (null? properties)
-                        #~()
-                        #~((properties . #$properties)))))
-      (($ <manifest-entry> name version output package
-                           (deps ...) (search-paths ...) _ (properties ...))
-       #~(#$name #$version #$output
-                 (ungexp package (or output "out"))
-                 (propagated-inputs #$(map entry->gexp deps))
-                 (search-paths #$(map search-path-specification->sexp
-                                      search-paths))
-                 #$@(if (null? properties)
-                        #~()
-                        #~((properties . #$properties)))))))
+    ;; Maintain in state monad a vhash of visited entries, indexed by their
+    ;; item, usually package objects (we cannot use the entry itself as an
+    ;; index since identical entries are usually not 'eq?').  Use that vhash
+    ;; to avoid repeating duplicate entries.  This is particularly useful in
+    ;; the presence of propagated inputs, where we could otherwise end up
+    ;; repeating large trees.
+    (mlet %state-monad ((visited (current-state)))
+      (if (match (vhash-assq (manifest-entry-item entry) visited)
+            ((_ . previous-entry)
+             (manifest-entry=? previous-entry entry))
+            (#f #f))
+          (return #~(repeated #$(manifest-entry-name entry)
+                              #$(manifest-entry-version entry)
+                              #$(manifest-entry-item entry)))
+          (mbegin %state-monad
+            (set-current-state (vhash-consq (manifest-entry-item entry)
+                                            entry visited))
+            (mlet %state-monad ((deps (mapm %state-monad entry->gexp
+                                            (manifest-entry-dependencies entry))))
+              (return
+               (match entry
+                 (($ <manifest-entry> name version output (? string? path)
+                                      (_deps ...) (search-paths ...) _ (properties ...))
+                  #~(#$name #$version #$output #$path
+                            (propagated-inputs #$deps)
+                            (search-paths #$(map search-path-specification->sexp
+                                                 search-paths))
+                            #$@(if (null? properties)
+                                   #~()
+                                   #~((properties . #$properties)))))
+                 (($ <manifest-entry> name version output package
+                                      (_deps ...) (search-paths ...) _ (properties ...))
+                  #~(#$name #$version #$output
+                            (ungexp package (or output "out"))
+                            (propagated-inputs #$deps)
+                            (search-paths #$(map search-path-specification->sexp
+                                                 search-paths))
+                            #$@(if (null? properties)
+                                   #~()
+                                   #~((properties . #$properties))))))))))))
 
   (match manifest
     (($ <manifest> (entries ...))
-     #~(manifest (version 3)
-                 (packages #$(map entry->gexp entries))))))
+     #~(manifest (version 4)
+                 (packages #$(run-with-state
+                                 (mapm %state-monad entry->gexp entries)
+                               vlist-null))))))
 
 (define (find-package name version)
   "Return a package from the distro matching NAME and possibly VERSION.  This
@@ -522,25 +544,44 @@ (define (infer-dependency item parent)
 
   (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     (match sexp
+      (('repeated name version path)
+       ;; This entry is the same as another one encountered earlier; look it
+       ;; up and return it.
+       (mlet %state-monad ((visited (current-state))
+                           (key -> (list name version path)))
+         (match (vhash-assoc key visited)
+           (#f
+            (raise (formatted-message
+                    (G_ "invalid repeated entry in profile: ~s")
+                    sexp)))
+           ((_ . entry)
+            (return entry)))))
       ((name version output path
              ('propagated-inputs deps)
              ('search-paths search-paths)
              extra-stuff ...)
-       ;; For each of DEPS, keep a promise pointing to ENTRY.
-       (letrec* ((deps* (map (cut sexp->manifest-entry <> (delay entry))
-                             deps))
-                 (entry (manifest-entry
-                          (name name)
-                          (version version)
-                          (output output)
-                          (item path)
-                          (dependencies deps*)
-                          (search-paths (map sexp->search-path-specification
-                                             search-paths))
-                          (parent parent)
-                          (properties (or (assoc-ref extra-stuff 'properties)
-                                          '())))))
-         entry))))
+       (mlet* %state-monad
+           ((entry -> #f)
+            (deps*    (mapm %state-monad
+                            (cut sexp->manifest-entry <> (delay entry))
+                            deps))
+            (visited  (current-state))
+            (key ->   (list name version path)))
+         (set! entry                              ;XXX: emulate 'letrec*'
+               (manifest-entry
+                 (name name)
+                 (version version)
+                 (output output)
+                 (item path)
+                 (dependencies deps*)
+                 (search-paths (map sexp->search-path-specification
+                                    search-paths))
+                 (parent parent)
+                 (properties (or (assoc-ref extra-stuff 'properties)
+                                 '()))))
+         (mbegin %state-monad
+           (set-current-state (vhash-cons key entry visited))
+           (return entry))))))
 
   (match sexp
     (('manifest ('version 0)
@@ -608,7 +649,17 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     ;; Version 3 represents DEPS as full-blown manifest entries.
     (('manifest ('version 3 minor-version ...)
                 ('packages (entries ...)))
-     (manifest (map sexp->manifest-entry entries)))
+     (manifest (run-with-state
+                   (mapm %state-monad sexp->manifest-entry entries)
+                 vlist-null)))
+
+    ;; Version 4 deduplicates repeated entries, as can happen with deep
+    ;; propagated input trees.
+    (('manifest ('version 4 minor-version ...)
+                ('packages (entries ...)))
+     (manifest (run-with-state
+                   (mapm %state-monad sexp->manifest-entry entries)
+                 vlist-null)))
     (_
      (raise (condition
              (&message (message "unsupported manifest format")))))))
diff --git a/tests/profiles.scm b/tests/profiles.scm
index 7e51d37ab9..3838d971c9 100644
--- a/tests/profiles.scm
+++ b/tests/profiles.scm
@@ -586,6 +586,48 @@ (define (entry->sexp entry)
                                                     #:locales? #f)))
         (return #f)))))
 
+(test-assertm "deduplication of repeated entries"
+  ;; Make sure the 'manifest' file does not duplicate identical entries.
+  ;; See <https://issues.guix.gnu.org/55499>.
+  (mlet* %store-monad ((p0 -> (dummy-package "p0"
+                                (build-system trivial-build-system)
+                                (arguments
+                                 `(#:guile ,%bootstrap-guile
+                                   #:builder (mkdir (assoc-ref %outputs "out"))))
+                                (propagated-inputs
+                                 `(("guile" ,%bootstrap-guile)))))
+                       (p1 -> (package
+                                (inherit p0)
+                                (name "p1")))
+                       (drv (profile-derivation (packages->manifest
+                                                 (list p0 p1))
+                                                #:hooks '()
+                                                #:locales? #f)))
+    (mbegin %store-monad
+      (built-derivations (list drv))
+      (let ((file     (string-append (derivation->output-path drv)
+                                     "/manifest"))
+            (manifest (profile-manifest (derivation->output-path drv))))
+        (define (contains-repeated? sexp)
+          (match sexp
+            (('repeated _ ...) #t)
+            ((lst ...) (any contains-repeated? sexp))
+            (_ #f)))
+
+        (return (and (contains-repeated? (call-with-input-file file read))
+
+                     ;; MANIFEST has two entries for %BOOTSTRAP-GUILE since
+                     ;; it's propagated both from P0 and from P1.  When
+                     ;; reading a 'repeated' node, 'read-manifest' should
+                     ;; reuse the previously-read entry so the two
+                     ;; %BOOTSTRAP-GUILE entries must be 'eq?'.
+                     (match (manifest-entries manifest)
+                       (((= manifest-entry-dependencies (dep0))
+                         (= manifest-entry-dependencies (dep1)))
+                        (and (string=? (manifest-entry-name dep0)
+                                       (package-name %bootstrap-guile))
+                             (eq? dep0 dep1))))))))))
+
 (test-assertm "no collision"
   ;; Here we have an entry that is "lowered" (its 'item' field is a store file
   ;; name) and another entry (its 'item' field is a package) that is
-- 
2.36.1
L
L
Ludovic Courtès wrote on 31 May 18:09 +0200
[PATCH 1/3] tests: Augment profile collision test.
(address . 55499@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220531160916.21508-2-ludo@gnu.org
* tests/profiles.scm ("collision of propagated inputs"): Check the
parents of ENTRY1 and ENTRY2.
---
tests/profiles.scm | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

Toggle diff (29 lines)
diff --git a/tests/profiles.scm b/tests/profiles.scm
index d59d75985f..7e51d37ab9 100644
--- a/tests/profiles.scm
+++ b/tests/profiles.scm
@@ -556,14 +556,20 @@ (define (entry->sexp entry)
         (return #f)))))
 
 (test-equal "collision of propagated inputs"
-  '(("guile-bootstrap" "2.0") ("guile-bootstrap" "42"))
+  '(("guile-bootstrap" "2.0") "p1"
+    <> ("guile-bootstrap" "42") "p2")
   (guard (c ((profile-collision-error? c)
              (let ((entry1 (profile-collision-error-entry c))
                    (entry2 (profile-collision-error-conflict c)))
                (list (list (manifest-entry-name entry1)
                            (manifest-entry-version entry1))
+                     (manifest-entry-name
+                      (force (manifest-entry-parent entry1)))
+                     '<>
                      (list (manifest-entry-name entry2)
-                           (manifest-entry-version entry2))))))
+                           (manifest-entry-version entry2))
+                     (manifest-entry-name
+                      (force (manifest-entry-parent entry2)))))))
     (run-with-store %store
       (mlet* %store-monad ((p0 -> (package
                                     (inherit %bootstrap-guile)
-- 
2.36.1
L
L
Ludovic Courtès wrote on 31 May 18:09 +0200
[PATCH 3/3] squash! profiles: Make all entry fields optional.
(address . 55499@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220531160916.21508-4-ludo@gnu.org
This is meant to be squashed with the previous patch.

This makes the 'search-paths' and 'propagated-inputs' fields of each
entry optional, shaving a bit more space and reading time, down to 180K
instead of 192K.

* guix/build/profiles.scm (manifest-sexp->inputs+search-paths)[let-fields]:
New macro.
Use it.
* guix/profiles.scm (manifest->gexp)[optional]: New procedure. Use it.
[sexp->manifest-entry]: Rename to...
[sexp->manifest-entry/v3]: ... this.
---
guix/build/profiles.scm | 28 ++++++++--
guix/profiles.scm | 120 ++++++++++++++++++++++++++--------------
2 files changed, 100 insertions(+), 48 deletions(-)

Toggle diff (222 lines)
diff --git a/guix/build/profiles.scm b/guix/build/profiles.scm
index c4460f624b..2ab76bde74 100644
--- a/guix/build/profiles.scm
+++ b/guix/build/profiles.scm
@@ -149,6 +149,18 @@ (define (manifest-sexp->inputs+search-paths manifest)
   "Parse MANIFEST, an sexp as produced by 'manifest->gexp', and return two
 values: the list of store items of its manifest entries, and the list of
 search path specifications."
+  (define-syntax let-fields
+    (syntax-rules ()
+      ;; Bind the fields NAME of LST to same-named variables in the lexical
+      ;; scope of BODY.
+      ((_ lst (name rest ...) body ...)
+       (let ((name (match (assq 'name lst)
+                     ((_ value) value)
+                     (#f '()))))
+         (let-fields lst (rest ...) body ...)))
+      ((_ lst () body ...)
+       (begin body ...))))
+
   (match manifest                            ;this must match 'manifest->gexp'
     (('manifest ('version 4)
                 ('packages (entries ...)))
@@ -156,12 +168,12 @@ (define (manifest-sexp->inputs+search-paths manifest)
                 (inputs '())
                 (search-paths '()))
        (match entries
-         (((name version output item
-                 ('propagated-inputs deps)
-                 ('search-paths paths) _ ...) . rest)
-          (loop (append rest deps)                ;breadth-first traversal
-                (cons item inputs)
-                (append paths search-paths)))
+         (((name version output item fields ...) . rest)
+          (let ((paths search-paths))
+            (let-fields fields (propagated-inputs search-paths properties)
+              (loop (append rest propagated-inputs) ;breadth-first traversal
+                    (cons item inputs)
+                    (append search-paths paths)))))
          ((('repeated name version item) . rest)
           (loop rest inputs search-paths))
          (()
@@ -214,4 +226,8 @@ (define manifest-file
     ;; Write 'OUTPUT/etc/profile'.
     (build-etc/profile output search-paths)))
 
+;;; Local Variables:
+;;; eval: (put 'let-fields 'scheme-indent-function 2)
+;;; End:
+
 ;;; profile.scm ends here
diff --git a/guix/profiles.scm b/guix/profiles.scm
index 44ff37e75b..d694ac07da 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -454,6 +454,11 @@ (define (inferior->entry)
 
 (define (manifest->gexp manifest)
   "Return a representation of MANIFEST as a gexp."
+  (define (optional name value)
+    (if (null? value)
+        #~()
+        #~((#$name #$value))))
+
   (define (entry->gexp entry)
     ;; Maintain in state monad a vhash of visited entries, indexed by their
     ;; item, usually package objects (we cannot use the entry itself as an
@@ -477,24 +482,22 @@ (define (entry->gexp entry)
               (return
                (match entry
                  (($ <manifest-entry> name version output (? string? path)
-                                      (_deps ...) (search-paths ...) _ (properties ...))
+                                      (_ ...) (search-paths ...) _ (properties ...))
                   #~(#$name #$version #$output #$path
-                            (propagated-inputs #$deps)
-                            (search-paths #$(map search-path-specification->sexp
-                                                 search-paths))
-                            #$@(if (null? properties)
-                                   #~()
-                                   #~((properties . #$properties)))))
+                            #$@(optional 'propagated-inputs deps)
+                            #$@(optional 'search-paths
+                                         (map search-path-specification->sexp
+                                              search-paths))
+                            #$@(optional 'properties properties)))
                  (($ <manifest-entry> name version output package
                                       (_deps ...) (search-paths ...) _ (properties ...))
                   #~(#$name #$version #$output
                             (ungexp package (or output "out"))
-                            (propagated-inputs #$deps)
-                            (search-paths #$(map search-path-specification->sexp
-                                                 search-paths))
-                            #$@(if (null? properties)
-                                   #~()
-                                   #~((properties . #$properties))))))))))))
+                            #$@(optional 'propagated-inputs deps)
+                            #$@(optional 'search-paths
+                                         (map search-path-specification->sexp
+                                              search-paths))
+                            #$@(optional 'properties properties))))))))))
 
   (match manifest
     (($ <manifest> (entries ...))
@@ -542,6 +545,40 @@ (define (infer-dependency item parent)
         (item item)
         (parent parent))))
 
+  (define* (sexp->manifest-entry/v3 sexp #:optional (parent (delay #f)))
+    (match sexp
+      ((name version output path
+             ('propagated-inputs deps)
+             ('search-paths search-paths)
+             extra-stuff ...)
+       ;; For each of DEPS, keep a promise pointing to ENTRY.
+       (letrec* ((deps* (map (cut sexp->manifest-entry/v3 <> (delay entry))
+                             deps))
+                 (entry (manifest-entry
+                          (name name)
+                          (version version)
+                          (output output)
+                          (item path)
+                          (dependencies deps*)
+                          (search-paths (map sexp->search-path-specification
+                                             search-paths))
+                          (parent parent)
+                          (properties (or (assoc-ref extra-stuff 'properties)
+                                          '())))))
+         entry))))
+
+  (define-syntax let-fields
+    (syntax-rules ()
+      ;; Bind the fields NAME of LST to same-named variables in the lexical
+      ;; scope of BODY.
+      ((_ lst (name rest ...) body ...)
+       (let ((name (match (assq 'name lst)
+                     ((_ value) value)
+                     (#f '()))))
+         (let-fields lst (rest ...) body ...)))
+      ((_ lst () body ...)
+       (begin body ...))))
+
   (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     (match sexp
       (('repeated name version path)
@@ -556,32 +593,29 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
                     sexp)))
            ((_ . entry)
             (return entry)))))
-      ((name version output path
-             ('propagated-inputs deps)
-             ('search-paths search-paths)
-             extra-stuff ...)
-       (mlet* %state-monad
-           ((entry -> #f)
-            (deps*    (mapm %state-monad
-                            (cut sexp->manifest-entry <> (delay entry))
-                            deps))
-            (visited  (current-state))
-            (key ->   (list name version path)))
-         (set! entry                              ;XXX: emulate 'letrec*'
-               (manifest-entry
-                 (name name)
-                 (version version)
-                 (output output)
-                 (item path)
-                 (dependencies deps*)
-                 (search-paths (map sexp->search-path-specification
-                                    search-paths))
-                 (parent parent)
-                 (properties (or (assoc-ref extra-stuff 'properties)
-                                 '()))))
-         (mbegin %state-monad
-           (set-current-state (vhash-cons key entry visited))
-           (return entry))))))
+      ((name version output path fields ...)
+       (let-fields fields (propagated-inputs search-paths properties)
+         (mlet* %state-monad
+             ((entry -> #f)
+              (deps     (mapm %state-monad
+                              (cut sexp->manifest-entry <> (delay entry))
+                              propagated-inputs))
+              (visited  (current-state))
+              (key ->   (list name version path)))
+           (set! entry                             ;XXX: emulate 'letrec*'
+                 (manifest-entry
+                   (name name)
+                   (version version)
+                   (output output)
+                   (item path)
+                   (dependencies deps)
+                   (search-paths (map sexp->search-path-specification
+                                      search-paths))
+                   (parent parent)
+                   (properties properties)))
+           (mbegin %state-monad
+             (set-current-state (vhash-cons key entry visited))
+             (return entry)))))))
 
   (match sexp
     (('manifest ('version 0)
@@ -649,9 +683,7 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     ;; Version 3 represents DEPS as full-blown manifest entries.
     (('manifest ('version 3 minor-version ...)
                 ('packages (entries ...)))
-     (manifest (run-with-state
-                   (mapm %state-monad sexp->manifest-entry entries)
-                 vlist-null)))
+     (manifest (map sexp->manifest-entry/v3 entries)))
 
     ;; Version 4 deduplicates repeated entries, as can happen with deep
     ;; propagated input trees.
@@ -2368,4 +2400,8 @@ (define (user-friendly-profile profile)
             %known-shorthand-profiles)
       profile))
 
+;;; Local Variables:
+;;; eval: (put 'let-fields 'scheme-indent-function 2)
+;;; End:
+
 ;;; profiles.scm ends here
-- 
2.36.1
M
M
Maxime Devos wrote on 31 May 19:35 +0200
Re: bug#55499: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file.
10a79d0be70467d59fffd225b18f0bbd17ccdba3.camel@telenet.be
Ludovic Courtès schreef op di 31-05-2022 om 18:09 [+0200]:
Toggle quote (10 lines)
> With this change, the manifest file created for:
>
>   guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat
>
> goes from 5.6M to 192K.  Likewise, on this profile, wall-clock time of:
>
>   GUIX_PROFILING=gc guix package -I
>
> goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.

I'm not familiar enough with this part of Guix to evaluate the patches,
but the time, disk memory and heap memory decreases sound great!

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYpZR9hccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7gAyAP9m0h53t04kE/RAWjFgbx5e1rKs
mOeRM17mlIlgkSgJewD/QLShv6NeQQpbUu9TAKW8FDtq8p/f2CBAtoRIldkwOA0=
=O9+S
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 1 Jun 11:38 +0200
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 55499@debbugs.gnu.org)
871qw8yb81.fsf@gnu.org
Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (14 lines)
> Ludovic Courtès schreef op di 31-05-2022 om 18:09 [+0200]:
>> With this change, the manifest file created for:
>>
>>   guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat
>>
>> goes from 5.6M to 192K.  Likewise, on this profile, wall-clock time of:
>>
>>   GUIX_PROFILING=gc guix package -I
>>
>> goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.
>
> I'm not familiar enough with this part of Guix to evaluate the patches,
> but the time, disk memory and heap memory decreases sound great!

Yup! The difference is significant primarily for profiles with lots of
propagated inputs, so typically profiles with R or Python packages.

For my home profile (300+ packages but no R and no Python), it goes from
316K to 112K, heap usage for ‘guix package -I’ goes from 12M to 9M and
wall-clock time is almost unchanged.

Ludo’.
L
L
Ludovic Courtès wrote on 14 Jun 09:06 +0200
Re: bug#55499: excessively large manifests due to propagation
(address . 55499@debbugs.gnu.org)(name . Ricardo Wurmus)(address . rekado@elephly.net)
87a6afoh9z.fsf_-_@gnu.org
Hey Ricardo,

Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (16 lines)
> These patches implement what you suggested on IRC: not repeating
> entire manifest entries and their propagated inputs. This has a
> dramatic impact on the size of the ‘manifest’ file and on the memory
> and processing time to read it for the the use case you gave.
>
> The second patch goes a tiny bit further by making the ‘search-paths’
> and ‘propagated-inputs’ fields optional, shaving another ~10% on the
> size of ‘manifest’ in this example.
>
> The second patch should be squashed with the first one (so we don’t
> bump version formats a second time and duplicate code). It’s kinda
> optional because it doesn’t bring much compared to the first patch and
> causes a bit of extra complexity, but maybe it’s still worth keeping?
>
> Could you try this on your larger use cases and tell me how it goes?

Did you have a chance to give it a try?

Maybe I can double-check that everything’s alright and go ahead.

Thanks,
Ludo’.
?