[PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization

  • Done
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
L
L
Ludovic Courtès wrote on 15 Dec 2020 10:38
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20201215093830.10322-1-ludo@gnu.org
Hello Guix!

This is a followup to https://issues.guix.gnu.org/45018. It is
meant to be applied on top of https://issues.guix.gnu.org/44760.

Until now, guix-daemon would check the hash of store items just
substituted, reset timestamps/permissions, and deduplicate. This
would lead to extra I/O: the whole set of files is traversed three
times by the daemon and read two times.

This patch series is about delegating that work to ‘guix substitute’,
which it can do directly as it restores file, thereby reducing I/O
to the minimum necessary.

I tested with substitutes that contain many files:

guix build pipewire@0.2 ffmpeg ungoogled-chromium vim-full \
emacs-no-x emacs-no-x-toolkit

On my laptop with an SSD, the wall-clock time is almost unchanged
when fetching lzip substitutes. You can see that the throughput
displayed while downloading is slightly lower than before, which
is consistent because lzip downloads are CPU-bound¹, but this is
compensated by the lack of processing time between substitutes.
With gzip substitutes, I see a 10% speedup on the wall-clock time
on my laptop.

Ludo’.


Ludovic Courtès (6):
tests: Check the build trace for hash mismatches on substitutes.
daemon: Let 'guix substitute' perform hash checks.
tests: Check the mtime and permissions of substituted items.
daemon: Do not reset timestamps and permissions on substituted items.
tests: Make sure substituted items are deduplicated.
daemon: Delegate deduplication to 'guix substitute'.

guix/scripts/substitute.scm | 70 +++++++++++++++++++++++++-----
guix/serialization.scm | 8 +++-
nix/libstore/build.cc | 85 ++++++++++++++++++++-----------------
tests/store.scm | 82 +++++++++++++++++++++++++++++++++++
tests/substitute.scm | 58 ++++++++++++++++++++++---
5 files changed, 248 insertions(+), 55 deletions(-)

--
2.29.2
L
L
Ludovic Courtès wrote on 15 Dec 2020 10:57
[PATCH 1/6] tests: Check the build trace for hash mismatches on substitutes.
(address . 45253@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20201215095730.10954-1-ludo@gnu.org
* tests/store.scm ("substitute, corrupt output hash, build trace"): New
test.
---
tests/store.scm | 55 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)

Toggle diff (68 lines)
diff --git a/tests/store.scm b/tests/store.scm
index 38051bf5e5..7f1ec51875 100644
--- a/tests/store.scm
+++ b/tests/store.scm
@@ -787,6 +787,61 @@
(build-derivations s (list d))
#f))))))
+(test-assert "substitute, corrupt output hash, build trace"
+ ;; Likewise, and check the build trace.
+ (with-store s
+ (let* ((c "hello, world") ; contents of the output
+ (d (build-expression->derivation
+ s "corrupt-substitute"
+ `(mkdir %output)
+ #:guile-for-build
+ (package-derivation s %bootstrap-guile (%current-system))))
+ (o (derivation->output-path d)))
+ ;; Make sure we use 'guix substitute'.
+ (set-build-options s
+ #:print-build-trace #t
+ #:use-substitutes? #t
+ #:fallback? #f
+ #:substitute-urls (%test-substitute-urls))
+
+ (with-derivation-substitute d c
+ (sha256 => (make-bytevector 32 0)) ;select a hash that doesn't match C
+
+ (define output
+ (call-with-output-string
+ (lambda (port)
+ (parameterize ((current-build-output-port port))
+ (guard (c ((store-protocol-error? c) #t))
+ (build-derivations s (list d))
+ #f)))))
+
+ (define actual-hash
+ (let-values (((port get-hash)
+ (gcrypt:open-hash-port
+ (gcrypt:hash-algorithm gcrypt:sha256))))
+ (write-file-tree "foo" port
+ #:file-type+size
+ (lambda _
+ (values 'regular (string-length c)))
+ #:file-port
+ (lambda _
+ (open-input-string c)))
+ (close-port port)
+ (bytevector->nix-base32-string (get-hash))))
+
+ (define expected-hash
+ (bytevector->nix-base32-string (make-bytevector 32 0)))
+
+ (define mismatch
+ (string-append "@ hash-mismatch " o " sha256 "
+ expected-hash " " actual-hash "\n"))
+
+ (define failure
+ (string-append "@ substituter-failed " o))
+
+ (and (string-contains output mismatch)
+ (string-contains output failure))))))
+
(test-assert "substitute --fallback"
(with-store s
(let* ((t (random-text)) ; contents of the output
--
2.29.2
L
L
Ludovic Courtès wrote on 15 Dec 2020 10:57
[PATCH 2/6] daemon: Let 'guix substitute' perform hash checks.
(address . 45253@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20201215095730.10954-2-ludo@gnu.org
This way, the hash of the store item can be computed as it is restored,
thereby avoiding an additional file tree traversal ('hashPath' call)
later on in the daemon. Consequently, it should reduce latency between
subsequent substitute downloads.

This is a followup to 5ff521452b9ec2aae9ed8e4bb7bdc250a581f203.

* guix/scripts/substitute.scm (narinfo-hash-algorithm+value): New
procedure.
(process-substitution): Wrap INPUT into a hash input port, 'hashed', and
read from it. Compare the actual and expected hashes, and print a
"hash-mismatch" status line when they differ. When they match, print
not just "success" but also the nar hash and size.
* nix/libstore/build.cc (class SubstitutionGoal)[expectedHashStr]:
Remove.
(SubstitutionGoal::finished): Tokenize 'status'. Parse it and handle
"success" and "hash-mismatch" accordingly. Call 'hashPath' only when
the returned hash is not SHA256.
(SubstitutionGoal::handleChildOutput): Remove 'expectedHashStr'
handling.
* tests/substitute.scm ("substitute, invalid hash"): Rename to...
("substitute, invalid narinfo hash"): ... this.
("substitute, invalid hash"): New test.
---
guix/scripts/substitute.scm | 45 +++++++++++++++++++----
nix/libstore/build.cc | 73 ++++++++++++++++++++-----------------
tests/substitute.scm | 52 ++++++++++++++++++++++++--
3 files changed, 125 insertions(+), 45 deletions(-)

Toggle diff (286 lines)
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 25075eedff..17d0002b9f 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -26,6 +26,8 @@
#:use-module (guix combinators)
#:use-module (guix config)
#:use-module (guix records)
+ #:use-module (guix diagnostics)
+ #:use-module (guix i18n)
#:use-module ((guix serialization) #:select (restore-file))
#:autoload (guix scripts discover) (read-substitute-urls)
#:use-module (gcrypt hash)
@@ -256,6 +258,18 @@ connection (typically PORT) is kept open once data has been fetched from URI."
;; for more information.
(contents narinfo-contents))
+(define (narinfo-hash-algorithm+value narinfo)
+ "Return two values: the hash algorithm used by NARINFO and its value as a
+bytevector."
+ (match (string-tokenize (narinfo-hash narinfo)
+ (char-set-complement (char-set #\:)))
+ ((algorithm base32)
+ (values (lookup-hash-algorithm (string->symbol algorithm))
+ (nix-base32-string->bytevector base32)))
+ (_
+ (raise (formatted-message
+ (G_ "invalid narinfo hash: ~s") (narinfo-hash narinfo))))))
+
(define (narinfo-hash->sha256 hash)
"If the string HASH denotes a sha256 hash, return it as a bytevector.
Otherwise return #f."
@@ -1033,7 +1047,9 @@ one. Return #f if URI's scheme is 'file' or #f."
(define* (process-substitution store-item destination
#:key cache-urls acl print-build-trace?)
"Substitute STORE-ITEM (a store file name) from CACHE-URLS, and write it to
-DESTINATION as a nar file. Verify the substitute against ACL."
+DESTINATION as a nar file. Verify the substitute against ACL, and verify its
+hash against what appears in the narinfo. Print a status line on the current
+output port."
(define narinfo
(lookup-narinfo cache-urls store-item
(cut valid-narinfo? <> acl)))
@@ -1044,9 +1060,6 @@ DESTINATION as a nar file. Verify the substitute against ACL."
(let-values (((uri compression file-size)
(narinfo-best-uri narinfo)))
- ;; Tell the daemon what the expected hash of the Nar itself is.
- (format #t "~a~%" (narinfo-hash narinfo))
-
(unless print-build-trace?
(format (current-error-port)
(G_ "Downloading ~a...~%") (uri->string uri)))
@@ -1079,9 +1092,16 @@ DESTINATION as a nar file. Verify the substitute against ACL."
;; closed here, while the child process doing the
;; reporting will close it upon exit.
(decompressed-port (string->symbol compression)
- progress)))
+ progress))
+
+ ;; Compute the actual nar hash as we read it.
+ ((algorithm expected)
+ (narinfo-hash-algorithm+value narinfo))
+ ((hashed get-hash)
+ (open-hash-input-port algorithm input)))
;; Unpack the Nar at INPUT into DESTINATION.
- (restore-file input destination)
+ (restore-file hashed destination)
+ (close-port hashed)
(close-port input)
;; Wait for the reporter to finish.
@@ -1091,8 +1111,17 @@ DESTINATION as a nar file. Verify the substitute against ACL."
;; one to visually separate substitutions.
(display "\n\n" (current-error-port))
- ;; Tell the daemon that we're done.
- (display "success\n" (current-output-port)))))
+ ;; Check whether we got the data announced in NARINFO.
+ (let ((actual (get-hash)))
+ (if (bytevector=? actual expected)
+ ;; Tell the daemon that we're done.
+ (format (current-output-port) "success ~a ~a~%"
+ (narinfo-hash narinfo) (narinfo-size narinfo))
+ ;; The actual data has a different hash than that in NARINFO.
+ (format (current-output-port) "hash-mismatch ~a ~a ~a~%"
+ (hash-algorithm-name algorithm)
+ (bytevector->nix-base32-string expected)
+ (bytevector->nix-base32-string actual)))))))
;;;
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index b5551b87ae..b19471a68f 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -2790,10 +2790,6 @@ private:
/* The substituter. */
std::shared_ptr<Agent> substituter;
- /* Either the empty string, or the expected hash as returned by the
- substituter. */
- string expectedHashStr;
-
/* Either the empty string, or the status phrase returned by the
substituter. */
string status;
@@ -3032,36 +3028,47 @@ void SubstitutionGoal::finished()
/* Check the exit status and the build result. */
HashResult hash;
try {
-
- if (status != "success")
- throw SubstError(format("fetching path `%1%' (status: '%2%')")
- % storePath % status);
-
- if (!pathExists(destPath))
- throw SubstError(format("substitute did not produce path `%1%'") % destPath);
-
- if (expectedHashStr == "")
- throw SubstError(format("substituter did not communicate hash for `%1'") % storePath);
-
- hash = hashPath(htSHA256, destPath);
-
- /* Verify the expected hash we got from the substituer. */
- size_t n = expectedHashStr.find(':');
- if (n == string::npos)
- throw Error(format("bad hash from substituter: %1%") % expectedHashStr);
- HashType hashType = parseHashType(string(expectedHashStr, 0, n));
- if (hashType == htUnknown)
- throw Error(format("unknown hash algorithm in `%1%'") % expectedHashStr);
- Hash expectedHash = parseHash16or32(hashType, string(expectedHashStr, n + 1));
- Hash actualHash = hashType == htSHA256 ? hash.first : hashPath(hashType, destPath).first;
- if (expectedHash != actualHash) {
- if (settings.printBuildTrace)
+ auto statusList = tokenizeString<vector<string> >(status);
+
+ if (statusList.empty()) {
+ throw SubstError(format("fetching path `%1%' (empty status: '%2%')")
+ % storePath % status);
+ } else if (statusList[0] == "hash-mismatch") {
+ if (settings.printBuildTrace) {
+ auto hashType = statusList[1];
+ auto expectedHash = statusList[2];
+ auto actualHash = statusList[3];
printMsg(lvlError, format("@ hash-mismatch %1% %2% %3% %4%")
- % storePath % "sha256"
- % printHash16or32(expectedHash)
- % printHash16or32(actualHash));
+ % storePath
+ % hashType % expectedHash % actualHash);
+ }
throw SubstError(format("hash mismatch for substituted item `%1%'") % storePath);
+ } else if (statusList[0] == "success") {
+ if (!pathExists(destPath))
+ throw SubstError(format("substitute did not produce path `%1%'") % destPath);
+
+ std::string hashStr = statusList[1];
+ size_t n = hashStr.find(':');
+ if (n == string::npos)
+ throw Error(format("bad hash from substituter: %1%") % hashStr);
+
+ HashType hashType = parseHashType(string(hashStr, 0, n));
+ switch (hashType) {
+ case htUnknown:
+ throw Error(format("unknown hash algorithm in `%1%'") % hashStr);
+ case htSHA256:
+ hash.first = parseHash16or32(hashType, string(hashStr, n + 1));
+ hash.second = std::atoi(statusList[2].c_str());
+ break;
+ default:
+ /* The database only stores SHA256 hashes, so compute it. */
+ hash = hashPath(htSHA256, destPath);
+ break;
+ }
}
+ else
+ throw SubstError(format("fetching path `%1%' (status: '%2%')")
+ % storePath % status);
} catch (SubstError & e) {
@@ -3123,9 +3130,7 @@ void SubstitutionGoal::handleChildOutput(int fd, const string & data)
string trimmed = (end != string::npos) ? input.substr(0, end) : input;
/* Update the goal's state accordingly. */
- if (expectedHashStr == "") {
- expectedHashStr = trimmed;
- } else if (status == "") {
+ if (status == "") {
status = trimmed;
worker.wakeUp(shared_from_this());
} else {
diff --git a/tests/substitute.scm b/tests/substitute.scm
index b86ce09425..241c55a1f8 100644
--- a/tests/substitute.scm
+++ b/tests/substitute.scm
@@ -28,7 +28,9 @@
#:use-module (guix base32)
#:use-module ((guix store) #:select (%store-prefix))
#:use-module ((guix ui) #:select (guix-warning-port))
- #:use-module ((guix utils) #:select (call-with-compressed-output-port))
+ #:use-module ((guix utils)
+ #:select (call-with-temporary-directory
+ call-with-compressed-output-port))
#:use-module ((guix build utils)
#:select (mkdir-p delete-file-recursively dump-port))
#:use-module (guix tests http)
@@ -36,6 +38,7 @@
#:use-module (rnrs io ports)
#:use-module (web uri)
#:use-module (ice-9 regex)
+ #:use-module (srfi srfi-11)
#:use-module (srfi srfi-26)
#:use-module (srfi srfi-34)
#:use-module (srfi srfi-35)
@@ -304,7 +307,7 @@ System: mips64el-linux\n")
(lambda ()
(guix-substitute "--substitute")))))
-(test-quit "substitute, invalid hash"
+(test-quit "substitute, invalid narinfo hash"
"no valid substitute"
;; The hash in the signature differs from the hash of %NARINFO.
(with-narinfo (string-append %narinfo "Signature: "
@@ -317,6 +320,49 @@ System: mips64el-linux\n")
(lambda ()
(guix-substitute "--substitute")))))
+(test-equal "substitute, invalid hash"
+ (string-append "hash-mismatch sha256 "
+ (bytevector->nix-base32-string (sha256 #vu8())) " "
+ (let-values (((port get-hash)
+ (open-hash-port (hash-algorithm sha256)))
+ ((content)
+ "Substitutable data."))
+ (write-file-tree "foo" port
+ #:file-type+size
+ (lambda _
+ (values 'regular
+ (string-length content)))
+ #:file-port
+ (lambda _
+ (open-input-string content)))
+ (close-port port)
+ (bytevector->nix-base32-string (get-hash)))
+ "\n")
+
+ ;; Arrange so the actual data hash does not match the 'NarHash' field in the
+ ;; narinfo.
+ (with-output-to-string
+ (lambda ()
+ (let ((narinfo (string-append "StorePath: " (%store-prefix)
+ "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wrong-hash
+URL: example.nar
+Compression: none
+NarHash: sha256:" (bytevector->nix-base32-string (sha256 #vu8())) "
+NarSize: 42
+References:
+Deriver: " (%store-prefix) "/foo.drv
+System: mips64el-linux\n")))
+ (with-narinfo (string-append narinfo "Signature: "
+ (signature-field narinfo) "\n")
+ (call-with-temporary-directory
+ (lambda (directory)
+ (with-input-from-string (string-append
+ "substitute " (%store-prefix)
+ "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wrong-hash "
+ directory "/wrong-hash\n")
+ (lambda ()
+ (guix-substitute "--substitute"))))))))))
+
(test-quit "substitute, unauthorized key"
"no valid substitute"
(with-narinfo (string-append %narinfo "Signature: "
@@ -326,7 +372,7 @@ System: mips64el-linux\n")
"\n")
(with-input-from-string (string-append "substitute "
(%store-prefix)
- "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-foo"
+ "/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-wrong-hash"
" foo\n")
(lambda ()
(guix-substitute "--substitute")))))
--
2.29.2
L
L
Ludovic Courtès wrote on 15 Dec 2020 10:57
[PATCH 3/6] tests: Check the mtime and permissions of substituted items.
(address . 45253@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20201215095730.10954-3-ludo@gnu.org
* tests/store.scm ("substitute")
("substitute + build-things with output path")
("substitute + build-things with specific output"): Call 'canonical-file?'.
* tests/substitute.scm ("substitute, authorized key"): Check the mtime
and permissions of "substitute-retrieved".
---
tests/store.scm | 3 +++
tests/substitute.scm | 6 ++++--
2 files changed, 7 insertions(+), 2 deletions(-)

Toggle diff (54 lines)
diff --git a/tests/store.scm b/tests/store.scm
index 7f1ec51875..4dc125bcb9 100644
--- a/tests/store.scm
+++ b/tests/store.scm
@@ -715,6 +715,7 @@
#:substitute-urls (%test-substitute-urls))
(and (has-substitutes? s o)
(build-derivations s (list d))
+ (canonical-file? o)
(equal? c (call-with-input-file o get-string-all)))))))
(test-assert "substitute + build-things with output path"
@@ -735,6 +736,7 @@
(and (has-substitutes? s o)
(build-things s (list o)) ;give the output path
(valid-path? s o)
+ (canonical-file? o)
(equal? c (call-with-input-file o get-string-all)))))))
(test-assert "substitute + build-things with specific output"
@@ -755,6 +757,7 @@
(build-things s `((,(derivation-file-name d) . "out")))
(valid-path? s o)
+ (canonical-file? o)
(equal? c (call-with-input-file o get-string-all)))))))
(test-assert "substitute, corrupt output hash"
diff --git a/tests/substitute.scm b/tests/substitute.scm
index 241c55a1f8..4f55a14957 100644
--- a/tests/substitute.scm
+++ b/tests/substitute.scm
@@ -378,7 +378,7 @@ System: mips64el-linux\n")))
(guix-substitute "--substitute")))))
(test-equal "substitute, authorized key"
- "Substitutable data."
+ '("Substitutable data." 1 #o444)
(with-narinfo (string-append %narinfo "Signature: "
(signature-field %narinfo))
(dynamic-wind
@@ -387,7 +387,9 @@ System: mips64el-linux\n")))
(request-substitution (string-append (%store-prefix)
"/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-foo")
"substitute-retrieved")
- (call-with-input-file "substitute-retrieved" get-string-all))
+ (list (call-with-input-file "substitute-retrieved" get-string-all)
+ (stat:mtime (lstat "substitute-retrieved"))
+ (stat:perms (lstat "substitute-retrieved"))))
(lambda ()
(false-if-exception (delete-file "substitute-retrieved"))))))
--
2.29.2
L
L
Ludovic Courtès wrote on 15 Dec 2020 10:57
[PATCH 4/6] daemon: Do not reset timestamps and permissions on substituted items.
(address . 45253@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20201215095730.10954-4-ludo@gnu.org
'guix substitute' now takes care of it via 'restore-file'.

* nix/libstore/build.cc (SubstitutionGoal::finished): Remove call to
'canonicalisePathMetaData'.
---
nix/libstore/build.cc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Toggle diff (16 lines)
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index b19471a68f..ea809c6971 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -3085,7 +3085,8 @@ void SubstitutionGoal::finished()
if (repair) replaceValidPath(storePath, destPath);
- canonicalisePathMetaData(storePath, -1);
+ /* Note: 'guix substitute' takes care of resetting timestamps and
+ permissions on 'destPath', so no need to do it here. */
worker.store.optimisePath(storePath); // FIXME: combine with hashPath()
--
2.29.2
L
L
Ludovic Courtès wrote on 15 Dec 2020 10:57
[PATCH 5/6] tests: Make sure substituted items are deduplicated.
(address . 45253@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20201215095730.10954-5-ludo@gnu.org
* tests/store.scm ("substitute, deduplication"): New test.
---
tests/store.scm | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

Toggle diff (37 lines)
diff --git a/tests/store.scm b/tests/store.scm
index 4dc125bcb9..c9a08ac690 100644
--- a/tests/store.scm
+++ b/tests/store.scm
@@ -718,6 +718,30 @@
(canonical-file? o)
(equal? c (call-with-input-file o get-string-all)))))))
+(test-assert "substitute, deduplication"
+ (with-store s
+ (let* ((c (random-text)) ; contents of the output
+ (g (package-derivation s %bootstrap-guile))
+ (d1 (build-expression->derivation s "substitute-me"
+ `(begin ,c (exit 1))
+ #:guile-for-build g))
+ (d2 (build-expression->derivation s "build-me"
+ `(call-with-output-file %output
+ (lambda (p)
+ (display ,c p)))
+ #:guile-for-build g))
+ (o1 (derivation->output-path d1))
+ (o2 (derivation->output-path d2)))
+ (with-derivation-substitute d1 c
+ (set-build-options s #:use-substitutes? #t
+ #:substitute-urls (%test-substitute-urls))
+ (and (has-substitutes? s o1)
+ (build-derivations s (list d2)) ;build
+ (build-derivations s (list d1)) ;substitute
+ (canonical-file? o1)
+ (equal? c (call-with-input-file o1 get-string-all))
+ (= (stat:ino (stat o1)) (stat:ino (stat o2))))))))
+
(test-assert "substitute + build-things with output path"
(with-store s
(let* ((c (random-text)) ;contents of the output
--
2.29.2
L
L
Ludovic Courtès wrote on 15 Dec 2020 10:57
[PATCH 6/6] daemon: Delegate deduplication to 'guix substitute'.
(address . 45253@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20201215095730.10954-6-ludo@gnu.org
This removes the main source of latency between subsequent downloads.

* nix/libstore/build.cc (SubstitutionGoal::tryToRun): Add a
"deduplicate" key to ENV.
(SubstitutionGoal::finished): Remove call to 'optimisePath'.
* guix/scripts/substitute.scm (process-substitution)[destination-in-store?]
[dump-file/deduplicate*]: New variables.
Pass #:dump-file to 'restore-file'.
* guix/scripts/substitute.scm (guix-substitute)[deduplicate?]: New
variable.
Pass #:deduplicate? to 'process-substitution'.
* guix/serialization.scm (dump-file): Export and augment 'dump-file'.
---
guix/scripts/substitute.scm | 31 ++++++++++++++++++++++++++-----
guix/serialization.scm | 8 ++++++--
nix/libstore/build.cc | 13 ++++++++-----
3 files changed, 40 insertions(+), 12 deletions(-)

Toggle diff (135 lines)
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 17d0002b9f..38702d0c4b 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -28,7 +28,8 @@
#:use-module (guix records)
#:use-module (guix diagnostics)
#:use-module (guix i18n)
- #:use-module ((guix serialization) #:select (restore-file))
+ #:use-module ((guix serialization) #:select (restore-file dump-file))
+ #:autoload (guix store deduplication) (dump-file/deduplicate)
#:autoload (guix scripts discover) (read-substitute-urls)
#:use-module (gcrypt hash)
#:use-module (guix base32)
@@ -1045,15 +1046,27 @@ one. Return #f if URI's scheme is 'file' or #f."
(call-with-cached-connection uri (lambda (port) exp ...)))
(define* (process-substitution store-item destination
- #:key cache-urls acl print-build-trace?)
+ #:key cache-urls acl
+ deduplicate? print-build-trace?)
"Substitute STORE-ITEM (a store file name) from CACHE-URLS, and write it to
DESTINATION as a nar file. Verify the substitute against ACL, and verify its
-hash against what appears in the narinfo. Print a status line on the current
-output port."
+hash against what appears in the narinfo. When DEDUPLICATE? is true, and if
+DESTINATION is in the store, deduplicate its files. Print a status line on
+the current output port."
(define narinfo
(lookup-narinfo cache-urls store-item
(cut valid-narinfo? <> acl)))
+ (define destination-in-store?
+ (string-prefix? (string-append (%store-prefix) "/")
+ destination))
+
+ (define (dump-file/deduplicate* . args)
+ ;; Make sure deduplication looks at the right store (necessary in test
+ ;; environments).
+ (apply dump-file/deduplicate
+ (append args (list #:store (%store-prefix)))))
+
(unless narinfo
(leave (G_ "no valid substitute for '~a'~%")
store-item))
@@ -1100,7 +1113,11 @@ output port."
((hashed get-hash)
(open-hash-input-port algorithm input)))
;; Unpack the Nar at INPUT into DESTINATION.
- (restore-file hashed destination)
+ (restore-file hashed destination
+ #:dump-file (if (and destination-in-store?
+ deduplicate?)
+ dump-file/deduplicate*
+ dump-file))
(close-port hashed)
(close-port input)
@@ -1248,6 +1265,9 @@ default value."
((= string->number number) (> number 0))
(_ #f)))
+ (define deduplicate?
+ (find-daemon-option "deduplicate"))
+
;; The daemon's agent code opens file descriptor 4 for us and this is where
;; stderr should go.
(parameterize ((current-error-port (if (%error-to-file-descriptor-4?)
@@ -1307,6 +1327,7 @@ default value."
(process-substitution store-path destination
#:cache-urls (substitute-urls)
#:acl (current-acl)
+ #:deduplicate? deduplicate?
#:print-build-trace?
print-build-trace?)
(loop))))))
diff --git a/guix/serialization.scm b/guix/serialization.scm
index 9e2dce8bb0..59cd93fb18 100644
--- a/guix/serialization.scm
+++ b/guix/serialization.scm
@@ -51,7 +51,8 @@
write-file
write-file-tree
fold-archive
- restore-file))
+ restore-file
+ dump-file))
;;; Comment:
;;;
@@ -458,7 +459,10 @@ depends on TYPE."
(&nar-read-error (port port) (file file) (token x)))))))))
(define (dump-file file input size type)
- "Dump SIZE bytes from INPUT to FILE."
+ "Dump SIZE bytes from INPUT to FILE.
+
+This procedure is suitable for use as the #:dump-file argument to
+'restore-file'."
(call-with-output-file file
(lambda (output)
(dump input output size))))
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index ea809c6971..20d83fea4a 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -2984,7 +2984,12 @@ void SubstitutionGoal::tryToRun()
if (!worker.substituter) {
const Strings args = { "substitute", "--substitute" };
- const std::map<string, string> env = { { "_NIX_OPTIONS", settings.pack() } };
+ const std::map<string, string> env = {
+ { "_NIX_OPTIONS",
+ settings.pack() + "deduplicate="
+ + (settings.autoOptimiseStore ? "yes" : "no")
+ }
+ };
worker.substituter = std::make_shared<Agent>(settings.guixProgram, args, env);
}
@@ -3085,10 +3090,8 @@ void SubstitutionGoal::finished()
if (repair) replaceValidPath(storePath, destPath);
- /* Note: 'guix substitute' takes care of resetting timestamps and
- permissions on 'destPath', so no need to do it here. */
-
- worker.store.optimisePath(storePath); // FIXME: combine with hashPath()
+ /* Note: 'guix substitute' takes care of resetting timestamps and of
+ deduplicating 'destPath', so no need to do it here. */
ValidPathInfo info2;
info2.path = storePath;
--
2.29.2
L
L
Ludovic Courtès wrote on 15 Dec 2020 12:43
Re: [bug#45253] [PATCH 0/6] Pipeline substitute integrity check, deduplication, and canonicalization
(address . 45253@debbugs.gnu.org)
877dpjmh0c.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (13 lines)
> I tested with substitutes that contain many files:
>
> guix build pipewire@0.2 ffmpeg ungoogled-chromium vim-full \
> emacs-no-x emacs-no-x-toolkit
>
> On my laptop with an SSD, the wall-clock time is almost unchanged
> when fetching lzip substitutes. You can see that the throughput
> displayed while downloading is slightly lower than before, which
> is consistent because lzip downloads are CPU-bound¹, but this is
> compensated by the lack of processing time between substitutes.
> With gzip substitutes, I see a 10% speedup on the wall-clock time
> on my laptop.

Picture! First the timechart with the current daemon (gzip
substitutes, downloading from the LAN):
Notice how guix-daemon is busy in between substitute downloads: that’s
the time it takes to compute the nar hash of the store item, reset its
timestamps, and deduplicate its files.

Now the same operation after with this patch series:
This time guix-daemon remains idle all along whereas ‘guix substitute’
is almost 100% busy. There’s no pause time between substitutes.

Ludo’.
M
M
Maxim Cournoyer wrote on 19 Dec 2020 03:07
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 45253@debbugs.gnu.org)
87pn36eefd.fsf@gmail.com
Hey Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (35 lines)
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> I tested with substitutes that contain many files:
>>
>> guix build pipewire@0.2 ffmpeg ungoogled-chromium vim-full \
>> emacs-no-x emacs-no-x-toolkit
>>
>> On my laptop with an SSD, the wall-clock time is almost unchanged
>> when fetching lzip substitutes. You can see that the throughput
>> displayed while downloading is slightly lower than before, which
>> is consistent because lzip downloads are CPU-bound¹, but this is
>> compensated by the lack of processing time between substitutes.
>> With gzip substitutes, I see a 10% speedup on the wall-clock time
>> on my laptop.
>
> Picture! First the timechart with the current daemon (gzip
> substitutes, downloading from the LAN):
>
>
>
>
> Notice how guix-daemon is busy in between substitute downloads: that’s
> the time it takes to compute the nar hash of the store item, reset its
> timestamps, and deduplicate its files.
>
> Now the same operation after with this patch series:
>
>
>
>
> This time guix-daemon remains idle all along whereas ‘guix substitute’
> is almost 100% busy. There’s no pause time between substitutes.
>
> Ludo’.

Very nice! Thanks for this work! I can't wait to try it.

Maxim
L
L
Ludovic Courtès wrote on 20 Dec 2020 00:04
(address . 45253-done@debbugs.gnu.org)
87sg818kj3.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (7 lines)
> tests: Check the build trace for hash mismatches on substitutes.
> daemon: Let 'guix substitute' perform hash checks.
> tests: Check the mtime and permissions of substituted items.
> daemon: Do not reset timestamps and permissions on substituted items.
> tests: Make sure substituted items are deduplicated.
> daemon: Delegate deduplication to 'guix substitute'.

Pushed as c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa.

Ludo’.
Closed
?