[PATCH 0/4] Reduce download builder duplication.

Status: Done
Participants: Christopher Baines
Owner: unassigned
Submitted by: Christopher Baines
Severity: normal
Christopher Baines wrote on 11 May 18:29 +0200
(address . guix-patches@gnu.org)
87h6f48g83.fsf@cbaines.net
I think we currently have an issue where several download methods generate
a very large number of distinct builder scripts. svn-multi-download is
probably the extreme example, as it's used by the numerous texlive packages.

I noticed this when wondering why bayfront is spending so much time
substituting svn-multi-download files from data.guix.gnu.org, and it's
probably taking up extra space in the data.guix.gnu.org database too.

These commits should address the issue in svn-multi-fetch, svn-fetch,
hg-fetch and git-fetch (although this issue doesn't affect users of
builtin:git-download). The main change is to pass the hash through to
the builder as an environment variable, rather than generating a
different builder script for each different hash.
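For illustration, here is a standalone Guile sketch of the hash round trip these patches use. The host side serializes the hash bytevector as a comma-separated list of decimal octets, mirroring the `string-join`/`bytevector->u8-list` code in the diffs. The builder side reverses this with `u8-list->bytevector`. The helper names `hash->env-string` and `env-string->hash` and the 4-byte example value are hypothetical; in the patches this logic is written inline, and real hashes are longer.

```scheme
(use-modules (rnrs bytevectors))   ; bytevector->u8-list, u8-list->bytevector

;; Host side (hypothetical helper name): turn the hash bytevector into a
;; string such as "222,173,190,239" that can be passed via #:env-vars.
(define (hash->env-string hash)
  (string-join (map number->string (bytevector->u8-list hash)) ","))

;; Builder side (hypothetical helper name): parse that string back into a
;; bytevector.  This assumes a non-empty hash, which real content hashes are.
(define (env-string->hash str)
  (u8-list->bytevector (map string->number (string-split str #\,))))

;; Made-up 4-byte value; a real SHA256 hash is 32 bytes.
(define example-hash #vu8(222 173 190 239))

(define encoded (hash->env-string example-hash))
(display encoded)                                   ; 222,173,190,239
(newline)
(display (bytevector=? example-hash (env-string->hash encoded)))  ; #t
(newline)
```

As the comment in the patches notes, this decimal-octet encoding was chosen so the builder script does not need to import (guix base32).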

I've also restructured the code to try to avoid this problem in the
future. While there was a comment about the intent not to duplicate the
builder scripts, it was too easy to miss. By moving the builder into its
own procedure and moving the comment to the call site, it'll be clearer
when a new source of variation is added.
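The following is a toy model of why the split helps. It is a simplified sketch, not Guix code. A builder that embeds per-package data produces different script text, and so a separate store item, for every package. A builder that takes only shared parameters and reads everything else from the environment produces identical text each time. The procedure names and the string "hashes" below are hypothetical stand-ins; real builders are gexps and real hashes are bytevectors.

```scheme
(use-modules (srfi srfi-1))   ; delete-duplicates

;; Hypothetical model: a "builder" is just the program text that would be
;; stored; identical text means a single shared store item.
(define (builder-with-embedded-hash hash-algo hash)
  ;; Package-specific data (HASH) is baked into the script text.
  `(swh-download-directory-by-nar-hash ,hash ',hash-algo output))

(define (builder-reading-env-var hash-algo)
  ;; Only shared parameters go in; the hash is read at build time.
  `(swh-download-directory-by-nar-hash (getenv "hash") ',hash-algo output))

;; Made-up per-package hash values.
(define package-hashes '("0aaa" "0bbb" "0ccc"))

(define (distinct-builders make-builder)
  "Count the distinct builder texts MAKE-BUILDER yields for the
hypothetical packages above."
  (length
   (delete-duplicates
    (map (lambda (hash) (object->string (make-builder hash)))
         package-hashes))))

;; One builder per package when the hash is embedded...
(display (distinct-builders
          (lambda (hash) (builder-with-embedded-hash 'sha256 hash))))
(newline)
;; ...but a single shared builder when it comes from the environment.
(display (distinct-builders
          (lambda (hash) (builder-reading-env-var 'sha256))))
(newline)
```

With the three made-up packages, this prints 3 and then 1.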


Christopher Baines (4):
svn-download: Reduce svn-multi-fetch builder duplication.
svn-download: Reduce svn-fetch builder duplication.
hg-download: Reduce builder duplication.
git-download: Reduce builder duplication.

guix/git-download.scm | 123 +++++++++++---------
guix/hg-download.scm | 127 +++++++++++---------
guix/svn-download.scm | 264 +++++++++++++++++++++++-------------------
3 files changed, 291 insertions(+), 223 deletions(-)

Christopher Baines wrote on 11 May 18:40 +0200
[PATCH 2/4] svn-download: Reduce svn-fetch builder duplication.
(address . 70878@debbugs.gnu.org)
19983108b5ee5339964c1081426f5dc98f9a3a74.1715445642.git.mail@cbaines.net
Rather than creating a different builder in the store for every different
download (by hash), remove the hash from the builder and pass it in via an
environment variable. This means that when svn-fetch is used by two different
package sources, the derivations will still differ but the builder will be
shared.

I think it used to be this way, but probably changed with
0e73f933b291c7e154c7e019b6de1e2f3a97e4c1. I noticed the hash in the builder
script when wondering why the build coordinator on bayfront was substituting
svn-multi-download files over and over again.

To try to make the effects of introducing variance into the builder script
more obvious, separate it out into its own procedure, so that it's clearer
when there's new data going in that could cause variance.

* guix/svn-download.scm (svn-fetch): Extract out builder script, include hash
in the derivation as an environment variable and update the comment to be more
directive.
(svn-fetch-builder): New procedure.

Change-Id: I256b94666296ad747f494f0b497ca209b77fbfb4
---
guix/svn-download.scm | 113 +++++++++++++++++++++++-------------------
1 file changed, 63 insertions(+), 50 deletions(-)

diff --git a/guix/svn-download.scm b/guix/svn-download.scm
index 812a46c9d4..62649e4374 100644
--- a/guix/svn-download.scm
+++ b/guix/svn-download.scm
@@ -74,14 +74,7 @@ (define (subversion-package)
(let ((distro (resolve-interface '(gnu packages version-control))))
(module-ref distro 'subversion)))
-(define* (svn-fetch ref hash-algo hash
- #:optional name
- #:key (system (%current-system)) (guile (default-guile))
- (svn (subversion-package)))
- "Return a fixed-output derivation that fetches REF, a <svn-reference>
-object. The output is expected to have recursive hash HASH of type
-HASH-ALGO (a symbol). Use NAME as the file name, or a generic name if #f."
-
+(define (svn-fetch-builder svn hash-algo)
(define guile-json
(module-ref (resolve-interface '(gnu packages guile)) 'guile-json-4))
@@ -97,51 +90,64 @@ (define* (svn-fetch ref hash-algo hash
(module-ref (resolve-interface '(gnu packages base))
'tar)))
- (define build
- (with-imported-modules
- (source-module-closure '((guix build svn)
- (guix build download)
- (guix build download-nar)
- (guix build utils)
- (guix swh)))
- (with-extensions (list guile-json guile-gnutls ;for (guix swh)
- guile-lzlib)
- #~(begin
- (use-modules (guix build svn)
- ((guix build download)
- #:select (download-method-enabled?))
- (guix build download-nar)
- (guix build utils)
- (guix swh)
- (ice-9 match))
+ (with-imported-modules
+ (source-module-closure '((guix build svn)
+ (guix build download)
+ (guix build download-nar)
+ (guix build utils)
+ (guix swh)))
+ (with-extensions (list guile-json guile-gnutls ;for (guix swh)
+ guile-lzlib)
+ #~(begin
+ (use-modules (guix build svn)
+ ((guix build download)
+ #:select (download-method-enabled?))
+ (guix build download-nar)
+ (guix build utils)
+ (guix swh)
+ (ice-9 match))
- ;; Add tar and gzip to $PATH so
- ;; 'swh-download-directory-by-nar-hash' can invoke them.
- (set-path-environment-variable "PATH" '("bin") '(#+@tar+gzip))
+ ;; Add tar and gzip to $PATH so
+ ;; 'swh-download-directory-by-nar-hash' can invoke them.
+ (set-path-environment-variable "PATH" '("bin") '(#+@tar+gzip))
- (or (and (download-method-enabled? 'upstream)
- (svn-fetch (getenv "svn url")
- (string->number (getenv "svn revision"))
- #$output
- #:svn-command #+(file-append svn "/bin/svn")
- #:recursive? (match (getenv "svn recursive?")
- ("yes" #t)
- (_ #f))
- #:user-name (getenv "svn user name")
- #:password (getenv "svn password")))
- (and (download-method-enabled? 'nar)
- (download-nar #$output))
- (and (download-method-enabled? 'swh)
- (parameterize ((%verify-swh-certificate? #f))
- (swh-download-directory-by-nar-hash #$hash '#$hash-algo
- #$output))))))))
+ (or (and (download-method-enabled? 'upstream)
+ (svn-fetch (getenv "svn url")
+ (string->number (getenv "svn revision"))
+ #$output
+ #:svn-command #+(file-append svn "/bin/svn")
+ #:recursive? (match (getenv "svn recursive?")
+ ("yes" #t)
+ (_ #f))
+ #:user-name (getenv "svn user name")
+ #:password (getenv "svn password")))
+ (and (download-method-enabled? 'nar)
+ (download-nar #$output))
+ (and (download-method-enabled? 'swh)
+ (parameterize ((%verify-swh-certificate? #f))
+ (swh-download-directory-by-nar-hash
+ (u8-list->bytevector
+ (map string->number
+ (string-split (getenv "hash") #\,)))
+ '#$hash-algo
+ #$output))))))))
+(define* (svn-fetch ref hash-algo hash
+ #:optional name
+ #:key (system (%current-system)) (guile (default-guile))
+ (svn (subversion-package)))
+ "Return a fixed-output derivation that fetches REF, a <svn-reference>
+object. The output is expected to have recursive hash HASH of type
+HASH-ALGO (a symbol). Use NAME as the file name, or a generic name if #f."
(mlet %store-monad ((guile (package->derivation guile system)))
- (gexp->derivation (or name "svn-checkout") build
-
- ;; Use environment variables and a fixed script name so
- ;; there's only one script in store for all the
- ;; downloads.
+ (gexp->derivation (or name "svn-checkout")
+ ;; Avoid the builder differing for every single use, as
+ ;; having fewer builders is more efficient for computing
+ ;; derivations.
+ ;;
+ ;; Don't pass package-specific data into the following
+ ;; procedure; use #:env-vars below instead.
+ (svn-fetch-builder svn hash-algo)
#:script-name "svn-download"
#:env-vars
`(("svn url" . ,(svn-reference-url ref))
@@ -161,7 +167,14 @@ (define* (svn-fetch ref hash-algo hash
,@(match (getenv "GUIX_DOWNLOAD_METHODS")
(#f '())
(value
- `(("GUIX_DOWNLOAD_METHODS" . ,value)))))
+ `(("GUIX_DOWNLOAD_METHODS" . ,value))))
+ ;; To avoid pulling in (guix base32) in the builder
+ ;; script, use bytevector->u8-list from (rnrs
+ ;; bytevectors)
+ ("hash" . ,(string-join
+ (map number->string
+ (bytevector->u8-list hash))
+ ",")))
#:system system
#:hash-algo hash-algo
--
2.41.0
Christopher Baines wrote on 11 May 18:40 +0200
[PATCH 3/4] hg-download: Reduce builder duplication.
(address . 70878@debbugs.gnu.org)
9b1483d4d1240ef1add988f15ef9019ffe11ad13.1715445642.git.mail@cbaines.net
Rather than creating a different builder in the store for every different
download (by hash), remove the hash from the builder and pass it in via an
environment variable. This means that when hg-fetch is used by two different
package sources, the derivations will still differ but the builder will be
shared.

Looking at the code, because the ref is also in the builder, the builders have
been duplicated for a while. The overhead is probably limited, though, since
hg-reference isn't used much compared to, say, svn-multi-reference.

To try to make the effects of introducing variance into the builder script
more obvious, separate it out into its own procedure, so that it's clearer
when there's new data going in that could cause variance.

* guix/hg-download.scm (hg-fetch): Extract out builder script and include
hash, hg ref url, and hg ref changeset in the derivation as environment
variables.
(hg-fetch-builder): New procedure.

Change-Id: I3c3a0b4963ea1b208bf1d5137ef98666458ae2d7
---
guix/hg-download.scm | 127 +++++++++++++++++++++++++------------------
1 file changed, 75 insertions(+), 52 deletions(-)

diff --git a/guix/hg-download.scm b/guix/hg-download.scm
index 55d908817f..812017e73d 100644
--- a/guix/hg-download.scm
+++ b/guix/hg-download.scm
@@ -30,6 +30,7 @@ (define-module (guix hg-download)
#:use-module (ice-9 match)
#:use-module (ice-9 popen)
#:use-module (ice-9 rdelim)
+ #:use-module (rnrs bytevectors)
#:export (hg-reference
hg-reference?
hg-reference-url
@@ -58,13 +59,7 @@ (define (hg-package)
(let ((distro (resolve-interface '(gnu packages version-control))))
(module-ref distro 'mercurial)))
-(define* (hg-fetch ref hash-algo hash
- #:optional name
- #:key (system (%current-system)) (guile (default-guile))
- (hg (hg-package)))
- "Return a fixed-output derivation that fetches REF, a <hg-reference>
-object. The output is expected to have recursive hash HASH of type
-HASH-ALGO (a symbol). Use NAME as the file name, or a generic name if #f."
+(define (hg-fetch-builder hg hash-algo)
(define inputs
;; The 'swh-download' procedure requires tar and gzip.
`(("gzip" ,(module-ref (resolve-interface '(gnu packages compression))
@@ -88,56 +83,84 @@ (define* (hg-fetch ref hash-algo hash
(guix build download-nar)
(guix swh)))))
- (define build
- (with-imported-modules modules
- (with-extensions (list guile-json gnutls ;for (guix swh)
- guile-lzlib)
- #~(begin
- (use-modules (guix build hg)
- (guix build utils) ;for `set-path-environment-variable'
- ((guix build download)
- #:select (download-method-enabled?))
- (guix build download-nar)
- (guix swh)
- (ice-9 match))
-
- (set-path-environment-variable "PATH" '("bin")
- (match '#+inputs
- (((names dirs outputs ...) ...)
- dirs)))
-
- (setvbuf (current-output-port) 'line)
- (setvbuf (current-error-port) 'line)
-
- (or (and (download-method-enabled? 'upstream)
- (hg-fetch '#$(hg-reference-url ref)
- '#$(hg-reference-changeset ref)
- #$output
- #:hg-command (string-append #+hg "/bin/hg")))
- (and (download-method-enabled? 'nar)
- (download-nar #$output))
- ;; As a last resort, attempt to download from Software Heritage.
- ;; Disable X.509 certificate verification to avoid depending
- ;; on nss-certs--we're authenticating the checkout anyway.
- (and (download-method-enabled? 'swh)
- (parameterize ((%verify-swh-certificate? #f))
- (format (current-error-port)
- "Trying to download from Software Heritage...~%")
- (or (swh-download-directory-by-nar-hash
- #$hash '#$hash-algo #$output)
- (swh-download #$(hg-reference-url ref)
- #$(hg-reference-changeset ref)
- #$output)))))))))
+ (with-imported-modules modules
+ (with-extensions (list guile-json gnutls ;for (guix swh)
+ guile-lzlib)
+ #~(begin
+ (use-modules (guix build hg)
+ (guix build utils) ;for `set-path-environment-variable'
+ ((guix build download)
+ #:select (download-method-enabled?))
+ (guix build download-nar)
+ (guix swh)
+ (ice-9 match)
+ (rnrs bytevectors))
+
+ (set-path-environment-variable "PATH" '("bin")
+ (match '#+inputs
+ (((names dirs outputs ...) ...)
+ dirs)))
+
+ (setvbuf (current-output-port) 'line)
+ (setvbuf (current-error-port) 'line)
+
+ (or (and (download-method-enabled? 'upstream)
+ (hg-fetch (getenv "hg ref url")
+ (getenv "hg ref changeset")
+ #$output
+ #:hg-command (string-append #+hg "/bin/hg")))
+ (and (download-method-enabled? 'nar)
+ (download-nar #$output))
+ ;; As a last resort, attempt to download from Software Heritage.
+ ;; Disable X.509 certificate verification to avoid depending
+ ;; on nss-certs--we're authenticating the checkout anyway.
+ (and (download-method-enabled? 'swh)
+ (parameterize ((%verify-swh-certificate? #f))
+ (format (current-error-port)
+ "Trying to download from Software Heritage...~%")
+ (or (swh-download-directory-by-nar-hash
+ (u8-list->bytevector
+ (map string->number
+ (string-split (getenv "hash") #\,)))
+ '#$hash-algo
+ #$output)
+ (swh-download (getenv "hg ref url")
+ (getenv "hg ref changeset")
+ #$output)))))))))
+(define* (hg-fetch ref hash-algo hash
+ #:optional name
+ #:key (system (%current-system)) (guile (default-guile))
+ (hg (hg-package)))
+ "Return a fixed-output derivation that fetches REF, a <hg-reference>
+object. The output is expected to have recursive hash HASH of type
+HASH-ALGO (a symbol). Use NAME as the file name, or a generic name if #f."
(mlet %store-monad ((guile (package->derivation guile system)))
- (gexp->derivation (or name "hg-checkout") build
+ (gexp->derivation (or name "hg-checkout")
+ ;; Avoid the builder differing for every single use, as
+ ;; having fewer builders is more efficient for computing
+ ;; derivations.
+ ;;
+ ;; Don't pass package-specific data into the following
+ ;; procedure; use #:env-vars below instead.
+ (hg-fetch-builder hg hash-algo)
#:leaked-env-vars '("http_proxy" "https_proxy"
"LC_ALL" "LC_MESSAGES" "LANG"
"COLUMNS")
- #:env-vars (match (getenv "GUIX_DOWNLOAD_METHODS")
- (#f '())
- (value
- `(("GUIX_DOWNLOAD_METHODS" . ,value))))
+ #:env-vars
+ `(("hg ref url" . ,(hg-reference-url ref))
+ ("hg ref changeset" . ,(hg-reference-changeset ref))
+ ;; To avoid pulling in (guix base32) in the builder
+ ;; script, use bytevector->u8-list from (rnrs
+ ;; bytevectors)
+ ("hash" . ,(string-join
+ (map number->string
+ (bytevector->u8-list hash))
+ ","))
+ ,@(match (getenv "GUIX_DOWNLOAD_METHODS")
+ (#f '())
+ (value
+ `(("GUIX_DOWNLOAD_METHODS" . ,value)))))
#:system system
#:local-build? #t ;don't offload repo cloning
#:hash-algo hash-algo
--
2.41.0
Christopher Baines wrote on 11 May 18:40 +0200
[PATCH 1/4] svn-download: Reduce svn-multi-fetch builder duplication.
(address . 70878@debbugs.gnu.org)
c7bf7c56001dc8607df4b4287e78dcb611df9ecd.1715445642.git.mail@cbaines.net
Rather than creating a different builder in the store for every different
download (by hash), remove the hash from the builder and pass it in via an
environment variable. This means that when svn-multi-fetch is used by two
different package sources, the derivations will still differ but the builder
will be shared.

I think it used to be this way, but probably changed with
0e73f933b291c7e154c7e019b6de1e2f3a97e4c1. I noticed the hash in the builder
script when wondering why the build coordinator on bayfront was substituting
svn-multi-download files over and over again.

To try to make the effects of introducing variance into the builder script
more obvious, separate it out into its own procedure, so that it's clearer
when there's new data going in that could cause variance.

* guix/svn-download.scm (svn-multi-fetch): Extract out builder script, include
hash in the derivation as an environment variable and update comment to be
more directive.
(svn-multi-fetch-builder): New procedure.

Change-Id: I83c60140ae09e189ee5e5428038a9428ecb8e281
---
guix/svn-download.scm | 151 +++++++++++++++++++++++-------------------
1 file changed, 83 insertions(+), 68 deletions(-)

diff --git a/guix/svn-download.scm b/guix/svn-download.scm
index bdd9c39eb5..812a46c9d4 100644
--- a/guix/svn-download.scm
+++ b/guix/svn-download.scm
@@ -30,6 +30,7 @@ (define-module (guix svn-download)
#:use-module ((guix build utils) #:select (mkdir-p))
#:use-module (ice-9 match)
#:use-module (srfi srfi-1)
+ #:use-module (rnrs bytevectors)
#:export (svn-reference
svn-reference?
svn-reference-url
@@ -179,14 +180,7 @@ (define-record-type* <svn-multi-reference>
(user-name svn-multi-reference-user-name (default #f))
(password svn-multi-reference-password (default #f)))
-(define* (svn-multi-fetch ref hash-algo hash
- #:optional name
- #:key (system (%current-system)) (guile (default-guile))
- (svn (subversion-package)))
- "Return a fixed-output derivation that fetches REF, a <svn-multi-reference>
-object. The output is expected to have recursive hash HASH of type
-HASH-ALGO (a symbol). Use NAME as the file name, or a generic name if #f."
-
+(define (svn-multi-fetch-builder svn hash-algo)
(define guile-json
(module-ref (resolve-interface '(gnu packages guile)) 'guile-json-4))
@@ -202,69 +196,83 @@ (define* (svn-multi-fetch ref hash-algo hash
(module-ref (resolve-interface '(gnu packages base))
'tar)))
- (define build
- (with-imported-modules
- (source-module-closure '((guix build svn)
- (guix build download)
- (guix build download-nar)
- (guix build utils)
- (guix swh)))
- (with-extensions (list guile-json guile-gnutls ;for (guix swh)
- guile-lzlib)
- #~(begin
- (use-modules (guix build svn)
- (guix build utils)
- ((guix build download)
- #:select (download-method-enabled?))
- (guix build download-nar)
- (guix swh)
- (srfi srfi-1)
- (ice-9 match))
+ (with-imported-modules
+ (source-module-closure '((guix build svn)
+ (guix build download)
+ (guix build download-nar)
+ (guix build utils)
+ (guix swh)))
+ (with-extensions (list guile-json guile-gnutls ;for (guix swh)
+ guile-lzlib)
+ #~(begin
+ (use-modules (guix build svn)
+ (guix build utils)
+ ((guix build download)
+ #:select (download-method-enabled?))
+ (guix build download-nar)
+ (guix swh)
+ (srfi srfi-1)
+ (ice-9 match)
+ (rnrs bytevectors))
- ;; Add tar and gzip to $PATH so
- ;; 'swh-download-directory-by-nar-hash' can invoke them.
- (set-path-environment-variable "PATH" '("bin") '(#+@tar+gzip))
+ ;; Add tar and gzip to $PATH so
+ ;; 'swh-download-directory-by-nar-hash' can invoke them.
+ (set-path-environment-variable "PATH" '("bin") '(#+@tar+gzip))
- (or (every
- (lambda (location)
- ;; The directory must exist if we are to fetch only a
- ;; single file.
- (unless (string-suffix? "/" location)
- (mkdir-p (string-append #$output "/" (dirname location))))
- (and (download-method-enabled? 'upstream)
- (svn-fetch (string-append (getenv "svn url") "/" location)
- (string->number (getenv "svn revision"))
- (if (string-suffix? "/" location)
- (string-append #$output "/" location)
- (string-append #$output "/" (dirname location)))
- #:svn-command #+(file-append svn "/bin/svn")
- #:recursive? (match (getenv "svn recursive?")
- ("yes" #t)
- (_ #f))
- #:user-name (getenv "svn user name")
- #:password (getenv "svn password"))))
- (call-with-input-string (getenv "svn locations")
- read))
- (begin
- (when (file-exists? #$output)
- (delete-file-recursively #$output))
- (or (and (download-method-enabled? 'nar)
- (download-nar #$output))
- (and (download-method-enabled? 'swh)
- ;; SWH keeps HASH as an ExtID for the combination
- ;; of files/directories, which allows us to
- ;; retrieve the entire combination at once:
- ;; <https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/5263>.
- (parameterize ((%verify-swh-certificate? #f))
- (swh-download-directory-by-nar-hash
- #$hash '#$hash-algo #$output))))))))))
+ (or (every
+ (lambda (location)
+ ;; The directory must exist if we are to fetch only a
+ ;; single file.
+ (unless (string-suffix? "/" location)
+ (mkdir-p (string-append #$output "/" (dirname location))))
+ (and (download-method-enabled? 'upstream)
+ (svn-fetch (string-append (getenv "svn url") "/" location)
+ (string->number (getenv "svn revision"))
+ (if (string-suffix? "/" location)
+ (string-append #$output "/" location)
+ (string-append #$output "/" (dirname location)))
+ #:svn-command #+(file-append svn "/bin/svn")
+ #:recursive? (match (getenv "svn recursive?")
+ ("yes" #t)
+ (_ #f))
+ #:user-name (getenv "svn user name")
+ #:password (getenv "svn password"))))
+ (call-with-input-string (getenv "svn locations")
+ read))
+ (begin
+ (when (file-exists? #$output)
+ (delete-file-recursively #$output))
+ (or (and (download-method-enabled? 'nar)
+ (download-nar #$output))
+ (and (download-method-enabled? 'swh)
+ ;; SWH keeps HASH as an ExtID for the combination
+ ;; of files/directories, which allows us to
+ ;; retrieve the entire combination at once:
+ ;; <https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/5263>.
+ (parameterize ((%verify-swh-certificate? #f))
+ (swh-download-directory-by-nar-hash
+ (u8-list->bytevector
+ (map string->number
+ (string-split (getenv "hash") #\,)))
+ '#$hash-algo
+ #$output))))))))))
+(define* (svn-multi-fetch ref hash-algo hash
+ #:optional name
+ #:key (system (%current-system)) (guile (default-guile))
+ (svn (subversion-package)))
+ "Return a fixed-output derivation that fetches REF, a <svn-multi-reference>
+object. The output is expected to have recursive hash HASH of type
+HASH-ALGO (a symbol). Use NAME as the file name, or a generic name if #f."
(mlet %store-monad ((guile (package->derivation guile system)))
- (gexp->derivation (or name "svn-checkout") build
-
- ;; Use environment variables and a fixed script name so
- ;; there's only one script in store for all the
- ;; downloads.
+ (gexp->derivation (or name "svn-checkout")
+ ;; Avoid the builder differing for every single use, as
+ ;; having fewer builders is more efficient for computing
+ ;; derivations.
+ ;;
+ ;; Don't pass package-specific data into the following
+ ;; procedure; use #:env-vars below instead.
+ (svn-multi-fetch-builder svn hash-algo)
#:script-name "svn-multi-download"
#:env-vars
`(("svn url" . ,(svn-multi-reference-url ref))
@@ -286,7 +294,14 @@ (define* (svn-multi-fetch ref hash-algo hash
,@(match (getenv "GUIX_DOWNLOAD_METHODS")
(#f '())
(value
- `(("GUIX_DOWNLOAD_METHODS" . ,value)))))
+ `(("GUIX_DOWNLOAD_METHODS" . ,value))))
+ ;; To avoid pulling in (guix base32) in the builder
+ ;; script, use bytevector->u8-list from (rnrs
+ ;; bytevectors)
+ ("hash" . ,(string-join
+ (map number->string
+ (bytevector->u8-list hash))
+ ",")))
#:leaked-env-vars '("http_proxy" "https_proxy"
"LC_ALL" "LC_MESSAGES" "LANG"

base-commit: 9288654773a110156e0bb6fc703a9c24f5bfc527
--
2.41.0
Christopher Baines wrote on 11 May 18:40 +0200
[PATCH 4/4] git-download: Reduce builder duplication.
(address . 70878@debbugs.gnu.org)
40494278a13c25dc8d5bea008b20d62162ac8d2e.1715445642.git.mail@cbaines.net
Rather than creating a different builder in the store for every different
download (by hash), remove the hash from the builder and pass it in via an
environment variable. This means that when git-fetch is used by two different
package sources, the derivations will still differ but the builder will be
shared.

I think it used to be this way, but probably changed with
264fdbcaff9c078642355bace0c61c094b3581fc. I noticed this while looking at
the same problem with svn-multi-fetch.

To try to make the effects of introducing variance into the builder script
more obvious, separate it out into its own procedure, so that it's clearer
when there's new data going in that could cause variance.

* guix/git-download.scm (git-fetch/in-band*): Extract out builder script,
include hash in the derivation as an environment variable and update the
comment to be more directive.
(git-fetch-builder): New procedure.

Change-Id: I59c9fc445667c0e7dc44bcb706818300c394a1e5
---
guix/git-download.scm | 123 ++++++++++++++++++++++++------------------
1 file changed, 70 insertions(+), 53 deletions(-)

diff --git a/guix/git-download.scm b/guix/git-download.scm
index d26a814e07..ce40701563 100644
--- a/guix/git-download.scm
+++ b/guix/git-download.scm
@@ -48,6 +48,7 @@ (define-module (guix git-download)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-34)
#:use-module (srfi srfi-35)
+ #:use-module (rnrs bytevectors)
#:export (git-reference
git-reference?
git-reference-url
@@ -86,20 +87,13 @@ (define (git-lfs-package)
(let ((distro (resolve-interface '(gnu packages version-control))))
(module-ref distro 'git-lfs)))
-(define* (git-fetch/in-band* ref hash-algo hash
- #:optional name
- #:key (system (%current-system))
- (guile (default-guile))
- (git (git-package))
- git-lfs)
- "Shared implementation code for git-fetch/in-band & friends. Refer to their
-respective documentation."
+(define (git-fetch-builder git git-lfs git-ref-recursive? hash-algo)
(define inputs
`(,(or git (git-package))
,@(if git-lfs
(list git-lfs)
'())
- ,@(if (git-reference-recursive? ref)
+ ,@(if git-ref-recursive?
;; TODO: remove (standard-packages) after
;; 48e528a26f9c019eeaccf5e3de3126aa02c98d3b is merged into master;
;; currently when doing 'git clone --recursive', we need sed, grep,
@@ -132,59 +126,82 @@ (define* (git-fetch/in-band* ref hash-algo hash
(source-module-closure '((guix build git)
(guix build utils)))))
- (define build
- (with-imported-modules modules
- (with-extensions (list guile-json gnutls ;for (guix swh)
- guile-lzlib)
- #~(begin
- (use-modules (guix build git)
- ((guix build utils)
- #:select (set-path-environment-variable))
- (ice-9 match))
-
- (define lfs?
- (call-with-input-string (getenv "git lfs?") read))
-
- (define recursive?
- (call-with-input-string (getenv "git recursive?") read))
-
- ;; Let Guile interpret file names as UTF-8, otherwise
- ;; 'delete-file-recursively' might fail to delete all of
- ;; '.git'--see <https://issues.guix.gnu.org/54893>.
- (setenv "GUIX_LOCPATH"
- #+(file-append glibc-locales "/lib/locale"))
- (setlocale LC_ALL "en_US.utf8")
-
- ;; The 'git submodule' commands expects Coreutils, sed, grep,
- ;; etc. to be in $PATH. This also ensures that git extensions are
- ;; found.
- (set-path-environment-variable "PATH" '("bin") '#+inputs)
-
- (setvbuf (current-output-port) 'line)
- (setvbuf (current-error-port) 'line)
-
- (git-fetch-with-fallback (getenv "git url") (getenv "git commit")
- #$output
- #:hash #$hash
- #:hash-algorithm '#$hash-algo
- #:lfs? lfs?
- #:recursive? recursive?
- #:git-command "git")))))
+ (with-imported-modules modules
+ (with-extensions (list guile-json gnutls ;for (guix swh)
+ guile-lzlib)
+ #~(begin
+ (use-modules (guix build git)
+ ((guix build utils)
+ #:select (set-path-environment-variable))
+ (ice-9 match)
+ (rnrs bytevectors))
+
+ (define lfs?
+ (call-with-input-string (getenv "git lfs?") read))
+
+ (define recursive?
+ (call-with-input-string (getenv "git recursive?") read))
+
+ ;; Let Guile interpret file names as UTF-8, otherwise
+ ;; 'delete-file-recursively' might fail to delete all of
+ ;; '.git'--see <https://issues.guix.gnu.org/54893>.
+ (setenv "GUIX_LOCPATH"
+ #+(file-append glibc-locales "/lib/locale"))
+ (setlocale LC_ALL "en_US.utf8")
+
+ ;; The 'git submodule' commands expects Coreutils, sed, grep,
+ ;; etc. to be in $PATH. This also ensures that git extensions are
+ ;; found.
+ (set-path-environment-variable "PATH" '("bin") '#+inputs)
+
+ (setvbuf (current-output-port) 'line)
+ (setvbuf (current-error-port) 'line)
+
+ (git-fetch-with-fallback (getenv "git url") (getenv "git commit")
+ #$output
+ #:hash (u8-list->bytevector
+ (map
+ string->number
+ (string-split (getenv "hash") #\,)))
+ #:hash-algorithm '#$hash-algo
+ #:lfs? lfs?
+ #:recursive? recursive?
+ #:git-command "git")))))
+(define* (git-fetch/in-band* ref hash-algo hash
+ #:optional name
+ #:key (system (%current-system))
+ (guile (default-guile))
+ (git (git-package))
+ git-lfs)
+ "Shared implementation code for git-fetch/in-band & friends. Refer to their
+respective documentation."
(mlet %store-monad ((guile (package->derivation (or guile (default-guile))
system)))
- (gexp->derivation (or name "git-checkout") build
-
- ;; Use environment variables and a fixed script name so
- ;; there's only one script in store for all the
- ;; downloads.
+ (gexp->derivation (or name "git-checkout")
+ ;; Avoid the builder differing for every single use, as
+ ;; having fewer builders is more efficient for computing
+ ;; derivations.
+ ;;
+ ;; Don't pass package-specific data into the following
+ ;; procedure; use #:env-vars below instead.
+ (git-fetch-builder git git-lfs
+ (git-reference-recursive? ref)
+ hash-algo)
#:script-name "git-download"
#:env-vars
`(("git url" . ,(git-reference-url ref))
("git commit" . ,(git-reference-commit ref))
("git recursive?" . ,(object->string
(git-reference-recursive? ref)))
- ("git lfs?" . ,(if git-lfs "#t" "#f")))
+ ("git lfs?" . ,(if git-lfs "#t" "#f"))
+ ;; To avoid pulling in (guix base32) in the builder
+ ;; script, use bytevector->u8-list from (rnrs
+ ;; bytevectors)
+ ("hash" . ,(string-join
+ (map number->string
+ (bytevector->u8-list hash))
+ ",")))
#:leaked-env-vars '("http_proxy" "https_proxy"
"LC_ALL" "LC_MESSAGES" "LANG"
"COLUMNS")
--
2.41.0
Christopher Baines wrote on 11 Jun 13:05 +0200
Re: [bug#70878] [PATCH 0/4] Reduce download builder duplication.
(address . 70878-done@debbugs.gnu.org)
871q53buy9.fsf@cbaines.net

I've tweaked the commit messages a little and pushed these patches to
master as 0daa72e34d7fafc927e2d476ef613c582107781d.

Closed

This issue is archived.

To comment on this conversation send an email to 70878@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 70878
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch