[core-updates] Compress man pages using zstd

  • Done
Details
2 participants
  • Ludovic Courtès
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal


Maxim Cournoyer wrote 1 year ago
[PATCH 0/5] Compress man pages using zstd
(address . guix-patches@gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
cover.1704381395.git.maxim.cournoyer@gmail.com
This series changes the compressor of our man pages from gzip to zstd, which
decompresses much faster, and compresses better at the chosen level (19).


Maxim Cournoyer (5):
utils: Lower xz compression memory usage limit to 20%.
compression: Enable zstd parallel compression.
packages: Repack patched source archives via zstd by default.
build: gnu-build-system: Compress man pages with zstd.
man-db: Add support for zstd compressed man pages.

gnu/compression.scm | 3 +-
gnu/packages/commencement.scm | 3 +-
guix/build/gnu-build-system.scm | 71 +++++++++++++++++++++------------
guix/build/utils.scm | 3 +-
guix/man-db.scm | 45 ++++++++++++++++-----
guix/packages.scm | 51 +++++++++++++----------
guix/profiles.scm | 8 +++-
7 files changed, 122 insertions(+), 62 deletions(-)


base-commit: 784a7e8da6456e6388e2bfc213e93e252eb2be40
--
2.41.0
Maxim Cournoyer wrote 1 year ago
[PATCH 1/5] utils: Lower xz compression memory usage limit to 20%.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
148980a1febb6921bd1b02da6c6d5b9defecd50a.1704386901.git.maxim.cournoyer@gmail.com
Out-of-memory errors sometimes occurred on the Berlin build farm, especially
on i686 or ARM machines, which have less memory.

* guix/build/utils.scm (%xz-parallel-args): Reduce --memlimit value from 50%
to 20%.

Change-Id: If848bed92ef4c42d11a96057e59ee51a019d0573
---

guix/build/utils.scm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index 8e630ad586..e87066cc02 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -186,7 +186,7 @@ (define (tarball? file-name)
(define (%xz-parallel-args)
"The xz arguments required to enable bit-reproducible, multi-threaded
compression."
- (list "--memlimit=50%"
+ (list "--memlimit=20%"
(format #f "--threads=~a" (max 2 (parallel-job-count)))))
--
2.41.0
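A standalone sketch of what '%xz-parallel-args' returns after this change may help when reviewing the new limit. The 'job-count' parameter is an assumption, standing in for Guix's '(parallel-job-count)' so that the procedure runs outside a build environment; the name 'xz-parallel-args' is likewise hypothetical.

```scheme
;; Hypothetical standalone version of %xz-parallel-args after this patch.
;; JOB-COUNT replaces (parallel-job-count), which is only meaningful
;; inside a build.
(define (xz-parallel-args job-count)
  ;; Cap xz memory usage at 20% of RAM, and always request at least two
  ;; threads, mirroring the (max 2 ...) call in the patch.
  (list "--memlimit=20%"
        (format #f "--threads=~a" (max 2 job-count))))

(xz-parallel-args 1)   ;=> ("--memlimit=20%" "--threads=2")
(xz-parallel-args 8)   ;=> ("--memlimit=20%" "--threads=8")
```

Even with a single build job, the thread count never drops below two; only the memory limit changes in this patch.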
Maxim Cournoyer wrote 1 year ago
[PATCH 2/5] compression: Enable zstd parallel compression.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
bb042c21756c9c20a59680067b1c45ff2a17d16c.1704386901.git.maxim.cournoyer@gmail.com
* gnu/compression.scm (%compressors) [zstd]: Provide the --threads argument.

Change-Id: I4e8dfe725d1b0721c0016c3013b9e609fee94367
---

gnu/compression.scm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gnu/compression.scm b/gnu/compression.scm
index 0418e80a15..6e48de5979 100644
--- a/gnu/compression.scm
+++ b/gnu/compression.scm
@@ -56,7 +56,8 @@ (define %compressors
;; The default level 3 compresses better than gzip in a
;; fraction of the time, while the highest level 19
;; (de)compresses more slowly and worse than xz.
- #~(list #+(file-append zstd "/bin/zstd") "-3"))
+ #~(list #+(file-append zstd "/bin/zstd") "-3"
+ (format #f "--threads=~a" (parallel-job-count))))
(compressor "none" "" #f)))
(define (lookup-compressor name)
--
2.41.0
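For comparison with the xz arguments, here is a sketch of the zstd command line that the '%compressors' entry builds after this patch. 'zstd-program' is an assumed stand-in for the store file name produced by '(file-append zstd "/bin/zstd")', 'job-count' stands in for '(parallel-job-count)', and the procedure name is hypothetical.

```scheme
;; Hypothetical standalone version of the zstd compressor command.
;; ZSTD-PROGRAM replaces the store file name of the zstd binary;
;; JOB-COUNT replaces (parallel-job-count).
(define (zstd-compressor-command zstd-program job-count)
  ;; Level 3 is zstd's default; --threads requests one worker per job.
  (list zstd-program "-3"
        (format #f "--threads=~a" job-count)))

(zstd-compressor-command "zstd" 4)   ;=> ("zstd" "-3" "--threads=4")
```

Unlike '%xz-parallel-args', no minimum of two threads is enforced here; the job count is passed through unchanged.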
Maxim Cournoyer wrote 1 year ago
[PATCH 3/5] packages: Repack patched source archives via zstd by default.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
731e80fc6d38e18709f359ea2f982e9b302b2864.1704386901.git.maxim.cournoyer@gmail.com
* guix/build/utils.scm (compressor): Register zst file name extension.
* guix/packages.scm (%standard-patch-inputs): Add zstd.
(patch-and-repack): Rename tarxz-name nested procedure to tar-file-name, and
accept a new 'ext' argument; adjust accordingly. Add zstd binding, and
replace the XZ_DEFAULTS environment variable with ZSTD_NBTHREADS. Fall back to
xz when zstd is not available.

Change-Id: I614a6be8c87a4a0858eadce616c51d8e9b9fc020
---

guix/build/utils.scm | 1 +
guix/packages.scm | 50 +++++++++++++++++++++++++-------------------
2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index e87066cc02..9c1e19f6d8 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -177,6 +177,7 @@ (define (compressor file-name)
((string-suffix? "lz" file-name) "lzip")
((string-suffix? "zip" file-name) "unzip")
((string-suffix? "xz" file-name) "xz")
+ ((string-suffix? "zst" file-name) "zstd")
(else #f))) ;no compression used/unknown file extension
(define (tarball? file-name)
diff --git a/guix/packages.scm b/guix/packages.scm
index cb8db925f8..ce1ba7c53a 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -5,7 +5,7 @@
;;; Copyright © 2016 Alex Kost <alezost@gmail.com>
;;; Copyright © 2017, 2019, 2020, 2022 Efraim Flashner <efraim@flashner.co.il>
;;; Copyright © 2019 Marius Bakke <mbakke@fastmail.com>
-;;; Copyright © 2020, 2021 Maxim Cournoyer <maxim.cournoyer@gmail.com>
+;;; Copyright © 2020, 2021, 2024 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;; Copyright © 2021 Chris Marusich <cmmarusich@gmail.com>
;;; Copyright © 2022 Maxime Devos <maximedevos@telenet.be>
;;; Copyright © 2022 jgart <jgart@dismail.de>
@@ -862,6 +862,7 @@ (define (%standard-patch-inputs system)
(module-ref (resolve-interface module) var))))))
`(("tar" ,(ref '(gnu packages base) 'tar))
("xz" ,(ref '(gnu packages compression) 'xz))
+ ("zstd" ,(ref '(gnu packages compression) 'zstd))
("bzip2" ,(ref '(gnu packages compression) 'bzip2))
("gzip" ,(ref '(gnu packages compression) 'gzip))
("lzip" ,(ref '(gnu packages compression) 'lzip))
@@ -926,31 +927,35 @@ (define* (patch-and-repack source patches
;; Return true if DIRECTORY is a checkout (git, svn, etc).
(string-suffix? "-checkout" directory))
- (define (tarxz-name file-name)
- ;; Return a '.tar.xz' file name based on FILE-NAME.
+ (define (tar-file-name file-name ext)
+ ;; Return a '$filename.tar.$ext' file name based on FILE-NAME and EXT.
(let ((base (if (numeric-extension? file-name)
original-file-name
(file-sans-extension file-name))))
(string-append base
(if (equal? (file-extension base) "tar")
- ".xz"
- ".tar.xz"))))
+ (string-append "." ext)
+ (string-append ".tar." ext)))))
(define instantiate-patch
(match-lambda
- ((? string? patch) ;deprecated
+ ((? string? patch) ;deprecated
(local-file patch #:recursive? #t))
- ((? struct? patch) ;origin, local-file, etc.
+ ((? struct? patch) ;origin, local-file, etc.
patch)))
- (let ((tar (lookup-input "tar"))
- (gzip (lookup-input "gzip"))
- (bzip2 (lookup-input "bzip2"))
- (lzip (lookup-input "lzip"))
- (xz (lookup-input "xz"))
- (patch (lookup-input "patch"))
- (comp (and=> (compressor source-file-name) lookup-input))
- (patches (map instantiate-patch patches)))
+ (let* ((tar (lookup-input "tar"))
+ (gzip (lookup-input "gzip"))
+ (bzip2 (lookup-input "bzip2"))
+ (lzip (lookup-input "lzip"))
+ (xz (lookup-input "xz"))
+ (zstd (or (lookup-input "zstd")
+ ;; Fallback to xz in case zstd is not available, such as
+ ;; for bootstrap packages.
+ xz))
+ (patch (lookup-input "patch"))
+ (comp (and=> (compressor source-file-name) lookup-input))
+ (patches (map instantiate-patch patches)))
(define build
(with-imported-modules '((guix build utils))
#~(begin
@@ -1028,12 +1033,12 @@ (define* (patch-and-repack source patches
locale (system-error-errno args)))))
(setenv "PATH"
- (string-append #+xz "/bin"
+ (string-append #+zstd "/bin"
(if #+comp
(string-append ":" #+comp "/bin")
"")))
- (setenv "XZ_DEFAULTS" (string-join (%xz-parallel-args)))
+ (setenv "ZSTD_NBTHREADS" (number->string (parallel-job-count)))
;; SOURCE may be either a directory, a tarball or a simple file.
(let ((name (strip-store-file-name #+source))
@@ -1088,10 +1093,13 @@ (define* (patch-and-repack source patches
(else ;single uncompressed file
(copy-file file #$output)))))))
- (let ((name (if (or (checkout? original-file-name)
- (not (compressor original-file-name)))
- original-file-name
- (tarxz-name original-file-name))))
+ (let* ((ext (if zstd
+ "zst" ;usual case
+ "xz")) ;zstd-less bootstrap-origin
+ (name (if (or (checkout? original-file-name)
+ (not (compressor original-file-name)))
+ original-file-name
+ (tar-file-name original-file-name ext))))
(gexp->derivation name build
#:graft? #f
#:system system
--
2.41.0
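The file name logic in this patch can be exercised in isolation with the sketch below. 'compressor*', 'file-extension*' and 'file-sans-extension*' are simplified, hypothetical stand-ins for the Guix procedures of similar purpose (the suffix table is only a subset), and the 'numeric-extension?' special case is omitted. One detail that appears in the diff: since 'zstd' is bound to '(or (lookup-input "zstd") xz)', the later test '(if zstd "zst" "xz")' seems true whenever xz is available, so the hypothetical 'repack-extension' below decides from the zstd lookup result itself.

```scheme
;; Simplified stand-ins for the repacking file name logic; these are
;; not the real Guix implementations.

(define (compressor* file-name)
  ;; Map a file name suffix to the program handling it (subset only).
  (cond ((string-suffix? "gz" file-name) "gzip")
        ((string-suffix? "bz2" file-name) "bzip2")
        ((string-suffix? "xz" file-name) "xz")
        ((string-suffix? "zst" file-name) "zstd")
        (else #f)))

(define (file-extension* file-name)
  ;; Text after the last dot, or #f when there is no dot.
  (let ((dot (string-index-right file-name #\.)))
    (and dot (substring file-name (+ dot 1)))))

(define (file-sans-extension* file-name)
  ;; FILE-NAME without its last extension.
  (let ((dot (string-index-right file-name #\.)))
    (if dot (substring file-name 0 dot) file-name)))

(define (tar-file-name* file-name ext)
  ;; Return BASE.tar.EXT, reusing an existing ".tar" component of BASE.
  (let ((base (file-sans-extension* file-name)))
    (string-append base
                   (if (equal? (file-extension* base) "tar")
                       (string-append "." ext)
                       (string-append ".tar." ext)))))

(define (repack-extension zstd-found?)
  ;; ZSTD-FOUND? is the result of looking up zstd alone, before any
  ;; fallback to xz has been applied.
  (if zstd-found? "zst" "xz"))

(tar-file-name* "hello-2.12.tar.gz" "zst")   ;=> "hello-2.12.tar.zst"
(tar-file-name* "foo.tgz" "zst")             ;=> "foo.tar.zst"
```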
Maxim Cournoyer wrote 1 year ago
[PATCH 4/5] build: gnu-build-system: Compress man pages with zstd.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
6425d5767b4ca53ed6de612c0f77e3d6a872af51.1704386901.git.maxim.cournoyer@gmail.com
The aim is to speed up computing the man page database, which requires
decompressing the man pages. Zstd is faster than gzip, especially for
decompression, and has a similar compression ratio.

* gnu/packages/commencement.scm (%final-inputs): Add zstd.
* guix/build/gnu-build-system.scm
(compress-documentation): Update doc.
<info-compressor, info-compressor-flags, man-compressor, man-compressor-flags>
<man-compressor-file-extension>: New arguments.
<compressed-documentation-extension>: Rename argument to...
<info-compressor-file-extension>: ... this. Add an 'extension' argument to
the retarget-symlink nested procedure. Use new arguments in nested
'maybe-compress' procedure.

Change-Id: Ibaad4658f8e5151633714d263d9198f56d255020
---

gnu/packages/commencement.scm | 3 +-
guix/build/gnu-build-system.scm | 73 +++++++++++++++++++++------------
2 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/gnu/packages/commencement.scm b/gnu/packages/commencement.scm
index ae1c91f0d0..51c26339ef 100644
--- a/gnu/packages/commencement.scm
+++ b/gnu/packages/commencement.scm
@@ -3492,7 +3492,8 @@ (define-public %final-inputs
(native-inputs
(list (if (target-hurd?)
glibc-utf8-locales-final/hurd
- glibc-utf8-locales-final)))))))
+ glibc-utf8-locales-final)))))
+ ("zstd" ,zstd)))
("sed" ,sed-final)
("grep" ,grep-final)
("xz" ,xz-final)
diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index 51b8f9acbf..ff9b123ae6 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -2,7 +2,7 @@
;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2018 Mark H Weaver <mhw@netris.org>
;;; Copyright © 2020 Brendan Tildesley <mail@brendan.scot>
-;;; Copyright © 2021 Maxim Cournoyer <maxim.cournoyer@gmail.com>
+;;; Copyright © 2021, 2022 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -644,21 +644,36 @@ (define* (reset-gzip-timestamps #:key outputs #:allow-other-keys)
(((names . directories) ...)
(for-each process-directory directories))))
-(define* (compress-documentation #:key outputs
+(define* (compress-documentation #:key
+ outputs
(compress-documentation? #t)
- (documentation-compressor "gzip")
- (documentation-compressor-flags
+ (info-compressor "gzip")
+ (info-compressor-flags
'("--best" "--no-name"))
- (compressed-documentation-extension ".gz")
+ (info-compressor-file-extension ".gz")
+ (man-compressor (if (which "zstd")
+ "zstd"
+ info-compressor))
+ (man-compressor-flags
+ (if (which "zstd")
+ (list "-19" "--rm"
+ "--threads" (string->number
+ (parallel-job-count)))
+ info-compressor-flags))
+ (man-compressor-file-extension
+ (if (which "zstd")
+ ".zst"
+ info-compressor-file-extension))
#:allow-other-keys)
- "When COMPRESS-DOCUMENTATION? is true, compress man pages and Info files
-found in OUTPUTS using DOCUMENTATION-COMPRESSOR, called with
-DOCUMENTATION-COMPRESSOR-FLAGS."
- (define (retarget-symlink link)
+ "When COMPRESS-INFO-MANUALS? is true, compress Info files found in OUTPUTS
+using INFO-COMPRESSOR, called with INFO-COMPRESSOR-FLAGS. Similarly, when
+COMPRESS-MAN-PAGES? is true, compress man pages files found in OUTPUTS using
+MAN-COMPRESSOR, using MAN-COMPRESSOR-FLAGS."
+ (define (retarget-symlink link extension)
(let ((target (readlink link)))
(delete-file link)
- (symlink (string-append target compressed-documentation-extension)
- (string-append link compressed-documentation-extension))))
+ (symlink (string-append target extension)
+ (string-append link extension))))
(define (has-links? file)
;; Return #t if FILE has hard links.
@@ -676,23 +691,23 @@ (define* (compress-documentation #:key outputs
(symbolic-link? target-absolute))
(lambda args
(if (= ENOENT (system-error-errno args))
- (begin
- (format (current-error-port)
- "The symbolic link '~a' target is missing: '~a'\n"
- symlink target-absolute)
- #f)
+ (format (current-error-port)
+ "The symbolic link '~a' target is missing: '~a'\n"
+ symlink target-absolute)
(apply throw args))))))
- (define (maybe-compress-directory directory regexp)
+ (define (maybe-compress-directory directory regexp
+ compressor
+ compressor-flags
+ compressor-extension)
(when (directory-exists? directory)
(match (find-files directory regexp)
- (() ;nothing to compress
+ (() ;nothing to compress
#t)
- ((files ...) ;one or more files
+ ((files ...) ;one or more files
(format #t
"compressing documentation in '~a' with ~s and flags ~s~%"
- directory documentation-compressor
- documentation-compressor-flags)
+ directory compressor compressor-flags)
(call-with-values
(lambda ()
(partition symbolic-link? files))
@@ -702,20 +717,26 @@ (define* (compress-documentation #:key outputs
;; unchanged ('gzip' would refuse to compress them anyway.)
;; Also, do not retarget symbolic links pointing to other
;; symbolic links, since these are not compressed.
- (for-each retarget-symlink
+ (for-each (cut retarget-symlink <> compressor-extension)
(filter (lambda (symlink)
(and (not (points-to-symlink? symlink))
(string-match regexp symlink)))
symlinks))
- (apply invoke documentation-compressor
- (append documentation-compressor-flags
+ (apply invoke compressor
+ (append compressor-flags
(remove has-links? regular-files)))))))))
(define (maybe-compress output)
(maybe-compress-directory (string-append output "/share/man")
- "\\.[0-9]+$")
+ "\\.[0-9]+$"
+ man-compressor
+ man-compressor-flags
+ man-compressor-file-extension)
(maybe-compress-directory (string-append output "/share/info")
- "\\.info(-[0-9]+)?$"))
+ "\\.info(-[0-9]+)?$"
+ info-compressor
+ info-compressor-flags
+ info-compressor-file-extension))
(if compress-documentation?
(match outputs
--
2.41.0
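The defaulting of the new man page arguments, and the symlink retargeting that now takes an explicit extension, can be sketched as pure procedures. 'zstd-available?' is an assumed stand-in for '(which "zstd")' and 'job-count' for '(parallel-job-count)'; both procedure names are hypothetical. Because 'parallel-job-count' returns an integer, the thread count must go through 'number->string'; the v1 'string->number' call would raise a type error.

```scheme
;; Hypothetical pure versions of the man page compressor defaults and of
;; retarget-symlink.  ZSTD-AVAILABLE? replaces (which "zstd"); JOB-COUNT
;; replaces (parallel-job-count).

(define (man-compressor-settings zstd-available? job-count)
  ;; Return (COMPRESSOR FLAGS EXTENSION), mirroring the patch defaults.
  (if zstd-available?
      (list "zstd"
            ;; The thread count is passed as a string argument.
            (list "-19" "--rm" "--threads" (number->string job-count))
            ".zst")
      ;; Without zstd, fall back to the Info manual settings.
      (list "gzip" '("--best" "--no-name") ".gz")))

(define (retargeted-symlink link target extension)
  ;; Return (NEW-TARGET . NEW-LINK): once TARGET is compressed, the link
  ;; LINK -> TARGET becomes LINK+EXTENSION -> TARGET+EXTENSION.
  (cons (string-append target extension)
        (string-append link extension)))

(man-compressor-settings #t 4)
;=> ("zstd" ("-19" "--rm" "--threads" "4") ".zst")
(retargeted-symlink "man1/gunzip.1" "gzip.1" ".zst")
;=> ("gzip.1.zst" . "man1/gunzip.1.zst")
```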
Maxim Cournoyer wrote 1 year ago
[PATCH 5/5] man-db: Add support for zstd compressed man pages.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
44d6f0f8471ad290a78c102228352481a131f60f.1704386901.git.maxim.cournoyer@gmail.com
* guix/man-db.scm (<mandb-entry>): Adjust comment.
(abbreviate-file-name): Adjust regexp.
(gzip-compressed?, zstd-compressed?): New predicates.
(entry->string): Use them.
(man-page->entry): Adjust doc. Use input port reader appropriate to the
compression type, if any.
(man-files): Adjust regexp.
(mandb-entries): Adjust link resolving predicate.
* guix/profiles.scm (manual-database): Add guile-zstd extension.

Change-Id: I6336e46e2d324c520a7d15d6cafd12bbf43c5b09
---

guix/man-db.scm | 45 +++++++++++++++++++++++++++++++++++----------
guix/profiles.scm | 8 ++++++--
2 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/guix/man-db.scm b/guix/man-db.scm
index 7d9707a592..12887ce400 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -1,5 +1,6 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2017, 2018 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2022, 2024 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -18,6 +19,7 @@
(define-module (guix man-db)
#:use-module (zlib)
+ #:use-module (zstd)
#:use-module ((guix build utils) #:select (find-files))
#:use-module (gdbm) ;gdbm-ffi
#:use-module (srfi srfi-9)
@@ -48,7 +50,7 @@ (define-module (guix man-db)
(define-record-type <mandb-entry>
(mandb-entry file-name name section synopsis kind)
mandb-entry?
- (file-name mandb-entry-file-name) ;e.g., "../abiword.1.gz"
+ (file-name mandb-entry-file-name) ;e.g., "../abiword.1.zst"
(name mandb-entry-name) ;e.g., "ABIWORD"
(section mandb-entry-section) ;number
(synopsis mandb-entry-synopsis) ;string
@@ -63,7 +65,7 @@ (define (mandb-entry<? entry1 entry2)
(string<? (basename file1) (basename file2))))))))
(define abbreviate-file-name
- (let ((man-file-rx (make-regexp "(.+)\\.[0-9][a-z]?(\\.gz)?$")))
+ (let ((man-file-rx (make-regexp "(.+)\\.[0-9][a-z]?(\\.(gz|zst))?$")))
(lambda (file)
(match (regexp-exec man-file-rx (basename file))
(#f
@@ -71,6 +73,14 @@ (define abbreviate-file-name
(matches
(match:substring matches 1))))))
+(define (gzip-compressed? file-name)
+ "True if FILE-NAME is suffixed with the '.gz' file extension."
+ (string-suffix? ".gz" file-name))
+
+(define (zstd-compressed? file-name)
+ "True if FILE-NAME is suffixed with the '.zst' file extension."
+ (string-suffix? ".zst" file-name))
+
(define (entry->string entry)
"Return the wire format for ENTRY as a string."
(match entry
@@ -92,7 +102,11 @@ (define (entry->string entry)
"\t-\t-\t"
- (if (string-suffix? ".gz" file) "gz" "")
+ (cond
+ ((gzip-compressed? file) "gz")
+ ((zstd-compressed? file) "zst")
+ (else ""))
+
"\t"
synopsis "\x00"))))
@@ -148,7 +162,8 @@ (define (read-synopsis port)
(loop (cons line lines))))))
(define* (man-page->entry file #:optional (resolve identity))
- "Parse FILE, a gzipped man page, and return a <mandb-entry> for it."
+ "Parse FILE, a gzip or zstd compressed man page, and return a <mandb-entry>
+for it."
(define (string->number* str)
(if (and (string-prefix? "\"" str)
(> (string-length str) 1)
@@ -156,8 +171,13 @@ (define* (man-page->entry file #:optional (resolve identity))
(string->number (string-drop (string-drop-right str 1) 1))
(string->number str)))
- ;; Note: This works for both gzipped and uncompressed files.
- (call-with-gzip-input-port (open-file file "r0")
+ (define call-with-input-port*
+ (cond
+ ((gzip-compressed? file) call-with-gzip-input-port)
+ ((zstd-compressed? file) call-with-zstd-input-port)
+ (else call-with-port)))
+
+ (call-with-input-port* (open-file file "r0")
(lambda (port)
(let loop ((name #f)
(section #f)
@@ -191,14 +211,19 @@ (define* (man-page->entry file #:optional (resolve identity))
(define (man-files directory)
"Return the list of man pages found under DIRECTORY, recursively."
;; Filter the list to ensure that broken symlinks are excluded.
- (filter file-exists? (find-files directory "\\.[0-9][a-z]?(\\.gz)?$")))
+ (filter file-exists?
+ (find-files directory "\\.[0-9][a-z]?(\\.(gz|zst))?$")))
(define (mandb-entries directory)
"Return mandb entries for the man pages found under DIRECTORY, recursively."
(map (lambda (file)
(man-page->entry file
(lambda (link)
- (let ((file (string-append directory "/" link
- ".gz")))
- (and (file-exists? file) file)))))
+ (let ((file-gz (string-append directory "/" link
+ ".gz"))
+ (file-zst (string-append directory "/" link
+ ".zst")))
+ (and (or (file-exists? file-gz)
+ (file-exists? file-zst) file)
+ file)))))
(man-files directory)))
diff --git a/guix/profiles.scm b/guix/profiles.scm
index da7790d819..7fa5dab62a 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -7,7 +7,7 @@
;;; Copyright © 2016, 2017, 2018, 2019, 2021, 2022 Ricardo Wurmus <rekado@elephly.net>
;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com>
;;; Copyright © 2017 Huang Ying <huang.ying.caritas@gmail.com>
-;;; Copyright © 2017, 2021 Maxim Cournoyer <maxim.cournoyer@gmail.com>
+;;; Copyright © 2017, 2021, 2024 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;; Copyright © 2019 Kyle Meyer <kyle@kyleam.com>
;;; Copyright © 2019 Mathieu Othacehe <m.othacehe@gmail.com>
;;; Copyright © 2020 Danny Milosavljevic <dannym@scratchpost.org>
@@ -1701,6 +1701,9 @@ (define* (manual-database manifest #:optional system)
(define guile-zlib
(module-ref (resolve-interface '(gnu packages guile)) 'guile-zlib))
+ (define guile-zstd
+ (module-ref (resolve-interface '(gnu packages guile)) 'guile-zstd))
+
(define modules
(delete '(guix config)
(source-module-closure `((guix build utils)
@@ -1709,7 +1712,8 @@ (define* (manual-database manifest #:optional system)
(define build
(with-imported-modules modules
(with-extensions (list gdbm-ffi ;for (guix man-db)
- guile-zlib)
+ guile-zlib
+ guile-zstd)
#~(begin
(use-modules (guix man-db)
(guix build utils)
--
2.41.0
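The extension handling added to (guix man-db) can be tried without a profile using the sketch below. The names ending in '*', 'compression-field' and 'resolve-link' are hypothetical; the regexp is copied from the patched 'abbreviate-file-name', and returning the name unchanged when it does not match is this sketch's own choice. In the diff, the link resolver's '(or (file-exists? file-gz) (file-exists? file-zst) file)' appears to be always true, since 'file' is bound to the man page being parsed, and it returns that file rather than the resolved link; 'resolve-link' instead returns whichever compressed variant exists, taking the existence test as a parameter so it can run without touching the file system.

```scheme
(use-modules (ice-9 regex)      ;match:substring
             (srfi srfi-1))     ;find

(define man-file-rx
  ;; Same pattern as the patched abbreviate-file-name.
  (make-regexp "(.+)\\.[0-9][a-z]?(\\.(gz|zst))?$"))

(define (abbreviate-file-name* file)
  ;; "abiword.1.zst" -> "abiword"; non-matching names are returned as-is.
  (let ((m (regexp-exec man-file-rx (basename file))))
    (if m (match:substring m 1) file)))

(define (compression-field file)
  ;; Compression column of the mandb wire format, as in entry->string.
  (cond ((string-suffix? ".gz" file) "gz")
        ((string-suffix? ".zst" file) "zst")
        (else "")))

(define (resolve-link directory link exists?)
  ;; Return the first existing compressed variant of LINK under
  ;; DIRECTORY, or #f.  EXISTS? would be file-exists? in real use.
  (find exists?
        (map (lambda (ext) (string-append directory "/" link ext))
             '(".gz" ".zst"))))

(abbreviate-file-name* "/share/man/man1/abiword.1.zst")   ;=> "abiword"
(compression-field "ls.1.zst")                            ;=> "zst"
```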
Maxim Cournoyer wrote 1 year ago
control message for bug #68242
(address . control@debbugs.gnu.org)
87bka10y48.fsf@gmail.com
retitle 68242 [core-updates] Compress man pages using zstd
quit
Maxim Cournoyer wrote 1 year ago
[PATCH core-updates v2 0/5] Compress man pages using zstd
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
cover.1704484373.git.maxim.cournoyer@gmail.com
This series changes the compressor of our man pages from gzip to zstd, which
decompresses much faster, and compresses better at the chosen level (19).

Changes in v2:
- Turn string->number into number->string

Maxim Cournoyer (5):
utils: Lower xz compression memory usage limit to 20%.
compression: Enable zstd parallel compression.
packages: Repack patched source archives via zstd by default.
build: gnu-build-system: Compress man pages with zstd.
man-db: Add support for zstd compressed man pages.

gnu/compression.scm | 3 +-
gnu/packages/commencement.scm | 3 +-
guix/build/gnu-build-system.scm | 73 +++++++++++++++++++++------------
guix/build/utils.scm | 3 +-
guix/man-db.scm | 45 +++++++++++++++-----
guix/packages.scm | 50 ++++++++++++----------
guix/profiles.scm | 8 +++-
7 files changed, 123 insertions(+), 62 deletions(-)


base-commit: 54d122a12b6b9f0bf2f20fe2c5e2c6549bc9909d
--
2.41.0
Maxim Cournoyer wrote 1 year ago
[PATCH core-updates v2 2/5] compression: Enable zstd parallel compression.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
af3992ac362f996fe5eea8c63e787dba929b9b5c.1704484373.git.maxim.cournoyer@gmail.com
* gnu/compression.scm (%compressors) [zstd]: Provide the --threads argument.

Change-Id: I4e8dfe725d1b0721c0016c3013b9e609fee94367
---

(no changes since v1)

gnu/compression.scm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gnu/compression.scm b/gnu/compression.scm
index 0418e80a15..6e48de5979 100644
--- a/gnu/compression.scm
+++ b/gnu/compression.scm
@@ -56,7 +56,8 @@ (define %compressors
;; The default level 3 compresses better than gzip in a
;; fraction of the time, while the highest level 19
;; (de)compresses more slowly and worse than xz.
- #~(list #+(file-append zstd "/bin/zstd") "-3"))
+ #~(list #+(file-append zstd "/bin/zstd") "-3"
+ (format #f "--threads=~a" (parallel-job-count))))
(compressor "none" "" #f)))
(define (lookup-compressor name)
--
2.41.0
Maxim Cournoyer wrote 1 year ago
[PATCH core-updates v2 1/5] utils: Lower xz compression memory usage limit to 20%.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
2d37c22fc834b1062c456c8b23974d9be5a2ff56.1704484373.git.maxim.cournoyer@gmail.com
Out-of-memory errors sometimes occurred on the Berlin build farm, especially
on i686 or ARM machines, which have less memory.

* guix/build/utils.scm (%xz-parallel-args): Reduce --memlimit value from 50%
to 20%.

Change-Id: If848bed92ef4c42d11a96057e59ee51a019d0573
---

(no changes since v1)

guix/build/utils.scm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index 8e630ad586..e87066cc02 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -186,7 +186,7 @@ (define (tarball? file-name)
(define (%xz-parallel-args)
"The xz arguments required to enable bit-reproducible, multi-threaded
compression."
- (list "--memlimit=50%"
+ (list "--memlimit=20%"
(format #f "--threads=~a" (max 2 (parallel-job-count)))))
--
2.41.0
Maxim Cournoyer wrote 1 year ago
[PATCH core-updates v2 3/5] packages: Repack patched source archives via zstd by default.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
e208adb6c41d23fce4678a5146bfd5583b04e3c8.1704484373.git.maxim.cournoyer@gmail.com
* guix/build/utils.scm (compressor): Register zst file name extension.
* guix/packages.scm (%standard-patch-inputs): Add zstd.
(patch-and-repack): Rename tarxz-name nested procedure to tar-file-name, and
accept a new 'ext' argument; adjust accordingly. Add zstd binding, and
replace the XZ_DEFAULTS environment variable with ZSTD_NBTHREADS. Fall back to
xz when zstd is not available.

Change-Id: I614a6be8c87a4a0858eadce616c51d8e9b9fc020
---

(no changes since v1)

guix/build/utils.scm | 1 +
guix/packages.scm | 50 +++++++++++++++++++++++++-------------------
2 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index e87066cc02..9c1e19f6d8 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -177,6 +177,7 @@ (define (compressor file-name)
((string-suffix? "lz" file-name) "lzip")
((string-suffix? "zip" file-name) "unzip")
((string-suffix? "xz" file-name) "xz")
+ ((string-suffix? "zst" file-name) "zstd")
(else #f))) ;no compression used/unknown file extension
(define (tarball? file-name)
diff --git a/guix/packages.scm b/guix/packages.scm
index cb8db925f8..ce1ba7c53a 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -5,7 +5,7 @@
;;; Copyright © 2016 Alex Kost <alezost@gmail.com>
;;; Copyright © 2017, 2019, 2020, 2022 Efraim Flashner <efraim@flashner.co.il>
;;; Copyright © 2019 Marius Bakke <mbakke@fastmail.com>
-;;; Copyright © 2020, 2021 Maxim Cournoyer <maxim.cournoyer@gmail.com>
+;;; Copyright © 2020, 2021, 2024 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;; Copyright © 2021 Chris Marusich <cmmarusich@gmail.com>
;;; Copyright © 2022 Maxime Devos <maximedevos@telenet.be>
;;; Copyright © 2022 jgart <jgart@dismail.de>
@@ -862,6 +862,7 @@ (define (%standard-patch-inputs system)
(module-ref (resolve-interface module) var))))))
`(("tar" ,(ref '(gnu packages base) 'tar))
("xz" ,(ref '(gnu packages compression) 'xz))
+ ("zstd" ,(ref '(gnu packages compression) 'zstd))
("bzip2" ,(ref '(gnu packages compression) 'bzip2))
("gzip" ,(ref '(gnu packages compression) 'gzip))
("lzip" ,(ref '(gnu packages compression) 'lzip))
@@ -926,31 +927,35 @@ (define* (patch-and-repack source patches
;; Return true if DIRECTORY is a checkout (git, svn, etc).
(string-suffix? "-checkout" directory))
- (define (tarxz-name file-name)
- ;; Return a '.tar.xz' file name based on FILE-NAME.
+ (define (tar-file-name file-name ext)
+ ;; Return a '$filename.tar.$ext' file name based on FILE-NAME and EXT.
(let ((base (if (numeric-extension? file-name)
original-file-name
(file-sans-extension file-name))))
(string-append base
(if (equal? (file-extension base) "tar")
- ".xz"
- ".tar.xz"))))
+ (string-append "." ext)
+ (string-append ".tar." ext)))))
(define instantiate-patch
(match-lambda
- ((? string? patch) ;deprecated
+ ((? string? patch) ;deprecated
(local-file patch #:recursive? #t))
- ((? struct? patch) ;origin, local-file, etc.
+ ((? struct? patch) ;origin, local-file, etc.
patch)))
- (let ((tar (lookup-input "tar"))
- (gzip (lookup-input "gzip"))
- (bzip2 (lookup-input "bzip2"))
- (lzip (lookup-input "lzip"))
- (xz (lookup-input "xz"))
- (patch (lookup-input "patch"))
- (comp (and=> (compressor source-file-name) lookup-input))
- (patches (map instantiate-patch patches)))
+ (let* ((tar (lookup-input "tar"))
+ (gzip (lookup-input "gzip"))
+ (bzip2 (lookup-input "bzip2"))
+ (lzip (lookup-input "lzip"))
+ (xz (lookup-input "xz"))
+ (zstd (or (lookup-input "zstd")
+ ;; Fallback to xz in case zstd is not available, such as
+ ;; for bootstrap packages.
+ xz))
+ (patch (lookup-input "patch"))
+ (comp (and=> (compressor source-file-name) lookup-input))
+ (patches (map instantiate-patch patches)))
(define build
(with-imported-modules '((guix build utils))
#~(begin
@@ -1028,12 +1033,12 @@ (define* (patch-and-repack source patches
locale (system-error-errno args)))))
(setenv "PATH"
- (string-append #+xz "/bin"
+ (string-append #+zstd "/bin"
(if #+comp
(string-append ":" #+comp "/bin")
"")))
- (setenv "XZ_DEFAULTS" (string-join (%xz-parallel-args)))
+ (setenv "ZSTD_NBTHREADS" (number->string (parallel-job-count)))
;; SOURCE may be either a directory, a tarball or a simple file.
(let ((name (strip-store-file-name #+source))
@@ -1088,10 +1093,13 @@ (define* (patch-and-repack source patches
(else ;single uncompressed file
(copy-file file #$output)))))))
- (let ((name (if (or (checkout? original-file-name)
- (not (compressor original-file-name)))
- original-file-name
- (tarxz-name original-file-name))))
+ (let* ((ext (if zstd
+ "zst" ;usual case
+ "xz")) ;zstd-less bootstrap-origin
+ (name (if (or (checkout? original-file-name)
+ (not (compressor original-file-name)))
+ original-file-name
+ (tar-file-name original-file-name ext))))
(gexp->derivation name build
#:graft? #f
#:system system
--
2.41.0
Maxim Cournoyer wrote 1 year ago
[PATCH core-updates v2 4/5] build: gnu-build-system: Compress man pages with zstd.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
befeaef0fa91ae49ea8f141121de4d790dd52a2a.1704484373.git.maxim.cournoyer@gmail.com
The aim is to speed up computing the man page database, which requires
decompressing the man pages. Zstd is faster than gzip, especially for
decompression, and has a similar compression ratio.

* gnu/packages/commencement.scm (%final-inputs): Add zstd.
* guix/build/gnu-build-system.scm
(compress-documentation): Update doc.
<info-compressor, info-compressor-flags, man-compressor, man-compressor-flags>
<man-compressor-file-extension>: New arguments.
<compressed-documentation-extension>: Rename argument to...
<info-compressor-file-extension>: ... this. Add an 'extension' argument to
the retarget-symlink nested procedure. Use new arguments in nested
'maybe-compress' procedure.

Change-Id: Ibaad4658f8e5151633714d263d9198f56d255020
---

Changes in v2:
- Turn string->number into number->string

gnu/packages/commencement.scm | 3 +-
guix/build/gnu-build-system.scm | 73 +++++++++++++++++++++------------
2 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/gnu/packages/commencement.scm b/gnu/packages/commencement.scm
index ae1c91f0d0..51c26339ef 100644
--- a/gnu/packages/commencement.scm
+++ b/gnu/packages/commencement.scm
@@ -3492,7 +3492,8 @@ (define-public %final-inputs
(native-inputs
(list (if (target-hurd?)
glibc-utf8-locales-final/hurd
- glibc-utf8-locales-final)))))))
+ glibc-utf8-locales-final)))))
+ ("zstd" ,zstd)))
("sed" ,sed-final)
("grep" ,grep-final)
("xz" ,xz-final)
diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index 51b8f9acbf..2f0ffe36fc 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -2,7 +2,7 @@
;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2018 Mark H Weaver <mhw@netris.org>
;;; Copyright © 2020 Brendan Tildesley <mail@brendan.scot>
-;;; Copyright © 2021 Maxim Cournoyer <maxim.cournoyer@gmail.com>
+;;; Copyright © 2021, 2022 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -644,21 +644,36 @@ (define* (reset-gzip-timestamps #:key outputs #:allow-other-keys)
(((names . directories) ...)
(for-each process-directory directories))))
-(define* (compress-documentation #:key outputs
+(define* (compress-documentation #:key
+ outputs
(compress-documentation? #t)
- (documentation-compressor "gzip")
- (documentation-compressor-flags
+ (info-compressor "gzip")
+ (info-compressor-flags
'("--best" "--no-name"))
- (compressed-documentation-extension ".gz")
+ (info-compressor-file-extension ".gz")
+ (man-compressor (if (which "zstd")
+ "zstd"
+ info-compressor))
+ (man-compressor-flags
+ (if (which "zstd")
+ (list "-19" "--rm"
+ "--threads" (number->string
+ (parallel-job-count)))
+ info-compressor-flags))
+ (man-compressor-file-extension
+ (if (which "zstd")
+ ".zst"
+ info-compressor-file-extension))
#:allow-other-keys)
- "When COMPRESS-DOCUMENTATION? is true, compress man pages and Info files
-found in OUTPUTS using DOCUMENTATION-COMPRESSOR, called with
-DOCUMENTATION-COMPRESSOR-FLAGS."
- (define (retarget-symlink link)
+ "When COMPRESS-INFO-MANUALS? is true, compress Info files found in OUTPUTS
+using INFO-COMPRESSOR, called with INFO-COMPRESSOR-FLAGS. Similarly, when
+COMPRESS-MAN-PAGES? is true, compress man pages files found in OUTPUTS using
+MAN-COMPRESSOR, using MAN-COMPRESSOR-FLAGS."
+ (define (retarget-symlink link extension)
(let ((target (readlink link)))
(delete-file link)
- (symlink (string-append target compressed-documentation-extension)
- (string-append link compressed-documentation-extension))))
+ (symlink (string-append target extension)
+ (string-append link extension))))
(define (has-links? file)
;; Return #t if FILE has hard links.
@@ -676,23 +691,23 @@ (define* (compress-documentation #:key outputs
(symbolic-link? target-absolute))
(lambda args
(if (= ENOENT (system-error-errno args))
- (begin
- (format (current-error-port)
- "The symbolic link '~a' target is missing: '~a'\n"
- symlink target-absolute)
- #f)
+ (format (current-error-port)
+ "The symbolic link '~a' target is missing: '~a'\n"
+ symlink target-absolute)
(apply throw args))))))
- (define (maybe-compress-directory directory regexp)
+ (define (maybe-compress-directory directory regexp
+ compressor
+ compressor-flags
+ compressor-extension)
(when (directory-exists? directory)
(match (find-files directory regexp)
- (() ;nothing to compress
+ (() ;nothing to compress
#t)
- ((files ...) ;one or more files
+ ((files ...) ;one or more files
(format #t
"compressing documentation in '~a' with ~s and flags ~s~%"
- directory documentation-compressor
- documentation-compressor-flags)
+ directory compressor compressor-flags)
(call-with-values
(lambda ()
(partition symbolic-link? files))
@@ -702,20 +717,26 @@ (define* (compress-documentation #:key outputs
;; unchanged ('gzip' would refuse to compress them anyway.)
;; Also, do not retarget symbolic links pointing to other
;; symbolic links, since these are not compressed.
- (for-each retarget-symlink
+ (for-each (cut retarget-symlink <> compressor-extension)
(filter (lambda (symlink)
(and (not (points-to-symlink? symlink))
(string-match regexp symlink)))
symlinks))
- (apply invoke documentation-compressor
- (append documentation-compressor-flags
+ (apply invoke compressor
+ (append compressor-flags
(remove has-links? regular-files)))))))))
(define (maybe-compress output)
(maybe-compress-directory (string-append output "/share/man")
- "\\.[0-9]+$")
+ "\\.[0-9]+$"
+ man-compressor
+ man-compressor-flags
+ man-compressor-file-extension)
(maybe-compress-directory (string-append output "/share/info")
- "\\.info(-[0-9]+)?$"))
+ "\\.info(-[0-9]+)?$"
+ info-compressor
+ info-compressor-flags
+ info-compressor-file-extension))
(if compress-documentation?
(match outputs
--
2.41.0
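
For readers less at ease with the Scheme code, here is a shell approximation of the per-directory logic of `maybe-compress-directory` for man pages. It is a minimal sketch, not the Guix implementation. The helper name `compress_man_dir` is hypothetical, and the sketch assumes POSIX `find`, `grep -E` and `readlink`. It works in two steps:

- It retargets man page symlinks to the compressed names, leaving alone links to other symlinks and dangling links.
- It compresses the regular files that have no extra hard links.

```sh
# compress_man_dir DIR EXT COMPRESSOR [FLAG...]   (hypothetical helper)
# COMPRESSOR must replace FILE with FILE.EXT: gzip does so by default,
# while zstd needs '--rm' to delete the original.
compress_man_dir() {
    dir=$1 ext=$2
    shift 2                             # "$@" is now the compressor and flags
    [ -d "$dir" ] || return 0           # nothing to compress

    # 1. Retarget symlinks: 'bar.1 -> foo.1' becomes 'bar.1.EXT -> foo.1.EXT'.
    links=$(find "$dir" -type l | grep -E '\.[0-9]+$')
    printf '%s\n' "$links" | while IFS= read -r link; do
        [ -n "$link" ] || continue
        target=$(readlink "$link")
        case $target in
            /*) abs=$target ;;
            *)  abs=$(dirname "$link")/$target ;;
        esac
        [ -L "$abs" ] && continue       # points to a symlink: not compressed
        [ -e "$abs" ] || continue       # dangling link: leave it alone
        rm -f "$link"
        ln -s "$target$ext" "$link$ext"
    done

    # 2. Compress regular files, skipping those with extra hard links
    #    (gzip would refuse to compress them anyway).
    files=$(find "$dir" -type f -links 1 | grep -E '\.[0-9]+$')
    printf '%s\n' "$files" | while IFS= read -r file; do
        [ -n "$file" ] || continue
        "$@" "$file"
    done
}
```

With the zstd defaults from this patch, a call could look like `compress_man_dir "$out/share/man" .zst zstd -19 --rm --threads "$(nproc)"`. In that call, `$out` and GNU `nproc` (standing in for `(parallel-job-count)`) are assumptions of this sketch.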
Maxim Cournoyer wrote 1 years ago
[PATCH core-updates v2 5/5] man-db: Add support for zstd compressed man pages.
(address . 68242@debbugs.gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
e96aa71c826ae79cb42e92471dda7d0fbddf4321.1704484373.git.maxim.cournoyer@gmail.com
* guix/man-db.scm (<mandb-entry>): Adjust comment.
(abbreviate-file-name): Adjust regexp.
(gzip-compressed?, zstd-compressed?): New predicates.
(entry->string): Use them.
(man-page->entry): Adjust doc. Use input port reader appropriate to the
compression type, if any.
(man-files): Adjust regexp.
(mandb-entries): Adjust link resolving predicate.
* guix/profiles.scm (manual-database): Add guile-zstd extension.

Change-Id: I6336e46e2d324c520a7d15d6cafd12bbf43c5b09
---

(no changes since v1)

guix/man-db.scm | 45 +++++++++++++++++++++++++++++++++++----------
guix/profiles.scm | 8 ++++++--
2 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/guix/man-db.scm b/guix/man-db.scm
index 7d9707a592..12887ce400 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -1,5 +1,6 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2017, 2018 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2022, 2024 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -18,6 +19,7 @@
(define-module (guix man-db)
#:use-module (zlib)
+ #:use-module (zstd)
#:use-module ((guix build utils) #:select (find-files))
#:use-module (gdbm) ;gdbm-ffi
#:use-module (srfi srfi-9)
@@ -48,7 +50,7 @@ (define-module (guix man-db)
(define-record-type <mandb-entry>
(mandb-entry file-name name section synopsis kind)
mandb-entry?
- (file-name mandb-entry-file-name) ;e.g., "../abiword.1.gz"
+ (file-name mandb-entry-file-name) ;e.g., "../abiword.1.zst"
(name mandb-entry-name) ;e.g., "ABIWORD"
(section mandb-entry-section) ;number
(synopsis mandb-entry-synopsis) ;string
@@ -63,7 +65,7 @@ (define (mandb-entry<? entry1 entry2)
(string<? (basename file1) (basename file2))))))))
(define abbreviate-file-name
- (let ((man-file-rx (make-regexp "(.+)\\.[0-9][a-z]?(\\.gz)?$")))
+ (let ((man-file-rx (make-regexp "(.+)\\.[0-9][a-z]?(\\.(gz|zst))?$")))
(lambda (file)
(match (regexp-exec man-file-rx (basename file))
(#f
@@ -71,6 +73,14 @@ (define abbreviate-file-name
(matches
(match:substring matches 1))))))
+(define (gzip-compressed? file-name)
+ "True if FILE-NAME is suffixed with the '.gz' file extension."
+ (string-suffix? ".gz" file-name))
+
+(define (zstd-compressed? file-name)
+ "True if FILE-NAME is suffixed with the '.zst' file extension."
+ (string-suffix? ".zst" file-name))
+
(define (entry->string entry)
"Return the wire format for ENTRY as a string."
(match entry
@@ -92,7 +102,11 @@ (define (entry->string entry)
"\t-\t-\t"
- (if (string-suffix? ".gz" file) "gz" "")
+ (cond
+ ((gzip-compressed? file) "gz")
+ ((zstd-compressed? file) "zst")
+ (else ""))
+
"\t"
synopsis "\x00"))))
@@ -148,7 +162,8 @@ (define (read-synopsis port)
(loop (cons line lines))))))
(define* (man-page->entry file #:optional (resolve identity))
- "Parse FILE, a gzipped man page, and return a <mandb-entry> for it."
+ "Parse FILE, a gzip or zstd compressed man page, and return a <mandb-entry>
+for it."
(define (string->number* str)
(if (and (string-prefix? "\"" str)
(> (string-length str) 1)
@@ -156,8 +171,13 @@ (define* (man-page->entry file #:optional (resolve identity))
(string->number (string-drop (string-drop-right str 1) 1))
(string->number str)))
- ;; Note: This works for both gzipped and uncompressed files.
- (call-with-gzip-input-port (open-file file "r0")
+ (define call-with-input-port*
+ (cond
+ ((gzip-compressed? file) call-with-gzip-input-port)
+ ((zstd-compressed? file) call-with-zstd-input-port)
+ (else call-with-port)))
+
+ (call-with-input-port* (open-file file "r0")
(lambda (port)
(let loop ((name #f)
(section #f)
@@ -191,14 +211,19 @@ (define* (man-page->entry file #:optional (resolve identity))
(define (man-files directory)
"Return the list of man pages found under DIRECTORY, recursively."
;; Filter the list to ensure that broken symlinks are excluded.
- (filter file-exists? (find-files directory "\\.[0-9][a-z]?(\\.gz)?$")))
+ (filter file-exists?
+ (find-files directory "\\.[0-9][a-z]?(\\.(gz|zst))?$")))
(define (mandb-entries directory)
"Return mandb entries for the man pages found under DIRECTORY, recursively."
(map (lambda (file)
(man-page->entry file
(lambda (link)
- (let ((file (string-append directory "/" link
- ".gz")))
- (and (file-exists? file) file)))))
+ (let ((file-gz (string-append directory "/" link
+ ".gz"))
+ (file-zst (string-append directory "/" link
+ ".zst")))
+ (and (or (file-exists? file-gz)
+ (file-exists? file-zst) file)
+ file)))))
(man-files directory)))
diff --git a/guix/profiles.scm b/guix/profiles.scm
index da7790d819..7fa5dab62a 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -7,7 +7,7 @@
;;; Copyright © 2016, 2017, 2018, 2019, 2021, 2022 Ricardo Wurmus <rekado@elephly.net>
;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com>
;;; Copyright © 2017 Huang Ying <huang.ying.caritas@gmail.com>
-;;; Copyright © 2017, 2021 Maxim Cournoyer <maxim.cournoyer@gmail.com>
+;;; Copyright © 2017, 2021, 2024 Maxim Cournoyer <maxim.cournoyer@gmail.com>
;;; Copyright © 2019 Kyle Meyer <kyle@kyleam.com>
;;; Copyright © 2019 Mathieu Othacehe <m.othacehe@gmail.com>
;;; Copyright © 2020 Danny Milosavljevic <dannym@scratchpost.org>
@@ -1701,6 +1701,9 @@ (define* (manual-database manifest #:optional system)
(define guile-zlib
(module-ref (resolve-interface '(gnu packages guile)) 'guile-zlib))
+ (define guile-zstd
+ (module-ref (resolve-interface '(gnu packages guile)) 'guile-zstd))
+
(define modules
(delete '(guix config)
(source-module-closure `((guix build utils)
@@ -1709,7 +1712,8 @@ (define* (manual-database manifest #:optional system)
(define build
(with-imported-modules modules
(with-extensions (list gdbm-ffi ;for (guix man-db)
- guile-zlib)
+ guile-zlib
+ guile-zstd)
#~(begin
(use-modules (guix man-db)
(guix build utils)
--
2.41.0
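
This patch does two things that can be tried on real files from a shell:

- `man-page->entry` now picks a decompressor based on the file extension.
- `man-files` uses a new file-name pattern.

Below is a minimal sketch that mimics both. `read_man_page` and `is_man_file` are hypothetical helpers, and the `.zst` branch assumes the `zstd` command is installed.

```sh
# Decompress a man page to stdout according to its extension, reading it
# as-is otherwise, like the port selection in 'man-page->entry'.
read_man_page() {
    case $1 in
        *.gz)  gzip -dc <"$1" ;;
        *.zst) zstd -dc <"$1" ;;
        *)     cat <"$1" ;;
    esac
}

# Succeed when the name matches the 'man-files' pattern: a section digit,
# an optional letter, then an optional '.gz' or '.zst' suffix.
is_man_file() {
    printf '%s\n' "$1" | grep -Eq '\.[0-9][a-z]?(\.(gz|zst))?$'
}
```

For instance, `is_man_file foo.3p.gz` and `is_man_file foo.1.zst` succeed, while `is_man_file foo.1.bz2` fails.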
Ludovic Courtès wrote 1 years ago
Re: bug#68242: [core-updates] Compress man pages using zstd
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(name . Josselin Poiret)(address . dev@jpoiret.xyz)(address . 68242@debbugs.gnu.org)(name . Simon Tournier)(address . zimon.toutoune@gmail.com)(name . Mathieu Othacehe)(address . othacehe@gnu.org)(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(name . Ricardo Wurmus)(address . rekado@elephly.net)(name . Christopher Baines)(address . guix@cbaines.net)
87r0ir3ffp.fsf_-_@gnu.org
Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> * guix/build/utils.scm (compressor): Register zst file name extension.
> * guix/packages.scm (%standard-patch-inputs): Add zstd.
> (patch-and-repack): Rename tarxz-name nested procedure to tar-file-name, and
> accept a new 'ext' argument; adjust accordingly. Add zstd binding, and
> replace the XZ_DEFAULTS environment variable with ZSTD_NBTHREADS. Fallback to
> xz when zstd is not available.
>
> Change-Id: I614a6be8c87a4a0858eadce616c51d8e9b9fc020

Good idea. LGTM!

Ludo’.
Ludovic Courtès wrote 1 years ago
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 68242@debbugs.gnu.org)
87mstf3fem.fsf_-_@gnu.org
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> * gnu/compression.scm (%compressors) [zstd]: Provide the --threads argument.
>
> Change-Id: I4e8dfe725d1b0721c0016c3013b9e609fee94367

LGTM.
Ludovic Courtès wrote 1 years ago
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 68242@debbugs.gnu.org)
87il433fea.fsf_-_@gnu.org
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> There were sometimes out of memory errors on the Berlin build farm, especially
> for i686 or arm machines having less memory.
>
> * guix/build/utils.scm (%xz-parallel-args): Reduce --memlimit value from 50%
> to 20%.
>
> Change-Id: If848bed92ef4c42d11a96057e59ee51a019d0573

LGTM.
Ludovic Courtès wrote 1 years ago
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 68242@debbugs.gnu.org)
87eder3f9t.fsf_-_@gnu.org
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> The aim is to improve the efficiency of computing the man pages database,
> which must decompress the man pages. Zstd is faster than gzip, especially for
> decompression, and has a similar compression ratio.
>
> * gnu/packages/commencement.scm (%final-inputs): Add zstd.
> * guix/build/gnu-build-system.scm
> (compress-documentation) Update doc.
> <info-compressor, info-compressor-flags, man-compressor, man-compressor-flags>
> <man-compressor-file-extension>: New arguments.
> <compressed-documentation-extension>: Rename argument to...
> <info-compressor-file-extension>: ... this. Add an 'extension' argument to
> the retarget-symlink nested procedure. Use new arguments in nested
> 'maybe-compress' procedure.
>
> Change-Id: Ibaad4658f8e5151633714d263d9198f56d255020

That’s a great idea, LGTM!

Do you have figures on the space savings of a package with many man
pages such as gnutls:doc or openssl:doc?

Thanks,
Ludo’.
Ludovic Courtès wrote 1 years ago
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(name . Josselin Poiret)(address . dev@jpoiret.xyz)(address . 68242@debbugs.gnu.org)(name . Simon Tournier)(address . zimon.toutoune@gmail.com)(name . Mathieu Othacehe)(address . othacehe@gnu.org)(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(name . Ricardo Wurmus)(address . rekado@elephly.net)(name . Christopher Baines)(address . guix@cbaines.net)
87a5pf3f7v.fsf_-_@gnu.org
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> * guix/man-db.scm (<mandb-entry>): Adjust comment.
> (abbreviate-file-name): Adjust regexp.
> (gzip-compressed?, zstd-compressed?): New predicates.
> (entry->string): Use them.
> (man-page->entry): Adjust doc. Use input port reader appropriate to the
> compression type, if any.
> (man-files): Adjust regexp.
> (mandb-entries): Adjust link resolving predicate.
> * guix/profiles.scm (manual-database): Add guile-zstd extension.
>
> Change-Id: I6336e46e2d324c520a7d15d6cafd12bbf43c5b09

[...]

> (define-module (guix man-db)
> #:use-module (zlib)
> + #:use-module (zstd)

Maybe #:autoload both modules for good measure.

Otherwise LGTM, thanks!

Ludo’.
Maxim Cournoyer wrote 1 years ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 68242@debbugs.gnu.org)
87h6jnxq2o.fsf@gmail.com
Hi Ludovic!

Ludovic Courtès <ludo@gnu.org> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> The aim is to improve the efficiency of computing the man pages database,
>> which must decompress the man pages. Zstd is faster than gzip, especially for
>> decompression, and has a similar compression ratio.
>>
>> * gnu/packages/commencement.scm (%final-inputs): Add zstd.
>> * guix/build/gnu-build-system.scm
>> (compress-documentation) Update doc.
>> <info-compressor, info-compressor-flags, man-compressor, man-compressor-flags>
>> <man-compressor-file-extension>: New arguments.
>> <compressed-documentation-extension>: Rename argument to...
>> <info-compressor-file-extension>: ... this. Add an 'extension' argument to
>> the retarget-symlink nested procedure. Use new arguments in nested
>> 'maybe-compress' procedure.
>>
>> Change-Id: Ibaad4658f8e5151633714d263d9198f56d255020
>
> That’s a great idea, LGTM!

Thank you for the review!

> Do you have figures on the space savings of a package with many man
> pages such as gnutls:doc or openssl:doc?

Surprisingly, all of the packages I've checked weigh the same.
Here's gnutls:doc from my local (master) Guix:

$ du -sh /gnu/store/8i3bas6lhziqi2n5wg6qzzhlddkb502c-gnutls-3.7.7-doc
4,9M /gnu/store/8i3bas6lhziqi2n5wg6qzzhlddkb502c-gnutls-3.7.7-doc

Compared to core-updates with these changes:

$ du -sh /gnu/store/h3lbj1g64lkn9rd9xp86dphqnblxqkl6-gnutls-3.8.1-doc
4.9M /gnu/store/h3lbj1g64lkn9rd9xp86dphqnblxqkl6-gnutls-3.8.1-doc

That's because each compressed man page appears to fit within the minimal
4 KiB block allocated for a single file, whether it is compressed with
gzip or zstd.
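
By default, `du` reports allocated blocks, so small per-file savings disappear in the rounding. The sketch below prints both figures for a file or directory: the allocated blocks and the apparent size in bytes. It assumes GNU coreutils `du`, whose `-b` option means `--apparent-size --block-size=1`. The `sizes` helper is hypothetical.

```sh
# Print the allocated disk usage (what 'du -sh' reports, rounded up to
# whole blocks) next to the apparent size (sum of file lengths in bytes).
sizes() {
    printf 'allocated: %s KiB\n' "$(du -sk "$1" | cut -f1)"
    printf 'apparent: %s bytes\n' "$(du -sb "$1" | cut -f1)"
}
```

For instance, `sizes "$(guix build man-pages)"` run on both branches would compare actual byte counts rather than block counts.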

Both man-pages packages also weigh 11 MiB, but we can get an idea of the
compression ratio by comparing the sizes of the 20 largest pages.

With my local Guix:

$ find $(guix build man-pages) -name '*.gz' | xargs -n1 du | sort -rn | head -n20
64 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man5/proc.5.gz
44 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/bpf-helpers.7.gz
32 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/perf_event_open.2.gz
28 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/ptrace.2.gz
20 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/tcp.7.gz
20 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/cgroups.7.gz
20 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/seccomp_unotify.2.gz
20 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/prctl.2.gz
20 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/open.2.gz
20 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/futex.2.gz
20 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/fcntl.2.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/user_namespaces.7.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/socket.7.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/man-pages.7.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/ip.7.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/cpuset.7.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man7/capabilities.7.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man5/elf.5.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/seccomp.2.gz
16 /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02/share/man/man2/keyctl.2.gz

On core-updates with these changes:

$ find /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02 -name '*.zst' | xargs -n1 du | sort -rn | head -n20
56 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man5/proc.5.zst
36 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man7/bpf-helpers.7.zst
28 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/perf_event_open.2.zst
24 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/ptrace.2.zst
20 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man7/tcp.7.zst
20 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/seccomp_unotify.2.zst
20 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/prctl.2.zst
20 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/futex.2.zst
20 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/fcntl.2.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man7/user_namespaces.7.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man7/man-pages.7.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man7/ip.7.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man7/cpuset.7.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man7/cgroups.7.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man7/capabilities.7.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man5/elf.5.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/seccomp.2.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/open.2.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/keyctl.2.zst
16 /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02/share/man/man2/clone.2.zst
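
The per-page savings can be worked out from the `du` figures above. These are in 1 KiB blocks (GNU `du`'s default) and rounded up to whole filesystem blocks, so the result is only approximate. The small helper below is hypothetical and computes the truncated percentage reduction.

```sh
# Percentage size reduction from OLD to NEW, truncated to an integer.
reduction_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

reduction_pct 64 56   # proc.5: 12 (12.5 before truncation)
reduction_pct 44 36   # bpf-helpers.7: 18
reduction_pct 28 24   # ptrace.2: 14
reduction_pct 20 20   # tcp.7: 0
```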

So for the larger man pages, we're talking about a size reduction of
roughly 10 to 15% (e.g., proc.5 goes from 64 to 56 KiB on disk).
That's not much, but decompression is more efficient:

Compare gzipped man-pages decompression:
$ find /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02 -name '*.gz' | sh -c 'time xargs gunzip -ck > /dev/null'

real 0m0.137s
user 0m0.106s
sys 0m0.032s
$ find /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02 -name '*.gz' | sh -c 'time xargs gunzip -ck > /dev/null'

real 0m0.137s
user 0m0.104s
sys 0m0.035s
$ find /gnu/store/93fjc9hv5canvs2lpya0qsbcm44hq7hh-man-pages-6.02 -name '*.gz' | sh -c 'time xargs gunzip -ck > /dev/null'

real 0m0.138s
user 0m0.103s
sys 0m0.036s

Compare with zstd-compressed man-pages decompression:

$ find /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02 -name '*.zst' | sh -c 'time xargs zstd -dkc > /dev/null'

real 0m0.091s
user 0m0.033s
sys 0m0.059s
$ find /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02 -name '*.zst' | sh -c 'time xargs zstd -dkc > /dev/null'

real 0m0.091s
user 0m0.035s
sys 0m0.058s
$ find /gnu/store/nqp5mmi1kb4xp7nkqsybrp5i18lygsl2-man-pages-6.02 -name '*.zst' | sh -c 'time xargs zstd -dkc > /dev/null'

real 0m0.090s
user 0m0.027s
sys 0m0.063s

Assuming guile-zstd fares as well as the zstd command itself, we're
looking at about 1.5x faster decompression (0.137 s vs. 0.091 s of real
time).
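
The 1.5x figure is the ratio of the wall-clock (`real`) times. The sketch below averages the three runs of each compressor and divides the two means. It assumes a POSIX `awk`; `mean` and `speedup` are hypothetical helper names.

```sh
# Arithmetic mean of the arguments, printed with four decimals.
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.4f\n", s / NR }'
}

# Ratio of two durations, printed with two decimals.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

gz=$(mean 0.137 0.137 0.138)    # 0.1373
zst=$(mean 0.091 0.091 0.090)   # 0.0907
speedup "$gz" "$zst"            # 1.51
```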

Past measurements, though, had suggested that decompression was not the
bottleneck in computing the man pages database; rather, most of the time
went into building the database with Guile (sorry, I can't find a
reference to it anymore).

--
Thanks,
Maxim
Maxim Cournoyer wrote 1 years ago
(name . Ludovic Courtès)(address . ludo@gnu.org)(name . Josselin Poiret)(address . dev@jpoiret.xyz)(name . Simon Tournier)(address . zimon.toutoune@gmail.com)(name . Mathieu Othacehe)(address . othacehe@gnu.org)(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 68242-done@debbugs.gnu.org)(name . Ricardo Wurmus)(address . rekado@elephly.net)(name . Christopher Baines)(address . guix@cbaines.net)
87a5pfxjv9.fsf@gmail.com
Hi!

Ludovic Courtès <ludo@gnu.org> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> * guix/man-db.scm (<mandb-entry>): Adjust comment.
>> (abbreviate-file-name): Adjust regexp.
>> (gzip-compressed?, zstd-compressed?): New predicates.
>> (entry->string): Use them.
>> (man-page->entry): Adjust doc. Use input port reader appropriate to the
>> compression type, if any.
>> (man-files): Adjust regexp.
>> (mandb-entries): Adjust link resolving predicate.
>> * guix/profiles.scm (manual-database): Add guile-zstd extension.
>>
>> Change-Id: I6336e46e2d324c520a7d15d6cafd12bbf43c5b09
>
> [...]
>
>> (define-module (guix man-db)
>> #:use-module (zlib)
>> + #:use-module (zstd)
>
> Maybe #:autoload both modules for good measure.

Done.

> Otherwise LGTM, thanks!

Excellent, I've pushed the series.

Closing!

--
Thanks,
Maxim
Closed

This issue is archived.

To comment on this conversation send an email to 68242@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 68242
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch
You may also tag this issue. See list of standard tags. For example, to set the confirmed and easy tags
mumi command -t +confirmed -t +easy
Or, remove the moreinfo tag and set the help tag
mumi command -t -moreinfo -t +help