[PATCH] Adaptive substitute decompression selection

Status: Done
Submitted by: Ludovic Courtès
Participants: Ludovic Courtès
Owner: unassigned
Severity: normal
Ludovic Courtès wrote on 14 Mar 2021 15:38
(address . guix-patches@gnu.org)
87wnu9ls08.fsf@inria.fr
Hi!

The patch below is a followup to the thread started in December:

https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html

It provides a naïve but apparently good enough way for ‘guix substitute’
to choose the compression method that yields the best speed given the
CPU and current networking conditions.

On a recent x86_64 laptop with fast networking, using ci.guix.gnu.org,
the effect so far is to choose gzip substitutes, which indeed provides
slightly faster substitute installation. When ci.guix provides zstd
substitutes, the speedup will be higher.

I have yet to check that it sticks to lzip when bandwidth is low.
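
To make the low-bandwidth case concrete, here is a minimal sketch of the measurement the patch relies on: the ratio of user CPU time to elapsed time around a thunk, taken from ‘times’. The names ‘cpu-usage’ and ‘busy-loop’ are illustrative only and do not appear in the patch. A thunk that mostly waits (as a download on a slow link would) should yield a ratio near 0, while a CPU-bound one should yield a noticeably higher ratio.

```scheme
;; Sketch modeled on 'call-with-cpu-usage-monitoring' from the patch.
;; 'cpu-usage' and 'busy-loop' are hypothetical names used for illustration.
(define (cpu-usage thunk)
  "Call THUNK and return the user CPU time it consumed divided by the
elapsed real time, as an inexact number."
  (let ((before (times)))
    (thunk)
    (let* ((after   (times))
           (elapsed (- (tms:clock after) (tms:clock before))))
      (if (zero? elapsed)
          0
          (exact->inexact
           (/ (- (tms:utime after) (tms:utime before))
              elapsed))))))

(define (busy-loop n)
  ;; Burn CPU cycles; stands in for a CPU-bound decompressor.
  (let loop ((i 0) (acc 0))
    (if (< i n)
        (loop (+ i 1) (logxor acc i))
        acc)))

;; Mostly waiting, as when the network is the bottleneck.
(define waiting
  (cpu-usage (lambda () (usleep 300000))))

;; Mostly computing, as when decompression is the bottleneck.
(define crunching
  (cpu-usage (lambda () (busy-loop 20000000))))
```

Only the user time of the calling process is counted here (‘tms:utime’), so time spent in the kernel or in child processes does not show up in the ratio.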

Thoughts?

Thanks,
Ludo’.
From 3f95a1ac04c5e178a7fedfc2d03c07bcb1075ead Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Sun, 14 Mar 2021 15:05:30 +0100
Subject: [PATCH] substitute: Choose compression method based on past CPU
 usage.

This stems from the observation that substitute download can be
CPU-bound when high-speed networks are in use:

  https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html

* guix/narinfo.scm (decompresses-faster?): New procedure.
(narinfo-best-uri): Add #:fast-decompression?.
* guix/scripts/substitute.scm (%prefer-fast-decompression?): New
variable.
(call-with-cpu-usage-monitoring): New procedure.
(with-cpu-usage-monitoring): New macro.
(display-narinfo-data, process-substitution): Pass #:fast-decompression?
to 'narinfo-best-uri'.
(process-substitution): Wrap 'restore-file' call in
'with-cpu-usage-monitoring'. Set '%prefer-fast-decompression?'.
---
 guix/narinfo.scm            | 27 ++++++++++++++++---
 guix/scripts/substitute.scm | 53 ++++++++++++++++++++++++++++++++-----
 2 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/guix/narinfo.scm b/guix/narinfo.scm
index 2d06124017..72e0f75fda 100644
--- a/guix/narinfo.scm
+++ b/guix/narinfo.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2014 Nikita Karetnikov <nikita@karetnikov.org>
 ;;; Copyright © 2018 Kyle Meyer <kyle@kyleam.com>
 ;;;
@@ -297,9 +297,21 @@ this is a rough approximation."
     (_      (or (string=? compression2 "none")
                 (string=? compression2 "gzip")))))
 
-(define (narinfo-best-uri narinfo)
+(define (decompresses-faster? compression1 compression2)
+  "Return true if COMPRESSION1 generally has a higher decompression throughput
+than COMPRESSION2."
+  (match compression1
+    ("none" #t)
+    ("zstd" #t)
+    ("gzip" (string=? compression2 "lzip"))
+    (_      #f)))
+
+(define* (narinfo-best-uri narinfo #:key fast-decompression?)
   "Select the \"best\" URI to download NARINFO's nar, and return three values:
-the URI, its compression method (a string), and the compressed file size."
+the URI, its compression method (a string), and the compressed file size.
+When FAST-DECOMPRESSION? is true, prefer substitutes with faster
+decompression (typically zstd) rather than substitutes with a higher
+compression ratio (typically lzip)."
   (define choices
     (filter (match-lambda
               ((uri compression file-size)
@@ -321,6 +333,13 @@ the URI, its compression method (a string), and the compressed file size."
           (compresses-better? compression1 compression2))))
       (_ #f)))                                    ;we can't tell
 
-  (match (sort choices file-size<?)
+  (define (speed<? c1 c2)
+    (match c1
+      ((uri1 compression1 . _)
+       (match c2
+         ((uri2 compression2 . _)
+          (decompresses-faster? compression2 compression1))))))
+
+  (match (sort choices (if fast-decompression? (negate speed<?) file-size<?))
     (((uri compression file-size) _ ...)
      (values uri compression file-size))))
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 6892aa999b..b213e6da06 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -257,6 +257,27 @@ Internal tool to substitute a pre-built binary to a local build.\n"))
 ;;; Daemon/substituter protocol.
 ;;;
 
+(define %prefer-fast-decompression?
+  ;; Whether to prefer fast decompression over good compression ratios.  This
+  ;; serves in particular to choose between lzip (high compression ratio but
+  ;; low decompression throughput) and zstd (lower compression ratio but high
+  ;; decompression throughput).
+  #f)
+
+(define (call-with-cpu-usage-monitoring proc)
+  (let ((before (times)))
+    (proc)
+    (let ((after (times)))
+      (if (= (tms:clock after) (tms:clock before))
+          0
+          (/ (- (tms:utime after) (tms:utime before))
+             (- (tms:clock after) (tms:clock before))
+             1.)))))
+
+(define-syntax-rule (with-cpu-usage-monitoring exp ...)
+  "Evaluate EXP...  Return its CPU usage as a fraction between 0 and 1."
+  (call-with-cpu-usage-monitoring (lambda () exp ...)))
+
 (define (display-narinfo-data narinfo)
   "Write to the current output port the contents of NARINFO in the format
 expected by the daemon."
@@ -269,7 +290,10 @@ expected by the daemon."
   (for-each (cute format #t "~a/~a~%" (%store-prefix) <>)
             (narinfo-references narinfo))
 
-  (let-values (((uri compression file-size) (narinfo-best-uri narinfo)))
+  (let-values (((uri compression file-size)
+                (narinfo-best-uri narinfo
+                                  #:fast-decompression?
+                                  %prefer-fast-decompression?)))
     (format #t "~a\n~a\n"
             (or file-size 0)
             (or (narinfo-size narinfo) 0))))
@@ -438,7 +462,9 @@ the current output port."
            store-item))
 
   (let-values (((uri compression file-size)
-                (narinfo-best-uri narinfo)))
+                (narinfo-best-uri narinfo
+                                  #:fast-decompression?
+                                  %prefer-fast-decompression?)))
     (unless print-build-trace?
       (format (current-error-port)
               (G_ "Downloading ~a...~%") (uri->string uri)))
@@ -476,11 +502,24 @@ the current output port."
                   ((hashed get-hash)
                    (open-hash-input-port algorithm input)))
       ;; Unpack the Nar at INPUT into DESTINATION.
-      (restore-file hashed destination
-                    #:dump-file (if (and destination-in-store?
-                                         deduplicate?)
-                                    dump-file/deduplicate*
-                                    dump-file))
+      (define cpu-usage
+        (with-cpu-usage-monitoring
+         (restore-file hashed destination
+                       #:dump-file (if (and destination-in-store?
+                                            deduplicate?)
+                                       dump-file/deduplicate*
+                                       dump-file))))
+
+      ;; Create a hysteresis: depending on CPU usage, favor compression
+      ;; methods with faster decompression (like zstd) or methods with better
+      ;; compression ratios (like lzip).  This stems from the observation that
+      ;; substitution can be CPU-bound when high-speed networks are used:
+      ;; <https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html>.
+      (when (> cpu-usage .8)
+        (set! %prefer-fast-decompression? #t))
+      (when (< cpu-usage .4)
+        (set! %prefer-fast-decompression? #f))
+
       (close-port hashed)
       (close-port input)
 
-- 
2.30.2
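
For readers who want to try the ordering outside of Guix, the following standalone sketch reuses the ‘decompresses-faster?’ table from the patch above and sorts a list of made-up choices either by decompression speed or by file size. The URLs, the sizes, and the names ‘faster<?’, ‘by-speed’ and ‘by-size’ are assumptions for illustration. The comparator is written directly rather than with ‘negate’ as in the patch, and ‘none’ is left out because the table claims both ‘none’ and ‘zstd’ are faster than anything else, which does not give a strict order when both are present.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

;; Same table as 'decompresses-faster?' in the patch.
(define (decompresses-faster? compression1 compression2)
  (match compression1
    ("none" #t)
    ("zstd" #t)
    ("gzip" (string=? compression2 "lzip"))
    (_      #f)))

;; Hypothetical choices as (URL COMPRESSION SIZE) lists; URLs and sizes are
;; made up for this example.
(define choices
  '(("https://example.org/nar/lzip/item" "lzip" 1000)
    ("https://example.org/nar/gzip/item" "gzip" 1400)
    ("https://example.org/nar/zstd/item" "zstd" 1300)))

(define (faster<? c1 c2)
  ;; True when C1's method decompresses faster than C2's, so that the
  ;; fastest method sorts first.
  (decompresses-faster? (second c1) (second c2)))

;; Ordering used when fast decompression is preferred.
(define by-speed
  (sort choices faster<?))

;; Ordering used otherwise: smallest compressed file first.
(define by-size
  (sort choices (lambda (c1 c2) (< (third c1) (third c2)))))
```

With these inputs, the speed ordering puts the zstd entry first and the size ordering puts the lzip entry first, which is the trade-off the #:fast-decompression? keyword switches between.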
Ludovic Courtès wrote on 21 Mar 2021 23:46
(address . 47137-done@debbugs.gnu.org)
874kh415xr.fsf@gnu.org
Hi!

Ludovic Courtès <ludo@gnu.org> wrote:

> The patch below is a followup to the thread started in December:
>
> https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html
>
> It provides a naïve but apparently good enough way for ‘guix substitute’
> to choose the compression method that yields the best speed given the
> CPU and current networking conditions.
>
> On a recent x86_64 laptop with fast networking, using ci.guix.gnu.org,
> the effect so far is to choose gzip substitutes, which indeed provides
> slightly faster substitute installation. When ci.guix provides zstd
> substitutes, the speedup will be higher.
>
> I have yet to check that it sticks to lzip when bandwidth is low.

I did that, using ‘tc’, and it works as expected, staying on lzip.

Pushed as 9da5ec7099b992a8969a17627548cd341c01bd90 with two minor
tweaks: lowered the low hysteresis threshold, and added a comment on how
to use ‘tc’ to test the behavior on “slow” networks.
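
As a reminder of what the hysteresis does, here is a minimal sketch written as a pure procedure. It uses the thresholds from the patch as posted (.8 and .4), not the lowered value that was pushed, since that value is not quoted in this thread. The names ‘next-preference’, ‘samples’ and ‘trace’ are illustrative, and the CPU usage samples are made up.

```scheme
;; Thresholds from the patch as posted; the pushed commit lowers the low one.
(define %high-threshold .8)
(define %low-threshold .4)

(define (next-preference prefer-fast? cpu-usage)
  "Return the new value of the 'prefer fast decompression' flag given
CPU-USAGE, a fraction between 0 and 1.  Between the two thresholds the
previous value PREFER-FAST? is kept."
  (cond ((> cpu-usage %high-threshold) #t)
        ((< cpu-usage %low-threshold) #f)
        (else prefer-fast?)))

;; Made-up series of CPU usage measurements, one per substitute.
(define samples '(.3 .9 .6 .5 .2 .6))

;; Value of the flag after each measurement, starting from #f as in the patch.
(define trace
  (let loop ((samples samples) (prefer-fast? #f) (result '()))
    (if (null? samples)
        (reverse result)
        (let ((next (next-preference prefer-fast? (car samples))))
          (loop (cdr samples) next (cons next result))))))
```

The point of the gap between the two thresholds is that a measurement in the middle (such as .6 above) leaves the current choice alone, so the substituter does not flip between methods on every download.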

Rather than running ‘guix build’ followed by ‘guix gc’, I found that
manually invoking ‘guix substitute’ was nicer (long line ahead!):

( echo substitute /gnu/store/svv4826f8zfj8grl2qa17xnxk3acsppc-elixir-1.11.4 /tmp/t1; echo substitute /gnu/store/d9dk53m7pwx1dc1p97zm0q323gpk70f9-poezio-0.13.1 /tmp/t4; echo substitute /gnu/store/mra8i18y9gjavhmdlkbb10m4miinirgz-ocaml-4.11.1 /tmp/t2; echo substitute /gnu/store/ay2j5mp20j9vbhibcwp5lmmcmhqkdnga-vim-full-8.2.2632 /tmp/t3; echo substitute /gnu/store/svv4826f8zfj8grl2qa17xnxk3acsppc-elixir-1.11.4 /tmp/t5; echo substitute /gnu/store/ay2j5mp20j9vbhibcwp5lmmcmhqkdnga-vim-full-8.2.2632 /tmp/t6) | GUIX_ALLOW_UNAUTHENTICATED_SUBSTITUTES=yes ./pre-inst-env guix substitute --substitute 4>&2

Note that this change won’t take effect until we update the ‘guix’
package.

Ludo’.
Closed
This issue is archived.

To comment on this conversation send email to 47137@debbugs.gnu.org