[PATCH] store: Use buffered I/O for all protocol writes

DoneSubmitted by Lars-Dominik Braun.
Details
2 participants
  • Lars-Dominik Braun
  • Ludovic Courtès
Owner
unassigned
Severity
normal
L
L
Lars-Dominik Braun wrote on 5 Jun 2020 11:11
(address . guix-patches@gnu.org)
20200605091151.GB3225@zpidnp36
Hi,

I did some digging to investigate why SSH-based guix operations are currently
so slow. One of the reasons seems to be that some operations write single words
to the socket and due to NOWAIT these result in a lot of small packets.

The attached patch modifies store.scm to buffer all writes, so that won’t happen
any more. I’m seeing about ~40% speedup (6.194s vs 10.075s for the best out of
five runs) for `GUIX_DAEMON_SOCKET=ssh://localhost guix environment guix --
true`, but it also seems to have a negative impact on UNIX domain socket
communication by ~10% (1.561s vs 1.385s). For the notorious r-learnr package
it’s still ~20% better (ssh, 16.620/21.418) and only ~3% worse (unix,
11.489/11.199). Keep in mind localhost has a much larger MTU and much lower
latency than usual networks though. It might benefit from having an even bigger
write buffer (64k vs. currently 8k).

Cheers,
Lars
From eb38dc246e5dd33f8a48a99aded8b8dc0c378376 Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun <ldb@leibniz-psychology.org>
Date: Fri, 5 Jun 2020 10:38:32 +0200
Subject: [PATCH] store: Use buffered I/O for all protocol writes

* guix/store.scm (run-gc) Use buffered output port.
(export-path) Same.
(add-file-tree-to-store) Same.
(set-build-options): Same. Add explicit flush.
---
guix/store.scm | 70 ++++++++++++++++++++++++++++----------------------
1 file changed, 39 insertions(+), 31 deletions(-)

Toggle diff (135 lines)
diff --git a/guix/store.scm b/guix/store.scm
index 014d08aaec..9b3879b4a7 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -3,6 +3,7 @@
 ;;; Copyright © 2018 Jan Nieuwenhuizen <janneke@gnu.org>
 ;;; Copyright © 2019, 2020 Mathieu Othacehe <m.othacehe@gmail.com>
 ;;; Copyright © 2020 Florian Pelz <pelzflorian@pelzflorian.de>
+;;; Copyright © 2020 Lars-Dominik Braun <ldb@leibniz-psychology.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -821,8 +822,8 @@ encoding conversion errors."
                             (locale (false-if-exception (setlocale LC_ALL))))
   ;; Must be called after `open-connection'.
 
-  (define socket
-    (store-connection-socket server))
+  (define buffered
+    (store-connection-output-port server))
 
   (unless (unspecified? use-build-hook?)
     (warn-about-deprecation #:use-build-hook? #f
@@ -831,9 +832,9 @@ encoding conversion errors."
   (let-syntax ((send (syntax-rules ()
                        ((_ (type option) ...)
                         (begin
-                          (write-arg type option socket)
+                          (write-arg type option buffered)
                           ...)))))
-    (write-int (operation-id set-options) socket)
+    (write-int (operation-id set-options) buffered)
     (send (boolean keep-failed?) (boolean keep-going?)
           (boolean fallback?) (integer verbosity))
     (when (< (store-connection-minor-version server) #x61)
@@ -896,6 +897,7 @@ encoding conversion errors."
                            `(("locale" . ,locale))
                            '()))))
         (send (string-pairs pairs))))
+    (write-buffered-output server)
     (let loop ((done? (process-stderr server)))
       (or done? (process-stderr server)))))
 
@@ -1108,13 +1110,14 @@ path."
            ;; We don't use the 'operation' macro so we can pass SELECT? to
            ;; 'write-file'.
            (record-operation 'add-to-store)
-           (let ((port (store-connection-socket server)))
-             (write-int (operation-id add-to-store) port)
-             (write-string basename port)
-             (write-int 1 port)                   ;obsolete, must be #t
-             (write-int (if recursive? 1 0) port)
-             (write-string hash-algo port)
-             (write-file file-name port #:select? select?)
+           (let ((port (store-connection-socket server))
+                 (buffered (store-connection-output-port server)))
+             (write-int (operation-id add-to-store) buffered)
+             (write-string basename buffered)
+             (write-int 1 buffered)                   ;obsolete, must be #t
+             (write-int (if recursive? 1 0) buffered)
+             (write-string hash-algo buffered)
+             (write-file file-name buffered #:select? select?)
              (write-buffered-output server)
              (let loop ((done? (process-stderr server)))
                (or done? (loop (process-stderr server))))
@@ -1220,13 +1223,14 @@ an arbitrary directory layout in the store without creating a derivation."
         ;; We don't use the 'operation' macro so we can use 'write-file-tree'
         ;; instead of 'write-file'.
         (record-operation 'add-to-store/tree)
-        (let ((port (store-connection-socket server)))
-          (write-int (operation-id add-to-store) port)
-          (write-string basename port)
-          (write-int 1 port)                      ;obsolete, must be #t
-          (write-int (if recursive? 1 0) port)
-          (write-string hash-algo port)
-          (write-file-tree basename port
+        (let ((port (store-connection-socket server))
+              (buffered (store-connection-output-port server)))
+          (write-int (operation-id add-to-store) buffered)
+          (write-string basename buffered)
+          (write-int 1 buffered)                      ;obsolete, must be #t
+          (write-int (if recursive? 1 0) buffered)
+          (write-string hash-algo buffered)
+          (write-file-tree basename buffered
                            #:file-type+size file-type+size
                            #:file-port file-port
                            #:symlink-target symlink-target
@@ -1644,17 +1648,19 @@ the list of store paths to delete.  IGNORE-LIVENESS? should always be
 #f.  MIN-FREED is the minimum amount of disk space to be freed, in
 bytes, before the GC can stop.  Return the list of store paths delete,
 and the number of bytes freed."
-  (let ((s (store-connection-socket server)))
-    (write-int (operation-id collect-garbage) s)
-    (write-int action s)
-    (write-store-path-list to-delete s)
-    (write-arg boolean #f s)                      ; ignore-liveness?
-    (write-long-long min-freed s)
-    (write-int 0 s)                               ; obsolete
+  (let ((s (store-connection-socket server))
+        (buffered (store-connection-output-port server)))
+    (write-int (operation-id collect-garbage) buffered)
+    (write-int action buffered)
+    (write-store-path-list to-delete buffered)
+    (write-arg boolean #f buffered)                      ; ignore-liveness?
+    (write-long-long min-freed buffered)
+    (write-int 0 buffered)                               ; obsolete
     (when (>= (store-connection-minor-version server) 5)
       ;; Obsolete `use-atime' and `max-atime' parameters.
-      (write-int 0 s)
-      (write-int 0 s))
+      (write-int 0 buffered)
+      (write-int 0 buffered))
+    (write-buffered-output server)
 
     ;; Loop until the server is done sending error output.
     (let loop ((done? (process-stderr server)))
@@ -1711,10 +1717,12 @@ is raised if the set of paths read from PORT is not signed (as per
 
 (define* (export-path server path port #:key (sign? #t))
   "Export PATH to PORT.  When SIGN? is true, sign it."
-  (let ((s (store-connection-socket server)))
-    (write-int (operation-id export-path) s)
-    (write-store-path path s)
-    (write-arg boolean sign? s)
+  (let ((s (store-connection-socket server))
+        (buffered (store-connection-output-port server)))
+    (write-int (operation-id export-path) buffered)
+    (write-store-path path buffered)
+    (write-arg boolean sign? buffered)
+    (write-buffered-output server)
     (let loop ((done? (process-stderr server port)))
       (or done? (loop (process-stderr server port))))
     (= 1 (read-int s))))
-- 
2.20.1
-----BEGIN PGP SIGNATURE-----

iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl7aDFMACgkQQhN3ARo3
hEaxtgv/dpe1nWch2+Hfx3WBMZha9WyyFjW8TeasbC5cdy5bccYwasr5Q3Zd+Xbr
FYBubNTUkMZz4p5M4qIC1QRloyXf9GlkSYKY7VTYGEXmlIa6mPPzwpkcq0xcJTl3
vypg12EhiBJBXA0KuGpkWgn8Cgxb7y0K1fHeWHa2o+oPXXdVqgJt36oK+bxxkkTH
pmv0D4PhDWnxMv0wvYdz6ls5TYiIqUGpSufMIh9ynEiXBkGz/0J4bEGVV5aSKBrF
LEB01Pr7TSuR9UDg+OqEz7Q7MMVEN63BaQkxJOi2rBdVLDX/C713LHjgzSahXY37
OzOQZUFX8aCPF4dJSutyUhOUKsdOyR7K3qGxlvuRiCmzHotSym1kWIDuo/9f1e1H
h1YOutEHUIiOyiPmo941h5GlGV0qZuFCjKEu4Y7SXT0UYAR1vieL4RIzc6E0IY+m
xrLixnn1pSOn4ebeuB+QDxha/VmvyPXo2nUBwOKuXhn9dGld5x0nYWsjrC7Pj91i
DSdSf1Xm
=THIV
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 5 Jun 2020 23:33
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)(address . 41720@debbugs.gnu.org)
875zc5xkoh.fsf@gnu.org
Hi,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

Toggle quote (4 lines)
> I did some digging to investigate why SSH-based guix operations are currently
> so slow. One of the reasons seems to be that some operations write single words
> to the socket and due to NOWAIT these result in a lot of small packets.

Good catch!

Toggle quote (10 lines)
> The attached patch modifies store.scm to buffer all writes, so that won’t happen
> any more. I’m seeing about ~40% speedup (6.194s vs 10.075s for the best out of
> five runs) for `GUIX_DAEMON_SOCKET=ssh://localhost guix environment guix --
> true`, but it also seems to have a negative impact on UNIX domain socket
> communication by ~10% (1.561s vs 1.385s). For the notorious r-learnr package
> it’s still ~20% better (ssh, 16.620/21.418) and only ~3% worse (unix,
> 11.489/11.199). Keep in mind localhost has a much larger MTU and much lower
> latency than usual networks though. It might benefit from having an even bigger
> write buffer (64k vs. currently 8k).

Nice, though the slowdown on Unix-domain sockets is annoying given that
it’s the primary transport. Are the 10% significant and stable over
multiple runs?

It may be that writing through the buffered port, with the extra
allocations, indirection through the custom port, etc., outweighs the
gains.

OTOH all the other RPCs are buffered, so it makes sense to just do the
same.

(It may be wiser to make the ‘operation’ macro smarter and emit code
that directly allocates a bytevector of the right size and sends it.
Future work!)

Toggle quote (10 lines)
> From eb38dc246e5dd33f8a48a99aded8b8dc0c378376 Mon Sep 17 00:00:00 2001
> From: Lars-Dominik Braun <ldb@leibniz-psychology.org>
> Date: Fri, 5 Jun 2020 10:38:32 +0200
> Subject: [PATCH] store: Use buffered I/O for all protocol writes
>
> * guix/store.scm (run-gc) Use buffered output port.
> (export-path) Same.
> (add-file-tree-to-store) Same.
> (set-build-options): Same. Add explicit flush.

Look:

Toggle snippet (12 lines)
$ GUIX_PROFILING=rpc guix environment guix -- true
Remote procedure call summary: 1927 RPCs
built-in-builders ... 1
build-things ... 1
query-substitutable-path-infos ... 1
valid-path? ... 2
add-to-store/tree ... 19
query-references ... 108
add-to-store ... 133
add-text-to-store ... 1662

So probably the gain you’re seeing comes from ‘add-file-tree-to-store’.

Otherwise the patch LGTM as long as it doesn’t introduce any test suite
regressions.

Thanks for investigating!

Ludo’.
L
L
Lars-Dominik Braun wrote on 8 Jun 2020 08:42
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 41720@debbugs.gnu.org)
20200608064228.GB3166@zpidnp36
Hi Ludo,

Toggle quote (3 lines)
> Nice, though the slowdown on Unix-domain sockets is annoying given that
> it’s the primary transport. Are the 10% significant and stable over
> multiple runs?
I don’t want to blind your eyes with meaningless statistics here, but let’s do
this properly then. So for `guix environment guix -- true` with 10 trials these
are the results:

---snip---
$ ministat -A -c 99 before_ssh after_ssh
x before_ssh
+ after_ssh
N Min Max Median Avg Stddev
x 10 7.63 10.635 10.1585 9.9382 0.85517585
+ 10 5.986 6.424 6.1755 6.1917 0.15398416
Difference at 99.0% confidence
-3.7465 +/- 0.790815
-37.698% +/- 5.0788%
(Student's t, pooled s = 0.614425)

$ ministat -A -c 99 before_unix after_unix
x before_unix
+ after_unix
N Min Max Median Avg Stddev
x 10 1.528 1.624 1.5445 1.5636 0.035842867
+ 10 1.576 1.701 1.6145 1.6229 0.037869513
Difference at 99.0% confidence
0.0593 +/- 0.0474548
3.79253% +/- 3.08989%
(Student's t, pooled s = 0.0368701)
---snap---

And the same for `guix environment --ad-hoc r-learnr`:

---snip---
$ ministat -A -c 99 before_ssh after_ssh
x before_ssh
+ after_ssh
N Min Max Median Avg Stddev
x 10 20.104 21.629 21.082 21.0115 0.45142472
+ 10 16.082 16.713 16.292 16.3416 0.25018224
Difference at 99.0% confidence
-4.6699 +/- 0.469719
-22.2254% +/- 1.86734%
(Student's t, pooled s = 0.364949)

$ ministat -A -c 99 before_unix after_unix
x before_unix
+ after_unix
N Min Max Median Avg Stddev
x 10 11.322 11.529 11.401 11.4127 0.062467858
+ 10 11.249 11.469 11.3515 11.368 0.078689262
No difference proven at 99.0% confidence
---snap---

Toggle quote (2 lines)
> Otherwise the patch LGTM as long as it doesn’t introduce any test suite
> regressions.
Yes, it passes the test suite.

Lars
-----BEGIN PGP SIGNATURE-----

iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl7d3dAACgkQQhN3ARo3
hEYkJAv/XGCQySywDtYuQLK2yAKLZvGo27Y6RQ0RiqR1C16Xvcf9h1V3BzuL7zg7
1DGSY4dgExUIKNVXE7TnzD6/Ulg1qNcMunnsWRLnZX/pRVmqWtc7/jHrfxsuPHd4
Guevj+QIWT2kC1Dtj+eTKGJVodDr9BLTfJmWh8orLMOeWCB9dtDmrfWX1DLKdsMH
P1IzOEmRlrE4CS0syqD1F/5yKueT6sAvjSqeQ5aQfJZr1ecGwJZ+xFmxfS1kwnSU
OWAfJ+pOHKyeZIL5DMYGIem+CtUHBlXZf6rhSmNhS2aCOYFzcSDZfR/nAcCMyzJR
m4F9mZQp+38C8ljXYR3UwUeWHguwjDwMFFI2p0kAsD2FigFuYWP7z2BpfuTPWxiJ
CnIjRHBgxTAmXUoZfI/tDqX9MzrjX2bPN04Eiqm9+z14Tluq45BmnnMCLOEZBfy4
+nShsaKXerTBqr4LpyYzD11uRiPsO/22QC95Y1S848bWisEkFuh6mCA+8RN9N+6s
dNqqHMXJ
=KYN5
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 9 Jun 2020 09:49
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)(address . 41720-done@debbugs.gnu.org)
87d068oezm.fsf@gnu.org
Hi!

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

Toggle quote (54 lines)
>> Nice, though the slowdown on Unix-domain sockets is annoying given that
>> it’s the primary transport. Are the 10% significant and stable over
>> multiple runs?
> I don’t want to blind your eyes with meaningless statistics here, but let’s do
> this properly then. So for `guix environment guix -- true` with 10 trials these
> are the results:
>
> ---snip---
> $ ministat -A -c 99 before_ssh after_ssh
> x before_ssh
> + after_ssh
> N Min Max Median Avg Stddev
> x 10 7.63 10.635 10.1585 9.9382 0.85517585
> + 10 5.986 6.424 6.1755 6.1917 0.15398416
> Difference at 99.0% confidence
> -3.7465 +/- 0.790815
> -37.698% +/- 5.0788%
> (Student's t, pooled s = 0.614425)
>
> $ ministat -A -c 99 before_unix after_unix
> x before_unix
> + after_unix
> N Min Max Median Avg Stddev
> x 10 1.528 1.624 1.5445 1.5636 0.035842867
> + 10 1.576 1.701 1.6145 1.6229 0.037869513
> Difference at 99.0% confidence
> 0.0593 +/- 0.0474548
> 3.79253% +/- 3.08989%
> (Student's t, pooled s = 0.0368701)
> ---snap---
>
> And the same for `guix environment --ad-hoc r-learnr`:
>
> ---snip---
> $ ministat -A -c 99 before_ssh after_ssh
> x before_ssh
> + after_ssh
> N Min Max Median Avg Stddev
> x 10 20.104 21.629 21.082 21.0115 0.45142472
> + 10 16.082 16.713 16.292 16.3416 0.25018224
> Difference at 99.0% confidence
> -4.6699 +/- 0.469719
> -22.2254% +/- 1.86734%
> (Student's t, pooled s = 0.364949)
>
> $ ministat -A -c 99 before_unix after_unix
> x before_unix
> + after_unix
> N Min Max Median Avg Stddev
> x 10 11.322 11.529 11.401 11.4127 0.062467858
> + 10 11.249 11.469 11.3515 11.368 0.078689262
> No difference proven at 99.0% confidence
> ---snap---

Awesome, thanks a lot. Looks like the difference on Unix-domain sockets
is not significant.

Applied!

What package provides ‘ministat’?

Thanks,
Ludo’.
Closed
L
L
Lars-Dominik Braun wrote on 9 Jun 2020 10:21
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 41720-done@debbugs.gnu.org)
20200609082151.GA8597@zpidnp36
Hi,

great!

Toggle quote (1 lines)
> What package provides ‘ministat’?
It’s originally a FreeBSD tool[1], but there are several unofficial Linux ports
on GitHub, like [2].

Cheers,
Lars

-----BEGIN PGP SIGNATURE-----

iQGzBAABCgAdFiEEyk+M9DfXR4/aBV/UQhN3ARo3hEYFAl7fRpwACgkQQhN3ARo3
hEY4Jgv9EegfGhsn0onjdNqJmMmSRcbm9Zg2MGSlkwEZPk9JVfZB2X/hTa5eiUGb
ZvO5bS7UrifBbVaMok08EYf0Chkhl1NC9/FSZwEhn5UJLpPYO+Udkw1lEdATbKcI
jKmJkn1jR+yddDwG4KLC6y9RsAUUAB7uhTe4hgE7t8YFNRdwjh/aD8kpj+cPO1gB
FK5dDoElnoC53jHpmGDu0v/Dg53Yib1wX9wlTNNvNFknaI7QX6va7F32gt899doI
G8FumgyWfl2Y6SBotsTWaoLNw8b6ufzg7xXIdQah/xsku04XW/do6s6wUeMSPTsU
XkAEO7YsOOzpbEHRpyAZ4Zvh9nipvN2+tnzUWQBz2xEjSnZZH+kRBOJVeq3PUBW4
oafCjYJ/B5w2vqCy4HR/ko8o9S2FZKKa6aOhnjtZH1lYZlsOBuXZu+xRiYVnSbkp
EOvuN1SsJAGcuiCCw9YDatf2aicCy9gsTyHvdOGs4Lxw0Zl0amt+/0fO+pIwisVK
CvWlzCen
=h5uO
-----END PGP SIGNATURE-----


Closed
?
Your comment

This issue is archived.

To comment on this conversation send email to 41720@debbugs.gnu.org