[PATCH 0/2] Avoid cache cleanup storms

  • Done
  • quality assurance status badge
Details
One participant
  • Ludovic Courtès
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
L
L
Ludovic Courtès wrote on 16 Jul 11:10 +0200
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
cover.1721120884.git.ludo@gnu.org
Hello!

This fixes “cache cleanup storms” reported by Chris Baines whereby
typically multiple ‘guix substitute’ processes would start cleaning
up /var/guix/substitute/cache concurrently, leading to poor performance
(in particular on build farms with many such processes running in
parallel, even worse when on spinning disks).

Thoughts?

Ludo’.

Ludovic Courtès (2):
syscalls: Add ‘mode’ parameter to ‘lock-file’.
cache: Avoid cache cleanup storms from concurrent processes.

guix/build/syscalls.scm | 14 +++++++++-----
guix/cache.scm | 27 ++++++++++++++++++---------
tests/cache.scm | 30 +++++++++++++++++++++++++++++-
tests/syscalls.scm | 13 +++++++++++++
4 files changed, 69 insertions(+), 15 deletions(-)


base-commit: eb508e32d2d359c94d2cabebfe90dc32ca5dcf4f
--
2.45.2
L
L
Ludovic Courtès wrote on 16 Jul 11:15 +0200
[PATCH 1/2] syscalls: Add ‘mode’ parameter to ‘lock-file’.
(address . 72137@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
e188db35c6e75d06fafd3a0ae5e8df6d35f8ce56.1721120884.git.ludo@gnu.org
* guix/build/syscalls.scm (lock-file): Add ‘mode’ parameter and honor it.
* tests/syscalls.scm ("lock-file + unlock-file"): New test.

Change-Id: I113fb4a8b35dd8782b9c0991574e39a4b4393333
---
guix/build/syscalls.scm | 14 +++++++++-----
tests/syscalls.scm | 13 +++++++++++++
2 files changed, 22 insertions(+), 5 deletions(-)

Toggle diff (54 lines)
diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 39bcffd516..2c20edf058 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -1398,14 +1398,18 @@ (define fcntl-flock
;; Presumably we got EAGAIN or so.
(throw 'flock-error err))))))
-(define* (lock-file file #:key (wait? #t))
- "Wait and acquire an exclusive lock on FILE. Return an open port."
- (let ((port (open-file file "w0")))
- (fcntl-flock port 'write-lock #:wait? wait?)
+(define* (lock-file file #:optional (mode "w0")
+ #:key (wait? #t))
+ "Wait and acquire an exclusive lock on FILE. Return an open port according
+to MODE."
+ (let ((port (open-file file mode)))
+ (fcntl-flock port
+ (if (output-port? port) 'write-lock 'read-lock)
+ #:wait? wait?)
port))
(define (unlock-file port)
- "Unlock PORT, a port returned by 'lock-file'."
+ "Unlock PORT, a port returned by 'lock-file', and close it."
(fcntl-flock port 'unlock)
(close-port port)
#t)
diff --git a/tests/syscalls.scm b/tests/syscalls.scm
index 7cf67c060d..13f4f11721 100644
--- a/tests/syscalls.scm
+++ b/tests/syscalls.scm
@@ -383,6 +383,19 @@ (define perform-container-tests?
(close-port file)
result)))))))))
+(test-equal "lock-file + unlock-file"
+ 'hello
+ (call-with-temporary-directory
+ (lambda (directory)
+ (let* ((file (in-vicinity directory "lock"))
+ (out (lock-file file #:wait? #f)))
+ (display "hello" out)
+ (unlock-file out)
+ (let* ((in (lock-file file "r0"))
+ (content (read in)))
+ (unlock-file in)
+ content)))))
+
(test-equal "set-thread-name"
"Syscall Test"
(let ((name (thread-name)))
--
2.45.2
L
L
Ludovic Courtès wrote on 16 Jul 11:15 +0200
[PATCH 2/2] cache: Avoid cache cleanup storms from concurrent processes.
(address . 72137@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
26f82a475d4ccd776c64c0a14628597dd65d8908.1721120884.git.ludo@gnu.org
Reported by Christopher Baines <guix@cbaines.net>.

* guix/cache.scm (maybe-remove-expired-cache-entries): Define
‘expiry-port’; create it with ‘lock-file’. Change ‘last-expiry-date’
accordingly. Write timestamp straight to ‘expiry-port’.
* tests/cache.scm ("maybe-remove-expired-cache-entries, cleanup needed
but lock taken"): New test.

Change-Id: I22441d9d2c4a339d3d3878de131864db5a0ae826
---
guix/cache.scm | 27 ++++++++++++++++++---------
tests/cache.scm | 30 +++++++++++++++++++++++++++++-
2 files changed, 47 insertions(+), 10 deletions(-)

Toggle diff (114 lines)
diff --git a/guix/cache.scm b/guix/cache.scm
index 6a91c7d3ef..8b12312c77 100644
--- a/guix/cache.scm
+++ b/guix/cache.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2013-2017, 2020-2021, 2023 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2013-2017, 2020-2021, 2023-2024 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2022 Simon Tournier <zimon.toutoune@gmail.com>
;;;
;;; This file is part of GNU Guix.
@@ -19,6 +19,7 @@
(define-module (guix cache)
#:use-module ((guix utils) #:select (with-atomic-file-output))
+ #:autoload (guix build syscalls) (lock-file unlock-file)
#:use-module (srfi srfi-19)
#:use-module (srfi srfi-26)
#:use-module (ice-9 match)
@@ -93,13 +94,19 @@ (define* (maybe-remove-expired-cache-entries cache
(define expiry-file
(string-append cache "/last-expiry-cleanup"))
+ (define expiry-port
+ ;; Get exclusive access to EXPIRY-FILE to avoid "cleanup storms" where
+ ;; several processes would concurrently decide that time has come to clean
+ ;; up the same cache. 'lock-file' might throw to 'system-error' or to
+ ;; 'flock-error'; in either case, assume that we lost the race.
+ (false-if-exception
+ (lock-file expiry-file "a+0" #:wait? #f)))
+
(define last-expiry-date
- (catch 'system-error
- (lambda ()
- (or (string->number
- (call-with-input-file expiry-file get-string-all))
- 0))
- (const 0)))
+ (if expiry-port
+ (or (string->number (get-string-all expiry-port))
+ 0)
+ +inf.0))
(when (obsolete? last-expiry-date now cleanup-period)
(remove-expired-cache-entries (cache-entries cache)
@@ -108,8 +115,10 @@ (define* (maybe-remove-expired-cache-entries cache
#:delete-entry delete-entry)
(catch 'system-error
(lambda ()
- (with-atomic-file-output expiry-file
- (cute write (time-second now) <>)))
+ (seek expiry-port 0 SEEK_SET)
+ (truncate-file expiry-port 0)
+ (write (time-second now) expiry-port)
+ (unlock-file expiry-port))
(lambda args
;; ENOENT means CACHE does not exist.
(unless (= ENOENT (system-error-errno args))
diff --git a/tests/cache.scm b/tests/cache.scm
index d495ace2bd..e8ad083d40 100644
--- a/tests/cache.scm
+++ b/tests/cache.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2017, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2017, 2020, 2024 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2022 Simon Tournier <zimon.toutoune@gmail.com>
;;;
;;; This file is part of GNU Guix.
@@ -22,7 +22,9 @@ (define-module (test-cache)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-19)
#:use-module (srfi srfi-64)
+ #:use-module ((guix build syscalls) #:select (lock-file))
#:use-module ((guix utils) #:select (call-with-temporary-directory))
+ #:use-module ((rnrs io ports) #:select (get-string-all))
#:use-module (ice-9 match))
(test-begin "cache")
@@ -75,6 +77,32 @@ (define-syntax-rule (test-cache-cleanup cache exp ...)
(lambda (port)
(display 0 port)))))
+(let ((pid #f))
+ (test-equal "maybe-remove-expired-cache-entries, cleanup needed but lock taken"
+ '()
+ (test-cache-cleanup cache
+ (let ((in+out (pipe)))
+ (match (primitive-fork)
+ (0 (dynamic-wind
+ (const #t)
+ (lambda ()
+ (close-port (car in+out))
+ (let ((port (lock-file
+ (string-append cache "/last-expiry-cleanup"))))
+ (display 0 port)
+ (display "done!\n" (cdr in+out))
+ (close-port (cdr in+out))
+ (sleep 100)))
+ (lambda ()
+ (primitive-exit 0))))
+ (n
+ (set! pid n)
+ (close-port (cdr in+out))
+ (pk 'chr (get-string-all (car in+out)))
+ (close-port (car in+out)))))))
+
+ (when pid (kill pid SIGKILL)))
+
(test-equal "maybe-remove-expired-cache-entries, empty cache"
'("a" "b" "c")
(test-cache-cleanup cache
--
2.45.2
L
L
Ludovic Courtès wrote on 21 Aug 00:54 +0200
Re: [bug#72137] [PATCH 0/2] Avoid cache cleanup storms
(address . 72137-done@debbugs.gnu.org)
87seuyst3e.fsf@gnu.org
Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (14 lines)
> This fixes “cache cleanup storms” reported by Chris Baines whereby
> typically multiple ‘guix substitute’ processes would start cleaning
> up /var/guix/substitute/cache concurrently, leading to poor performance
> (in particular on build farms with many such processes running in
> parallel, even worse when on spinning disks).
>
> Thoughts?
>
> Ludo’.
>
> Ludovic Courtès (2):
> syscalls: Add ‘mode’ parameter to ‘lock-file’.
> cache: Avoid cache cleanup storms from concurrent processes.

I went ahead and pushed it as d921c742b774a9f0a016f3db6442d5c58a330c92.

Ludo’.
Closed
?
Your comment

Commenting via the web interface is currently disabled.

To comment on this conversation send an email to 72137@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 72137
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch