[PATCH] git: Periodically delete least-recently-used cached checkouts.

  • Done
  • quality assurance status badge
Details
3 participants
  • Guillaume Le Vaillant
  • Ludovic Courtès
  • zimoun
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
L
L
Ludovic Courtès wrote on 19 Dec 2020 23:06
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20201219220630.24605-1-ludo@gnu.org
This ensures ~/.cache/guix/checkouts is periodically cleaned up.

* guix/git.scm (cached-checkout-expiration)
(%checkout-cache-cleanup-period): New variables.
(delete-checkout): New procedure.
(update-cached-checkout)[cache-entries]: New procedure.
Add call to 'maybe-remove-expired-cache-entries'.
---
guix/git.scm | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)

Hi!

I noticed that my ~/.cache/guix/checkouts directory had accumulated
a lot of cruft from channels, playing with ‘--with-branch’ and such,
and that it would be nice to clean it up once in a while.

This is what this patch does. It uses the (guix cache) default
strategy, which consists in deleting least-recently-used items by
looking at their atime.

Thoughts?

Ludo’.

Toggle diff (80 lines)
diff --git a/guix/git.scm b/guix/git.scm
index ca77b9f54b..5df11db38e 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -23,8 +23,10 @@
#:use-module (git submodule)
#:use-module (guix i18n)
#:use-module (guix base32)
+ #:use-module (guix cache)
#:use-module (gcrypt hash)
- #:use-module ((guix build utils) #:select (mkdir-p))
+ #:use-module ((guix build utils)
+ #:select (mkdir-p delete-file-recursively))
#:use-module (guix store)
#:use-module (guix utils)
#:use-module (guix records)
@@ -35,6 +37,7 @@
#:use-module (rnrs bytevectors)
#:use-module (ice-9 format)
#:use-module (ice-9 match)
+ #:use-module (ice-9 ftw)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-11)
#:use-module (srfi srfi-34)
@@ -318,6 +321,20 @@ definitely available in REPOSITORY, false otherwise."
(_
#f)))
+(define cached-checkout-expiration
+ ;; Return the expiration time of a cached checkout.
+ (file-expiration-time (* 30 24 3600)))
+
+(define %checkout-cache-cleanup-period
+ ;; Period for the removal of expired cached checkouts.
+ (* 5 24 3600))
+
+(define (delete-checkout directory)
+ "Delete DIRECTORY recursively, in an atomic fashion."
+ (let ((trashed (string-append directory ".trashed")))
+ (rename-file directory trashed)
+ (delete-file-recursively trashed)))
+
(define* (update-cached-checkout url
#:key
(ref '(branch . "master"))
@@ -341,6 +358,14 @@ When RECURSIVE? is true, check out submodules as well, if any.
When CHECK-OUT? is true, reset the cached working tree to REF; otherwise leave
it unchanged."
+ (define (cache-entries directory)
+ (filter-map (match-lambda
+ ((or "." "..")
+ #f)
+ (file
+ (string-append directory "/" file)))
+ (or (scandir directory) '())))
+
(define canonical-ref
;; We used to require callers to specify "origin/" for each branch, which
;; made little sense since the cache should be transparent to them. So
@@ -387,6 +412,17 @@ it unchanged."
;; REPOSITORY as soon as possible.
(repository-close! repository)
+ ;; When CACHE-DIRECTORY is a sub-directory of the default cache
+ ;; directory, remove expired checkouts that are next to it.
+ (let ((parent (dirname cache-directory)))
+ (when (string=? parent (%repository-cache-directory))
+ (maybe-remove-expired-cache-entries parent cache-entries
+ #:entry-expiration
+ cached-checkout-expiration
+ #:delete-entry delete-checkout
+ #:cleanup-period
+ %checkout-cache-cleanup-period)))
+
(values cache-directory (oid->string oid) relation)))))
(define* (latest-repository-commit store url
--
2.29.2
G
G
Guillaume Le Vaillant wrote on 20 Dec 2020 11:46
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 45327@debbugs.gnu.org)
87pn34n4ab.fsf@yamatai
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (12 lines)
> Hi!
>
> I noticed that my ~/.cache/guix/checkouts directory had accumulated
> a lot of cruft from channels, playing with ‘--with-branch’ and such,
> and that it would be nice to clean it up once in a while.
>
> This is what this patch does. It uses the (guix cache) default
> strategy, which consists in deleting least-recently-used items by
> looking at their atime.
>
> Thoughts?

How does it behave when the cache is on a file system mounted with the
'noatime' option?
-----BEGIN PGP SIGNATURE-----

iIUEAREKAC0WIQTLxZxm7Ce5cXlAaz5r6CCK3yH+PwUCX98rjA8cZ2x2QHBvc3Rl
by5uZXQACgkQa+ggit8h/j9AiQD/Sxhqiead7d7oinH1G5Kz4LpLKWpZ1ZBcToqb
Epjn8NsBAISRLoyuznD0wThB6QHJwbW9HJI3nYvBQ6I1CNChzv+4
=3Ed5
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 20 Dec 2020 14:47
(name . Guillaume Le Vaillant)(address . glv@posteo.net)(address . 45327@debbugs.gnu.org)
87eejk8u8u.fsf@gnu.org
Hi,

Guillaume Le Vaillant <glv@posteo.net> skribis:

Toggle quote (17 lines)
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> Hi!
>>
>> I noticed that my ~/.cache/guix/checkouts directory had accumulated
>> a lot of cruft from channels, playing with ‘--with-branch’ and such,
>> and that it would be nice to clean it up once in a while.
>>
>> This is what this patch does. It uses the (guix cache) default
>> strategy, which consists in deleting least-recently-used items by
>> looking at their atime.
>>
>> Thoughts?
>
> How does it behave when the cache is on a file system mounted with the
> 'noatime' option?

I guess the worst that could happen is that checkouts are removed too
frequently (because the atime is not updated), meaning that users find
themselves making full clones more often than we’d like.

Perhaps we could use the mtime instead, since when checkouts are
updated, the mtime is presumably updated too.

Thoughts?

Ludo’.
G
G
Guillaume Le Vaillant wrote on 20 Dec 2020 15:16
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 45327@debbugs.gnu.org)
87mty8mukp.fsf@yamatai
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (32 lines)
> Hi,
>
> Guillaume Le Vaillant <glv@posteo.net> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> skribis:
>>
>>> Hi!
>>>
>>> I noticed that my ~/.cache/guix/checkouts directory had accumulated
>>> a lot of cruft from channels, playing with ‘--with-branch’ and such,
>>> and that it would be nice to clean it up once in a while.
>>>
>>> This is what this patch does. It uses the (guix cache) default
>>> strategy, which consists in deleting least-recently-used items by
>>> looking at their atime.
>>>
>>> Thoughts?
>>
>> How does it behave when the cache is on a file system mounted with the
>> 'noatime' option?
>
> I guess the worst that could happen is that checkouts are removed too
> frequently (because the atime is not updated), meaning that users find
> themselves making full clones more often than we’d like.
>
> Perhaps we could use the mtime instead, since when checkouts are
> updated, the mtime is presumably updated too.
>
> Thoughts?
>
> Ludo’.

I guess either using mtime or making Guix update the atime when using
a cached checkout would work.
-----BEGIN PGP SIGNATURE-----

iIUEAREKAC0WIQTLxZxm7Ce5cXlAaz5r6CCK3yH+PwUCX99ctg8cZ2x2QHBvc3Rl
by5uZXQACgkQa+ggit8h/j8WfwD/f/8ohoVqsuunW5+oz4nWVry826NaW4eAAeCN
ymejEx8BAJiv/7/OyGR0WyDJXLa+79K6VkKbSz4LLZAm7Aj9dgG8
=A+Nf
-----END PGP SIGNATURE-----

Z
Z
zimoun wrote on 21 Dec 2020 11:26
86a6u7bgle.fsf@gmail.com
Hi Ludo,

On Sat, 19 Dec 2020 at 23:06, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (21 lines)
> This ensures ~/.cache/guix/checkouts is periodically cleaned up.
>
> * guix/git.scm (cached-checkout-expiration)
> (%checkout-cache-cleanup-period): New variables.
> (delete-checkout): New procedure.
> (update-cached-checkout)[cache-entries]: New procedure.
> Add call to 'maybe-remove-expired-cache-entries'.
> ---
> guix/git.scm | 38 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 37 insertions(+), 1 deletion(-)
>
> Hi!
>
> I noticed that my ~/.cache/guix/checkouts directory had accumulated
> a lot of cruft from channels, playing with ‘--with-branch’ and such,
> and that it would be nice to clean it up once in a while.
>
> This is what this patch does. It uses the (guix cache) default
> strategy, which consists in deleting least-recently-used items by
> looking at their atime.

This is done at pull time, right? Personally, I would prefer at gc
time, and even maybe with an option to “guix gc”.

Because, IIUC, every 5 days, the entries older than 1 month will be
deleted. As an extensive user of the time-machine, it means that I will
do this extra work more than often, slowing down the already slow
“time-machine”.

Cheers,
simon
L
L
Ludovic Courtès wrote on 22 Dec 2020 14:33
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 45327@debbugs.gnu.org)
87blem2ceq.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

Toggle quote (2 lines)
> On Sat, 19 Dec 2020 at 23:06, Ludovic Courtès <ludo@gnu.org> wrote:

[...]

Toggle quote (10 lines)
>> I noticed that my ~/.cache/guix/checkouts directory had accumulated
>> a lot of cruft from channels, playing with ‘--with-branch’ and such,
>> and that it would be nice to clean it up once in a while.
>>
>> This is what this patch does. It uses the (guix cache) default
>> strategy, which consists in deleting least-recently-used items by
>> looking at their atime.
>
> This is done at pull time, right?

This is happens when ‘update-cached-checkout’ is called: when updating a
channel, using ‘--with-git-url’, etc.

Toggle quote (3 lines)
> Personally, I would prefer at gc time, and even maybe with an option
> to “guix gc”.

Hmm yes (currently the two things are unrelated.) I have a slight
preference for something automated that you don’t have to worry about.

Toggle quote (3 lines)
> Because, IIUC, every 5 days, the entries older than 1 month will be
> deleted.

Correct.

Toggle quote (4 lines)
> As an extensive user of the time-machine, it means that I will do this
> extra work more than often, slowing down the already slow
> “time-machine”.

Let’s say there’s a couple of channels you regularly pull from, like a
few times per months. Their checkout would never be deleted so you
wouldn’t notice any change in terms of performance.

The difference you’d see is if you pull from a few channels, but less
than once per month. In that case, the ‘guix’ channel would remain in
cache (because cache cleanup happens after the cached checkout has been
updated), but the other channels would be deleted just to be cloned
again soon after that.

Does that make sense?

Thanks,
Ludo’.
Z
Z
zimoun wrote on 22 Dec 2020 16:19
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 45327@debbugs.gnu.org)
86wnx97try.fsf@gmail.com
Hi,

On Tue, 22 Dec 2020 at 14:33, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (8 lines)
> The difference you’d see is if you pull from a few channels, but less
> than once per month. In that case, the ‘guix’ channel would remain in
> cache (because cache cleanup happens after the cached checkout has been
> updated), but the other channels would be deleted just to be cloned
> again soon after that.
>
> Does that make sense?

All make sense. Thanks for explaining.

As said, my preference of such thing is “guix gc” and not “guix pull”.
But since “guix pull --delete-generations” is already here… it pulls by
deleting. :-) Anyway, it is “guix pull”.

However, I still do not like the “automatic” part. Personally, I would
prefer something like:

guix pull --delete-generations=2m

deleting the cache older than 2 months, in addition to the old profiles.
Because, with your patch, if I want to change the expiration or the
period, it is not convenient: via channels.scm, maybe.

Sometime, I am one or two months off (vacation). And I am sure to
forget to tweak the channels.scm when I am back. Well, I have 3-4
channels.scm files; for example I am not pulling guix-past each time I
pull. Therefore, I should have these 3-4 channels.scm duplicated with
the expiration+period tweaked for when I am back from long breaks.

Well, if the automatic is the default, a way to turn off could be nice,
at the CLI level or even globally. IMHO.


Cheers,
simon
L
L
Ludovic Courtès wrote on 7 Jan 2021 10:39
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 45327@debbugs.gnu.org)
87v9c9hyrs.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

Toggle quote (9 lines)
> However, I still do not like the “automatic” part. Personally, I would
> prefer something like:
>
> guix pull --delete-generations=2m
>
> deleting the cache older than 2 months, in addition to the old profiles.
> Because, with your patch, if I want to change the expiration or the
> period, it is not convenient: via channels.scm, maybe.

There’s a difference though: generations are things you manage by
yourself as a user, whereas this cache is really just an internal cache.

Of course it becomes a user-visible feature if it gets in the way, and I
agree we should avoid that. But it also gets in the way by not being
vacuumed, hence this patch.

Toggle quote (9 lines)
> Sometime, I am one or two months off (vacation). And I am sure to
> forget to tweak the channels.scm when I am back. Well, I have 3-4
> channels.scm files; for example I am not pulling guix-past each time I
> pull. Therefore, I should have these 3-4 channels.scm duplicated with
> the expiration+period tweaked for when I am back from long breaks.
>
> Well, if the automatic is the default, a way to turn off could be nice,
> at the CLI level or even globally. IMHO.

Yeah, maybe we can have a more conservative default together with an
environment variable to tweak it.

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 7 Jan 2021 11:09
[PATCH v2] git: Periodically delete least-recently-used cached checkouts.
(address . 45327@debbugs.gnu.org)
20210107100947.31189-1-ludo@gnu.org
This ensures ~/.cache/guix/checkouts is periodically cleaned up.

* guix/git.scm (cached-checkout-expiration)
(%checkout-cache-cleanup-period): New variables.
(delete-checkout): New procedure.
(update-cached-checkout)[cache-entries]: New procedure.
Add call to 'maybe-remove-expired-cache-entries'.
* guix/cache.scm (file-expiration-time): Add optional 'timestamp'
parameter and honor it.
---
guix/cache.scm | 9 +++++----
guix/git.scm | 44 ++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 47 insertions(+), 6 deletions(-)

Hi! This v2 changes two things:

• Increase the default expiration time to three months;

• Check the mtime rather than the atime of checkouts.

As zimoun proposed, I agree that we should make the expiration time
configurable. I started doing that but that requires ‘string->duration’,
which is currently in (guix ui), and this code is supposed to be
“UI-free”. So I’d first like to move this and similar tools to a new
(guix units) module…

Thoughts?

Ludo’.

Toggle diff (121 lines)
diff --git a/guix/cache.scm b/guix/cache.scm
index feff131068..0401a9d428 100644
--- a/guix/cache.scm
+++ b/guix/cache.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -47,13 +47,14 @@
(unless (= ENOENT (system-error-errno args))
(apply throw args)))))
-(define (file-expiration-time ttl)
+(define* (file-expiration-time ttl #:optional (timestamp stat:atime))
"Return a procedure that, when passed a file, returns its \"expiration
-time\" computed as its last-access time + TTL seconds."
+time\" computed as its timestamp + TTL seconds. Call TIMESTAMP to obtain the
+relevant timestamp from the result of 'stat'."
(lambda (file)
(match (stat file #f)
(#f 0) ;FILE may have been deleted in the meantime
- (st (+ (stat:atime st) ttl)))))
+ (st (+ (timestamp st) ttl)))))
(define* (remove-expired-cache-entries entries
#:key
diff --git a/guix/git.scm b/guix/git.scm
index ca77b9f54b..a5103547d3 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -1,6 +1,6 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2017, 2020 Mathieu Othacehe <m.othacehe@gmail.com>
-;;; Copyright © 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -23,8 +23,10 @@
#:use-module (git submodule)
#:use-module (guix i18n)
#:use-module (guix base32)
+ #:use-module (guix cache)
#:use-module (gcrypt hash)
- #:use-module ((guix build utils) #:select (mkdir-p))
+ #:use-module ((guix build utils)
+ #:select (mkdir-p delete-file-recursively))
#:use-module (guix store)
#:use-module (guix utils)
#:use-module (guix records)
@@ -35,6 +37,7 @@
#:use-module (rnrs bytevectors)
#:use-module (ice-9 format)
#:use-module (ice-9 match)
+ #:use-module (ice-9 ftw)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-11)
#:use-module (srfi srfi-34)
@@ -318,6 +321,24 @@ definitely available in REPOSITORY, false otherwise."
(_
#f)))
+(define cached-checkout-expiration
+ ;; Return the expiration time procedure for a cached checkout.
+ ;; TODO: Honor $GUIX_GIT_CACHE_EXPIRATION.
+
+ ;; Use the mtime rather than the atime to cope with file systems mounted
+ ;; with 'noatime'.
+ (file-expiration-time (* 90 24 3600) stat:mtime))
+
+(define %checkout-cache-cleanup-period
+ ;; Period for the removal of expired cached checkouts.
+ (* 5 24 3600))
+
+(define (delete-checkout directory)
+ "Delete DIRECTORY recursively, in an atomic fashion."
+ (let ((trashed (string-append directory ".trashed")))
+ (rename-file directory trashed)
+ (delete-file-recursively trashed)))
+
(define* (update-cached-checkout url
#:key
(ref '(branch . "master"))
@@ -341,6 +362,14 @@ When RECURSIVE? is true, check out submodules as well, if any.
When CHECK-OUT? is true, reset the cached working tree to REF; otherwise leave
it unchanged."
+ (define (cache-entries directory)
+ (filter-map (match-lambda
+ ((or "." "..")
+ #f)
+ (file
+ (string-append directory "/" file)))
+ (or (scandir directory) '())))
+
(define canonical-ref
;; We used to require callers to specify "origin/" for each branch, which
;; made little sense since the cache should be transparent to them. So
@@ -387,6 +416,17 @@ it unchanged."
;; REPOSITORY as soon as possible.
(repository-close! repository)
+ ;; When CACHE-DIRECTORY is a sub-directory of the default cache
+ ;; directory, remove expired checkouts that are next to it.
+ (let ((parent (dirname cache-directory)))
+ (when (string=? parent (%repository-cache-directory))
+ (maybe-remove-expired-cache-entries parent cache-entries
+ #:entry-expiration
+ cached-checkout-expiration
+ #:delete-entry delete-checkout
+ #:cleanup-period
+ %checkout-cache-cleanup-period)))
+
(values cache-directory (oid->string oid) relation)))))
(define* (latest-repository-commit store url
--
2.29.2
Z
Z
zimoun wrote on 7 Jan 2021 13:40
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ3FKwLFZz_9xnqRnQwo73=G19YmiDwTEH=E0-nOP3i0OQ@mail.gmail.com
Hi Ludo,

On Thu, 7 Jan 2021 at 11:10, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (2 lines)
> • Increase the default expiration time to three months;

That's long enough for my vacations. ;-)

Toggle quote (6 lines)
> As zimoun proposed, I agree that we should make the expiration time
> configurable. I started doing that but that requires ‘string->duration’,
> which is currently in (guix ui), and this code is supposed to be
> “UI-free”. So I’d first like to move this and similar tools to a new
> (guix units) module…

You mean 2 another patches are coming: one moving to the new (guix
units) and then another one honoring GUIX_GIT_CACHE_EXPIRATION, right?

Otherwise LGTM. :-)

Cheers,
simon
L
L
Ludovic Courtès wrote on 13 Jan 2021 16:47
Re: bug#45327: [PATCH] git: Periodically delete least-recently-used cached checkouts.
(name . zimoun)(address . zimon.toutoune@gmail.com)
87wnwg7s9h.fsf_-_@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

Toggle quote (6 lines)
> On Thu, 7 Jan 2021 at 11:10, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> • Increase the default expiration time to three months;
>
> That's long enough for my vacations. ;-)

Pffew. :-)

Pushed as 87b0001325992db60fdf24ac09ce254cd003721c!

Toggle quote (9 lines)
>> As zimoun proposed, I agree that we should make the expiration time
>> configurable. I started doing that but that requires ‘string->duration’,
>> which is currently in (guix ui), and this code is supposed to be
>> “UI-free”. So I’d first like to move this and similar tools to a new
>> (guix units) module…
>
> You mean 2 another patches are coming: one moving to the new (guix
> units) and then another one honoring GUIX_GIT_CACHE_EXPIRATION, right?

Yup! (Not ETA though…)

Thanks,
Ludo’.
Closed
?