Guile-Git-managed checkouts grow way too much

Status: Open

6 participants
  • Josselin Poiret
  • Jelle Licht
  • Ludovic Courtès
  • Csepp
  • wolf
  • Simon Tournier
Owner: unassigned
Submitted by: Ludovic Courtès
Severity: important

Ludovic Courtès wrote on 3 Sep 22:44 +0200
(address . bug-guix@gnu.org)
87bkejc7go.fsf@inria.fr
Hello!

As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
checkouts managed by Guile-Git appear to grow beyond reason. As an
example, here’s the same ‘.git’ managed with Guile-Git and with Git:

$ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
$ du -hs .git
517M .git

It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Ludo’.
Ludovic Courtès wrote on 4 Sep 23:13 +0200
control message for bug #65720
(address . control@debbugs.gnu.org)
87zg21od50.fsf@gnu.org
severity 65720 important
quit
Ludovic Courtès wrote on 4 Sep 23:47 +0200
Re: bug#65720: Guile-Git-managed checkouts grow way too much
(address . 65720@debbugs.gnu.org)
87fs3tobju.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason. As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M .git

Unsurprisingly, GC makes a big difference:

$ cp -r ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq /tmp/checkout
$ (cd /tmp/checkout/; git gc)
Enumerating objects: 717785, done.
Counting objects: 100% (717785/717785), done.
Delta compression using up to 4 threads
Compressing objects: 100% (154644/154644), done.
Writing objects: 100% (717785/717785), done.
Total 717785 (delta 569440), reused 710535 (delta 562274), pack-reused 0
Enumerating cruft objects: 103412, done.
Traversing cruft objects: 81753, done.
Counting objects: 100% (64171/64171), done.
Delta compression using up to 4 threads
Compressing objects: 100% (17379/17379), done.
Writing objects: 100% (64171/64171), done.
Total 64171 (delta 52330), reused 58296 (delta 46792), pack-reused 0
Expanding reachable commits in commit graph: 133730, done.
$ du -hs /tmp/checkout
539M /tmp/checkout

> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.

Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.

My inclination for the short term would be to work around this
limitation by (1) finding a heuristic to determine if a checkout has
likely accumulated too much cruft, and (2) considering such checkouts as
expired (thereby forcing a re-clone) or running ‘git gc’ on them if
‘git’ is available.

I can’t think of a good heuristic for (1). Birth time could be one, but
we’d need statx(2):

$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -4
Access: 2023-09-04 23:13:54.668279105 +0200
Modify: 2023-09-04 11:34:41.665385000 +0200
Change: 2023-09-04 11:34:41.661629102 +0200
Birth: 2021-08-09 10:48:17.748722151 +0200

Lacking statx(2), we can approximate creation time by looking at
‘.git/config’:

$ stat ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config | tail -3
Modify: 2021-08-09 10:50:28.031760953 +0200
Change: 2021-08-09 10:50:28.031760953 +0200
Birth: 2021-08-09 10:50:28.031760953 +0200

This strategy can be implemented like this:

diff --git a/guix/git.scm b/guix/git.scm
index ebe2600209..ed3fa56bc8 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -405,7 +405,16 @@ (define cached-checkout-expiration
   ;; Use the mtime rather than the atime to cope with file systems mounted
   ;; with 'noatime'.
-  (file-expiration-time (* 90 24 3600) stat:mtime))
+  (let ((ttl (* 90 24 3600))
+        (max-checkout-retention (* 9 30 24 3600)))
+    (lambda (file)
+      (match (false-if-exception (lstat file))
+        (#f 0) ;FILE may have been deleted in the meantime
+        (st (min (pk 'ttl (+ (stat:mtime st) ttl))
+                 (pk 'maxttl (match (false-if-exception
+                                     (lstat (in-vicinity file ".git/config")))
+                               (#f +inf.0)
+                               (st (+ (stat:mtime st) max-checkout-retention))))))))))
 (define %checkout-cache-cleanup-period
   ;; Period for the removal of expired cached checkouts.

Namely, a cached checkout is considered “expired” after 9 months. In
my case, it gives this:

scheme@(guix git)> (cached-checkout-expiration "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")

;;; (ttl 1701596081)

;;; (maxttl 1651827028)
$6 = 1651827028
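
For readers who do not speak Scheme, the expiration rule from the patch above can be sketched in shell. This is only an illustration under stated assumptions: the function name `checkout_expiration` is hypothetical, and it relies on GNU `stat -c %Y` to read the mtime. Like the patch, it returns the earlier of "directory mtime + 90 days" and "‘.git/config’ mtime + 9 months".

```shell
# Hypothetical helper mirroring the rule in the patch; not part of Guix.
TTL=$((90 * 24 * 3600))                 # 90 days, as in the patch
MAX_RETENTION=$((9 * 30 * 24 * 3600))   # about 9 months, as in the patch

# checkout_expiration DIR: print the expiration time of DIR, in seconds
# since the epoch.
checkout_expiration() {
    dir=$1
    # DIR may have been deleted in the meantime: treat it as expired.
    mtime=$(stat -c %Y "$dir" 2>/dev/null) || { echo 0; return 0; }
    by_ttl=$((mtime + TTL))
    # Without a readable .git/config there is no retention cap.
    if config_mtime=$(stat -c %Y "$dir/.git/config" 2>/dev/null); then
        by_retention=$((config_mtime + MAX_RETENTION))
        if [ "$by_retention" -lt "$by_ttl" ]; then
            echo "$by_retention"
            return 0
        fi
    fi
    echo "$by_ttl"
}
```

A caller would compare the printed value with `date +%s` to decide whether the checkout is expired.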

Of course having to re-clone entire repositories every 9 months is
ridiculous, but storing gigabytes of packs is worse IMO (I’m
specifically thinking about the Guix repo, which every user copies via
‘guix pull’).

Thoughts?

Thanks,
Ludo’.
Josselin Poiret wrote on 5 Sep 10:18 +0200
87tts9uj6x.fsf@jpoiret.xyz
Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> My inclination for the short term would be to work around this
> limitation by (1) finding a heuristic to determine is a checkout has
> likely accumulated too much cruft, and (2) considering such checkouts as
> expired (thereby forcing a re-clone) or running ‘git gc’ on them if
> ‘git’ is available.

I think using the git binary instead of libgit2 as a workaround is a
good idea. We can consider building it directly as well, so that people
who don't have it in their profiles can still benefit from it. We could
even consider using git commands in most places and using libgit2 only
where we really need the tight coupling. IIUC, libgit2 is eternally
trying to catch up to git and often performs in a counter-intuitive way
(I expect the various bugs with stale deleted files in checkouts to be
caused by this). Maybe it could also let us use bare repository and
directly extract the refs we want without having to mess with checkouts?

Best,
--
Josselin Poiret

Jelle Licht wrote on 5 Sep 10:22 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 65720@debbugs.gnu.org)
CE9E1465-187B-462B-B9E2-94E6A43B86EC@posteo.net
Hi Ludo,

>
> On 4 Sep 2023, at 23:49, Ludovic Courtès <ludo@gnu.org> wrote:
>
> Of course having to re-clone entire repositories every 9 months is
> ridiculous, but storing gigabytes of packs is worse IMO (I’m
> specifically thinking about the Guix repo, which every users copies via
> ‘guix pull’).

Please ignore if it doesn’t make sense, or would not make a practical difference for the current issue, but wouldn’t a local clone do the trick here? As in, clone from the ‘clogged’ local repo, move over fresh clone to old location.

Kr, Jelle
Ludovic Courtès wrote on 5 Sep 16:11 +0200
(address . 65720@debbugs.gnu.org)
87wmx4lnfu.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq

Another data point, with Cuirass instances:

ludo@berlin ~$ sudo du -hs /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
65G /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
ludo@berlin ~$ sudo stat /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -1
Birth: 2022-07-30 23:15:45.582559879 +0200

… and:

ludo@guix-hpc4 ~$ sudo du -hs /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
86G /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
ludo@guix-hpc4 ~$ sudo stat /var/lib/cuirass/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq | tail -1
Birth: 2021-06-01 11:48:48.854669310 +0200

So yeah, problem we have.

Ludo’.
Ludovic Courtès wrote on 5 Sep 16:18 +0200
(name . Josselin Poiret)(address . dev@jpoiret.xyz)(address . 65720@debbugs.gnu.org)
87msy0ln4m.fsf@gnu.org
Hi,

Josselin Poiret <dev@jpoiret.xyz> skribis:

> I think using the git binary instead of libgit2 as a workaround is a
> good idea. We can consider building it directly as well, so that people
> who don't have it in their profiles can still benefit from it. We could
> even consider using git commands in most places and using libgit2 only
> where we really need the tight coupling.

Surely you’d agree that it would suck though: depending on two Git
implementations because one doesn’t have a proper API and the other one
lacks a bunch of features.

It would also be pretty bad for closure size:

$ guix size guile-git | tail -1
total: 106.6 MiB
$ guix size guile-git git-minimal | tail -1
total: 169.8 MiB

It’s also not clear concretely how we’d add that dependency. Try
invoking ‘git’ from $PATH and print a warning if it doesn’t work?
But then, what about applications like Cuirass and hpcguix-web?
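
To make the "try ‘git’ from $PATH, warn otherwise" idea concrete, here is a minimal shell sketch. The function name `maybe_gc` is hypothetical, and how Cuirass or hpcguix-web would provide ‘git’ is left open, as in the question above.

```shell
# maybe_gc CHECKOUT: run 'git gc' on CHECKOUT when 'git' is found in $PATH;
# otherwise print a warning and carry on without failing.
maybe_gc() {
    checkout=$1
    if command -v git >/dev/null 2>&1; then
        git -C "$checkout" gc --quiet
    else
        echo "warning: 'git' not found in PATH; not collecting $checkout" >&2
        return 0
    fi
}
```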

Tricky, tricky.

Ludo’.
Ludovic Courtès wrote on 5 Sep 16:20 +0200
(name . Jelle Licht)(address . jlicht@posteo.net)(address . 65720@debbugs.gnu.org)
87il8oln06.fsf@gnu.org
Hello,

Jelle Licht <jlicht@posteo.net> skribis:

>> On 4 Sep 2023, at 23:49, Ludovic Courtès <ludo@gnu.org> wrote:
>>
>> Of course having to re-clone entire repositories every 9 months is
>> ridiculous, but storing gigabytes of packs is worse IMO (I’m
>> specifically thinking about the Guix repo, which every users copies via
>> ‘guix pull’).
>
> Please ignore if it doesn’t make sense, or would not make a practical difference for the current issue, but wouldn’t a local clone do the trick here? As in, clone from the ‘clogged’ local repo, move over fresh clone to old location.

Good question.

scheme@(guix git)> ,use(git)
scheme@(guix git)> (clone "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/" "/tmp/fresh-clone")
$7 = #<git-repository ba4240>
scheme@(guix git)> (system* "du" "-hs" "/tmp/fresh-clone")
6.7G /tmp/fresh-clone
$8 = 0
scheme@(guix git)> (system* "du" "-hs" "/tmp/fresh-clone/.git")
6.6G /tmp/fresh-clone/.git
$9 = 0
scheme@(guix git)> (system* "du" "-hs" "/home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/")
6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/
$10 = 0

Conclusion: it makes no difference.

Ludo’.
Simon Tournier wrote on 5 Sep 20:59 +0200
86edjcqwec.fsf@gmail.com
Hi,

On Mon, 04 Sep 2023 at 23:47, Ludovic Courtès <ludo@gnu.org> wrote:

>> It would seem that libgit2 doesn’t do the equivalent of ‘git gc’.
>
> Confirmed: <https://github.com/libgit2/libgit2/issues/3247>.

Ouch!

The goals of the project haven't changed, and neither have the
tradeoffs. If one were to rewrite git-gc on top of libgit2, the
best-case scenario is ending up with what we already had.

If you want to use regular maintenance on some repositories, use
git gc, that's what it's there for.


> My inclination for the short term would be to work around this
> limitation by (1) finding a heuristic to determine is a checkout has
> likely accumulated too much cruft, and (2) considering such checkouts
> as expired (thereby forcing a re-clone) or running ‘git gc’ on them if
> ‘git’ is available.

About (1), maybe we could add a “counter” and, after X updates of the
checkout, run (2). I guess the amount of cruft is more or less
proportional to the number of checkout updates; that’s the heuristic I
would use.

The most annoying part is (2): forcing a re-clone does not look like a
solution to me. I would rather waste disk space (and probably run
‘git gc’ manually myself) than re-clone… Somehow this re-clone would
always happen when I am on a poor network.

Moreover, assuming this clean-up (2) would run only once in a while, we
could imagine invoking something like

  guix shell -C git-minimal \
    -- git \
    -C ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq \
    gc

when the checkout is updated. And maybe we could provide another “guix
pull” command-line option for turning off this and mark it as done
(reset the “counter”).
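
The “counter” heuristic could look like the following shell sketch. Everything named here is an assumption made for illustration: the counter file `.guix-update-count`, the function `note_update`, and the default threshold of 50 updates.

```shell
# Number of checkout updates between two 'git gc' runs (illustrative default).
GC_EVERY=${GC_EVERY:-50}

# note_update CHECKOUT: record one more update of CHECKOUT.  Return 0 when
# the threshold is reached (the counter is then reset), 1 otherwise.
note_update() {
    counter_file=$1/.guix-update-count   # hypothetical file name
    count=$(cat "$counter_file" 2>/dev/null || echo 0)
    count=$((count + 1))
    if [ "$count" -ge "$GC_EVERY" ]; then
        echo 0 > "$counter_file"
        return 0
    fi
    echo "$count" > "$counter_file"
    return 1
}

# Possible use after each update of the checkout:
#   if note_update "$checkout"; then git -C "$checkout" gc; fi
```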

Well, that’s a poor solution, but we can assume that git-minimal is at
worst available using “guix shell git-minimal”. Note that the closure
of git-minimal is far smaller than a re-clone of the full Guix repository.

Cheers,
simon
Josselin Poiret wrote on 6 Sep 10:04 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 65720@debbugs.gnu.org)
87pm2vvibo.fsf@jpoiret.xyz
Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Surely you’d agree that it would suck though: depending on two Git
> implementations because one doesn’t have a proper API and the other one
> lacks a bunch of features.

Right, although I wouldn't necessarily say that the former doesn't have
a proper API, but rather that it has a Unix-oriented API. That leads to
performance issues on e.g. Windows but on Linux I'm not sure there's
much of a difference.

> It would also be pretty bad for closure size:
>
> --8<---------------cut here---------------start------------->8---
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
> --8<---------------cut here---------------end--------------->8---
>
> It’s also not clear concretely how we’d add that dependency. Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
> But then, what about applications like Cuirass and hpcguix-web?
>
> Tricky, tricky.

We could consider replacing the guile-git dependency with another
library built directly on top of git-minimal, and have this be a
dependency of Guix. Not ideal though, and not really scalable either:
we can't just add every VCS as direct dependencies.

From what I've seen, people are now scaling back on their use of
libgit2 because of the impedance mismatch and are resorting more and
more to git plumbing. From a pragmatic point of view, I'd prefer the
latter, since it is more stable and feature-complete.

Best,
--
Josselin Poiret

Simon Tournier wrote on 7 Sep 02:41 +0200
(address . 65720@debbugs.gnu.org)
86il8mn7al.fsf@gmail.com
Hi,

On Tue, 05 Sep 2023 at 16:18, Ludovic Courtès <ludo@gnu.org> wrote:

> It would also be pretty bad for closure size:
>
> --8<---------------cut here---------------start------------->8---
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
> --8<---------------cut here---------------end--------------->8---
>
> It’s also not clear concretely how we’d add that dependency. Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
> But then, what about applications like Cuirass and hpcguix-web?

I think we can rely on something like,

guix shell -C git-minimal -- git gc

It would be invoked internally using the Scheme API for inferiors and
friends. Doing so, it would add nothing to the closure size.

It appears to me safe to assume that this command can be run from any
Guix installation. Since the Git GC would only be done once every X Git
fetches, the overhead would be much lower.

Hum, am I repeating myself [1]? :-)

And I would run this “git gc” via “guix gc”, not via “guix pull”. Well,
I do not like all these automatic removals happening based on date
(last-expiry-cleanup) with some usual commands. It always happens when
I do not want. ;-) Contrary to “guix gc”. Bah, another story. :-)

Cheers,
simon


1: bug#65720: Guile-Git-managed checkouts grow way too much
Simon Tournier <zimon.toutoune@gmail.com>
Tue, 05 Sep 2023 20:59:07 +0200
id:86edjcqwec.fsf@gmail.com
Ludovic Courtès wrote on 8 Sep 19:08 +0200
(name . Josselin Poiret)(address . dev@jpoiret.xyz)(address . 65720@debbugs.gnu.org)
87pm2s385m.fsf@gnu.org
Hello!

Josselin Poiret <dev@jpoiret.xyz> skribis:

> Right, although I wouldn't necessarily say that the former doesn't have
> a proper API, but rather that it has a Unix-oriented API. That leads to
> performance issues on e.g. Windows but on Linux I'm not sure there's
> much of a difference.

[...]

> We could consider replacing the guile-git dependency with another
> library built directly on top of git-minimal, and have this be a
> dependency of Guix. Not ideal though, and not really scalable either:
> we can't just add every VCS as direct dependencies.

I cannot imagine a viable implementation of things like ‘commit-closure’
and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.
I’m quite confident this would be slow and brittle.

It looks like there’s no option other than carrying the two
implementations.

~~~

Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
in pure Scheme. That was on April 1st though, so people mistakenly
assumed it was a joke and the project was never carried out.

I digress, but I wonder: is there not even a viable Haskell or OCaml
implementation of Git?

Thanks,
Ludo’.
Ludovic Courtès wrote on 8 Sep 19:09 +0200
(name . Simon Tournier)(address . zimon.toutoune@gmail.com)
87jzt0382l.fsf@gnu.org
Hi!

Simon Tournier <zimon.toutoune@gmail.com> skribis:

> On Tue, 05 Sep 2023 at 16:18, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> It would also be pretty bad for closure size:
>>
>> --8<---------------cut here---------------start------------->8---
>> $ guix size guile-git | tail -1
>> total: 106.6 MiB
>> $ guix size guile-git git-minimal | tail -1
>> total: 169.8 MiB
>> --8<---------------cut here---------------end--------------->8---
>>
>> It’s also not clear concretely how we’d add that dependency. Try
>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>> But then, what about applications like Cuirass and hpcguix-web?
>
> I think we can rely on something like,
>
> guix shell -C git-minimal -- git gc

We’re talking about the implementation of a cache (meant to speed up
operations), that would actually fill said cache plus do a whole bunch
of expensive operations? Nah. :-)

Ludo’.
Simon Tournier wrote on 9 Sep 12:31 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
86cyyrskmj.fsf@gmail.com
Hi,

On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès <ludo@gnu.org> wrote:

>>> It would also be pretty bad for closure size:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> $ guix size guile-git | tail -1
>>> total: 106.6 MiB
>>> $ guix size guile-git git-minimal | tail -1
>>> total: 169.8 MiB
>>> --8<---------------cut here---------------end--------------->8---
>>>
>>> It’s also not clear concretely how we’d add that dependency. Try
>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>>> But then, what about applications like Cuirass and hpcguix-web?
>>
>> I think we can rely on something like,
>>
>> guix shell -C git-minimal -- git gc
>
> We’re talking about the implementation of a cache (meant to speed up
> operations), that would actually fill said cache plus do a whole bunch
> of expensive operations? Nah. :-)

I do not think so. If I understand correctly, we need to run “git gc” at
some point, therefore git-minimal needs to be around. The question is
how and when.

Well, maybe I am missing what the bug is about. For me, it is about
running ‘git gc’ for cleaning the Git checkout cache, no?


Solution #1. Add git-minimal as an input. It increases the closure, and
the extra load (on average) is about the ratio between the rate of “guix
pull” and the rate of git-minimal changes.

Assume that people run “guix pull” once per week and that, say, “git
gc” is run after 50 pulls. (Both numbers are totally arbitrary and
based on my personal estimate.)

Data Service [1] tells:

2023-07-07 15:45:22 2023-09-08 21:22:08
2023-05-11 16:10:48 2023-07-07 14:21:45
2023-05-01 16:40:08 2023-05-11 14:36:16
2023-04-25 13:34:54 2023-05-01 15:19:55
2023-04-25 13:34:54 2023-09-08 21:22:08
2023-03-06 17:22:28 2023-04-25 12:27:33
2023-01-17 23:49:19 2023-03-06 16:48:43
2022-11-08 13:06:42 2023-01-17 15:11:47
2022-10-08 05:14:46 2022-11-08 09:56:31
2022-09-06 15:00:08 2022-10-08 04:15:43
2022-08-13 22:02:31 2022-09-06 12:58:52

It means that a user will download git-minimal ~10 times for nothing.
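
The ~10 figure can be reproduced with integer arithmetic. In this sketch the function name is hypothetical, and the 35-day interval between git-minimal changes is an illustrative assumption, roughly in line with the history listed above.

```shell
# wasted_downloads PULLS_PER_GC DAYS_BETWEEN_PULLS DAYS_BETWEEN_CHANGES:
# print how many git-minimal revisions appear between two 'git gc' runs.
wasted_downloads() {
    echo $(( $1 * $2 / $3 ))
}

# 50 weekly pulls span 350 days; one git-minimal change every 35 days.
wasted_downloads 50 7 35   # prints 10
```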


Solution #2. The one I am proposing. :-) Download git-minimal only
when Guix needs it for running “git gc”. Yeah, there is probably a
small overhead with some operations. But I bet this overhead is much
smaller than the one of solution #1.

Well, it depends on the number of times people are updating the cache vs
the rate of change of git-minimal.

For sure, if one updates the cache 100 times per week, having
git-minimal as an input is far better. But I do not think that is the
regular usage on average. :-)

That’s why I am proposing to have an option for turning off this “git
gc” operation.

Well, we have lived for years without running ‘git gc’, so running it
once per year on average is probably enough to keep the cache size
reasonable. And git-minimal changes every month.


Maybe, there is some solution #3. ;-)

Cheers,
simon

1: https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history
C
(name . Ludovic Courtès)(address . ludo@gnu.org)
cuch6o16vgh.fsf@riseup.net
Ludovic Courtès <ludo@gnu.org> writes:

> Hello!
>
> Josselin Poiret <dev@jpoiret.xyz> skribis:
>
>> Right, although I wouldn't necessarily say that the former doesn't have
>> a proper API, but rather that it has a Unix-oriented API. That leads to
>> performance issues on e.g. Windows but on Linux I'm not sure there's
>> much of a difference.
>
> [...]
>
>> We could consider replacing the guile-git dependency with another
>> library built directly on top of git-minimal, and have this be a
>> dependency of Guix. Not ideal though, and not really scalable either:
>> we can't just add every VCS as direct dependencies.
>
> I cannot imagine a viable implementation of things like ‘commit-closure’
> and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.
> I’m quite confident this would be slow and brittle.
>
> It looks like there’s no option other than carrying the two
> implementations.
>
> ~~~
>
> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme. That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.
>
> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?
>
> Thanks,
> Ludo’.

For the sake of completeness:
There is an alternative implementation in C for Plan 9 that I've used and
that is now mature enough that the 9front project switched to it from
Mercurial.
It might be possible to compile it with the plan9port compiler wrapper.

There is also a Git implementation in OCaml that some MirageOS
unikernels use to serve static content from a git repository.
Also, the Irmin "database" is based on git and is written in OCaml.
C
(name . Simon Tournier)(address . zimon.toutoune@gmail.com)
cuca5tt6va2.fsf@riseup.net
Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On Fri, 08 Sep 2023 at 19:09, Ludovic Courtès <ludo@gnu.org> wrote:
>
>>>> It would also be pretty bad for closure size:
>>>>
>>>> --8<---------------cut here---------------start------------->8---
>>>> $ guix size guile-git | tail -1
>>>> total: 106.6 MiB
>>>> $ guix size guile-git git-minimal | tail -1
>>>> total: 169.8 MiB
>>>> --8<---------------cut here---------------end--------------->8---
>>>>
>>>> It’s also not clear concretely how we’d add that dependency. Try
>>>> invoking ‘git’ from $PATH and print a warning if it doesn’t work?
>>>> But then, what about applications like Cuirass and hpcguix-web?
>>>
>>> I think we can rely on something like,
>>>
>>> guix shell -C git-minimal -- git gc
>>
>> We’re talking about the implementation of a cache (meant to speed up
>> operations), that would actually fill said cache plus do a whole bunch
>> of expensive operations? Nah. :-)
>
> I do not think. If I understand correctly, we need to run “git gc” at
> some point, therefore git-minimal needs to me around. The question is
> how and when.
>
> Well, maybe I am missing what the bug is about. For me, it is about
> running ‘git gc’ for cleaning the Git checkout cache, no?
>
>
> Solution #1. Add git-minimal as inputs. It increases the closure and
> the extra load (on average) is about the ratio between the rate of “guix
> pull” and the rate of the git-minimal changes.
>
> Assuming, that people are running “guix pull” once per week and say “git
> gc” is run after 50 pulls. (These both number are totally arbitrary and
> based on my personal estimate).
>
> Data Service [1] tells:
>
> 2023-07-07 15:45:22 2023-09-08 21:22:08
> 2023-05-11 16:10:48 2023-07-07 14:21:45
> 2023-05-01 16:40:08 2023-05-11 14:36:16
> 2023-04-25 13:34:54 2023-05-01 15:19:55
> 2023-04-25 13:34:54 2023-09-08 21:22:08
> 2023-03-06 17:22:28 2023-04-25 12:27:33
> 2023-01-17 23:49:19 2023-03-06 16:48:43
> 2022-11-08 13:06:42 2023-01-17 15:11:47
> 2022-10-08 05:14:46 2022-11-08 09:56:31
> 2022-09-06 15:00:08 2022-10-08 04:15:43
> 2022-08-13 22:02:31 2022-09-06 12:58:52
> …
>
> It means that an user will download ~10 times git-minimal for nothing.
>
>
> Solution #2. The one I am proposing. :-) Download git-minimal only
> when Guix needs it for running “git gc”. Yeah, there is probably a
> small overload with some operations. But, I bet this overload is much
> smaller than the one of solution #1.
>
> Well, it depends on the number of times people are updating the cache vs
> the rate of change of git-minimal.
>
> For sure, if one updates 100 times per week the cache, having
> git-minimal as inputs is far better. But I do not think that the
> regular usage on average. :-)
>
> That’s why I am proposing to have an option for turning off this “git
> gc“ operation.
>
> Well, we have lived since years without running ‘git gc’ so running it
> once per year on average is probably enough to keep the cache size
> reasonable. And git-minimal is changing every month.
>
>
> Maybe, there is some solution #3. ;-)
>
> Cheers,
> simon
>
>
> 1: https://data.guix.gnu.org/repository/1/branch/master/package/git-minimal/output-history

Please don't create another situation like with guix system roll-back,
where a crucial sysadmin operation doesn't work without network access.
Or at least make it configurable, so things that are likely to be needed
for future operations are pre-fetched.
Simon Tournier wrote on 11 Sep 10:42 +0200
Digression about Git implementations (was Re: bug#65720: Guile-Git-managed checkouts grow way too much)
(address . 65720@debbugs.gnu.org)
87zg1tje2s.fsf@gmail.com
Hi Ludo,

On Fri, 08 Sep 2023 at 19:08, Ludovic Courtès <ludo@gnu.org> wrote:

> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme. That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.

Well, that is a piece of work. :-)

Maybe there is hope with git-std-lib.

Subject: Proposal/Discussion: Turning parts of Git into libraries
From: Emily Shaffer <nasamuffin@google.com>
To: Git List <git@vger.kernel.org>
Date: Fri, 17 Feb 2023 13:12:23 -0800

And some patches are starting to float around.


> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?

It depends on what “viable” means. :-)


Irmin [1] is an OCaml library for building mergeable, branchable
distributed data stores – A Distributed Database Built on the Same
Principles as Git. And irmin relies on ocaml-git.


Then there is a pure Go implementation and another using Java.


I do not know all that are “viable”. Well, I do not know if ’git gc’ is
implemented. And I do not know which plumbing is implemented and which
porcelain is available.

Last, SWH uses dulwich [2] which is a pure Python implementation of Git.


To my knowledge, there is no “dulwich gc” but they implement “dulwich
fsck” and “dulwich repack”.

Back at the 10 Years of Guix event or at UNESCO in February (I do not
remember exactly when), we were discussing implementations of Git, and
we mentioned an implementation in Rust. Maybe this one:


Cheers,
simon
Ludovic Courtès wrote on 11 Sep 16:37 +0200
Re: bug#65720: Guile-Git-managed checkouts grow way too much
(name . Josselin Poiret)(address . dev@jpoiret.xyz)(address . 65720@debbugs.gnu.org)
87jzswsrlt.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> It would also be pretty bad for closure size:
>
> $ guix size guile-git | tail -1
> total: 106.6 MiB
> $ guix size guile-git git-minimal | tail -1
> total: 169.8 MiB
>
> It’s also not clear concretely how we’d add that dependency. Try
> invoking ‘git’ from $PATH and print a warning if it doesn’t work?

A solution to this particular problem is coming:


Ludo’.
W
(name . Ludovic Courtès)(address . ludo@gnu.org)
ZP8nc1m8rN_34XV-@ws
On 2023-09-08 19:08:05 +0200, Ludovic Courtès wrote:
> Hello!
>
> Josselin Poiret <dev@jpoiret.xyz> skribis:
>
> > Right, although I wouldn't necessarily say that the former doesn't have
> > a proper API, but rather that it has a Unix-oriented API. That leads to
> > performance issues on e.g. Windows but on Linux I'm not sure there's
> > much of a difference.
>
> [...]
>
> > We could consider replacing the guile-git dependency with another
> > library built directly on top of git-minimal, and have this be a
> > dependency of Guix. Not ideal though, and not really scalable either:
> > we can't just add every VCS as direct dependencies.
>
> I cannot imagine a viable implementation of things like ‘commit-closure’
> and ‘commit-relation’ from (guix git) done by shelling out to ‘git’.

I am sure I must be missing some part of the contract of the function, but at
least the commit-relation seems fairly straightforward:

(define (shelling-commit-relation old new)
  (let ((h-old (oid->string (commit-id old)))
        (h-new (oid->string (commit-id new))))
    (cond ((eq? old new)
           'self)
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new))
           'ancestor)
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old))
           'descendant)
          (else
           'unrelated))))

I would argue it is even somewhat more readable than the current implementation.

> I’m quite confident this would be slow

My version is ~2000x faster compared to (guix git):

Guix: 1048.620992ms
Git: 0.532143ms

Again, I am sure I must have missed something, either in the implementation or
in the measurements, because it is pretty hard to believe there is so much room
for improvement.

The full script I used is attached to this email.

> and brittle.

In general, Git plumbing commands are designed to have a stable CLI so that they
can be used in scripts. So I am not sure where the brittleness would come
from.
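
To make the exit-status handling concrete, here is a rough shell sketch of the
same idea. It relies only on `git merge-base --is-ancestor`, which exits with 0
when the first commit is an ancestor of the second, 1 when it is not, and some
other status on error. The function names, the commit identity and the
throwaway repository are all made up for illustration; this is not how
(guix git) does it.

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# is_ancestor REPO A B: succeed if A is an ancestor of B, fail if it is not,
# and abort on any other exit status (for instance an unknown commit).
is_ancestor() {
    status=0
    git -C "$1" merge-base --is-ancestor "$2" "$3" || status=$?
    case $status in
        0|1) return "$status" ;;
        *) echo "git merge-base failed with status $status" >&2
           exit "$status" ;;
    esac
}

# commit_relation REPO OLD NEW: print self, ancestor, descendant or unrelated.
commit_relation() {
    rel_repo=$1
    rel_old=$(git -C "$rel_repo" rev-parse --verify "$2^{commit}")
    rel_new=$(git -C "$rel_repo" rev-parse --verify "$3^{commit}")
    if [ "$rel_old" = "$rel_new" ]; then
        echo self
    elif is_ancestor "$rel_repo" "$rel_old" "$rel_new"; then
        echo ancestor
    elif is_ancestor "$rel_repo" "$rel_new" "$rel_old"; then
        echo descendant
    else
        echo unrelated
    fi
}

# Throwaway repository: two commits in a line, plus one on an orphan branch.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" commit -q --allow-empty -m first
c1=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" commit -q --allow-empty -m second
c2=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" checkout -q --orphan other
git -C "$repo" commit -q --allow-empty -m unrelated
c3=$(git -C "$repo" rev-parse HEAD)

commit_relation "$repo" "$c1" "$c1"
commit_relation "$repo" "$c1" "$c2"
commit_relation "$repo" "$c2" "$c1"
commit_relation "$repo" "$c1" "$c3"
```

Treating any status other than 0 or 1 as fatal keeps a broken repository or a
mistyped hash from being silently reported as "unrelated".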

>
> It looks like there’s no option other than carrying the two
> implementations.

Assuming I made no mistake (hard to believe), it is probably worth exploring the
feasibility of just shelling out to the git binary some more.

>
> ~~~
>
> Years ago, Andy Wingo sketched a plan for GNU hackers to implement Git
> in pure Scheme. That was on April 1st though, so people mistakenly
> assumed it was a joke and the project was never carried out.
>
> I digress, but I wonder: is there not even a viable Haskell or OCaml
> implementation of Git?
>
> Thanks,
> Ludo’.
>

W.

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
#!/bin/sh
# -*-scheme-*-
exec guile -s "$0" "$@"
!#

(use-modules (git)
             (guix git))

(define %repo "/tmp/guix-fork")
(define h1 "72745172d155e489936f694d6b9013cb76272370")
(define h2 "6d60d7ccba5a8e06c17d55a1772fa7f4529b5eff")
(define h3 "c3db650680f995f0556d3ddce567cdc1c33e4603")

;;; r has to still be defined when the commit-relation is called.  There is *no*
;;; error, but it always returns 'unrelated.  Quite a footgun.
(define r (repository-open %repo))
(define c1 (commit-lookup r (string->oid h1)))
(define c2 (commit-lookup r (string->oid h2)))
(define c3 (commit-lookup r (string->oid h3)))

(define (git-C dir . args)
  (apply system* "git" "-C" dir args))

(define (shelling-commit-relation old new)
  (let ((h-old (oid->string (commit-id old)))
        (h-new (oid->string (commit-id new))))
    (cond ((eq? old new)
           'self)
          ;; In real code, git-C should probably return #t (for 0), #f (for 1)
          ;; or raise (for anything else).
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-old h-new))
           'ancestor)
          ((zero? (git-C %repo "merge-base" "--is-ancestor" h-new h-old))
           'descendant)
          (else
           'unrelated))))

;;; Make sure it actually works.
(let ((tests `((,c1 . ,c1)
               (,c1 . ,c2)
               (,c2 . ,c1)
               (,c1 . ,c3))))
  (for-each (λ (c)
              (format #t "Guix: ~a\nGit: ~a\n\n"
                      (commit-relation (car c) (cdr c))
                      (shelling-commit-relation (car c) (cdr c))))
            tests))

(define (time proc)
  (let* ((start (get-internal-run-time))
         (_ (proc))
         (end (get-internal-run-time)))
    (exact->inexact (* 1000 (/ (- end start) internal-time-units-per-second)))))

(format #t "Guix: ~ams\nGit: ~ams\n"
        (time (λ () (commit-relation c1 c2)))
        (time (λ () (shelling-commit-relation c1 c2))))
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEt4NJs4wUfTYpiGikL7/ufbZ/wakFAmT/J20ACgkQL7/ufbZ/
walA7BAAioswpeyaAYJlo/HjxXOUviMoZ49RJ0vjoWkcdKDJyZCvY1bSaa6E+o38
4mjw+8qT2VH3Su+GKTWgYJ66O6PT2IZh7kybzqPCdIFFXAK3KHNP2cQlweDgl6jG
YhktsUBWalhzk06rEy3JXNPqrIinGHmMqm/pIxMQXPcOLN5/d90TMB304YqbjAio
J5sCeNNYNhVL0A1jY7rZMefUcHISKX8B3XvsNr2A0AvofGv6OQrftf3OMEX4OeE1
5KFeukwv9FRZ38Cc6+Ob3Jw+Atmz5WrOutTPMXAbp4fxxXHQguG9/fIP3JinAtd1
3ruwT7Q4V5n6pGcz81vMYTR+24Tfbcs4thDqKfIM2uoPOvCh1c6dQ3ap2hI4uvls
DlCSviISQkjjCqR30jj2ZhHIHF3kPDl+DnaDCn/LIKBRwEbLDJJ+eW9Bv9JJLG2h
6TCouuRrJzCZ7OpkTg6psZI7mhzwYNdJO2wIkGib8eI2U+/GxFDgWTi8U9HHQFiR
Z8/97ph5AdoIObDz0R/hezyvpWOJMYuhI0IhKvBksyx8UYOnpM0lIaSASQt2DqU8
xmRztjNazvoUbTASBg9l4MedSejPcDVn6FFFQ+QpkBORXTMYJ5E572BVxOOmnTbu
s2K2nZZMczHKbOgWyJt4rafRzSZeJRY6fr062Cu+TrHs9TO0i70=
=UIrN
-----END PGP SIGNATURE-----


Ludovic Courtès wrote on 13 Sep 20:10 +0200
(name . wolf)(address . wolf@wolfsden.cz)
874jjylza9.fsf@gnu.org
Hi,

wolf <wolf@wolfsden.cz> skribis:

> (define (time proc)
>   (let* ((start (get-internal-run-time))
>          (_ (proc))
>          (end (get-internal-run-time)))
>     (exact->inexact (* 1000 (/ (- end start) internal-time-units-per-second)))))
>
> (format #t "Guix: ~ams\nGit: ~ams\n"
>         (time (λ () (commit-relation c1 c2)))
>         (time (λ () (shelling-commit-relation c1 c2))))

‘get-internal-run-time’ returns “units of processor time” used by the
current process (info "(guile) Time"). When shelling out, the process
calls waitpid(2) and does nothing, so naturally its processor time is
close to zero.

‘get-internal-real-time’ should give something closer to elapsed time.
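
The same distinction can be seen from a shell, as a small sketch: measure
wall-clock time around a child process and compare it with the CPU time that
`times` reports. The `now_ms` helper is made up for this example and assumes
GNU date (`%N` is not in POSIX); `sleep 1` merely stands in for a child such
as `git merge-base`.

```shell
#!/bin/sh
set -eu

# Milliseconds since the epoch.  %N (nanoseconds) is a GNU date extension,
# so this helper is an assumption about the environment.
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

start=$(now_ms)
sleep 1        # stand-in for a child process the parent merely waits for
end=$(now_ms)
elapsed_ms=$(( end - start ))

echo "elapsed (wall clock): ${elapsed_ms}ms"

# "times" prints the CPU time used by the shell itself on the first line and
# by its children on the second; both stay small here because waiting costs
# almost no processor time.
times
```

The wall-clock figure covers the whole wait, whereas the CPU-time figures stay
small, which is why a benchmark of a shell-out needs elapsed time.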

Ludo’.
Simon Tournier wrote on 14 Sep 00:36 +0200
86o7i5wvj2.fsf@gmail.com
Hi Ludo,

On Wed, 13 Sep 2023 at 20:10, Ludovic Courtès <ludo@gnu.org> wrote:

> ‘get-internal-run-time’ returns “units of processor time” used by the
> current process (info "(guile) Time"). When shelling out, the process
> calls waitpid(2) and does nothing, so naturally its processor time is
> close to zero.
>
> ‘get-internal-real-time’ should give something closer to elapsed time.

Well, let's avoid mixing unrelated discussions. :-) To discuss that
specific part, I reported my timings, measured with ,time, on guix-devel.

comparing commit-relation using Scheme+libgit2 vs shellout plumbing Git
Simon Tournier <zimon.toutoune@gmail.com>
Tue, 12 Sep 2023 00:48:30 +0200
id:865y4gz5q9.fsf@gmail.com

The result is still significantly less, and discussion is welcome
over there. :-)

Cheers,
simon
Ludovic Courtès wrote 4 days ago
(address . 65720@debbugs.gnu.org)
87jzsnf6tr.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> As reported by Tobias on IRC (in the context of ‘hpcguix-web’),
> checkouts managed by Guile-Git appear to grow beyond reason. As an
> example, here’s the same ‘.git’ managed with Guile-Git and with Git:
>
> $ du -hs ~/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> 6.7G /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
> $ du -hs .git
> 517M .git

More data… The biggest file in that repo is a pack that was created
when that repo was first cloned (Aug. 2021):

$ du /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/* |sort -k1 -n| tail -3
44272 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-3c2f1857501b01c321bc67ba1f30704deb9e18e9.pack
47272 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-30d5b35ad14a8398464e49e224811b162f673d66.pack
191492 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
$ ls -l /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
-r--r--r-- 1 ludo users 196079671 Aug 9 2021 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/objects/pack/pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.pack
$ ls -ld /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config
-rw-r--r-- 1 ludo users 266 Aug 9 2021 /home/ludo/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq/.git/config

The pack starts with things from Aug. 2021:

$ git show-index < pack-d39507858782209d1ad87e389e4dffd4b6ff7ea2.idx|sort -k1 -n|head -3
12 30289f4d4638452520f52c1a36240220d0d940ff (852d8cb3)
927 d7ffc535c52f49177a8e5553569cdb1e321b5bc6 (2007c5d0)
1800 0a379de3249d5e9ff66fb404f7e5aa8ce2cb3d24 (b1e69aa4)
$ git show 30289f4d4638452520f52c1a36240220d0d940ff
commit 30289f4d4638452520f52c1a36240220d0d940ff
Author: Milkey Mouse <milkeymouse@meme.institute>
Date: Sun Aug 8 22:15:40 2021 -0700

[…]

… and at the bottom (large offsets) it contains very old blobs from the
Nix repo that somehow made it here.

I figured we still had a ‘nix’ branch from the early days, that contains
the history of Nix. I’ve now removed it, which helps a bit:

scheme@(guile-user)> ,use(git)
scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix")
$5 = #<git-repository 91a7b0>
;; 600.534529s real time, 435.260926s run time. 0.000000s spent in GC.
scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-after-removing-nix-branch")
$6 = #<git-repository 4465a50>
;; 420.321511s real time, 398.772963s run time. 0.000000s spent in GC.

… and more importantly:

$ du -hs /tmp/guix/.git
373M /tmp/guix/.git
$ du -hs /tmp/guix-after-removing-nix-branch/.git
362M /tmp/guix-after-removing-nix-branch/.git

Anyway, what seems to happen is that every pull (every call to
‘remote-fetch’) creates a new pack (see ‘git_fetch_download_pack’ in
libgit2), which becomes inefficient in the long run (lots of small
poorly-compressed packs). That’s at least one possible explanation.
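
A way to see the effect with plain Git, as a hedged sketch: it does not go
through libgit2 or ‘remote-fetch’ at all, it only imitates the suspected
pattern by running an incremental `git repack` after each commit in a
throwaway repository, then compares the pack count reported by
`git count-objects -v` before and after `git gc`. The `count_packs` helper and
the commit identity are made up for the example.

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# count_packs REPO: print the number of packs in REPO.
count_packs() {
    git -C "$1" count-objects -v | awk '/^packs:/ { print $2 }'
}

repo=$(mktemp -d)
git init -q "$repo"

for i in 1 2 3; do
    echo "revision $i" > "$repo/file"
    git -C "$repo" add file
    git -C "$repo" commit -q -m "revision $i"
    # Without -a, repack only packs the objects that are still loose and
    # leaves existing packs alone, so every round adds one more small pack.
    git -C "$repo" repack -d -q
done

packs_before=$(count_packs "$repo")
git -C "$repo" gc -q        # consolidates everything reachable
packs_after=$(count_packs "$repo")

echo "packs before gc: $packs_before, after gc: $packs_after"
```

Running the same `git count-objects -v` in a long-lived checkout under
~/.cache/guix/checkouts would be one way to check whether the number of packs
really does keep growing with each pull.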

To be continued…

Ludo’.
Simon Tournier wrote 3 days ago
86wmwmlje7.fsf@gmail.com
Hi Ludo.

On Tue, 19 Sep 2023 at 00:35, Ludovic Courtès <ludo@gnu.org> wrote:

> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> ,use(git)
> scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix")
> $5 = #<git-repository 91a7b0>
> ;; 600.534529s real time, 435.260926s run time. 0.000000s spent in GC.
> scheme@(guile-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-after-removing-nix-branch")
> $6 = #<git-repository 4465a50>
> ;; 420.321511s real time, 398.772963s run time. 0.000000s spent in GC.
> --8<---------------cut here---------------end--------------->8---

[...]

> --8<---------------cut here---------------start------------->8---
> $ du -hs /tmp/guix/.git
> 373M /tmp/guix/.git
> $ du -hs /tmp/guix-after-removing-nix-branch/.git
> 362M /tmp/guix-after-removing-nix-branch/.git
> --8<---------------cut here---------------end--------------->8---

Just to also point out [1] that using a shallow clone, restricted to the
oldest commit reachable by time-machine, saves about 25% of the bits to
download, and similarly on disk.

scheme@(guix-user)> ,t (clone "https://git.savannah.gnu.org/git/guix.git" "/tmp/guix-guile")
$1 = #<git-repository df3710>
;; 383.186818s real time, 278.060733s run time. 0.000000s spent in GC.

$ time git clone https://git.savannah.gnu.org/git/guix.git guix-full
Receiving objects: 100% (693699/693699), 342.14 MiB | 2.87 MiB/s, done.
real 2m40,830s
user 3m4,683s
sys 0m8,189s

$ time git clone --shallow-since=2019-04-30 https://git.savannah.gnu.org/git/guix.git guix-oldest
Receiving objects: 100% (428646/428646), 259.41 MiB | 3.87 MiB/s, done.
real 1m45,604s
user 2m32,370s
sys 0m5,916s

$ du -sh guix-*/.git
362M guix-full/.git
362M guix-guile/.git
272M guix-oldest/.git

Cheers,
simon


1: Re: hard dependency on Git? (was bug#65866: [PATCH 0/8] Add built-in builder for Git checkouts)
Simon Tournier <zimon.toutoune@gmail.com>
Mon, 11 Sep 2023 19:52:34 +0200
id:871qf4ha1p.fsf@gmail.com