time-machine is doing too much network requests

  • Open
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Simon Tournier
Owner
unassigned
Submitted by
Simon Tournier
Severity
normal
S
S
Simon Tournier wrote on 6 Sep 2023 18:26
(address . bug-guix@gnu.org)
87wmx3mfo5.fsf@gmail.com
Hi,

Well, I am in a quest to make Guix more robust for the worst-case
scenario: Savannah is unreachable (as well as other servers). For
context, it matters when using Guix for reproducing scientific
production. An example of such worst-case, see [1]. Well, this quote:

The first annoyance is that guix time-machine needs an access to the
server git.savannah.gnu.org, although the Git repository is already
cloned and already contains the required commit.

is almost tackled by #65352; at least tracked. :-)

Investigating that, I am noticed that the current design is suboptimal,
IMHO. I am reporting here and I hope to improve the situation by
reducing the number of network requests.

It matters in worst-case scenario of scientific production. And it also
matters for people with poor or unstable network link.

Sorry if the report is hard to follow, I did my best for being clear.

To keep the discussion simple, I only consider the Git reference
specification ’branch’ and ’tag-or-commit’. These Git reference
specification that various internal procedures are using is poorly
documented. See the docstring of the procedure ’update-cached-checkout’
from (guix git) for an idea or the implementation of ’resolve-reference’
for the complete list.

Let consider only the Git reference specifications:

(branch . "string")
(tag-or-commit . "string")

because that are what “guix time-machine” sets from the CLI or reads
from channels.scm files, IIUC.

The command “guix time-machine” starts to call ’cached-channel-instance’
passing as argument the procedure ’validate-guix-channel’.

This procedure ’cached-channel-instance’ starts by collecting all the
commits for each channel. It maps the channels list using the procedure
’channel-full-commit’. And that procedure calls
’update-cached-checkout. (1)

Then, ’cached-channel-instance’ calls ’validate-guix-channel’. And this
procedure also calls ’update-cached-checkout’. (2)

Then, ’cached-channel-instance’ calls ’latest-channel-instances’ which
calls ’latest-channel-instance’. And guess what, this procedure also
calls ’update-cached-checkout’. (3)

Ok, let give a look at ’update-cached-checkout’.

This procedure ’update-cached-checkout’ first looks if the Git reference
specification is already in the cached Git checkout using the procedure
’reference-available?’.

Consider that the Git reference specification is (branch . "some"), then
’reference-available?’ returns #false, so it triggers ’remote-fetch’
from Guile-Git. If I read correctly, this generates network traffic and
Savannah needs to be reachable. (I)

Hum, I am not convinced someone is following. Who knows? :-)

Let continue. ’update-cached-checkout’ starts to check some commit
relation and friends. There is an if-branch calling then
’switch-to-ref’ else ’resolve-reference’. Under the hood, the procedure
’switch-to-ref’ is calling ’resolve-reference’.

For the case (branch . "some"), this ’resolve-reference’ procedure calls
’branch-lookup’ from Guile-Git. If I read correctly, this generates
network traffic because of BRANCH-REMOTE and Savannah needs to be
reachable. (II)

Summary: ( (1) + (2) + (3) ) * ( (I) + (II) ) = 6.

If I am correct and if I am not missing something, the current design
requires 6 network traffic with Savannah and most of this traffic is
useless because it had already be done, somehow.

Well, (branch . "some") is the worst case, IMHO. And the short commit
ID (tag-or-commit . "1234abc") or the tag (tag-or-commit . "v1.4.0")
too.

Applying my proposal from #65352 (DRAFT v2), it removes some useless
’remote-fetch’ calls.

Well, let me know if this diagnostic is correct.

To be continued…


Cheers,
simon


L
L
Ludovic Courtès wrote on 10 Sep 2023 22:10
(name . Simon Tournier)(address . zimon.toutoune@gmail.com)(address . 65787@debbugs.gnu.org)
87zg1tvlfn.fsf@gnu.org
Hello Simon,

This is a long message; I agree with the intent (avoiding network
traffic when the required commit is already in cache), but I’m not sure
about the analysis. It would probably be easier if you could come with
an example where there’s Git-related network traffic where there
shouldn’t.

Let me give some perspective on what the code intends to do.

‘cached-channel-instance’ has 3 cases:

1. Obvious cache hit: This is when CHANNELS specifies the commit of
each target channel (this happens for example when you type ‘guix
time-machine -q --commit=a4c35c607cfd7d6b0bad90cfcc46188d489e1754)
*and* the combination of channels is already in
~/.cache/guix/inferiors. This is the optimal case: the Git repo
doesn’t even need to be opened.

2. Cache hit: CHANNELS are pinned, but refer to tags (like “v1.2.0”)
or short commit IDs (like “a4c35c6”). In that case,
‘channel-full-commit’ opens the Git repo to resolve the identifier.
After that, we go to case #1 or #4.

3. Cache hit: CHANNELS are not pinned—i.e., they refer to a branch,
not a commit. In that case we first need to perform a ‘git fetch’
and then we go to #1 or #4.

4. Cache miss: ‘reference-available?’ returns #f for the channel
commits, we got through ‘remote-fetch’ followed by
‘build-derivations’.

As with all caches, what matters is to make sure case #1 is processes as
efficiently as possible. I believe it’s the case since the Git repo is
not even opened.

Of course it would be nice to speed up #2 and #3 too (as long as it’s
not at the expense of #1). Maybe this is the purpose of your message:
reducing Git remote accesses in those cases? (Apologies, I just
realized this might have been what you had in mind. :-))

Thanks,
Ludo’.
S
S
Simon Tournier wrote on 11 Sep 2023 11:41
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 65787@debbugs.gnu.org)
87tts1jbbx.fsf@gmail.com
Hi Ludo,

Toggle quote (3 lines)
> Maybe this is the purpose of your message:
> reducing Git remote accesses in those cases?

Yes. :-)

On Sun, 10 Sep 2023 at 22:10, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (4 lines)
> It would probably be easier if you could come with
> an example where there’s Git-related network traffic where there
> shouldn’t.

Do we agree that the both Guile-Git procedures ’remote-fetch’ and
’branch-lookup’ are doing network traffic?

If yes, here two examples:

Toggle snippet (11 lines)
$ ./pre-inst-env guix pull -q -p /tmp/new
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...

;;; (remote-fetch NETWORK)

;;; (branch-lookup NETWORK)
Authenticating channel 'guix', commits 9edb3f6 to 2eb6df5 (83 new commits)...
Building from this channel:
guix https://git.savannah.gnu.org/git/guix.git 2eb6df5

Well, that’s not perfect… it becomes much worse:

Toggle snippet (17 lines)
$ ./pre-inst-env guix time-machine -q -- describe

;;; (remote-fetch NETWORK)

;;; (branch-lookup NETWORK)

;;; (remote-fetch NETWORK)

;;; (branch-lookup NETWORK)
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...

;;; (remote-fetch NETWORK)

;;; (branch-lookup NETWORK)
building /gnu/store/z8jwdgr7z6i8c00msdm2kzfv0n3zp25v-module-import-compiled.drv...

Do we agree that is suboptimal?


Toggle quote (9 lines)
> ‘cached-channel-instance’ has 3 cases:
>
> 1. Obvious cache hit: This is when CHANNELS specifies the commit of
> each target channel (this happens for example when you type ‘guix
> time-machine -q --commit=a4c35c607cfd7d6b0bad90cfcc46188d489e1754)
> *and* the combination of channels is already in
> ~/.cache/guix/inferiors. This is the optimal case: the Git repo
> doesn’t even need to be opened.

Yes.


Toggle quote (5 lines)
> 2. Cache hit: CHANNELS are pinned, but refer to tags (like “v1.2.0”)
> or short commit IDs (like “a4c35c6”). In that case,
> ‘channel-full-commit’ opens the Git repo to resolve the identifier.
> After that, we go to case #1 or #4.

That’s suboptimal. Currently, it reads:

Toggle snippet (17 lines)
$ guix describe
Generation 28 sept. 06 2023 14:54:50 (current)
guix 6113e05
repository URL: https://git.savannah.gnu.org/git/guix.git
commit: 6113e0529d61df7425f64e30a6bf77f7cfdfe5a5

$ ./pre-inst-env guix time-machine -q --commit=6113e05 -- describe

;;; (remote-fetch NETWORK)

;;; (remote-fetch NETWORK)
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...

;;; (remote-fetch NETWORK)
Computing Guix derivation for 'x86_64-linux'... | C-c C-c

Using patch from #65352 [1], it removes these useless traffic:

Toggle snippet (5 lines)
$ ./pre-inst-env guix time-machine -q --commit=6113e05 -- describe
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Computing Guix derivation for 'x86_64-linux'... | C-c C-c

Using the proposed patch, it becomes optimal, IMHO. Well, let discuss
it in #65352 – comments are welcome. :-)

Toggle quote (4 lines)
> 3. Cache hit: CHANNELS are not pinned—i.e., they refer to a branch,
> not a commit. In that case we first need to perform a ‘git fetch’
> and then we go to #1 or #4.

I hope that I am convincing you that this case is suboptimal: it does 3
’git fetch’ and more 3 others lookup requiring network. So it is about
6 network access when only one is necessary, IMHO.


Toggle quote (4 lines)
> 4. Cache miss: ‘reference-available?’ returns #f for the channel
> commits, we got through ‘remote-fetch’ followed by
> ‘build-derivations’.

This case is part of #2 and discussed in #65352, IMHO.


Cheers,
simon


1: [bug#65352] [PATCH v2] DRAFT git: Avoid touching the network unless needed in 'reference-available?'.
Simon Tournier <zimon.toutoune@gmail.com>
Wed, 06 Sep 2023 16:17:08 +0200
id:32d3fb5066e0b20e200dabef0fba897634e21dda.1694009405.git.zimon.toutoune@gmail.com
S
S
Simon Tournier wrote on 11 Sep 2023 13:36
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 65787@debbugs.gnu.org)
874jk1j60n.fsf@gmail.com
Oops, missing diff for clarity. :-)

On Mon, 11 Sep 2023 at 11:41, Simon Tournier <zimon.toutoune@gmail.com> wrote:

Toggle quote (2 lines)
> If yes, here two examples:

Adding ’pk’ where ’remote-fetch’ and ’branch-lookup’ are called.
Toggle diff (30 lines)
diff --git a/guix/git.scm b/guix/git.scm
index 1cb87a45607b..0209826c5c00 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -234,8 +234,10 @@ (define (resolve-reference repository ref)
(let resolve ((ref ref))
(match ref
(('branch . branch)
- (let ((oid (reference-target
- (branch-lookup repository branch BRANCH-REMOTE))))
+ (let ((oid (begin
+ (pk 'branch-lookup 'NETWORK)
+ (reference-target
+ (branch-lookup repository branch BRANCH-REMOTE)))))
(object-lookup repository oid)))
(('symref . symref)
(let ((oid (reference-name->oid repository symref)))
@@ -483,8 +485,10 @@ (define* (update-cached-checkout url
;; Only fetch remote if it has not been cloned just before.
(when (and cache-exists?
(not (reference-available? repository ref)))
- (remote-fetch (remote-lookup repository "origin")
- #:fetch-options (make-default-fetch-options)))
+ (begin
+ (pk 'remote-fetch 'NETWORK)
+ (remote-fetch (remote-lookup repository "origin")
+ #:fetch-options (make-default-fetch-options))))
(when recursive?
(update-submodules repository #:log-port log-port
#:fetch-options (make-default-fetch-options)))
Cheers,
simon
?