(address . bug-guix@gnu.org)
Hi,
Well, I am on a quest to make Guix more robust in the worst-case
scenario: Savannah (as well as other servers) being unreachable. For
context, this matters when using Guix to reproduce scientific work. For
an example of such a worst case, see [1], in particular this quote:
The first annoyance is that guix time-machine needs an access to the
server git.savannah.gnu.org, although the Git repository is already
cloned and already contains the required commit.
is almost addressed by #65352; at least, it is tracked. :-)
While investigating that, I noticed that the current design is
suboptimal, IMHO. I am reporting it here in the hope of improving the
situation by reducing the number of network requests.
It matters in the worst-case scenario of reproducing scientific work,
and it also matters for people with a poor or unstable network link.
Sorry if the report is hard to follow; I did my best to be clear.
To keep the discussion simple, I only consider the Git reference
specifications ’branch’ and ’tag-or-commit’. The Git reference
specifications used by various internal procedures are poorly
documented. See the docstring of ’update-cached-checkout’ from
(guix git) for an idea, or the implementation of ’resolve-reference’
for the complete list.
Let us consider only these Git reference specifications:
(branch . "string")
(tag-or-commit . "string")
because these are what “guix time-machine” sets from the command line
or reads from channels.scm files, IIUC.
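To make the two shapes concrete, here is a minimal Guile sketch of
dispatching on them with (ice-9 match). ’reference-kind’ is a
hypothetical helper written for this message, not a procedure from
(guix git); the real ’resolve-reference’ accepts more shapes than these
two.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper written for this message; it is NOT part of
;; (guix git).  It only tells apart the two reference specifications
;; considered in this report.
(define (reference-kind ref)
  (match ref
    (('branch . (? string?))
     ;; A branch name: the commit it points to moves over time.
     'branch)
    (('tag-or-commit . (? string?))
     ;; A tag name, or a short or full commit ID.
     'tag-or-commit)
    (_
     (error "unsupported reference specification:" ref))))

(reference-kind '(branch . "master"))         ; => branch
(reference-kind '(tag-or-commit . "v1.4.0"))  ; => tag-or-commit
```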
The command “guix time-machine” starts by calling
’cached-channel-instance’, passing the procedure ’validate-guix-channel’
as an argument.
The procedure ’cached-channel-instance’ starts by collecting the commit
of each channel: it maps ’channel-full-commit’ over the list of
channels, and that procedure calls ’update-cached-checkout’. (1)
Then, ’cached-channel-instance’ calls ’validate-guix-channel’. And this
procedure also calls ’update-cached-checkout’. (2)
Then, ’cached-channel-instance’ calls ’latest-channel-instances’, which
calls ’latest-channel-instance’. And guess what, this procedure also
calls ’update-cached-checkout’. (3)
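To double-check the count, here is a toy Guile model of this call
chain, assuming a channel is just a pair (URL . REFERENCE) and that the
validation procedure is passed positionally. The procedure names mirror
the real ones, but every body is a hypothetical stub that only counts
how many times ’update-cached-checkout’ is entered.

```scheme
;; Hypothetical model of the call chain described above.  All bodies
;; are stubs written for this message; only the procedure names mirror
;; the real Guix ones.  A channel is modelled as (URL . REFERENCE).

(define checkout-calls 0)             ; times the stub below is entered

(define (update-cached-checkout url ref)
  (set! checkout-calls (+ checkout-calls 1))
  (string-append url "@" (cdr ref)))  ; fake "commit"

(define (channel-full-commit channel)            ; call site (1)
  (update-cached-checkout (car channel) (cdr channel)))

(define (validate-guix-channel channels)         ; call site (2)
  (for-each (lambda (channel)
              (update-cached-checkout (car channel) (cdr channel)))
            channels))

(define (latest-channel-instance channel)        ; call site (3)
  (update-cached-checkout (car channel) (cdr channel)))

(define (latest-channel-instances channels)
  (map latest-channel-instance channels))

(define (cached-channel-instance channels validate)
  (let ((commits (map channel-full-commit channels)))  ; (1)
    (validate channels)                                 ; (2)
    (latest-channel-instances channels)                 ; (3)
    commits))

;; One channel tracking a branch:
(cached-channel-instance
 (list (cons "https://git.savannah.gnu.org/git/guix.git"
             '(branch . "master")))
 validate-guix-channel)
checkout-calls                        ; => 3
```

In this model, each channel goes through ’update-cached-checkout’
three times, once per call site.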
OK, let us take a look at ’update-cached-checkout’.
The procedure ’update-cached-checkout’ first checks, using
’reference-available?’, whether the Git reference specification is
already available in the cached Git checkout.
Consider the Git reference specification (branch . "some"): then
’reference-available?’ returns #false, which triggers ’remote-fetch’
from Guile-Git. If I read correctly, this generates network traffic,
so Savannah needs to be reachable. (I)
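Here is a simplified Guile sketch of that decision. It encodes the
reading given in this report rather than verified Guix behaviour: a
branch is never considered available locally, and a ’tag-or-commit’
string is considered available only if it is a full commit ID already
in the cache. ’reference-available?*’, ’full-commit-id?’,
’needs-remote-fetch?’ and the list-of-commits cache are all
hypothetical, and SHA-1 object IDs (40 hexadecimal characters) are
assumed.

```scheme
(use-modules (ice-9 match))

;; Simplified, hypothetical model of the check described above; it is
;; NOT the code of 'reference-available?' from (guix git).  The local
;; cache is modelled as a plain list of the full commit IDs it holds.

(define (full-commit-id? str)
  ;; Assumes SHA-1 object IDs: exactly 40 hexadecimal characters.
  (and (= (string-length str) 40)
       (string-every char-set:hex-digit str)))

(define (reference-available?* local-commits ref)
  (match ref
    (('branch . _)
     ;; A branch may have moved upstream since the last fetch, so the
     ;; local cache alone cannot tell whether it is up to date.
     #f)
    (('tag-or-commit . str)
     ;; Only a full commit ID is checked against the cache; a tag or a
     ;; short ID such as "1234abc" is treated as not available.
     (and (full-commit-id? str)
          (member str local-commits)
          #t))))

(define (needs-remote-fetch? local-commits ref)
  ;; #f from the check above is what leads to 'remote-fetch'.
  (not (reference-available?* local-commits ref)))

(define cached (list (make-string 40 #\a)))
(needs-remote-fetch? cached '(branch . "some"))            ; => #t
(needs-remote-fetch? cached '(tag-or-commit . "1234abc"))  ; => #t
(needs-remote-fetch? cached
                     (cons 'tag-or-commit (make-string 40 #\a))) ; => #f
```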
Hmm, I am not sure anyone is still following. Who knows? :-)
Let us continue. ’update-cached-checkout’ then checks commit relations
and friends. Next comes an ’if’ that calls ’switch-to-ref’ in its
then-branch and ’resolve-reference’ in its else-branch. Under the hood,
’switch-to-ref’ calls ’resolve-reference’ too.
For the case (branch . "some"), ’resolve-reference’ calls
’branch-lookup’ from Guile-Git. If I read correctly, this generates
network traffic because of BRANCH-REMOTE, so Savannah needs to be
reachable. (II)
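The control flow just described can be sketched in Guile as follows.
The stub bodies are hypothetical, and so is the name CHECK-OUT? for the
condition of the ’if’, as well as ’checkout-step’; the point is only
that both arms reach ’resolve-reference’, hence (II) is paid whichever
arm is taken.

```scheme
;; Hypothetical model of the 'if' described above.  Stub bodies are
;; invented; the condition name CHECK-OUT? is an assumption.
(define resolve-calls 0)

(define (resolve-reference repository ref)
  ;; In the report's reading, for (branch . NAME) this is where
  ;; 'branch-lookup' with BRANCH-REMOTE happens (II).
  (set! resolve-calls (+ resolve-calls 1))
  (list 'oid-of ref))

(define (switch-to-ref repository ref)
  ;; 'switch-to-ref' resolves the reference itself first.
  (let ((oid (resolve-reference repository ref)))
    ;; (checking out OID is omitted in this model)
    oid))

(define (checkout-step repository ref check-out?)
  (if check-out?
      (switch-to-ref repository ref)
      (resolve-reference repository ref)))

;; Whichever arm is taken, 'resolve-reference' runs exactly once:
(checkout-step 'repo '(branch . "some") #t)
(checkout-step 'repo '(branch . "some") #f)
resolve-calls                         ; => 2
```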
Summary: ( (1) + (2) + (3) ) * ( (I) + (II) ) = 3 * 2 = 6.
If I am correct and not missing something, the current design requires
6 network accesses to Savannah, and most of them are useless because
the same work has already been done, somehow.
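The arithmetic above, written as a tiny hypothetical cost model in
Guile: ’network-accesses’ counts every network step at every call site,
as the current design does under my reading. ’network-accesses/memoized’
illustrates one possible way to cut that down, by remembering within a
single process which (URL . STEP) pairs were already done. All names
here are hypothetical, and this is NOT the #65352 proposal.

```scheme
(use-modules (srfi srfi-1))           ; fold

;; Hypothetical cost model of the summary above; not Guix code.
(define call-sites                    ; (1), (2), (3)
  '(channel-full-commit validate-guix-channel latest-channel-instance))
(define network-steps                 ; (I), (II)
  '(remote-fetch branch-lookup))

;; Current design, as read in this report: every call site repeats
;; every network step.
(define (network-accesses sites)
  (* (length sites) (length network-steps)))

;; Hypothetical alternative (NOT the #65352 patch): remember, for the
;; lifetime of the process, which (URL . STEP) pairs were already done.
(define (network-accesses/memoized sites url)
  (let ((done (make-hash-table)))     ; hash-ref/hash-set! use equal?
    (fold (lambda (site count)
            (fold (lambda (step count)
                    (let ((key (cons url step)))
                      (if (hash-ref done key)
                          count       ; already done: no new access
                          (begin
                            (hash-set! done key #t)
                            (+ count 1)))))
                  count
                  network-steps))
          0
          sites)))

(network-accesses call-sites)         ; => 6
(network-accesses/memoized call-sites
                           "https://git.savannah.gnu.org/git/guix.git")
                                      ; => 2
```

Remembering only within one process keeps the behaviour of a fresh
“guix time-machine” run unchanged: it still contacts Savannah once per
step, but not six times.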
Well, (branch . "some") is the worst case, IMHO, and so are the short
commit ID (tag-or-commit . "1234abc") and the tag
(tag-or-commit . "v1.4.0").
My proposal from #65352 (DRAFT v2) removes some of these useless
’remote-fetch’ calls.
Well, let me know if this diagnosis is correct.
To be continued…
Cheers,
simon