Guix substituter gives up too easily, causing higher-level commands to fail catastrophically

Open. Submitted by raid5atemyhomework.
Participants: raid5atemyhomework
Owner: unassigned
Severity: normal
raid5atemyhomework wrote on 15 Mar 2021 01:31
Split off from 46942

I recently had to rebuild a Guix OS. I ran a modified installer that used two substitute URLs: the SJTUG mirror, and the official Cuirass server in Berlin, listed in that order.

Unfortunately, the SJTUG mirror seems to have some reliability problems: the guided installation repeatedly failed while downloading an `e2fsck` archive. I was unable to save the error, since this occurred during an install, when I had no easy way to copy-paste error messages.

But that's tangential to *this* particular report. My expectation when I give multiple substitute URLs is that if one substitute server is failing, it should automatically try other substitute servers. So even if it gets *any* kind of error on the first listed server, the substituter should make an effort to talk to some other server listed in the `substitute-urls`, not give up immediately.

My expectation is this:

* For each substitute server:
  * If package exists in server:
    * Try downloading
    * If downloading completed with signature OK, exit function with success.
    * else continue
* If we reach here, build locally if `--fallback` is specified.

But it looks like:

* For each substitute server:
  * If package exists in server:
    * Try downloading
    * If downloading completed with signature OK, exit function with success.
    * else error out of the function
* If we reach here, build locally if `--fallback` is specified.
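
To make the difference between the two loops concrete, here is a minimal Python sketch. It is only an illustration and not Guix's actual code (which is written in Guile Scheme): `DownloadError`, `substitute_expected`, `substitute_observed`, and the `fetch` callable are all hypothetical names made up for this example.

```python
from typing import Callable, Iterable, Optional


class DownloadError(Exception):
    """Hypothetical: any network or signature failure while fetching."""


# Hypothetical fetcher: returns the archive bytes when the server has the
# item, None when it does not, and raises DownloadError on any failure.
Fetcher = Callable[[str, str], Optional[bytes]]


def substitute_expected(urls: Iterable[str], item: str,
                        fetch: Fetcher) -> Optional[bytes]:
    """Expected behaviour: a failure on one server moves on to the next."""
    for url in urls:
        try:
            data = fetch(url, item)
        except DownloadError:
            continue  # "else continue": try the next substitute server
        if data is not None:
            return data  # downloaded with signature OK
    return None  # caller may build locally if --fallback was given


def substitute_observed(urls: Iterable[str], item: str,
                        fetch: Fetcher) -> Optional[bytes]:
    """Behaviour as it appears: the first failure aborts everything."""
    for url in urls:
        data = fetch(url, item)  # DownloadError propagates: "error out"
        if data is not None:
            return data
    return None
```

With a `fetch` that raises `DownloadError` for the first URL and succeeds for the second, `substitute_expected` returns the archive while `substitute_observed` raises before the second server is ever contacted.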

The simple fact of the matter is that Internet connectivity ***IS NOT RELIABLE***, so a failed download MUST NOT make the substituter assume the worst, fail catastrophically, and abort. That abort leads to higher-level commands *also* aborting, sometimes throwing several minutes' worth of processing and downloading down the drain just because one package could not be downloaded from one particular substitute server.

So what I would really like is:

* The substituter should really make at least a token effort to *resume* a partial download that failed midway through. Even having it try a resume *once* would be good.
* The substituter should really fall back to the next item in the list of substitute URLs if the first one is not behaving properly, instead of deciding that the first substitute URL is the canonical version and further substitute URLs are to be ignored.
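
The two requests above could be combined roughly as in the following Python sketch. Again, this is an illustration under stated assumptions, not Guix code: `PartialDownload`, `download_with_resume`, `fetch_any`, and the `fetch_from(url, item, offset)` callable are hypothetical. Over HTTP, "fetch from offset" would typically be a `Range: bytes=N-` request, which only works if the server supports range requests.

```python
from typing import Callable, Iterable, Optional


class PartialDownload(Exception):
    """Hypothetical: the transfer broke off; carries the bytes received."""

    def __init__(self, received: bytes):
        super().__init__(f"interrupted after {len(received)} bytes")
        self.received = received


# Hypothetical fetcher: returns the item's bytes starting at `offset`, or
# raises PartialDownload with whatever arrived before the failure.
RangeFetcher = Callable[[str, str, int], bytes]


def download_with_resume(url: str, item: str, fetch_from: RangeFetcher,
                         max_resumes: int = 1) -> bytes:
    """Download from one server, resuming at most `max_resumes` times."""
    buffer = b""
    resumes_left = max_resumes
    while True:
        try:
            return buffer + fetch_from(url, item, len(buffer))
        except PartialDownload as err:
            buffer += err.received  # keep what we already have
            if resumes_left == 0:
                raise PartialDownload(buffer) from err
            resumes_left -= 1  # token effort: resume from where we stopped


def fetch_any(urls: Iterable[str], item: str,
              fetch_from: RangeFetcher) -> Optional[bytes]:
    """Try each substitute URL in order; give up only when all have failed."""
    for url in urls:
        try:
            return download_with_resume(url, item, fetch_from)
        except PartialDownload:
            continue  # this server is misbehaving, fall back to the next
    return None  # caller may build locally if --fallback was given
```

This sketch restarts from offset 0 when it moves to a different server, on the assumption that two servers may serve differently compressed archives for the same item, so a partial file from one cannot safely be continued from another.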

Is there any reason why the above is not feasible for Guix?

Thanks
raid5atemyhomework