List of discovered substitute servers not refreshed?

  • Open
Details
3 participants
  • Ludovic Courtès
  • Mathieu Othacehe
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
Ludovic Courtès wrote on 3 Jun 2021 12:08
(address . bug-guix@gnu.org)
87y2bruupn.fsf@inria.fr
Hi!

I’ve turned on discovery and guix-daemon discovered the laptop next to
me (yay!). However, more than 10 minutes after said laptop has been turned
off, ‘guix substitute’ is still trying to connect to it.

The disconnected laptop is still listed in /var/guix/discover/publish.

It also shows up in ‘avahi-browse _guix_publish._tcp -r’, but it fails
to resolve:

Failed to resolve service 'guix-publish-XYZ' of type '_guix_publish._tcp' in domain 'local': Timeout reached

Perhaps (guix scripts discover) should not just wait for
‘remove-service’ events but should also attempt to resolve them?

Or maybe the problem is that the TTL of the published entry is too long?
I cannot find how to change that in the Guile-Avahi API though.

Ludo’.
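
Editorial sketch: the first suggestion above (actively checking discovered entries instead of only waiting for ‘remove-service’ events) could look roughly like the following. This is an illustration in Python, not the Guile code of (guix scripts discover); the names `is_reachable` and `prune_unreachable` are hypothetical, and a plain TCP connection attempt stands in for Avahi service resolution.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Assumption: a successful TCP connect is used as a stand-in for
    "the service still resolves"; real mDNS resolution is not performed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


def prune_unreachable(servers, probe=is_reachable):
    """Keep only the (host, port) pairs for which probe() returns True."""
    return [(host, port) for host, port in servers if probe(host, port)]
```

Run periodically over the discovered list, a check like this would drop a server that was switched off without announcing its removal, at the cost of one connection attempt per entry.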
Mathieu Othacehe wrote on 4 Jun 2021 10:00
control message for bug #48808
(address . control@debbugs.gnu.org)
875yyuf49q.fsf@meije.i-did-not-set--mail-host-address--so-tickle-me
merge 48808 45302
quit
Maxim Cournoyer wrote on 11 Jan 2022 04:43
Re: bug#51472: substitute servers should be preferred according to their coverage rate
(name . Ludovic Courtès)(address . ludo@gnu.org)
87pmozndi2.fsf@gmail.com
merge 48808 51472
thanks

Hello Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> When using substitute servers discovery, I've noticed that if one of the
>> substitute servers doesn't have any substitutes available, it'll keep
>> getting tried instead of others, leading to a slide-show of substitutes
>> updates such as:
>>
>> normalized load on machine '127.0.0.1' is 0.04
>> building /gnu/store/ajd0hx104702jpz2ycdwgrnyrv8jsp6d-xorg-server-21.1.0.tar.xz.drv...
>> process 9195 acquired build slot '/var/guix/offload/127.0.0.1:6666/1'
>> normalized load on machine '127.0.0.1' is 0.04
>> building /gnu/store/49rqi3wpvdm5pv6in9pamzdvg0wscrl8-xorgproto-2021.5.drv...
>> substitute: updating substitutes from 'http://192.168.10.102:80'... 0.0%
>> substitute: updating substitutes from 'http://192.168.10.102:80'... 0.0%
>> substitute: updating substitutes from 'http://192.168.10.102:80'... 0.0%
>> substitute: updating substitutes from 'http://192.168.10.102:80'... 0.0%
>> substitute: updating substitutes from 'http://192.168.10.102:80'... 0.0%
>
> We’d need to check why this particular server is checked repeatedly.
> The fact that it displays “0.0%” doesn’t mean that the server lacks
> substitutes, but that it does not reply to ‘GET /xyz.narinfo’ requests,
> for example because it’s off-line (see
> <https://issues.guix.gnu.org/48808>.)
>
>> We should implement some scheme to prefer querying high-coverage
>> servers first, instead of wasting time querying servers whose queries
>> always fail; this would greatly improve performance when using
>> substitute discovery, for example when combined with low coverage.
>
> There are several problems with that. First one is that you can’t tell
> what substitute coverage is until you’ve actually made those GET
> requests. Second one is that substitute coverage varies and it’s not an
> absolute measure; for example, if a server provides substitutes for only
> 0.1% of all the packages, but that’s precisely the 0.1% you care about,
> it’s more valuable than the one that has 99% of the packages but lacks
> those you want.
>
> There are other issues such as the fact that current semantics is to
> respect the order of substitute URLs, which is presumably chosen by the
> user according to their own criteria: download speed, bandwidth usage,
> etc.
>
> I hope this makes sense!

It does! I agree that it'd be tricky to get this right; makes me
realize that my problem is probably due to #48808, and fixing that one
would probably have avoided that bug report :-).

I'm merging this one with 48808.

Thank you!

Maxim
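
Editorial sketch: the quoted argument that substitute coverage is not an absolute measure can be made concrete with a small calculation. The function `coverage` and the store item names below are hypothetical and only illustrate the reasoning; they are not part of Guix.

```python
def coverage(available, wanted):
    """Fraction of the wanted items that a server can provide.

    Both arguments are iterables of item names; an empty `wanted`
    yields 0.0 to avoid dividing by zero.
    """
    wanted = set(wanted)
    if not wanted:
        return 0.0
    return len(wanted & set(available)) / len(wanted)


# Hypothetical data: a small server that holds exactly what we need,
# and a large server that holds many other things.
wanted = {"xorg-server", "xorgproto"}
small_server = {"xorg-server", "xorgproto"}
large_server = {"pkg-%d" % i for i in range(1000)}

small_score = coverage(small_server, wanted)
large_score = coverage(large_server, wanted)
```

Here the small server covers all of the wanted items while the large one covers none, even though it holds far more items overall. The score also cannot be computed until each server has actually been queried for the wanted items, which is the first objection raised in the quoted reply.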