Updating substitutes on LAN hosts dies unexpectedly

  • Done
  • quality assurance status badge
Details
5 participants
  • Guillaume Le Vaillant
  • Ludovic Courtès
  • Maxim Cournoyer
  • Maxime Devos
  • Simon Streit
Owner
unassigned
Submitted by
Simon Streit
Severity
normal
S
S
Simon Streit wrote on 31 Jan 2022 19:37
(address . bug-guix@gnu.org)
yguee4neoq6.fsf@netpanic.org
Hello,

quite often, and quite randomly I run into this situation that whenever
Guix tries to rebuild a profile, and sometimes while downloading from
local Guix hosts sharing their store items, the process will crash with
the following error:

Toggle snippet (35 lines)
~ $1 reconfigure
substitute: updating substitutes from 'http://192.168.0.157:3000'... 56.3%Backtrace:
substitute: In ice-9/boot-9.scm:
substitute: 1752:10 17 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In unknown file:
substitute: 16 (apply-smob/0 #<thunk 7f309a1caf60>)
substitute: In ice-9/boot-9.scm:
substitute: 724:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
substitute: In ice-9/eval.scm:
substitute: 619:8 14 (_ #(#(#<directory (guile-user) 7f309a1c4c80>)))
substitute: In guix/ui.scm:
substitute: 2206:7 13 (run-guix . _)
substitute: 2169:10 12 (run-guix-command _ . _)
substitute: In ice-9/boot-9.scm:
substitute: 1752:10 11 (with-exception-handler _ _ #:unwind? _ # _)
substitute: 1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In guix/scripts/substitute.scm:
substitute: 757:18 9 (_)
substitute: 348:26 8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
substitute: In guix/substitutes.scm:
substitute: 365:27 7 (lookup-narinfos/diverse _ _ #<procedure 7f309690d320 …> …)
substitute: 322:31 6 (lookup-narinfos _ _ #:open-connection _ # _)
substitute: 245:26 5 (fetch-narinfos _ _ #:open-connection _ # _)
substitute: In ice-9/boot-9.scm:
substitute: 1685:16 4 (raise-exception _ #:continuable? _)
substitute: 1685:16 3 (raise-exception _ #:continuable? _)
substitute: 1780:13 2 (_ #<&compound-exception components: (#<&assertion-fail…>)
substitute: 1685:16 1 (raise-exception _ #:continuable? _)
substitute: 1685:16 0 (raise-exception _ #:continuable? _)
substitute:
substitute: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
substitute: Wrong type (expecting exact integer): #f
guix system: error: `/gnu/store/kcc8zh1fhp05wgw2m48w3gk228j39f5q-guix-1.3.0-21.e427593/bin/guix substitute' died unexpectedly

Unfortunately this crash happens at random. Other times it goes
through. Current checkout is ff14bc60e56fb0c6636da1da9a850b7b04abb367,
which isn't the most current, I know. I've been observing this behavior
since some time now, and haven't figured out what the reason is behind
it yet. The error message looks similar to [1].

The way this error appears is, that I usually have one host that I
upgrade first, and then share the checkout and the store between hosts
to speed up the upgrading process locally. Unfortunately the updater
will crash randomly whenever the host starts scanning other hosts that
are found through mDNS. Sometimes this happens while fetching new
packages into a profile.

I've set up publishing:

Toggle snippet (7 lines)
(service guix-publish-service-type
(guix-publish-configuration (host "0.0.0.0")
(port 3000)
(ttl #f)
(advertise? #t)))

and of course host discovery in guix-service-type too.


Kind regards
Simon


G
G
Guillaume Le Vaillant wrote on 31 Jan 2022 21:17
(name . Simon Streit)(address . simon@netpanic.org)(address . 53668@debbugs.gnu.org)
87r18n64du.fsf@kitej
Simon Streit <simon@netpanic.org> skribis:

Toggle quote (72 lines)
> Hello,
>
> quite often, and quite randomly I run into this situation that whenever
> Guix tries to rebuild a profile, and sometimes while downloading from
> local Guix hosts sharing their store items, the process will crash with
> the following error:
>
> ~ $1 reconfigure
> substitute: updating substitutes from 'http://192.168.0.157:3000'... 56.3%Backtrace:
> substitute: In ice-9/boot-9.scm:
> substitute: 1752:10 17 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: In unknown file:
> substitute: 16 (apply-smob/0 #<thunk 7f309a1caf60>)
> substitute: In ice-9/boot-9.scm:
> substitute: 724:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
> substitute: In ice-9/eval.scm:
> substitute: 619:8 14 (_ #(#(#<directory (guile-user) 7f309a1c4c80>)))
> substitute: In guix/ui.scm:
> substitute: 2206:7 13 (run-guix . _)
> substitute: 2169:10 12 (run-guix-command _ . _)
> substitute: In ice-9/boot-9.scm:
> substitute: 1752:10 11 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: 1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: In guix/scripts/substitute.scm:
> substitute: 757:18 9 (_)
> substitute: 348:26 8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
> substitute: In guix/substitutes.scm:
> substitute: 365:27 7 (lookup-narinfos/diverse _ _ #<procedure 7f309690d320 …> …)
> substitute: 322:31 6 (lookup-narinfos _ _ #:open-connection _ # _)
> substitute: 245:26 5 (fetch-narinfos _ _ #:open-connection _ # _)
> substitute: In ice-9/boot-9.scm:
> substitute: 1685:16 4 (raise-exception _ #:continuable? _)
> substitute: 1685:16 3 (raise-exception _ #:continuable? _)
> substitute: 1780:13 2 (_ #<&compound-exception components: (#<&assertion-fail…>)
> substitute: 1685:16 1 (raise-exception _ #:continuable? _)
> substitute: 1685:16 0 (raise-exception _ #:continuable? _)
> substitute:
> substitute: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> substitute: Wrong type (expecting exact integer): #f
> guix system: error: `/gnu/store/kcc8zh1fhp05wgw2m48w3gk228j39f5q-guix-1.3.0-21.e427593/bin/guix substitute' died unexpectedly
>
>
> Unfortunately this crash happens at random. Other times it goes
> through. Current checkout is ff14bc60e56fb0c6636da1da9a850b7b04abb367,
> which isn't the most current, I know. I've been observing this behavior
> since some time now, and haven't figured out what the reason is behind
> it yet. The error message looks similar to [1].
>
> The way this error appears is, that I usually have one host that I
> upgrade first, and then share the checkout and the store between hosts
> to speed up the upgrading process locally. Unfortunately the updater
> will crash randomly whenever the host starts scanning other hosts that
> are found through mDNS. Sometimes this happens while fetching new
> packages into a profile.
>
> I've set up publishing:
>
> (service guix-publish-service-type
> (guix-publish-configuration (host "0.0.0.0")
> (port 3000)
> (ttl #f)
> (advertise? #t)))
>
> and of course host discovery in guix-service-type too.
>
>
> Kind regards
> Simon
>
>
> [1] https://issues.guix.gnu.org/52464

Hi,

I have also seen this kind of crashes on my LAN.
I don't know what causes this issue, but I noticed that it has happened
less often since I activated the cache for the guix-publish service.
-----BEGIN PGP SIGNATURE-----

iIUEAREKAC0WIQTLxZxm7Ce5cXlAaz5r6CCK3yH+PwUCYfhFTQ8cZ2x2QHBvc3Rl
by5uZXQACgkQa+ggit8h/j898QD+PNojQz8a7Lq9wMwiJpnBXglOWgcX7pskNJtX
rOsMI4kBAJeSEWQrmU00INAr3nO7sdWoVqWuX+qWWq/fsfSUkv6P
=yNEt
-----END PGP SIGNATURE-----

S
S
Simon Streit wrote on 31 Jan 2022 22:38
(name . Guillaume Le Vaillant)(address . glv@posteo.net)(address . 53668@debbugs.gnu.org)
ygutudj4md2.fsf@netpanic.org
Hello Guillaume,

Guillaume Le Vaillant <glv@posteo.net> writes:

Toggle quote (4 lines)
> I have also seen this kind of crashes on my LAN.
> I don't know what causes this issue, but I noticed that it has happened
> less often since I activated the cache for the guix-publish service.

I just activated the cache. It hasn't crashed so far.
S
S
Simon Streit wrote on 31 Jan 2022 22:55
(name . Guillaume Le Vaillant)(address . glv@posteo.net)(address . 53668@debbugs.gnu.org)
ygur18n4lkc.fsf@netpanic.org
Simon Streit <simon@netpanic.org> writes:
Toggle quote (2 lines)
> I just activated the cache. It hasn't crashed so far.

No, it did crash again.
M
M
Maxim Cournoyer wrote on 1 Feb 2022 03:06
(name . Simon Streit)(address . simon@netpanic.org)(address . 53668@debbugs.gnu.org)
87czk7l4qu.fsf@gmail.com
Hi Simon,

Simon Streit <simon@netpanic.org> writes:

Toggle quote (41 lines)
> Hello,
>
> quite often, and quite randomly I run into this situation that whenever
> Guix tries to rebuild a profile, and sometimes while downloading from
> local Guix hosts sharing their store items, the process will crash with
> the following error:
>
> ~ $1 reconfigure
> substitute: updating substitutes from 'http://192.168.0.157:3000'... 56.3%Backtrace:
> substitute: In ice-9/boot-9.scm:
> substitute: 1752:10 17 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: In unknown file:
> substitute: 16 (apply-smob/0 #<thunk 7f309a1caf60>)
> substitute: In ice-9/boot-9.scm:
> substitute: 724:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
> substitute: In ice-9/eval.scm:
> substitute: 619:8 14 (_ #(#(#<directory (guile-user) 7f309a1c4c80>)))
> substitute: In guix/ui.scm:
> substitute: 2206:7 13 (run-guix . _)
> substitute: 2169:10 12 (run-guix-command _ . _)
> substitute: In ice-9/boot-9.scm:
> substitute: 1752:10 11 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: 1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: In guix/scripts/substitute.scm:
> substitute: 757:18 9 (_)
> substitute: 348:26 8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
> substitute: In guix/substitutes.scm:
> substitute: 365:27 7 (lookup-narinfos/diverse _ _ #<procedure 7f309690d320 …> …)
> substitute: 322:31 6 (lookup-narinfos _ _ #:open-connection _ # _)
> substitute: 245:26 5 (fetch-narinfos _ _ #:open-connection _ # _)
> substitute: In ice-9/boot-9.scm:
> substitute: 1685:16 4 (raise-exception _ #:continuable? _)
> substitute: 1685:16 3 (raise-exception _ #:continuable? _)
> substitute: 1780:13 2 (_ #<&compound-exception components: (#<&assertion-fail…>)
> substitute: 1685:16 1 (raise-exception _ #:continuable? _)
> substitute: 1685:16 0 (raise-exception _ #:continuable? _)
> substitute:
> substitute: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> substitute: Wrong type (expecting exact integer): #f
> guix system: error: `/gnu/store/kcc8zh1fhp05wgw2m48w3gk228j39f5q-guix-1.3.0-21.e427593/bin/guix substitute' died unexpectedly

I suffer from this routinely on my slow HDD machine. I attribute the
failure to some timeout that isn't reported correctly by guile-ssh.
Perhaps related to
update of guile-ssh to try that Artyom has put up on guix-patches; we
should test it.

Thanks,

Maxim
L
L
Ludovic Courtès wrote on 11 May 2022 16:58
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87zgjom804.fsf@gnu.org
Hello!

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

Toggle quote (46 lines)
> Simon Streit <simon@netpanic.org> writes:
>
>> Hello,
>>
>> quite often, and quite randomly I run into this situation that whenever
>> Guix tries to rebuild a profile, and sometimes while downloading from
>> local Guix hosts sharing their store items, the process will crash with
>> the following error:
>>
>> ~ $1 reconfigure
>> substitute: updating substitutes from 'http://192.168.0.157:3000'... 56.3%Backtrace:
>> substitute: In ice-9/boot-9.scm:
>> substitute: 1752:10 17 (with-exception-handler _ _ #:unwind? _ # _)
>> substitute: In unknown file:
>> substitute: 16 (apply-smob/0 #<thunk 7f309a1caf60>)
>> substitute: In ice-9/boot-9.scm:
>> substitute: 724:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
>> substitute: In ice-9/eval.scm:
>> substitute: 619:8 14 (_ #(#(#<directory (guile-user) 7f309a1c4c80>)))
>> substitute: In guix/ui.scm:
>> substitute: 2206:7 13 (run-guix . _)
>> substitute: 2169:10 12 (run-guix-command _ . _)
>> substitute: In ice-9/boot-9.scm:
>> substitute: 1752:10 11 (with-exception-handler _ _ #:unwind? _ # _)
>> substitute: 1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
>> substitute: In guix/scripts/substitute.scm:
>> substitute: 757:18 9 (_)
>> substitute: 348:26 8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
>> substitute: In guix/substitutes.scm:
>> substitute: 365:27 7 (lookup-narinfos/diverse _ _ #<procedure 7f309690d320 …> …)
>> substitute: 322:31 6 (lookup-narinfos _ _ #:open-connection _ # _)
>> substitute: 245:26 5 (fetch-narinfos _ _ #:open-connection _ # _)
>> substitute: In ice-9/boot-9.scm:
>> substitute: 1685:16 4 (raise-exception _ #:continuable? _)
>> substitute: 1685:16 3 (raise-exception _ #:continuable? _)
>> substitute: 1780:13 2 (_ #<&compound-exception components: (#<&assertion-fail…>)
>> substitute: 1685:16 1 (raise-exception _ #:continuable? _)
>> substitute: 1685:16 0 (raise-exception _ #:continuable? _)
>> substitute:
>> substitute: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> substitute: Wrong type (expecting exact integer): #f
>> guix system: error: `/gnu/store/kcc8zh1fhp05wgw2m48w3gk228j39f5q-guix-1.3.0-21.e427593/bin/guix substitute' died unexpectedly
>
> I suffer from this routinely on my slow HDD machine. I attribute the
> failure to some timeout that isn't reported correctly by guile-ssh.

Guile-SSH is not involved here though.

However, this may well be this ‘guix publish’ bug that was recently
fixed, whereby ‘guix publish’ without ‘--cache’ would send bogus HTTP
replies:


Could you confirm?

Ludo’.
M
M
Maxime Devos wrote on 11 May 2022 17:02
fabe32ec3ee0c04c3b587d0ad93908b2ea1f3df7.camel@telenet.be
Ludovic Courtès schreef op wo 11-05-2022 om 16:58 [+0200]:
Toggle quote (10 lines)
> However, this may well be this ‘guix publish’ bug that was recently
> fixed, whereby ‘guix publish’ without ‘--cache’ would send bogus HTTP
> replies:
>
>   https://issues.guix.gnu.org/54723
>
> Could you confirm?
>
> Ludo’.

Even then, the error message says ‘wrong type error ...’, shouldn't it
say something like ‘bogus HTTP’ or ‘bogus narinfo’ or such instead?

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYnvQERccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7ihsAP4wVzmm5lhGs1CqMcSBHZLxzwXf
nfN4mKfwfJ7a0b8URgEA/hXC2rHJF3SVgNmaAwt3OgO5yLdnhWAZjE8+a1CoawE=
=KWHm
-----END PGP SIGNATURE-----


M
M
Maxim Cournoyer wrote on 11 May 2022 17:06
(name . Ludovic Courtès)(address . ludo@gnu.org)
878rr8qfby.fsf@gmail.com
Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

Toggle quote (8 lines)
> However, this may well be this ‘guix publish’ bug that was recently
> fixed, whereby ‘guix publish’ without ‘--cache’ would send bogus HTTP
> replies:
>
> https://issues.guix.gnu.org/54723
>
> Could you confirm?

I've reconfigured my offload machine and haven't seen the issue yet.
Tentatively closing.

Thank you for investigating and fixing the issue!

Maxim
Closed
M
M
Maxim Cournoyer wrote on 12 May 2022 03:17
(name . Maxime Devos)(address . maximedevos@telenet.be)
87wnerpn1h.fsf@gmail.com
Hi Maxime,

Maxime Devos <maximedevos@telenet.be> writes:

Toggle quote (14 lines)
> Ludovic Courtès schreef op wo 11-05-2022 om 16:58 [+0200]:
>> However, this may well be this ‘guix publish’ bug that was recently
>> fixed, whereby ‘guix publish’ without ‘--cache’ would send bogus HTTP
>> replies:
>>
>>   https://issues.guix.gnu.org/54723
>>
>> Could you confirm?
>>
>> Ludo’.
>
> Even then, the error message says ‘wrong type error ...’, shouldn't it
> say something like ‘bogus HTTP’ or ‘bogus narinfo’ or such instead?

I agree that a more accurate error message would have been useful. If
you have the bandwidth for a patch, I'd be more than happy reviewing it
:-).

Thanks,

Maxim
L
L
Ludovic Courtès wrote on 12 May 2022 10:21
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
874k1vma9a.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

Toggle quote (20 lines)
> Maxime Devos <maximedevos@telenet.be> writes:
>
>> Ludovic Courtès schreef op wo 11-05-2022 om 16:58 [+0200]:
>>> However, this may well be this ‘guix publish’ bug that was recently
>>> fixed, whereby ‘guix publish’ without ‘--cache’ would send bogus HTTP
>>> replies:
>>>
>>>   https://issues.guix.gnu.org/54723
>>>
>>> Could you confirm?
>>>
>>> Ludo’.
>>
>> Even then, the error message says ‘wrong type error ...’, shouldn't it
>> say something like ‘bogus HTTP’ or ‘bogus narinfo’ or such instead?
>
> I agree that a more accurate error message would have been useful. If
> you have the bandwidth for a patch, I'd be more than happy reviewing it
> :-).

HTTP response parsing happens in (web response) though, not in Guix, so
this is where things should be patch.

Currently that code pretty much expects it’s talking to a well-behaved
HTTP server. Changing it to gracefully deal with any kind of garbage
might turn out to be quite an effort.

Food for thought!

Ludo’.
M
M
Maxim Cournoyer wrote on 12 May 2022 15:47
(name . Ludovic Courtès)(address . ludo@gnu.org)
87ilqaooam.fsf@gmail.com
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (31 lines)
> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Maxime Devos <maximedevos@telenet.be> writes:
>>
>>> Ludovic Courtès schreef op wo 11-05-2022 om 16:58 [+0200]:
>>>> However, this may well be this ‘guix publish’ bug that was recently
>>>> fixed, whereby ‘guix publish’ without ‘--cache’ would send bogus HTTP
>>>> replies:
>>>>
>>>>   https://issues.guix.gnu.org/54723
>>>>
>>>> Could you confirm?
>>>>
>>>> Ludo’.
>>>
>>> Even then, the error message says ‘wrong type error ...’, shouldn't it
>>> say something like ‘bogus HTTP’ or ‘bogus narinfo’ or such instead?
>>
>> I agree that a more accurate error message would have been useful. If
>> you have the bandwidth for a patch, I'd be more than happy reviewing it
>> :-).
>
> HTTP response parsing happens in (web response) though, not in Guix, so
> this is where things should be patch.
>
> Currently that code pretty much expects it’s talking to a well-behaved
> HTTP server. Changing it to gracefully deal with any kind of garbage
> might turn out to be quite an effort.

Hmm. I see. Maxime, perhaps an issue can be logged against Guile to at
least remember it exists, to tackle by anyone with the
interest/bandwidth. I think it'd be valuable in the long run.

Thank you!

Maxim
?