'guix substitute' and 'guix pull' fail gracelessly on flaky networks

  • Open
Details
5 participants
  • Bengt Richter
  • Ludovic Courtès
  • Brendan Tildesley
  • Quinten Gruenthal
  • zimoun
Owner
unassigned
Submitted by
Quinten Gruenthal
Severity
normal
Quinten Gruenthal wrote on 15 Jun 2020 18:30
Handshake Error
(address . bug-guix@gnu.org)
CAKg3DNtHr7f8PCaaa8xJDT_gUn0V+vafqiovycBX264AwOHKbQ@mail.gmail.com
I received the following error when performing a guix pull:

|substitute: guix substitute: error: TLS error in procedure 'handshake':
Error in the pull function.
killing process 8828
Backtrace:
11 (primitive-load
"/gnu/store/pl48b057h6yg8w6f7hafiilcc44d0fn6-compute-guix-derivation")
In ice-9/eval.scm:
155:9 10 (_ _)
159:9 9 (_ #(#(#(#(#(#(#(#(#(#(#(#(#(#<directory (guile-user)
7fc5ff085f?> ?) ?) ?) ?) ?) ?) ?) ?) ?) ?) ?) ?) ?))
In ./guix/store.scm:
2025:24 8 (run-with-store #<store-connection 256.99 7fc5fd407640> _
#:guile-for-build _ #:system _ #:target _)
1859:8 7 (_ _)
In ./guix/gexp.scm:
243:18 6 (_ _)
1061:2 5 (_ _)
921:2 4 (_ _)
782:4 3 (_ _)
In ./guix/store.scm:
1907:12 2 (_ #<store-connection 256.99 7fc5fd407640>)
1356:5 1 (map/accumulate-builds #<store-connection 256.99 7fc5fd407640>
_ _)
1367:15 0 (_ #<store-connection 256.99 7fc5fd407640> _ _)

./guix/store.scm:1367:15: ERROR:
1. &store-protocol-error:
message: "`/gnu/store/5r3sb6bj6pppn4h35a35956mv5qrd011-guix-command
substitute' died unexpectedly"
status: 1
guix pull: error: You found a bug: the program
'/gnu/store/pl48b057h6yg8w6f7hafiilcc44d0fn6-compute-guix-derivation'
failed to compute the derivation for Guix (version:
"cf48f0fc4c40a2ec0b38a445e1e13f37722a0ade"; system: "x86_64-linux";
host version: "ecf92194a55188a9c217d76617378749db063453"; pull-version: 1).
Please report it by email to <bug-guix@gnu.org>.

I've seen this kind of error before in other software and I believe it
occurs when a handshake can't be performed in time and the connection is
dropped; I was able to successfully pull a few minutes later under better
conditions. That being said I bring this up because a problem in my network
(I think) is being treated like an error in your program.
Ludovic Courtès wrote on 15 Jun 2020 21:42
(name . Quinten Gruenthal)(address . quintengruenthal@gmail.com)(address . 41878@debbugs.gnu.org)
87y2ooyv2h.fsf@gnu.org
Hi,

Quinten Gruenthal <quintengruenthal@gmail.com> skribis:

> |substitute: guix substitute: error: TLS error in procedure 'handshake':
> Error in the pull function.

[...]

> I've seen this kind of error before in other software and I believe it
> occurs when a handshake can't be performed in time and the connection is
> dropped; I was able to successfully pull a few minutes later under better
> conditions. That being said I bring this up because a problem in my network
> (I think) is being treated like an error in your program.

Yes, it looks like a connectivity issue, either on your side or on the
side of https://ci.guix.gnu.org (though I’m unaware of downtime today).

Could you try again?

Thanks,
Ludo’.
Ludovic Courtès wrote on 16 Jun 2020 10:07
(name . Quinten Gruenthal)(address . quintengruenthal@gmail.com)(address . 41878@debbugs.gnu.org)
87a713xwko.fsf@gnu.org
Hi,

Quinten Gruenthal <quintengruenthal@gmail.com> skribis:

> Sure; though, as I mentioned before I did try again and it did work. Doing
> a pull once more I failed with:
>
> guix substitute: error: TLS error in procedure 'handshake': Error in the
> pull function.
> substitution of
> /gnu/store/k1da7bv579d8gvwdrakd9l4hxswknff2-guix-module-union failed
> guix pull: error: some substitutes for the outputs of derivation
> `/gnu/store/k2gdq7ahjaqmdgjr9xwywqpxvzazn467-guix-e07573432.drv' failed
> (usually happens due to networking issues); try `--fallback' to build
> derivation from source

Could it be that you’re on a flaky network connection?

> As with before, I was also able to succeed on a subsequent pull; but this
> error seemed to be handled better as it suggested both the cause of the
> problem and a possible solution.

If the connection is dropped, there’s little we can do, but maybe you’re
suggesting better error reporting?

Thanks,
Ludo’.

PS: Please keep the bug Cc’d.
Bengt Richter wrote on 16 Jun 2020 11:31
(name . Ludovic Courtès)(address . ludo@gnu.org)
20200616093112.GA5965@LionPure
Hi,

On +2020-06-16 10:07:51 +0200, Ludovic Courtès wrote:
> Hi,
>
> Quinten Gruenthal <quintengruenthal@gmail.com> skribis:
>
> > Sure; though, as I mentioned before I did try again and it did work. Doing
> > a pull once more I failed with:
> >
> > guix substitute: error: TLS error in procedure 'handshake': Error in the
> > pull function.
> > substitution of
> > /gnu/store/k1da7bv579d8gvwdrakd9l4hxswknff2-guix-module-union failed
> > guix pull: error: some substitutes for the outputs of derivation
> > `/gnu/store/k2gdq7ahjaqmdgjr9xwywqpxvzazn467-guix-e07573432.drv' failed
> > (usually happens due to networking issues); try `--fallback' to build
> > derivation from source
>
> Could it be that you’re on a flaky network connection?
>
> > As with before, I was also able to succeed on a subsequent pull; but this
> > error seemed to be handled better as it suggested both the cause of the
> > problem and a possible solution.
>
> If the connection is dropped, there’s little we can do, but maybe you’re
> suggesting better error reporting?

Hm, are gexp's checkpointable?

Some FTP clients can reconnect and continue, IIUC. I don't know how that works
with HTTPS. Can one adjust timeouts? Continue on an alternate mirror?
E.g., for downloading substitutes?

In general, I hate it when I get 1.0 GB through a 1.1 GB ISO and lose it all,
especially when I pay by the GB. Can guix help? :)

BTW, I think I have heard of security risks in restarting, but perhaps
with end-to-end integrity checks it is not an issue, even with a possible
man-in-the-middle trying?

>
> Thanks,
> Ludo’.
>
> PS: Please keep the bug Cc’d.
>
>
>

--
Regards,
Bengt Richter
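
The questions above (resuming a partial download, moving to an alternate mirror, and relying on an end-to-end integrity check) can be sketched roughly as follows. This is a minimal Python illustration, not Guix code: `fetch_with_resume`, the `mirrors` callables, and `max_attempts` are hypothetical names, and each mirror is assumed to accept a byte offset, much as an HTTP request with a `Range` header would. The expected SHA-256 is assumed to come from a trusted source.

```python
import hashlib


def fetch_with_resume(mirrors, expected_sha256, max_attempts=3):
    """Download a file, resuming at the current offset after a failure.

    `mirrors` is a list of hypothetical callables fetch(offset) that
    yield bytes chunks starting at `offset` and may raise OSError.
    """
    data = bytearray()
    for attempt in range(max_attempts):
        # Rotate through the mirrors so a retry uses an alternate one.
        fetch = mirrors[attempt % len(mirrors)]
        try:
            for chunk in fetch(len(data)):
                data.extend(chunk)
        except OSError:
            # Connection dropped: keep the bytes received so far and
            # resume from len(data) on the next attempt.
            continue
        # End-to-end check: accept the result only if the whole file
        # matches the hash we expected before downloading.
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return bytes(data)
        # Corrupt or tampered content: discard everything and start over.
        data.clear()
    raise OSError("download failed on all attempts")
```

Under these assumptions, a resumed or mirrored transfer cannot silently deliver altered content: whatever path the bytes took, the file is accepted only if the final hash matches.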
Quinten Gruenthal wrote on 15 Jun 2020 23:55
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAKg3DNtvP4xzdwmj_HfJUy3L5=QZy1eXynr4uxyzHPoovZ0_-Q@mail.gmail.com
Sure; though, as I mentioned before I did try again and it did work. Doing
a pull once more I failed with:

guix substitute: error: TLS error in procedure 'handshake': Error in the
pull function.
substitution of
/gnu/store/k1da7bv579d8gvwdrakd9l4hxswknff2-guix-module-union failed
guix pull: error: some substitutes for the outputs of derivation
`/gnu/store/k2gdq7ahjaqmdgjr9xwywqpxvzazn467-guix-e07573432.drv' failed
(usually happens due to networking issues); try `--fallback' to build
derivation from source

As with before, I was also able to succeed on a subsequent pull; but this
error seemed to be handled better as it suggested both the cause of the
problem and a possible solution.

On Mon, Jun 15, 2020 at 12:42 PM Ludovic Courtès <ludo@gnu.org> wrote:

> Hi,
>
> Quinten Gruenthal <quintengruenthal@gmail.com> skribis:
>
> > |substitute: guix substitute: error: TLS error in procedure 'handshake':
> > Error in the pull function.
>
> [...]
>
> > I've seen this kind of error before in other software and I believe it
> > occurs when a handshake can't be performed in time and the connection is
> > dropped; I was able to successfully pull a few minutes later under better
> > conditions. That being said I bring this up because a problem in my
> network
> > (I think) is being treated like an error in your program.
>
> Yes, it looks like a connectivity issue, either on your side or on the
> side of https://ci.guix.gnu.org (though I’m unaware of downtime today).
>
> Could you try again?
>
> Thanks,
> Ludo’.
>
Quinten Gruenthal wrote on 16 Jun 2020 16:57
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 41878@debbugs.gnu.org)
CAKg3DNufG9cOpkoatZY162FjifCDBO--B7q5T4j-=BJy-oJWNw@mail.gmail.com
Yes and yes. I didn't state explicitly that it was a problem with my
network only because I didn't want to appear overconfident in my amateur
diagnosis and I definitely prefer the second error message to the one that
dumps a trace and prompts the filing of a bug. This:

substitute: guix substitute: error: TLS error in procedure 'handshake':
Error in the pull function.
killing process 8828
Backtrace:
11 (primitive-load
"/gnu/store/pl48b057h6yg8w6f7hafiilcc44d0fn6-compute-guix-derivation")
In ice-9/eval.scm:
155:9 10 (_ _)
159:9 9 (_ #(#(#(#(#(#(#(#(#(#(#(#(#(#<directory (guile-user)
7fc5ff085f?> ?) ?) ?) ?) ?) ?) ?) ?) ?) ?) ?) ?) ?))
In ./guix/store.scm:
2025:24 8 (run-with-store #<store-connection 256.99 7fc5fd407640> _
#:guile-for-build _ #:system _ #:target _)
1859:8 7 (_ _)
In ./guix/gexp.scm:
243:18 6 (_ _)
1061:2 5 (_ _)
921:2 4 (_ _)
782:4 3 (_ _)
In ./guix/store.scm:
1907:12 2 (_ #<store-connection 256.99 7fc5fd407640>)
1356:5 1 (map/accumulate-builds #<store-connection 256.99 7fc5fd407640>
_ _)
1367:15 0 (_ #<store-connection 256.99 7fc5fd407640> _ _)

./guix/store.scm:1367:15: ERROR:
1. &store-protocol-error:
message: "`/gnu/store/5r3sb6bj6pppn4h35a35956mv5qrd011-guix-command
substitute' died unexpectedly"
status: 1
guix pull: error: You found a bug: the program
'/gnu/store/pl48b057h6yg8w6f7hafiilcc44d0fn6-compute-guix-derivation'
failed to compute the derivation for Guix (version:
"cf48f0fc4c40a2ec0b38a445e1e13f37722a0ade"; system: "x86_64-linux";
host version: "ecf92194a55188a9c217d76617378749db063453"; pull-version: 1).
Please report it by email to <bug-guix@gnu.org>.

is much different than this:

guix substitute: error: TLS error in procedure 'handshake': Error in the
pull function.
substitution of
/gnu/store/k1da7bv579d8gvwdrakd9l4hxswknff2-guix-module-union failed
guix pull: error: some substitutes for the outputs of derivation
`/gnu/store/k2gdq7ahjaqmdgjr9xwywqpxvzazn467-guix-e07573432.drv' failed
(usually happens due to networking issues); try `--fallback' to build
derivation from source

especially to one as new to Scheme as myself. I can't say why the two
errors were different, as I'm not familiar with the phases pull goes
through, but I can say the first error occurred at the beginning of the
pull and the second occurred in the wrap-up step it does towards the end.
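
The difference between the two messages can be illustrated with a small sketch: failures that look like network trouble are converted into a short message with a hint, while anything else still propagates as a real bug. This is a minimal Python illustration, not how Guix is implemented; `run_with_network_hint`, `TransientNetworkError`, and the choice of exception types counted as network errors are all assumptions made for the example.

```python
import ssl


class TransientNetworkError(Exception):
    """Hypothetical error type for failures that are likely the network's fault."""


# Exception types treated as network trouble in this sketch (an assumption).
NETWORK_ERRORS = (ssl.SSLError, ConnectionError, TimeoutError)


def run_with_network_hint(action, what):
    """Run action(); turn network failures into a short, actionable message."""
    try:
        return action()
    except NETWORK_ERRORS as err:
        # Report the likely cause and a next step instead of a backtrace
        # and a request to file a bug. The original error stays attached
        # as __cause__ for anyone who needs the details.
        raise TransientNetworkError(
            f"{what} failed: {err} "
            "(usually happens due to networking issues); "
            "try again, or try `--fallback' to build from source"
        ) from err
```

Exceptions outside `NETWORK_ERRORS` are deliberately not caught, so genuine bugs would still surface with a full trace.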


On Tue, Jun 16, 2020 at 1:07 AM Ludovic Courtès <ludo@gnu.org> wrote:

> Hi,
>
> Quinten Gruenthal <quintengruenthal@gmail.com> skribis:
>
> > Sure; though, as I mentioned before I did try again and it did work.
> Doing
> > a pull once more I failed with:
> >
> > guix substitute: error: TLS error in procedure 'handshake': Error in the
> > pull function.
> > substitution of
> > /gnu/store/k1da7bv579d8gvwdrakd9l4hxswknff2-guix-module-union failed
> > guix pull: error: some substitutes for the outputs of derivation
> > `/gnu/store/k2gdq7ahjaqmdgjr9xwywqpxvzazn467-guix-e07573432.drv' failed
> > (usually happens due to networking issues); try `--fallback' to build
> > derivation from source
>
> Could it be that you’re on a flaky network connection?
>
> > As with before, I was also able to succeed on a subsequent pull; but this
> > error seemed to be handled better as it suggested both the cause of the
> > problem and a possible solution.
>
> If the connection is dropped, there’s little we can do, but maybe you’re
> suggesting better error reporting?
>
> Thanks,
> Ludo’.
>
> PS: Please keep the bug Cc’d.
>
Ludovic Courtès wrote on 17 Jun 2020 14:03
(name . Quinten Gruenthal)(address . quintengruenthal@gmail.com)(address . 41878@debbugs.gnu.org)
87v9jprjb7.fsf@gnu.org
Hi,

Quinten Gruenthal <quintengruenthal@gmail.com> skribis:

> Yes and yes. I didn't state explicitly that it was a problem with my
> network only because I didn't want to appear overconfident in my amateur
> diagnosis and I definitely prefer the second error message to the one that
> dumps a trace and prompts the filing of a bug. This:
>
> substitute: guix substitute: error: TLS error in procedure 'handshake':
> Error in the pull function.

I see. I’ll take a look and see how we can improve on this.

Thanks,
Ludo’.
Ludovic Courtès wrote on 17 Jun 2020 14:03
control message for bug #41878
(address . control@debbugs.gnu.org)
87tuz9rj9y.fsf@gnu.org
retitle 41878 'guix substitute' and 'guix pull' fail gracelessly on flaky networks
quit
Brendan Tildesley wrote on 30 Aug 2020 12:06
Re: 'guix substitute' and 'guix pull' fail gracelessly on flaky networks
(address . 41878@debbugs.gnu.org)
2486f02b-b36f-7747-4eb0-b43d9215dd9f@brendan.scot
I have not looked closely but from observation I think currently guix
first decides if it is going to commit to using a substitute, or falling
back to building locally, by checking if substitutes are available then
committing to a method. This differs from the concept of a fallback in
my head, which would involve trying option B only after option A has
been tried and failed. guix's way means there is a class of failures
where guix simply gives up and stops instead of falling back.

In my experience, probably 10% of the time I try a guix pull; guix
package -u ., there is some weird network error that doesn't happen the
second time I run it. Perhaps it would be sufficient to simply try twice
for every substitute; accumulate a list of failed substitutes and retry
them after iterating through the list of substitutes to download, then
if that fails try building from source. Only then are we allowed to give up.
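
The order of operations proposed above can be sketched as follows. This is a minimal Python illustration of the idea only, not Guix code: `obtain_all`, `download`, and `build_from_source` are hypothetical names, and a failed download is assumed to raise `OSError`.

```python
def obtain_all(items, download, build_from_source):
    """Return a dict mapping each item to "substitute" or "built".

    `download` and `build_from_source` are hypothetical callables that
    take one item; `download` raises OSError on a network failure.
    """
    results = {}
    failed = []

    # First pass: try every substitute once and remember the failures.
    for item in items:
        try:
            download(item)
            results[item] = "substitute"
        except OSError:
            failed.append(item)

    # Second pass: retry each failed substitute once more, and only if
    # that also fails fall back to building that item from source.
    for item in failed:
        try:
            download(item)
            results[item] = "substitute"
        except OSError:
            # If the build raises too, every option has been tried;
            # only now does the error reach the caller.
            build_from_source(item)
            results[item] = "built"

    return results
```

Deferring the retries until the first pass has finished gives a briefly unavailable network some time to recover before the second attempt.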
zimoun wrote on 11 Sep 2020 16:17
Re: bug#41878: 'guix substitute' and 'guix pull' fail gracelessly on flaky networks
(name . Brendan Tildesley)(address . mail@brendan.scot)(address . 41878@debbugs.gnu.org)
871rj8gzz2.fsf@gmail.com
Dear,

On Sun, 30 Aug 2020 at 20:06, Brendan Tildesley <mail@brendan.scot> wrote:
> I have not looked closely but from observation I think currently guix first
> decides if it is going to commit to using a substitute, or falling back to
> building locally, by checking if substitutes are available then committing to
> a method. This differs from the concept of a fallback in my head, which would
> involve trying option B only after option A has been tried and failed. guix's
> way means there are a class of failures where guix simply gives up and stops
> instead of falling back.

What do you mean?

> In my experience, probably 10% of the time I try a guix pull; guix package -u
> ., there is some weird network error that doesn't happen the second time I run
> it. Perhaps it would be sufficient to simply try twice for every substitute;
> accumulate a list of failed substitutes and retry them after iterating through
> the list of substitutes to download, then if that fails try building from
> source. only then are we allowed to give up.

It rings a bell to me. Something about the configuration of Cuirass and
the build farm serving the substitutes; related to caching. But I am
not able to find the relevant pointer.


All the best,
simon