[Cuirass] Workers not waking up after server went away

  • Open
  • quality assurance status badge
Details
One participant
  • Ludovic Courtès
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
L
L
Ludovic Courtès wrote on 27 Nov 2023 14:28
(address . bug-guix@gnu.org)
875y1nxr3q.fsf@inria.fr
Hello,

The ‘cuirass remote-worker’ processes (1.2.0-1.bdc1f9f) didn’t wake up
after ‘cuirass remote-server’ stopped responding earlier today,
remaining stuck while waiting for a reply to their latest “request work”
message:

Toggle snippet (28 lines)
Nov 27 02:47:30 guixp9 cuirass[22122]: COhE8Mw6: derivation `/gnu/store/acljcvz7wb3pc9bxipkl1vf74ac7ns2z-calf-0.90.3.drv' build failed: build o
Nov 27 02:47:30 guixp9 cuirass[22122]: COhE8Mw6: request work.
Nov 27 02:47:30 guixp9 cuirass[22122]: HKCtyhxH: derivation `/gnu/store/z51fxy3j476136wcqd5gmy9v9r2vyqwn-csdr-0.18.2.drv' build failed: build o
Nov 27 02:47:30 guixp9 cuirass[22122]: HKCtyhxH: request work.
Nov 27 02:47:44 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:47:44 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:48:44 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:48:44 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:49:45 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:49:45 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:50:45 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:50:45 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:51:45 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:51:45 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:52:45 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:52:45 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:53:46 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:53:46 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:54:46 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:54:46 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:55:46 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:55:46 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:55:53 guixp9 cuirass[22122]: worker's alive
Nov 27 02:56:46 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.
Nov 27 02:56:46 guixp9 cuirass[22122]: HKCtyhxH: ping tcp://10.0.0.1:5555.
Nov 27 02:57:47 guixp9 cuirass[22122]: COhE8Mw6: ping tcp://10.0.0.1:5555.

They had to be manually restarted.

This shouldn’t be the case. Instead, they should say “received
bootstrap message” when the new ‘cuirass remote-server’ is spawned and
keep going.

Ludo’.
?
Your comment

Commenting via the web interface is currently disabled.

To comment on this conversation send an email to 67485@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 67485
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch