Guix deploy cannot reboot remote machines without an error

  • Open
Details
2 participants
  • Gabriel Wicki
  • Richard Sent
Owner
unassigned
Submitted by
Richard Sent
Severity
normal
Richard Sent wrote on 3 May 2024 23:57
(address . bug-guix@gnu.org)
87wmoapnhp.fsf@freakingpenguin.com
Hi Guix,

One neat feature of guix deploy is the ability to run a command on a
list of remote machines. One command that would commonly be run is
reboot, so that upgrades to, say, the Linux kernel take effect.

While the command itself /does/ run and the system /does/ restart, guix
deploy doesn't gracefully handle the connection loss. The first machine
rebooted will throw an error, halting the reboot of the rest of the
machines in the list.

tmux and screen aren't really compatible with '-x -- <command>', since
you can't start and detach from sessions, and the & in '-x -- nohup
<command> &' gets swallowed by the host shell. I haven't found a
workaround. Even if they did work, I suspect there would be a race
condition between "host closes session" and "remote restarts".

We shouldn't assume that any command might close the SSH session and,
on that basis, catch errors by default.

One solution could be adding an alternative to -x that nohup's the
command and attempts to cleanly close the SSH session. If the session
errors out after the command is nohup'd (e.g. reboot race condition),
catch the SSH error and exit.
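
As a rough illustration of the "catch the SSH error and keep going"
idea, here is a minimal shell sketch that uses plain ssh rather than
guix deploy. It is not how guix deploy works internally. The names
run_on_all and REMOTE_SH are hypothetical. The sketch relies on ssh(1)
exiting with status 255 when the connection itself fails, and assumes
that this is what a reboot looks like from the client side.

```shell
#!/bin/sh
# Hypothetical helper: run one command on each host in turn and treat a
# dropped connection as non-fatal, so that later hosts are still processed.
# REMOTE_SH is an assumption: the program used to reach a host (default: ssh).
REMOTE_SH=${REMOTE_SH:-ssh}

run_on_all() {
    cmd=$1; shift
    failed=0
    for host in "$@"; do
        "$REMOTE_SH" "$host" "$cmd"
        status=$?
        case $status in
            0)   echo "$host: ok" ;;
            # ssh reports connection-level errors with exit status 255.
            255) echo "$host: connection lost" ;;
            *)   echo "$host: failed with status $status"; failed=1 ;;
        esac
    done
    # Non-zero only if a command failed for a reason other than a lost link.
    return "$failed"
}
```

For example, 'run_on_all reboot host1 host2' (hypothetical host names)
would still try host2 after the connection to host1 drops. Note that
status 255 also covers a connection that never succeeded, which this
sketch does not distinguish from a reboot.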

Alternatively we could add a --reboot flag, although I prefer the more
general solution.

Perhaps this can be the impetus for implementing the "deploy-hook"
functionality described at https://issues.guix.gnu.org/53486. In a
particularly fancy world, we could combine rebooting with pre and post
reboot command execution, but now I'm thinking of pies in skies.

Or maybe I'm completely wrong and this is possible (sorry!), in which
case we probably could add a quick mention of it in the manual.

gibraltar :( rsent$ guix deploy rsent/machines/lan.scm --no-grafts -x -- reboot
guix deploy: warning: <machine-ssh-configuration> without a 'host-key' is deprecated
guix deploy: sending 1 store item (0 MiB) to 'horizon.local'...
;;; [2024/05/03 17:16:45.032361, 0] [GSSH ERROR] Parent session is not connected: #<unknown channel (freed) 7fbe66aa71a0>
Backtrace:
16 (primitive-load "/home/richard/.config/guix/current/bin/guix")
In guix/ui.scm:
2312:7 15 (run-guix . _)
2275:10 14 (run-guix-command _ . _)
In ice-9/boot-9.scm:
1752:10 13 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/status.scm:
839:4 12 (call-with-status-report _ _)
In ice-9/boot-9.scm:
1752:10 11 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/store.scm:
666:37 10 (thunk)
1302:8 9 (call-with-build-handler _ _)
1302:8 8 (call-with-build-handler #<procedure 7fbe69be4690 at guix/ui.scm:1222:2 (continue store things mode)> _)
In guix/scripts/deploy.scm:
274:23 7 (_)
In srfi/srfi-1.scm:
460:18 6 (fold #<procedure 7fbe78c25540 at guix/scripts/deploy.scm:274:28 (machine result)> #t (#<<machine> operating-system: #<<operating-system> ke…>))
In guix/scripts/deploy.scm:
214:2 5 (_ #<<machine> operating-system: #<<operating-system> kernel: #<package linux@6.8.8 nongnu/packages/linux.scm:118 7fbe6343c0b0> kernel-loada…> …)
In guix/store.scm:
2182:25 4 (run-with-store #<store-connection 256.100 7fbe78cf8960> #<procedure 7fbe66b423c0 at guix/remote.scm:119:2 (state)> #:guile-for-build _ # _ # _)
In guix/remote.scm:
72:20 3 (_ _)
In unknown file:
2 (channel-get-exit-status #<unknown channel (freed) 7fbe66aa71a0>)
In ice-9/boot-9.scm:
1685:16 1 (raise-exception _ #:continuable? _)
1685:16 0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `guile-ssh-error' with args `("channel-get-exit-status" "Parent session is not connected" #<unknown channel (freed) 7fbe66aa71a0> #f)'.


--
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.
Gabriel Wicki wrote 6 days ago
deploy and reboot
(name . Richard Sent)(address . richard@freakingpenguin.com)(address . 70761@debbugs.gnu.org)
871pwdu0d9.fsf@erlikon.ch
Hi Richard!

I just stumbled upon the same issue, and while I am not really sure what
to think about the screen/tmux/SIGHUP proposal (does this apply to all
deploy commands or just a fancier, more sophisticated usage scenario?),
I'd go with a special --reboot flag for the deploy command that

1. causes a reboot and then

2. waits for the machine(s) to come back up.

I am not sure whether this is possible already, but will happily dive in
a little further (and prepare a patch if circumstances allow).
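
To make the "wait for the machine(s) to come back up" step concrete,
here is a minimal polling sketch in plain shell. It is only an
illustration of the idea, not an existing guix deploy feature. The
names wait_for_host, probe_ssh and PROBE are hypothetical, and the
attempt count and delay defaults are arbitrary.

```shell
#!/bin/sh
# Hypothetical probe: succeeds once an SSH login to the host works.
# BatchMode avoids password prompts; ConnectTimeout bounds each attempt.
probe_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# PROBE is an assumption: any command that succeeds when the host is reachable.
PROBE=${PROBE:-probe_ssh}

# wait_for_host HOST [ATTEMPTS] [DELAY_SECONDS]
wait_for_host() {
    host=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$PROBE" "$host" >/dev/null 2>&1; then
            echo "$host: back up after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "$host: still unreachable after $attempts attempts" >&2
    return 1
}
```

One caveat: a probe can succeed before the machine has actually gone
down, so a real implementation would probably have to confirm that the
reboot happened. The sketch does not handle that.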

Have a nice week,
gabber