GSSH ERROR while sending store items with 'guix deploy'

  • Open
  • quality assurance status badge
Details
3 participants
  • Leo Famulari
  • Ludovic Courtès
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal
M
M
Maxim Cournoyer wrote on 16 Mar 2023 03:02
(name . bug-guix)(address . bug-guix@gnu.org)
87bkktjvwz.fsf@gmail.com
Hi,

I'm trying to update one of my machines using 'guix deploy' and I keep
getting the following error:

Toggle snippet (45 lines)
guix deploy: sending 599 store items (3,794 MiB) to 'x200.local'...
20% ????????????????????? ?
27% ???????????????????????????? ?;;; [2023/03/15 21:57:28.007492, 0] [GSSH ERROR] Parent session is not connected: #<unknown channel (freed) 7fca183c31c0>
Backtrace:
In ice-9/boot-9.scm:
1752:10 19 (with-exception-handler _ _ #:unwind? _ # _)
In guix/scripts/deploy.scm:
166:6 18 (_)
In guix/store.scm:
2168:25 17 (run-with-store #<store-connection 256.99 7fca0e115d70> …)
In gnu/machine/ssh.scm:
528:8 16 (_ _)
530:30 15 (_ _)
In ice-9/boot-9.scm:
1752:10 14 (with-exception-handler _ _ #:unwind? _ # _)
In gnu/machine/ssh.scm:
530:30 13 (_)
In guix/store.scm:
2168:25 12 (run-with-store #<store-connection 256.99 7fca0c81c140> …)
In guix/remote.scm:
136:10 11 (_ _)
In guix/store.scm:
2040:38 10 (_ #<store-connection 256.99 7fca0c81c140>)
In guix/ssh.scm:
530:7 9 (send-files _ _ _ #:recursive? _ #:log-port _)
In guix/store.scm:
1764:12 8 (export-paths #<store-connection 256.99 7fca0c81c140> _ …)
1731:22 7 (export-path #<store-connection 256.99 7fca0c81c140> _ # …)
716:13 6 (process-stderr _ _)
In guix/build/utils.scm:
1014:11 5 (loop _ _)
In unknown file:
4 (put-bytevector #<unknown channel (freed) 7fca183c31c0> …)
In ice-9/boot-9.scm:
1685:16 3 (raise-exception _ #:continuable? _)
1683:16 2 (raise-exception _ #:continuable? _)
1685:16 1 (raise-exception _ #:continuable? _)
1685:16 0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `guile-ssh-error' with args `("write_to_channel_port"
"Parent session is not connected" #<unknown channel (freed)
7fca183c31c0> #f)'.

It works for a while, so it'll eventually succeeds, but that seems a new
regression (it didn't use to do that).

--
Thanks,
Maxim
L
L
Ludovic Courtès wrote on 17 Mar 2023 22:15
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 62213@debbugs.gnu.org)
87ttyjw03l.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

Toggle quote (6 lines)
> I'm trying to update one of my machines using 'guix deploy' and I keep
> getting the following error:
>
> guix deploy: sending 599 store items (3,794 MiB) to 'x200.local'...
> 20% ????????????????????? ?

[...]

Toggle quote (15 lines)
> 716:13 6 (process-stderr _ _)
> In guix/build/utils.scm:
> 1014:11 5 (loop _ _)
> In unknown file:
> 4 (put-bytevector #<unknown channel (freed) 7fca183c31c0> …)
> In ice-9/boot-9.scm:
> 1685:16 3 (raise-exception _ #:continuable? _)
> 1683:16 2 (raise-exception _ #:continuable? _)
> 1685:16 1 (raise-exception _ #:continuable? _)
> 1685:16 0 (raise-exception _ #:continuable? _)
>
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> Throw to key `guile-ssh-error' with args `("write_to_channel_port"
> "Parent session is not connected" #<unknown channel (freed)

What version of guix-daemon is running on the machine that runs "guix
deploy"?

Do the sshd logs on x200.local show anything interesting at the moment
the connection is dropped?

Thanks,
Ludo'.
M
M
Maxim Cournoyer wrote on 20 Mar 2023 21:22
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 62213@debbugs.gnu.org)
87355z41ht.fsf@gmail.com
Hi Ludo!

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (30 lines)
> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> I'm trying to update one of my machines using 'guix deploy' and I keep
>> getting the following error:
>>
>> guix deploy: sending 599 store items (3,794 MiB) to 'x200.local'...
>> 20% ????????????????????? ?
>
> [...]
>
>> 716:13 6 (process-stderr _ _)
>> In guix/build/utils.scm:
>> 1014:11 5 (loop _ _)
>> In unknown file:
>> 4 (put-bytevector #<unknown channel (freed) 7fca183c31c0> …)
>> In ice-9/boot-9.scm:
>> 1685:16 3 (raise-exception _ #:continuable? _)
>> 1683:16 2 (raise-exception _ #:continuable? _)
>> 1685:16 1 (raise-exception _ #:continuable? _)
>> 1685:16 0 (raise-exception _ #:continuable? _)
>>
>> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> Throw to key `guile-ssh-error' with args `("write_to_channel_port"
>> "Parent session is not connected" #<unknown channel (freed)
>
> What version of guix-daemon is running on the machine that runs "guix
> deploy"?

Two variants of 1.4.0-4.01fd830:

Toggle snippet (7 lines)
$ pgrep -a guix-daemon
6883 /gnu/store/izxlr8c7aml0ajqpblidkq014wyxcczz-guix-1.4.0-4.01fd830/bin/guix-daemon 6822 guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=yes --substitute-urls http://127.0.0.1:8181 https://ci.guix.gnu.org https://bordeaux.guix.gnu.org --max-jobs=4
13519 /gnu/store/izxlr8c7aml0ajqpblidkq014wyxcczz-guix-1.4.0-4.01fd830/bin/guix-daemon 13504 guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=yes --substitute-urls http://127.0.0.1:8181 https://ci.guix.gnu.org https://bordeaux.guix.gnu.org --max-jobs=4
27855 /gnu/store/izxlr8c7aml0ajqpblidkq014wyxcczz-guix-1.4.0-4.01fd830/bin/guix-daemon --build-users-group guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=yes --substitute-urls http://127.0.0.1:8181 https://ci.guix.gnu.org https://bordeaux.guix.gnu.org --max-jobs=4
28995 /gnu/store/izxlr8c7aml0ajqpblidkq014wyxcczz-guix-1.4.0-4.01fd830/bin/guix-daemon 28982 guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=yes --substitute-urls http://127.0.0.1:8181 https://ci.guix.gnu.org https://bordeaux.guix.gnu.org --max-jobs=4

Toggle quote (3 lines)
> Do the sshd logs on x200.local show anything interesting at the moment
> the connection is dropped?

In my logs, at around the time I was observing these problems, I see a
couple instances of:

Toggle snippet (17 lines)
Mar 15 22:03:58 localhost shepherd[1]: Service sshd-7 has been started.
Mar 15 22:03:59 localhost sshd[24950]: Accepted publickey for mcournoyer from 192.168.10.15 port 40696 ssh2: RSA SHA256:bNjGrjJ6+adVo155SohxuXbuxeJB+w/+vQzhdFs3vUg
Mar 15 22:07:57 localhost sshd[24952]: ssh_dispatch_run_fatal: Connection from user mcournoyer 192.168.10.15 port 40696: invalid format
Mar 15 22:07:57 localhost shepherd[1]: 0 connections still in use after sshd-7 termination.
Mar 15 22:07:57 localhost shepherd[1]: Service sshd-7 (PID 24950) exited with 255.
Mar 15 22:07:57 localhost shepherd[1]: Service sshd-7 has been disabled.
Mar 15 22:07:57 localhost shepherd[1]: Transient service sshd-7 terminated, now unregistered.
Mar 15 22:14:41 localhost shepherd[1]: Service sshd-8 has been started.
Mar 15 22:14:42 localhost sshd[25142]: Accepted publickey for mcournoyer from 192.168.10.15 port 45596 ssh2: RSA SHA256:bNjGrjJ6+adVo155SohxuXbuxeJB+w/+vQzhdFs3vUg
Mar 15 22:20:02 localhost shepherd[1]: 0 connections still in use after sshd-8 termination.
Mar 15 22:20:02 localhost shepherd[1]: Service sshd-8 (PID 25142) exited with 255.
Mar 15 22:20:02 localhost shepherd[1]: Service sshd-8 has been disabled.
Mar 15 22:20:02 localhost shepherd[1]: Transient service sshd-8 terminated, now unregistered.
Mar 15 22:24:19 localhost shepherd[1]: Service sshd-9 has been started.
Mar 15 22:24:20 localhost sshd[25463]: Accepted publickey for mcournoyer from 192.168.10.15 port 41702 ssh2: RSA SHA256:bNjGrjJ6+adVo155SohxuXbuxeJB+w/+vQzhdFs3vUg

Where the 'ssh_dispatch_run_fatal ... invalid format' stands out to me.

I'll report with more if I can reproduce the issue again.

--
Thanks,
Maxim
L
L
Leo Famulari wrote on 18 May 2023 20:24
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 62213@debbugs.gnu.org)
ZGZtePnFM6dXMDoP@jasmine.lan
On Wed, Mar 15, 2023 at 10:02:20PM -0400, Maxim Cournoyer wrote:
Toggle quote (6 lines)
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> Throw to key `guile-ssh-error' with args `("write_to_channel_port"
> "Parent session is not connected" #<unknown channel (freed)
> 7fca183c31c0> #f)'.
> --8<---------------cut here---------------end--------------->8---

I'm seeing this too with offloading.
?