guile ssh error on guix deploy

  • Done
  • quality assurance status badge
Details
4 participants
  • Andrew Tropin
  • Ludovic Courtès
  • Marius Bakke
  • Artyom V. Poptsov
Owner
unassigned
Submitted by
Andrew Tropin
Severity
important
Merged with
A
A
Andrew Tropin wrote on 4 Oct 2022 13:09
(address . bug-guix@gnu.org)
87sfk34zhj.fsf@trop.in
From time to time I get the following error, the only thing I change is
IPv6 config in static-networking service. Sometimes it fails, but after
a few more attempts with the same config it deploys sucessfully.

Toggle snippet (87 lines)
-*- mode: compilation; default-directory: "~/work/abcdw/trop.in/" -*-
Compilation started at Tue Oct 4 14:19:46

make -k deploy-pinky
guix deploy ./guix/pinky.scm
The following 1 machine will be deployed:
pinky

guix deploy: deploying to pinky...
guix deploy: sending 0 store items (0 MiB) to '23.137.249.202'...
guix deploy: sending 0 store items (0 MiB) to '23.137.249.202'...
substitute: updating substitutes from 'https://substitutes.nonguix.org'... 100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'http://ci.guix.trop.in'... 100.0%
The following derivations will be built:
/gnu/store/yzilvx4jr4s7174f6azxxbb5c24311xc-remote-exp.scm.drv
/gnu/store/sqlbbg5z19h2ww86p8nzhfbnidsm5sag-switch-to-system.scm.drv
/gnu/store/ayssizgz8i4mnk6p2yhilqgzwim4ql7j-system.drv
/gnu/store/9fxf3gcgi07hg5nmhjvnvlhk6829h304-provenance.drv
/gnu/store/djs7vabv5wr3inz3pva87wfw7ya10ajq-boot.drv
/gnu/store/smmzsd1b9v4qmncvl0pl6f0nrasp39ks-activate.scm.drv
/gnu/store/5hpm7v0xnqqilhj7qdg66cjkf8dvkimx-activate-service.scm.drv
/gnu/store/g62yd6a8wd081w0y4jji83dbcxhiks3i-etc.drv

building /gnu/store/9fxf3gcgi07hg5nmhjvnvlhk6829h304-provenance.drv...
building /gnu/store/g62yd6a8wd081w0y4jji83dbcxhiks3i-etc.drv...
building /gnu/store/5hpm7v0xnqqilhj7qdg66cjkf8dvkimx-activate-service.scm.drv...
building /gnu/store/smmzsd1b9v4qmncvl0pl6f0nrasp39ks-activate.scm.drv...
building /gnu/store/djs7vabv5wr3inz3pva87wfw7ya10ajq-boot.drv...
building /gnu/store/ayssizgz8i4mnk6p2yhilqgzwim4ql7j-system.drv...
building /gnu/store/sqlbbg5z19h2ww86p8nzhfbnidsm5sag-switch-to-system.scm.drv...
building /gnu/store/yzilvx4jr4s7174f6azxxbb5c24311xc-remote-exp.scm.drv...
guix deploy: sending 10 store items (0 MiB) to '23.137.249.202'...
guix deploy: sending 0 store items (0 MiB) to '23.137.249.202'...
guix deploy: sending 0 store items (0 MiB) to '23.137.249.202'...
The following derivations will be built:
/gnu/store/36rmj2xp0sd7lx0ndv8imkyxrh2lmmdy-remote-exp.scm.drv
/gnu/store/823hs0g4wxdmfwmj6nn8smcnrp45a1g7-install-bootloader.scm.drv
/gnu/store/0m7f6gg0d0iq6r99p40rw3m65z52fjiy-grub.cfg.drv

building /gnu/store/0m7f6gg0d0iq6r99p40rw3m65z52fjiy-grub.cfg.drv...
building /gnu/store/823hs0g4wxdmfwmj6nn8smcnrp45a1g7-install-bootloader.scm.drv...
building /gnu/store/36rmj2xp0sd7lx0ndv8imkyxrh2lmmdy-remote-exp.scm.drv...
;;; [2022/10/04 14:20:23.191118, 0] [GSSH ERROR] Channel opening failure: channel 67 error (2) open failed: #<input-output: channel (closed) 7f7d1af9e140>
Backtrace:
In guix/store.scm:
1402:13 19 (map/accumulate-builds #<store-connection 256.99 7f7d1…> …)
1377:11 18 (map/accumulate-builds #<store-connection 256.99 7f7d1…> …)
1295:8 17 (call-with-build-handler #<procedure 7f7d15629ea0 at g…> …)
In ice-9/boot-9.scm:
1752:10 16 (with-exception-handler _ _ #:unwind? _ # _)
In guix/scripts/deploy.scm:
168:6 15 (_)
In guix/store.scm:
2165:25 14 (run-with-store #<store-connection 256.99 7f7d143b0690> …)
In gnu/machine/ssh.scm:
506:32 13 (_ _)
In ice-9/boot-9.scm:
1752:10 12 (with-exception-handler _ _ #:unwind? _ # _)
In gnu/machine/ssh.scm:
506:32 11 (_)
In guix/store.scm:
2165:25 10 (run-with-store #<store-connection 256.99 7f7d1207b3c0> …)
In guix/remote.scm:
138:10 9 (_ _)
In guix/store.scm:
2037:38 8 (_ #<store-connection 256.99 7f7d1207b3c0>)
In guix/ssh.scm:
376:2 7 (send-files #<store-connection 256.99 7f7d1207b3c0> _ # …)
222:5 6 (remote-run (begin (use-modules (guix) (srfi #) # #) …) #)
In ssh/popen.scm:
64:4 5 (open-remote-pipe* _ "r+" _ . _)
In unknown file:
4 (channel-open-session #<input-output: channel (closed) …>)
In ice-9/boot-9.scm:
1685:16 3 (raise-exception _ #:continuable? _)
1683:16 2 (raise-exception _ #:continuable? _)
1685:16 1 (raise-exception _ #:continuable? _)
1685:16 0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel opening failure: channel 67 error (2) open failed" #<input-output: channel (closed) 7f7d1af9e140> #f)'.
make: *** [Makefile:28: deploy-pinky] Error 1

Compilation exited abnormally with code 2 at Tue Oct 4 14:20:23

--
Best regards,
Andrew Tropin
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEKEGaxlA4dEDH6S/6IgjSCVjB3rAFAmM8FFkACgkQIgjSCVjB
3rCMmg/7BujBW2WmTY5VAIk1cCDgo4K17H7E5P3fqGmMKNMTUXsZys1fjcXKnAEt
4Hh/Uyize0bpXXwRIRHHGdeLVXyp5KpF7AhKXYZOS9CawFVwhL8/9MdDIX+FM+FE
mOKVgO5Y3qN4Onn7jHWzetayz8A8uKQjcZCWzhXPcXy/eBCoeQFof8TNZP5tbxpp
7VV+zD68WRx2Lg6OlvhpeZHf5q9msoMTJKj7pFIAOs0SrDn55/t1kSYcnYhPjUBx
Oo8RoG+UsCGC6beJGgfLY4bMe9cl6fV1VafoZgVDSdzDu4DDsAuRMINyUwCQl+7m
HV6HUYCyLu1MyM+fGlbdGY4f7tR9rvNWDzWNZeu6Yw02JjsfmR401BQpQ8uFzup2
dSLPMrTycEGALAfyywupW8+H6iDT2rYfNJ7ZGwkHWBN6MtBhUtLyU2fdbFX0YYqt
Q16CEWGcm+zIfO36tB/4/Bvma2mfEfZYswpyC5ylXOfPNX181k9r152VEhRfAFaJ
BVXisk9G82F8VheUduUEFJSOGbegulMBtEYFYrXCqLGEZKSL1Qq42pkGZjZsFaL2
/ZsgMiROY+SVnT2qAEldBxxrD32yvqudPcKYFhyKUTpMaCaHRt7eRpnHNfsHZWdv
HRLUNA7vP/Vyqunuvo/WQQ63TUjjidYDP8WCxoLpe4A53HC76Pw=
=2ZlB
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 8 Oct 2022 14:39
(name . Andrew Tropin)(address . andrew@trop.in)(address . 58290@debbugs.gnu.org)
87bkqm8p5z.fsf@gnu.org
Hi!

Andrew Tropin <andrew@trop.in> skribis:

Toggle quote (10 lines)
> From time to time I get the following error, the only thing I change is
> IPv6 config in static-networking service. Sometimes it fails, but after
> a few more attempts with the same config it deploys sucessfully.
>
> -*- mode: compilation; default-directory: "~/work/abcdw/trop.in/" -*-
> Compilation started at Tue Oct 4 14:19:46
>
> make -k deploy-pinky
> guix deploy ./guix/pinky.scm

[...]

Toggle quote (16 lines)
> In guix/ssh.scm:
> 376:2 7 (send-files #<store-connection 256.99 7f7d1207b3c0> _ # …)
> 222:5 6 (remote-run (begin (use-modules (guix) (srfi #) # #) …) #)
> In ssh/popen.scm:
> 64:4 5 (open-remote-pipe* _ "r+" _ . _)
> In unknown file:
> 4 (channel-open-session #<input-output: channel (closed) …>)
> In ice-9/boot-9.scm:
> 1685:16 3 (raise-exception _ #:continuable? _)
> 1683:16 2 (raise-exception _ #:continuable? _)
> 1685:16 1 (raise-exception _ #:continuable? _)
> 1685:16 0 (raise-exception _ #:continuable? _)
>
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel opening failure: channel 67 error (2) open failed" #<input-output: channel (closed) 7f7d1af9e140> #f)'.

Are there any hints from sshd in the /var/log/messages of that machine
as to why the connection was closed?

Thanks,
Ludo’.
A
A
Andrew Tropin wrote on 14 Oct 2022 08:45
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 58290@debbugs.gnu.org)
87o7uec39j.fsf@trop.in
On 2022-10-08 14:39, Ludovic Courtès wrote:

Toggle quote (35 lines)
> Hi!
>
> Andrew Tropin <andrew@trop.in> skribis:
>
>> From time to time I get the following error, the only thing I change is
>> IPv6 config in static-networking service. Sometimes it fails, but after
>> a few more attempts with the same config it deploys sucessfully.
>>
>> -*- mode: compilation; default-directory: "~/work/abcdw/trop.in/" -*-
>> Compilation started at Tue Oct 4 14:19:46
>>
>> make -k deploy-pinky
>> guix deploy ./guix/pinky.scm
>
> [...]
>
>> In guix/ssh.scm:
>> 376:2 7 (send-files #<store-connection 256.99 7f7d1207b3c0> _ # …)
>> 222:5 6 (remote-run (begin (use-modules (guix) (srfi #) # #) …) #)
>> In ssh/popen.scm:
>> 64:4 5 (open-remote-pipe* _ "r+" _ . _)
>> In unknown file:
>> 4 (channel-open-session #<input-output: channel (closed) …>)
>> In ice-9/boot-9.scm:
>> 1685:16 3 (raise-exception _ #:continuable? _)
>> 1683:16 2 (raise-exception _ #:continuable? _)
>> 1685:16 1 (raise-exception _ #:continuable? _)
>> 1685:16 0 (raise-exception _ #:continuable? _)
>>
>> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel opening failure: channel 67 error (2) open failed" #<input-output: channel (closed) 7f7d1af9e140> #f)'.
>
> Are there any hints from sshd in the /var/log/messages of that machine
> as to why the connection was closed?

Toggle snippet (189 lines)
bob@pinky /var/log$ zcat messages.1.gz | grep "Oct 4" | grep ssh
Oct 4 05:57:09 localhost shepherd[1]: Service sshd-380 has been started.
Oct 4 05:57:11 localhost sshd[15950]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 54710 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 06:34:17 localhost shepherd[1]: Service sshd-381 has been started.
Oct 4 06:34:19 localhost sshd[15958]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 57064 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 06:36:55 localhost shepherd[1]: Service sshd-382 has been started.
Oct 4 06:36:57 localhost sshd[15991]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 41088 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 06:37:30 localhost shepherd[1]: Service sshd-383 has been started.
Oct 4 06:37:33 localhost sshd[16031]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 56148 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 06:43:39 localhost shepherd[1]: 3 connections still in use after sshd-382 termination.
Oct 4 06:43:39 localhost shepherd[1]: Service sshd-382 (PID 15991) exited with 255.
Oct 4 06:43:39 localhost shepherd[1]: Service sshd-382 has been disabled.
Oct 4 06:43:39 localhost shepherd[1]: Transient service sshd-382 terminated, now unregistered.
Oct 4 06:43:41 localhost shepherd[1]: Service sshd-384 has been started.
Oct 4 06:43:42 localhost sshd[16166]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 36040 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 06:43:49 localhost shepherd[1]: 3 connections still in use after sshd-384 termination.
Oct 4 06:43:49 localhost shepherd[1]: Service sshd-384 has been disabled.
Oct 4 06:43:49 localhost shepherd[1]: Transient service sshd-384 terminated, now unregistered.
Oct 4 06:48:58 localhost shepherd[1]: Service sshd-385 has been started.
Oct 4 06:49:00 localhost sshd[16205]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 34134 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 06:52:05 localhost shepherd[1]: Service sshd-386 has been started.
Oct 4 06:52:06 localhost sshd[16212]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 55922 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 06:53:02 localhost shepherd[1]: 4 connections still in use after sshd-386 termination.
Oct 4 06:53:02 localhost shepherd[1]: Service sshd-386 has been disabled.
Oct 4 06:53:02 localhost shepherd[1]: Transient service sshd-386 terminated, now unregistered.
Oct 4 06:53:03 localhost shepherd[1]: Service sshd-387 has been started.
Oct 4 06:53:04 localhost sshd[16370]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 50604 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 06:53:11 localhost shepherd[1]: 4 connections still in use after sshd-387 termination.
Oct 4 06:53:11 localhost shepherd[1]: Service sshd-387 has been disabled.
Oct 4 06:53:11 localhost shepherd[1]: Transient service sshd-387 terminated, now unregistered.
Oct 4 06:54:25 localhost shepherd[1]: Service ssh-daemon has been stopped.
Oct 4 06:57:42 localhost shepherd[1]: Service ssh-daemon has been started.
Oct 4 06:58:20 localhost shepherd[1]: Service sshd-1 has been started.
Oct 4 06:58:22 localhost sshd[239]: Accepted publickey for bob from 185.115.7.223 port 10436 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:01:22 localhost shepherd[1]: Service sshd-2 has been started.
Oct 4 07:01:24 localhost sshd[247]: Accepted publickey for bob from 185.115.7.223 port 29449 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:02:29 localhost shepherd[1]: Service sshd-3 has been started.
Oct 4 07:02:31 localhost sshd[254]: Accepted publickey for bob from 185.115.7.223 port 29456 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:03:05 localhost sshd[256]: error: no more sessions
Oct 4 07:03:05 localhost shepherd[1]: 2 connections still in use after sshd-3 termination.
Oct 4 07:03:05 localhost shepherd[1]: Service sshd-3 has been disabled.
Oct 4 07:03:05 localhost shepherd[1]: Transient service sshd-3 terminated, now unregistered.
Oct 4 07:04:21 localhost shepherd[1]: Service sshd-4 has been started.
Oct 4 07:04:22 localhost sshd[385]: Accepted publickey for bob from 185.115.7.223 port 10409 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:04:36 localhost shepherd[1]: Service sshd-5 has been started.
Oct 4 07:04:38 localhost sshd[435]: Accepted publickey for bob from 185.115.7.223 port 10426 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:04:50 localhost sshd[387]: error: no more sessions
Oct 4 07:04:50 localhost shepherd[1]: 3 connections still in use after sshd-4 termination.
Oct 4 07:04:50 localhost shepherd[1]: Service sshd-4 has been disabled.
Oct 4 07:04:50 localhost shepherd[1]: Transient service sshd-4 terminated, now unregistered.
Oct 4 07:05:10 localhost shepherd[1]: Service sshd-6 has been started.
Oct 4 07:05:12 localhost sshd[508]: Accepted publickey for bob from 185.115.7.223 port 29519 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:05:57 localhost shepherd[1]: 3 connections still in use after sshd-6 termination.
Oct 4 07:05:57 localhost shepherd[1]: Service sshd-6 has been disabled.
Oct 4 07:05:57 localhost shepherd[1]: Transient service sshd-6 terminated, now unregistered.
Oct 4 07:05:57 localhost shepherd[1]: Service sshd-7 has been started.
Oct 4 07:05:59 localhost sshd[666]: Accepted publickey for bob from 185.115.7.223 port 29495 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:06:03 localhost shepherd[1]: 3 connections still in use after sshd-7 termination.
Oct 4 07:06:03 localhost shepherd[1]: Service sshd-7 has been disabled.
Oct 4 07:06:03 localhost shepherd[1]: Transient service sshd-7 terminated, now unregistered.
Oct 4 07:07:13 localhost shepherd[1]: Service sshd-8 has been started.
Oct 4 07:07:14 localhost sshd[699]: Accepted publickey for bob from 185.115.7.223 port 29463 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:07:51 localhost shepherd[1]: 3 connections still in use after sshd-8 termination.
Oct 4 07:07:51 localhost shepherd[1]: Service sshd-8 has been disabled.
Oct 4 07:07:51 localhost shepherd[1]: Transient service sshd-8 terminated, now unregistered.
Oct 4 07:07:52 localhost shepherd[1]: Service sshd-9 has been started.
Oct 4 07:07:54 localhost sshd[857]: Accepted publickey for bob from 185.115.7.223 port 10407 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:07:58 localhost shepherd[1]: 3 connections still in use after sshd-9 termination.
Oct 4 07:07:58 localhost shepherd[1]: Service sshd-9 (PID 857) exited with 255.
Oct 4 07:07:58 localhost shepherd[1]: Service sshd-9 has been disabled.
Oct 4 07:07:58 localhost shepherd[1]: Transient service sshd-9 terminated, now unregistered.
Oct 4 07:08:22 localhost shepherd[1]: Service sshd-10 has been started.
Oct 4 07:08:24 localhost sshd[889]: Accepted publickey for bob from 185.115.7.223 port 10478 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:08:55 localhost sshd[891]: error: no more sessions
Oct 4 07:08:55 localhost shepherd[1]: 3 connections still in use after sshd-10 termination.
Oct 4 07:08:55 localhost shepherd[1]: Service sshd-10 has been disabled.
Oct 4 07:08:55 localhost shepherd[1]: Transient service sshd-10 terminated, now unregistered.
Oct 4 07:09:24 localhost shepherd[1]: Service sshd-11 has been started.
Oct 4 07:09:25 localhost sshd[1010]: Accepted publickey for bob from 185.115.7.223 port 10489 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:10:01 localhost shepherd[1]: 3 connections still in use after sshd-11 termination.
Oct 4 07:10:01 localhost shepherd[1]: Service sshd-11 (PID 1010) exited with 255.
Oct 4 07:10:01 localhost shepherd[1]: Service sshd-11 has been disabled.
Oct 4 07:10:01 localhost shepherd[1]: Transient service sshd-11 terminated, now unregistered.
Oct 4 07:10:01 localhost shepherd[1]: Service sshd-12 has been started.
Oct 4 07:10:03 localhost sshd[1164]: Accepted publickey for bob from 185.115.7.223 port 29458 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:10:08 localhost shepherd[1]: 3 connections still in use after sshd-12 termination.
Oct 4 07:10:08 localhost shepherd[1]: Service sshd-12 (PID 1164) exited with 255.
Oct 4 07:10:08 localhost shepherd[1]: Service sshd-12 has been disabled.
Oct 4 07:10:08 localhost shepherd[1]: Transient service sshd-12 terminated, now unregistered.
Oct 4 07:11:09 localhost shepherd[1]: Service sshd-1 has been stopped.
Oct 4 07:11:09 localhost shepherd[1]: Transient service sshd-1 unregistered.
Oct 4 07:11:09 localhost shepherd[1]: Service ssh-daemon has been stopped.
Oct 4 07:11:59 localhost shepherd[1]: Service ssh-daemon has been started.
Oct 4 07:12:02 localhost shepherd[1]: Service sshd-1 has been started.
Oct 4 07:12:04 localhost sshd[241]: Accepted publickey for bob from 185.115.7.223 port 10202 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:15:09 localhost shepherd[1]: Service sshd-2 has been started.
Oct 4 07:15:11 localhost sshd[269]: Accepted publickey for bob from 185.115.7.223 port 29443 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:16:08 localhost sshd[271]: error: no more sessions
Oct 4 07:16:08 localhost shepherd[1]: 1 connection still in use after sshd-2 termination.
Oct 4 07:16:08 localhost shepherd[1]: Service sshd-2 has been disabled.
Oct 4 07:16:08 localhost shepherd[1]: Transient service sshd-2 terminated, now unregistered.
Oct 4 07:16:31 localhost shepherd[1]: Service sshd-3 has been started.
Oct 4 07:16:33 localhost sshd[404]: Accepted publickey for bob from 185.115.7.223 port 10413 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:17:13 localhost shepherd[1]: 1 connection still in use after sshd-3 termination.
Oct 4 07:17:13 localhost shepherd[1]: Service sshd-3 (PID 404) exited with 255.
Oct 4 07:17:13 localhost shepherd[1]: Service sshd-3 has been disabled.
Oct 4 07:17:13 localhost shepherd[1]: Transient service sshd-3 terminated, now unregistered.
Oct 4 07:17:14 localhost shepherd[1]: Service sshd-4 has been started.
Oct 4 07:17:15 localhost sshd[556]: Accepted publickey for bob from 185.115.7.223 port 10454 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:17:19 localhost shepherd[1]: 1 connection still in use after sshd-4 termination.
Oct 4 07:17:19 localhost shepherd[1]: Service sshd-4 has been disabled.
Oct 4 07:17:19 localhost shepherd[1]: Transient service sshd-4 terminated, now unregistered.
Oct 4 07:27:39 localhost shepherd[1]: Service sshd-5 has been started.
Oct 4 07:27:41 localhost sshd[602]: Accepted publickey for bob from 185.115.7.223 port 29522 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:28:13 localhost sshd[604]: error: no more sessions
Oct 4 07:28:13 localhost shepherd[1]: 1 connection still in use after sshd-5 termination.
Oct 4 07:28:13 localhost shepherd[1]: Service sshd-5 has been disabled.
Oct 4 07:28:13 localhost shepherd[1]: Transient service sshd-5 terminated, now unregistered.
Oct 4 07:28:20 localhost shepherd[1]: Service sshd-6 has been started.
Oct 4 07:28:21 localhost sshd[723]: Accepted publickey for bob from 185.115.7.223 port 29474 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:28:50 localhost sshd[725]: error: no more sessions
Oct 4 07:28:50 localhost shepherd[1]: 1 connection still in use after sshd-6 termination.
Oct 4 07:28:50 localhost shepherd[1]: Service sshd-6 has been disabled.
Oct 4 07:28:50 localhost shepherd[1]: Transient service sshd-6 terminated, now unregistered.
Oct 4 07:30:39 localhost shepherd[1]: Service sshd-7 has been started.
Oct 4 07:30:40 localhost sshd[840]: Accepted publickey for bob from 185.115.7.223 port 10445 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:31:44 localhost shepherd[1]: Service sshd-8 has been started.
Oct 4 07:31:46 localhost sshd[857]: Accepted publickey for bob from 185.115.7.223 port 10393 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:32:13 localhost sshd[859]: error: no more sessions
Oct 4 07:32:13 localhost shepherd[1]: 2 connections still in use after sshd-8 termination.
Oct 4 07:32:13 localhost shepherd[1]: Service sshd-8 has been disabled.
Oct 4 07:32:13 localhost shepherd[1]: Transient service sshd-8 terminated, now unregistered.
Oct 4 07:33:07 localhost shepherd[1]: Service ssh-daemon has been stopped.
Oct 4 07:34:15 localhost shepherd[1]: Service ssh-daemon has been started.
Oct 4 07:34:26 localhost shepherd[1]: Service sshd-1 has been started.
Oct 4 07:34:28 localhost sshd[239]: Accepted publickey for bob from 185.115.7.223 port 29462 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:35:14 localhost shepherd[1]: Service sshd-2 has been started.
Oct 4 07:35:16 localhost sshd[258]: Accepted publickey for bob from 185.115.7.223 port 29504 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:35:52 localhost shepherd[1]: 1 connection still in use after sshd-2 termination.
Oct 4 07:35:52 localhost shepherd[1]: Service sshd-2 has been disabled.
Oct 4 07:35:52 localhost shepherd[1]: Transient service sshd-2 terminated, now unregistered.
Oct 4 07:36:13 localhost shepherd[1]: Service sshd-3 has been started.
Oct 4 07:36:14 localhost sshd[418]: Accepted publickey for bob from 185.115.7.223 port 29463 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:36:52 localhost shepherd[1]: 1 connection still in use after sshd-3 termination.
Oct 4 07:36:52 localhost shepherd[1]: Service sshd-3 (PID 418) exited with 255.
Oct 4 07:36:52 localhost shepherd[1]: Service sshd-3 has been disabled.
Oct 4 07:36:52 localhost shepherd[1]: Transient service sshd-3 terminated, now unregistered.
Oct 4 07:38:51 localhost shepherd[1]: Service ssh-daemon has been started.
Oct 4 07:38:58 localhost shepherd[1]: Service sshd-1 has been started.
Oct 4 07:39:00 localhost sshd[257]: Accepted publickey for bob from 185.115.7.223 port 29563 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:40:14 localhost sshd[259]: Received disconnect from 185.115.7.223 port 29563:11: disconnected by user
Oct 4 07:40:14 localhost sshd[259]: Disconnected from user bob 185.115.7.223 port 29563
Oct 4 07:40:14 localhost shepherd[1]: 0 connections still in use after sshd-1 termination.
Oct 4 07:40:14 localhost shepherd[1]: Service sshd-1 (PID 257) exited with 255.
Oct 4 07:40:14 localhost shepherd[1]: Service sshd-1 has been disabled.
Oct 4 07:40:14 localhost shepherd[1]: Transient service sshd-1 terminated, now unregistered.
Oct 4 07:40:31 localhost shepherd[1]: Service sshd-2 has been started.
Oct 4 07:40:33 localhost sshd[267]: Accepted publickey for bob from 185.115.7.223 port 10488 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:40:43 localhost sshd[269]: Received disconnect from 185.115.7.223 port 10488:11: disconnected by user
Oct 4 07:40:43 localhost sshd[269]: Disconnected from user bob 185.115.7.223 port 10488
Oct 4 07:40:43 localhost shepherd[1]: 0 connections still in use after sshd-2 termination.
Oct 4 07:40:43 localhost shepherd[1]: Service sshd-2 (PID 267) exited with 255.
Oct 4 07:40:43 localhost shepherd[1]: Service sshd-2 has been disabled.
Oct 4 07:40:43 localhost shepherd[1]: Transient service sshd-2 terminated, now unregistered.
Oct 4 07:41:03 localhost shepherd[1]: Service sshd-3 has been started.
Oct 4 07:41:05 localhost sshd[271]: Accepted publickey for bob from 185.115.7.223 port 10368 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:41:10 localhost sshd[273]: Received disconnect from 185.115.7.223 port 10368:11: disconnected by user
Oct 4 07:41:10 localhost sshd[273]: Disconnected from user bob 185.115.7.223 port 10368
Oct 4 07:41:10 localhost shepherd[1]: 0 connections still in use after sshd-3 termination.
Oct 4 07:41:10 localhost shepherd[1]: Service sshd-3 (PID 271) exited with 255.
Oct 4 07:41:10 localhost shepherd[1]: Service sshd-3 has been disabled.
Oct 4 07:41:10 localhost shepherd[1]: Transient service sshd-3 terminated, now unregistered.
Oct 4 07:41:13 localhost shepherd[1]: Service sshd-4 has been started.
Oct 4 07:41:16 localhost sshd[275]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 47302 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:41:19 localhost sshd[277]: Received disconnect from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 47302:11: disconnected by user
Oct 4 07:41:19 localhost sshd[277]: Disconnected from user bob 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 47302
Oct 4 07:41:19 localhost shepherd[1]: 0 connections still in use after sshd-4 termination.
Oct 4 07:41:19 localhost shepherd[1]: Service sshd-4 (PID 275) exited with 255.
Oct 4 07:41:19 localhost shepherd[1]: Service sshd-4 has been disabled.
Oct 4 07:41:19 localhost shepherd[1]: Transient service sshd-4 terminated, now unregistered.
Oct 4 07:41:24 localhost shepherd[1]: Service sshd-5 has been started.
Oct 4 07:41:26 localhost sshd[280]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 33450 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:42:39 localhost shepherd[1]: 0 connections still in use after sshd-5 termination.
Oct 4 07:42:39 localhost shepherd[1]: Service sshd-5 has been disabled.
Oct 4 07:42:39 localhost shepherd[1]: Transient service sshd-5 terminated, now unregistered.
Oct 4 07:43:20 localhost shepherd[1]: Service sshd-6 has been started.
Oct 4 07:43:22 localhost sshd[436]: Accepted publickey for bob from 201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 60366 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 07:44:08 localhost shepherd[1]: 0 connections sti
This message was truncated. Download the full message here.
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEKEGaxlA4dEDH6S/6IgjSCVjB3rAFAmNJBXgACgkQIgjSCVjB
3rAJNA//R63FpeXwzUZQr4SUG/2DS0BtRipxwfekAfZaobqeRmNqNTeg/2MBROfL
KgPue2NYOPyxbsZMKGNEMFMgsKRLPd6sVG3d61HEx/W6aI+f1PONZBYkqJlfhbaQ
r6DR3w/hfejDuH1pNJ/wADeAC5qN5HAcdOjBNVW+qFoYihjJZ38rGyX9AcVw78vY
0ruMUkolKRtN9I8IKn2ziFBq8BUbs+f4BE5wrRMn3Qzgk7857AjMxmI0TT393oHt
jU0Bl85faLWSRlSuQq5CIsXOjtJp8nnRsIW+9irtUnUZSURW3Rtc82d0BwrPIQ67
6m3lY2Vzb/HFqhoWZBhPSQrYrJ4ahqLKemY8q5vEczxid/gOivald4XTgWG9aqkr
csavvFTeqkNbVOofFLmtm27LTC+rcLUsEcyr42jqcnuTpd0Si+MF3MCGyAxWrz3g
pZdJuA4slaoqO0okfcrQIaA1ZshtieTgri/0kOIOvXoDIqfFh+fYsHVyoy6+TsOf
YnQtCwEX6Y/5bPZRBkP5PPUcP8BJ0KBSO+ni9Uhm9lC6RWMIMJb685406io4RWoh
y4PBCIH98IDyJzUpi1ZDzRgMdmREgRlI6I+X6GDbbZrbRuzuLp0tAEzkxroYux0z
CMjBy3emSy5GJ3q42iCDxdUOpJh2VogYrOyOc76HME3BfhzcF9s=
=ekjf
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 17 Oct 2022 16:39
(name . Andrew Tropin)(address . andrew@trop.in)(address . 58290@debbugs.gnu.org)
8735bmsee9.fsf@gnu.org
Hi,

Andrew Tropin <andrew@trop.in> skribis:

Toggle quote (8 lines)
>>> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>>> Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel opening failure: channel 67 error (2) open failed" #<input-output: channel (closed) 7f7d1af9e140> #f)'.
>>
>> Are there any hints from sshd in the /var/log/messages of that machine
>> as to why the connection was closed?
>
> bob@pinky /var/log$ zcat messages.1.gz | grep "Oct 4" | grep ssh

It’s hard (if not impossible) to see which lines correspond to the error
above. :-)

Could try to reproduce the bug, and have ‘tail -f /var/log/messages’
running on the server side so you can capture just the right lines?
/var/log/debug might be interesting too.

Thanks,
Ludo’.
M
M
Marius Bakke wrote on 17 Oct 2022 19:27
(address . 58290@debbugs.gnu.org)
87h702uzrb.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skriver:

Toggle quote (19 lines)
> Hi,
>
> Andrew Tropin <andrew@trop.in> skribis:
>
>>>> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>>>> Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel opening failure: channel 67 error (2) open failed" #<input-output: channel (closed) 7f7d1af9e140> #f)'.
>>>
>>> Are there any hints from sshd in the /var/log/messages of that machine
>>> as to why the connection was closed?
>>
>> bob@pinky /var/log$ zcat messages.1.gz | grep "Oct 4" | grep ssh
>
> It’s hard (if not impossible) to see which lines correspond to the error
> above. :-)
>
> Could try to reproduce the bug, and have ‘tail -f /var/log/messages’
> running on the server side so you can capture just the right lines?
> /var/log/debug might be interesting too.

I have the same problem. Log messages around failure look something
like this (extracted from above message):

Oct 4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started.
Oct 4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct 4 11:52:28 localhost sshd[252]: error: no more sessions
Oct 4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination.
Oct 4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled.

Then deploy crashes with 'Channel opening failure'.

Due to the number of SSH connections made by deploy and frequent
occurence of this problem, I was not able to successfully deploy through
many attempts.

I found that removing the memoizing open-machine-ssh-session* in
(gnu machine ssh) works around it and can happily deploy again.

(that is, just use 'open-machine-ssh-session' instead)
-----BEGIN PGP SIGNATURE-----

iIUEARYKAC0WIQRNTknu3zbaMQ2ddzTocYulkRQQdwUCY02QeA8cbWFyaXVzQGdu
dS5vcmcACgkQ6HGLpZEUEHfSOwEAvIflhP7SIu4QVTzDTPeVj5tf4c5yKWUkCqzT
Fp5bbZ0BAOkXwYDz1lmBlIJzRzZ7TAErhs3EXLvyUbqh/eBliEEF
=BIX7
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 18 Oct 2022 17:31
(name . Marius Bakke)(address . marius@gnu.org)
87y1tdkv1d.fsf@gnu.org
Hi,

Marius Bakke <marius@gnu.org> skribis:

Toggle quote (11 lines)
> I have the same problem. Log messages around failure look something
> like this (extracted from above message):
>
> Oct 4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started.
> Oct 4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
> Oct 4 11:52:28 localhost sshd[252]: error: no more sessions
> Oct 4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination.
> Oct 4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled.
>
> Then deploy crashes with 'Channel opening failure'.

“no more sessions” comes from this:

Toggle snippet (15 lines)
int
session_open(Authctxt *authctxt, int chanid)
{
Session *s = session_new();
debug("session_open: channel %d", chanid);
if (s == NULL) {
error("no more sessions");
return 0;
}

[...]

}

Would be interesting to run sshd in verbose/debug mode and see why we
hit that; it could be because the maximum number of sessions has been
reached or something.

Toggle quote (4 lines)
> Due to the number of SSH connections made by deploy and frequent
> occurence of this problem, I was not able to successfully deploy through
> many attempts.

Ouch. Normally, thanks to memoization, ‘guix deploy’ opens only one
session per machine. (I think it works well with the 25 local build
machines behind berlin; it’s also fine for a small set of machines I
take care of at work.)

Toggle quote (5 lines)
> I found that removing the memoizing open-machine-ssh-session* in
> (gnu machine ssh) works around it and can happily deploy again.
>
> (that is, just use 'open-machine-ssh-session' instead)

Interesting. That gives us a hint. Could you add a ‘pk’ in
‘open-machine-ssh-session*’ to see how many connections it opens?

Thanks for your feedback!

Ludo’.
M
M
Marius Bakke wrote on 4 Nov 2022 00:47
(name . Ludovic Courtès)(address . ludo@gnu.org)
875yfv38jt.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skriver:

Toggle quote (37 lines)
> Hi,
>
> Marius Bakke <marius@gnu.org> skribis:
>
>> I have the same problem. Log messages around failure look something
>> like this (extracted from above message):
>>
>> Oct 4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started.
>> Oct 4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>> Oct 4 11:52:28 localhost sshd[252]: error: no more sessions
>> Oct 4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination.
>> Oct 4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled.
>>
>> Then deploy crashes with 'Channel opening failure'.
>
> “no more sessions” comes from this:
>
> --8<---------------cut here---------------start------------->8---
> int
> session_open(Authctxt *authctxt, int chanid)
> {
> Session *s = session_new();
> debug("session_open: channel %d", chanid);
> if (s == NULL) {
> error("no more sessions");
> return 0;
> }
>
> [...]
>
> }
> --8<---------------cut here---------------end--------------->8---
>
> Would be interesting to run sshd in verbose/debug mode and see why we
> hit that; it could be because the maximum number of sessions has been
> reached or something.

I enabled (log-level 'debug) and this is what happens server side:

Nov 3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
Nov 3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
Nov 3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 channel 8 pid 26345
Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release channel 8
Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
Nov 3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with channel 0
Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm session
Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 0 request exec reply 1
Nov 3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 channel 0
Nov 3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 0 req exec
Nov 3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
Nov 3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
["no more sessions" error occurs here, in a different log file]
Nov 3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 10
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, nchannels 11
Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure session
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, nchannels 10
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, nchannels 9
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, nchannels 8
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, nchannels 7
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, nchannels 6
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, nchannels 5
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, nchannels 4
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, nchannels 3
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, nchannels 2
Nov 3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, nchannels 1
Nov 3 21:50:28 capella sshd[26115]: debug1: do_cleanup

I compared this with a successful deploy:

Nov 3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
Nov 3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
Nov 3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 channel 6 pid 25842
Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release channel 6
Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request
Nov 3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session]
Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 0
Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0
Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link with channel 0
Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 0 request exec reply 1
Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 channel 0
Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 0 req exec
Nov 3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials
Nov 3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0
Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request
Nov 3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session]
Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 8
Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8
Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link with channel 8
Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 8 request exec reply 1
Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 channel 8
Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 8 req exec
Nov 3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials
Nov 3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0
Nov 3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read])
Nov 3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 67 win 64000 max 32768
Nov 3 21:46:28 capella sshd[25612]: debug1: input_session_request
Nov 3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session]
Nov 3 21:46:28 capella sshd[25612]: debug1: session_new: session 9
[... more channels and stuffs before graceful shutdown ...]

I believe the two-minute window before SIGCHLD is waiting for some
Shepherd services to time out (long story..!).

In my current tests the failure always occur at the
very end of deploy:

building /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: channel 66 error (2) open failed: #<input-output: channel (closed) 7fdefb015e80>

What's interesting is that all of the failed deploys fail to open
"channel 10", whereas successful deploys never reach it. All of these
are deploying the exact same configuration and commit, so the SSH
traffic should nearly identical.

I have attached the transcripts of each session (by grepping the PIDs
from /var/log/debug). Grepping for session_open yields some
inconstencies, but haven't studied them in detail yet.

Toggle quote (9 lines)
>> Due to the number of SSH connections made by deploy and frequent
>> occurence of this problem, I was not able to successfully deploy through
>> many attempts.
>
> Ouch. Normally, thanks to memoization, ‘guix deploy’ opens only one
> session per machine. (I think it works well with the 25 local build
> machines behind berlin; it’s also fine for a small set of machines I
> take care of at work.)

Have they been rebooted since the switch to inetd-style sshd? It fails
more often than not for me.

Toggle quote (8 lines)
>> I found that removing the memoizing open-machine-ssh-session* in
>> (gnu machine ssh) works around it and can happily deploy again.
>>
>> (that is, just use 'open-machine-ssh-session' instead)
>
> Interesting. That gives us a hint. Could you add a ‘pk’ in
> ‘open-machine-ssh-session*’ to see how many connections it opens?

Indeed it only opens a single connection. The problem seems to be
with multiple "channels" and "sessions" over a single connection.
Attachment: deploy-fail.log.gz
-----BEGIN PGP SIGNATURE-----

iIUEARYKAC0WIQRNTknu3zbaMQ2ddzTocYulkRQQdwUCY2RS9g8cbWFyaXVzQGdu
dS5vcmcACgkQ6HGLpZEUEHeGbAEAsJskXuHr5Vgs3yLpN7xVoiMuCDmXs3cclgHf
0OYUhgcBALAZhvkMmIjCnT7e/cWeRCyuDUvEIOvC6YIboWDmKbQB
=ihvL
-----END PGP SIGNATURE-----

A
A
Andrew Tropin wrote on 10 Nov 2022 11:06
87h6z75djq.fsf@trop.in
On 2022-11-04 00:47, Marius Bakke wrote:

Toggle quote (159 lines)
> Ludovic Courtès <ludo@gnu.org> skriver:
>
>> Hi,
>>
>> Marius Bakke <marius@gnu.org> skribis:
>>
>>> I have the same problem. Log messages around failure look something
>>> like this (extracted from above message):
>>>
>>> Oct 4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started.
>>> Oct 4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>> Oct 4 11:52:28 localhost sshd[252]: error: no more sessions
>>> Oct 4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination.
>>> Oct 4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled.
>>>
>>> Then deploy crashes with 'Channel opening failure'.
>>
>> “no more sessions” comes from this:
>>
>> --8<---------------cut here---------------start------------->8---
>> int
>> session_open(Authctxt *authctxt, int chanid)
>> {
>> Session *s = session_new();
>> debug("session_open: channel %d", chanid);
>> if (s == NULL) {
>> error("no more sessions");
>> return 0;
>> }
>>
>> [...]
>>
>> }
>> --8<---------------cut here---------------end--------------->8---
>>
>> Would be interesting to run sshd in verbose/debug mode and see why we
>> hit that; it could be because the maximum number of sessions has been
>> reached or something.
>
> I enabled (log-level 'debug) and this is what happens server side:
>
> Nov 3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
> Nov 3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
> Nov 3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
> Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 channel 8 pid 26345
> Nov 3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release channel 8
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
> Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with channel 0
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm session
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 0 request exec reply 1
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 channel 0
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 0 req exec
> Nov 3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
> Nov 3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
> Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
> ["no more sessions" error occurs here, in a different log file]
> Nov 3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 10
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, nchannels 11
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure session
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, nchannels 10
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, nchannels 9
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, nchannels 8
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, nchannels 7
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, nchannels 6
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, nchannels 5
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, nchannels 4
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, nchannels 3
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, nchannels 2
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, nchannels 1
> Nov 3 21:50:28 capella sshd[26115]: debug1: do_cleanup
>
> I compared this with a successful deploy:
>
> Nov 3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
> Nov 3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
> Nov 3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
> Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 channel 6 pid 25842
> Nov 3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release channel 6
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
> Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov 3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session]
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 0
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link with channel 0
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 0 request exec reply 1
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 channel 0
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 0 req exec
> Nov 3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials
> Nov 3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
> Nov 3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov 3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session]
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_new: session 8
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link with channel 8
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
> Nov 3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 8 request exec reply 1
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 channel 8
> Nov 3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 8 req exec
> Nov 3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials
> Nov 3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0
> Nov 3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read])
> Nov 3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 67 win 64000 max 32768
> Nov 3 21:46:28 capella sshd[25612]: debug1: input_session_request
> Nov 3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session]
> Nov 3 21:46:28 capella sshd[25612]: debug1: session_new: session 9
> [... more channels and stuffs before graceful shutdown ...]
>
> I believe the two-minute window before SIGCHLD is waiting for some
> Shepherd services to time out (long story..!).
>
> In my current tests the failure always occur at the
> very end of deploy:
>
> building /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
> building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
> ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: channel 66 error (2) open failed: #<input-output: channel (closed) 7fdefb015e80>
>
> What's interesting is that all of the failed deploys fail to open
> "channel 10", whereas successful deploys never reach it. All of these
> are deploying the exact same configuration and commit, so the SSH
> traffic should nearly identical.
>
> I have attached the transcripts of each session (by grepping the PIDs
> from /var/log/debug). Grepping for session_open yields some
> inconstencies, but haven't studied them in detail yet.
>
>>> Due to the number of SSH connections made by deploy and frequent
>>> occurence of this problem, I was not able to successfully deploy through
>>> many attempts.
>>
>> Ouch. Normally, thanks to memoization, ‘guix deploy’ opens only one
>> session per machine. (I think it works well with the 25 local build
>> machines behind berlin; it’s also fine for a small set of machines I
>> take care of at work.)
>
> Have they been rebooted since the switch to inetd-style sshd? It fails
> more often than not for me.
>
>>> I found that removing the memoizing open-machine-ssh-session* in
>>> (gnu machine ssh) works around it and can happily deploy again.
>>>
>>> (that is, just use 'open-machine-ssh-session' instead)
>>
>> Interesting. That gives us a hint. Could you add a ‘pk’ in
>> ‘open-machine-ssh-session*’ to see how many connections it opens?
>
> Indeed it only opens a single connection. The problem seems to be
> with multiple "channels" and "sessions" over a single connection.
>

Very detailed report, thank you for digging in!

CCed Artyom Poptsov, author of Guile-SSH.

--
Best regards,
Andrew Tropin
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEKEGaxlA4dEDH6S/6IgjSCVjB3rAFAmNszSoACgkQIgjSCVjB
3rC+bBAAgcU8svGJMtm7hzefZHQfH0cs2XP/6VFd5ae5AqpQ4Kw2ZENvED+eSIdv
BuzmsdXhLwLGITEFJdhyROrDlKtz85IwCsl9NXrWrOa0cNVW9bD9ibqjKnNcCMP8
jMjdoGxo2Khj6DAMdma9l1G7BfJrutTHvxid3MrmnSk3ZEQLK0haEKkkifK7vogq
l1s1sLicEyx+9ux4BweZoGbIwuXVl7/dwiRNyS/02xE+8S8YR4dTuOFMoAH5P7Hy
2GlQyif9jyZji58msLhbfwu4WmZf0yiVRIDzjoNA5YtT8ZCJ/0auOCihtQi1ri2o
Nwaf44LywmadtO7RpXnp4rVyxvXV3SYcgXajDFyMAZxvKEHqIc8pUkvpcdS/RfkR
KLkCOY03Xlf52F8Ien3+9g6zLibDqaXXgUF9GNVfg7FqaxVH2LlTc0EUcBIL7oiQ
/TvddMtklO+nMXfVx9OBIiI5RgwT0eW93nxSOO0Ouh2nh9jJDLuD9oQoyeudd/w4
rD0LkWlBSOSqi+8bJ4QkpmbMLELoEIi9IiNa/HpH0W3q8H5DFWLtkZ70XRlMxcky
IunQA1sQDXfSBMPWzH798ESk9RfIxwTSdIKdXG3Y02kd4PFIjjvUoSjqUjZ3jDLB
6n6lwWUUZCCxvm3U/WRSOqGCtvJAxZaoTJnuq/aIFw7dr4/LQmI=
=/h8b
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 12 Jan 2023 12:10
control message for bug #58290
(address . control@debbugs.gnu.org)
874jswhuu4.fsf@gnu.org
severity 58290 important
quit
L
L
Ludovic Courtès wrote on 12 Jan 2023 12:10
(address . control@debbugs.gnu.org)
87358ghutk.fsf@gnu.org
merge 58290 56709
quit
L
L
Ludovic Courtès wrote on 12 Jan 2023 13:11
Re: bug#58290: guile ssh error on guix deploy
(name . Marius Bakke)(address . marius@gnu.org)
87v8lcgdg3.fsf@gnu.org
Hi,

Marius Bakke <marius@gnu.org> skribis:

Toggle quote (14 lines)
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
> Nov 3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
> Nov 3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
> ["no more sessions" error occurs here, in a different log file]
> Nov 3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 10
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, nchannels 11
> Nov 3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure session
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, nchannels 10
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, nchannels 9
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, nchannels 8
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, nchannels 7
> Nov 3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, nchannels 6

[...]

Toggle quote (12 lines)
> In my current tests the failure always occur at the
> very end of deploy:
>
> building /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
> building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
> ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: channel 66 error (2) open failed: #<input-output: channel (closed) 7fdefb015e80>
>
> What's interesting is that all of the failed deploys fail to open
> "channel 10", whereas successful deploys never reach it. All of these
> are deploying the exact same configuration and commit, so the SSH
> traffic should nearly identical.

Here’s a reproducer:

Toggle snippet (15 lines)
(use-modules (guix ssh)
(guix))

(define %local
(open-connection))

(define %remote
(connect-to-remote-daemon (open-ssh-session "HOST-OF-YOUR-CHOICE")))

(let loop ((i 0))
(pk 'loop i)
(send-files %local '("/gnu/store/8fpk2cja3f07xls48jfnpgrzrljpqivr-coreutils-8.32") %remote)
(loop (+ 1 i)))

It fails after 8 iterations. It looks like we’re leaking SSH channels
or something.

However, in GDB, we can see that ‘ssh_channel_close’ is indeed called
after each iteration (via ‘ptob_close’ in Guile-SSH).

Artyom, does that ring a bell?

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 12 Jan 2023 13:44
(name . Marius Bakke)(address . marius@gnu.org)
87o7r3hqgy.fsf@gnu.org
A simpler reproducer:

Toggle snippet (20 lines)
(use-modules (ssh session)
(ssh popen)
(ssh auth)
(rnrs io ports))

(define session
(make-session #:host "HOST-OF-YOUR-CHOICE"))

(session-parse-config! session)
(connect! session)
(userauth-public-key/auto! session)

(let loop ((i 0))
(pk 'loOp i)
(let ((pipe (open-remote-pipe session "date" "r")))
(pk 'x (get-string-all pipe))
(close-port pipe)
(loop (+ 1 i))))

Ludo’.
A
A
Artyom V. Poptsov wrote on 22 Jan 2023 21:00
(name . Ludovic Courtès)(address . ludo@gnu.org)
87zgaaz6ej.fsf@gmail.com
Hello Ludovic, Andrew and all,

thank you for reporting the bug!

The code for reproducing the bug was very helpful indeed, I just
slightly modified the it:

Toggle snippet (23 lines)
(use-modules (ssh session)
(ssh popen)
(ssh auth)
(ssh log)
(rnrs io ports))

(define session
(make-session #:host "localhost"))

(define (main args)
(session-parse-config! session)
(connect! session)
(userauth-public-key/auto! session)
(set-log-verbosity! 'functions)

(let loop ((i 0))
(format (current-error-port) "-- ~a --~%" i)
(let ((pipe (open-remote-pipe session "date" "r")))
(pk 'x (get-string-all pipe))
(close-port pipe)
(loop (+ 1 i)))))

It seems to me that the reason Guile-SSH fails at least partly lies in
changes that were introduced in OpenSSH 5.1.

I set "LogLevel" to "DEBUG3" and ran the test code. System logs have
shown the following error:

Toggle snippet (22 lines)
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 8: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 9: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug2: channel 9: rcvd adjust 1216033
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 0: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 1: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 2: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 3: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 4: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 5: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 6: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 7: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 8: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: channel 9: will not send data after close
??? 22 22:40:06 elephant sshd[131488]: debug3: receive packet: type 90
??? 22 22:40:06 elephant sshd[131488]: debug1: server_input_channel_open: ctype session rchan 53 win 64000 max 32768
??? 22 22:40:06 elephant sshd[131488]: debug1: input_session_request
??? 22 22:40:06 elephant sshd[131488]: debug2: channel: expanding 20
??? 22 22:40:06 elephant sshd[131488]: debug1: channel 10: new [server-session]
??? 22 22:40:06 elephant sshd[131488]: debug1: session_open: channel 10
??? 22 22:40:06 elephant sshd[131488]: error: no more sessions

This "no more sessions" error was the key to understanding the problem.

Here's an excerpt from OpenSSH 5.1 changelog [1]:

Toggle snippet (11 lines)
Changes since OpenSSH 5.0
=========================
[...]
* Added a MaxSessions option to sshd_config(5) to allow control of the
number of multiplexed sessions supported over a single TCP connection.
This allows increasing the number of allowed sessions above the
previous default of 10, disabling connection multiplexing
(MaxSessions=1) or disallowing login/shell/subsystem sessions
entirely (MaxSessions=0).

When I set "MaxSessions" in "/etc/ssh/sshd_config" to 100 and restarted
the sshd daemon, the test code fails on the 100th iteration.

Toggle snippet (50 lines)
[...]
;;; [2023/01/22 22:52:03.246549, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=817, in_blocks=1309]
;;; [2023/01/22 22:52:03.246559, 3] ssh_packet_socket_callback: rcv packet cb (len=0, state=INIT)
;;; [2023/01/22 22:52:03.246568, 3] ssh_packet_socket_callback: Waiting for more data (0 < 4)
;;; [2023/01/22 22:52:03.246579, 4] ssh_socket_pollcallback: sending control flow event
;;; [2023/01/22 22:52:03.246587, 4] ssh_packet_socket_controlflow_callback: sending channel_write_wontblock callback
;;; [2023/01/22 22:52:03.246600, 4] [GSSH DEBUG] read result: 0: #<input: channel (closed by the remote side) 7fba6349d240>

;;; (x "?? 22 ??? 2023 22:52:03 MSK\n")
;;; [2023/01/22 22:52:03.246636, 4] [GSSH DEBUG] the channel is already freed by the closing request from the remote side.
-- 100 --
;;; [2023/01/22 22:52:03.246655, 2] channel_open: Creating a channel 143 with 64000 window and 32768 max packet
;;; [2023/01/22 22:52:03.246664, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=820, in_blocks=1312]
;;; [2023/01/22 22:52:03.246693, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket
;;; [2023/01/22 22:52:03.246703, 3] packet_send2: packet: wrote [type=90, len=32, padding_size=7, comp=24, payload=24]
;;; [2023/01/22 22:52:03.246712, 3] channel_open: Sent a SSH_MSG_CHANNEL_OPEN type session for channel 143
;;; [2023/01/22 22:52:03.246721, 4] ssh_socket_pollcallback: Poll callback on socket 10 (POLLOUT ), out buffer 0
;;; [2023/01/22 22:52:03.246732, 4] ssh_socket_pollcallback: sending control flow event
;;; [2023/01/22 22:52:03.246741, 4] ssh_packet_socket_controlflow_callback: sending channel_write_wontblock callback
;;; [2023/01/22 22:52:03.247437, 4] ssh_socket_pollcallback: Poll callback on socket 10 (POLLIN ), out buffer 0
;;; [2023/01/22 22:52:03.247449, 3] ssh_packet_socket_callback: rcv packet cb (len=60, state=INIT)
;;; [2023/01/22 22:52:03.247462, 3] ssh_packet_socket_callback: packet: read type 92 [len=40,padding=11,comp=28,payload=28]
;;; [2023/01/22 22:52:03.247473, 3] ssh_packet_incoming_filter: Filtering packet type 92
;;; [2023/01/22 22:52:03.247484, 3] ssh_packet_process: Dispatching handler for packet type 92
;;; [2023/01/22 22:52:03.247496, 1] ssh_packet_channel_open_fail: Channel opening failure: channel 143 error (2) open failed
;;; [2023/01/22 22:52:03.247506, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=820, in_blocks=1312]
;;; [2023/01/22 22:52:03.247518, 3] ssh_packet_socket_callback: rcv packet cb (len=0, state=INIT)
;;; [2023/01/22 22:52:03.247531, 3] ssh_packet_socket_callback: Waiting for more data (0 < 4)
;;; [2023/01/22 22:52:03.247545, 4] [GSSH DEBUG] result: -1: #<input: channel (closed) 7fba634e5560>
;;; [2023/01/22 22:52:03.247570, 0] [GSSH ERROR] Channel opening failure: channel 143 error (2) open failed: #<input: channel (closed) 7fba634e5560>
Backtrace:
In ice-9/boot-9.scm:
1752:10 6 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
5 (apply-smob/0 #<thunk 7fba634b82e0>)
In ice-9/boot-9.scm:
724:2 4 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
In ice-9/eval.scm:
619:8 3 (_ #(#(#<directory (guile-user) 7fba634bdc80>)))
In /home/avp/src/projects/guile-ssh/./test.scm:
22:16 2 (main _)
In ssh/popen.scm:
64:4 1 (open-remote-pipe _ "date" "r")
In unknown file:
0 (channel-open-session #<input: channel (closed) 7fba634…>)

ERROR: In procedure channel-open-session:
Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel opening failure: channel 143 error (2) open failed" #<input: channel (closed) 7fba634e5560> #f)'.

I'm not sure if I should deal with this "MaxSessions" error in Guile-SSH
code; and if so, what the right thing to do on the Guile-SSH side?

I'd love to hear your opinions on that.

Thanks,
- avp

References:

--
Artyom "avp" Poptsov <poptsov.artyom@gmail.com>
CADR Hackerspace co-founder: https://cadrspace.ru/
GPG: D0C2 EAC1 3310 822D 98DE B57C E9C5 A2D9 0898 A02F
-----BEGIN PGP SIGNATURE-----

iQFNBAEBCgA3FiEE0MLqwTMQgi2Y3rV86cWi2QiYoC8FAmPNldQZHHBvcHRzb3Yu
YXJ0eW9tQGdtYWlsLmNvbQAKCRDpxaLZCJigLxVnB/45HlrG3M7wGz+x0XBvHmOr
Tyy0U+fX7cAsLod5yvLrtDg92X+LW1qoeMEaVKuxhIBerMuR/fvimT9arOHs7TgH
soCLEeyaY1YdtPwU5hSRMfgtK4IzW3wltAfJJSCCXXhh+uvuUMeVHIDgtCUT+wlB
eWHlHhm2XViQWer2Shr0p2Wvt2QxGGPHteamzO4J8yCmoVa0qCWZWMp0kbHWfBFE
/4s7NyxKcNjA5M0XBjSBQTiAL+gcQzPq5VVmiUuz1fNLbROSo8Pau7rafHJ9zrqF
n4p3prHojGIL6Gzuju/JT52YfWzhbLksu2fbJXMX2TeKFvKXg22JOe3J0zrlT79b
=RAfK
-----END PGP SIGNATURE-----

A
A
Artyom V. Poptsov wrote on 23 Jan 2023 21:14
(name . Ludovic Courtès)(address . ludo@gnu.org)
87cz752el1.fsf@gmail.com
Hello,

I figured out how to fix Guile-SSH channel "leak", so to say, that lead
to the OpenSSH "no more sessions" problem.

Please run your tests with this branch and let me know if it works for
you (including all the edge cases):

Thanks,
- avp

--
Artyom "avp" Poptsov <poptsov.artyom@gmail.com>
CADR Hackerspace co-founder: https://cadrspace.ru/
GPG: D0C2 EAC1 3310 822D 98DE B57C E9C5 A2D9 0898 A02F
-----BEGIN PGP SIGNATURE-----

iQFNBAEBCgA3FiEE0MLqwTMQgi2Y3rV86cWi2QiYoC8FAmPO6qoZHHBvcHRzb3Yu
YXJ0eW9tQGdtYWlsLmNvbQAKCRDpxaLZCJigL922B/oC4M1PztxusrbrkGrOht4P
uIIvw6e3xfWTTmxsPhC1/1mjHXpo5EYShjd1LlItKbHulJ03Isk4UD7i2OGHt9Nw
BP8RAZZzJyhv0afK/HXXA9HgLynkEdfMlFbun2fU3+KqFSK5Jn8hfFW4U9qp7Sab
ICucmVU+0nP86bOUPyr4ZSDrcVVXf0RuR8OQPwGshZzQdNFbda+KKFEVfn6AaHOu
rO26VdZz3zvLclAOzOiNWzBwhfTjuRm5DGvKgH/EKuLjEL0MKQkzDJr/3Lzh5W3w
5/2Umd5/t1piUHRwVRf0N9BCrF1Tszq1lR12/GJqPSxKNvmFpkCE183qNk84jqgl
=96bl
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 23 Jan 2023 22:50
(name . Artyom V. Poptsov)(address . poptsov.artyom@gmail.com)
87r0vkzzrb.fsf@gnu.org
Hi Artyom,

"Artyom V. Poptsov" <poptsov.artyom@gmail.com> skribis:

Toggle quote (7 lines)
> I figured out how to fix Guile-SSH channel "leak", so to say, that lead
> to the OpenSSH "no more sessions" problem.
>
> Please run your tests with this branch and let me know if it works for
> you (including all the edge cases):
> https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-channel-leak

It works! Specifically, I ran:

guix shell guile guile-ssh \
--with-branch=guile-ssh=wip-fix-channel-leak -- \
guile ssh-channels.scm

and the script (same one as before) ran several hundreds of iterations
just fine.

I had looked at ‘ptob_close’ and related code but didn’t see this issue;
good catch!

If you plan to push a new release, we’ll just upgrade in Guix; otherwise
we can apply the patch locally.

Thank you!

Ludo’.
A
A
Andrew Tropin wrote on 24 Jan 2023 16:53
87mt68udxz.fsf@trop.in
On 2023-01-23 22:50, Ludovic Courtès wrote:

Toggle quote (20 lines)
> Hi Artyom,
>
> "Artyom V. Poptsov" <poptsov.artyom@gmail.com> skribis:
>
>> I figured out how to fix Guile-SSH channel "leak", so to say, that lead
>> to the OpenSSH "no more sessions" problem.
>>
>> Please run your tests with this branch and let me know if it works for
>> you (including all the edge cases):
>> https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-channel-leak
>
> It works! Specifically, I ran:
>
> guix shell guile guile-ssh \
> --with-branch=guile-ssh=wip-fix-channel-leak -- \
> guile ssh-channels.scm
>
> and the script (same one as before) ran several hundreds of iterations
> just fine.

Can confirm the same, works on my machine :) Thank you very much.

Toggle quote (11 lines)
>
> I had looked at ‘ptob_close’ and related code but didn’t see this issue;
> good catch!
>
> If you plan to push a new release, we’ll just upgrade in Guix; otherwise
> we can apply the patch locally.
>
> Thank you!
>
> Ludo’.

--
Best regards,
Andrew Tropin
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEKEGaxlA4dEDH6S/6IgjSCVjB3rAFAmPP/ugACgkQIgjSCVjB
3rDf8RAAjjLmF3cKMMrbC0dAXUIotR3ikIEK2g2yI8IVVGlQPAmO8qZHh97GB8Fo
8EkBIKjpztk1NcB6KoAdJoXfzfkE706aJBfERYTtdsXESAtKtnA6ChyeEQXFjIRK
n2fgk1gqoK8455UCFhwiE2WldQQOPIEadh3aNNdK2QdNAnDTDjo1IIgz59NoiXNA
OzwFpsqzHH6ZivM+V8RwibT2MWDlR/tyaMMeFpp3RDSLz2kkPhoNxps0b4ZY5YWd
gn8bq3CWuvZjz/pfqnPuuz19Kc9SPumOFaKMKP3ZdXwc0tSHY7vS8Lm0tECXkRrR
l7HNhierBe3fxQTt5vzgyyMsQDXx/HM25UjZ2ZJ7tBcIlpD+e+kiib3pYs/AHOhg
31LT1/heM9KnQe29BGQ66aqTpABVIBr6BmYBOF3KkuFojkJ2sof8qiIER4oCmi07
khTi3r7Aef7e1otUJ38crnjr0h/ZlRtJ93YO5N/PcGkVj2uFgVQJOkzn7qyBMXo3
SM6Tnx9EdT0byj9/AwsAbMuHASV7Eot8XnE/nllkx6HIU3JLM3GJFlOMHDa2J5Sa
DAycHoCttk313fPnW/PkU1aybfcbRC9FDaxhW4rhcepzzRJuIbVfxV9eLnNlcELx
nP/xH/ELxcA23MLJkqa7e4owxMby9QmOSq2KlUY8XPxfSw9c71w=
=kQM5
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 30 Jan 2023 23:53
(name . Andrew Tropin)(address . andrew@trop.in)
87wn53vdlv.fsf@gnu.org
Hi,

The bug should be fixed with the upgrade to Guile-SSH 0.16.3 in commit
e6f557dd23fbb298afa92dba3133ed985e560699.

Thanks Artyom for the prompt fix & release!

Ludo’.
Closed
?