'ssh-daemon' fails to start

Done
Submitted by Giovanni Biscuolo.
Details
9 participants
  • Christopher Lemmer Webber
  • Giovanni Biscuolo
  • 宋文武
  • Jelle Licht
  • Julien Lepiller
  • Leo Famulari
  • Ludovic Courtès
  • Marius Bakke
  • maxim.cournoyer
Owner
unassigned
Severity
important
Merged with
Giovanni Biscuolo wrote on 5 Sep 2019 15:18
‘ssh-daemon’ service fails to start at boot
(address . bug-guix@gnu.org)
87ef0u2867.fsf@roquette.mug.biscuolo.net
Hi,
following a recent discussion on guix-sysadmin I have to confirm the ssh-daemon issue since it is still happening on some of the machines I administer
Previous possibly related bug reports are https://issues.guix.gnu.org/issue/30993 and https://issues.guix.gnu.org/issue/32197
Unfortunately this issue is *not* well reproducible, it depends on some mysterious (to me) timing factor; AFAIU it does *not* depend on the shepherd version, probably it depends on "something" related to IPv6 (read below the details)
Andreas Enge <andreas@enge.fr> writes:
[...]
> My impression is that the problem is still there. I am quite certain it
> happened when I rebooted dover, since I had to connect on the serial console
> to manually restart the ssh service.
I'm sure it happened when milano-guix-1 was rebooted due to data centre maintenance and happened yesterday to one of my personal Guix machines at office
[...]
My situation is similar to the one observed by Andreas
> Well, it is in /var/log/messages:
> Aug 3 21:11:38 localhost sshd[360]: Server listening on 0.0.0.0 port 22.
> Aug 3 21:11:55 localhost shepherd[1]: Service ssh-daemon could not be started.
[...]
Sep 4 21:46:02 localhost shepherd[1]: Service syslogd has been started.
[...]
Sep 4 21:46:03 localhost shepherd[1]: Service loopback has been started.
[...]
Sep 4 21:46:22 localhost vmunix: [ 0.226337] PCI: Using configuration type 1 for base access
Sep 4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
[...]
Sep 4 21:46:24 localhost shepherd[1]: Service networking has been started.
[...]
Sep 4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.
[...]
Sep 4 21:46:30 localhost vmunix: [ 0.250107] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
Sep 4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
[...]
Sep 4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
[...]
Sep 4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.
[...]
Sep 4 21:46:47 localhost vmunix: [ 0.731142] Segment Routing with IPv6
Please note the timing of the dhclient and the sshd processes: I inserted them as printed in /var/log/messages but they are not time-sequential: does it mean something or is it irrelevant?
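One way to look at this is to re-sort the excerpt by its syslog timestamp. The following is only a minimal sketch: the helper name `sort_syslog_lines` is made up for illustration, the year is an assumption because syslog lines do not carry one, and elided `[...]` lines would have to be dropped before calling it.

```python
from datetime import datetime


def sort_syslog_lines(lines, year=2019):
    """Return syslog lines ordered by their 'Mon D HH:MM:SS' prefix.

    The year is an assumption supplied by the caller, since the log
    lines themselves do not contain one.
    """
    def stamp(line):
        month, day, clock = line.split()[:3]
        return datetime.strptime(f"{year} {month} {day} {clock}",
                                 "%Y %b %d %H:%M:%S")

    # sorted() is stable, so lines with equal timestamps keep their order.
    return sorted(lines, key=stamp)


# A few lines copied from the excerpt above, in the order they were logged.
excerpt = [
    "Sep 4 21:46:24 localhost shepherd[1]: Service networking has been started.",
    "Sep 4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.",
    "Sep 4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1",
    "Sep 4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.",
]

for line in sort_syslog_lines(excerpt):
    print(line)
```

Sorted this way, the sshd line (21:46:12) comes before the DHCPACK (21:46:16) and before shepherd reports networking as started (21:46:24).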
So the sshd process started (as far as I can see there is no trace it was stopped) and pretty soon shepherd noticed ssh-daemon was not started.
Logging in from the console I see the ssh-daemon is stopped but enabled:
Status of ssh-daemon:
  It is stopped.
  It is enabled.
  Provides (ssh-daemon).
  Requires (syslogd loopback).
  Conflicts with ().
  Will be respawned.
[...]
If I start it via `sudo herd start ssh-daemon` it immediately starts, like in Andreas' experience:
> Aug 3 21:13:10 localhost sshd[385]: Server listening on 0.0.0.0 port 22.
> Aug 3 21:13:10 localhost sshd[385]: Server listening on :: port 22.
> Aug 3 21:13:11 localhost shepherd[1]: Service ssh-daemon has been started.
Sep 5 13:38:55 localhost sshd[745]: Server listening on 0.0.0.0 port 22.
Sep 5 13:38:55 localhost sshd[745]: Server listening on :: port 22.
Sep 5 13:38:55 localhost shepherd[1]: Service ssh-daemon has been started.
Please notice the difference from above: this time the sshd server is also listening on the IPv6 address :: while in the above log it was only listening on the 0.0.0.0 IPv4 address
Does the failure have something to do with IPv6 not available when sshd starts for the first time after a reboot?
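A quick way to probe this hypothesis from the console is to check whether the IPv6 wildcard address can be bound at all at a given moment. This is a minimal sketch, not what sshd does internally: `can_bind` is a hypothetical helper, and port 0 (any free port) is used so that it runs without root.

```python
import socket


def can_bind(family, address, port=0):
    """Return True if a TCP socket of `family` can bind to (address, port).

    Port 0 asks the kernel for any free port, so no privileges are needed.
    """
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((address, port))
        return True
    except OSError:
        # Covers both "address family not supported" and bind failures.
        return False


print("IPv4 wildcard (0.0.0.0):", can_bind(socket.AF_INET, "0.0.0.0"))
print("IPv6 wildcard (::):", can_bind(socket.AF_INET6, "::"))
```

A successful bind long after boot says nothing about the state of the interface during boot; the check would have to run as early as ssh-daemon does to be informative.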
Please have a look at the following /var/log/messages excerpt from my system after a successful ssh-daemon start soon after a reboot (no "manual" intervention):
Sep 5 14:45:00 localhost vmunix: [ 0.247544] pci 0000:00:14.0: reg 0x10: [mem 0xf7c20000-0xf7c2ffff 64bit]
Sep 5 14:44:45 localhost sshd[574]: Server listening on 0.0.0.0 port 22.
[...]
Sep 5 14:44:47 localhost sshd[574]: Server listening on :: port 22.
[...]
Sep 5 14:45:05 localhost shepherd[1]: Service ssh-daemon has been started.
Bingo? This time ssh was started also on :: and it works right after a reboot.
It really seems it has something to do with IPv6 but I cannot understand exactly what :-S (do I have to disable IPv6 in my configs?)
For completeness, I have to say that the issue happened yesterday after a `guix system reconfigure`, this is my current system generation:
Generation 8 Sep 04 2019 17:19:08 (current)
  file name: /var/guix/profiles/system-8-link
  canonical file name: /gnu/store/iw2ayn696f8ipmd5gzw9fxljf9h8w4pr-system
  label: GNU with Linux-Libre 5.2.11
  bootloader: grub-efi
  root device: UUID: 26bd54ec-4e74-4b3a-96ff-58f2f34e4a1a
  kernel: /gnu/store/xgl60ivx8p5p79zjbf08p4x09881wf4s-linux-libre-5.2.11/bzImage
Reconfigured with this guix version:
g@batondor ~$ sudo -i guix describe
Generation 6 Sep 04 2019 17:17:02 (current)
  guix 5ee1c04
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 5ee1c0459eebdd3b7771abaeab0f0b52ff86fdd5
This is the shepherd version:
g@batondor ~$ shepherd --version
shepherd (GNU Shepherd) 0.6.1
Thanks! Gio'
-- Giovanni Biscuolo
Xelera IT Infrastructures
宋文武 wrote on 8 Sep 2019 06:19
(name . Giovanni Biscuolo)(address . g@xelera.eu)
871rwro1x9.fsf@member.fsf.org
Giovanni Biscuolo <g@xelera.eu> writes:
> Hi,
>
> following a recent discussion on guix-sysadmin I have to confirm the
> ssh-daemon issue since it is still happening on some of the machines I
> administer
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* well reproducible, it depends on some
> mysterious (to me) timing factor; AFAIU it does *not* depend on the
> shepherd version, probably it depends on "something" related to IPv6
> (read below the details)
Hello, thank you for this report, it's reproducible with my box that has an old hard disk, and disabling IPv6 for sshd does fix the issue for me...
>
> Andreas Enge <andreas@enge.fr> writes:
>
> [...]
>
>> My impression is that the problem is still there. I am quite certain it
>> happened when I rebooted dover, since I had to connect on the serial console
>> to manually restart the ssh service.
>
> I'm sure it happened when milano-guix-1 was rebooted due to data centre
> maintenance and happened yesterday to one of my personal Guix machines at
> office
>
> [...]
>
> My situation is similar to the one observed by Andreas
>
>> Well, it is in /var/log/messages:
>> Aug 3 21:11:38 localhost sshd[360]: Server listening on 0.0.0.0 port 22.
>> Aug 3 21:11:55 localhost shepherd[1]: Service ssh-daemon could not be started.
>
> [...]
> Sep 4 21:46:02 localhost shepherd[1]: Service syslogd has been started.
> [...]
> Sep 4 21:46:03 localhost shepherd[1]: Service loopback has been started.
> [...]
> Sep 4 21:46:22 localhost vmunix: [ 0.226337] PCI: Using configuration type 1 for base access
> Sep 4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
> [...]
> Sep 4 21:46:24 localhost shepherd[1]: Service networking has been started.
> [...]
> Sep 4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.
> [...]
> Sep 4 21:46:30 localhost vmunix: [ 0.250107] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
> Sep 4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
> [...]
> Sep 4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
> [...]
> Sep 4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.
> [...]
> Sep 4 21:46:47 localhost vmunix: [ 0.731142] Segment Routing with IPv6
>
>
> Please note the timing of the dhclient and the sshd processes: I
> inserted them as printed in /var/log/messages but they are not
> time-sequential: does it mean something or is it irrelevant?
>
> So the sshd process started (as far as I can see there is no trace it
> was stopped) and pretty soon shepherd noticed ssh-daemon was not
> started.
>
> Logging in from the console I see the ssh-daemon is stopped but enabled:
>
> Status of ssh-daemon:
>   It is stopped.
>   It is enabled.
>   Provides (ssh-daemon).
>   Requires (syslogd loopback).
>   Conflicts with ().
>   Will be respawned.
>
>
> [...]
Yes, I think when 'ssh-daemon' fails to start, shepherd should respawn it until it succeeds or disable it, but looking at the code of 'make-forkexec-constructor', when using 'pid-file' (as 'ssh-daemon' does) and a timeout (5s by default, %pid-file-timeout) is reached, the process gets a 'SIGTERM' and '#f' is returned as its running state, which won't be respawned (it's not a PID number), I guess...
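The behaviour described above can be sketched as follows. This is not Shepherd's code (which is Guile); it is a simplified Python analogue with hypothetical names (`start_with_pid_file`, `read_pid`), based only on the description in this message: wait for the PID file until a timeout, and on timeout send SIGTERM and return a false value instead of a PID.

```python
import subprocess
import time


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if missing or incomplete."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def start_with_pid_file(command, pid_file, timeout=5.0, poll=0.1):
    """Spawn `command`, then wait up to `timeout` seconds for its PID file.

    Returns the PID read from the file on success. On timeout the child
    is sent SIGTERM and False is returned, so the caller is left with no
    PID to monitor (and therefore nothing to respawn).
    """
    child = subprocess.Popen(command)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pid = read_pid(pid_file)
        if pid is not None:
            return pid
        time.sleep(poll)
    child.terminate()  # sends SIGTERM on POSIX systems
    child.wait()
    return False
```

In this sketch a daemon that needs longer than `timeout` to write its PID file is killed even though it would have come up eventually, which matches the symptom of sshd logging "Server listening" shortly before shepherd reports that the service could not be started.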
To ludo: Is my analysis correct? It's not clear to me how to fix it so 'ssh-daemon' can be respawned though...
>
> If I start it via `sudo herd start ssh-daemon` it immediately starts,
> like in Andreas' experience:
>
>> Aug 3 21:13:10 localhost sshd[385]: Server listening on 0.0.0.0 port 22.
>> Aug 3 21:13:10 localhost sshd[385]: Server listening on :: port 22.
>> Aug 3 21:13:11 localhost shepherd[1]: Service ssh-daemon has been started.
>
> Sep 5 13:38:55 localhost sshd[745]: Server listening on 0.0.0.0 port 22.
> Sep 5 13:38:55 localhost sshd[745]: Server listening on :: port 22.
> Sep 5 13:38:55 localhost shepherd[1]: Service ssh-daemon has been started.
>
>
> Please notice the difference from above: this time the sshd server is
> also listening on the IPv6 address :: while in the above log it was only
> listening on the 0.0.0.0 IPv4 address
>
> Does the failure have something to do with IPv6 not available when sshd
> starts for the first time after a reboot?
I agree, as adding '(extra-content "ListenAddress 0.0.0.0")' to my 'openssh-configuration' to skip the IPv6 listen fixes this issue for me.
A proper fix should be to respawn 'ssh-daemon' and start it after 'IPv6 is available' (I don't know what this means yet...).
Ludovic Courtès wrote on 26 Sep 2019 22:23
control message for bug #37309
(address . control@debbugs.gnu.org)
87v9telscw.fsf@gnu.org
severity 37309 important
quit
Ludovic Courtès wrote on 26 Sep 2019 22:28
control message for bug #30993
(address . control@debbugs.gnu.org)
87r242ls3n.fsf@gnu.org
merge 30993 37309
quit
Ludovic Courtès wrote on 26 Sep 2019 22:29
(address . control@debbugs.gnu.org)
87pnjmls36.fsf@gnu.org
retitle 30993 'ssh-daemon' fails to start
quit
Jelle Licht wrote on 26 Nov 2019 19:34
Re: bug#37309: ‘ssh-daemon’ service fails to start at boot
(address . 37309@debbugs.gnu.org)
87y2w2mqpf.fsf@jlicht.xyz
Hey 宋文武, Giovanni,
iyzsong@member.fsf.org (宋文武) writes:
> [...]
> Yes, I think when 'ssh-daemon' fails to start, shepherd should respawn
> it until it succeeds or disable it, but looking at the code of
> 'make-forkexec-constructor', when using 'pid-file' (as 'ssh-daemon'
> does) and a timeout (5s by default, %pid-file-timeout) is reached, the
> process gets a 'SIGTERM' and '#f' is returned as its running state, which
> won't be respawned (it's not a PID number), I guess...
>
> To ludo: Is my analysis correct? It's not clear to me how to fix it so
> 'ssh-daemon' can be respawned though...
I think I am also running into a similar issue on my spinning rust based T400. Is there a workaround available that does the above, or is that analysis of the situation not correct either?
Thanks,
Jelle
Giovanni Biscuolo wrote on 29 Nov 2019 09:40
(address . 37309@debbugs.gnu.org)
87imn3f52y.fsf@roquette.mug.biscuolo.net
Hi Jelle,
Jelle Licht <jlicht@fsfe.org> writes:
[...]
> I think I am also running into a similar issue on my spinning rust based
> T400. Is there a workaround available that does the above,
I added `(extra-content "ListenAddress 0.0.0.0")` to my openssh-configuration, to only listen on IPv4 addresses:
(service openssh-service-type
         (openssh-configuration
          (port-number 22)
          (extra-content "ListenAddress 0.0.0.0")
          (authorized-keys
           `(("g" ,(local-file "keys/ssh/g.pub"))
             ("hydra" ,(local-file "keys/ssh/hydra.pub"))))))
I tried to reboot several times one machine I can use for testing and it works for me: please can you try and report if this also works for you?
[...]
Thanks! Gio'
-- Giovanni Biscuolo
Xelera IT Infrastructures
Jelle Licht wrote on 29 Nov 2019 10:51
(address . 37309@debbugs.gnu.org)
87r21rui19.fsf@jlicht.xyz
Hi Giovanni,

Giovanni Biscuolo <g@xelera.eu> writes:
> Hi Jelle,
>
> Jelle Licht <jlicht@fsfe.org> writes:
>
> [...]
>
>> I think I am also running into a similar issue on my spinning rust based
>> T400. Is there a workaround available that does the above,
>
> I added `(extra-content "ListenAddress 0.0.0.0")` to my
> openssh-configuration, to only listen on IPv4 addresses:
>
> --8<---------------cut here---------------start------------->8---
> (service openssh-service-type
>          (openssh-configuration
>           (port-number 22)
>           (extra-content "ListenAddress 0.0.0.0")
>           (authorized-keys
>            `(("g" ,(local-file "keys/ssh/g.pub"))
>              ("hydra" ,(local-file "keys/ssh/hydra.pub"))))))
> --8<---------------cut here---------------end--------------->8---
>
> I tried to reboot several times one machine I can use for testing and it
> works for me: please can you try and report if this also works for you?
This, in combination with setting the pid-file-timeout to 30 seconds, made everything work! I guess it is a combination of fun IPv6 interactions with extremely slow and busy spinning rust.
Thank you!
This does still feel like a workaround instead of a proper fix though; is there something we can do to mitigate these issues in the first place?
- Jelle
Leo Famulari wrote on 3 Dec 2019 21:12
[PATCH] services: openssh: Restrict to IPv4.
(address . 37309@debbugs.gnu.org)
180aa2dee4e1da7fe915c85b90b1f60edd04f23d.1575403967.git.leo@famulari.name
This works around https://issues.guix.info/issue/30993.
* gnu/services/ssh.scm (<openssh-configuration>)[address-family]: New field.
(openssh-config-file): Use it.
* doc/guix.texi: Document it.
---
 doc/guix.texi        | 10 ++++++++++
 gnu/services/ssh.scm | 16 +++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 39eb25385c..cf0e141baf 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -13913,6 +13913,16 @@ This is a symbol specifying the logging level: @code{quiet}, @code{fatal},
 @code{error}, @code{info}, @code{verbose}, @code{debug}, etc. See the man
 page for @file{sshd_config} for the full list of level names.
 
+@item @code{address-family} (default: @code{'inet})
+This is a symbol specifying which type of internet addresses should be
+handled by @command{sshd}. The options are @code{inet} (IPv4),
+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
+@code{inet6}. The upstream default in @code{any}. However, we
+currently default to @code{inet} due to a nondeterministic
+@command{sshd} startup failure when using IPv6 on Guix. See
+@uref{https://issues.guix.info/issue/30993, the bug report} for more
+information on this temporary limitation.
+
 @item @code{extra-content} (default: @code{""})
 This field can be used to append arbitrary text to the configuration file. It
 is especially useful for elaborate configurations that cannot be expressed
diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
index d2dbb8f80d..7e25810eff 100644
--- a/gnu/services/ssh.scm
+++ b/gnu/services/ssh.scm
@@ -4,6 +4,7 @@
 ;;; Copyright © 2016 Julien Lepiller <julien@lepiller.eu>
 ;;; Copyright © 2017 Clément Lassieur <clement@lassieur.org>
 ;;; Copyright © 2019 Ricardo Wurmus <rekado@elephly.net>
+;;; Copyright © 2019 Leo Famulari <leo@famulari.name>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -340,7 +341,16 @@ The other options should be self-descriptive."
   ;; proposed in <https://bugs.gnu.org/27155>. Keep it internal/undocumented
   ;; for now.
   (%auto-start? openssh-auto-start?
-                (default #t)))
+                (default #t))
+
+  ;; Symbol
+  ;; XXX: This shouldn't be required, but due to limitations with IPv6
+  ;; on Guix, sshd often fails to start when it attempts to bind to both
+  ;; 0.0.0.0 and ::, because the IPv6 interface is not ready in time.
+  ;; Accepted options are inet (IPv4), inet6 (IPv6), or any (both).
+  ;; <https://issues.guix.info/issue/30993>
+  (address-family openssh-configuration-address-family
+                  (default 'inet)))
 
 (define %openssh-accounts
   (list (user-group (name "sshd") (system? #t))
@@ -468,6 +478,10 @@ of user-name/file-like tuples."
                     (symbol->string
                      (openssh-configuration-log-level config))))
 
+           (format port "AddressFamily ~a\n"
+                   #$(symbol->string
+                      (openssh-configuration-address-family config)))
+
            ;; Add '/etc/authorized_keys.d/%u', which we populate.
            (format port "AuthorizedKeysFile \
  .ssh/authorized_keys .ssh/authorized_keys2 /etc/ssh/authorized_keys.d/%u\n")
-- 
2.24.0
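For illustration, the line this patch adds to the generated sshd_config can be sketched outside of Guix. The function name `address_family_line` is hypothetical, and the explicit validation is an addition of this sketch (the patch itself only formats the symbol); the three accepted values are the ones listed in the patch.

```python
# Values accepted by sshd's AddressFamily option, as listed in the patch.
VALID_ADDRESS_FAMILIES = ("inet", "inet6", "any")


def address_family_line(family="inet"):
    """Return the sshd_config line for the given address family.

    The default mirrors the patch: 'inet' (IPv4 only) rather than the
    upstream default 'any'.
    """
    if family not in VALID_ADDRESS_FAMILIES:
        raise ValueError(f"unknown address family: {family!r}")
    return f"AddressFamily {family}\n"


print(address_family_line(), end="")       # IPv4 only, the patch's default
print(address_family_line("any"), end="")  # both IPv4 and IPv6
```

With the default, sshd never attempts to bind to `::`, which is how the patch avoids the startup failure rather than fixing its cause.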
Julien Lepiller wrote on 3 Dec 2019 22:53
9AF0F57B-ED38-4A4F-9D34-B0A083DBBB3C@lepiller.eu
On 3 December 2019 at 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
>This works around https://issues.guix.info/issue/30993.
>
>* gnu/services/ssh.scm (<openssh-configuration>)[address-family]: New
>field.
>(openssh-config-file): Use it.
>* doc/guix.texi: Document it.
>---
> doc/guix.texi | 10 ++++++++++
> gnu/services/ssh.scm | 16 +++++++++++++++-
> 2 files changed, 25 insertions(+), 1 deletion(-)
>
>diff --git a/doc/guix.texi b/doc/guix.texi
>index 39eb25385c..cf0e141baf 100644
>--- a/doc/guix.texi
>+++ b/doc/guix.texi
>@@ -13913,6 +13913,16 @@ This is a symbol specifying the logging level:
>@code{quiet}, @code{fatal},
>@code{error}, @code{info}, @code{verbose}, @code{debug}, etc. See the
>man
> page for @file{sshd_config} for the full list of level names.
> 
>+@item @code{address-family} (default: @code{'inet})
>+This is a symbol specifying which type of internet addresses should be
>+handled by @command{sshd}. The options are @code{inet} (IPv4),
>+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
>+@code{inet6}. The upstream default in @code{any}. However, we
default *is*
Leo Famulari wrote on 4 Dec 2019 14:41
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 37309@debbugs.gnu.org)
20191204134135.GA7375@jasmine.lan
On Tue, Dec 03, 2019 at 10:53:11PM +0100, Julien Lepiller wrote:
> On 3 December 2019 at 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
> >+@item @code{address-family} (default: @code{'inet})
> >+This is a symbol specifying which type of internet addresses should be
> >+handled by @command{sshd}. The options are @code{inet} (IPv4),
> >+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
> >+@code{inet6}. The upstream default in @code{any}. However, we
> default *is*
Thanks!
This patch did make sshd work for me again.
However, as part of trying to debug this issue, I changed my system configuration so that it uses dhcp-client-service and wpa-supplicant-service instead of using Wicd. And now I can't reproduce the bug anymore.
I guess that either 1) wpa_supplicant brings the network interfaces up faster or 2) the state of the network interfaces is more accurately captured with these services (in the sense of, is the network up?).
Tricky...
Does the patch help anybody else?
Ludovic Courtès wrote on 10 Dec 2019 17:47
(name . Leo Famulari)(address . leo@famulari.name)
87tv68m8ki.fsf@gnu.org
Hi Leo,
Leo Famulari <leo@famulari.name> writes:
> On Tue, Dec 03, 2019 at 10:53:11PM +0100, Julien Lepiller wrote:
>> On 3 December 2019 at 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
>> >+@item @code{address-family} (default: @code{'inet})
>> >+This is a symbol specifying which type of internet addresses should be
>> >+handled by @command{sshd}. The options are @code{inet} (IPv4),
>> >+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
>> >+@code{inet6}. The upstream default in @code{any}. However, we
>> default *is*
>
> Thanks!
>
> This patch did make sshd work for me again.
>
> However, as part of trying to debug this issue, I changed my system
> configuration so that it uses dhcp-client-service and
> wpa-supplicant-service instead of using Wicd. And now I can't reproduce
> the bug anymore.
>
> I guess that either 1) wpa_supplicant brings the network interfaces up
> faster or 2) the state of the network interfaces is more accurately
> captured with these services (in the sense of, is the network up?).
Did anyone manage to get an strace log as was discussed in https://issues.guix.gnu.org/issue/30993?
That would allow us to know where this is hanging exactly (probably bind(2) on an IPv6 address.)
Thanks,
Ludo’.
maxim.cournoyer wrote on 18 Aug 2020 06:08
control message for bug #30993
(address . control@debbugs.gnu.org)
87wo1wppdk.fsf@hurd.i-did-not-set--mail-host-address--so-tickle-me
tags 30993 fixed
close 30993
quit
Christopher Lemmer Webber wrote on 27 Nov 2020 23:57
unarchive 37309
(address . control@debbugs.gnu.org)
87y2imjtm0.fsf@dustycloud.org
unarchive 37309
Christopher Lemmer Webber wrote on 28 Nov 2020 00:00
Re: bug#37309: ‘ssh-daemon’ service fails to start at boot
(name . Giovanni Biscuolo)(address . g@xelera.eu)
87tutajtgf.fsf@dustycloud.org
Giovanni Biscuolo writes:
> Hi,
>
> following a recent discussion on guix-sysadmin I have to confirm the
> ssh-daemon issue since it is still happening on some of the machines I
> administer
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* well reproducible, it depends on some
> mysterious (to me) timing factor; AFAIU it does *not* depend on the
> shepherd version, probably it depends on "something" related to IPv6
> (read below the details)
This issue continues to plague me, and has ever since I started to use GuixSD. However it is much worse now that I am running Guix on servers... I frequently have to log in via Linode's (nonfree!) web console on every server that is rebooted and kick herd to restart openssh. Once I do that it's fine.
I don't think my linode machine is on "spinning rust" so I don't think this is the cause. IPv6, maybe? Dunno what.
However I think that it's probably really a dependency issue somewhere; herd is starting opensshd before some other dependent service is spawned. But what? Maybe something authentication related like networking, or something. But hm, networking is required...
I'm assuming others must be experiencing this still too... right?
Would really like to see it fixed. It's one of the few things holding me back from recommending Guix on servers to others.
Do others have any idea?
I noticed the lsh daemon requires networking. Why doesn't openssh?
What about the following "fix"?
diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
index 1891db0487..c9bd62bab7 100644
--- a/gnu/services/ssh.scm
+++ b/gnu/services/ssh.scm
@@ -508,7 +508,7 @@ of user-name/file-like tuples."
 
   (list (shepherd-service
          (documentation "OpenSSH server.")
-         (requirement '(syslogd loopback))
+         (requirement '(syslogd networking loopback))
          (provision '(ssh-daemon ssh sshd))
          (start #~(make-forkexec-constructor #$openssh-command
                                              #:pid-file #$pid-file))
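The effect of the extra requirement on start order can be illustrated with a toy dependency graph. This is a sketch only: the graph is simplified (real services have more requirements), `start_order` is a hypothetical helper, and Shepherd's actual start logic is not modelled.

```python
from graphlib import TopologicalSorter


def start_order(requirements):
    """Return one valid start order for a {service: [required services]} map."""
    return list(TopologicalSorter(requirements).static_order())


# Requirements before the proposed change: ssh-daemon ignores networking.
before = {
    "syslogd": [],
    "loopback": [],
    "networking": [],
    "ssh-daemon": ["syslogd", "loopback"],
}

# Requirements after the proposed change.
after = {**before, "ssh-daemon": ["syslogd", "networking", "loopback"]}

print("before:", start_order(before))
print("after: ", start_order(after))
```

With the original requirement list nothing orders networking relative to ssh-daemon, so either may come up first; with the patched list, networking always precedes ssh-daemon.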
Marius Bakke wrote on 28 Nov 2020 02:08
(address . 37309@debbugs.gnu.org)
87k0u6xp7x.fsf@gnu.org
Christopher Lemmer Webber <cwebber@dustycloud.org> writes:
> Giovanni Biscuolo writes:
>
>> Hi,
>>
>> following a recent discussion on guix-sysadmin I have to confirm the
>> ssh-daemon issue since it is still happening on some of the machines I
>> administer
>>
>> Previous possibly related bug reports are
>> https://issues.guix.gnu.org/issue/30993 and
>> https://issues.guix.gnu.org/issue/32197
>>
>> Unfortunately this issue is *not* well reproducible, it depends on some
>> mysterious (to me) timing factor; AFAIU it does *not* depend on the
>> shepherd version, probably it depends on "something" related to IPv6
>> (read below the details)
>
> This issue continues to plague me, and has ever since I started to use
> GuixSD. However it is much worse now that I am running Guix on
> servers... I frequently have to log in via Linode's (nonfree!) web
> console on every server that is rebooted and kick herd to restart
> openssh. Once I do that it's fine.
Can you share an excerpt of /var/log/messages (ideally the whole boot sequence) from when SSH failed to start?
> I don't think my linode machine is on "spinning rust" so I don't think
> this is the cause. IPv6, maybe? Dunno what.
>
> However I think that it's probably really a dependency issue somewhere;
> herd is starting opensshd before some other dependent service is
> spawned. But what? Maybe something authentication related like
> networking, or something. But hm, networking is required...
>
> I'm assuming others must be experiencing this still too... right?
FWIW I have never encountered this. :-/
> Would really like to see it fixed. It's one of the few things holding
> me back from recommending Guix on servers to others.
>
> Do others have any idea?
>
> I noticed the lsh daemon requires networking. Why doesn't openssh?
It's really for legacy reasons, from before we had the Guix System installer. Then a common way to install was to run dhclient and "herd start ssh-daemon" manually on the live image, so people could do the installation over SSH:
https://issues.guix.gnu.org/26548#5
Nowadays, the installer gives a nice and quick way to deploy a minimal system, and I suspect the SSH method has fallen out of favor.
> What about the following "fix"?
[...]
>   (list (shepherd-service
>          (documentation "OpenSSH server.")
> -        (requirement '(syslogd loopback))
> +        (requirement '(syslogd networking loopback))
If it works for you, let's do this. It would be good to find the underlying cause though...
Not sure what to do about the installer however: perhaps create yet-another undocumented field of openssh-service-type that makes the networking requirement optional?
Leo Famulari wrote on 3 Dec 2020 21:38
Re: bug#37309: ‘ssh-daemon’ service fails to start at boot
(name . Marius Bakke)(address . marius@gnu.org)
X8lM42LVEYWePEdJ@jasmine.lan
On Sat, Nov 28, 2020 at 02:08:34AM +0100, Marius Bakke wrote:
> Christopher Lemmer Webber <cwebber@dustycloud.org> writes:
> > I'm assuming others must be experiencing this still too... right?
>
> FWIW I have never encountered this. :-/
I reenabled IPv6 listening for sshd after updating to 1.2.0 and things are working for now. The problem has always been intermittent for me in the past.
Chris, are you using an old Thinkpad too?
Christopher Lemmer Webber wrote on 3 Dec 2020 22:56
Re: bug#37309: ‘ssh-daemon’ service fails to start at boot
(name . Leo Famulari)(address . leo@famulari.name)
87pn3qzh7r.fsf@dustycloud.org
Leo Famulari writes:
> On Sat, Nov 28, 2020 at 02:08:34AM +0100, Marius Bakke wrote:
>> Christopher Lemmer Webber <cwebber@dustycloud.org> writes:
>> > I'm assuming others must be experiencing this still too... right?
>>
>> FWIW I have never encountered this. :-/
>
> I reenabled IPv6 listening for sshd after updating to 1.2.0 and things
> are working for now. The problem has always been intermittent for me in
> the past.
>
> Chris, are you using an old Thinkpad too?
I did experience it on an old thinkpad, though in this case it's happening on the Linode server I'm running. Not particularly old, but probably shared by many users and thus slower in some way.
That's part of what makes me think this is some kind of race condition...