'ssh-daemon' fails to start

Done. Submitted by Giovanni Biscuolo.
9 participants
  • Christopher Lemmer Webber
  • Giovanni Biscuolo
  • 宋文武
  • Jelle Licht
  • Julien Lepiller
  • Leo Famulari
  • Ludovic Courtès
  • Marius Bakke
  • maxim.cournoyer
Owner
unassigned
Severity
important
Merged with
Giovanni Biscuolo wrote on 5 Sep 2019 15:18
‘ssh-daemon’ service fails to start at boot
(address . bug-guix@gnu.org)
87ef0u2867.fsf@roquette.mug.biscuolo.net
Hi,

following a recent discussion on guix-sysadmin I have to confirm the
ssh-daemon issue since it is still happening on some of the machines I
administer

Previous possibly related bug reports are
https://issues.guix.gnu.org/issue/30993 and
https://issues.guix.gnu.org/issue/32197

Unfortunately this issue is *not* well reproducible, it depends on some
mysterious (to me) timing factor; AFAIU it does *not* depend on the
shepherd version, probably it depends on "something" related to IPv6
(read below the details)

Andreas Enge <andreas@enge.fr> writes:

[...]

> My impression is that the problem is still there. I am quite certain it
> happened when I rebooted dover, since I had to connect on the serial console
> to manually restart the ssh service.

I'm sure it happened when milano-guix-1 was rebooted due to data centre
maintenance, and it happened yesterday to one of my personal Guix machines
at the office.

[...]

My situation is similar to the one observed by Andreas

> Well, it is in /var/log/messages:
> Aug 3 21:11:38 localhost sshd[360]: Server listening on 0.0.0.0 port 22.
> Aug 3 21:11:55 localhost shepherd[1]: Service ssh-daemon could not be started.

[...]
Sep 4 21:46:02 localhost shepherd[1]: Service syslogd has been started.
[...]
Sep 4 21:46:03 localhost shepherd[1]: Service loopback has been started.
[...]
Sep 4 21:46:22 localhost vmunix: [ 0.226337] PCI: Using configuration type 1 for base access
Sep 4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
[...]
Sep 4 21:46:24 localhost shepherd[1]: Service networking has been started.
[...]
Sep 4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.
[...]
Sep 4 21:46:30 localhost vmunix: [ 0.250107] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
Sep 4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
[...]
Sep 4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
[...]
Sep 4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.
[...]
Sep 4 21:46:47 localhost vmunix: [ 0.731142] Segment Routing with IPv6

Please note the timing of the dhclient and the sshd processes: I
inserted them as printed in /var/log/messages but they are not
time-sequential: does it mean something, or is it irrelevant?

So the sshd process started (as far as I can see there is no trace that
it was stopped) and pretty soon shepherd noticed ssh-daemon was not
started.

Logging in from the console I see the ssh-daemon is stopped but enabled:

Status of ssh-daemon:
It is stopped.
It is enabled.
Provides (ssh-daemon).
Requires (syslogd loopback).
Conflicts with ().
Will be respawned.

[...]

If I start it via `sudo herd start ssh-daemon` it immediately starts,
like in Andreas's experience:

> Aug 3 21:13:10 localhost sshd[385]: Server listening on 0.0.0.0 port 22.
> Aug 3 21:13:10 localhost sshd[385]: Server listening on :: port 22.
> Aug 3 21:13:11 localhost shepherd[1]: Service ssh-daemon has been started.

Sep 5 13:38:55 localhost sshd[745]: Server listening on 0.0.0.0 port 22.
Sep 5 13:38:55 localhost sshd[745]: Server listening on :: port 22.
Sep 5 13:38:55 localhost shepherd[1]: Service ssh-daemon has been started.

Please notice the difference from above: this time the sshd server is
also listening on the IPv6 address :: while in the above log it was only
listening on the 0.0.0.0 IPv4 address

Does the failure have something to do with IPv6 not available when sshd
starts for the first time after a reboot?

Please have a look at the following /var/log/messages excerpt from my
system after a successful ssh-daemon start soon after a reboot (no
"manual" intervention):

Sep 5 14:45:00 localhost vmunix: [ 0.247544] pci 0000:00:14.0: reg 0x10: [mem 0xf7c20000-0xf7c2ffff 64bit]
Sep 5 14:44:45 localhost sshd[574]: Server listening on 0.0.0.0 port 22.
[...]
Sep 5 14:44:47 localhost sshd[574]: Server listening on :: port 22.
[...]
Sep 5 14:45:05 localhost shepherd[1]: Service ssh-daemon has been started.

Bingo? This time ssh was started also on :: and it works right after a reboot.

It really seems it has something to do with IPv6 but I cannot understand
exactly what :-S (do I have to disable IPv6 in my configs?)
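
One way to test this hypothesis by hand is to check whether a TCP socket can be bound to the IPv6 wildcard address at a given moment. The following is a minimal Guile sketch, not something used in this thread: the procedure name `ipv6-bindable?` and the probe port 2222 are made up for illustration, and a successful bind here does not prove that sshd would have succeeded at boot.

```scheme
;; Hypothetical helper: report whether a TCP socket can currently be
;; bound to the IPv6 wildcard address (::) on PORT.
(define (ipv6-bindable? port)
  (catch 'system-error
    (lambda ()
      (let ((sock (socket PF_INET6 SOCK_STREAM 0)))
        (dynamic-wind
          (const #t)
          (lambda ()
            ;; (inet-pton AF_INET6 "::") yields the numeric wildcard address.
            (bind sock AF_INET6 (inet-pton AF_INET6 "::") port)
            #t)
          (lambda ()
            ;; Always release the socket, whether or not bind succeeded.
            (close-port sock)))))
    ;; socket or bind failed, e.g. IPv6 unavailable or port already in use.
    (lambda args #f)))

;; Use an unprivileged port so that no root access is needed.
(display (if (ipv6-bindable? 2222)
             "IPv6 wildcard bind succeeded\n"
             "IPv6 wildcard bind failed\n"))
```

Running this from the console right after a failed boot, and again later, would at least show whether the IPv6 wildcard address is bindable at those two moments.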

For completeness, I have to say that the issue happened yesterday after
a `guix system reconfigure`, this is my current system generation:

Generation 8 Sep 04 2019 17:19:08 (current)
file name: /var/guix/profiles/system-8-link
canonical file name: /gnu/store/iw2ayn696f8ipmd5gzw9fxljf9h8w4pr-system
label: GNU with Linux-Libre 5.2.11
bootloader: grub-efi
root device: UUID: 26bd54ec-4e74-4b3a-96ff-58f2f34e4a1a
kernel: /gnu/store/xgl60ivx8p5p79zjbf08p4x09881wf4s-linux-libre-5.2.11/bzImage

Reconfigured with this guix version:

g@batondor ~$ sudo -i guix describe
Generation 6 Sep 04 2019 17:17:02 (current)
guix 5ee1c04
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: 5ee1c0459eebdd3b7771abaeab0f0b52ff86fdd5

This is the shepherd version:

g@batondor ~$ shepherd --version
shepherd (GNU Shepherd) 0.6.1

Thanks! Gio'

--
Giovanni Biscuolo

Xelera IT Infrastructures
宋文武 wrote on 8 Sep 2019 06:19
(name . Giovanni Biscuolo)(address . g@xelera.eu)
871rwro1x9.fsf@member.fsf.org
Giovanni Biscuolo <g@xelera.eu> writes:

> Hi,
>
> following a recent discussion on guix-sysadmin I have to confirm the
> ssh-daemon issue since it is still happening on some of the machines I
> administer
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* well reproducible, it depends on some
> mysterious (to me) timing factor; AFAIU it does *not* depend on the
> shepherd version, probably it depends on "something" related to IPv6
> (read below the details)

Hello, thank you for this report. It is reproducible on my box, which has
an old hard disk, and disabling IPv6 for sshd does fix the issue for me...

>
> Andreas Enge <andreas@enge.fr> writes:
>
> [...]
>
>> My impression is that the problem is still there. I am quite certain it
>> happened when I rebooted dover, since I had to connect on the serial console
>> to manually restart the ssh service.
>
> I'm sure it happened when milano-guix-1 was rebooted due to data centre
> maintenance and happened yesterday to one of my personal Guix machines at
> office
>
> [...]
>
> My situation is similar to the one observed by Andreas
>
>> Well, it is in /var/log/messages:
>> Aug 3 21:11:38 localhost sshd[360]: Server listening on 0.0.0.0 port 22.
>> Aug 3 21:11:55 localhost shepherd[1]: Service ssh-daemon could not be started.
>
> [...]
> Sep 4 21:46:02 localhost shepherd[1]: Service syslogd has been started.
> [...]
> Sep 4 21:46:03 localhost shepherd[1]: Service loopback has been started.
> [...]
> Sep 4 21:46:22 localhost vmunix: [ 0.226337] PCI: Using configuration type 1 for base access
> Sep 4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
> [...]
> Sep 4 21:46:24 localhost shepherd[1]: Service networking has been started.
> [...]
> Sep 4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.
> [...]
> Sep 4 21:46:30 localhost vmunix: [ 0.250107] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
> Sep 4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
> [...]
> Sep 4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
> [...]
> Sep 4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.
> [...]
> Sep 4 21:46:47 localhost vmunix: [ 0.731142] Segment Routing with IPv6
>
>
> Please note the timing of the dhclient and the sshd processes: I
> inserted them as printed in /var/log/messages but they are not
> time-sequential: does it mean something, or is it irrelevant?
>
> So the sshd process started (as far as I can see there is no trace it
> was stopped) and pretty soon shepherd noticed ssh-daemon was not
> started.
>
> Logging in from the console I see the ssh-daemon is stopped but enabled:
>
> Status of ssh-daemon:
> It is stopped.
> It is enabled.
> Provides (ssh-daemon).
> Requires (syslogd loopback).
> Conflicts with ().
> Will be respawned.
>
>
> [...]

Yes, I think that when 'ssh-daemon' fails to start, shepherd should
respawn it until it succeeds, or disable it. But looking at the code of
'make-forkexec-constructor': when 'pid-file' is used (as 'ssh-daemon'
does) and the timeout (%pid-file-timeout, 5s by default) is reached, the
process gets a 'SIGTERM' and '#f' is returned as its running state.
Since '#f' is not a PID number, the service won't be respawned, I guess...

To ludo: Is my analysis correct? It's not clear to me how to fix it so
that 'ssh-daemon' can be respawned though...
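
The behaviour described above can be illustrated with a small stand-alone Guile sketch. This is not Shepherd's code: `wait-for-pid-file` is a hypothetical name, and the once-per-second polling is an assumption chosen for readability. It only shows why a daemon that writes its PID file late ends up with '#f' as its running value.

```scheme
(use-modules (ice-9 textual-ports))

;; Hypothetical illustration, not the actual Shepherd implementation.
;; Poll FILE once per second for at most TIMEOUT seconds.  Return the PID
;; it contains, or #f if no valid PID shows up in time.
(define* (wait-for-pid-file file #:key (timeout 5))
  (let loop ((remaining timeout))
    (cond ((and (file-exists? file)
                (string->number
                 (string-trim-both
                  (call-with-input-file file get-string-all))))
           => identity)
          ((<= remaining 0) #f)
          (else (sleep 1)
                (loop (- remaining 1))))))
```

With a timeout of 5 seconds, a daemon that needs longer than that to create its PID file yields '#f', so the caller has no PID to monitor or respawn.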

>
> If I start it via `sudo herd start ssh-daemon` it immediately starts,
> like in Andreas's experience:
>
>> Aug 3 21:13:10 localhost sshd[385]: Server listening on 0.0.0.0 port 22.
>> Aug 3 21:13:10 localhost sshd[385]: Server listening on :: port 22.
>> Aug 3 21:13:11 localhost shepherd[1]: Service ssh-daemon has been started.
>
> Sep 5 13:38:55 localhost sshd[745]: Server listening on 0.0.0.0 port 22.
> Sep 5 13:38:55 localhost sshd[745]: Server listening on :: port 22.
> Sep 5 13:38:55 localhost shepherd[1]: Service ssh-daemon has been started.
>
>
> Please notice the difference from above: this time the sshd server is
> also listening on the IPv6 address :: while in the above log it was only
> listening on the 0.0.0.0 IPv4 address
>
> Does the failure have something to do with IPv6 not available when sshd
> starts for the first time after a reboot?

I agree, as adding '(extra-content "ListenAddress 0.0.0.0")' to my
'openssh-configuration' to skip listening on IPv6 fixes this issue for me.

A proper fix would be to respawn 'ssh-daemon' and to start it only after
'IPv6 is available' (I don't know what this means yet...).
Ludovic Courtès wrote on 26 Sep 2019 22:23
control message for bug #37309
(address . control@debbugs.gnu.org)
87v9telscw.fsf@gnu.org
severity 37309 important
quit
Ludovic Courtès wrote on 26 Sep 2019 22:28
control message for bug #30993
(address . control@debbugs.gnu.org)
87r242ls3n.fsf@gnu.org
merge 30993 37309
quit
Ludovic Courtès wrote on 26 Sep 2019 22:29
(address . control@debbugs.gnu.org)
87pnjmls36.fsf@gnu.org
retitle 30993 'ssh-daemon' fails to start
quit
Jelle Licht wrote on 26 Nov 2019 19:34
Re: bug#37309: ‘ssh-daemon’ service fails to start at boot
(address . 37309@debbugs.gnu.org)
87y2w2mqpf.fsf@jlicht.xyz
Hey 宋文武, Giovanni,

iyzsong@member.fsf.org (宋文武) writes:

> [...]
> Yes, I think when 'ssh-daemon' failed to start, shepherd should respawn
> it until success or disable it, but by look at the code of
> 'make-forkexec-constructor', when using 'pid-file' (as 'ssh-daemon'
> does), and a timeout (default to 5s %pid-file-timeout) is reached, the
> processes got a 'SIGTERM' and return '#f' as its running state, which
> won't be respawn (it's not a pid number) I guess...
>
> To ludo: Is my analysis correct? It's not clear to me how to fix it so
> 'ssh-daemon' can be respawn though...

I think I am also running into a similar issue on my spinning rust based
T400. Is there a workaround available that does the above, or is that
analysis of the situation not correct either?

Thanks,

Jelle
Giovanni Biscuolo wrote on 29 Nov 2019 09:40
(address . 37309@debbugs.gnu.org)
87imn3f52y.fsf@roquette.mug.biscuolo.net
Hi Jelle,

Jelle Licht <jlicht@fsfe.org> writes:

[...]

> I think I am also running into a similar issue on my spinning rust based
> T400. Is there a workaround available that does the above,

I added `(extra-content "ListenAddress 0.0.0.0")` to my
openssh-configuration, to only listen on IPv4 addresses:

(service openssh-service-type
         (openssh-configuration
          (port-number 22)
          (extra-content "ListenAddress 0.0.0.0")
          (authorized-keys
           `(("g" ,(local-file "keys/ssh/g.pub"))
             ("hydra" ,(local-file "keys/ssh/hydra.pub"))))))

I rebooted a machine I can use for testing several times and it works
for me: could you please try it and report whether it also works for you?

[...]

Thanks! Gio'

--
Giovanni Biscuolo

Xelera IT Infrastructures
Jelle Licht wrote on 29 Nov 2019 10:51
(address . 37309@debbugs.gnu.org)
87r21rui19.fsf@jlicht.xyz
Hi Giovanni,


Giovanni Biscuolo <g@xelera.eu> writes:

> Hi Jelle,
>
> Jelle Licht <jlicht@fsfe.org> writes:
>
> [...]
>
>> I think I am also running into a similar issue on my spinning rust based
>> T400. Is there a workaround available that does the above,
>
> I added `(extra-content "ListenAddress 0.0.0.0")` to my
> openssh-configuration, to only listen on IPv4 addresses:
>
> --8<---------------cut here---------------start------------->8---
> (service openssh-service-type
> (openssh-configuration
> (port-number 22)
> (extra-content "ListenAddress 0.0.0.0")
> (authorized-keys
> `(("g" ,(local-file "keys/ssh/g.pub"))
> ("hydra",(local-file "keys/ssh/hydra.pub"))))))
> --8<---------------cut here---------------end--------------->8---
>
> I tried to reboot several times one machine I can use for testing and it
> works for me: please can you try and report if this also works for you?

This, in combination with setting the pid-file-timeout to 30 seconds,
made everything work! I guess it is a combination of fun IPv6
interactions with extremely slow and busy spinning rust.
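
The message does not say how the timeout was raised. One possible way, sketched here under the assumption that the installed Shepherd's `make-forkexec-constructor` accepts a `#:pid-file-timeout` keyword, is to pass it in the service's `start` gexp. `openssh-command` and `pid-file` stand for the values that gnu/services/ssh.scm computes; they are placeholders here.

```scheme
;; Sketch only: a Shepherd service definition for sshd that waits up to
;; 30 seconds for the PID file.  OPENSSH-COMMAND and PID-FILE are
;; placeholders for values defined elsewhere.
(shepherd-service
 (documentation "OpenSSH server.")
 (requirement '(syslogd loopback))
 (provision '(ssh-daemon ssh sshd))
 (start #~(make-forkexec-constructor #$openssh-command
                                     #:pid-file #$pid-file
                                     #:pid-file-timeout 30))
 (stop #~(make-kill-destructor)))
```

A longer timeout only gives a slow machine more time to write the PID file; it does not explain why sshd is slow to do so in the first place.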

Thank you!

This does still feel like a workaround instead of a proper fix though; is
there something we can do to mitigate these issues in the first place?

- Jelle
Leo Famulari wrote on 3 Dec 2019 21:12
[PATCH] services: openssh: Restrict to IPv4.
(address . 37309@debbugs.gnu.org)
180aa2dee4e1da7fe915c85b90b1f60edd04f23d.1575403967.git.leo@famulari.name

* gnu/services/ssh.scm (<openssh-configuration>)[address-family]: New field.
(openssh-config-file): Use it.
* doc/guix.texi: Document it.
---
doc/guix.texi | 10 ++++++++++
gnu/services/ssh.scm | 16 +++++++++++++++-
2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 39eb25385c..cf0e141baf 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -13913,6 +13913,16 @@ This is a symbol specifying the logging level: @code{quiet}, @code{fatal},
 @code{error}, @code{info}, @code{verbose}, @code{debug}, etc.  See the man
 page for @file{sshd_config} for the full list of level names.
 
+@item @code{address-family} (default: @code{'inet})
+This is a symbol specifying which type of internet addresses should be
+handled by @command{sshd}.  The options are @code{inet} (IPv4),
+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
+@code{inet6}.  The upstream default in @code{any}.  However, we
+currently default to @code{inet} due to a nondeterministic
+@command{sshd} startup failure when using IPv6 on Guix.  See
+@uref{https://issues.guix.info/issue/30993, the bug report} for more
+information on this temporary limitation.
+
 @item @code{extra-content} (default: @code{""})
 This field can be used to append arbitrary text to the configuration file.  It
 is especially useful for elaborate configurations that cannot be expressed
diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
index d2dbb8f80d..7e25810eff 100644
--- a/gnu/services/ssh.scm
+++ b/gnu/services/ssh.scm
@@ -4,6 +4,7 @@
 ;;; Copyright © 2016 Julien Lepiller <julien@lepiller.eu>
 ;;; Copyright © 2017 Clément Lassieur <clement@lassieur.org>
 ;;; Copyright © 2019 Ricardo Wurmus <rekado@elephly.net>
+;;; Copyright © 2019 Leo Famulari <leo@famulari.name>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -340,7 +341,16 @@ The other options should be self-descriptive."
   ;; proposed in <https://bugs.gnu.org/27155>.  Keep it internal/undocumented
   ;; for now.
   (%auto-start?          openssh-auto-start?
-                         (default #t)))
+                         (default #t))
+
+  ;; Symbol
+  ;; XXX: This shouldn't be required, but due to limitations with IPv6
+  ;; on Guix, sshd often fails to start when it attempts to bind to both
+  ;; 0.0.0.0 and ::, because the IPv6 interface is not ready in time.
+  ;; Accepted options are inet (IPv4), inet6 (IPv6), or any (both).
+  ;; <https://issues.guix.info/issue/30993>
+  (address-family        openssh-configuration-address-family
+                         (default 'inet)))
 
 (define %openssh-accounts
   (list (user-group (name "sshd") (system? #t))
@@ -468,6 +478,10 @@ of user-name/file-like tuples."
                       (symbol->string
                        (openssh-configuration-log-level config))))
 
+           (format port "AddressFamily ~a\n"
+                   #$(symbol->string
+                      (openssh-configuration-address-family config)))
+
            ;; Add '/etc/authorized_keys.d/%u', which we populate.
            (format port "AuthorizedKeysFile \
  .ssh/authorized_keys .ssh/authorized_keys2 /etc/ssh/authorized_keys.d/%u\n")
-- 
2.24.0
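
Assuming the patch above is applied, a system configuration could opt back into listening on both address families as follows. The `address-family` field exists only in this patch; it is not guaranteed to be available in any released Guix.

```scheme
;; Requires the proposed patch: restore the upstream sshd behaviour of
;; listening on both IPv4 and IPv6.
(service openssh-service-type
         (openssh-configuration
          (address-family 'any)))
```

With the patch's default of 'inet, the generated sshd_config would contain "AddressFamily inet"; with the setting above it would contain "AddressFamily any".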
Julien Lepiller wrote on 3 Dec 2019 22:53
9AF0F57B-ED38-4A4F-9D34-B0A083DBBB3C@lepiller.eu
On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
>
>* gnu/services/ssh.scm (<openssh-configuration>)[address-family]: New
>field.
>(openssh-config-file): Use it.
>* doc/guix.texi: Document it.
>---
> doc/guix.texi | 10 ++++++++++
> gnu/services/ssh.scm | 16 +++++++++++++++-
> 2 files changed, 25 insertions(+), 1 deletion(-)
>
>diff --git a/doc/guix.texi b/doc/guix.texi
>index 39eb25385c..cf0e141baf 100644
>--- a/doc/guix.texi
>+++ b/doc/guix.texi
>@@ -13913,6 +13913,16 @@ This is a symbol specifying the logging level:
>@code{quiet}, @code{fatal},
>@code{error}, @code{info}, @code{verbose}, @code{debug}, etc. See the
>man
> page for @file{sshd_config} for the full list of level names.
>
>+@item @code{address-family} (default: @code{'inet})
>+This is a symbol specifying which type of internet addresses should be
>+handled by @command{sshd}. The options are @code{inet} (IPv4),
>+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
>+@code{inet6}. The upstream default in @code{any}. However, we
default *is*
>+currently default to @code{inet} due to a nondeterministic
>+@command{sshd} startup failure when using IPv6 on Guix. See
>+@uref{https://issues.guix.info/issue/30993, the bug report} for more
>+information on this temporary limitation.
>+
> @item @code{extra-content} (default: @code{""})
>This field can be used to append arbitrary text to the configuration
>file. It
>is especially useful for elaborate configurations that cannot be
>expressed
>diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
>index d2dbb8f80d..7e25810eff 100644
>--- a/gnu/services/ssh.scm
>+++ b/gnu/services/ssh.scm
>@@ -4,6 +4,7 @@
> ;;; Copyright © 2016 Julien Lepiller <julien@lepiller.eu>
> ;;; Copyright © 2017 Clément Lassieur <clement@lassieur.org>
> ;;; Copyright © 2019 Ricardo Wurmus <rekado@elephly.net>
>+;;; Copyright © 2019 Leo Famulari <leo@famulari.name>
> ;;;
> ;;; This file is part of GNU Guix.
> ;;;
>@@ -340,7 +341,16 @@ The other options should be self-descriptive."
>;; proposed in <https://bugs.gnu.org/27155>. Keep it
>internal/undocumented
> ;; for now.
> (%auto-start? openssh-auto-start?
>- (default #t)))
>+ (default #t))
>+
>+ ;; Symbol
>+ ;; XXX: This shouldn't be required, but due to limitations with IPv6
>+ ;; on Guix, sshd often fails to start when it attempts to bind to
>both
>+ ;; 0.0.0.0 and ::, because the IPv6 interface is not ready in time.
>+ ;; Accepted options are inet (IPv4), inet6 (IPv6), or any (both).
>+ ;; <https://issues.guix.info/issue/30993>
>+ (address-family openssh-configuration-address-family
>+ (default 'inet)))
>
> (define %openssh-accounts
> (list (user-group (name "sshd") (system? #t))
>@@ -468,6 +478,10 @@ of user-name/file-like tuples."
> (symbol->string
> (openssh-configuration-log-level config))))
>
>+ (format port "AddressFamily ~a\n"
>+ #$(symbol->string
>+ (openssh-configuration-address-family config)))
>+
> ;; Add '/etc/authorized_keys.d/%u', which we populate.
> (format port "AuthorizedKeysFile \
>.ssh/authorized_keys .ssh/authorized_keys2
>/etc/ssh/authorized_keys.d/%u\n")
Leo Famulari wrote on 4 Dec 2019 14:41
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 37309@debbugs.gnu.org)
20191204134135.GA7375@jasmine.lan
On Tue, Dec 03, 2019 at 10:53:11PM +0100, Julien Lepiller wrote:
> On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
> >+@item @code{address-family} (default: @code{'inet})
> >+This is a symbol specifying which type of internet addresses should be
> >+handled by @command{sshd}. The options are @code{inet} (IPv4),
> >+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
> >+@code{inet6}. The upstream default in @code{any}. However, we
> default *is*

Thanks!

This patch did make sshd work for me again.

However, as part of trying to debug this issue, I changed my system
configuration so that it uses dhcp-client-service and
wpa-supplicant-service instead of using Wicd. And now I can't reproduce
the bug anymore.

I guess that either 1) wpa_supplicant brings the network interfaces up
faster or 2) the state of the network interfaces is more accurately
captured with these services (in the sense of, is the network up?).

Tricky...

Does the patch help anybody else?
Ludovic Courtès wrote on 10 Dec 2019 17:47
(name . Leo Famulari)(address . leo@famulari.name)
87tv68m8ki.fsf@gnu.org
Hi Leo,

Leo Famulari <leo@famulari.name> writes:

> On Tue, Dec 03, 2019 at 10:53:11PM +0100, Julien Lepiller wrote:
>> On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo@famulari.name> wrote:
>> >+@item @code{address-family} (default: @code{'inet})
>> >+This is a symbol specifying which type of internet addresses should be
>> >+handled by @command{sshd}. The options are @code{inet} (IPv4),
>> >+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
>> >+@code{inet6}. The upstream default in @code{any}. However, we
>> default *is*
>
> Thanks!
>
> This patch did make sshd work for me again.
>
> However, as part of trying to debug this issue, I changed my system
> configuration so that it uses dhcp-client-service and
> wpa-supplicant-service instead of using Wicd. And now I can't reproduce
> the bug anymore.
>
> I guess that either 1) wpa_supplicant brings the network interfaces up
> faster or 2) the state of the network interfaces is more accurately
> captured with these services (in the sense of, is the network up?).

Did anyone manage to get an strace log as was discussed in

That would allow us to know where this is hanging exactly (probably
bind(2) on an IPv6 address.)

Thanks,
Ludo’.
maxim.cournoyer wrote on 18 Aug 2020 06:08
control message for bug #30993
(address . control@debbugs.gnu.org)
87wo1wppdk.fsf@hurd.i-did-not-set--mail-host-address--so-tickle-me
tags 30993 fixed
close 30993
quit
Christopher Lemmer Webber wrote on 27 Nov 2020 23:57
unarchive 37309
(address . control@debbugs.gnu.org)
87y2imjtm0.fsf@dustycloud.org
unarchive 37309
Christopher Lemmer Webber wrote on 28 Nov 2020 00:00
Re: bug#37309: ‘ssh-daemon’ service fails to start at boot
(name . Giovanni Biscuolo)(address . g@xelera.eu)
87tutajtgf.fsf@dustycloud.org
Giovanni Biscuolo writes:

> Hi,
>
> following a recent discussion on guix-sysadmin I have to confirm the
> ssh-daemon issue since it is still happening on some of the machines I
> administer
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* well reproducible, it depends on some
> mysterious (to me) timing factor; AFAIU it does *not* depend on the
> shepherd version, probably it depends on "something" related to IPv6
> (read below the details)

This issue continues to plague me, and has ever since I started to use
GuixSD. However it is much worse now that I am running Guix on
servers... I frequently have to log in via Linode's (nonfree!) web
console on every server that is rebooted and kick herd to restart
openssh. Once I do that it's fine.

I don't think my linode machine is on "spinning rust" so I don't think
this is the cause. IPv6, maybe? Dunno what.

However I think that it's probably really a dependency issue somewhere;
herd is starting opensshd before some other dependent service is
spawned. But what? Maybe something authentication related like
networking, or something. But hm, networking is required...

I'm assuming others must be experiencing this still too... right?

Would really like to see it fixed. It's one of the few things holding
me back from recommending Guix on servers to others.

Do others have any idea?

I noticed the lsh daemon requires networking. Why doesn't openssh?

What about the following "fix"?

diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
index 1891db0487..c9bd62bab7 100644
--- a/gnu/services/ssh.scm
+++ b/gnu/services/ssh.scm
@@ -508,7 +508,7 @@ of user-name/file-like tuples."
 
   (list (shepherd-service
          (documentation "OpenSSH server.")
-         (requirement '(syslogd loopback))
+         (requirement '(syslogd networking loopback))
          (provision '(ssh-daemon ssh sshd))
          (start #~(make-forkexec-constructor #$openssh-command
                                              #:pid-file #$pid-file))
Marius Bakke wrote on 28 Nov 2020 02:08
(address . 37309@debbugs.gnu.org)
87k0u6xp7x.fsf@gnu.org
Christopher Lemmer Webber <cwebber@dustycloud.org> writes:

> Giovanni Biscuolo writes:
>
>> Hi,
>>
>> following a recent discussion on guix-sysadmin I have to confirm the
>> ssh-daemon issue since it is still happening on some of the machines I
>> administer
>>
>> Previous possibly related bug reports are
>> https://issues.guix.gnu.org/issue/30993 and
>> https://issues.guix.gnu.org/issue/32197
>>
>> Unfortunately this issue is *not* well reproducible, it depends on some
>> mysterious (to me) timing factor; AFAIU it does *not* depend on the
>> shepherd version, probably it depends on "something" related to IPv6
>> (read below the details)
>
> This issue continues to plague me, and has ever since I started to use
> GuixSD. However it is much worse now that I am running Guix on
> servers... I frequently have to log in via Linode's (nonfree!) web
> console on every server that is rebooted and kick herd to restart
> openssh. Once I do that it's fine.

Can you share an excerpt of /var/log/messages (ideally the whole boot
sequence) from when SSH failed to start?

> I don't think my linode machine is on "spinning rust" so I don't think
> this is the cause. IPv6, maybe? Dunno what.
>
> However I think that it's probably really a dependency issue somewhere;
> herd is starting opensshd before some other dependent service is
> spawned. But what? Maybe something authentication related like
> networking, or something. But hm, networking is required...
>
> I'm assuming others must be experiencing this still too... right?

FWIW I have never encountered this. :-/

> Would really like to see it fixed. It's one of the few things holding
> me back from recommending Guix on servers to others.
>
> Do others have any idea?
>
> I noticed the lsh daemon requires networking. Why doesn't openssh?

It's really for legacy reasons, from before we had the Guix System
installer. Then a common way to install was to run dhclient and
"herd start ssh-daemon" manually on the live image, so people could
do the installation over SSH:


Nowadays, the installer gives a nice and quick way to deploy a minimal
system, and I suspect the SSH method has fallen out of favor.

> What about the following "fix"?

[...]

> (list (shepherd-service
> (documentation "OpenSSH server.")
> - (requirement '(syslogd loopback))
> + (requirement '(syslogd networking loopback))

If it works for you, let's do this. It would be good to find the
underlying cause though...

Not sure what to do about the installer however: perhaps create
yet-another undocumented field of openssh-service-type that makes the
networking requirement optional?
Leo Famulari wrote on 3 Dec 2020 21:38
Re: bug#37309: ‘ssh-daemon’ service fails to start at boot
(name . Marius Bakke)(address . marius@gnu.org)
X8lM42LVEYWePEdJ@jasmine.lan
On Sat, Nov 28, 2020 at 02:08:34AM +0100, Marius Bakke wrote:
> Christopher Lemmer Webber <cwebber@dustycloud.org> writes:
> > I'm assuming others must be experiencing this still too... right?
>
> FWIW I have never encountered this. :-/

I reenabled IPv6 listening for sshd after updating to 1.2.0 and things
are working for now. The problem has always been intermittent for me in
the past.

Chris, are you using an old Thinkpad too?
Christopher Lemmer Webber wrote on 3 Dec 2020 22:56
Re: bug#37309: ‘ssh-daemon’ service fails to start at boot
(name . Leo Famulari)(address . leo@famulari.name)
87pn3qzh7r.fsf@dustycloud.org
Leo Famulari writes:

> On Sat, Nov 28, 2020 at 02:08:34AM +0100, Marius Bakke wrote:
>> Christopher Lemmer Webber <cwebber@dustycloud.org> writes:
>> > I'm assuming others must be experiencing this still too... right?
>>
>> FWIW I have never encountered this. :-/
>
> I reenabled IPv6 listening for sshd after updating to 1.2.0 and things
> are working for now. The problem has always been intermittent for me in
> the past.
>
> Chris, are you using an old Thinkpad too?

I did experience it on an old thinkpad, though in this case it's
happening on the Linode server I'm running. Not particularly old, but
probably shared by many users and thus slower in some way.

That's part of what makes me think this is some kind of race
condition...

This issue is archived.

To comment on this conversation send email to 37309@debbugs.gnu.org