dockerd fails to start on boot

  • Done
Details
4 participants
  • Oleg Pykhalov
  • Luciano Laratelli
  • Ludovic Courtès
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Luciano Laratelli
Severity
normal
Merged with
Luciano Laratelli wrote on 13 Jun 2022 00:56
(address . bug-guix@gnu.org)
87h74pa3rp.fsf@kong.network
Attachment: file
Attachment: file
Maxim Cournoyer wrote on 24 Jun 2022 07:08
control message for bug #55936
(address . control@debbugs.gnu.org)
87wnd6zm08.fsf@gmail.com
merge 55936 38432
quit
Maxim Cournoyer wrote on 24 Jun 2022 07:11
Re: bug#55936: dockerd fails to start on boot
(name . Luciano Laratelli)(address . luciano@laratel.li)(address . 55936@debbugs.gnu.org)
87sfnuzluq.fsf@gmail.com
Hello,

Luciano Laratelli <luciano@laratel.li> writes:

> Hi, hope you are doing well.
>
> I’m running Guix System and am seeing that `dockerd' fails to start on boot due to not being able to find `containerd':
>
> $ sudo tail /var/log/docker.log
> 2022-06-12 18:25:29 time="2022-06-12T18:25:29.969005384-04:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 062e6856b7776daf35f1d570dc7e055d3c0f3eefc0f58c5e279eba20035c8e9e eb10082295c7a53d882e36d93a8b5eb20e980a5950c4a67fa03444274448b232], retrying...."
> 2022-06-12 18:25:30 time="2022-06-12T18:25:30.068910364-04:00" level=info msg="Removing stale sandbox e35667a7ef1441bced213cf035efc9d6c71a0dce7f8941e3fbb63f5a27265bca (91314e5594f72585f9df121ba16cc8d67c4e1fcb91fc3c7b9b0660aed1b3054a)"
> 2022-06-12 18:25:30 time="2022-06-12T18:25:30.080685302-04:00" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 062e6856b7776daf35f1d570dc7e055d3c0f3eefc0f58c5e279eba20035c8e9e 825f4a6f68b1b81b24b2edc0b382deca116e72a75e6207036f24e18ba6434c81], retrying...."
> 2022-06-12 18:25:30 time="2022-06-12T18:25:30.143624227-04:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
> 2022-06-12 18:25:31 time="2022-06-12T18:25:31.400700443-04:00" level=info msg="Loading containers: done."
> 2022-06-12 18:25:31 time="2022-06-12T18:25:31.689183684-04:00" level=info msg="Docker daemon" commit=v19.03.15 graphdriver(s)=overlay2 version=19.03.15-ce
> 2022-06-12 18:25:31 time="2022-06-12T18:25:31.691171101-04:00" level=info msg="Daemon has completed initialization"
> 2022-06-12 18:25:31 time="2022-06-12T18:25:31.961049886-04:00" level=info msg="API listen on /var/run/docker.sock"
> 2022-06-12 18:43:43 time="2022-06-12T18:43:43.503118343-04:00" level=info msg="Starting up"
> 2022-06-12 18:43:43 failed to start containerd: exec: "containerd": executable file not found in $PATH

It seems there's a race condition between containerd and docker (the
latter starts before the former has finished launching, fails to see it,
and aborts, if I understand correctly). We should see if we can migrate
the dockerd-service-type to use the newly introduced systemd-style
constructor.
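
The race described above can be illustrated with a dependent process that probes a Unix socket before its peer has started listening. The following is a minimal sketch in Go, not code from dockerd or containerd: `waitForSocket` and `demoWait` are hypothetical names, the socket lives in a throwaway temporary directory, and a goroutine with an artificial delay stands in for the slow daemon.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// waitForSocket polls a Unix socket until a connection succeeds or the
// timeout expires. Hypothetical helper, not part of dockerd or containerd.
func waitForSocket(path string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("unix", path, 100*time.Millisecond)
		if err == nil {
			return conn.Close()
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("socket %s not ready after %v: %w", path, timeout, err)
		}
		time.Sleep(20 * time.Millisecond)
	}
}

// demoWait starts a stand-in "daemon" that only begins listening after a
// delay, and reports whether a client could reach it when probing
// immediately (readyEarly) and when waiting with a timeout (readyLate).
func demoWait() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "race-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "daemon.sock")

	lnCh := make(chan net.Listener, 1)
	go func() {
		time.Sleep(200 * time.Millisecond) // simulated slow start-up
		ln, lerr := net.Listen("unix", sock)
		if lerr != nil {
			lnCh <- nil
			return
		}
		lnCh <- ln
	}()

	// An immediate probe loses the race: the socket does not exist yet.
	readyEarly := waitForSocket(sock, 0) == nil
	// Waiting with a timeout succeeds once the daemon is listening.
	readyLate := waitForSocket(sock, 5*time.Second) == nil

	if ln := <-lnCh; ln != nil {
		ln.Close()
	}
	return readyEarly, readyLate, nil
}

func main() {
	early, late, err := demoWait()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("reachable immediately:", early)
	fmt.Println("reachable after waiting:", late)
}
```

Polling is only one way to close such a window; the thread goes on to discuss PID files and readiness notification, which let the service manager do the ordering instead of the client.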

Thanks,

Maxim
Oleg Pykhalov wrote on 2 Jul 2022 12:41
[PATCH] services: docker: Fix race condition.
(address . 55936@debbugs.gnu.org)
20220702104106.16997-1-go.wigust@gmail.com

* gnu/packages/patches/containerd-create-pid-file.patch: New file.
* gnu/local.mk (dist_patch_DATA): Add this.
* gnu/packages/docker.scm (containerd)[source]: Add this patch.
* gnu/services/docker.scm
(containerd-shepherd-service): Add #:pid-file and #:pid-file-timeout.
* gnu/services/docker.scm (docker-shepherd-service): Add --containerd flag.
---
gnu/local.mk | 3 +-
gnu/packages/docker.scm | 6 ++--
.../patches/containerd-create-pid-file.patch | 31 +++++++++++++++++++
gnu/services/docker.scm | 5 ++-
4 files changed, 41 insertions(+), 4 deletions(-)
create mode 100644 gnu/packages/patches/containerd-create-pid-file.patch

diff --git a/gnu/local.mk b/gnu/local.mk
index 3a56ad371d..5cd235286c 100644
--- a/gnu/local.mk
+++ b/gnu/local.mk
@@ -17,7 +17,7 @@
# Copyright © 2017, 2020 Mathieu Othacehe <m.othacehe@gmail.com>
# Copyright © 2017, 2018, 2019 Gábor Boskovits <boskovits@gmail.com>
# Copyright © 2018 Amirouche Boubekki <amirouche@hypermove.net>
-# Copyright © 2018, 2019, 2020, 2021 Oleg Pykhalov <go.wigust@gmail.com>
+# Copyright © 2018, 2019, 2020, 2021, 2022 Oleg Pykhalov <go.wigust@gmail.com>
# Copyright © 2018 Stefan Stefanović <stefanx2ovic@gmail.com>
# Copyright © 2018, 2020, 2021, 2022 Maxim Cournoyer <maxim.cournoyer@gmail.com>
# Copyright © 2019, 2020, 2021, 2022 Guillaume Le Vaillant <glv@posteo.net>
@@ -965,6 +965,7 @@ dist_patch_DATA = \
%D%/packages/patches/cmh-support-fplll.patch \
%D%/packages/patches/coda-use-system-libs.patch \
%D%/packages/patches/collectd-5.11.0-noinstallvar.patch \
+ %D%/packages/patches/containerd-create-pid-file.patch \
%D%/packages/patches/combinatorial-blas-awpm.patch \
%D%/packages/patches/combinatorial-blas-io-fix.patch \
%D%/packages/patches/cool-retro-term-wctype.patch \
diff --git a/gnu/packages/docker.scm b/gnu/packages/docker.scm
index ae4ee419af..184280b38f 100644
--- a/gnu/packages/docker.scm
+++ b/gnu/packages/docker.scm
@@ -6,7 +6,7 @@
;;; Copyright © 2020 Michael Rohleder <mike@rohleder.de>
;;; Copyright © 2020 Katherine Cox-Buday <cox.katherine.e@gmail.com>
;;; Copyright © 2020 Jesse Dowell <jessedowell@gmail.com>
-;;; Copyright © 2021 Oleg Pykhalov <go.wigust@gmail.com>
+;;; Copyright © 2021, 2022 Oleg Pykhalov <go.wigust@gmail.com>
;;; Copyright © 2022 Pierre Langlois <pierre.langlois@gmx.com>
;;;
;;; This file is part of GNU Guix.
@@ -184,7 +184,9 @@ (define-public containerd
(commit (string-append "v" version))))
(file-name (git-file-name name version))
(sha256
- (base32 "1vsl747i3wyy68j4lp4nprwxadbyga8qxlrk892afcd2990zp5mr"))))
+ (base32 "1vsl747i3wyy68j4lp4nprwxadbyga8qxlrk892afcd2990zp5mr"))
+ (patches
+ (search-patches "containerd-create-pid-file.patch"))))
(build-system go-build-system)
(arguments
(let ((make-flags #~(list (string-append "VERSION=" #$version)
diff --git a/gnu/packages/patches/containerd-create-pid-file.patch b/gnu/packages/patches/containerd-create-pid-file.patch
new file mode 100644
index 0000000000..668ffcd9e9
--- /dev/null
+++ b/gnu/packages/patches/containerd-create-pid-file.patch
@@ -0,0 +1,31 @@
+Copyright © 2022 Oleg Pykhalov <go.wigust@gmail.com>
+
+Create a PID file after containerd is ready to serve requests.
+
+Fixes <https://issues.guix.gnu.org/38432>.
+
+--- a/cmd/containerd/command/notify_linux.go 1970-01-01 03:00:01.000000000 +0300
++++ b/cmd/containerd/command/notify_linux.go 2022-07-02 04:42:35.553753495 +0300
+@@ -22,15 +22,22 @@
+ sd "github.com/coreos/go-systemd/v22/daemon"
+
+ "github.com/containerd/containerd/log"
++
++ "os"
++ "strconv"
+ )
+
+ // notifyReady notifies systemd that the daemon is ready to serve requests
+ func notifyReady(ctx context.Context) error {
++ pidFile, _ := os.Create("/run/containerd/containerd.pid")
++ defer pidFile.Close()
++ pidFile.WriteString(strconv.FormatInt(int64(os.Getpid()), 10))
+ return sdNotify(ctx, sd.SdNotifyReady)
+ }
+
+ // notifyStopping notifies systemd that the daemon is about to be stopped
+ func notifyStopping(ctx context.Context) error {
++ os.Remove("/run/containerd/containerd.pid")
+ return sdNotify(ctx, sd.SdNotifyStopping)
+ }
+
diff --git a/gnu/services/docker.scm b/gnu/services/docker.scm
index 846ebe8334..741bab5a8c 100644
--- a/gnu/services/docker.scm
+++ b/gnu/services/docker.scm
@@ -98,6 +98,8 @@ (define (containerd-shepherd-service config)
;; For finding containerd-shim binary.
#:environment-variables
(list (string-append "PATH=" #$containerd "/bin"))
+ #:pid-file "/run/containerd/containerd.pid"
+ #:pid-file-timeout 300
#:log-file "/var/log/containerd.log"))
(stop #~(make-kill-destructor)))))
@@ -135,7 +137,8 @@ (define (docker-shepherd-service config)
'("--userland-proxy=false"))
(if #$enable-iptables?
"--iptables"
- "--iptables=false"))
+ "--iptables=false")
+ "--containerd" "/run/containerd/containerd.sock")
#:environment-variables
(list #$@environment-variables)
#:pid-file "/var/run/docker.pid"
--
2.36.0
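
The patch above discards the error returned by `os.Create`. As a point of comparison only, here is a minimal sketch in Go of a more defensive PID-file write that reports failures and renames a temporary file into place, so that a reader never observes a half-written file. This is not the code that was pushed: `writePIDFile`, `readPIDFile` and `demoPIDFile` are hypothetical names, and the demo writes to a temporary directory rather than /run/containerd.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// writePIDFile writes the current process ID to path. It writes to a
// temporary file in the same directory and renames it into place, so the
// file at path is either absent or complete. Hypothetical helper.
func writePIDFile(path string) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".pid-*")
	if err != nil {
		return fmt.Errorf("create temporary PID file: %w", err)
	}
	// Clean up on failure; harmless once the rename has succeeded.
	defer os.Remove(tmp.Name())

	if _, err := tmp.WriteString(strconv.Itoa(os.Getpid())); err != nil {
		tmp.Close()
		return fmt.Errorf("write PID: %w", err)
	}
	// CreateTemp uses mode 0600; make the file readable by other users.
	if err := tmp.Chmod(0o644); err != nil {
		tmp.Close()
		return fmt.Errorf("chmod PID file: %w", err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("close PID file: %w", err)
	}
	return os.Rename(tmp.Name(), path)
}

// readPIDFile parses the PID stored at path.
func readPIDFile(path string) (int, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(data)))
}

// demoPIDFile writes a PID file in a temporary directory and returns the
// PID read back from it together with the actual PID of this process.
func demoPIDFile() (int, int, error) {
	dir, err := os.MkdirTemp("", "pid-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "daemon.pid")
	if err := writePIDFile(path); err != nil {
		return 0, 0, err
	}
	written, err := readPIDFile(path)
	return written, os.Getpid(), err
}

func main() {
	written, actual, err := demoPIDFile()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("PID in file:", written, "actual PID:", actual)
}
```

The rename step matters when a service manager polls for the file, as Shepherd's #:pid-file option does in the patch: the file only appears once its content is complete.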
Oleg Pykhalov wrote on 2 Jul 2022 13:39
control message for bug #55936
(address . control@debbugs.gnu.org)
62c02e8f.1c69fb81.cca20.7706@mx.google.com
merge 55936 38432
quit
Maxim Cournoyer wrote on 10 Jul 2022 07:10
Re: [PATCH] services: docker: Fix race condition.
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 55936@debbugs.gnu.org)
878rp1ft82.fsf@gmail.com
Hi Oleg,

Oleg Pykhalov <go.wigust@gmail.com> writes:

>
> * gnu/packages/patches/containerd-create-pid-file.patch: New file.
> * gnu/local.mk (dist_patch_DATA): Add this.
> * gnu/packages/docker.scm (containerd)[source]: Add this patch.
> * gnu/services/docker.scm
> (containerd-shepherd-service): Add #:pid-file and #:pid-file-timeout.
> * gnu/services/docker.scm (docker-shepherd-service): Add --containerd flag.

Thanks for this, it looks promising!

Before we go forward though, have you considered using
'make-systemd-constructor', now available in Shepherd 0.9+? I
remember Docker supports systemd socket activation for synchronizing its
services; it could be a simpler, no-code solution.

Would you like to give it a try?

Thanks,

Maxim
Maxim Cournoyer wrote on 13 Jul 2022 23:06
Re: bug#55936: dockerd fails to start on boot
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 55936@debbugs.gnu.org)
87wncgyb67.fsf_-_@gmail.com
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Oleg,
>
> Oleg Pykhalov <go.wigust@gmail.com> writes:
>
>> Fixes <https://issues.guix.gnu.org/38432>.
>>
>> * gnu/packages/patches/containerd-create-pid-file.patch: New file.
>> * gnu/local.mk (dist_patch_DATA): Add this.
>> * gnu/packages/docker.scm (containerd)[source]: Add this patch.
>> * gnu/services/docker.scm
>> (containerd-shepherd-service): Add #:pid-file and #:pid-file-timeout.
>> * gnu/services/docker.scm (docker-shepherd-service): Add --containerd flag.
>
> Thanks for this, it looks promising!
>
> Before we go forward though, have you considered using
> 'make-systemd-constructor', now available in Shepherd 0.9+? I
> remember Docker supports systemd socket activation for synchronizing its
> services; it could be a simpler, no-code solution.

I've researched more on the topic, and it appears what I had in mind is
systemd's socket *notification* (what they call 'sdNotify') rather than
activation. Activation just starts things lazily... it probably wouldn't
help here; rather, it seems it'd be a bad idea, as realized elsewhere [0].

[0] https://github.com/containerd/containerd/issues/164#issuecomment-657536515

All that to say that I shall be reviewing your patches shortly :-).

Thank you,

Maxim
Maxim Cournoyer wrote on 14 Jul 2022 03:40
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 55936-done@debbugs.gnu.org)
877d4g1nfe.fsf@gmail.com
Hi Oleg,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hi Oleg,
>>
>> Oleg Pykhalov <go.wigust@gmail.com> writes:
>>
>>> Fixes <https://issues.guix.gnu.org/38432>.
>>>
>>> * gnu/packages/patches/containerd-create-pid-file.patch: New file.
>>> * gnu/local.mk (dist_patch_DATA): Add this.
>>> * gnu/packages/docker.scm (containerd)[source]: Add this patch.
>>> * gnu/services/docker.scm
>>> (containerd-shepherd-service): Add #:pid-file and #:pid-file-timeout.
>>> * gnu/services/docker.scm (docker-shepherd-service): Add --containerd flag.
>>
>> Thanks for this, it looks promising!

[...]

> All that to say that I shall be reviewing your patches shortly :-).

Now done; it all looks good to me! I've run the docker system test,
installed it on my machine, rebooted, confirmed it was up, restarted
containerd a couple of times and checked that the PID file content
matched its actual PID, and it seems to behave as expected!

Pushed as b33e1a183f6756514e6b6a3b84054a232dbddad4.

Thank you!

Maxim
Closed
Ludovic Courtès wrote on 15 Jul 2022 12:20
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87k08e8yo1.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I've researched more on the topic, and it appears what I had in mind is
> rather systemd's socket *notification* (what they call 'sdNotify')
> rather than activation. Activation is just to lazily start things... it
> probably wouldn't help here, rather it seems it'd be a bad idea, as
> realized elsewhere [0].
>
> [0] https://github.com/containerd/containerd/issues/164#issuecomment-657536515

Currently the Shepherd implements activation as lazy start, but we
should add an option for “eager socket activation” where the daemon is
started right away.

Such activation is still useful as a synchronization mechanism: you can
tell the service is ready to serve requests as soon as the socket has
been created.
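
The synchronization property described above can be shown inside a single process: once the listening socket exists, a client can connect even though the daemon has not yet called accept, because the kernel queues the connection. The following is a minimal sketch in Go under stated assumptions. It is not Shepherd's or systemd's implementation: in real socket activation the service manager passes the listening socket to the daemon as an inherited file descriptor, whereas here a goroutine simply takes over a listener created earlier, and `demoActivation` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// demoActivation returns the daemon's reply and whether the client had
// connected before the simulated daemon started accepting connections.
func demoActivation() (string, bool, error) {
	dir, err := os.MkdirTemp("", "activation-demo")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "service.sock")

	// The "service manager" creates the listening socket up front.
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return "", false, err
	}
	defer ln.Close()

	// The "daemon" starts slowly and only then begins accepting.
	daemonStarted := make(chan struct{})
	go func() {
		time.Sleep(200 * time.Millisecond) // simulated slow start-up
		close(daemonStarted)
		conn, aerr := ln.Accept()
		if aerr != nil {
			return
		}
		defer conn.Close()
		fmt.Fprintln(conn, "ready")
	}()

	// The client connects right away; the kernel queues the connection
	// until the daemon accepts it.
	conn, err := net.Dial("unix", sock)
	if err != nil {
		return "", false, err
	}
	defer conn.Close()

	connectedBeforeAccept := true
	select {
	case <-daemonStarted:
		connectedBeforeAccept = false
	default:
	}

	if err := conn.SetReadDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", connectedBeforeAccept, err
	}
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", connectedBeforeAccept, err
	}
	return strings.TrimSpace(line), connectedBeforeAccept, nil
}

func main() {
	reply, early, err := demoActivation()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("connected before accept:", early)
	fmt.Println("daemon replied:", reply)
}
```

The client never has to retry: its request simply waits in the queue until the daemon is ready to serve it.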

Thanks,
Ludo’.
Maxim Cournoyer wrote on 16 Jul 2022 03:55
(name . Ludovic Courtès)(address . ludo@gnu.org)
87pmi5stxf.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> I've researched more on the topic, and it appears what I had in mind is
>> rather systemd's socket *notification* (what they call 'sdNotify')
>> rather than activation. Activation is just to lazily start things... it
>> probably wouldn't help here, rather it seems it'd be a bad idea, as
>> realized elsewhere [0].
>>
>> [0] https://github.com/containerd/containerd/issues/164#issuecomment-657536515
>
> Currently the Shepherd implements activation as lazy start, but we
> should add an option for “eager socket activation” where the daemon is
> started right away.
>
> Such activation is still useful as a synchronization mechanism: you can
> tell the service is ready to serve requests as soon as the socket has
> been created.

But this relies on the application behaving that way (e.g., waiting for
the socket to be opened, rather than expecting things to be ready and
failing), right?

If I understand correctly, the sdNotify mechanism in systemd is a means
that lets the application notify systemd when it is ready, so that
systemd itself can enforce the ordering relationships. So on systemd,
containerd would be marked as 'starting' until it notifies systemd that
it is ready via sdNotify, and docker.service would wait until containerd
has started, since it is ordered to start after it [0].
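
For illustration, the readiness notification described above boils down to the daemon sending the datagram `READY=1` to the Unix datagram socket named by the `NOTIFY_SOCKET` environment variable. The following is a minimal sketch in Go of both ends. It is not containerd's code, which calls the go-systemd library as seen in the patch: `notifyReady` and `demoNotify` are hypothetical names, and a socket in a temporary directory stands in for the supervisor.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// notifyReady sends "READY=1" to the socket named by NOTIFY_SOCKET.
// It returns false without error when the variable is unset, that is,
// when the process is not running under a notifying supervisor.
func notifyReady() (bool, error) {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return false, nil
	}
	addr := &net.UnixAddr{Name: path, Net: "unixgram"}
	conn, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte("READY=1")); err != nil {
		return false, err
	}
	return true, nil
}

// demoNotify plays the supervisor: it creates the notification socket,
// exports its path, lets the "daemon" notify, and returns the message
// that arrived.
func demoNotify() (string, error) {
	dir, err := os.MkdirTemp("", "notify-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "notify.sock")

	ln, err := net.ListenUnixgram("unixgram", &net.UnixAddr{Name: sock, Net: "unixgram"})
	if err != nil {
		return "", err
	}
	defer ln.Close()

	if err := os.Setenv("NOTIFY_SOCKET", sock); err != nil {
		return "", err
	}
	defer os.Unsetenv("NOTIFY_SOCKET")

	sent, err := notifyReady()
	if err != nil {
		return "", err
	}
	if !sent {
		return "", fmt.Errorf("notification was not sent")
	}

	// The supervisor would mark the service as started at this point.
	if err := ln.SetReadDeadline(time.Now().Add(5 * time.Second)); err != nil {
		return "", err
	}
	buf := make([]byte, 256)
	n, _, err := ln.ReadFromUnix(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	msg, err := demoNotify()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("supervisor received:", msg)
}
```

A supervisor that holds dependent services back until this message arrives gets the ordering without any polling on the client side.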


Thanks,

Maxim