start shepherd when a previous instance was killed by kill -9

Status: Open
Participants: Danny Milosavljevic, gfleury
Owner: unassigned
Submitted by: gfleury
Severity: normal
gfleury wrote on 27 Sep 2020 10:00
start shepherd when a previous instance was killed by kill -9
(address . bug-guix@gnu.org)
87k0wfejjw.fsf@disroot.org
Hi,

When shepherd is killed with `pkill -9 shepherd`, it leaves
`default-socket-file` behind. If it is then restarted without first
removing the socket, for example with
---------------------------------------------------------
rm /var/run/user/1000/shepherd/socket
---------------------------------------------------------

it throws an error:
---------------------------------------------------------
3 (primitive-load "/home/gfleury/prod/shepherd/./shepherd")
In shepherd.scm:
56:14 2 (main . _)
49:6 1 (open-server-socket _)
In unknown file:
0 (bind #<input-output: socket 16> #(1 "/run/user/1000?") #)

ERROR: In procedure bind:
In procedure bind: Address already in use
---------------------------------------------------------
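
The failure does not depend on Shepherd itself. The following is a minimal Python sketch, not Shepherd's code, that mimics the situation under the assumption that a killed server simply never unlinks its socket file. The helper names `bind_unix` and `reproduce` and the temporary path are made up for illustration.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path):
    """Create a UNIX stream socket bound to PATH (hypothetical helper)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


def reproduce():
    """Return the errno raised when binding to a leftover socket file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "socket")
        first = bind_unix(path)
        first.listen(1)
        # Closing without os.unlink(path) mimics a process killed by
        # SIGKILL: the descriptor goes away, the file stays on disk.
        first.close()
        try:
            bind_unix(path).close()
        except OSError as err:
            return err.errno
        return None


if __name__ == "__main__":
    code = reproduce()
    print(errno.errorcode.get(code, code))
```

The second `bind` fails even though no process is listening any more, which matches the "Address already in use" error in the backtrace above.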

Something like this patch can fix it.
From 7d16c47bad6fd98cf0838d2fcd62735d846e7bab Mon Sep 17 00:00:00 2001
From: gfleury <gfleury@disroot.org>
Date: Sun, 27 Sep 2020 09:29:37 +0200
Subject: [PATCH] ensure that `default-socket-file` is not present.

* modules/shepherd.scm(main): remove a possible `default-socket-file`
left by a previous instance.
---
modules/shepherd.scm | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 9f80f62..d18567e 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -147,7 +147,10 @@ already ~a threads running, disabling 'signalfd' support")
   (initialize-cli)
   (let ((config-file #f)
-        (socket-file default-socket-file)
+        (socket-file
+         (begin
+           (false-if-exception (delete-file default-socket-file))
+           default-socket-file))
         (pid-file #f)
         (secure #t)
         (logfile #f))
--
2.28.0
Danny Milosavljevic wrote on 27 Sep 2020 16:19
Re: bug#43643: start shepherd when a previous instance was killed by kill -9
(name . gfleury)(address . gfleury@disroot.org)(address . 43643@debbugs.gnu.org)
20200927161906.399fe259@scratchpost.org
Hello,

On Sun, 27 Sep 2020 10:00:03 +0200
gfleury <gfleury@disroot.org> wrote:

> it throws an error:
> ---------------------------------------------------------
> 3 (primitive-load "/home/gfleury/prod/shepherd/./shepherd")
> In shepherd.scm:
> 56:14 2 (main . _)
> 49:6 1 (open-server-socket _)
> In unknown file:
> 0 (bind #<input-output: socket 16> #(1 "/run/user/1000?") #)
>
> ERROR: In procedure bind:
> In procedure bind: Address already in use
> ---------------------------------------------------------
>
> Something like this patch can fix it.

Please don't do it that way.

Shepherd has to be able to ascertain that it is not running yet before
starting yet another instance in parallel.

I don't like PID and socket files either--but it's just what we have
available.

Maybe find out who is at the other side of the socket
(connect and then use getpeername on the socket or something?
Maybe even just trying to connect fails, which would be good for this).

I think UNIX domain sockets are made in a way that it doesn't matter
whether the server or the client connects first, so even that would
probably not be reliable.

So maybe just live with having to remove the socket file yourself.

I'm open to other suggestions that are safe that accomplish the same goal.
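
The "just try to connect" idea can be sketched as follows. This is a hedged Python illustration, not Shepherd code; `socket_state` and `demo` are hypothetical names. It relies on the documented behavior that `connect` on a UNIX stream socket fails with ECONNREFUSED when the file exists but nothing is listening, and with ENOENT when the file is missing.

```python
import os
import socket
import tempfile


def socket_state(path):
    """Classify PATH as 'live', 'stale', or 'absent' by trying to connect."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except FileNotFoundError:
        return "absent"
    except ConnectionRefusedError:
        # The file exists, but no process is listening on it.
        return "stale"
    finally:
        probe.close()
    return "live"


def demo():
    """Walk one socket path through the three states and collect them."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "socket")
        states = [socket_state(path)]      # nothing created yet
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)
        states.append(socket_state(path))  # a server is listening
        server.close()                     # file is left behind
        states.append(socket_state(path))
        return states


if __name__ == "__main__":
    print(demo())
```

On Linux, `demo()` is expected to return `['absent', 'live', 'stale']`. Such a probe only narrows the problem: another instance could still start between the check and a later `bind`, so deleting a socket reported as stale is not race-free on its own.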
gfleury wrote on 27 Sep 2020 20:09
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)(address . 43643@debbugs.gnu.org)
b6da8dfecdc84d1a2de64a43f99cbeb8@disroot.org
Hello,

On 27 September 2020 at 16:29, "Danny Milosavljevic" <dannym@scratchpost.org> wrote:

> Hello,
>
> On Sun, 27 Sep 2020 10:00:03 +0200
> gfleury <gfleury@disroot.org> wrote:
>
>> it throws an error:
>> ---------------------------------------------------------
>> 3 (primitive-load "/home/gfleury/prod/shepherd/./shepherd")
>> In shepherd.scm:
>> 56:14 2 (main . _)
>> 49:6 1 (open-server-socket _)
>> In unknown file:
>> 0 (bind #<input-output: socket 16> #(1 "/run/user/1000?") #)
>>
>> ERROR: In procedure bind:
>> In procedure bind: Address already in use
>> ---------------------------------------------------------
>>
>> Something like this patch can fix it.
>
> Please don't do it that way.
>
> Shepherd has to be able to ascertain that it is not running yet before
> starting yet another instance in parallel.
>
I missed that part.

> I don't like PID and socket files either--but it's just what we have
> available.
>
> Maybe find out who is at the other side of the socket
> (connect and then use getpeername on the socket or something?
> Maybe even just trying to connect fails, which would be good for this).
>
> I think UNIX domain sockets are made in a way that it doesn't matter
> whether the server or the client connects first, so even that would
> probably not be reliable.
>
> So maybe just live with having to remove the socket file yourself.
>
> I'm open to other suggestions that are safe that accomplish the same goal.

Yes, a better solution is needed.