Shepherd Sometimes Crashes

  • Done
Details
5 participants
  • Katherine Cox-Buday
  • Efraim Flashner
  • Ludovic Courtès
  • Mathieu Othacehe
  • Mathieu Othacehe
Owner
unassigned
Submitted by
Katherine Cox-Buday
Severity
important
Merged with
Katherine Cox-Buday wrote on 21 May 2020 04:59
(address . bug-guix@gnu.org)
87d06yc7t4.fsf@gmail.com
I am running shepherd as a userspace service manager on an alien distro.
Occasionally (often enough to cause concern), Shepherd is crashing.
I am unable to narrow down a cause, but anecdotally, it seems to happen
more often when a service it's managing fails repeatedly and is
disabled.

I'm running `strace` against the Shepherd process in an attempt to
submit a better bug report, but this is all I have for now. Maybe others
have also seen this behavior.

--
Katherine
Efraim Flashner wrote on 21 May 2020 14:14
(name . Katherine Cox-Buday)(address . cox.katherine.e@gmail.com)(address . 41429@debbugs.gnu.org)
20200521121443.GC958@E5400
On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
> I am running shepherd as a userspace service manager on an alien distro.
> Occasionally (often enough as to cause concern), Shepherd is crashing.
> I am unable to narrow down a cause, but anecdotally, it seems to happen
> more often when a service it's managing fails repeatedly and is
> disabled.
>
> I'm running `strace` against the Shepherd process in an attempt to
> submit a better bug report, but this is all I have for now. Maybe others
> have also seen this behavior.

I found it happens less often with shepherd-0.8. What version are you
running? Also possibly related, do you have mismatched versions of guile
between guix packages and your distro's native packages?

I've also sometimes found shepherd to crash when I add a service where
the start command is "wrong", as though the error were so bad that
shepherd says "Nope! That's it! I quit!"

I'd suggest looking at .config/shepherd/shepherd.log but it's rather
sparse. Still, it might have something useful.

--
Efraim Flashner <efraim@flashner.co.il>
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted


Katherine Cox-Buday wrote on 21 May 2020 14:51
(name . Efraim Flashner)(address . efraim@flashner.co.il)(address . 41429@debbugs.gnu.org)
87sgftbgd1.fsf@gmail.com
Efraim Flashner <efraim@flashner.co.il> writes:

> On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
>> I am running shepherd as a userspace service manager on an alien distro.
>> Occasionally (often enough as to cause concern), Shepherd is crashing.
>> I am unable to narrow down a cause, but anecdotally, it seems to happen
>> more often when a service it's managing fails repeatedly and is
>> disabled.
>>
>> I'm running `strace` against the Shepherd process in an attempt to
>> submit a better bug report, but this is all I have for now. Maybe others
>> have also seen this behavior.
>
> I found it happens less often with shepherd-0.8. What version are you
> running? Also possibly related, do you have mismatched versions of guile
> between guix packages and your distro's native packages?

Sorry, I forgot to include the version! I am running 0.8 from a store
which I update about once a week.

> I've also sometimes found shepherd to crash when I add a service where
> the start command is "wrong", as though the error were so bad that
> shepherd says "Nope! That's it! I quit!"

I'm doing very standard things with `make-forkexec-constructor`, so I
wouldn't expect any problems there.

Your comment is kind of scary though! Shepherd is the thing I want to
stay up no matter what since it's responsible for monitoring and
restarting things. The idea that a misbehaving or poorly written service
could bring down the entire Shepherd process is a problem! Is there no
isolation?

> I'd suggest looking at .config/shepherd/shepherd.log but it's rather
> sparse. Still, it might have something useful.

Yes, this is the first place I looked, but unfortunately there wasn't
much usable information.

--
Katherine
Efraim Flashner wrote on 21 May 2020 16:04
(name . Katherine Cox-Buday)(address . cox.katherine.e@gmail.com)(address . 41429@debbugs.gnu.org)
20200521140442.GF958@E5400
On Thu, May 21, 2020 at 07:51:54AM -0500, Katherine Cox-Buday wrote:
> Efraim Flashner <efraim@flashner.co.il> writes:
>
> > On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
> >> I am running shepherd as a userspace service manager on an alien distro.
> >> Occasionally (often enough as to cause concern), Shepherd is crashing.
> >> I am unable to narrow down a cause, but anecdotally, it seems to happen
> >> more often when a service it's managing fails repeatedly and is
> >> disabled.
> >>
> >> I'm running `strace` against the Shepherd process in an attempt to
> >> submit a better bug report, but this is all I have for now. Maybe others
> >> have also seen this behavior.
> >
> > I found it happens less often with shepherd-0.8. What version are you
> > running? Also possibly related, do you have mismatched versions of guile
> > between guix packages and your distro's native packages?
>
> Sorry, I forgot to include the version! I am running 0.8 from a store
> which I update ~1 week.
>
> > I've also sometimes found shepherd to crash when I add a service where
> > the start command is "wrong", as though the error were so bad that
> > shepherd says "Nope! That's it! I quit!"
>
> I'm doing very standard things with `make-forkexec-constructor`, so I
> wouldn't expect any problems there.
>
> Your comment is kind of scary though! Shepherd is the thing I want to
> stay up no matter what since it's responsible for monitoring and
> restarting things. The idea that a misbehaving or poorly written service
> could bring down the entire Shepherd process is a problem! Is there no
> isolation?

I have a whole collection of attempts to integrate mcron with shepherd,
to create loops and add jobs only when the service is active: attempting
to fork off, then collect the child process, and then fail just enough
to make the service restart. Lots of cringe-worthy code. The more common
failure scenarios I see are that shepherd fails to start because it
doesn't like the start code of one of my services, or that actually
starting the service somehow kills it. All of those were with straight
lambdas as the start command though.

Do you have your services writing out any logs? Maybe there's a clue
there.

> > I'd suggest looking at .config/shepherd/shepherd.log but it's rather
> > sparse. Still, it might have something useful.
>
> Yes, this is the first place I looked, but unfortunately there wasn't
> much usable information.
>
> --
> Katherine

--
Efraim Flashner <efraim@flashner.co.il>
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted


Katherine Cox-Buday wrote on 21 May 2020 17:59
(name . Efraim Flashner)(address . efraim@flashner.co.il)(address . 41429@debbugs.gnu.org)
87k115b7o0.fsf@gmail.com
Efraim Flashner <efraim@flashner.co.il> writes:

>> Your comment is kind of scary though! Shepherd is the thing I want to
>> stay up no matter what since it's responsible for monitoring and
>> restarting things. The idea that a misbehaving or poorly written service
>> could bring down the entire Shepherd process is a problem! Is there no
>> isolation?
>
> I have a whole collection of attempts to integrate mcron with shepherd,
> to create loops and add jobs only when the service is active. Attempting
> to fork off and then collect the child process and then fail just enough
> to make the service restart. Lots of cringe-worthy code. The more common
> fail scenarios I see are shepherd fails to start because it doesn't like
> my start code of one of the services or actually starting the service
> somehow kills it. All of those were with straight lambdas to the start
> command though.

I'm not familiar with Shepherd's internals, so I don't know why
interacting with a cron is relevant.

> Do you have your services writing out any logs? Maybe there's a clue
> there.

Not yet, but I should be enabling this soon, and if they display
anything I'll report back.

Still, this seems beside the point: the bug is that Shepherd needs to
stay up regardless of what the services it's monitoring do.

--
Katherine
Mathieu Othacehe wrote on 22 May 2020 19:39
(name . Katherine Cox-Buday)(address . cox.katherine.e@gmail.com)(address . 41429@debbugs.gnu.org)
877dx3vphe.fsf@gnu.org
Hello Katherine,

> I'm running `strace` against the Shepherd process in an attempt to
> submit a better bug report, but this is all I have for now. Maybe others
> have also seen this behavior.

Yes, I have observed this behavior. This should be fixed with the
upcoming 0.8.1 release of Shepherd (hopefully!).


Thanks for reporting,

Mathieu
Ludovic Courtès wrote on 22 May 2020 22:15
control message for bug #41429
(address . control@debbugs.gnu.org)
87y2pj915p.fsf@gnu.org
severity 41429 important
quit
Ludovic Courtès wrote on 22 May 2020 22:15
control message for bug #40981
(address . control@debbugs.gnu.org)
87wo539159.fsf@gnu.org
merge 40981 41429
quit
Mathieu Othacehe wrote on 20 Jun 2020 11:54
(address . control@debbugs.gnu.org)
877dw25age.fsf@meru.i-did-not-set--mail-host-address--so-tickle-me
close 40981
quit