Shepherd Sometimes Crashes

Open
Submitted by Katherine Cox-Buday.
Details
4 participants
  • Katherine Cox-Buday
  • Efraim Flashner
  • Ludovic Courtès
  • Mathieu Othacehe
Owner
unassigned
Severity
important
Merged with
Katherine Cox-Buday wrote on 21 May 04:59 +0200
(address . bug-guix@gnu.org)
87d06yc7t4.fsf@gmail.com
I am running shepherd as a userspace service manager on an alien distro. Occasionally (often enough as to cause concern), Shepherd is crashing. I am unable to narrow down a cause, but anecdotally, it seems to happen more often when a service it's managing fails repeatedly and is disabled.
I'm running `strace` against the Shepherd process in an attempt to submit a better bug report, but this is all I have for now. Maybe others have also seen this behavior.
-- Katherine
Efraim Flashner wrote on 21 May 14:14 +0200
(name . Katherine Cox-Buday)(address . cox.katherine.e@gmail.com)(address . 41429@debbugs.gnu.org)
20200521121443.GC958@E5400
On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
> I am running shepherd as a userspace service manager on an alien distro.
> Occasionally (often enough as to cause concern), Shepherd is crashing.
> I am unable to narrow down a cause, but anecdotally, it seems to happen
> more often when a service it's managing fails repeatedly and is
> disabled.
>
> I'm running `strace` against the Shepherd process in an attempt to
> submit a better bug report, but this is all I have for now. Maybe others
> have also seen this behavior.
I found it happens less often with shepherd-0.8. What version are you running? Also possibly related, do you have mismatched versions of guile between guix packages and your distro's native packages?
I've also sometimes found shepherd to crash when I add a service where the start command is "wrong", as though the error were so bad that shepherd says "Nope! That's it! I quit!"
I'd suggest looking at .config/shepherd/shepherd.log but it's rather sparse. Still, it might have something useful.
--
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

Katherine Cox-Buday wrote on 21 May 14:51 +0200
(name . Efraim Flashner)(address . efraim@flashner.co.il)(address . 41429@debbugs.gnu.org)
87sgftbgd1.fsf@gmail.com
Efraim Flashner <efraim@flashner.co.il> writes:
> On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
>> I am running shepherd as a userspace service manager on an alien distro.
>> Occasionally (often enough as to cause concern), Shepherd is crashing.
>> I am unable to narrow down a cause, but anecdotally, it seems to happen
>> more often when a service it's managing fails repeatedly and is
>> disabled.
>>
>> I'm running `strace` against the Shepherd process in an attempt to
>> submit a better bug report, but this is all I have for now. Maybe others
>> have also seen this behavior.
>
> I found it happens less often with shepherd-0.8. What version are you
> running? Also possibly related, do you have mismatched versions of guile
> between guix packages and your distro's native packages?
Sorry, I forgot to include the version! I am running 0.8 from a store which I update roughly once a week.
> I've also sometimes found shepherd to crash when I add a service where
> the start command is "wrong", as though the error were so bad that
> shepherd says "Nope! That's it! I quit!"
I'm doing very standard things with `make-forkexec-constructor`, so I wouldn't expect any problems there.
Your comment is kind of scary though! Shepherd is the thing I want to stay up no matter what since it's responsible for monitoring and restarting things. The idea that a misbehaving or poorly written service could bring down the entire Shepherd process is a problem! Is there no isolation?
> I'd suggest looking at .config/shepherd/shepherd.log but it's rather
> sparse. Still, it might have something useful.
Yes, this is the first place I looked, but unfortunately there wasn't much usable information.
-- Katherine
Efraim Flashner wrote on 21 May 16:04 +0200
(name . Katherine Cox-Buday)(address . cox.katherine.e@gmail.com)(address . 41429@debbugs.gnu.org)
20200521140442.GF958@E5400
On Thu, May 21, 2020 at 07:51:54AM -0500, Katherine Cox-Buday wrote:
> Efraim Flashner <efraim@flashner.co.il> writes:
>
> > On Wed, May 20, 2020 at 09:59:03PM -0500, Katherine Cox-Buday wrote:
> >> I am running shepherd as a userspace service manager on an alien distro.
> >> Occasionally (often enough as to cause concern), Shepherd is crashing.
> >> I am unable to narrow down a cause, but anecdotally, it seems to happen
> >> more often when a service it's managing fails repeatedly and is
> >> disabled.
> >>
> >> I'm running `strace` against the Shepherd process in an attempt to
> >> submit a better bug report, but this is all I have for now. Maybe others
> >> have also seen this behavior.
> >
> > I found it happens less often with shepherd-0.8. What version are you
> > running? Also possibly related, do you have mismatched versions of guile
> > between guix packages and your distro's native packages?
>
> Sorry, I forgot to include the version! I am running 0.8 from a store
> which I update ~1 week.
>
> > I've also sometimes found shepherd to crash when I add a service where
> > the start command is "wrong", as though the error were so bad that
> > shepherd says "Nope! That's it! I quit!"
>
> I'm doing very standard things with `make-forkexec-constructor`, so I
> wouldn't expect any problems there.
>
> Your comment is kind of scary though! Shepherd is the thing I want to
> stay up no matter what since it's responsible for monitoring and
> restarting things. The idea that a misbehaving or poorly written service
> could bring down the entire Shepherd process is a problem! Is there no
> isolation?
I have a whole collection of attempts to integrate mcron with shepherd, to create loops and add jobs only when the service is active: attempting to fork off, then collect the child process, and then fail just enough to make the service restart. Lots of cringe-worthy code. The more common failure scenarios I see are that shepherd fails to start because it doesn't like the start code of one of my services, or that actually starting the service somehow kills it. All of those were with straight lambdas as the start command, though.
Do you have your services writing out any logs? Maybe there's a clue there.
> > I'd suggest looking at .config/shepherd/shepherd.log but it's rather
> > sparse. Still, it might have something useful.
>
> Yes, this is the first place I looked, but unfortunately there wasn't
> much usable information.
>
> --
> Katherine
--
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

Katherine Cox-Buday wrote on 21 May 17:59 +0200
(name . Efraim Flashner)(address . efraim@flashner.co.il)(address . 41429@debbugs.gnu.org)
87k115b7o0.fsf@gmail.com
Efraim Flashner <efraim@flashner.co.il> writes:
>> Your comment is kind of scary though! Shepherd is the thing I want to
>> stay up no matter what since it's responsible for monitoring and
>> restarting things. The idea that a misbehaving or poorly written service
>> could bring down the entire Shepherd process is a problem! Is there no
>> isolation?
>
> I have a whole collection of attempts to integrate mcron with shepherd,
> to create loops and add jobs only when the service is active. Attempting
> to fork off and then collect the child process and then fail just enough
> to make the service restart. Lots of cringe-worthy code. The more common
> fail scenarios I see are shepherd fails to start because it doesn't like
> my start code of one of the services or actually starting the service
> somehow kills it. All of those were with straight lambdas to the start
> command though.
I'm not familiar with Shepherd's internals, so I don't know why interacting with a cron is relevant.
> Do you have your services writing out any logs? Maybe there's a clue
> there.
Not yet, but I should be enabling this soon, and if they display anything I'll report back.
Still, this seems beside the point: the bug is that Shepherd needs to stay up regardless of what the services it's monitoring do.
-- Katherine
Mathieu Othacehe wrote on 22 May 19:39 +0200
(name . Katherine Cox-Buday)(address . cox.katherine.e@gmail.com)(address . 41429@debbugs.gnu.org)
877dx3vphe.fsf@gnu.org
Hello Katherine,
> I'm running `strace` against the Shepherd process in an attempt to
> submit a better bug report, but this is all I have for now. Maybe others
> have also seen this behavior.
Yes, I have observed this behavior. This should be fixed with the upcoming 0.8.1 release of Shepherd (hopefully!).
See: https://lists.gnu.org/archive/html/bug-guix/2020-05/msg00241.html.
Thanks for reporting,
Mathieu
Ludovic Courtès wrote on 22 May 22:15 +0200
control message for bug #41429
(address . control@debbugs.gnu.org)
87y2pj915p.fsf@gnu.org
severity 41429 important
quit
Ludovic Courtès wrote on 22 May 22:15 +0200
control message for bug #40981
(address . control@debbugs.gnu.org)
87wo539159.fsf@gnu.org
merge 40981 41429
quit