shepherd: detailed output should be placed into well-known location and not tty

Open. Submitted by ng0.
Details
7 participants
  • Brice Waegeneire
  • Clément Lassieur
  • conjaroy
  • Ludovic Courtès
  • Mark H Weaver
  • ng0
  • Robert Vollmert
Owner
unassigned
Severity
important
Merged with
(address . bug-guix@gnu.org)
20180325183555.cilo6qyrj43jh6he@abyayala
Problem, not just when a service is misbehaving after successful system reconfigure:

$ sudo herd start smtpd
Password:
Service smtpd could not be started.
herd: failed to start service smtpd



This is on virtual terminal in X11, as well as in /var/log/messages,
/var/log/shepherd.log, etc.
This is not enough. If I wanted more info, I'd expect that
sudo herd status smtpd would give it (which it does not), so the only
reliable source of information so far is tty 1. Can we fix that in
one of the next shepherd releases? Or is this something we have to
fix in Guix?
Ludovic Courtès wrote on 26 Mar 2018 15:41
(name . ng0)(address . ng0@n0.is)(address . 30939@debbugs.gnu.org)
87a7uvdke0.fsf@gnu.org
Hi ng0,

ng0 <ng0@n0.is> skribis:

> Problem, not just when a service is misbehaving after successful system reconfigure:
>
> $ sudo herd start smtpd
> Password:
> Service smtpd could not be started.
> herd: failed to start service smtpd
>
>
>
> This is on virtual terminal in X11, as well as in /var/log/messages,
> /var/log/shepherd.log, etc.
> This is not enough. If I wanted more info, I'd expect that
> sudo herd status smtpd would give it (which it does not), so the only
> reliable source of information so far is tty 1. Can we fix that in
> one of the next shepherd releases? Or is this something we have to
> fix in Guix?

So you’re saying that you’d like shepherd to show more info as to why
the service could not be started, right?

Thanks,
Ludo’.
(name . Ludovic Courtès)(address . ludo@gnu.org)
20180326150859.6yf244bgjxi4oawt@abyayala
Hi Ludovic,

Ludovic Courtès transcribed 790 bytes:
> Hi ng0,
>
> ng0 <ng0@n0.is> skribis:
>
> > Problem, not just when a service is misbehaving after successful system reconfigure:
> >
> > $ sudo herd start smtpd
> > Password:
> > Service smtpd could not be started.
> > herd: failed to start service smtpd
> >
> >
> >
> > This is on virtual terminal in X11, as well as in /var/log/messages,
> > /var/log/shepherd.log, etc.
> > This is not enough. If I wanted more info, I'd expect that
> > sudo herd status smtpd would give it (which it does not), so the only
> > reliable source of information so far is tty 1. Can we fix that in
> > one of the next shepherd releases? Or is this something we have to
> > fix in Guix?
>
> So you’re saying that you’d like shepherd to show more info as to why
> the service could not be started, right?
>
> Thanks,
> Ludo’.

Must have been late and too many failed attempts at what I'm trying to do.
Yes. So I can't make any daemons I run out there fail, but for the
current case I have in Guix for this:

Sometimes I succeed building a system generation with an OpenSMTPD config-file
which has syntax errors that aren't picked up at configure time. When I reboot,
not being aware of this, I have to switch to tty to read the reasons why it
crashed.
Because this is a desktop system, I have to start the service again to see
the error output directly from the daemon.

I think I know why this happens (that the output goes to tty), but nevertheless
it would be good if shepherd were more capable than being Captain Obvious:
Start: "Oh, you see, it is started". Crashed: "Oh no, has your daemon crashed?",
like it is now.
....
Okay, I just looked at some other daemon controls I run, and maybe it's good that
shepherd is limited in its output. It does this one job. What I'd like to have
as a sysadmin is the ability to tail something like say /var/log/shepherd.fail.log
and services which are failing log into this file (or a set of files in /var/log/shepherd/
in files like $daemonname.fail.log).
Leaving aside the kitchen sink of tools in systemd, you do get used to something like
"status" immediately answering "HELLO! This is why I failed: (5 lines)". With shepherd, you can't
even grep for the failures in locations where newcomers to the system would expect them (like
/var/log/shepherd.log, since shepherd is the daemon control application).

Long story short, grepping for failures to fix daemon configurations and not having to
look at tty 1 (which can be noisy depending on what you run, I have some notorious tty
spammers) would be good.
And not sacrifice the simplicity of Shepherd :)
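
A minimal sketch of the per-service log idea above, in plain shell rather than anything Shepherd provides. The `run_logged` helper, the `fake_smtpd` stand-in, and the `.fail.log` naming are assumptions for illustration; a temporary directory stands in for /var/log/shepherd/ so that the sketch runs without root.

```shell
#!/bin/sh
# Hypothetical layout: one NAME.fail.log per service under LOG_DIR.
# The thread suggests /var/log/shepherd/; a temp dir is used here instead.
LOG_DIR=$(mktemp -d)

# run_logged NAME COMMAND [ARGS...]
# Run COMMAND, appending its stdout and stderr to $LOG_DIR/NAME.fail.log,
# and return the command's exit status.
run_logged() {
    name=$1
    shift
    "$@" >>"$LOG_DIR/$name.fail.log" 2>&1
}

# Stand-in for a daemon that rejects its configuration file.
fake_smtpd() {
    echo "smtpd.conf:3: syntax error" >&2
    return 1
}

if run_logged smtpd fake_smtpd; then
    status=0
else
    status=$?
fi

echo "smtpd exited with status $status"
# The reason for the failure can now be found with grep instead of on tty 1.
grep -n "syntax error" "$LOG_DIR/smtpd.fail.log"
```

With a layout like this, `tail -f` or `grep` on one well-known file per service would replace the trip to tty 1.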
Ludovic Courtès wrote on 27 Mar 2018 09:36
(name . ng0)(address . ng0@n0.is)(address . 30939@debbugs.gnu.org)
87o9jaynpn.fsf@gnu.org
Hello,

ng0 <ng0@n0.is> skribis:

> Sometimes I succeed building a system generation with an OpenSMTPD config-file
> which has syntax errors that aren't picked up at configure time. When I reboot,
> not being aware of this, I have to switch to tty to read the reasons why it
> crashed.
> Because this is a desktop system, I have to start the service again to see
> the error output directly from the daemon.

I think shepherd could capture stdout/stderr of the processes it starts
and make it available, in a way similar in spirit to what ‘journalctl’
does. That would allow you to see the output of the daemon that failed.

That’s the only solution I can think of. Of course we don’t have to do
that if the daemon writes error messages to syslog, but not all of them do.

Thanks,
Ludo’.
(name . Ludovic Courtès)(address . ludo@gnu.org)
20180327091529.pcju57bro2snuxtb@abyayala
Ludovic Courtès transcribed 839 bytes:
> Hello,
>
> ng0 <ng0@n0.is> skribis:
>
> > Sometimes I succeed building a system generation with an OpenSMTPD config-file
> > which has syntax errors that aren't picked up at configure time. When I reboot,
> > not being aware of this, I have to switch to tty to read the reasons why it
> > crashed.
> > Because this is a desktop system, I have to start the service again to see
> > the error output directly from the daemon.
>
> I think shepherd could capture stdout/stderr of the processes it starts
> and make it available, in a way similar in spirit to what ‘journalctl’
> does. That would allow you to see the output of the daemon that failed.

That sounds good.

> That’s the only solution I can think of. Of course we don’t have to do
> that if the daemon writes error messages to syslog, but not all of them do.
>
> Thanks,
> Ludo’.

Well, just a way to catch them would be good.

Thanks
Clément Lassieur wrote on 27 Mar 2018 20:59
(name . Ludovic Courtès)(address . ludo@gnu.org)
87370lcpjx.fsf@lassieur.org
Ludovic Courtès <ludo@gnu.org> writes:

> Hello,
>
> ng0 <ng0@n0.is> skribis:
>
>> Sometimes I succeed building a system generation with an OpenSMTPD config-file
>> which has syntax errors that aren't picked up at configure time. When I reboot,
>> not being aware of this, I have to switch to tty to read the reasons why it
>> crashed.
>> Because this is a desktop system, I have to start the service again to see
>> the error output directly from the daemon.
>
> I think shepherd could capture stdout/stderr of the processes it starts
> and make it available, in a way similar in spirit to what ‘journalctl’
> does. That would allow you to see the output of the daemon that failed.

That would be great!

> That’s the only solution I can think of. Of course we don’t have to do
> that if the daemon writes error messages to syslog, but not all of them do.
>
> Thanks,
> Ludo’.
Mark H Weaver wrote on 27 Mar 2018 22:13
(name . Ludovic Courtès)(address . ludo@gnu.org)
87fu4l1dkv.fsf@netris.org
ludo@gnu.org (Ludovic Courtès) writes:

> ng0 <ng0@n0.is> skribis:
>
>> Sometimes I succeed building a system generation with an OpenSMTPD config-file
>> which has syntax errors that aren't picked up at configure time. When I reboot,
>> not being aware of this, I have to switch to tty to read the reasons why it
>> crashed.
>> Because this is a desktop system, I have to start the service again to see
>> the error output directly from the daemon.
>
> I think shepherd could capture stdout/stderr of the processes it starts
> and make it available, in a way similar in spirit to what ‘journalctl’
> does. That would allow you to see the output of the daemon that failed.

This would be very helpful.

Mark
Clément Lassieur wrote on 10 Jul 2018 11:13
control message for bug #30939
(address . control@debbugs.gnu.org)
87fu0rihbi.fsf@lassieur.org
severity 30939 important
Robert Vollmert wrote on 26 Jun 2019 20:07
still relevant
(address . 30939@debbugs.gnu.org)
FABCF7A5-CE74-478C-B581-D3142FECD545@vllmrt.net
This came up again recently, compare the discussion here:


Here’s some code to wrap an executable manually to capture its output
and send it to syslog:

;; Assumes (guix gexp) and (gnu packages admin), which provides inetutils,
;; are imported.
(define* (logger-wrapper name exec . args)
  "Return a derivation that builds a script to start a process with
standard output and error redirected to syslog via logger."
  (define exp
    #~(begin
        (use-modules (ice-9 popen))
        (let* ((pid (number->string (getpid)))
               (logger #$(file-append inetutils "/bin/logger"))
               (args (list "-t" #$name (string-append "--id=" pid)))
               ;; Spawn logger with a pipe connected to its standard input.
               (pipe (apply open-pipe* OPEN_WRITE logger args)))
          ;; Make file descriptors 1 and 2 point at the pipe.
          (dup pipe 1)
          (dup pipe 2)
          ;; Replace this process with the wrapped executable.
          (execl #$exec #$exec #$@args))))
  (program-file (string-append name "-logger") exp))
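
For comparison, roughly the same idea can be sketched in plain shell. This is not equivalent to the wrapper above: it pipes rather than execs, so the wrapper shell stays alive and the command's exit status is replaced by the logger's. `wrap_with_logger`, `fake_logger`, and the `LOGGER` variable are hypothetical names; `fake_logger` stands in for logger(1) so that the sketch runs without a syslog daemon.

```shell
#!/bin/sh
# LOGGER names the program that receives the output; defaults to logger(1).
LOGGER=${LOGGER:-logger}

# wrap_with_logger TAG COMMAND [ARGS...]
# Run COMMAND with stdout and stderr piped to "$LOGGER -t TAG".
wrap_with_logger() {
    tag=$1
    shift
    "$@" 2>&1 | "$LOGGER" -t "$tag"
}

# Stand-in for logger(1): prefix each input line with the tag and append
# it to $OUT instead of sending it to syslog.
OUT=$(mktemp)
fake_logger() {
    tag_name=$2            # arguments are: -t TAG
    while IFS= read -r line; do
        printf '%s: %s\n' "$tag_name" "$line"
    done >>"$OUT"
}

LOGGER=fake_logger
wrap_with_logger smtpd sh -c 'echo "starting"; echo "smtpd.conf:3: syntax error" >&2'
cat "$OUT"
```

Leaving `LOGGER` unset would send the same lines to the real logger, tagged with the service name.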
Brice Waegeneire wrote on 5 Apr 2020 16:28
merge 36264 30939
(address . control@debbugs.gnu.org)
10441295b633fc99948b7eed4aad51f3@waegenei.re
severity 36264 important
merge 36264 30939
quit
conjaroy wrote on 18 Jul 2020 18:27
shepherd: detailed output should be placed into well-known location and not tty
(address . 30939@debbugs.gnu.org)
CABWzUjVRoB5G1u6frO09UVZQrB6BxrFD0uSvXg94dYwKKxfcFw@mail.gmail.com
Hello -

I too have found that debugging is a challenge when a service's
stdout/stderr aren't captured automatically. From my point of view though,
the issue is not just that certain binaries lack syslog support: since a
service implementation's gexp can do much more than just exec a binary, and
since mistakes in gexps usually go unnoticed until runtime, I've found it
easy to write scripts that trigger fatal Guile errors before the service
binary is even started (syntax errors, missing `use-modules` declarations,
etc.)

Will the solution proposed here capture output for this class of errors as
well?
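
The thread does not answer this. As a shell analogy only (not Shepherd's behaviour), the sketch below shows why the point of redirection matters: if output is redirected before the wrapper does anything else, errors raised by the wrapper itself land in the log along with the daemon's. The log path and `missing_helper_command` are hypothetical.

```shell
#!/bin/sh
LOG=$(mktemp)   # stand-in for a per-service log file

status=0
(
    # Redirect first, so everything after this line is captured.
    exec >>"$LOG" 2>&1
    echo "preparing state directory"
    # Hypothetical mistake in the wrapper: this command does not exist,
    # so the wrapper fails before the daemon would be started.
    missing_helper_command || exit 1
    exec sleep 60   # stand-in for the daemon; never reached here
) || status=$?

echo "wrapper exited with status $status"
cat "$LOG"
```

The log then holds both the wrapper's own progress message and the shell's complaint about the missing command, which is the class of error described above.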