Herd `fport_write: Broken pipe` error when running `guix home reconfigure`

  • Open
  • quality assurance status badge
Details
4 participants
  • Elias Kueny
  • Ludovic Courtès
  • richard
  • Tanguy LE CARROUR
Owner
unassigned
Submitted by
Tanguy LE CARROUR
Severity
normal
T
T
Tanguy LE CARROUR wrote on 23 Aug 2023 08:47
(address . bug-guix@gnu.org)
169277323322.1687.15739147838600524142@localhost
Hi Guix,

I think this has been happening for quite a long time, but I never paid
attention to it, for everything seemed to be running just fine.
When I run `guix home reconfigure`, it completes with the following message:

```
# […]
Finished updating symlinks.

SSLoading /gnu/store/zbfyaxxigns5lqyxhxzxhm92w54ns1cz-shepherd.conf.
herd: error: exception caught while executing 'load' on service 'root':
In procedure fport_write: Broken pipe
Comparing /gnu/store/2sr90smj33k686394krjx1klhlpipd7z-home/profile/share/fonts and
/gnu/store/2sr90smj33k686394krjx1klhlpipd7z-home/profile/share/fonts... done (same)
Evaluating on-change gexps.

On-change gexps evaluation finished.
```

I don't know if it's related, but since my last `guix pull`,
`herd status` has been reporting strange statuses:

```
$ herd status
Started:
+ root
Starting:
^ mcron
^ mpd
Stopped:
- tor
- transmission
```

`mcron` and `mpd` (a home service that I will eventually submit) are listed as
`starting`, but they are actually started and they work just fine!?

I had a look at this morning's log. This is what I see right after logging in:

```
2023-08-23 08:21:36 Starting service root...
2023-08-23 08:21:36 Service root started.
2023-08-23 08:21:36 Service root running with value #t.
2023-08-23 08:21:36 Service root has been started.
2023-08-23 08:21:36 Daemonizing...
2023-08-23 08:21:36 Restarting signal handler.
2023-08-23 08:21:36 Now running as process 567.
2023-08-23 08:21:36 Starting services...
2023-08-23 08:21:36 Configuration successfully loaded from '/gnu/store/zbfyaxxigns5lqyxhxzxhm92w54ns1cz-shepherd.conf'.
2023-08-23 08:21:36 Starting service mpd...
2023-08-23 08:21:36 Starting service mcron...
2023-08-23 08:21:36 Service mpd has been started.
2023-08-23 08:21:37 [mpd] Aug 23 08:21 : server_socket: bind to '0.0.0.0:6600' failed (continuing anyway, because binding to '[::]:6600' succeeded): Failed to bind socket: Address already in use
```

The last line is strange. It looks like a second instance of `mpd`
started, tried to bind to the same port and failed!?

After `guix home reconfigure` I get:

```
2023-08-23 08:31:51 Service mcron has been started.
2023-08-23 08:31:51 SSL2023-08-23 08:31:51 oading /gnu/store/zbfyaxxigns5lqyxhxzxhm92w54ns1cz-shepherd.conf.
```

There is no mention to `mpd` and there's a timestamp right in the middle of
the second line?!

When I try to `herd stop SOMETHING`, it hangs forever and I have to
`ctrl+c` it.

Any idea what might be going wrong?!

--
Tanguy
E
E
Elias Kueny wrote on 9 Sep 2023 18:09
(address . 65463@debbugs.gnu.org)
87sf7nfhtf.fsf@posteo.net
I experience the same thing. I hadn't updated guix since may (last generation was commit 91bfd30ee3f35dfb7048bf42aea92f939cffbf17), and since I did I'm encountering issues with shepherd.

Probably unrelated, but for the record: my first issue was caused by the disappearance of the XDG_LOG_HOME environment variable. I was using it in the definition of shepherd services (as for example `#:log-file (string-append (getenv "XDG_LOG_HOME") "/emacs-daemon.log")`. Guix home reconfigure worked because the getenv wasn't evaluated immediately, but after a reboot the syntax error prevented shepherd to start the session.

But once I solved that, same problems: shepherd seems to hang somewhere after starting the home services.

herd status:
Toggle snippet (8 lines)
Started:
+ root
Starting:
^ emacs-daemon
^ ssh-agent
^ syncthing

~/.local/state/log/shepherd.log:
Toggle snippet (18 lines)
2023-09-09 16:12:42 Service root started.
2023-09-09 16:12:42 Service root running with value #t.
2023-09-09 16:12:42 Service root has been started.
2023-09-09 16:12:42 Daemonizing...
2023-09-09 16:12:42 Restarting signal handler.
2023-09-09 16:12:42 Now running as process 430.
2023-09-09 16:12:42 Starting services...
2023-09-09 16:12:42 Configuration successfully loaded from '/gnu/store/mq01z0gvi1zv3skk6xh1q7g4id6hsgdk-shepherd.conf'.
2023-09-09 16:12:42 Starting service ssh-agent...
2023-09-09 16:12:42 Starting service syncthing...
2023-09-09 16:12:42 Starting service emacs-daemon...
2023-09-09 16:12:42 Service ssh-agent has been started.
2023-09-09 16:12:42 Service syncthing has been started.
--- guix home reconfigure happened here ---
2023-09-09 16:59:00 Service emacs-daemon has been started.
2023-09-09 16:59:00 SSSL2023-09-09 16:59:00 oading /gnu/store/mlvqhkb37zy3yycriv3lmqah7yff34af-shepherd.conf.

My 3 services are working normally. If I try to run `herd restart emacs-daemon`, the command hangs until I press Ctrl-C and nothing happens to emacs. Same thing for the other services (and for `herd stop`).

Here's how my services are defined. If I comment out all occurrences of `home-shepherd-service-type` in my home configuration (not just commenting %emacs-daemon-user-service and running it with an empty list of services), then there is no error when running `guix home reconfigure`.

Toggle snippet (19 lines)
(define %emacs-daemon-user-service
(shepherd-service
(documentation "Run emacs-daemon.")
(provision '(emacs-daemon))
(start #~(make-forkexec-constructor
(list #$(file-append (specification->package %emacs-package) "/bin/emacs") "--fg-daemon")
#:log-file (string-append #$(getenv "HOME") "/.local/var/log/emacs-daemon.log")))
(stop #~(make-system-destructor "emacsclient --eval \"(kill-emacs)\""))
(auto-start? #t)
(respawn? #t)))

(define-public emacs-services
(list (simple-service 'emacs-shepherd-service
home-shepherd-service-type
(list %emacs-daemon-user-service))))

(home-environment (services `(,@emacs-services […])))

When reconfiguring with an home-shepherd-service-type service but not shepherd service in the list:

Toggle snippet (14 lines)
Finished updating symlinks.

Loading /gnu/store/26jgrxzmabjdl3nhjx16cqa1f5h3flks-shepherd.conf.
herd: error: exception caught while executing 'load' on service 'root':
In procedure fport_write: Broken pipe
Comparing /gnu/store/4vmxyl8fykz9wkrkicnv5azhvr1gb5i1-home/profile/share/fonts and
/gnu/store/3wlqdh4i4zmwjmqa69isr62nvbgf7abh-home/profile/share/fonts... done (same)
Comparing /gnu/store/4vmxyl8fykz9wkrkicnv5azhvr1gb5i1-home/files/.config/fish/fish_plugins and
/gnu/store/3wlqdh4i4zmwjmqa69isr62nvbgf7abh-home/files/.config/fish/fish_plugins... done (same)
Evaluating on-change gexps.

On-change gexps evaluation finished.

Only new line in ~/.local/state/log/shepherd.log:
Toggle snippet (3 lines)
2023-09-09 17:45:07 Loading /gnu/store/26jgrxzmabjdl3nhjx16cqa1f5h3flks-shepherd.conf.

/gnu/store/26jgrxzmabjdl3nhjx16cqa1f5h3flks-shepherd.conf:
Toggle snippet (3 lines)
(begin (use-modules (srfi srfi-34) (system repl error-handling)) (apply register-services (map (lambda (file) (load file)) (quote ()))) (action (quote root) (quote daemonize)) (format #t "Starting services...~%") (let ((services-to-start (quote ()))) (if (defined? (quote start-in-the-background)) (start-in-the-background services-to-start) (for-each start services-to-start)) (redirect-port (open-input-file "/dev/null") (current-input-port))))

So I would guess it's a problem in home-shepherd-service-type, and not in the custom service definitions. I also remember the broken pipe error showing previously, but shepherd commands hanging is new.
R
R
richard wrote on 15 Nov 2023 02:55
(address . 65463@debbugs.gnu.org)
87107ca490c719dd1e9c7a75915ceaa8@freakingpenguin.com
Just throwing out that I'm also experiencing this issue, even with a
home configuration containing a single service, that being
home-gpg-agent-service. I've deployed the same configuration on two
different machines at the same checkout and only experience the issue on
one. The other machine works flawlessly. The only distinction between
the two is that the working machine is using Guix on a foreign
distribution while the nonworking machine is using Guix System.

I do see the same behavior mentioned previously such as herd commands
hanging. However, in my case there seems to be another negative symptom
from shepherd acting this way.

Various $ gpg commands will hang (presumably when they try invoking
gpg-agent). Trying to manually invoke $ gpg-agent --daemon fails and
hangs indefinitely without a message. This effectively makes gpg
unusable.

Barebones home-environment:
Toggle snippet (10 lines)
(home-environment
(services
(list
(service home-gpg-agent-service-type
(home-gpg-agent-configuration
(pinentry-program
(file-append pinentry-tty "/bin/pinentry-tty"))
(ssh-support? #t))))))

herd status:
Toggle snippet (5 lines)
Started:
+ gpg-agent
+ root

Interestingly when running $ gpg --card-status I see the following
printed to the log:

shepherd.log (nonworking machine):
Toggle snippet (6 lines)
2023-11-14 19:23:06 Spawning systemd-style service
/gnu/store/2zgdxhbnkz8fgsb1l4xydm3nbvj66mih-gnupg-2.2.39/bin/gpg-agent.
2023-11-14 19:23:06 L2023-11-14 19:23:06 oading
/gnu/store/kzh1x4y030drw0jrdk08nynvvyasm3c1-shepherd.conf.

shepherd.log (working machine):
Toggle snippet (15 lines)
2023-11-14 19:43:39 Spawning systemd-style service
/gnu/store/2zgdxhbnkz8fgsb1l4xydm3nbvj66mih-gnupg-2.2.39/bin/gpg-agent.
2023-11-14 19:43:39 Running value of service gpg-agent changed to 611.
2023-11-14 19:43:39 [gpg-agent] using fd 3 for ssh socket
(/run/user/1000/gnupg/S.gpg-agent.ssh)
2023-11-14 19:43:39 [gpg-agent] using fd 4 for browser socket
(/run/user/1000/gnupg/S.gpg-agent.browser)
2023-11-14 19:43:39 [gpg-agent] using fd 5 for extra socket
(/run/user/1000/gnupg/S.gpg-agent.extra)
2023-11-14 19:43:39 [gpg-agent] using fd 6 for std socket
(/run/user/1000/gnupg/S.gpg-agent)
2023-11-14 19:43:39 [gpg-agent] listing on: std=6 extra=5 browser=4
ssh=3)

For some reason shepherd is rereading its configuration file instead of
properly spawning the gpg-agent service. shepherd.conf and
shepherd-gpg-agent-ssh-agent.scm (a file that appears in shepherd.conf)
are identical between the two machines.

shepherd.conf (working and nonworking machine):
Toggle snippet (10 lines)
(begin (use-modules (srfi srfi-34) (system repl error-handling)) (apply
register-services (map (lambda (file) (load file)) (quote
("/gnu/store/6zcvlihqs5mniw2vly4w7z39lhai2grz-shepherd-gpg-agent-ssh-agent.scm"))))
(action (quote root) (quote daemonize)) (format #t "Starting
services...~%") (let ((services-to-start (quote (gpg-agent ssh-agent))))
(if (defined? (quote start-in-the-background)) (start-in-the-background
services-to-start) (for-each start services-to-start)) (redirect-port
(open-input-file "/dev/null") (current-input-port))))

Sorry if this is too much or not helpful! First time on the mailing
list.
R
R
Richard Sent wrote on 16 Nov 2023 21:09
(address . 65463@debbugs.gnu.org)
5538caef3f592a7adc5b75e1cb99a711@freakingpenguin.com
Did some more testing and I was able to find a workaround for my version
of this issue, although I don't know if it'll solve it for others. TL;DR
is I changed my login shell from fish back to bash.

The system configuration for the nonworking machine was using fish
`(user-account (shell (file-append fish "/bin/fish") ...)`. There seems
to be some unintended behavior with executing the `on-first-login`
script in `.profile` using fish.

The stock `config.fish` sources $HOME/.profile, which executes the
.guix-home/on-first-login script via fenv source $HOME/.profile. Herd
seems to start up nicely, but somehow stdout for herd winds assigned to
a pipe without a reader between initialization and spawning gpg-agent.

initialization writes to the logfile just fine regardless of if you're
using fish or bash:
Toggle snippet (20 lines)
2023-11-16 14:17:15 Starting service root...
2023-11-16 14:17:15 Service root started.
2023-11-16 14:17:15 Service root running with value #t.
2023-11-16 14:17:15 Service root has been started.
2023-11-16 14:17:15 Daemonizing...
2023-11-16 14:17:15 Restarting signal handler.
2023-11-16 14:17:15 Now running as process 936.
2023-11-16 14:17:15 Starting services...
2023-11-16 14:17:15 Configuration successfully loaded from
'/gnu/store/kzh1x4y030drw0jrdk08nynvvyasm3c1-shepherd.conf'.
2023-11-16 14:17:15 Starting service gpg-agent...
2023-11-16 14:17:15 Service gpg-agent has been started.
2023-11-16 14:17:15 Service gpg-agent started.
2023-11-16 14:17:15 Service gpg-agent running with value (("ssh" .
#<input-output: socket 16>) ("browser" . #<input-output: socket 18>)
("extra" . #<input-output: socket 19>) ("std" . #<input-output: socket
20>)).
2023-11-16 14:17:15 Successfully started 2 services in the background.

When using bash instead of fish, stdout (file descriptor 1) points to
/dev/null so the write does not fail. Another write is for
$HOME/.local/state/shepherd.log.

bash fds
Toggle snippet (12 lines)
$ ls -l /proc/<HERD_PID>/fd)
lr-x------ 1 richard users 64 2023-11-16 14:17 0 ->
/dev/null
l-wx------ 1 richard users 64 2023-11-16 14:17 1 ->
/dev/null
l-wx------ 1 richard users 64 2023-11-16 15:04 2 ->
/home/richard/.local/share/sddm/xorg-session.log
l-wx------ 1 richard users 64 2023-11-16 14:17 6 ->
/home/richard/.local/state/log/shepherd.log
// ...

bash login shell strace of herd, right after running $ gpg
--card-status:
Toggle snippet (11 lines)
write(1, "Spawning systemd-style service
/gnu/store/2zgdxhbnkz8fgsb1l4xydm3nbvj66mih-gnupg-2.2.39/bin/gpg-agent.\n",
103) = 103
newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444,
st_size=3536, ...}, 0) = 0
write(6, "2023-11-16 14:36:29 Spawning systemd-style service
/gnu/store/2zgdxhbnkz8fgsb1l4xydm3nbvj66mih-gnupg-2.2.39/bin/gpg-agent.\n",
123) = 123
// happy success messages

However, when fish is in charge of executing on-first-login, stdout and
stderr for the user shepherd get directed to a pipe instead of
/dev/null.

fish fds, both stdout and stderr point to a pipe:
Toggle snippet (12 lines)
$ ls -l /proc/<HERD_PID>/fd)
lr-x------ 1 richard users 64 2023-11-16 14:47 0 ->
/dev/null
l-wx------ 1 richard users 64 2023-11-16 14:47 1 ->
pipe:[41039]
l-wx------ 1 richard users 64 2023-11-16 14:47 2 ->
pipe:[41039]
l-wx------ 1 richard users 64 2023-11-16 14:47 6 ->
/home/richard/.local/state/log/shepherd.log
// ...

fish login shell strace of herd:
Toggle snippet (5 lines)
21:46:43 write(1, "Spawning systemd-style service
/gnu/store/2zgdxhbnkz8fgsb1l4xydm3nbvj66mih-gnupg-2.2.39/bin/gpg-agent.\n",
103) = -1 EPIPE (Broken pipe)

Details of what processes are using the pipe: (i.e. nothing else is
using the pipe. Which makes sense for a broken pipe error at least...)
Toggle snippet (7 lines)
$ lsof -n | grep -w 41039
(standard input):shepherd 991 richard 1w FIFO
0,14 0t0 41039 pipe
(standard input):shepherd 991 richard 2w FIFO
0,14 0t0 41039 pipe

What confuses me about this situation is judging from that bash behavior
pipes shouldn't be involved at all for stdin/stdout/stderr. There's also
an discrepancy in that stderr for bash points to xorg-session.log while
for fish it points to the same pipe as stdout. It would be interesting
to know if other users who are experiencing broken pipe errors are also
using fish as their user shell.
L
L
Ludovic Courtès wrote on 20 Mar 19:07 +0100
(name . Richard Sent)(address . richard@freakingpenguin.com)(address . 65463@debbugs.gnu.org)
8734skzshv.fsf@gnu.org
Hi Richard and all,

Richard Sent <richard@freakingpenguin.com> skribis:

Toggle quote (24 lines)
> However, when fish is in charge of executing on-first-login, stdout and
> stderr for the user shepherd get directed to a pipe instead of
> /dev/null.
>
> fish fds, both stdout and stderr point to a pipe:
>
> $ ls -l /proc/<HERD_PID>/fd)
> lr-x------ 1 richard users 64 2023-11-16 14:47 0 ->
> /dev/null
> l-wx------ 1 richard users 64 2023-11-16 14:47 1 ->
> pipe:[41039]
> l-wx------ 1 richard users 64 2023-11-16 14:47 2 ->
> pipe:[41039]
> l-wx------ 1 richard users 64 2023-11-16 14:47 6 ->
> /home/richard/.local/state/log/shepherd.log
> // ...
>
>
> fish login shell strace of herd:
>
> 21:46:43 write(1, "Spawning systemd-style service
> /gnu/store/2zgdxhbnkz8fgsb1l4xydm3nbvj66mih-gnupg-2.2.39/bin/gpg-agent.\n",
> 103) = -1 EPIPE (Broken pipe)

(I assume this is the strace of shepherd, not herd.)

Maybe we should unconditionally run shepherd with stdout/stderr
redirected to /dev/null?

That would sidestep the problem and it’s probably a good idea anyway.

WDYT?

Ludo’.
R
R
Richard Sent wrote on 21 Mar 20:27 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 65463@debbugs.gnu.org)
878r2b1j2b.fsf@freakingpenguin.com
Hi Ludo,

I tried replicating this issue on guix
d67e4f0f9b10c7ddac8fb0ca68cbf1d6ad0a6e5d and was not able to. My guess
is either shepherd 10.2->10.3 resolved the issue or there was a change in
fish.

It seems to work fine now, but I've not tested thoroughly. Hopefully
anyone else who can into the issue can chime in.

Toggle quote (2 lines)
> (I assume this is the strace of shepherd, not herd.)

When you say this do you mean daemon vs. CLI process? I'm fairly
confident that was the strace of the daemon although I can't recall how
I pulled that off.

Toggle quote (5 lines)
> Maybe we should unconditionally run shepherd with stdout/stderr
> redirected to /dev/null?
>
> That would sidestep the problem and it’s probably a good idea anyway.

Would unconditionally redirecting to /dev/null mess with the system
shepherd? I noticed that stdout and stderr for my system shepherd (in
/proc/1/fd) are printing to /dev/console. If so that would probably make
debugging shepherd issues harder, particularly on something like a
single board computer.

Thanks for looking at this!

--
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.
?