[Shepherd] shepherd does not handle signals after 'daemonize'

  • Done
Details
3 participants
  • Ludovic Courtès
  • Maxim Cournoyer
  • nils
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal
Maxim Cournoyer wrote on 9 Jun 2023 19:13
Shepherd can crash when a user service fails to start
(name . bug-guix)(address . bug-guix@gnu.org)
87mt18blug.fsf@gmail.com
Hi!

I've noticed that while all my user services (managed via GNU Stow --
not via Guix Home) were working, 'herd status' would report that
/run/user/1000/shepherd/socket was missing and bail out.

Starting from a nonexistent /run/user/1000/shepherd/socket, using old
Shepherd 0.9.1:

$ /gnu/store/dblbnj1yra4yrrfjbnzsa0ldcl3170ap-shepherd-0.9.1/bin/shepherd
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g115}#). Add #:declarative? #f to your define-module invocation.

Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.
$
Warning: due to a long standing Gtk+ bug
https://gitlab.gnome.org/GNOME/gtk/issues/221
Emacs might crash when run in daemon mode and the X11 connection is unexpectedly lost.
Using an Emacs configured with --with-x-toolkit=lucid does not have this problem.
Loading time (native compiled elisp)...
Loading time (native compiled elisp)...done
Loading /home/maxim/.emacs.d/recentf...
Loading /home/maxim/.emacs.d/recentf...done
Cleaning up the recentf list...
Cleaning up the recentf list...done (0 removed)
../../.emacs: Warning: Use keywords rather than deprecated positional arguments to `define-minor-mode'
Preparing diary...
No diary entries for Friday, June 9, 2023
Preparing diary...done
Appointment reminders enabled
Loading /home/maxim/.emacs.d/emms/cache...
Loading /home/maxim/.emacs.d/emms/cache...done
[yas] Prepared just-in-time loading of snippets successfully.
[yas] Prepared just-in-time loading of snippets successfully.
Starting new Ispell process aspell with english dictionary... \
Starting new Ispell process aspell with english dictionary...done
Starting Emacs daemon.
Unable to start the daemon.
Another instance of Emacs is running the server, either as daemon or interactively.
You can use emacsclient to connect to that Emacs process.
Saving file /home/maxim/.emacs.d/emms/history...
Wrote /home/maxim/.emacs.d/emms/history
Wrote /home/maxim/.emacs.d/recentf
Error: server did not start correctly
Service emacs could not be started.
gpg-agent: a gpg-agent is already running - not starting a new one
Service gpg-agent could not be started.
Service ibus-daemon has been started.

$ herd status
Started:
+ ibus-daemon
+ root
Stopped:
- emacs
- gpg-agent
- jackd
- workrave

If I then run it anew, it fails with "shepherd: while opening socket
'/run/user/1000/shepherd/socket': bind: Address already in use", because
apparently 'herd stop root' didn't remove it.

$ herd stop root
Exiting.
[...]

$ /gnu/store/dblbnj1yra4yrrfjbnzsa0ldcl3170ap-shepherd-0.9.1/bin/shepherd
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g115}#). Add #:declarative? #f to your define-module invocation.

Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.
maxim@hurd ~/src/guix [env]$
Warning: due to a long standing Gtk+ bug
https://gitlab.gnome.org/GNOME/gtk/issues/221
Emacs might crash when run in daemon mode and the X11 connection is unexpectedly lost.
Using an Emacs configured with --with-x-toolkit=lucid does not have this problem.
Loading time (native compiled elisp)...
Loading time (native compiled elisp)...done
Loading /home/maxim/.emacs.d/recentf...
Loading /home/maxim/.emacs.d/recentf...done
Cleaning up the recentf list...
Cleaning up the recentf list...done (0 removed)
../../.emacs: Warning: Use keywords rather than deprecated positional arguments to `define-minor-mode'
Preparing diary...
No diary entries for Friday, June 9, 2023
Preparing diary...done
Appointment reminders enabled
Loading /home/maxim/.emacs.d/emms/cache...
Loading /home/maxim/.emacs.d/emms/cache...done
[yas] Prepared just-in-time loading of snippets successfully.
[yas] Prepared just-in-time loading of snippets successfully.
Starting new Ispell process aspell with english dictionary... \
Starting new Ispell process aspell with english dictionary...done
Starting Emacs daemon.
Unable to start the daemon.
Another instance of Emacs is running the server, either as daemon or interactively.
You can use emacsclient to connect to that Emacs process.
Saving file /home/maxim/.emacs.d/emms/history...
Wrote /home/maxim/.emacs.d/emms/history
Wrote /home/maxim/.emacs.d/recentf
Error: server did not start correctly
Service emacs could not be started.
gpg-agent: a gpg-agent is already running - not starting a new one
Service gpg-agent could not be started.
Service ibus-daemon has been started.
shepherd: while opening socket '/run/user/1000/shepherd/socket': bind: Address already in use

Exiting shepherd...
Service ibus-daemon has been stopped.

Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.

$

Even after removing it manually with 'rm
/run/user/1000/shepherd/socket', it still fails:

$ /gnu/store/dblbnj1yra4yrrfjbnzsa0ldcl3170ap-shepherd-0.9.1/bin/shepherd
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g115}#). Add #:declarative? #f to your define-module invocation.

Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.
maxim@hurd ~/src/guix [env]$
Warning: due to a long standing Gtk+ bug
https://gitlab.gnome.org/GNOME/gtk/issues/221
Emacs might crash when run in daemon mode and the X11 connection is unexpectedly lost.
Using an Emacs configured with --with-x-toolkit=lucid does not have this problem.
Loading time (native compiled elisp)...
Loading time (native compiled elisp)...done
Loading /home/maxim/.emacs.d/recentf...
Loading /home/maxim/.emacs.d/recentf...done
Cleaning up the recentf list...
Cleaning up the recentf list...done (0 removed)
../../.emacs: Warning: Use keywords rather than deprecated positional arguments to `define-minor-mode'
Preparing diary...
No diary entries for Friday, June 9, 2023
Preparing diary...done
Appointment reminders enabled
Loading /home/maxim/.emacs.d/emms/cache...
Loading /home/maxim/.emacs.d/emms/cache...done
[yas] Prepared just-in-time loading of snippets successfully.
[yas] Prepared just-in-time loading of snippets successfully.
Starting new Ispell process aspell with english dictionary... \
Starting new Ispell process aspell with english dictionary...done
Starting Emacs daemon.
Unable to start the daemon.
Another instance of Emacs is running the server, either as daemon or interactively.
You can use emacsclient to connect to that Emacs process.
Saving file /home/maxim/.emacs.d/emms/history...
Wrote /home/maxim/.emacs.d/emms/history
Wrote /home/maxim/.emacs.d/recentf
Error: server did not start correctly
Service emacs could not be started.
gpg-agent: a gpg-agent is already running - not starting a new one
Service gpg-agent could not be started.
Service ibus-daemon has been started.
shepherd: while opening socket '/run/user/1000/shepherd/socket': bind: Address already in use

Exiting shepherd...
Service ibus-daemon has been stopped.

Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.

It apparently is caused by Emacs failing to start, because if I comment
it out from the init.scm file, then the same Shepherd invocation is
happy:

;; Services to start when shepherd starts:
(for-each start '(;emacs
                  gpg-agent
                  ibus-daemon))

$ herd status
Started:
+ ibus-daemon
+ root
Stopped:
- emacs
- gpg-agent
- jackd
- workrave

But that's with Shepherd 0.9.1. If I run the exact same config, which now
works, with Shepherd 0.10.1, I see:

rm /run/user/1000/shepherd/socket

$ /gnu/store/y826g8wrpzskcs82ffxppj7mmz257ksi-shepherd-0.10.1/bin/shepherd
Starting service root...
Service root started.
Service root running with value #t.
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g119}#). Add #:declarative? #f to your define-module invocation.

Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.
Starting service gpg-agent...

$ herd status
herd: error: /run/user/1000/shepherd/socket: No such file or directory

$ file /run/user/1000/shepherd/socket
/run/user/1000/shepherd/socket: cannot open `/run/user/1000/shepherd/socket' (No such file or directory)

$ pgrep -a shepherd
1 /gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/nl0948z46yndpx3kihhi540l5c422wv4-shepherd-0.10.0/bin/shepherd --config /gnu/store/7dxbjccbqamk4wa0nyf7zsc4ywimb1fh-shepherd.conf
24700 /gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/y826g8wrpzskcs82ffxppj7mmz257ksi-shepherd-0.10.1/bin/shepherd

It seems a bug exists in both 0.9.1 and 0.10.1, but that something also
regressed going from 0.9.1 to 0.10.1.

Attached are the two relevant
Shepherd config files to test:
Attachment: init.scm
Attachment: services.scm
--
Thanks,
Maxim
Ludovic Courtès wrote on 12 Jun 2023 15:44
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 63982@debbugs.gnu.org)
87ilbsvlql.fsf@gnu.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> rm /run/user/1000/shepherd/socket
>
> $ /gnu/store/y826g8wrpzskcs82ffxppj7mmz257ksi-shepherd-0.10.1/bin/shepherd
> Starting service root...
> Service root started.
> Service root running with value #t.
> Service root has been started.
> WARNING: Use of `load' in declarative module (#{ g119}#). Add #:declarative? #f to your define-module invocation.
>
> Some deprecated features have been used. Set the environment
> variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
> program to get more information. Set it to "no" to suppress
> this message.
> Starting service gpg-agent...
>
> $ herd status
> herd: error: /run/user/1000/shepherd/socket: No such file or directory

Thanks for the detailed bug report!

I believe this is fixed by Shepherd commit
24c964021ebd3d63ce6e22808dd09dbe16116a6c, which introduces an additional
change: loading the config file asynchronously.

If you wish to test it, you can use the ‘shepherd’ channel.

Let me know how it goes!

Thanks,
Ludo’.
Maxim Cournoyer wrote on 12 Jun 2023 19:32
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 63982@debbugs.gnu.org)
87pm60wpr1.fsf@gmail.com
Hi Ludovic!

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> rm /run/user/1000/shepherd/socket
>>
>> $ /gnu/store/y826g8wrpzskcs82ffxppj7mmz257ksi-shepherd-0.10.1/bin/shepherd
>> Starting service root...
>> Service root started.
>> Service root running with value #t.
>> Service root has been started.
>> WARNING: Use of `load' in declarative module (#{ g119}#). Add #:declarative? #f to your define-module invocation.
>>
>> Some deprecated features have been used. Set the environment
>> variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
>> program to get more information. Set it to "no" to suppress
>> this message.
>> Starting service gpg-agent...
>>
>> $ herd status
>> herd: error: /run/user/1000/shepherd/socket: No such file or directory
>
> Thanks for the detailed bug report!
>
> I believe this is fixed by Shepherd commit
> 24c964021ebd3d63ce6e22808dd09dbe16116a6c, which introduces an additional
> change: loading the config file asynchronously.

Nitpick: I'd use a git message tag for 'Reported-by', as can be inserted
in the commit buffer in Magit with C-c C-p. They should be placed at
the bottom of the git message to be considered by tools parsing them.

> If you wish to test it, you can use the ‘shepherd’ channel.

I've done so by placing in my ~/.config/guix/channels.scm file:

(channel
  (name 'shepherd)
  (url "https://git.savannah.gnu.org/git/shepherd.git")
  (introduction
   (make-channel-introduction
    "788a6d6f1d5c170db68aa4bbfb77024fdc468ed3" ;2022-05-21
    (openpgp-fingerprint
     "3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5"))))

It'd be nice to have this in the Shepherd doc for easy copy & paste.

> Let me know how it goes!

I've edited my ~/.xsession file to use
/gnu/store/ahzl8vxxcd5bqlljwgn8wkp4884sr72l-shepherd-0.10.99-tarball,
and I'm now seeing this:

$ herd status
Démarrés :
+ root
Starting:
^ emacs
Arrêtés :
- gpg-agent
- ibus-daemon
- jackd
- workrave

Interestingly, the Emacs client is usable. It doesn't change from
there, and requesting it to be stopped hangs Shepherd:

$ herd stop emacs


If I comment out the Emacs service from the ~/.config/shepherd/init.scm
file, the same seems to happen on my next service, gpg-agent:

$ herd status
Démarrés :
+ root
Starting:
^ gpg-agent
Arrêtés :
- emacs
- ibus-daemon
- jackd
- workrave

Etc. if I comment that one (now hanging on starting ibus-daemon). It
seems something is still off?

Thanks for working toward a fix!

--
Thanks,
Maxim
Ludovic Courtès wrote on 14 Jun 2023 17:57
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 63982@debbugs.gnu.org)
87o7lirq9a.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

>> I believe this is fixed by Shepherd commit
>> 24c964021ebd3d63ce6e22808dd09dbe16116a6c, which introduces an additional
>> change: loading the config file asynchronously.
>
> Nitpick: I'd use a git message tag for 'Reported-by', as can be inserted
> in the commit buffer in Magit with C-c C-p. They should be placed at
> the bottom of the git message to be considered by tools parsing them.

Neat, I didn’t know about it, I’ll do that now (I think I started using
the “Reported by” convention before Git came into existence…).

>> If you wish to test it, you can use the ‘shepherd’ channel.
>
> I've done so by placing in my ~/.config/guix/channels.scm file:
>
> (channel
> (name 'shepherd)
> (url "https://git.savannah.gnu.org/git/shepherd.git")
> (introduction
> (make-channel-introduction
> "788a6d6f1d5c170db68aa4bbfb77024fdc468ed3" ;2022-05-21
> (openpgp-fingerprint
> "3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5"))))
>
>
> It'd be nice to have this in the Shepherd doc for easy copy & paste.

I’ll add that to ‘README’.

>> Let me know how it goes!
>
> I've edited my ~/.xsession file to use
> /gnu/store/ahzl8vxxcd5bqlljwgn8wkp4884sr72l-shepherd-0.10.99-tarball,
> and I'm now seeing this:
>
> $ herd status
> Démarrés :
> + root
> Starting:
> ^ emacs
> Arrêtés :
> - gpg-agent
> - ibus-daemon
> - jackd
> - workrave

Uh, so it remains in “starting” state?

> Interestingly, the Emacs client is usable. It doesn't change from
> there, and requesting it to be stopped hangs Shepherd:

Technically it’s waiting for ‘emacs’ to be in “running” state before
attempting to stop it.

> If I comment out the Emacs service from the ~/.config/shepherd/init.scm
> file, the same seems to happen on my next service, gpg-agent:
>
> $ herd status
> Démarrés :
> + root
> Starting:
> ^ gpg-agent
> Arrêtés :
> - emacs
> - ibus-daemon
> - jackd
> - workrave
>
> Etc. if I comment that one (now hanging on starting ibus-daemon). It
> seems something is still off?

Looks like it. Could you share ~/.local/var/log/shepherd.log?

Thanks,
Ludo’.
Ludovic Courtès wrote on 17 Jun 2023 16:03
control message for bug #63982
(address . control@debbugs.gnu.org)
87wn02qj83.fsf@gnu.org
tags 63982 + moreinfo
quit
nils wrote
Shepherd wrong-type-arg
(name . 63982@debbugs.gnu.org)(address . 63982@debbugs.gnu.org)
1275452452.902121.1687101286088@office.mailbox.org
Hello,

I am affected by this as well, but with slightly different symptoms.
Using guix home on a foreign system (Debian 12), I tried different shepherd versions with

(service home-shepherd-service-type
         (home-shepherd-configuration
          (shepherd (specification->package "shepherd@0.9"))))

, and guix home describe --list-installed shows me that this works (in the sense that a different shepherd version is installed).
None of the versions I tried got me a functional shepherd service.

These are the error messages by shepherd version:

0.8.1:
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g91}#). Add #:declarative? #f to your define-module invocation.
Loading /gnu/store/w6rlja8v65dwv16ivcqx513q7827n6aq-shepherd.conf.
herd: exception caught while executing 'load' on service 'root':
In procedure string-append: Wrong type (expecting string): #f

No /run/user/1000/shepherd/socket is created.

0.9.3:
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g117}#). Add #:declarative? #f to your define-module invocation.
wrong-type-arg("string-append" "Wrong type (expecting ~A): ~S" ("string" #f) (#f))

Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.

No /run/user/1000/shepherd/socket is created.

0.10.1:
Starting service root...
Service root started.
Service root running with value #t.
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g107}#). Add #:declarative? #f to your define-module invocation.
wrong-type-arg("string-append" "Wrong type (expecting ~A): ~S" ("string" #f) (#f))

No /run/user/1000/shepherd/socket is created.

0.10.99:
Starting service root...
Service root started.
Service root running with value #t.
Service root has been started.
WARNING: Use of `load' in declarative module (#{ g119}#). Add #:declarative? #f to your define-module invocation.
Uncaught exception while loading configuration file '/gnu/store/w6rlja8v65dwv16ivcqx513q7827n6aq-shepherd.conf': (wrong-type-arg "string-append" "Wrong type (expecting ~A): ~S"
("string" #f) (#f))

, and then the reconfiguration hangs. /run/user/1000/shepherd/socket is created, and herd status shows that root is started, other services are not shown, and are not started.


Content of config (/gnu/store/w6rlja8v65dwv16ivcqx513q7827n6aq-shepherd.conf):
(begin (use-modules (srfi srfi-34) (system repl error-handling)) (apply register-services (map (lambda (file) (load file)) (quote ("/gnu/store/71n4r0hccps574aqcks7zyk5rz5zardq-
shepherd-eww.scm" "/gnu/store/0r14z4psnf9h2nfqiflm0nv6m2bv04si-shepherd-eww-open-lockscreen-like-background.scm" "/gnu/store/ylidynn5akvk3lmqrxbgqkz0c8hn3y8c-shepherd-syncthing
.scm" "/gnu/store/9igwbpbwavl6r94ph7qss7i5cqq9d8nj-shepherd-mcron.scm")))) (action (quote root) (quote daemonize)) (format #t "Starting services...~%") (let ((services-to-start
(quote (eww eww-open-lockscreen-like-background syncthing mcron)))) (if (defined? (quote start-in-the-background)) (start-in-the-background services-to-start) (for-each start
services-to-start)) (redirect-port (open-input-file "/dev/null") (current-input-port))))

~/.local/state/log/shepherd.log does not contain anything that's not already in the messages above.

Is there anything else I can provide? Without a running shepherd, my system doesn't work super well.
Maxim Cournoyer wrote on 19 Jun 2023 03:42
Service hangs in 'starting' with Shepherd 0.10 (was: Shepherd can crash when a user service fails to start)
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 63982@debbugs.gnu.org)
87zg4wrzvi.fsf_-_@gmail.com
Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>>> I believe this is fixed by Shepherd commit
>>> 24c964021ebd3d63ce6e22808dd09dbe16116a6c, which introduces an additional
>>> change: loading the config file asynchronously.
>>
>> Nitpick: I'd use a git message tag for 'Reported-by', as can be inserted
>> in the commit buffer in Magit with C-c C-p. They should be placed at
>> the bottom of the git message to be considered by tools parsing them.
>
> Neat, I didn’t know about it, I’ll do that now (I think I started using
> the “Reported by” convention before Git came into existence…).
>
>>> If you wish to test it, you can use the ‘shepherd’ channel.
>>
>> I've done so by placing in my ~/.config/guix/channels.scm file:
>>
>> (channel
>> (name 'shepherd)
>> (url "https://git.savannah.gnu.org/git/shepherd.git")
>> (introduction
>> (make-channel-introduction
>> "788a6d6f1d5c170db68aa4bbfb77024fdc468ed3" ;2022-05-21
>> (openpgp-fingerprint
>> "3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5"))))
>>
>>
>> It'd be nice to have this in the Shepherd doc for easy copy & paste.
>
> I’ll add that to ‘README’.

Neat, thank you.

>>> Let me know how it goes!
>>
>> I've edited my ~/.xsession file to use
>> /gnu/store/ahzl8vxxcd5bqlljwgn8wkp4884sr72l-shepherd-0.10.99-tarball,
>> and I'm now seeing this:
>>
>> $ herd status
>> Démarrés :
>> + root
>> Starting:
>> ^ emacs
>> Arrêtés :
>> - gpg-agent
>> - ibus-daemon
>> - jackd
>> - workrave
>
> Uh, so it remains in “starting” state?

Yes! Which is surprising, because it's actually running fine, and
Shepherd 0.9.3 didn't have this issue (perhaps because it only knew of a
started/stopped service).

The other surprising thing is that because it thinks that Emacs hasn't
finished starting, it doesn't even attempt to try starting the other
services; they remain stopped although they should work.


[...]

> Looks like it. Could you share ~/.local/var/log/shepherd.log?

I have something a bit more detailed, with various versions (the logs
are under ~/.local/state/shepherd/shepherd.log by default). If you need
to, you should be able to reproduce on your end using the attached
~/.config/shepherd/{init.scm,services.scm} files (and ensuring the
service commands are on your PATH):

Using /gnu/store/dblbnj1yra4yrrfjbnzsa0ldcl3170ap-shepherd-0.9.1/bin/shepherd

$ herd status
Started:
+ Emacs
+ Gpg-agent
+ ibus-daemon
+ jackd
+ root
+ workrave

Using /gnu/store/cdc1gzbp3q15kdiwn2i5j3437jwx61ac-shepherd-0.9.2/bin/shepherd

$ herd status
Started:
+ emacs
+ gpg-agent
+ ibus-daemon
+ jackd
+ root
+ workrave

Using /gnu/store/a9jdd8kgckwlq97yw3pjqs6sy4lqgrfq-shepherd-0.9.3/bin/shepherd

$ herd status
Started:
+ emacs
+ gpg-agent
+ ibus-daemon
+ jackd
+ root
+ workrave

~/.local/state/shepherd/shepherd.log:

2023-06-18 21:04:47 Service root démarré.
2023-06-18 21:04:57 Service emacs démarré.
2023-06-18 21:04:57 Service jackd démarré.
2023-06-18 21:04:57 Service gpg-agent démarré.
2023-06-18 21:04:57 Service ibus-daemon démarré.
2023-06-18 21:04:57 Service workrave démarré.

Using /gnu/store/ahzl8vxxcd5bqlljwgn8wkp4884sr72l-shepherd-0.10.99-tarball/bin/shepherd

$ herd status
Started:
+ root
Starting:
^ emacs
Stopped:
- gpg-agent
- ibus-daemon
- jackd
- workrave

~/.local/state/shepherd/shepherd.log:

2023-06-18 21:06:12 Starting service root...
2023-06-18 21:06:12 Service root started.
2023-06-18 21:06:12 Service root running with value #t.
2023-06-18 21:06:12 Service root démarré.
2023-06-18 21:06:12 Starting service emacs...
2023-06-18 21:06:12 [bash]
2023-06-18 21:06:12 [bash] Warning: due to a long standing Gtk+ bug
2023-06-18 21:06:12 [bash] https://gitlab.gnome.org/GNOME/gtk/issues/221
2023-06-18 21:06:12 [bash] Emacs might crash when run in daemon mode and the X11 connection is unexpectedly lost.
2023-06-18 21:06:12 [bash] Using an Emacs configured with --with-x-toolkit=lucid does not have this problem.
2023-06-18 21:06:13 [bash] Loading time (native compiled elisp)...
2023-06-18 21:06:13 [bash] Loading time (native compiled elisp)...done
2023-06-18 21:06:13 [bash] Loading /home/maxim/.emacs.d/recentf...
2023-06-18 21:06:13 [bash] Loading /home/maxim/.emacs.d/recentf...done
2023-06-18 21:06:13 [bash] Cleaning up the recentf list...
2023-06-18 21:06:13 [bash] Cleaning up the recentf list...done (0 removed)
2023-06-18 21:06:13 [bash] .emacs: Warning: Use keywords rather than deprecated positional arguments to `define-minor-mode'
2023-06-18 21:06:15 [bash] Preparing diary...
2023-06-18 21:06:15 [bash] No diary entries for Sunday, June 18, 2023: Father's Day
2023-06-18 21:06:15 [bash] Preparing diary...done
2023-06-18 21:06:15 [bash] Appointment reminders enabled
2023-06-18 21:06:16 [bash] Loading /home/maxim/.emacs.d/emms/cache...
2023-06-18 21:06:16 [bash] Loading /home/maxim/.emacs.d/emms/cache...done
2023-06-18 21:06:18 [bash] [yas] Prepared just-in-time loading of snippets successfully.
2023-06-18 21:06:20 [bash] [yas] Prepared just-in-time loading of snippets successfully.
2023-06-18 21:06:22 [bash] Starting new Ispell process aspell with english dictionary... \
2023-06-18 21:06:22 [bash] Starting new Ispell process aspell with english dictionary...done
2023-06-18 21:06:22 [bash] Starting Emacs daemon.
Attachment: init.scm
Attachment: services.scm
--
Thanks,
Maxim
Ludovic Courtès wrote on 21 Jun 2023 16:20
Re: bug#63982: Shepherd can crash when a user service fails to start
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 63982@debbugs.gnu.org)
87y1kc7v79.fsf_-_@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> The other surprising thing is that because it thinks that Emacs hasn't
> finished starting, it doesn't even attempt to try starting the other
> services; they remain stopped although they should work.

This is because you’re starting them sequentially with:

(for-each start …)

If you instead use ‘start-in-the-background’, it’ll start them in
parallel.
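
A minimal sketch of that change, assuming the service names from the init.scm excerpt earlier in this thread. The 'defined?' guard mirrors the generated Guix Home configuration quoted elsewhere in this report, and the variable name 'services-to-start' is only illustrative:

```scheme
;; Sketch of the tail of ~/.config/shepherd/init.scm, to be placed after the
;; services have been registered.  The service list is an assumption taken
;; from the excerpt in the first message; adjust it to your own services.
(define services-to-start '(emacs gpg-agent ibus-daemon))

(if (defined? 'start-in-the-background)
    ;; Returns immediately, so one slow or stuck service does not keep the
    ;; others from being started.
    (start-in-the-background services-to-start)
    ;; Fallback for Shepherd versions without 'start-in-the-background':
    ;; start the services one after the other.
    (for-each start services-to-start))
```

With the sequential 'for-each' form, a service that never leaves the "starting" state blocks every service listed after it, which matches the status output quoted in this thread.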

(BTW, you might want to use the new interface eventually:

> Using /gnu/store/ahzl8vxxcd5bqlljwgn8wkp4884sr72l-shepherd-0.10.99-tarball/bin/shepherd
>
> $ herd status
> Started:
> + root
> Starting:
> ^ emacs
> Stopped:
> - gpg-agent
> - ibus-daemon
> - jackd
> - workrave
>
> ~/.local/state/shepherd/shepherd.log:
>
> 2023-06-18 21:06:12 Starting service root...
> 2023-06-18 21:06:12 Service root started.
> 2023-06-18 21:06:12 Service root running with value #t.
> 2023-06-18 21:06:12 Service root démarré.
> 2023-06-18 21:06:12 Starting service emacs...
> 2023-06-18 21:06:12 [bash]
> 2023-06-18 21:06:12 [bash] Warning: due to a long standing Gtk+ bug
> 2023-06-18 21:06:12 [bash] https://gitlab.gnome.org/GNOME/gtk/issues/221
> 2023-06-18 21:06:12 [bash] Emacs might crash when run in daemon mode and the X11 connection is unexpectedly lost.
> 2023-06-18 21:06:12 [bash] Using an Emacs configured with --with-x-toolkit=lucid does not have this problem.
> 2023-06-18 21:06:13 [bash] Loading time (native compiled elisp)...
> 2023-06-18 21:06:13 [bash] Loading time (native compiled elisp)...done
> 2023-06-18 21:06:13 [bash] Loading /home/maxim/.emacs.d/recentf...
> 2023-06-18 21:06:13 [bash] Loading /home/maxim/.emacs.d/recentf...done
> 2023-06-18 21:06:13 [bash] Cleaning up the recentf list...
> 2023-06-18 21:06:13 [bash] Cleaning up the recentf list...done (0 removed)
> 2023-06-18 21:06:13 [bash] .emacs: Warning: Use keywords rather than deprecated positional arguments to `define-minor-mode'
> 2023-06-18 21:06:15 [bash] Preparing diary...
> 2023-06-18 21:06:15 [bash] No diary entries for Sunday, June 18, 2023: Father's Day
> 2023-06-18 21:06:15 [bash] Preparing diary...done
> 2023-06-18 21:06:15 [bash] Appointment reminders enabled
> 2023-06-18 21:06:16 [bash] Loading /home/maxim/.emacs.d/emms/cache...
> 2023-06-18 21:06:16 [bash] Loading /home/maxim/.emacs.d/emms/cache...done
> 2023-06-18 21:06:18 [bash] [yas] Prepared just-in-time loading of snippets successfully.
> 2023-06-18 21:06:20 [bash] [yas] Prepared just-in-time loading of snippets successfully.
> 2023-06-18 21:06:22 [bash] Starting new Ispell process aspell with english dictionary... \
> 2023-06-18 21:06:22 [bash] Starting new Ispell process aspell with english dictionary...done
> 2023-06-18 21:06:22 [bash] Starting Emacs daemon.

And what’s the process tree like, if you run “pstree -p N” where N is
the PID of shepherd?

It looks as though ‘bash -c "emacs --daemon"’ didn’t terminate, which is
what’s needed to transition from “starting” to “running”.
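
For illustration only, a service of this kind could be declared roughly as follows with the Shepherd 0.10 'service' procedure. This is a hypothetical sketch, not the contents of the attached services.scm; the documentation string and the stop command in particular are assumptions:

```scheme
;; Hypothetical sketch of a user service that runs Emacs as a daemon.
(use-modules (shepherd service))

(define emacs
  (service '(emacs)
    #:documentation "Run Emacs as a daemon."
    ;; The command is run through a shell.  The service is only considered
    ;; started once that shell command has exited successfully, which
    ;; 'emacs --daemon' normally does after forking into the background.
    #:start (make-system-constructor "emacs --daemon")
    ;; Assumed stop command: ask the running daemon to exit.
    #:stop (make-system-destructor "emacsclient --eval '(kill-emacs)'")))

(register-services (list emacs))
```

If the shell command never returns, 'start' never completes and the service stays in the "starting" state.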

Could you ‘strace -f -s 100 -o /tmp/log.strace shepherd’, keeping only
the ‘emacs’ service?

Thanks,
Ludo’.
Ludovic Courtès wrote on 22 Jun 2023 22:08
(address . nils@landt.email)(name . 63982@debbugs.gnu.org)(address . 63982@debbugs.gnu.org)
87cz1n45uv.fsf_-_@gnu.org
Hi,

nils@landt.email skribis:

> 0.10.99:
> Starting service root...
> Service root started.
> Service root running with value #t.
> Service root has been started.
> WARNING: Use of `load' in declarative module (#{ g119}#). Add #:declarative? #f to your define-module invocation.
> Uncaught exception while loading configuration file '/gnu/store/w6rlja8v65dwv16ivcqx513q7827n6aq-shepherd.conf': (wrong-type-arg "string-append" "Wrong type (expecting ~A): ~S"
> ("string" #f) (#f))
>
> , and then the reconfiguration hangs. /run/user/1000/shepherd/socket is created, and herd status shows that root is started, other services are not shown, and are not started.
>
>
> Content of config (/gnu/store/w6rlja8v65dwv16ivcqx513q7827n6aq-shepherd.conf):
> (begin (use-modules (srfi srfi-34) (system repl error-handling)) (apply register-services (map (lambda (file) (load file)) (quote ("/gnu/store/71n4r0hccps574aqcks7zyk5rz5zardq-
> shepherd-eww.scm" "/gnu/store/0r14z4psnf9h2nfqiflm0nv6m2bv04si-shepherd-eww-open-lockscreen-like-background.scm" "/gnu/store/ylidynn5akvk3lmqrxbgqkz0c8hn3y8c-shepherd-syncthing
> .scm" "/gnu/store/9igwbpbwavl6r94ph7qss7i5cqq9d8nj-shepherd-mcron.scm")))) (action (quote root) (quote daemonize)) (format #t "Starting services...~%") (let ((services-to-start
> (quote (eww eww-open-lockscreen-like-background syncthing mcron)))) (if (defined? (quote start-in-the-background)) (start-in-the-background services-to-start) (for-each start
> services-to-start)) (redirect-port (open-input-file "/dev/null") (current-input-port))))

This suggests a problem in the config file: one of the shepherd-*.scm
files listed above ends up calling (string-append #f …).

We’d need to see those files to understand what’s happening but it looks
different from what Maxim reported.

Thanks,
Ludo’.
Ludovic Courtès wrote on 22 Jun 2023 23:35
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 63982@debbugs.gnu.org)
878rcb41vb.fsf_-_@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> Uh, so it remains in “starting” state?
>
> Yes!

Turns out that this happens when calling the ‘daemonize’ action on
‘root’. I have a reproducer now and am investigating…

Ludo’.
nils wrote
(name . Ludovic Courtès)(address . ludo@gnu.org)(name . 63982@debbugs.gnu.org)(address . 63982@debbugs.gnu.org)
84694188.1493786.1687698238340@office.mailbox.org
> Ludovic Courtès <ludo@gnu.org> hat am 22.06.2023 22:08 CEST geschrieben:
> This suggests a problem in the config file: one of the shepherd-*.scm
> files listed above ends up calling (string-append #f …).
>
> We’d need to see those files to understand what’s happening but it looks
> different from what Maxim reported.

Indeed, I misdiagnosed the issue because it happened after a Guix upgrade.
I used $XDG_LOG_HOME in my Shepherd services, and as of commit f74df2ab879fc5457982bbc85b7455a90e82317d it is no longer set by default.
Thanks for your help!
Maxim Cournoyer wrote on 26 Jun 2023 17:53
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 63982@debbugs.gnu.org)
87v8fa1aq4.fsf@gmail.com
Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> Uh, so it remains in “starting” state?
>>
>> Yes!
>
> Turns out that this happens when calling the ‘daemonize’ action on
> ‘root’. I have a reproducer now and am investigating…

Great, thanks for investigating and let me know if I can provide
something useful. It seems introducing cooperative scheduling is a path
layered with traps, eh :-).

--
Thanks,
Maxim
Ludovic Courtès wrote on 12 Jul 2023 19:11
control message for bug #63982
(address . control@debbugs.gnu.org)
87y1jl12dc.fsf@gnu.org
retitle 63982 [Shepherd] shepherd does not handle signals after 'daemonize'
quit
Ludovic Courtès wrote on 12 Jul 2023 19:46
Re: bug#63982: Shepherd can crash when a user service fails to start
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 63982@debbugs.gnu.org)
87ttu910q7.fsf@gnu.org
Hi!

Ludovic Courtès <ludo@gnu.org> wrote:

> Turns out that this happens when calling the ‘daemonize’ action on
> ‘root’. I have a reproducer now and am investigating…

Good news: this is fixed in Shepherd commit
f4272d2f0f393d2aa3e9d76b36ab6aa5f2fc72c2!

The root cause is inconsistent semantics when mixing epoll, signalfd,
and fork, specifically this part from signalfd(2):

epoll(7) semantics
If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
epoll(7) instance, then epoll_wait(2) returns events only for signals
sent to that process. In particular, if the process then uses fork(2)
to create a child process, then the child will be able to read(2)
signals that are sent to it using the signalfd file descriptor, but
epoll_wait(2) will not indicate that the signalfd file descriptor is
ready. In this scenario, a possible workaround is that after the
fork(2), the child process can close the signalfd file descriptor that
it inherited from the parent process and then create another signalfd
file descriptor and add it to the epoll instance. […]

The C program below illustrates this behavior:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/signal.h>
#include <sys/signalfd.h>
#include <sys/epoll.h>

int
main ()
{
  int ep, sfd;

  sigset_t signals;
  sigemptyset (&signals);
  sigaddset (&signals, SIGINT);
  sigaddset (&signals, SIGHUP);

  sigprocmask (SIG_BLOCK, &signals, NULL);
  sfd = signalfd (-1, &signals, SFD_CLOEXEC);

  ep = epoll_create1 (EPOLL_CLOEXEC);

  struct epoll_event events = { .events = EPOLLIN | EPOLLONESHOT, .data = NULL };
  epoll_ctl (ep, EPOLL_CTL_ADD, sfd, &events);

  epoll_wait (ep, &events, 1, 123);

  if (fork () == 0)
    {
      /* Quoth signalfd(2):

         If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
         epoll(7) instance, then epoll_wait(2) returns events only for signals
         sent to that process.  In particular, if the process then uses fork(2)
         to create a child process, then the child will be able to read(2)
         signals that are sent to it using the signalfd file descriptor, but
         epoll_wait(2) will not indicate that the signalfd file descriptor is
         ready.  */

      printf ("try this: kill -INT %i\n", getpid ());
      while (1)
        {
          struct signalfd_siginfo info;
          if (epoll_wait (ep, &events, 1, 777) > 0)
            {
              read (sfd, &info, sizeof info);
              printf ("got signal %i!\n", info.ssi_signo);
              epoll_ctl (ep, EPOLL_CTL_MOD, sfd, &events);
            }
        }
    }

  return 0;
}
Of course it took me a while to find out about this; I first looked at
things individually and didn’t expect the mixture to behave
inconsistently.

Maxim, let me know if it works for you!

Thanks,
Ludo’.
Maxim Cournoyer wrote on 19 Jul 2023 03:11
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 63982-done@debbugs.gnu.org)
87sf9kln75.fsf@gmail.com
Hey Ludo!

Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Turns out that this happens when calling the ‘daemonize’ action on
>> ‘root’. I have a reproducer now and am investigating…
>
> Good news: this is fixed in Shepherd commit
> f4272d2f0f393d2aa3e9d76b36ab6aa5f2fc72c2!
>
> The root cause is inconsistent semantics when mixing epoll, signalfd,
> and fork, specifically this part from signalfd(2):
>
> epoll(7) semantics
> If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
> epoll(7) instance, then epoll_wait(2) returns events only for signals
> sent to that process. In particular, if the process then uses fork(2)
> to create a child process, then the child will be able to read(2)
> signals that are sent to it using the signalfd file descriptor, but
> epoll_wait(2) will not indicate that the signalfd file descriptor is
> ready. In this scenario, a possible workaround is that after the
> fork(2), the child process can close the signalfd file descriptor that
> it inherited from the parent process and then create another signalfd
> file descriptor and add it to the epoll instance. […]
>
> The C program below illustrates this behavior:
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/signal.h>
> #include <sys/signalfd.h>
> #include <sys/epoll.h>
>
> int
> main ()
> {
>   int ep, sfd;
>
>   sigset_t signals;
>   sigemptyset (&signals);
>   sigaddset (&signals, SIGINT);
>   sigaddset (&signals, SIGHUP);
>
>   sigprocmask (SIG_BLOCK, &signals, NULL);
>   sfd = signalfd (-1, &signals, SFD_CLOEXEC);
>
>   ep = epoll_create1 (EPOLL_CLOEXEC);
>
>   struct epoll_event events = { .events = EPOLLIN | EPOLLONESHOT, .data = NULL };
>   epoll_ctl (ep, EPOLL_CTL_ADD, sfd, &events);
>
>   epoll_wait (ep, &events, 1, 123);
>
>   if (fork () == 0)
>     {
>       /* Quoth signalfd(2):
>
>          If a process adds (via epoll_ctl(2)) a signalfd file descriptor to an
>          epoll(7) instance, then epoll_wait(2) returns events only for signals
>          sent to that process.  In particular, if the process then uses fork(2)
>          to create a child process, then the child will be able to read(2)
>          signals that are sent to it using the signalfd file descriptor, but
>          epoll_wait(2) will not indicate that the signalfd file descriptor is
>          ready.  */
>
>       printf ("try this: kill -INT %i\n", getpid ());
>       while (1)
>         {
>           struct signalfd_siginfo info;
>           if (epoll_wait (ep, &events, 1, 777) > 0)
>             {
>               read (sfd, &info, sizeof info);
>               printf ("got signal %i!\n", info.ssi_signo);
>               epoll_ctl (ep, EPOLL_CTL_MOD, sfd, &events);
>             }
>         }
>     }
>
>   return 0;
> }
>
>
> Of course it took me a while to find out about this; I first looked at
> things individually and didn’t expect the mixture to behave
> inconsistently.

Tricky! Thanks for sharing the result of your investigation, it's
always enlightening!

> Maxim, let me know if it works for you!

Better than ever! Thanks a lot for fixing the various issues reported
here.

I'm closing this one!

--
Thanks,
Maxim
Closed
This issue is archived.