'guix system reconfigure' must start/restart/stop services

Status: Done
Participants (4): Carlo Zancanaro; Thompson, David; Efraim Flashner; Ludovic Courtès
Owner: Somebody
Submitted by: Ludovic Courtès
Severity: important
Ludovic Courtès wrote on 28 Nov 2015 17:35
(address . bug-guix@gnu.org)
874mg6rsjl.fsf@gnu.org
Hello!

Currently ‘guix system reconfigure’ doesn’t try to dynamically update
the set of running services, which is a shame.

A simple strategy would be to have it:

1. Stop and unregister services currently known to dmd that are
missing in the new configuration.

2. Load and start (if they have ‘auto-start?’) services that are in
the new configuration and currently unknown to dmd.

3. The rest is the most difficult part: dealing with services that
already exist but that have changed (see below.)
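
Steps 1 and 2 amount to two set differences over service names. A minimal sketch, assuming services are identified by plain symbols; `service-delta` is a hypothetical name for illustration and not part of dmd or Guix:

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical helper: LIVE and TARGET are lists of service names (symbols).
;; Return two values: the names to stop and unregister, and the names to load.
(define (service-delta live target)
  (values (lset-difference eq? live target)     ;step 1: gone from the new config
          (lset-difference eq? target live)))   ;step 2: new, unknown to dmd

(call-with-values
    (lambda ()
      (service-delta '(udev nscd tor) '(udev nscd ntpd)))
  (lambda (to-unload to-load)
    (format #t "unload: ~a~%load: ~a~%" to-unload to-load)))
```

Services present in both lists (`udev` and `nscd` here) fall under step 3 and are left alone by this sketch.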

One step towards this has been the fact that each service has its code
in a module of its own (commit fae685b), making it easy to have dmd load
it.

For #3, the difficulty is that we cannot do deco stop/load/start for
core services like udev or file-system-root because stopping these would
effectively halt the system.

However, we can safely restart services that are leaves of the dmd graph
(unless the user explicitly asks not to do it.) Here’s what it would
mean on my system, which uses ‘%desktop-services’ and a few more:

scheme@(guile-user)> ,use(guix)
scheme@(guile-user)> ,use(gnu)
scheme@(guile-user)> ,use(gnu services dmd)
scheme@(guile-user)> (define os (load "/home/ludo/src/configuration/pluto-configuration.scm"))
scheme@(guile-user)> ,use(gnu services)
scheme@(guile-user)> (define dmds (fold-services (operating-system-services os)
                                                 #:target-type dmd-root-service-type))
scheme@(guile-user)> ,use(gnu services)
scheme@(guile-user)> (length (service-parameters dmds))
$2 = 49
scheme@(guile-user)> (define back-edges (dmd-service-back-edges (service-parameters dmds)))
scheme@(guile-user)> ,use(srfi srfi-1)
scheme@(guile-user)> (map dmd-service-provision
                          (filter (lambda (s)
                                    (null? (back-edges s)))
                                  (service-parameters dmds)))
$3 = ((swap-/dev/sda4) (nscd) (guix-daemon) (console-font-tty6) (console-font-tty5) (console-font-tty4) (console-font-tty3) (console-font-tty2) (console-font-tty1) (ntpd) (elogind) (upower-daemon) (avahi-daemon) (xorg-server) (tor) (ssh-daemon) (bitlbee))
scheme@(guile-user)> (length $3)
$4 = 17

17 out of 49 services could be restarted.
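
The "leaf" criterion can be illustrated without a running system. A toy sketch, assuming each service is written as a list of its name followed by its requirements; the graph and the `leaf-services` procedure are hypothetical and only stand in for what `dmd-service-back-edges` computes in the session above:

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical toy graph: each entry is (NAME REQUIREMENT ...).
(define services
  '((udev)
    (nscd user-processes)
    (user-processes udev)
    (ssh-daemon user-processes)))

;; Return the names of services that no other service requires.
(define (leaf-services services)
  (let ((required (append-map cdr services)))
    (filter (lambda (name) (not (memq name required)))
            (map car services))))

(leaf-services services)   ;⇒ (nscd ssh-daemon)
```

Only `nscd` and `ssh-daemon` have no dependents here, so only they could be restarted without taking other services down.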

As a first step, we could ignore the other services.

As a second step, we could maybe have an ‘upgrade’ action that would
mutate their <service> instance in place, but without actually
restarting them, such that the changes would only take effect on the
next restart.

Roughly, we’d be doing, say:

deco upgrade udev /gnu/store/…-dmd-udev.scm

where …-dmd-udev.scm is the service file that contains:

(make <service> #:provides '(udev) …)

The ‘upgrade’ action would ‘set!’ all the fields of the old service
instance to those of the new instance, such that they are ‘equal?’ (but
not ‘eq?’.) The caveat is that this is not atomic.
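
A rough sketch of such an in-place update using GOOPS, assuming a simplified stand-in class; `<toy-service>`, its two slots, and `upgrade!` are hypothetical and do not reflect dmd's actual `<service>` definition:

```scheme
(use-modules (oop goops))

;; Hypothetical stand-in for dmd's <service> class, with two slots only.
(define-class <toy-service> ()
  (provides #:init-keyword #:provides #:accessor service-provides)
  (start    #:init-keyword #:start    #:accessor service-start))

;; Copy every slot of NEW into OLD.  OLD keeps its identity, so code that
;; already holds a reference to it sees the new values.  Slots are updated
;; one at a time, which is why the operation is not atomic.
(define (upgrade! old new)
  (for-each (lambda (slot)
              (let ((name (slot-definition-name slot)))
                (slot-set! old name (slot-ref new name))))
            (class-slots (class-of old))))

(define old (make <toy-service> #:provides '(udev) #:start 'old-start))
(define new (make <toy-service> #:provides '(udev) #:start 'new-start))
(upgrade! old new)
(service-start old)   ;⇒ new-start
```

After the call the two instances carry the same slot values but remain distinct objects.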

Thoughts?

The prerequisite to all this work is to make the dmd RPCs
machine-processable, which is not too much work.

Thanks,
Ludo’.
Ludovic Courtès wrote on 8 Jan 2016 11:04
(address . 22039@debbugs.gnu.org)
87d1tcmlpd.fsf@gnu.org
ludo@gnu.org (Ludovic Courtès) skribis:

> The prerequisite to all this work is to make the dmd RPCs
> machine-processable, which is not too much work.

Commit 841b009 in dmd does one step in that direction: it’s now possible
to get the status of services as an sexp.

Ludo’.
Ludovic Courtès wrote on 3 Feb 2016 22:32
(address . 22039@debbugs.gnu.org)
87h9hp7a5h.fsf@gnu.org
ludo@gnu.org (Ludovic Courtès) skribis:

> Currently ‘guix system reconfigure’ doesn’t try to dynamically update
> the set of running services, which is a shame.
>
> A simple strategy would be to have it:
>
> 1. Stop and unregister services currently known to dmd that are
> missing in the new configuration.
>
> 2. Load and start (if they have ‘auto-start?’) services that are in
> the new configuration and currently unknown to dmd.
>
> 3. The rest is the most difficult part: dealing with services that
> already exist but that have changed (see below.)

Commit 240b57f implements #1 and #2.

Ludo’.
Thompson, David wrote on 3 Feb 2016 22:34
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
CAJ=Rwfb=dw9QudA-yWnYTj4NVUH=qO7UXTzW6BXEtza5mTM66g@mail.gmail.com
On Wed, Feb 3, 2016 at 4:32 PM, Ludovic Courtès <ludo@gnu.org> wrote:
> ludo@gnu.org (Ludovic Courtès) skribis:
>
>> Currently ‘guix system reconfigure’ doesn’t try to dynamically update
>> the set of running services, which is a shame.
>>
>> A simple strategy would be to have it:
>>
>> 1. Stop and unregister services currently known to dmd that are
>> missing in the new configuration.
>>
>> 2. Load and start (if they have ‘auto-start?’) services that are in
>> the new configuration and currently unknown to dmd.
>>
>> 3. The rest is the most difficult part: dealing with services that
>> already exist but that have changed (see below.)
>
> Commit 240b57f implements #1 and #2.

Awesome! This is very good progress.

- Dave
Ludovic Courtès wrote on 18 Apr 2016 16:52
control message for bug #22039
(address . control@debbugs.gnu.org)
87oa973r6j.fsf@gnu.org
owner 22039 !
Ludovic Courtès wrote on 16 Jan 2018 12:16
(address . control@debbugs.gnu.org)
871siq835n.fsf@gnu.org
severity 22039 important
Carlo Zancanaro wrote on 26 Aug 2018 14:15
[PATCH] 'guix system reconfigure' must start/restart/stop services
(address . 22039@debbugs.gnu.org)
87tvnhxr20.fsf@zancanaro.id.au
When the next release of the Shepherd is made (including commit
9ec5c0000e9a45441417a6ee4138cdcbf1b1f2b2) we should have the
capability to resolve this ticket.

Attached is my proposed patch from the Guix side. I have tested it
on my machine by grafting the Shepherd with the appropriate patch
and it seems to work as expected.

I tested it by changing the substitute-urls in my guix-daemon
configuration. The output of `ps aux | grep guix-daemon` after
`guix system reconfigure` showed the substitute-urls were
unchanged. After `herd restart guix-daemon` the updated
substitute-urls appeared in `ps aux | grep guix-daemon`. I did not
need to reboot my system.

One possible improvement would be to print out the services that
need to be restarted to be upgraded.
From 162bd298563201ebf6eda87d46ae1b64671397da Mon Sep 17 00:00:00 2001
From: Carlo Zancanaro <carlo@zancanaro.id.au>
Date: Sun, 26 Aug 2018 21:54:14 +1000
Subject: [PATCH] gnu: services: Load all services on reconfigure, not just
stopped ones

* gnu/services/shepherd.scm (shepherd-service-upgrade): Remove checks for
running services.
---
gnu/services/shepherd.scm | 25 +++++--------------------
1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/gnu/services/shepherd.scm b/gnu/services/shepherd.scm
index 4cd224984..efeb82c86 100644
--- a/gnu/services/shepherd.scm
+++ b/gnu/services/shepherd.scm
@@ -1,6 +1,7 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2013, 2014, 2015, 2016, 2018 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2017 Clément Lassieur <clement@lassieur.org>
+;;; Copyright © 2018 Carlo Zancanaro <carlo@zancanaro.id.au>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -338,20 +339,6 @@ needs to be loaded."
(shepherd-service-lookup-procedure target
shepherd-service-provision))
- (define lookup-live
- (shepherd-service-lookup-procedure live
- live-service-provision))
-
- (define (running? service)
- (and=> (lookup-live (shepherd-service-canonical-name service))
- live-service-running))
-
- (define (stopped service)
- (match (lookup-live (shepherd-service-canonical-name service))
- (#f #f)
- (service (and (not (live-service-running service))
- service))))
-
(define live-service-dependents
(shepherd-service-back-edges live
#:provision live-service-provision
@@ -363,14 +350,12 @@ needs to be loaded."
(_ #f)))
(define to-load
- ;; Only load services that are either new or currently stopped.
- (remove running? target))
+ ;; Load all of the new services.
+ target)
(define to-unload
- ;; Unload services that are (1) no longer required, or (2) are in TO-LOAD.
- (remove essential?
- (append (filter obsolete? live)
- (filter-map stopped to-load))))
+ ;; Unload services that are no longer required.
+ (remove essential? (filter obsolete? live)))
(values to-unload to-load))
--
2.18.0

Ludovic Courtès wrote on 1 Sep 2018 12:49
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039@debbugs.gnu.org)
87sh2tijb2.fsf@gnu.org
Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> When the next release of the Shepherd is made (including commit
> 9ec5c0000e9a45441417a6ee4138cdcbf1b1f2b2) we should have the
> capability to resolve this ticket.

Woohoo!

I’d like to make sure we understand the story with ‘EINTR-safe’, but
after that I’m happy to push a release.

> Attached is my proposed patch from the Guix side. I have tested it on
> my machine by grafting the Shepherd with the appropriate patch and it
> seems to work as expected.
>
> I tested it by changing the substitute-urls in my guix-daemon
> configuration. The output of `ps aux | grep guix-daemon` after `guix
> system reconfigure` showed the substitute-urls were unchanged. After
> `herd restart guix-daemon` the updated substitute-urls appeared in `ps
> aux | grep guix-daemon`. I did not need to reboot my system.

Perfect.

> One possible improvement would be to print out the services that need
> to be restarted to be upgraded.

Yes, that’d be nice.

> From 162bd298563201ebf6eda87d46ae1b64671397da Mon Sep 17 00:00:00 2001
> From: Carlo Zancanaro <carlo@zancanaro.id.au>
> Date: Sun, 26 Aug 2018 21:54:14 +1000
> Subject: [PATCH] gnu: services: Load all services on reconfigure, not just
> stopped ones
>
> * gnu/services/shepherd.scm (shepherd-service-upgrade): Remove checks for
> running services.

Could you adjust the manual, where it currently says “if a service is
currently running, it does not attempt to upgrade it”?

Other than that LGTM!

Ludo’.
Carlo Zancanaro wrote on 1 Sep 2018 14:15
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
87va7pza4p.fsf@zancanaro.id.au
Hey Ludo’,

On Sat, Sep 01 2018, Ludovic Courtès wrote:
> I’d like to make sure we understand the story with ‘EINTR-safe’,
> but after that I’m happy to push a release.

Do you have any thoughts about why it could be failing, or things
I could investigate? I don't know where to start.

>> One possible improvement would be to print out the services
>> that need to be restarted to be upgraded.
>
> Yes, that’d be nice.

I have done this, but now it seems a bit overwhelming how many
services would need to be manually restarted. My modified code
writes a message like this:

To complete the upgrade, restart the following services:
file-systems
user-file-systems
file-system-/boot/efi
file-system-/dev/pts
file-system-/dev/shm
file-system-/gnu/store
file-system-/run/systemd
file-system-/run/user
file-system-/sys/fs/cgroup/elogind
file-system-/sys/fs/cgroup
file-system-/sys/fs/cgroup/cpuset
file-system-/sys/fs/cgroup/cpu
file-system-/sys/fs/cgroup/cpuacct
file-system-/sys/fs/cgroup/memory
file-system-/sys/fs/cgroup/devices
file-system-/sys/fs/cgroup/freezer
file-system-/sys/fs/cgroup/blkio
file-system-/sys/fs/cgroup/perf_event
root-file-system
user-processes
host-name
udev
nscd
guix-daemon
urandom-seed
syslogd
loopback
term-tty6
term-tty5
term-tty4
term-tty3
term-tty2
term-tty1
console-font-tty1
console-font-tty2
console-font-tty3
console-font-tty4
console-font-tty5
console-font-tty6
virtual-terminal
ntpd
dbus-system
elogind
upower-daemon
avahi-daemon
wpa-supplicant
networking
xorg-server
cups

The same list is printed every time on my system, because the
diffing is only on the level of the canonical-name. Most of these
services are being "replaced" by services that are exactly the
same, so they don't really need to be restarted. I don't really
know what to do about this. Even if it were fixed, on an actual
upgrade I assume many of these services would be different, and
thus would be printed legitimately.
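
One way to avoid reporting services that are "replaced" by identical ones would be to compare something beyond the canonical name. A minimal sketch, assuming each service can be paired with the file name of its generated code; the alists, the made-up store file names, and `changed-services` are all hypothetical:

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical data: alists mapping a canonical name to the file name of
;; the service's code.  The file names are invented for illustration.
(define live
  '((nscd . "/gnu/store/aaa-shepherd-nscd.scm")
    (ntpd . "/gnu/store/bbb-shepherd-ntpd.scm")))

(define target
  '((nscd . "/gnu/store/aaa-shepherd-nscd.scm")
    (ntpd . "/gnu/store/ccc-shepherd-ntpd.scm")))

;; Return the names present in both LIVE and TARGET whose file differs.
(define (changed-services live target)
  (filter-map (lambda (entry)
                (let ((old (assq-ref live (car entry))))
                  (and old
                       (not (string=? old (cdr entry)))
                       (car entry))))
              target))

(changed-services live target)   ;⇒ (ntpd)
```

With this comparison `nscd` is no longer listed, since its file is unchanged. Whether such a file name is available for live services is an assumption of the sketch.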

I'm also confused why some of these things are services (like
host-name).

I'll send through an updated patch once I've cleaned it up a bit,
but I'm not as positive about it as I was initially.

Carlo

Carlo Zancanaro wrote on 1 Sep 2018 14:33
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
87tvn9z9bp.fsf@zancanaro.id.au
On Sat, Sep 01 2018, Carlo Zancanaro wrote:
> I'll send through an updated patch once I've cleaned it up a
> bit, [ ... ]

Updated patch attached.

Ludovic Courtès wrote on 1 Sep 2018 19:12
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039@debbugs.gnu.org)
87tvn9b0qh.fsf@gnu.org
Heya,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> On Sat, Sep 01 2018, Ludovic Courtès wrote:
>> I’d like to make sure we understand the story with ‘EINTR-safe’, but
>> after that I’m happy to push a release.
>
> Do you have any thoughts about why it could be failing, or things I
> could investigate? I don't know where to start.

First, could you check (in a VM) whether the boot failure is
reproducible when that patch that removes ‘EINTR-safe’ is applied?

If it’s 100% reproducible, could you share the VM’s output?

I don’t know what the problem might be but hopefully that’ll give us a
starting point.

> I have done this, but now it seems a bit overwhelming how many
> services would need to be manually restarted. My modified code writes
> a message like this:

[...]

> The same list is printed every time on my system, because the diffing
> is only on the level of the canonical-name. Most of these services are
> being "replaced" by services that are exactly the same, so they don't
> really need to be restarted. I don't really know what to do about
> this. Even if it were fixed, on an actual upgrade I assume many of
> these services would be different, and thus would be printed
> legitimately.

Indeed. In addition, some low-level services such as file system mounts
cannot be restarted without rebooting, so it’s not useful to mention
them. Perhaps we should simply print (1) the list of services that were
restarted, and (2) a message saying that users should explicitly run
“herd restart SERVICE” to upgrade other services.

WDYT?

> I'm also confused why some of these things are services (like
> host-name).

‘host-name’ could (should?) be an activation snippet.

Thank you!

Ludo’.
Carlo Zancanaro wrote on 2 Sep 2018 05:43
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
87tvn8d0n7.fsf@zancanaro.id.au
On Sun, Sep 02 2018, Ludovic Courtès wrote:
> First, could you check (in a VM) whether the boot failure is
> reproducible when that patch that removes ‘EINTR-safe’ is
> applied?

As far as I can tell it's completely reproducible.

> If it’s 100% reproducible, could you share the VM’s output?

Sure. It's attached.
Attachment: vm-output
> Indeed. In addition, some low-level services such as file
> system mounts cannot be restarted without rebooting, so it’s not
> useful to mention them. Perhaps we should simply print (1) the
> list of services that were restarted, and (2) a message saying
> that users should explicitly run “herd restart SERVICE” to
> upgrade other services.
>
> WDYT?

If there are services that must never be restarted, then maybe we
don't want to indiscriminately print out a message to restart
everything. We need some way to mark services that must not be
restarted. If that's the case, then we might as well just
automatically restart the services that we can rather than
printing a message saying to do so. What do we gain by adding an
extra step to that process?

Carlo

Ludovic Courtès wrote on 2 Sep 2018 22:39
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039@debbugs.gnu.org)
87y3cj4orq.fsf@gnu.org
Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> [ 18.924085] shepherd[1]: Service root-file-system has been started.
> [ 18.932361] shepherd[1]:
> [ 18.939972] shepherd[1]: Service user-file-systems has been started.
> [ 18.947889] shepherd[1]:
> [ 18.989611] shepherd[1]: waiting for udevd...
> [ 19.001396] shepherd[1]:
> [ 19.229174] udevd[267]: starting version 3.2.5
> failed to start service 'file-systems'
> could not create '/dev/autofs': File exists
> could not create '/dev/fuse': File exists
> could not create '/dev/cuse': File exists
> [ 19.525763] udevd[267]: starting eudev-3.2.5

[...]

> [ 19.553794] udevd[267]: no sender credentials received, message ignored
> failed to start service 'file-system-/dev/pts'

[...]

> [ 19.633995] udevd[267]: no sender credentials received, message ignored
> failed to start service 'file-system-/dev/shm'

[...]

> [ 19.741025] udevd[267]: no sender credentials received, message ignored
> failed to start service 'user-processes'
> [ 19.773968] shepherd[1]: Service host-name has been started.
> [ 19.784495] udevd[268]: starting version 3.2.5
> [ 19.797674] shepherd[1]:
> could not create '/dev/autofs': File exists
> could not create '/dev/fuse': File exists
> [ 19.846310] udevd[269]: starting version 3.2.5

It looks as if udev failed to start initially, hence the subsequent
“failed to start 'file-system-*'” messages, but then we appear to have
several competing udevd processes, as if (exec-command (list udevd)) had
been executed multiple times. Hmm not sure what’s going on…

Ludo’.
Ludovic Courtès wrote on 19 Sep 2018 17:47
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039@debbugs.gnu.org)
87lg7xh4l0.fsf@gnu.org
Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> On Sun, Sep 02 2018, Ludovic Courtès wrote:
>> First, could you check (in a VM) whether the boot failure is
>> reproducible when that patch that removes ‘EINTR-safe’ is applied?
>
> As far as I can tell it's completely reproducible.

Commit c4ba8c79db0aa4ba3517acc82ebafe16105fbb97 reinstates the commit
and removes the leftover #:replace, which was responsible for the
problem: in the context of the ‘start’ method of udev, ‘system*’ was
unbound, so ‘start’ would throw an exception and shepherd would call it
again (thinking udev had failed to start), indefinitely.
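
The failure mode can be reproduced in miniature. A toy sketch, not the Shepherd's actual code: `broken-start` and `start-with-retries` are hypothetical, and the retry count is bounded here only so that the example terminates:

```scheme
;; Hypothetical 'start' procedure that always throws, standing in for a
;; 'start' method that references an unbound variable.
(define (broken-start)
  (error "unbound variable: system*"))

;; Naive supervisor: treat any exception as "failed to start" and retry.
;; MAX-ATTEMPTS bounds the loop; without it, this would never end.
;; Return the attempt number that succeeded, or #f if all attempts failed.
(define (start-with-retries start max-attempts)
  (let loop ((attempt 1))
    (cond ((> attempt max-attempts) #f)
          ((catch #t
             (lambda () (start) #t)
             (lambda (key . args) #f))
           attempt)
          (else (loop (+ attempt 1))))))

(start-with-retries broken-start 3)   ;⇒ #f
```

Because the handler swallows the exception, nothing in the output says why each attempt failed.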

If there’s nothing left to add to Shepherd, we can release 0.5.0 within
a few days and then commit the Guix side of this change.

WDYT?

Thanks,
Ludo’.
Carlo Zancanaro wrote on 19 Sep 2018 22:56
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
871s9pfbpg.fsf@zancanaro.id.au
Hey Ludo’,

On Thu, Sep 20 2018, Ludovic Courtès wrote:
> Commit c4ba8c79db0aa4ba3517acc82ebafe16105fbb97 reinstates the
> commit and removes the leftover #:replace, which was responsible
> for the problem: ...

That's great! I didn't even know about the #:replace option, so
I'm glad you were able to find it.

> If there’s nothing left to add to Shepherd, we can release 0.5.0
> within a few days and then commit the Guix side of this change.

This seems like the sort of thing that shouldn't have been this
tricky. Is the exception printed somewhere? If not, then I think
we should print the exception, or at least some information, when
a service fails to load.

Carlo
Ludovic Courtès wrote on 20 Sep 2018 11:47
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039@debbugs.gnu.org)
87efdowleu.fsf@gnu.org
Hi,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> On Thu, Sep 20 2018, Ludovic Courtès wrote:
>> Commit c4ba8c79db0aa4ba3517acc82ebafe16105fbb97 reinstates the
>> commit and removes the leftover #:replace, which was responsible for
>> the problem: ...
>
> That's great! I didn't even know about the #:replace option, so I'm
> glad you were able to find it.
>
>> If there’s nothing left to add to Shepherd, we can release 0.5.0
>> within a few days and then commit the Guix side of this change.
>
> This seems like the sort of thing that shouldn't have been this
> tricky. Is the exception printed somewhere? If not, then I think we
> should print the exception, or at least some information, when a
> service fails to load.

I agree. Note that ‘herd start foo’ prints at least a one-line message
showing the exception when that happens. The problem here is that
failure happens when ‘start’ is called from the shepherd config file.
At that point there’s no client connected and syslogd isn’t around
either, so presumably messages go to /dev/kmsg and/or the console.

I wouldn’t consider it a blocker for 0.5.0 though. WDYT?

Thanks,
Ludo’.
Carlo Zancanaro wrote on 20 Sep 2018 12:24
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
87tvmkqxe7.fsf@zancanaro.id.au
> [...] so presumably messages go to /dev/kmsg and/or the console.

I don't remember seeing anything about the exception in any of the
output that I looked at. I'm a bit confused about where different
bits of output go, so I'll take a look at how output is handled in
a few weeks, when the rest of life settles down a bit.

> I wouldn’t consider it a blocker for 0.5.0 though. WDYT?

Yeah, I agree. We should try to improve it, but as long as we
haven't made things worse (which we haven't) then it shouldn't
block a release.

We still need to work out what we want to do on the Guix side once
the Shepherd is released. Do we want to restart services that we
can, or print a message telling users how to do so? Maybe
individual services should be able to specify their preference?

Carlo
Ludovic Courtès wrote on 20 Sep 2018 13:08
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039@debbugs.gnu.org)
87efdov32l.fsf@gnu.org
Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> We still need to work out what we want to do on the Guix side once the
> Shepherd is released. Do we want to restart services that we can, or
> print a message telling users how to do so? Maybe individual services
> should be able to specify their preference?

I would reload and restart services currently stopped (what ‘guix system
reconfigure’ currently does), and replace all the other services. This
is what the patch you sent at https://issues.guix.info/issue/22039#24
does.

AIUI the only remaining issue is whether/how to print hints about
services that need to be manually restarted. Earlier I suggested:

> Perhaps we should simply print (1) the list of services that were
> restarted, and (2) a message saying that users should explicitly run
> “herd restart SERVICE” to upgrade other services.

To which you replied:

> If there are services that must never be restarted, then maybe we
> don't want to indiscriminately print out a message to restart
> everything. We need some way to mark services that must not be
> restarted. If that's the case, then we might as well just
> automatically restart the services that we can rather than
> printing a message saying to do so. What do we gain by adding an
> extra step to that process?

From the POV of the Shepherd, services carry no semantics. The Shepherd
cannot guess that restarting ‘udev’ or ‘file-system-xyz’ is impractical
(try it :-)). Leaf services like ‘ssh-daemon’ can generally be
restarted, but whether or not now is a good time to do it is something
only the user can decide. That’s why the only services which are safe
to restart right away are those currently stopped (and those that can be
hot-swapped like nginx.)

Thus I think it’s reasonable to print a message along the lines of:

The following services were upgraded: …
Please run “herd restart SERVICE” to stop, upgrade, and restart
services that were not automatically upgraded.
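
A minimal sketch of how such a message could be produced, assuming the live services are available as pairs of a name and a running flag; the data and `report-upgrade` are hypothetical, and only stopped services are treated as upgraded:

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical data: (NAME . RUNNING?) pairs for the live services.
(define live
  '((ssh-daemon . #t) (ntpd . #f) (udev . #t)))

;; Print the hint and return the names of the services considered
;; upgraded, that is, those that were stopped and could be reloaded safely.
(define (report-upgrade live)
  (let ((stopped (map car (remove cdr live))))
    (format #t "The following services were upgraded: ~a~%" stopped)
    (format #t "Please run \"herd restart SERVICE\" to stop, upgrade, and restart~%")
    (format #t "services that were not automatically upgraded.~%")
    stopped))

(report-upgrade live)   ;⇒ (ntpd)
```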

WDYT?

Thanks,
Ludo’.
Carlo Zancanaro wrote on 20 Sep 2018 13:50
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
87sh24qtes.fsf@zancanaro.id.au
Hey Ludo’,

> From the POV of the Shepherd, services carry no semantics.

In Guix we have as much information as possible about the
services. We should know which services should be upgraded
automatically, which ones we should prompt the user to upgrade,
and which ones are never safe to upgrade. Maybe we could add a
"restart-strategy" to the shepherd-service object?

> Thus I think it’s reasonable to print a message along the lines
> of:
>
> The following services were upgraded: …
> Please run “herd restart SERVICE” to stop, upgrade, and
> restart services that were not automatically upgraded.
>
> WDYT?

The main reasons I'm not super happy with this are that it's not
discoverable (which is bad for new users), and it requires
interaction (so cannot be an unattended upgrade). In particular
for discoverability, some of our services don't take advantage of
the Shepherd's ability to have multiple "provision" values. For
instance, I just have to know that to restart wicd I have to run
"herd restart networking".

Maybe this should be a separate ticket. Replacing the services and
printing a generic message will still be an improvement on what
Guix currently does, and I don't want to hold that up just because
I think we can do better.

Carlo
Ludovic Courtès wrote on 21 Sep 2018 13:58
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039@debbugs.gnu.org)
87r2hn2hbo.fsf@gnu.org
Hello,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

>> From the POV of the Shepherd, services carry no semantics.
>
> In Guix we have as much information as possible about the services. We
> should know which services should be upgraded automatically, which
> ones we should prompt the user to upgrade, and which ones are never
> safe to upgrade. Maybe we could add a "restart-strategy" to the
> shepherd-service object?

What would you put there? Do you have concrete examples?

Note that FHS distros don’t do better: either the service is
hot-replaceable (nginx; I don’t know of any other) or can at least
reload its config (sshd, etc.), and then it’s dynamically upgraded, or
it’ll be upgraded next time you restart it.

That’s because fundamentally only the user can tell whether now is a
good time to restart, say, sshd.

In Debian, “apt-get dist-upgrade” opens a dialog box asking the user
whether services can be restarted right away, IIRC.

>> Thus I think it’s reasonable to print a message along the lines of:
>>
>> The following services were upgraded: …
>> Please run “herd restart SERVICE” to stop, upgrade, and restart
>> services that were not automatically upgraded.
>>
>> WDYT?
>
> The main reasons I'm not super happy with this are that it's not
> discoverable (which is bad for new users), and it requires interaction
> (so cannot be an unattended upgrade).

I agree, but I don’t think full unattended upgrades exist out there.
I’m not saying this is good, but rather that this is hard and beyond
the scope of this patch.

> In particular for discoverability, some of our services don't take
> advantage of the Shepherd's ability to have multiple "provision"
> values. For instance, I just have to know that to restart wicd I have
> to run "herd restart networking".

There’s ‘guix system search’ that provides this kind of info (see
https://issues.guix.info/issue/29707), but I agree we could do better.

> Maybe this should be a separate ticket. Replacing the services and
> printing a generic message will still be an improvement on what Guix
> currently does, and I don't want to hold that up just because I think
> we can do better.

Yes, I think this should be a separate ticket. We can go with your
patch and a message along the lines of what we discussed above, and then
work on the improvements you mentioned, one at a time. That way we’ll
have the warm feeling of having achieved something, even if there’s more
to come. :-)

Thank you!

Ludo’.
Efraim Flashner wrote on 23 Sep 2018 10:26
(name . Ludovic Courtès)(address . ludo@gnu.org)
20180923082613.GA1226@macbook41
On Fri, Sep 21, 2018 at 01:58:03PM +0200, Ludovic Courtès wrote:
> Hello,
>
> Carlo Zancanaro <carlo@zancanaro.id.au> skribis:
>
> >> From the POV of the Shepherd, services carry no semantics.
> >
> > In Guix we have as much information as possible about the services. We
> > should know which services should be upgraded automatically, which
> > ones we should prompt the user to upgrade, and which ones are never
> > safe to upgrade. Maybe we could add a "restart-strategy" to the
> > shepherd-service object?
>
> What would you put there? Do you have concrete examples?

Restart/reload/whatever unless explicitly disabled?

>
> Note that FHS distros don’t do better: either the service is
> hot-replaceable (nginx; I don’t know of any other) or can at least
> reload its config (sshd, etc.), and then it’s dynamically upgraded, or
> it’ll be upgraded next time you restart it.
>
> That’s because fundamentally only the user can tell whether now is a
> good time to restart, say, sshd.

Not exactly the point, but Debian regularly restarts sshd for me on a
remote box (somehow) without me losing the connection.

>
> In Debian, “apt-get dist-upgrade” opens a dialog box asking the user
> whether services can be restarted right away, IIRC.
>
> >> Thus I think it’s reasonable to print a message along the lines of:
> >>
> >> The following services were upgraded: …
> >> Please run “herd restart SERVICE” to stop, upgrade, and restart
> >> services that were not automatically upgraded.
> >>
> >> WDYT?
>

This sounds like a really good idea, especially if we limit it to services
that are unlikely to cause problems when restarted (unlike, say, file
systems). We still have to figure out something for databases and
upgrading them from one version to another.


--
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
Ludovic Courtès wrote on 23 Sep 2018 21:53
(name . Efraim Flashner)(address . efraim@flashner.co.il)
87lg7srnw0.fsf@gnu.org
Hi,

Efraim Flashner <efraim@flashner.co.il> skribis:

> On Fri, Sep 21, 2018 at 01:58:03PM +0200, Ludovic Courtès wrote:

[...]

>> Note that FHS distros don’t do better: either the service is
>> hot-replaceable (nginx; I don’t know of any other) or can at least
>> reload its config (sshd, etc.), and then it’s dynamically upgraded, or
>> it’ll be upgraded next time you restart it.
>>
>> That’s because fundamentally only the user can tell whether now is a
>> good time to restart, say, sshd.
>
> Not exactly the point, but Debian regularly restarts sshd for me on a
> remote box (somehow) without me losing the connection.

Good point! I think sshd opens child processes for new sessions, and
thanks to that it falls into the category of service that can be
hot-replaced.

For hot-swappable daemons, I think we should provide a specific ‘reload’
or ‘upgrade’ action as was discussed at
https://issues.guix.info/issue/26830. That way, to figure out the
right strategy, we would just check whether the service supports that
action.

Thanks,
Ludo’.
Carlo Zancanaro wrote on 24 Sep 2018 01:06
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
87lg7ru83l.fsf@zancanaro.id.au
Hey Ludo’,

On Fri, Sep 21 2018, Ludovic Courtès wrote:
> What would you put there? Do you have concrete examples?

I would have three possible values: 'never, 'always, 'manual.

'never would mean that the service should never be restarted. This
is for things like udev, or the filesystems, which should never be
restarted on a running system.

'always would mean that the service is always safe to restart. I
don't immediately know what services would fit in this category
(maybe sshd, given Efraim's comment; maybe ntpd? I'm sure there
are others). Things like nginx will probably not fall into this
category, because they involve some downtime when restarting.
Reloading their configuration (via a "reload" action, or similar)
is not enough because the binary and/or libraries might have
changed (and, in the worst case, might have an incompatible
configuration format, although I would expect that to be
exceedingly rare).

'manual would mean that the service should be restarted, but it
needs to be done at an appropriate time. This should prompt the
user with the names of the services, and we should provide an
option to guix system reconfigure to restart these services as
part of the reconfigure. We could call the option
"--restart-services".
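
A minimal Guile sketch of how these three values might drive the reconfigure step. Everything here is an assumption made for illustration: the <sketch-service> record, its ‘restart-strategy’ field, and the ‘plan-restarts’ procedure are hypothetical and are not part of Guix or the Shepherd. The #:restart-services? keyword stands in for the proposed "--restart-services" option.

```scheme
(use-modules (srfi srfi-1)     ; filter
             (srfi srfi-9))    ; define-record-type

;; Hypothetical stand-in for a Shepherd service declaration.  The
;; 'restart-strategy' field is the proposal above, not an existing field.
(define-record-type <sketch-service>
  (sketch-service name restart-strategy)
  sketch-service?
  (name             sketch-service-name)
  (restart-strategy sketch-service-restart-strategy)) ; never | always | manual

(define (services-with-strategy services strategy)
  "Return the names of SERVICES whose restart strategy is STRATEGY."
  (map sketch-service-name
       (filter (lambda (s)
                 (eq? strategy (sketch-service-restart-strategy s)))
               services)))

(define* (plan-restarts upgraded #:key restart-services?)
  "Return two values: the names of UPGRADED services to restart right away,
and the names to report to the user for a later 'herd restart'.  Services
marked 'never appear in neither list."
  (let ((always (services-with-strategy upgraded 'always))
        (manual (services-with-strategy upgraded 'manual)))
    (if restart-services?
        (values (append always manual) '())
        (values always manual))))
```

For instance, calling (plan-restarts services) on a list holding udev as 'never, ntpd as 'always, and nginx as 'manual would restart only ntpd and report nginx; passing #:restart-services? #t would restart both. The strategy assigned to each of those services is itself only an example.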

>> [ ... ] I just have to know that to restart wicd I have to run
>> "herd restart networking".
>
> There’s ‘guix system search’ that provides this kind of info
> (see <https://issues.guix.info/issue/29707>), but I agree we
> could do better.

I actually checked this before sending my previous message, but I
didn't see that it includes "shepherdnames". I tested with "guix
system search wicd" which didn't show any, but I see now that
searching "guix system search xmpp" does helpfully show how to
restart the service.

> We can go with your patch and a message along the lines of what
> we discussed above, and then work on the improvements you
> mentioned, one at a time. That way we’ll have the warm feeling
> of having achieved something, even if there’s more to come. :-)

I won't be able to look at writing the code for this for a few
weeks, but hopefully I'll get to it around mid- to late-October.

Carlo
Ludovic Courtès wrote on 24 Sep 2018 10:58
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039@debbugs.gnu.org)
877ejbxodp.fsf@gnu.org
Hi Carlo,

Carlo Zancanaro <carlo@zancanaro.id.au> skribis:

> On Fri, Sep 21 2018, Ludovic Courtès wrote:
>> What would you put there? Do you have concrete examples?
>
> I would have three possible values: 'never, 'always, 'manual.
>
> 'never would mean that the service should never be restarted. This is
> for things like udev, or the filesystems, which should never be
> restarted on a running system.
>
> 'always would mean that the service is always safe to restart. I don't
> immediately know what services would fit in this category (maybe sshd,
> given Efraim's comment; maybe ntpd? I'm sure there are others). Things
> like nginx will probably not fall into this category, because they
> involve some downtime when restarting. Reloading their configuration
> (via a "reload" action, or similar) is not enough because the binary
> and/or libraries might have changed (and, in the worst case, might
> have an incompatible configuration format, although I would expect
> that to be exceedingly rare).
>
> 'manual would mean that the service should be restarted, but it needs
> to be done at an appropriate time. This should prompt the user with
> the names of the services, and we should provide an option to guix
> system reconfigure to restart these services as part of the
> reconfigure. We could call the option "--restart-services".

OK, I see.

>> We can go with your patch and a message along the lines of what we
>> discussed above, and then work on the improvements you mentioned,
>> one at a time. That way we’ll have the warm feeling of having
>> achieved something, even if there’s more to come. :-)
>
> I won't be able to look at writing the code for this for a few weeks,
> but hopefully I'll get to it around mid- to late-October.

If that’s fine with you, I can apply the patch you initially posted so
we can start taking advantage of it (I’d like to push a Guix release by
the end of October.) WDYT?

Thanks!

Ludo’.
Carlo Zancanaro wrote on 24 Sep 2018 12:18
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 22039@debbugs.gnu.org)
87k1nb5hbl.fsf@zancanaro.id.au
Hey Ludo’,

On Mon, Sep 24 2018, Ludovic Courtès wrote:
> If that’s fine with you, I can apply the patch you initially
> posted so we can start taking advantage of it (I’d like to push
> a Guix release by the end of October.) WDYT?

That sounds good to me!

Thanks for your patience through this. It's taken a bit of time
for my ideas to fully form, but I think it's coming together.

Carlo
Ludovic Courtès wrote on 26 Sep 2018 23:46
(name . Carlo Zancanaro)(address . carlo@zancanaro.id.au)(address . 22039-done@debbugs.gnu.org)
87y3boylsv.fsf@gnu.org
Hello!

I went ahead and pushed the patch as
4245ddcbc9f935804c17c97872b90ec1050c2d75.

One modification I had to make and which I hadn’t thought of before is
the new ‘load-services/safe’ procedure I added: it makes sure it DTRT
when talking to shepherd < 0.15.0.

I’ve reconfigured from master, and so far so good! :-)

I’m closing this issue. I suggest opening new ones for specific
improvements we discussed.

Thank you!

Ludo’.
Closed