opendht-service-type hangs Shepherd at boot

Status: Done
Submitted by: Maxim Cournoyer
Owner: unassigned
Severity: normal
Participants: Maxim Cournoyer
Maxim Cournoyer wrote on 19 May 2021 13:59
(name . bug-guix)(address . bug-guix@gnu.org)
874kezhr3c.fsf@gmail.com
Hello,

I just noticed this problem following a reboot. I can also reproduce it
in 'guix system vm' simply by adding the opendht-service-type to my
operating-system declaration.
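
For readers who want to reproduce this, a minimal sketch of such a declaration follows. It is a fragment, not a complete configuration: the required fields of the operating-system record are elided, and it assumes opendht-service-type is used with its default configuration.

```scheme
;; config.scm -- fragment only; try it with: guix system vm config.scm
(use-modules (gnu))
(use-service-modules networking)   ;assumed to provide opendht-service-type

(operating-system
  ;; ... host-name, bootloader, file-systems and other required fields
  ;; are elided here; take them from any working configuration ...
  (services (cons (service opendht-service-type)   ;default configuration assumed
                  %base-services)))
```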

The boot proceeds until 'error in finalization thread: Success' then
hangs indefinitely.

What troubles me is that the service is defined rather straightforwardly.
It uses make-forkexec-constructor/container like so:

(define (opendht-shepherd-service config)
  "Return a <shepherd-service> running OpenDHT."
  (shepherd-service
   (documentation "Run an OpenDHT node.")
   (provision '(opendht dhtnode dhtproxy))
   (requirement '(user-processes syslogd))
   (start #~(make-forkexec-constructor/container
             (list #$@(opendht-configuration->command-line-arguments config))
             #:mappings (list (file-system-mapping
                               (source "/dev/log") ;for syslog
                               (target source)))
             #:user "opendht"))
   (stop #~(make-kill-destructor))))

I'm not sure how using such basic building blocks could lead to a hang
in Shepherd.

Thanks,

Maxim
Maxim Cournoyer wrote on 19 May 2021 23:36
(address . 48521@debbugs.gnu.org)
87zgwqh0d5.fsf@gmail.com

After much trial and error, I found that removing the #:mappings
argument stops the service from hanging Shepherd:

modified   gnu/services/networking.scm
@@ -845,9 +845,9 @@ CONFIG, an <opendht-configuration> object."
       (requirement '(user-processes networking syslogd))
       (start #~(make-forkexec-constructor/container
                 (list #$@(opendht-configuration->command-line-arguments config))
-                #:mappings (list (file-system-mapping
-                                  (source "/dev/log") ;for syslog
-                                  (target source)))
+                ;; #:mappings (list (file-system-mapping
+                ;;                   (source "/dev/log") ;for syslog
+                ;;                   (target source)))
                 #:user "opendht"))
       (stop #~(make-kill-destructor))))

I have no idea why that is, but given that the tor-service-type does the
same thing, I can only conclude that it is some strange interaction
between dhtnode and syslog.

The above fixes the hang, but breaks logging to syslog.

Ideas?

Maxim
Maxim Cournoyer wrote on 20 May 2021 04:52
(address . 48521-done@debbugs.gnu.org)
87v97eglrd.fsf@gmail.com
Hello,

It seems Shepherd can't cope with a start procedure that fails because
it references an unbound variable. The best way to diagnose the problem
turned out to be extracting the constructor code into a separate script
and running it on its own. This made the error quickly apparent:
"Unbound variable: file-system-mapping".

We should try to handle this class of errors in Shepherd and report a
useful message and *not* crash Shepherd or otherwise hang.
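
For illustration, here is a sketch of one way to bring file-system-mapping into scope for the start gexp. It follows the pattern used by other containerized services such as Tor's, and is an assumption rather than a copy of the commit mentioned below: with-imported-modules and source-module-closure (from (guix modules)) ship the modules with the service code, and the modules field makes their bindings visible when Shepherd evaluates the start expression.

```scheme
(define (opendht-shepherd-service config)
  "Return a <shepherd-service> running OpenDHT."
  ;; Ship these modules, and the modules they depend on, with the
  ;; service code (assumed fix, modelled on other container services).
  (with-imported-modules (source-module-closure
                          '((gnu build shepherd)
                            (gnu system file-systems)))
    (shepherd-service
     (documentation "Run an OpenDHT node.")
     (provision '(opendht dhtnode dhtproxy))
     (requirement '(user-processes syslogd))
     ;; Make file-system-mapping and make-forkexec-constructor/container
     ;; visible where the 'start' expression is evaluated.
     (modules '((gnu build shepherd)
                (gnu system file-systems)))
     (start #~(make-forkexec-constructor/container
               (list #$@(opendht-configuration->command-line-arguments config))
               #:mappings (list (file-system-mapping
                                 (source "/dev/log") ;for syslog
                                 (target source)))
               #:user "opendht"))
     (stop #~(make-kill-destructor)))))
```

With the modules in scope, the #:mappings argument can stay, so logging to syslog keeps working.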

Pushed with commit a09cdf1f9d.

Closing.

Maxim
Closed