opendht-service-type hangs Shepherd at boot

  • Done
  • quality assurance status badge
Details
One participant
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal
M
M
Maxim Cournoyer wrote on 19 May 2021 13:59
(name . bug-guix)(address . bug-guix@gnu.org)
874kezhr3c.fsf@gmail.com
Hello,

I just noticed about this problem following a reboot. I can also
reproduce it in 'guix system vm', simply adding the opendht-service-type
to my operating-system declaration.

The boot proceeds until 'error in finalization thread: Success' then
hangs indefinitely.

What is troubling for me is that the service is rather straightforwardly
defined. It uses the make-forkexec-constructor/container like so:

Toggle snippet (15 lines)
(define (opendht-shepherd-service config)
"Return a <shepherd-service> running OpenDHT."
(shepherd-service
(documentation "Run an OpenDHT node.")
(provision '(opendht dhtnode dhtproxy))
(requirement '(user-processes syslogd))
(start #~(make-forkexec-constructor/container
(list #$@(opendht-configuration->command-line-arguments config))
#:mappings (list (file-system-mapping
(source "/dev/log") ;for syslog
(target source)))
#:user "opendht"))
(stop #~(make-kill-destructor))))

I'm not sure how using such basic building blocks could lead to a hang
in Shepherd ?

Thanks,

Maxim
M
M
Maxim Cournoyer wrote on 19 May 2021 23:36
(address . 48521@debbugs.gnu.org)
87zgwqh0d5.fsf@gmail.com
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (29 lines)
> Hello,
>
> I just noticed about this problem following a reboot. I can also
> reproduce it in 'guix system vm', simply adding the opendht-service-type
> to my operating-system declaration.
>
> The boot proceeds until 'error in finalization thread: Success' then
> hangs indefinitely.
>
> What is troubling for me is that the service is rather straightforwardly
> defined. It uses the make-forkexec-constructor/container like so:
>
> (define (opendht-shepherd-service config)
> "Return a <shepherd-service> running OpenDHT."
> (shepherd-service
> (documentation "Run an OpenDHT node.")
> (provision '(opendht dhtnode dhtproxy))
> (requirement '(user-processes syslogd))
> (start #~(make-forkexec-constructor/container
> (list #$@(opendht-configuration->command-line-arguments config))
> #:mappings (list (file-system-mapping
> (source "/dev/log") ;for syslog
> (target source)))
> #:user "opendht"))
> (stop #~(make-kill-destructor))))
>
> I'm not sure how using such basic building blocks could lead to a hang
> in Shepherd ?

After much trial and error, the service can be made to not hang Shepherd
with the removal of the mappings argument:

Toggle snippet (15 lines)
modified gnu/services/networking.scm
@@ -845,9 +845,9 @@ CONFIG, an <opendht-configuration> object."
(requirement '(user-processes networking syslogd))
(start #~(make-forkexec-constructor/container
(list #$@(opendht-configuration->command-line-arguments config))
- #:mappings (list (file-system-mapping
- (source "/dev/log") ;for syslog
- (target source)))
+ ;; #:mappings (list (file-system-mapping
+ ;; (source "/dev/log") ;for syslog
+ ;; (target source)))
#:user "opendht"))
(stop #~(make-kill-destructor))))

I have no idea why that is, but given that the tor-service-type does the
same thing, I can only conclude that it is some strange interaction
between dhtnode and syslog.

The above fixes the hang, but breaks logging to syslog.

Ideas?

Maxim
M
M
Maxim Cournoyer wrote on 20 May 2021 04:52
(address . 48521-done@debbugs.gnu.org)
87v97eglrd.fsf@gmail.com
Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (29 lines)
> Hello,
>
> I just noticed about this problem following a reboot. I can also
> reproduce it in 'guix system vm', simply adding the opendht-service-type
> to my operating-system declaration.
>
> The boot proceeds until 'error in finalization thread: Success' then
> hangs indefinitely.
>
> What is troubling for me is that the service is rather straightforwardly
> defined. It uses the make-forkexec-constructor/container like so:
>
> (define (opendht-shepherd-service config)
> "Return a <shepherd-service> running OpenDHT."
> (shepherd-service
> (documentation "Run an OpenDHT node.")
> (provision '(opendht dhtnode dhtproxy))
> (requirement '(user-processes syslogd))
> (start #~(make-forkexec-constructor/container
> (list #$@(opendht-configuration->command-line-arguments config))
> #:mappings (list (file-system-mapping
> (source "/dev/log") ;for syslog
> (target source)))
> #:user "opendht"))
> (stop #~(make-kill-destructor))))
>
> I'm not sure how using such basic building blocks could lead to a hang
> in Shepherd ?

It seems Shepherd can't cope with a failing start procedure/script when
a variable was not bound. To diagnose the problem, the best way ended
up being to extract the code of the constructor in a separate script to
run it separately. This made the error quickly apparent: "Unbound
variable: file-system-mapping".

We should try to handle this class of errors in Shepherd and report a
useful message and *not* crash Shepherd or otherwise hang.

Pushed with commit a09cdf1f9d.

Closing.

Maxim
Closed
?
Your comment

This issue is archived.

To comment on this conversation send an email to 48521@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 48521
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch