Hello,

When running "gui-installed-desktop-os-encrypted" test, Shepherd seems to deadlock when restarting "guix-daemon". This can happen at different stages:

* In "umount-cow-store" procedure, just before finishing the install.
* During "set-http-proxy" tests inside the marionette.

This is not always reproducible. In order to gather some information, I created a Shepherd "strace" service that logs what's happening in Shepherd itself (patch attached).

It seems that, just after blocking signals, in "fork+exec-command", I guess, Shepherd is taking a lock:
I think this is caused by a "pthread_join", most probably the one in "stop_finalization_thread" that is called right before forking a new process. The fact that we hang here probably means that the finalizer thread itself is hanging, not sure why.

It looks like what was reported by Ludo here: https://issues.guix.info/31925.

Thanks,

Mathieu
> When running "gui-installed-desktop-os-encrypted" test, Shepherd seems
> to deadlock when restarting "guix-daemon". This can happen at different
> stages:
>
> * In "umount-cow-store" procedure, just before finishing the install.
>
> * During "set-http-proxy" tests inside the marionette.
>
> This is not always reproducible. In order to gather some information, I
> created a Shepherd "strace" service that logs what's happening in
> Shepherd itself (patch attached).

We should be able to reproduce it with much simpler tests then, right? Like maybe “while : ; do herd restart guix-daemon ; done” or similar?
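
A bare loop like the one above would simply block forever once a restart hangs, so a variant that stops at the first stuck iteration may be handier. The following is only a sketch: the helper name `restart_until_hang`, the 60-second limit and the iteration cap are illustrative choices, and it assumes that coreutils `timeout` is available and that the `herd` client blocks while shepherd is deadlocked.

```shell
#!/bin/sh
# Sketch: run a command repeatedly, each time under a time limit, and
# stop at the first iteration that exceeds the limit.
# Assumption: GNU coreutils `timeout`, which exits with status 124
# when the time limit is reached.

# restart_until_hang LIMIT MAX CMD...
#   Run CMD up to MAX times, each under a LIMIT-second timeout.
#   Return 1 as soon as one run times out, 0 if none did.
restart_until_hang() {
    limit=$1
    max=$2
    shift 2
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        status=0
        timeout "$limit" "$@" || status=$?
        if [ "$status" -eq 124 ]; then
            echo "iteration $i: timed out after ${limit}s (possible hang)" >&2
            return 1
        fi
    done
    return 0
}

# Intended use, as root on the machine under test:
#   restart_until_hang 60 1000 herd restart guix-daemon
```

When the function returns 1, PID 1 should be left in its stuck state for inspection, since only the `herd` client is killed by the timeout.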

When that happens, we should check how many threads exist in PID 1. There should be the finalization thread and the main thread, plus the signal thread (because there are still ‘sigaction’ calls in the ‘main’ procedure), plus the GC marker threads.

In https://issues.guix.gnu.org/31925#6, Andy suggests that the signal thread is not properly handled; indeed it takes locks and we don’t try to shut it down upon fork. However, when using signalfd, the signal thread must be stuck in its ‘read’ call in ‘read_signal_pipe_data’, so I don’t see how it could cause problems.

The GC threads are presumably taken care of by the atfork handler in libgc.

Thoughts?

Ludo’.
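
For the thread check mentioned above, something along these lines might do on Linux. It is a minimal sketch that relies only on the /proc/PID/task directory; the helper names `thread_count` and `thread_names` are made up for illustration.

```shell
#!/bin/sh
# Sketch: inspect the threads of a process through /proc (Linux-specific).

# thread_count PID
#   Print the number of threads of PID, i.e. the number of entries
#   under /proc/PID/task.
thread_count() {
    ls "/proc/$1/task" | wc -l
}

# thread_names PID
#   Print one "TID NAME" line per thread, NAME being read from the
#   thread's `comm' file.
thread_names() {
    for t in "/proc/$1/task"/*; do
        printf '%s %s\n' "${t##*/}" "$(cat "$t/comm")"
    done
}

# Intended use once shepherd appears stuck:
#   thread_count 1
#   thread_names 1
```

Comparing the output before and after a hang should show whether an unexpected thread is around, or whether one of the expected threads is missing.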