Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:

> (+Cc: Andy as the ultimate authority for all these things. :-))
>
> ludo@gnu.org (Ludovic Courtès) skribis:
>
>> (let loop ((files files)
>>            (n 0))
>>   (match files
>>     ((file . tail)
>>      (call-with-input-file file
>>        (lambda (port)
>>          (call-with-decompressed-port 'gzip port
>>            (lambda (port)
>>              (let loop ()
>>                (unless (eof-object? (get-bytevector-n port 777))
>>                  (loop)))))))
>>      ;; (pk 'loop n file)
>>      (display ".")
>>      (loop tail (+ n 1)))))
>
> One problem I’ve noticed is that the child process that
> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
> allocation lock:
>
> (gdb) bt
> #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x00007f9fd8d5cb25 in __GI___pthread_mutex_lock (mutex=0x7f9fd91b3240 ) at ../nptl/pthread_mutex_lock.c:78
> #2 0x00007f9fd8f8ef8f in GC_call_with_alloc_lock (fn=fn@entry=0x7f9fd92b0420 , client_data=client_data@entry=0x7ffe4b9a0d80) at misc.c:1929
> #3 0x00007f9fd92b1270 in copy_weak_entry (dst=0x7ffe4b9a0d70, src=0x759ed0) at weak-set.c:124
> #4 weak_set_remove_x (closure=0x8850c0, pred=0x7f9fd92b0440 , hash=3944337866184184181, set=0x70cf00) at weak-set.c:615
> #5 scm_c_weak_set_remove_x (set=set@entry=#, raw_hash=, pred=pred@entry=0x7f9fd92b0440 , closure=closure@entry=0x8850c0) at weak-set.c:791
> #6 0x00007f9fd92b13b0 in scm_weak_set_remove_x (set=#, obj=obj@entry=#) at weak-set.c:812
> #7 0x00007f9fd926f72f in close_port (port=#, explicit=) at ports.c:884
> #8 0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 , vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
> #9 0x00007f9fd92afb37 in scm_call_n (proc=0x7f9fd959b030, argv=argv@entry=0x7ffe4b9a1018, nargs=nargs@entry=1) at vm.c:1257
> #10 0x00007f9fd9233017 in scm_primitive_eval (exp=, exp@entry=0x855280) at eval.c:662
> #11 0x00007f9fd9233073 in scm_eval (exp=0x855280, module_or_state=module_or_state@entry=0x83d140) at eval.c:696
> #12 0x00007f9fd927e8d0 in scm_shell (argc=2, argv=0x7ffe4b9a1668) at script.c:454
> #13 0x00007f9fd9249a9d in invoke_main_func (body_data=0x7ffe4b9a1510) at init.c:340
> #14 0x00007f9fd922c28a in c_body (d=0x7ffe4b9a1450) at continuations.c:422
> #15 0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 , vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
> #16 0x00007f9fd92afb37 in scm_call_n (proc=proc@entry=#, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
> #17 0x00007f9fd9231e69 in scm_call_0 (proc=proc@entry=#) at eval.c:481
> #18 0x00007f9fd929e7b2 in catch (tag=tag@entry=#t, thunk=#, handler=0x7950c0, pre_unwind_handler=0x7950a0) at throw.c:137
> #19 0x00007f9fd929ea95 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=, handler=, pre_unwind_handler=) at throw.c:254
> #20 0x00007f9fd929ec5f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f9fd922c280 , body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 , handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 , pre_unwind_handler_data=0x7a9bc0) at throw.c:377
> #21 0x00007f9fd922c870 in scm_i_with_continuation_barrier (body=body@entry=0x7f9fd922c280 , body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 , handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 , pre_unwind_handler_data=0x7a9bc0) at continuations.c:360
> #22 0x00007f9fd922c905 in scm_c_with_continuation_barrier (func=, data=) at continuations.c:456
> #23 0x00007f9fd929d3ec in with_guile (base=base@entry=0x7ffe4b9a14b8, data=data@entry=0x7ffe4b9a14e0) at threads.c:661
> #24 0x00007f9fd8f8efb8 in GC_call_with_stack_base (fn=fn@entry=0x7f9fd929d3a0 , arg=arg@entry=0x7ffe4b9a14e0) at misc.c:1949
> #25 0x00007f9fd929d708 in scm_i_with_guile (dynamic_state=, data=data@entry=0x7ffe4b9a14e0, func=func@entry=0x7f9fd9249a80 ) at threads.c:704
> #26 scm_with_guile (func=func@entry=0x7f9fd9249a80 , data=data@entry=0x7ffe4b9a1510) at threads.c:710
> #27 0x00007f9fd9249c32 in scm_boot_guile (argc=argc@entry=2, argv=argv@entry=0x7ffe4b9a1668, main_func=main_func@entry=0x400cb0 , closure=closure@entry=0x0) at init.c:323
> #28 0x0000000000400b70 in main (argc=2, argv=0x7ffe4b9a1668) at guile.c:101
> (gdb) info threads
>   Id   Target Id         Frame
> * 1    Thread 0x7f9fd972eb80 (LWP 15573) "guile" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>
> So it seems quite clear that the thing has the alloc lock taken.  I
> suppose this can happen if one of the libgc threads runs right when we
> call fork and takes the alloc lock, right?

Does libgc spawn threads that run concurrently with user threads?  If
so, that would be news to me.  My understanding was that incremental
marking occurs within GC allocation calls, and marking threads are only
spawned after all user threads have been stopped, but I could be wrong.

The first idea that comes to my mind is that perhaps the finalization
thread is holding the GC allocation lock when 'fork' is called.  The
finalization thread grabs the GC allocation lock every time it calls
'GC_invoke_finalizers'.  All ports backed by POSIX file descriptors
(including pipes) register finalizers, and therefore spawn the
finalization thread and make work for it to do.

Another possibility: both the finalization thread and the signal
delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
which also temporarily grabs the GC allocation lock before calling the
specified function.  See 'GC_do_blocking_inner' in pthread_support.c in
libgc.  You spawn the signal delivery thread by calling 'sigaction', and
you make work for it to do every second when the SIGALRM is delivered.

> If that is correct, the fix would be to call fork within
> ‘GC_call_with_alloc_lock’.
>
> How does that sound?

Sure, sounds good to me.
> As a workaround on the Guix side, we might achieve the same effect by
> calling ‘gc-disable’ right before ‘primitive-fork’.

I don't think this would help: disabling collection does not keep other
threads from briefly taking the allocation lock, e.g. in
'GC_invoke_finalizers' or 'GC_do_blocking', so one of them could still
hold it at the moment of the fork.

      Thanks,
        Mark