From: Andy Wingo
To: Mark H Weaver
Cc: 31925@debbugs.gnu.org, Ludovic Courtès
Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org>
Date: Thu, 05 Jul 2018 10:00:52 +0200
In-Reply-To: <87efgil5jz.fsf@netris.org> (Mark H. Weaver's message of "Wed, 04 Jul 2018 23:33:52 -0400")
Message-ID: <87lgaqjemj.fsf@igalia.com>

Hi!

On Thu 05 Jul 2018 05:33, Mark H Weaver writes:

>> One problem I’ve noticed is that the child process that
>> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
>> allocation lock:
>>
>> So it seems quite clear that the thing has the alloc lock taken. I
>> suppose this can happen if one of the libgc threads runs right when we
>> call fork and takes the alloc lock, right?
>
> Does libgc spawn threads that run concurrently with user threads? If
> so, that would be news to me. My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.

I think Mark is correct.

> The first idea that comes to my mind is that perhaps the finalization
> thread is holding the GC allocation lock when 'fork' is called.

So of course we agree you're only supposed to "fork" when there are no
other threads running, I think.

As far as the finalizer thread goes, "primitive-fork" calls
"scm_i_finalizer_pre_fork" which should join the finalizer thread,
before the fork. There could be a bug obviously but the intention is
for Guile to shut down its internal threads.

Here's the body of primitive-fork fwiw:

    {
      int pid;
      scm_i_finalizer_pre_fork ();
      if (scm_ilength (scm_all_threads ()) != 1)
        /* Other threads may be holding on to resources that Guile needs --
           it is not safe to permit one thread to fork while others are
           running.

           In addition, POSIX clearly specifies that if a multi-threaded
           program forks, the child must only call functions that are
           async-signal-safe. We can't guarantee that in general. The best
           we can do is to allow forking only very early, before any call to
           sigaction spawns the signal-handling thread. */
        scm_display
          (scm_from_latin1_string
           ("warning: call to primitive-fork while multiple threads are running;\n"
            " further behavior unspecified. See \"Processes\" in the\n"
            " manual, for more information.\n"),
           scm_current_warning_port ());
      pid = fork ();
      if (pid == -1)
        SCM_SYSERROR;
      return scm_from_int (pid);
    }

> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function. See 'GC_do_blocking_inner' in pthread_support.c in
> libgc. You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.

The signal thread is a possibility though in that case you'd get a
warning; the signal-handling thread appears in scm_all_threads. Do you
see a warning? If you do, that is a problem :)

>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.

I don't think this is necessary. I think the problem is that other
threads are running. If we solve that, then we solve this issue; if we
don't solve that, we don't know what else those threads are doing, so we
don't know what mutexes and other state they might have.

Andy
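
The hazard discussed above (forking while some other thread holds a
lock) can be illustrated with a small standalone C sketch. It is not
Guile or libgc code: demo_lock is only a stand-in for something like the
GC allocation lock, and the names holder and fork_while_lock_held, as
well as the pipe handshake that makes the timing deterministic, are
made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a library-internal lock (e.g. an allocator lock). */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int locked_pipe[2];   /* holder -> main: "I hold the lock now" */
static int release_pipe[2];  /* main -> holder: "you may unlock"      */

/* Second thread: take the lock, announce it, and keep it until told to stop. */
static void *holder(void *arg)
{
  char byte = 'x';
  ssize_t n;

  (void) arg;
  pthread_mutex_lock(&demo_lock);
  n = write(locked_pipe[1], &byte, 1);
  if (n == 1)
    n = read(release_pipe[0], &byte, 1);   /* blocks until main writes */
  (void) n;
  pthread_mutex_unlock(&demo_lock);
  return NULL;
}

/* Fork while the other thread holds demo_lock.
   Returns 1 if the child found the lock busy, 0 if the child could take
   it, and -1 on setup errors.  */
int fork_while_lock_held(void)
{
  pthread_t tid;
  char byte = 0;
  int status = 0;
  int result = -1;
  pid_t pid;

  if (pipe(locked_pipe) != 0 || pipe(release_pipe) != 0)
    return -1;
  if (pthread_create(&tid, NULL, holder, NULL) != 0)
    return -1;

  /* Wait until the holder thread really owns the lock. */
  if (read(locked_pipe[0], &byte, 1) != 1)
    return -1;
  assert(byte == 'x');

  pid = fork();
  if (pid == 0)
    /* Child: only the forking thread was duplicated, so nobody here will
       ever unlock this copy of demo_lock.  trylock is used instead of lock
       so that the demo reports the problem instead of hanging.  Strictly,
       POSIX only allows async-signal-safe calls here; this is a demo
       liberty.  */
    _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);

  if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
    result = WEXITSTATUS(status);

  /* Let the holder thread unlock and finish, then clean up. */
  if (write(release_pipe[1], &byte, 1) != 1)
    result = -1;
  pthread_join(tid, NULL);
  close(locked_pipe[0]);
  close(locked_pipe[1]);
  close(release_pipe[0]);
  close(release_pipe[1]);
  return result;
}
```

On glibc/Linux, fork_while_lock_held () should return 1: the child
inherits the mutex in its locked state but not the thread that owns it,
so a plain pthread_mutex_lock there would block forever. Taking this one
lock around fork would hide the problem for this one lock only, which is
why making sure no other threads are running is the more general fix.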