From: Mark H Weaver
To: Ludovic Courtès
Cc: 35181@debbugs.gnu.org
Subject: Re: bug#35181: Hydra offloads often get stuck while exporting build requisites
Date: Tue, 09 Apr 2019 14:09:41 -0400
Message-ID: <87pnpvrqwv.fsf@netris.org>
In-Reply-To: <87ftqrh2jn.fsf@gnu.org>
References: <87mul17oo2.fsf@netris.org> <87imvp7ogv.fsf@netris.org> <20190407173105.GB1337@macbook41> <87ef6d6mdn.fsf@netris.org> <87pnpw29kp.fsf@gnu.org> <87o95g5lpd.fsf@netris.org> <87ftqrh2jn.fsf@gnu.org>

Hi Ludovic,

Ludovic Courtès writes:

> The problem is that this is an ancient Guix.
> In the meantime, offloading has seen relevant changes, in particular
> things like commit ed7b44370f71126087eb953f36aad8dc4c44109f which
> address stability issues with Guile-SSH (ssh dist node) that was
> previously used.
>
> I think we should upgrade Guix on hydra.gnu.org otherwise we’re likely
> to end up chasing old bugs.

Sure, that makes sense.  I also noticed the old Guix after writing my
last messages, so yesterday I tried updating Hydra's Guix to 0.16.0-11,
which at the time was the latest version built by Hydra.  After
updating, I quit and relaunched 'guix-daemon', as well as 'guix
publish', hydra-queue-runner, and hydra-evaluator.

With the new version of Guix, *all* offloads started failing in a
strange way: each one got stuck in a loop, printing endlessly repeated
messages like this:

  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/0'
  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/0'
  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/1'
  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/2'
  process N acquired build slot '/var/guix/offload/hydra.gnunet.org/0'

This is from memory because after killing the queue-runner and
cancelling the 'mozjs-60' jobs (which I had intended to start building
as a test), the nix output above is no longer visible on those pages,
and I'm not sure offhand where to look for it.

Anyway, in every offloaded build, it printed a line like the above every
few seconds, with the build slot number at the end varying.  I don't
remember if the process number varied.

This reminds me that I also ran into difficulties updating 'guix' on the
armhf build slaves, which are also currently stuck on an even more
ancient version of Guix (circa 0.12.0).

On both Hydra and its armhf build slaves, Guix is installed on top of a
Debian derivative, and both 'guix' and 'guix-daemon' are launched from
an environment without any Guix environment variable settings.
This apparently works in ancient versions of Guix, but not recent ones.

So, could the problem simply be that the 'guix' wrapper is not
installing enough environment variable settings for offloading to work?

      Mark