From: ludo@gnu.org (Ludovic Courtès)
To: ng0
Subject: Re: bug#24496: offloading should fall back to local build after n tries
Date: Wed, 05 Oct 2016 13:36:20 +0200
In-Reply-To: <87vax8nis5.fsf@we.make.ritual.n0.is> (ng0's message of "Tue, 04 Oct 2016 17:08:58 +0000")
Message-ID: <87a8ej81u3.fsf@gnu.org>
Cc: 24496@debbugs.gnu.org

ng0 writes:

> Ludovic Courtès writes:

[...]

>> Like you say, on Hydra-style setup this could be a problem: the
>> front-end machine may have --max-jobs=0, meaning that it cannot perform
>> builds on its own.
>>
>> So I guess we would need a command-line option to select a different
>> behavior.  I’m not sure how to do that because ‘guix offload’ is
>> “hidden” behind ‘guix-daemon’, so there’s no obvious place for such an
>> option.
>
> Could the daemon run with --enable-hydra-style or --disable-hydra-style
> and --disable-hydra-style would allow falling back to local build if
> after a defined time - keeping slow connections in mind - the machine
> did not reply.

That would be too ad-hoc IMO, and the problem mentioned above remains.

>> In the meantime, you could also hack up your machines.scm: it would
>> return a list where unreachable machines have been filtered out.
>
> How can I achieve this?

Something like:

  (define the-machine (build-machine …))

  (if (managed-to-connect-timely the-machine)
      (list the-machine)
      '())

… where ‘managed-to-connect-timely’ would try to connect to the
machine with a timeout.
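
One possible way to flesh out the idea above, as a rough sketch rather
than anything Guix provides: the helpers ‘reachable?’ and
‘reachable-only’ below are hypothetical names, and the probe assumes
that ‘timeout’ (coreutils) and an ‘nc’ that understands ‘-z’ are found
in PATH when machines.scm is evaluated.  Instead of inspecting the
build-machine record, the sketch pairs each machine with its host name
explicitly.

```scheme
;; Sketch for machines.scm.  HYPOTHETICAL helpers, not part of Guix.
;; Assumes 'timeout' (coreutils) and a netcat 'nc' supporting '-z'
;; are available in PATH.
(use-modules (srfi srfi-1))             ;for 'filter-map'

(define* (reachable? host #:key (port 22) (seconds 5))
  "Return #t if a TCP connection to HOST:PORT succeeds within SECONDS.
Any failure, including missing tools, counts as unreachable."
  (catch #t
    (lambda ()
      (let ((status (system* "timeout" (number->string seconds)
                             "nc" "-z" host (number->string port))))
        (eqv? 0 (status:exit-val status))))
    (lambda _ #f)))

(define* (reachable-only candidates #:optional (probe reachable?))
  "CANDIDATES is a list of (HOST . MACHINE) pairs.  Return the list of
MACHINEs whose HOST satisfies PROBE."
  (filter-map (lambda (pair)
                (and (probe (car pair)) (cdr pair)))
              candidates))

;; Intended use as the last expression of machines.scm (the host name
;; is a placeholder; the build-machine fields are elided as in the
;; message above):
;;
;;   (reachable-only
;;    `(("build1.example.org" . ,(build-machine …))))
```

Keep in mind that such a probe would run every time machines.scm is
evaluated, so a short timeout is probably preferable.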

> And to append to this bug: it seems to me that offloading requires 1
> lsh-key for each
> build-machine.

The main machine needs to be able to connect to each build machine over
SSH, so indeed, that requires proper SSH key registration (host keys and
authorized user keys).

> (https://lists.gnu.org/archive/html/help-guix/2016-10/msg00007.html)
> and that you can not directly address them (say I want to create some
> system where I want to build on machine 1 AND machine 2. Having 2
> x86_64 in machines.scm only selects one of them (if 2 were working,
> see linked thread) and builds on the one which is accessible first. If
> however the first machine is somehow blocked and it fails, therefore
> terminates lsh connection, the build does not happen at all.

The code that selects machines is in (guix scripts offload),
specifically ‘choose-build-machine’.  It tries to choose the “best”
machine, which means, roughly, the fastest and least loaded one.

HTH,
Ludo’.