From: Mathieu Othacehe
To: Ludovic Courtès
Cc: 34033@debbugs.gnu.org
Subject: Re: bug#34033: Offloading sometimes hangs
Date: Thu, 02 Jul 2020 16:20:23 +0200
References: <87o98obikk.fsf@gnu.org> <87fttuq2mz.fsf@gnu.org>
In-Reply-To: <87fttuq2mz.fsf@gnu.org> (Ludovic Courtès's message of "Mon, 14 Jan 2019 23:45:56 +0100")
Message-ID: <87pn9ec82g.fsf@gnu.org>

Hello,

> (That still doesn’t tell us why our ‘guix offload’ processes would
> occasionally be stuck but at least it ensures the
> build farm keeps making progress even when that happens.)

I'm still not sure it's directly related to this bug, but I observed
several offloading hangs on Berlin today. For instance, in the Cuirass
logs:

--8<---------------cut here---------------start------------->8---
2020-07-02T09:59:45 '/gnu/store/rm8ndiichxhwybaizis5pgck77952ilp-halt.drv' offloaded to '141.80.167.164'
2020-07-02T09:54:30 '/gnu/store/dxczkbf5wa6qr37gm7wr995hcxs8s0ya-motion-4.2.2.drv' offloaded to '141.80.167.170'
--8<---------------cut here---------------end--------------->8---

Those two builds were offloaded around 10:00 today, and there's still no
report from them at 16:00.

On 141.80.167.164 there's a matching build log:

--8<---------------cut here---------------start------------->8---
-rw-r--r-- 1 root root 1735 Jul 2 10:00 /var/log/guix/drvs/rm/8ndiichxhwybaizis5pgck77952ilp-halt.drv.bz2
--8<---------------cut here---------------end--------------->8---

The same goes for 141.80.167.170:

--8<---------------cut here---------------start------------->8---
-rw-r--r-- 1 root root 6344 Jul 2 09:56 /var/log/guix/drvs/dx/czkbf5wa6qr37gm7wr995hcxs8s0ya-motion-4.2.2.drv.bz2
--8<---------------cut here---------------end--------------->8---

Having those builds "unfinished" keeps the rest of the evaluation hanging.
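To make the elapsed-time check above repeatable, here is a minimal Python sketch that parses Cuirass "offloaded to" log lines of the form quoted above and lists offloads that have been outstanding for longer than a given threshold. The function name `stale_offloads`, its parameters, and the assumption that every relevant log line has exactly this format are hypothetical. The sketch does not look for completion messages, so the optional `pending` set (for example, the derivations still marked with status -1 in the Cuirass `Builds` table) is what restricts it to builds that never reported back.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for Cuirass log lines such as:
# 2020-07-02T09:59:45 '/gnu/store/...-halt.drv' offloaded to '141.80.167.164'
# (assumes the format is exactly the one quoted in this report)
OFFLOAD_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) "
    r"'(?P<drv>/gnu/store/[^']+\.drv)' offloaded to '(?P<host>[^']+)'$"
)


def stale_offloads(log_lines, now, max_age, pending=None):
    """Return (derivation, host, age) for offloads older than MAX_AGE.

    NOW is a naive datetime in the same time zone as the log timestamps.
    If PENDING is given, only derivations in that set are considered;
    otherwise every offload line is treated as still outstanding.
    """
    stale = []
    for line in log_lines:
        match = OFFLOAD_RE.match(line.strip())
        if match is None:
            continue  # not an "offloaded to" line
        drv = match["drv"]
        if pending is not None and drv not in pending:
            continue  # the build already reported back
        age = now - datetime.fromisoformat(match["time"])
        if age > max_age:
            stale.append((drv, match["host"], age))
    return stale


if __name__ == "__main__":
    # The two log lines quoted above, checked at 16:00 with a 2-hour limit.
    log = [
        "2020-07-02T09:59:45 '/gnu/store/rm8ndiichxhwybaizis5pgck77952ilp-halt.drv'"
        " offloaded to '141.80.167.164'",
        "2020-07-02T09:54:30 '/gnu/store/dxczkbf5wa6qr37gm7wr995hcxs8s0ya-motion-4.2.2.drv'"
        " offloaded to '141.80.167.170'",
    ]
    for drv, host, age in stale_offloads(log, datetime(2020, 7, 2, 16, 0), timedelta(hours=2)):
        print(f"{drv} on {host}: no report for {age}")
```

Because the helper only sees offload lines, it cannot distinguish a build that is still running remotely from one whose result was lost on the way back; checking the remote /var/log/guix/drvs logs, as done above, is still needed for that.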
Running this SQL command in the Cuirass database:

--8<---------------cut here---------------start------------->8---
sqlite> select derivation, datetime(starttime, 'unixepoch', 'localtime'),stoptime from Builds where status=-1 and evaluation=14771;
/gnu/store/ncp59nyidli4lm3ff2lkfjym25yb18j5-guix-1.1.0-14.5bd8033.drv|2020-07-02 09:33:04|0
/gnu/store/rm8ndiichxhwybaizis5pgck77952ilp-halt.drv|2020-07-02 09:59:28|0
/gnu/store/71wnjgm2waqgw3fqmxmc4r3f1ifd1l92-cups-test.drv|2020-07-02 10:00:26|0
/gnu/store/9qsqd7jfwnaw9sm323y45cwymn98kyjl-exim-test.drv|2020-07-02 10:00:51|0
/gnu/store/vhcww4fw4qxw0hl1009npd26b22gfj3c-bitlbee-test.drv|2020-07-02 10:00:24|0
/gnu/store/92jrd6dfzgdifr107hwi64s8hf4mls47-iptables.drv|2020-07-02 09:59:49|0
/gnu/store/380nq6sjphd0agrvl43sr6ypli1yraz4-gnunet-0.12.2.drv|2020-07-02 09:51:32|0
/gnu/store/lqs22nbc6vy2z2524rmkcsmbh5mllm62-cuirass-0.0.1-37.882393d.drv|2020-07-02 10:34:37|0
/gnu/store/dxczkbf5wa6qr37gm7wr995hcxs8s0ya-motion-4.2.2.drv|2020-07-02 09:54:02|0
/gnu/store/5ln3r997ycr7rd6fqahd2d426mjw0rxb-gzochi-0.12.drv|2020-07-02 09:53:51|0
--8<---------------cut here---------------end--------------->8---

shows that the evaluation has been essentially stalled since around 10:00.

According to the Cuirass logs again, all those builds were offloaded.
"/gnu/store/380nq6sjphd0agrvl43sr6ypli1yraz4-gnunet-0.12.2.drv",
"/gnu/store/lqs22nbc6vy2z2524rmkcsmbh5mllm62-cuirass-0.0.1-37.882393d.drv"
and "/gnu/store/5ln3r997ycr7rd6fqahd2d426mjw0rxb-gzochi-0.12.drv" are
reported as failed, and all the others are still hanging.

Something is going wrong here! I'll keep investigating.

Thanks,

Mathieu