From: Mark H Weaver
To: Ludovic Courtès
Cc: 35181@debbugs.gnu.org
Subject: Re: bug#35181: Hydra offloads often get stuck while exporting build requisites
Date: Mon, 08 Apr 2019 21:06:04 -0400
Message-ID: <87k1g456nc.fsf@netris.org>
In-Reply-To: <87pnpw29kp.fsf@gnu.org> (Ludovic Courtès's message of "Mon, 08 Apr 2019 10:19:18 +0200")
References: <87mul17oo2.fsf@netris.org> <87imvp7ogv.fsf@netris.org> <20190407173105.GB1337@macbook41> <87ef6d6mdn.fsf@netris.org> <87pnpw29kp.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

merge 35181 34157
thanks

I looked more closely at the 'mozjs-60' failures, and I'm convinced that
it's an instance of the same problem that's currently affecting these
large font builds.

Mozjs-60 was pushed to the master branch on 2019-01-18.
It has _never_ successfully built on x86_64 or i686, although all builds
were successful on armhf.  See below for the complete list of build
attempts of mozjs-60 on Hydra.

Also of note: So far, all known instances of this problem have occurred
while transferring a large directory, as opposed to a tarball.  We have
several packages with source tarballs _much_ larger than these
problematic source checkouts, and which are updated much more
frequently, and yet I've *never* seen an instance of this problem while
exporting a plain file to a build slave.

For example, the upstream IceCat and Firefox ESR tarballs are ~270
megabytes compressed, whereas the 'font-google-material-design-icons-3.0.1'
source is only ~176 megabytes _uncompressed_.

I have no explanation for why the superficial form of the store item
should matter here, but maybe it's a clue.  I know that plain
non-executable files in the store are handled somewhat differently in
the Nix model than directories or executable files.  The latter are
associated with the word "recursive" and require an additional layer of
encoding for purposes of serialization, but I'm not sufficiently
familiar with the details or the relevant code.

Ludovic, can you think of a reason why the file/directory distinction
could be relevant to this issue?

Finally: the problem seems to have been introduced into Hydra sometime
between September 2018 and January 2019.  September 2018 is when the
last successful build of the problematic font packages was performed,
and January 2019 is the first known instance of the problem.  I do not
currently know of any relevant data points in that time range.  The last
'core-updates' merge into 'master' was on December 3rd.

      Mark

PS: Here's the complete history of 'mozjs-60' build attempts on Hydra.
    First are the 'armhf' attempts, followed by i686 and x86_64.  Note
    that the two armhf aborts happened after only 2 seconds, and surely
    had a different cause than this issue.
--8<---------------cut here---------------start------------->8---
hydra=> select case when s.machine~'^(hydra|guix)\.' then s.machine
                    else substring(s.machine from '^[^.]*') end as machine,
               jobset, s.build, s.stepnr as step,
               case when s.busy=1   then 'busy'
                    when s.status=0 then NULL
                    when s.status=1 then 'fail'
                    when s.status=4 then 'abort'
                    when s.status=7 then 'timeout'
                    when s.status=8 then 'cfail'
                    else '?' end as stat,
               regexp_replace(substr(s.drvpath,1+strpos(s.drvpath,'-')),'\.drv$','') as what,
               date_trunc('second', to_timestamp(s.stoptime)) as finished,
               date_trunc('second', to_timestamp(s.stoptime) - to_timestamp(s.starttime)) as duration
          from builds b, buildsteps s
         where b.id=s.build and b.job='mozjs-60.2.3-2.armhf-linux'
      order by s.stoptime;
   machine    | jobset |  build  | step | stat  |          what           |        finished        | duration
--------------+--------+---------+------+-------+-------------------------+------------------------+----------
 hydra-slave2 | master | 3342804 |    1 |       | mozjs-60.2.3-2-checkout | 2019-01-19 12:58:52+00 | 00:23:55
 hydra-slave2 | master | 3342804 |    2 |       | mozjs-60.2.3-2          | 2019-01-19 15:49:37+00 | 02:50:42
              | master | 3367975 |    1 | abort | mozjs-60.2.3-2          | 2019-02-13 00:03:58+00 | 00:00:02
              | master | 3367975 |    2 | abort | mozjs-60.2.3-2          | 2019-02-15 15:35:45+00 | 00:00:02
 hydra-slave3 | master | 3367975 |    3 |       | mozjs-60.2.3-2          | 2019-02-18 16:38:08+00 | 02:46:42
(5 rows)

hydra=> select case when s.machine~'^(hydra|guix)\.' then s.machine
                    else substring(s.machine from '^[^.]*') end as machine,
               jobset, s.build, s.stepnr as step,
               case when s.busy=1   then 'busy'
                    when s.status=0 then NULL
                    when s.status=1 then 'fail'
                    when s.status=4 then 'abort'
                    when s.status=7 then 'timeout'
                    when s.status=8 then 'cfail'
                    else '?' end as stat,
               regexp_replace(substr(s.drvpath,1+strpos(s.drvpath,'-')),'\.drv$','') as what,
               date_trunc('second', to_timestamp(s.stoptime)) as finished,
               date_trunc('second', to_timestamp(s.stoptime) - to_timestamp(s.starttime)) as duration
          from builds b, buildsteps s
         where b.id=s.build and b.job='mozjs-60.2.3-2.i686-linux'
      order by s.stoptime;
 machine | jobset |  build  | step | stat  |      what      |        finished        |    duration
---------+--------+---------+------+-------+----------------+------------------------+-----------------
         | master | 3343511 |    1 | abort | mozjs-60.2.3-2 | 2019-01-20 20:05:16+00 | 12:11:12
         | master | 3343511 |    2 | abort | mozjs-60.2.3-2 | 2019-01-23 01:52:01+00 | 2 days 05:42:55
         | master | 3360985 |    1 | abort | mozjs-60.2.3-2 | 2019-02-15 19:59:42+00 | 09:31:25
         | master | 3360985 |    2 | abort | mozjs-60.2.3-2 | 2019-02-16 17:37:06+00 | 05:57:15
         | master | 3360985 |    3 | abort | mozjs-60.2.3-2 | 2019-02-17 17:39:49+00 | 16:06:14
         | master | 3360985 |    4 | abort | mozjs-60.2.3-2 | 2019-03-03 21:50:48+00 | 00:02:19
(6 rows)

hydra=> select case when s.machine~'^(hydra|guix)\.' then s.machine
                    else substring(s.machine from '^[^.]*') end as machine,
               jobset, s.build, s.stepnr as step,
               case when s.busy=1   then 'busy'
                    when s.status=0 then NULL
                    when s.status=1 then 'fail'
                    when s.status=4 then 'abort'
                    when s.status=7 then 'timeout'
                    when s.status=8 then 'cfail'
                    else '?' end as stat,
               regexp_replace(substr(s.drvpath,1+strpos(s.drvpath,'-')),'\.drv$','') as what,
               date_trunc('second', to_timestamp(s.stoptime)) as finished,
               date_trunc('second', to_timestamp(s.stoptime) - to_timestamp(s.starttime)) as duration
          from builds b, buildsteps s
         where b.id=s.build and b.job='mozjs-60.2.3-2.x86_64-linux'
      order by s.stoptime;
 machine | jobset |  build  | step | stat  |      what      |        finished        |    duration
---------+--------+---------+------+-------+----------------+------------------------+-----------------
         | master | 3342528 |    1 | abort | mozjs-60.2.3-2 | 2019-01-20 20:04:50+00 | 22:25:28
         | master | 3342528 |    2 | abort | mozjs-60.2.3-2 | 2019-01-23 01:51:48+00 | 2 days 05:19:35
         | master | 3366691 |    1 | abort | mozjs-60.2.3-2 | 2019-02-17 17:39:59+00 | 09:21:24
(3 rows)
--8<---------------cut here---------------end--------------->8---
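
A note on reading the columns: an empty 'stat' corresponds to status 0 in the query, which is assumed here to mean the step succeeded, and 'machine' is shortened to the text before the first dot unless the name starts with "hydra." or "guix.". The rough sketch below restates the three derived columns in Python for anyone post-processing the rows outside psql. It is only a transcription of the CASE and regexp expressions above, not code from Hydra. The function names are made up for illustration, and the host name and store hash in the usage lines are hypothetical placeholders.

```python
import re

# Status labels copied from the CASE expression in the queries above.
STATUS_LABELS = {1: "fail", 4: "abort", 7: "timeout", 8: "cfail"}


def stat_label(busy, status):
    """Mirror the 'stat' column; None corresponds to status 0."""
    if busy == 1:
        return "busy"
    if status == 0:
        return None
    return STATUS_LABELS.get(status, "?")


def short_machine(machine):
    """Mirror the 'machine' column: keep hydra.* and guix.* names whole,
    otherwise keep only the text before the first dot."""
    if machine is None:
        return None
    if re.search(r"^(hydra|guix)\.", machine):
        return machine
    return re.match(r"^[^.]*", machine).group(0)


def what_label(drvpath):
    """Mirror the 'what' column: drop everything up to and including the
    first '-', then drop a trailing '.drv'."""
    return re.sub(r"\.drv$", "", drvpath[drvpath.find("-") + 1:])
```

For instance, `short_machine("hydra-slave2.example.org")` gives `"hydra-slave2"`, and `what_label("/gnu/store/0123456789abcdfghijklmnpqrsvwxyz-mozjs-60.2.3-2.drv")` gives `"mozjs-60.2.3-2"`.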