coreutils fails to build

Status: Open
Owner: unassigned
Submitted by: Collin J. Doering
Severity: normal

Collin J. Doering wrote on 4 Nov 16:37 +0100
(address . bug-guix@gnu.org)
874j4nt3fy.fsf@rekahsoft.ca
Hi lovely maintainers of Guix!

Some time ago I announced the availability of a guix build farm running out of the University of Tennessee[1]. Since then, builds started failing due to a failure to build coreutils[2]; investigation showed an unexpected failing test:

FAIL tests/cp/reflink-auto.sh (exit status: 1)

I found that on other guix systems, this is not occurring. After some online sleuthing, it appears that the nix folks have seen this before[3]. They opted to disable the test 'tests/cp/reflink-auto.sh' as it can fail when using btrfs. On the guix system impacted, disabling coreutils tests makes the package build.

For reference, coreutils was building on cuirass.genenetwork.org on guix commit `0c908518375aea50be6dec703367c01944c8c721` and stopped building on `66611696975409a52478b95a862a464daeaefe2a`.

I suggest we follow what the nix folks did (disable `tests/cp/reflink-auto.sh`). In a following email you will find a patch that does so. However, because it changes coreutils, it will cause many packages to be rebuilt, so I'm unsure what the best way is to correct this without having to wait for core-updates to be merged.

Any advice or insight is appreciated.


--
Collin J. Doering

-----BEGIN PGP SIGNATURE-----

iIoEARYKADIWIQSg4F3ACfM0j/GRGeP3fjGTl82nFgUCZyjqURQcY29sbGluQHJl
a2Foc29mdC5jYQAKCRD3fjGTl82nFvPBAQC3DfQP5Dp4VH+4J7MXVjU3l2VPG9bd
9tAe7TWGfOXiNAEAjNk8MWoEhTqDvNlfHIRV6X5z3eef1LzQFIJkVxp3GAQ=
=N/kV
-----END PGP SIGNATURE-----

Collin J. Doering wrote on 4 Nov 16:42 +0100
[PATCH] gnu: coreutils: Disable cp/reflink-auto.sh as it can fail on btrfs
(address . 74203@debbugs.gnu.org)(name . Collin J. Doering)(address . collin@rekahsoft.ca)
3f7ec312925bdf1b89b0fc4630b4c73f820f4af7.1730734380.git.collin@rekahsoft.ca
* gnu/packages/base.scm: Similarly to Nix, disable the
tests/cp/reflink-auto.sh test as it can fail on btrfs. This was discovered by
the cuirass.genenetwork.org build farm.

Change-Id: If1cc3d516c5807e580ec64ab93670e30090581a7
---
gnu/packages/base.scm | 2 ++
1 file changed, 2 insertions(+)

diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index 4e8121ae2c..bed708fc27 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -506,6 +506,8 @@ (define-public coreutils
"tests/split/fail.sh"
;; These tests error
"tests/dd/nocache.sh"
+ ;; These tests can intermittently fail on btrfs
+ "tests/cp/reflink-auto.sh"
;; These tests fail
"tests/cp/sparse.sh"
"tests/cp/special-f.sh"

base-commit: 915f807ce61c48c34141f0300ea7623170f4148a
--
2.46.0
Collin J. Doering wrote 5 days ago
Further investigation and workaround
(address . 74203@debbugs.gnu.org)
87ldxmsf3s.fsf@rekahsoft.ca
Hi again,

I wanted to follow up on my previous report and patch. I still think it's useful to consider disabling the coreutils test I previously suggested; however, I found a way to work around the issue and wanted to make note of it, as well as provide some details of my investigation.

To work around the coreutils test `tests/cp/reflink-auto.sh` failing on guix commit `66611696975409a52478b95a862a464daeaefe2a`, I temporarily mounted a tmpfs to replace /tmp (which was on btrfs).

mv /tmp /tmp.old
mkdir /tmp
mount -t tmpfs tmpfs /tmp
chmod 1777 /tmp
mv /tmp.old/{.*,*} /tmp/

Now, what made me do this? Well let me explain!

In `tests/cp/reflink-auto.sh` (https://github.com/coreutils/coreutils/blob/v9.1/tests/cp/reflink-auto.sh), the failing part of the test is:

# we shouldn't be able to reflink() files on separate partitions
. "$abs_srcdir/tests/other-fs-tmpdir"
a_other="$other_partition_tmpdir/a"
<..>
returns_ 1 cp --reflink "$a_other" b || fail=1
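
For readers unfamiliar with the coreutils test helpers: `returns_` runs a command and succeeds only if that command exits with the given status. A minimal stand-in, under the hypothetical name `expect_exit` (a simplified sketch, not the upstream implementation), might look like this:

```shell
#!/bin/sh
# expect_exit is a hypothetical stand-in for the coreutils test helper
# `returns_`; it is a simplified sketch, not the upstream implementation.
# Usage: expect_exit EXPECTED_STATUS COMMAND [ARGS...]
expect_exit() {
  expected=$1
  shift
  "$@"
  actual=$?
  # Succeed only when the command exited with the expected status.
  test "$actual" -eq "$expected"
}
```

Read this way, `returns_ 1 cp --reflink "$a_other" b || fail=1` marks the test as failed whenever `cp --reflink` exits with anything other than status 1, which includes the case where the reflink unexpectedly succeeds.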

'$other_partition_tmpdir' is defined in 'tests/other-fs-tmpdir' (https://github.com/coreutils/coreutils/blob/v9.1/tests/other-fs-tmpdir). That script looks through a list of candidate directories, comparing the current working directory to each candidate to check that they have different device IDs (as given by 'stat -c %d <path>') and that the current user can create directories there. Once it finds a candidate, it sets '$other_partition_tmpdir' to the temporary directory it created. The candidate directories considered are as follows:

test "${CANDIDATE_TMP_DIRS+set}" = set \
|| CANDIDATE_TMP_DIRS="$TMPDIR /tmp /dev/shm /var/tmp /usr/tmp $HOME"
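
The selection logic described above can be approximated as follows. This is only a sketch and not the upstream 'tests/other-fs-tmpdir' script: the function names `device_id` and `find_other_fs_dir` are hypothetical, and it only checks that a candidate is writable rather than creating a temporary directory in it. It assumes GNU stat.

```shell
#!/bin/sh
# Sketch of the selection logic described above; NOT the upstream
# tests/other-fs-tmpdir script. Function names are hypothetical.

# Print the device ID of the given path as reported by GNU stat.
device_id() {
  stat -c %d "$1" 2>/dev/null
}

# Print the first candidate directory that is writable and whose device ID
# differs from that of the current directory. Return 1 if none qualifies.
find_other_fs_dir() {
  here=$(device_id .) || return 1
  for dir in "$@"; do
    [ -d "$dir" ] && [ -w "$dir" ] || continue
    there=$(device_id "$dir") || continue
    if [ "$there" != "$here" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

With the default candidate list it could be called as `find_other_fs_dir "$TMPDIR" /tmp /dev/shm /var/tmp /usr/tmp "$HOME"`. Note that the only notion of "other partition" here is a differing device ID, which is the assumption the rest of this message questions.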

Looking at a remaining failed build of coreutils (left over by building with `--keep-failed`), I see that in 'top/environment-variables', 'TMPDIR' is set to '/tmp/guix-build-guix-1.4.0-26.5ab3c4c.drv-0'. This directory is the same place the build is taking place, so I would expect it to 'be on the same partition'. So, next would be /tmp, where the same premise applies; next is /dev/shm. From my tests simulating the coreutils guix shell build environment, this would meet the conditions and be selected. However, if this were the case, I wouldn't expect the coreutils reflink test to fail.

My suspicion is that, for some reason, using 'stat -c %d <path>' to check whether two files, a and b, are on the same partition doesn't play well with btrfs subvolumes in some instances with guix-daemon sandboxed builds. However, when trying to test this in a simulated coreutils guix shell build environment, I found that paths outside of the environment on different subvolumes (which do show different device IDs, as per 'stat -c %d <path>', outside of the guix shell container) show the same IDs within it. I suspect this is related to why the coreutils test fails, but does not fail when I use a tmpfs for /tmp. It's worth noting that on the system impacted, /gnu/store is a btrfs subvolume.
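
For anyone wanting to repeat this comparison, a small diagnostic along the following lines may help. It is a hypothetical helper (the name `report_devices` is made up here, and it is not part of Guix or coreutils) that prints the device ID and filesystem type GNU stat reports for each path, so the output on the host can be compared with the output inside the container.

```shell
#!/bin/sh
# Diagnostic sketch (hypothetical helper, not part of Guix or coreutils):
# print "device-id<TAB>fs-type<TAB>path" for each existing path given.
report_devices() {
  for path in "$@"; do
    [ -e "$path" ] || continue
    dev=$(stat -c %d "$path") || continue        # device ID of the path
    fstype=$(stat -f -c %T "$path") || continue  # filesystem type name
    printf '%s\t%s\t%s\n' "$dev" "$fstype" "$path"
  done
}
```

For example, `report_devices "$TMPDIR" /tmp /dev/shm /gnu/store` could be run once outside and once inside the build environment; differing device IDs on the host that collapse to a single ID inside would match the behaviour described above.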

I am not yet satisfied with my partial explanation, and am very curious whether anyone spots something I'm missing (e.g. someone with a better understanding of the guix build environment and of why the reflink coreutils test could be failing like this).

Thanks for your time and attention.

--
Collin J. Doering

-----BEGIN PGP SIGNATURE-----

iIoEARYKADIWIQSg4F3ACfM0j/GRGeP3fjGTl82nFgUCZzVlhxQcY29sbGluQHJl
a2Foc29mdC5jYQAKCRD3fjGTl82nFgzbAQD2b+IJGAyaN0KjUxKIJ47eZD+nWgMu
VsLKjo6ZosMcTgEApfsAskaxs/lfHil7AalJdkFNiR32ZoBmiRvMyqYFGgo=
=ofWN
-----END PGP SIGNATURE-----
