[PATCH 0/2] gnu: hwloc: Skip failing test on non-x86 systems.

  • Done
Details
3 participants
  • Ludovic Courtès
  • Ludovic Courtès
  • Simon South
Owner
unassigned
Submitted by
Simon South
Severity
normal
Simon South wrote on 13 Feb 2023 21:56
(address . guix-patches@gnu.org)
cover.1676319305.git.simon@simonsouth.net
Here's a patch that circumvents a test failure in hwloc 2.9.0 on non-x86
systems (and specifically on AArch64), allowing the package to build
successfully on these machines.

An additional, bonus patch removes a pair of obsolete comments from the hwloc
package definitions.

I've tested these changes on x86-64 and AArch64 and generally, things seem
fine.

- On x86-64, of hwloc's 136 dependents the only seven[0] that fail to build
appear to be existing failures, according to ci.guix.gnu.org.

- On AArch64, the package builds fine; many of its dependents fail (in fact I
am still waiting for builds to complete) but again, none of the failures
I've investigated appear to be new.

----------

Here's some background information regarding the fix in case it's useful:

One of hwloc's primary functions is to provide information about the host
computer's processor topology, in terms of NUMA nodes, CPU clusters and so on.
At start-up it tries to collect this information by querying a sequence of
"topology backends" that each implement a different strategy for detecting the
host system's configuration.

The first source of information is the operating system, so on most Guix
machines the "Linux" backend runs first. This tries to pull information from
the /sys filesystem tree but since that's inaccessible from within build
containers, this always fails during hwloc's tests.

For x86 machines specifically, hwloc provides an architecture-specific,
fallback backend that can obtain the same information by querying the hardware
directly. This normally succeeds within the build environment, and so hwloc
passes its tests without issue on x86 and x86-64 machines.

But those are the only platforms for which an architecture-specific topology
backend is provided: On other systems, once the Linux backend fails, hwloc has
nothing else to try and so any tests that rely on the host system's topology
having been detected will fail.
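
As a rough illustration of the fallback behaviour described above, here is a
minimal Python sketch. It is not hwloc's code: the names `probe_linux`,
`probe_x86` and `discover` are hypothetical, and the probes are hard-coded to
mimic what happens inside a build container.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model of a topology backend: a name plus a probe function
# that returns True when discovery succeeds.
Backend = Tuple[str, Callable[[], bool]]


def probe_linux() -> bool:
    # Assumption: /sys is inaccessible in the build container, so the
    # operating-system backend cannot detect anything.
    return False


def probe_x86() -> bool:
    # Assumption: the x86 backend queries the processor directly and so
    # does not depend on /sys.
    return True


def discover(backends: Sequence[Backend]) -> Optional[str]:
    """Return the name of the first backend whose probe succeeds, or None."""
    for name, probe in backends:
        if probe():
            return name
    return None


# On x86 there is a fallback after the Linux backend; elsewhere there is not.
x86_chain = [("linux", probe_linux), ("x86", probe_x86)]
other_chain = [("linux", probe_linux)]

if __name__ == "__main__":
    print("x86 host:", discover(x86_chain))
    print("non-x86 host:", discover(other_chain))
```

With no backend left to try, `discover(other_chain)` yields nothing, which
corresponds to the situation in which topology-dependent tests fail.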

My patch fixes the build on these machines by skipping the one (other) test
that relies on this information being available, only on non-x86 systems where
the unavailability of /sys means certain failure.
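
To show the effect of the substitution in patch 2/2, here is a minimal Python
sketch that applies the same regular expression to a stand-in C source. The
`SAMPLE` text and the function name `skip_on_non_x86` are hypothetical; only
the pattern and the inserted `exit (77);` come from the patch. It relies on
Automake's test harness treating exit status 77 as "skipped", and assumes
hwloc's tests run under that harness.

```python
import re

# The pattern from the patch, written as a Python raw string.
PATTERN = re.compile(r'putenv\(\(char \*\) "HWLOC_L')


def skip_on_non_x86(source: str, target_is_x86: bool) -> str:
    """Insert 'exit (77);' before each match, but only for non-x86 targets."""
    if target_is_x86:
        # Mirrors the empty clause list: the file is left unchanged.
        return source
    # m.group(0) plays the role of 'all' in the substitute* clause.
    return PATTERN.sub(lambda m: "exit (77);\n" + m.group(0), source)


# Hypothetical stand-in for tests/hwloc/hwloc_backends.c, not the real file.
SAMPLE = (
    "int main(void)\n"
    "{\n"
    '  putenv((char *) "HWLOC_LIBXML_CLEARTEXT=1");\n'
    "  return 0;\n"
    "}\n"
)

if __name__ == "__main__":
    print(skip_on_non_x86(SAMPLE, target_is_x86=False))
```

Because the early exit lands before the first matching `putenv` call, the test
program reports itself as skipped rather than running checks that need the
detected topology.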

For reference, the backends mentioned above are implemented in hwloc's
hwloc/topology-linux.c and hwloc/topology-x86.c.

--
Simon South
simon@simonsouth.net

[0] combinatorial-blas, cube, elemental, elpa-openmpi, python-dolfin-adjoint,
scorep-openmpi and superlu-dist.


Simon South (2):
gnu: hwloc: Remove obsolete comments.
gnu: hwloc: Skip failing test on non-x86 systems.

gnu/packages/mpi.scm | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)


base-commit: 5b1eab43f011983d9ee560d6935409b6b39706ff
--
2.39.1
Simon South wrote on 13 Feb 2023 22:01
[PATCH 2/2] gnu: hwloc: Skip failing test on non-x86 systems.
(address . 61493@debbugs.gnu.org)
cddb11f082e86b39a3b3ecbc5815b6a8a72a2c36.1676319305.git.simon@simonsouth.net
* gnu/packages/mpi.scm (hwloc-2)[arguments]<#:phases>: Rename
"skip-test-that-requires-/sys" phase to "skip-tests-that-require-/sys" and
expand to skip additional test requiring /sys on non-x86 systems.
---
gnu/packages/mpi.scm | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index febd0b4124..22d47b966c 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -164,10 +164,19 @@ (define-public hwloc-2
                 (substitute* "tests/hwloc/linux-libnuma.c"
                   (("numa_available\\(\\)")
                    "-1"))))
-             (add-before 'check 'skip-test-that-requires-/sys
+             (add-before 'check 'skip-tests-that-require-/sys
               (lambda _
                 ;; 'test-gather-topology.sh' requires /sys as of 2.9.0; skip it.
-                 (setenv "HWLOC_TEST_GATHER_TOPOLOGY" "0")))
+                 (setenv "HWLOC_TEST_GATHER_TOPOLOGY" "0")
+
+                 ;; 'hwloc_backends' also requires /sys on non-x86 systems, for
+                 ;; which hwloc lacks a topology backend not reliant on the
+                 ;; operating system; skip it also on these machines.
+                 (substitute* "tests/hwloc/hwloc_backends.c"
+                   ,@(if (not (target-x86?))
+                         '((("putenv\\(\\(char \\*\\) \"HWLOC_L" all)
+                            (string-append "exit (77);\n" all)))
+                         '()))))
             (add-before 'check 'skip-test-that-fails-on-qemu
               (lambda _
                 ;; Skip test that fails on emulated hardware due to QEMU bug:
--
2.39.1
Simon South wrote on 13 Feb 2023 22:01
[PATCH 1/2] gnu: hwloc: Remove obsolete comments.
(address . 61493@debbugs.gnu.org)
16ef00ad362bd07513600591e70e49997d60adde.1676319305.git.simon@simonsouth.net
hwloc 2.x became the default with commit 8ec7ca22d3, "gnu: hwloc: Default to
2.x.".

* gnu/packages/mpi.scm (hwloc-1): Remove obsolete comment.
(hwloc-2): Remove obsolete comment.
---
gnu/packages/mpi.scm | 3 ---
1 file changed, 3 deletions(-)

diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index 70b14c30b3..febd0b4124 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -53,8 +53,6 @@ (define-module (gnu packages mpi)
   #:use-module (ice-9 match))

 (define-public hwloc-1
-  ;; Note: For now we keep 1.x as the default because many packages have yet
-  ;; to migrate to 2.0.
   (package
     (name "hwloc")
     (version "1.11.13")
@@ -140,7 +138,6 @@ (define-public hwloc-1
     (license license:bsd-3)))

 (define-public hwloc-2
-  ;; Note: 2.x isn't the default yet, see above.
   (package
     (inherit hwloc-1)
     (version "2.9.0")
--
2.39.1
Ludovic Courtès wrote on 27 Feb 2023 15:52
Re: bug#61493: [PATCH 0/2] gnu: hwloc: Skip failing test on non-x86 systems.
(name . Simon South)(address . simon@simonsouth.net)(address . 61493-done@debbugs.gnu.org)
875ybnjh38.fsf@gnu.org
Hi Simon,

Simon South <simon@simonsouth.net> skribis:

> Here's a patch that circumvents a test failure in hwloc 2.9.0 on non-x86
> systems (and specifically on AArch64), allowing the package to build
> successfully on these machines.
>
> An additional, bonus patch removes a pair of obsolete comments from the hwloc
> package definitions.
>
> I've tested these changes on x86-64 and AArch64 and generally, things seem
> fine.
>
> - On x86-64, of hwloc's 136 dependents the only seven[0] that fail to build
> appear to be existing failures, according to ci.guix.gnu.org.
>
> - On AArch64, the package builds fine; many of its dependents fail (in fact I
> am still waiting for builds to complete) but again, none of the failures
> I've investigated appear to be new.

It’s a clear improvement according to https://qa.guix.gnu.org/issue/61493.

> ----------
>
> Here's some background information regarding the fix in case it's useful:
>
> One of hwloc's primary functions is to provide information about the host
> computer's processor topology, in terms of NUMA nodes, CPU clusters and so on.
> At start-up it tries to collect this information by querying a sequence of
> "topology backends" that each implement a different strategy for detecting the
> host system's configuration.
>
> The first source of information is the operating system, so on most Guix
> machines the "Linux" backend runs first. This tries to pull information from
> the /sys filesystem tree but since that's inaccessible from within build
> containers, this always fails during hwloc's tests.
>
> For x86 machines specifically, hwloc provides an architecture-specific,
> fallback backend that can obtain the same information by querying the hardware
> directly. This normally succeeds within the build environment, and so hwloc
> passes its tests without issue on x86 and x86-64 machines.
>
> But those are the only platforms for which an architecture-specific topology
> backend is provided: On other systems, once the Linux backend fails, hwloc has
> nothing else to try and so any tests that rely on the host system's topology
> having been detected will fail.
>
> My patch fixes the build on these machines by skipping the one (other) test
> that relies on this information being available, only on non-x86 systems where
> the unavailability of /sys means certain failure.
>
> For reference, the backends mentioned above are implemented in hwloc's
> hwloc/topology-linux.c and hwloc/topology-x86.c.

Interesting, thanks for explaining!

Ludo’.
Closed
Ludovic Courtès wrote on 27 Feb 2023 16:37
(name . Simon South)(address . simon@simonsouth.net)(address . 61493@debbugs.gnu.org)
87cz5vi0g2.fsf@gnu.org
Hi again,

Simon South <simon@simonsouth.net> skribis:

> Here's a patch that circumvents a test failure in hwloc 2.9.0 on non-x86
> systems (and specifically on AArch64), allowing the package to build
> successfully on these machines.

I forwarded this to Brice Goglin, a colleague of mine and hwloc
co-maintainer, who kindly opened an issue upstream:


Feel free to comment there!

Ludo’.