[PATCH 0/5] MPI support for Slingshot via libcxi

Status: Done
Participants: Ludovic Courtès
Owner: unassigned
Submitted by: Ludovic Courtès
Severity: normal

Ludovic Courtès wrote on 18 Nov 15:56 +0100
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
cover.1731940961.git.ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

Hello,

This series provides packages adding support for HPE’s Slingshot
high-speed interconnect to Open MPI. I have tested the whole stack
with ‘intel-mpi-benchmarks’ on a Tier-1 supercomputer with a
Slingshot NIC (Adastra, in France) and confirmed that we get the
expected peak bandwidth, around 25 GB/s.

Libcxi and related packages were fully published as free software
just today. I’m really happy about that because it unlocks access
to major supercomputers using a free software stack, and using Guix!
Incidentally, Guix will be the very first distro to ship packages
built from these free releases. :-)

Many thanks to the people we talked to at HPE who helped make this
happen faster.

Ludo’.

Ludovic Courtès (5):
gnu: Add cassini-headers.
gnu: Add cxi-driver.
gnu: Add libcxi.
gnu: libfabric: Enable libcxi support.
gnu: openmpi: Disable static libraries.

gnu/packages/linux.scm | 137 +++++++++++++++++++++++++++++++++++++++--
gnu/packages/mpi.scm | 2 +
2 files changed, 135 insertions(+), 4 deletions(-)


base-commit: 23cbbe6860782c5d4a0ba599ea1cda0642e91661
--
2.46.0
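
The test setup described in the cover letter could be approximated with a
manifest along these lines. This is only a sketch: the file name
manifest.scm is hypothetical, and both package names are assumptions (the
letter only says 'intel-mpi-benchmarks'; the Open MPI build of that suite
is assumed here to be called intel-mpi-benchmarks-openmpi, which
'guix search' can confirm).

```scheme
;; manifest.scm (hypothetical file name)
;; Sketch of a manifest providing Open MPI and the Intel MPI Benchmarks.
;; Both package names are assumptions; check them with 'guix search'.
(use-modules (gnu packages))

(specifications->manifest
 (list "openmpi"
       "intel-mpi-benchmarks-openmpi"))
```

Passing the file to 'guix shell -m manifest.scm' should give an environment
containing the MPI launcher and the benchmark programs. How jobs are
actually launched on a given cluster is site-specific and is not stated in
this thread.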
Ludovic Courtès wrote on 18 Nov 15:59 +0100
[PATCH 1/5] gnu: Add cassini-headers.
(address . 74419@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
d1b5e4dc8f7426c09038ee6468b7b64947861c77.1731940961.git.ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

* gnu/packages/linux.scm (cassini-headers): New variable.

Change-Id: I278fe784ed2a0b31831dd6ff19f0c03d193b310a
---
gnu/packages/linux.scm | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/gnu/packages/linux.scm b/gnu/packages/linux.scm
index 7a856c4721..9d0970313a 100644
--- a/gnu/packages/linux.scm
+++ b/gnu/packages/linux.scm
@@ -8988,6 +8988,34 @@ (define-public procenv
     (home-page "https://github.com/jamesodhunt/procenv/")
     (license license:gpl3+)))
 
+(define-public cassini-headers
+  (let ((commit "9a8a738a879f007849fbc69be8e3487a4abf0952")
+        (revision "0"))
+    (package
+      (name "cassini-headers")
+      (version (git-version "2.0.0" ;per .spec file
+                            revision commit))
+      (home-page "https://github.com/HewlettPackard/shs-cassini-headers")
+      (source (origin
+                (method git-fetch)
+                (uri (git-reference (url home-page) (commit commit)))
+                (file-name (git-file-name name version))
+                (sha256
+                 (base32
+                  "0a54vwfr29n0i392wdap7rzmq0lb8mxa17d8yljdbm0kzrq48csz"))))
+      (build-system copy-build-system)
+      (arguments
+       (list #:install-plan
+             #~'(("include" "include")
+                 ("share/cassini-headers" "share/cassini-headers"))))
+      (synopsis "Cassini network hardware definitions and headers")
+      (description
+       "This package provides hardware definitions and C headers for use by
+the Linux driver and by user-space applications for the Cassini/Slingshot
+high-speed network interconnect made by HPE (formerly Cray). User-land
+software uses @file{cxi_prov_hw.h} from this package.")
+      (license (list license:gpl2 license:bsd-2))))) ;dual-licensed
+
 (define-public libfabric
   (package
     (name "libfabric")
--
2.46.0
Ludovic Courtès wrote on 18 Nov 15:59 +0100
[PATCH 2/5] gnu: Add cxi-driver.
(address . 74419@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
99b5d0abec47ff57af6e46310114a133411a0740.1731940961.git.ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

* gnu/packages/linux.scm (cxi-driver): New variable.

Change-Id: Iac48010d3de7f46248afe8c71991da71b61ebe6f
---
gnu/packages/linux.scm | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/gnu/packages/linux.scm b/gnu/packages/linux.scm
index 9d0970313a..17f743ef11 100644
--- a/gnu/packages/linux.scm
+++ b/gnu/packages/linux.scm
@@ -9016,6 +9016,36 @@ (define-public cassini-headers
 software uses @file{cxi_prov_hw.h} from this package.")
       (license (list license:gpl2 license:bsd-2))))) ;dual-licensed
 
+(define-public cxi-driver
+  (let ((commit "5f0ec0ead6ef3f98542a2ef5e76b89d14dd22150")
+        (revision "0"))
+    (package
+      (name "cxi-driver")
+      (version (git-version "1.0.0" ;per .spec file
+                            revision commit))
+      (home-page "https://github.com/HewlettPackard/shs-cxi-driver")
+      (source
+       (origin
+         (method git-fetch)
+         (uri (git-reference (url home-page) (commit commit)))
+         (file-name (git-file-name name version))
+         (sha256
+          (base32
+           "19cly014ihgdidrc1aki2xsbfhpc0g73v0vxcky8r27xza7rz5bg"))))
+      ;; TODO: Actually build the Linux driver.
+      (build-system copy-build-system)
+      (arguments
+       (list #:install-plan #~'(("include" "include"))))
+      (propagated-inputs (list cassini-headers))
+      (synopsis "Linux driver for the Cassini/Slingshot interconnect")
+      (description
+       "This is the Linux driver for the Cray/HPE Cassini 1 and 2 high-speed
+network interconnect (aka. Slingshot), and its Ethernet driver. It includes
+the @file{uapi/misc/cxi.h} C header file for use by user-land software.
+
+Currently the Linux driver itself is missing from this package.")
+      (license license:gpl2+))))
+
 (define-public libfabric
   (package
     (name "libfabric")
--
2.46.0
Ludovic Courtès wrote on 18 Nov 15:59 +0100
[PATCH 3/5] gnu: Add libcxi.
(address . 74419@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
de5078171a824a8b4446cdeda836a4cc20596c2e.1731940961.git.ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

* gnu/packages/linux.scm (libcxi): New variable.

Change-Id: I714d8694c796d5bc3ce4756d5aae576031288699
---
gnu/packages/linux.scm | 55 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)

diff --git a/gnu/packages/linux.scm b/gnu/packages/linux.scm
index 17f743ef11..6b2a0c201e 100644
--- a/gnu/packages/linux.scm
+++ b/gnu/packages/linux.scm
@@ -150,6 +150,7 @@ (define-module (gnu packages linux)
   #:use-module (gnu packages haskell-xyz)
   #:use-module (gnu packages image)
   #:use-module (gnu packages kde-frameworks)
+  #:use-module (gnu packages libevent)
   #:use-module (gnu packages libunwind)
   #:use-module (gnu packages libusb)
   #:use-module (gnu packages llvm)
@@ -9046,6 +9047,60 @@ (define-public cxi-driver
 Currently the Linux driver itself is missing from this package.")
       (license license:gpl2+))))
 
+(define-public libcxi
+  (let ((commit "5b6f8b5d57017c7963debb379d5693c59aca63ed")
+        (revision "0"))
+    (package
+      (name "libcxi")
+      (version (git-version "1.0.1" revision commit))
+      (home-page "https://github.com/HewlettPackard/shs-libcxi")
+      (source
+       (origin
+         (method git-fetch)
+         (uri (git-reference (url home-page) (commit commit)))
+         (file-name (git-file-name name version))
+         (sha256
+          (base32 "1h3dhird8p11q4ziaxzg1hr5gxcgwx1limzdcyildyaw50dy549g"))))
+      (build-system gnu-build-system)
+      (arguments
+       (list #:configure-flags
+             #~(list "--disable-static"
+                     (string-append "--with-udevrulesdir="
+                                    #$output "/lib/udev/rules.d"))
+
+             #:phases
+             #~(modify-phases %standard-phases
+                 (add-before 'configure 'set-cassini-file-names
+                   (lambda* (#:key inputs #:allow-other-keys)
+                     (substitute* "utils/cxi_dump_csrs.py"
+                       (("/usr/share/cassini-headers/csr_defs.json")
+                        (search-input-file
+                         inputs
+                         "/share/cassini-headers/csr_defs.json"))))))))
+      (native-inputs (list autoconf
+                           automake
+                           libtool
+                           pkg-config
+                           python-wrapper))
+      (inputs (list libconfig
+                    libuv
+                    fuse-2
+                    libyaml
+                    libnl
+                    numactl
+                    eudev
+                    (list lm-sensors "lib")))
+      (propagated-inputs (list cassini-headers cxi-driver))
+      (synopsis "Interface to the Cassini/Slingshot high-speed interconnect")
+      (description
+       "Libcxi provides applications with a low-level interface to the
+Cray/HPE Cassini high-speed @acronym{NIC, network interface controller}, also
+known as Slingshot.")
+
+      ;; License is spelled out in 'cray-libcxi.spec' and in source file
+      ;; headers.
+      (license (list license:lgpl2.1+ license:bsd-3))))) ;dual-licensed
+
 (define-public libfabric
   (package
     (name "libfabric")
--
2.46.0
Ludovic Courtès wrote on 18 Nov 15:59 +0100
[PATCH 4/5] gnu: libfabric: Enable libcxi support.
(address . 74419@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
3b94d179f5fa0fb4a0c443f4cb41225888eb280f.1731940961.git.ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

* gnu/packages/linux.scm (libfabric)[inputs]: Add libcxi, curl, and
json-c if libcxi supports the target system.
[arguments]: Add #:phases.

Change-Id: I3345cac68603c776ec4953cf0e97a12389c30635
---
gnu/packages/linux.scm | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/gnu/packages/linux.scm b/gnu/packages/linux.scm
index 6b2a0c201e..2bd1e14c0c 100644
--- a/gnu/packages/linux.scm
+++ b/gnu/packages/linux.scm
@@ -9116,21 +9116,37 @@ (define-public libfabric
     (build-system gnu-build-system)
     (inputs
      (let ((if-supported ;XXX: copied from openmpi
-            (lambda (package)
+            (lambda (package . extra)
               (if (and (not (%current-target-system))
                        (member (%current-system)
                                (package-supported-systems package)))
-                  (list package)
+                  (cons package extra)
                   '()))))
        (append (list rdma-core libnl)
                (if-supported psm)
-               (if-supported psm2))))
+               (if-supported psm2)
+               (if-supported libcxi curl json-c))))
     (arguments
      (list #:configure-flags
            #~(append (if #$(target-64bit?)
                          (list "--enable-efa")
                          '())
-                     (list "--enable-verbs"))))
+                     (list #$@(if (this-package-input "libcxi")
+                                  #~("--enable-cxi")
+                                  #~())
+                           "--enable-verbs"))
+           #:phases
+           #~(modify-phases %standard-phases
+               (add-after 'install 'remove-libtool-archive
+                 (lambda _
+                   ;; 'libfabric.la' has '-ljson-c' without a corresponding
+                   ;; '-L' in 'dependency_libs', which in turn causes users
+                   ;; such as Open MPI to fail at link time due to '-ljson-c'
+                   ;; not being found, even when building a shared library.
+                   ;; So, remove the .la file.
+                   (delete-file
+                    (string-append #$output
+                                   "/lib/libfabric.la")))))))
     (home-page "https://ofiwg.github.io/libfabric/")
     (synopsis "Open Fabric Interfaces")
     (description
--
2.46.0
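
The 'if-supported' change in the patch above relies on a rest argument:
extra inputs are included only when the main package is supported. The
following plain Guile sketch illustrates the idea under simplifying
assumptions: symbols stand in for package objects, a boolean parameter
(hypothetical) replaces the real system check, and quasiquote with ,@
stands in for the gexp forms #~ and #$@ used in the patch. The procedure
names 'libfabric-inputs' and 'configure-flags' are made up for the example.

```scheme
;; Plain Guile sketch; no Guix modules are needed.
;; Assumptions: symbols stand in for package objects, and the boolean
;; SUPPORTED? stands in for the real (%current-system) membership test.

(define (if-supported supported? package . extra)
  ;; EXTRA collects any arguments after PACKAGE (a "rest" argument), so
  ;; companion inputs are added only when PACKAGE itself is supported.
  (if supported?
      (cons package extra)
      '()))

(define (libfabric-inputs cxi-supported?)
  (append (list 'rdma-core 'libnl)
          (if-supported #t 'psm2)
          (if-supported cxi-supported? 'libcxi 'curl 'json-c)))

(define (configure-flags inputs)
  ;; ,@ splices a possibly empty list, much as #$@ does in the patch.
  `(,@(if (memq 'libcxi inputs)
          '("--enable-cxi")
          '())
    "--enable-verbs"))

(define with-cxi (libfabric-inputs #t))
(define without-cxi (libfabric-inputs #f))

(write (configure-flags with-cxi))    ; ("--enable-cxi" "--enable-verbs")
(newline)
(write (configure-flags without-cxi)) ; ("--enable-verbs")
(newline)
```

Keeping curl and json-c behind the same check means that systems where
libcxi is unsupported see no change in their libfabric inputs or flags.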
Ludovic Courtès wrote on 18 Nov 15:59 +0100
[PATCH 5/5] gnu: openmpi: Disable static libraries.
(address . 74419@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
bc59eeedfc741fd52e65b2e81338e1c03b0b9709.1731940961.git.ludo@gnu.org
* gnu/packages/mpi.scm (openmpi-4)[arguments]: Pass “--disable-static”.
* gnu/packages/mpi.scm (openmpi-5)[arguments]: Likewise.

Change-Id: Ia6a8bc8a88d12a37878a45eed380262759bd4565
---
gnu/packages/mpi.scm | 2 ++
1 file changed, 2 insertions(+)

diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index bc1fd797d6..20497242e5 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -234,6 +234,7 @@ (define-public openmpi-4
      (list
       #:configure-flags #~`("--enable-mpi-ext=affinity" ;cr doesn't work
                             "--with-sge"
+                            "--disable-static"
                             #$@(if (package? (this-package-input "valgrind"))
                                    #~("--enable-memchecker"
@@ -342,6 +343,7 @@ (define-public openmpi-5
      (list #:configure-flags
            #~(list "--enable-mpi-ext=affinity" ;cr doesn't work
                    "--with-sge"
+                   "--disable-static"
                    #$@(if (package? (this-package-input "valgrind"))
                           #~("--enable-memchecker"
--
2.46.0
Ludovic Courtès wrote on 25 Nov 12:30 +0100
Re: [bug#74419] [PATCH 0/5] MPI support for Slingshot via libcxi
(address . 74419-done@debbugs.gnu.org)(address . romain.garbage@inria.fr)
87v7wbeekf.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> wrote:

> gnu: Add cassini-headers.
> gnu: Add cxi-driver.
> gnu: Add libcxi.
> gnu: libfabric: Enable libcxi support.
> gnu: openmpi: Disable static libraries.

Pushed as a7d6a79a98496f87f577bf5edfa4024e1a39665e!

Ludo'.
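
To try the packages at the revision mentioned above, one option is to pin
Guix to that commit with a channels file. This is a sketch under
assumptions: the commit is taken to be the tip of the pushed series, the
repository URL is the Savannah one (any copy of the Guix repository that
contains the commit would do), and the file name channels.scm is
hypothetical.

```scheme
;; channels.scm (hypothetical file name)
;; Pins the 'guix' channel to the commit quoted in the message above.
;; The URL is an assumption; use whichever Guix repository you trust.
(use-modules (guix channels))

(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit "a7d6a79a98496f87f577bf5edfa4024e1a39665e")))
```

With that file in place, 'guix time-machine -C channels.scm -- build libcxi'
would build libcxi as defined at that revision.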
Closed

This issue is archived.

To comment on this conversation send an email to 74419@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it:

  mumi current 74419

Then, you may apply the latest patch set in this issue (with sign-off):

  mumi am -- -s

Or compose a reply to this issue:

  mumi compose

Or send patches to this issue:

  mumi send-email *.patch