gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich

Status: Done
Participants: Maurice Brémond, Ludovic Courtès, zimoun
Owner: unassigned
Submitted by: Maurice Brémond
Severity: normal

Ludovic Courtès wrote on 17 Feb 2020 18:26
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)(address . 39588@debbugs.gnu.org)
87o8tx3z2q.fsf@gnu.org
Hi Maurice,

Thanks for the patches! We like to have one patch per package, so I
started with MPICH:

Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

> +(define-public mpich
> + (package
> + (name "mpich")
> + (version "3.4a2")
> + (source (origin
> + (method git-fetch)
> + (uri (git-reference
> + (url "https://github.com/pmodels/mpich")
> + (commit "644051d13dc20aecd460ba3db088756659c3dad3") ; tag v3.4a2
> + (recursive? #t)))
> + (sha256
> + (base32
> + "02ildr7wh40q1qaq5k8npb6vw6kv9szmrm3lspr6skqa5csmrrxw"))))

I ended up modifying the package:

• To use 3.3 instead of 3.4a, the latter being listed as “alpha”;

• To build from an official tarball rather than from a Git checkout,
so that the GNU build system is already bootstrapped;

• To ensure that the bundled copies of hwloc and ucx are not used.

I pushed the result here:


Let me know if I broke something.

As for the “-mpich” packages: they look good to me, though I’m not
entirely sure whether we should create “-mpich” variants for each of
them. Ideally ‘--with-input’ would be enough, but I don’t know. At
the same time, those variants don’t cost us much, so if they’re useful,
why not.

Thoughts, HPC folks?

Ludo’.
zimoun wrote on 17 Feb 2020 19:20
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ0szPexSebyY4Td+M9Lb06WGHD__Qh+A3qOvL-Ei2efnw@mail.gmail.com
Hi,

Thank you Maurice for the packages! :-)


On Mon, 17 Feb 2020 at 18:27, Ludovic Courtès <ludo@gnu.org> wrote:

> As for the “-mpich” packages: they look good to me, though I’m not
> entirely sure whether we should create “-mpich” variants for each of
> them. Ideally ‘--with-inputs’ would be enough, but I don’t know. At
> the same time, those variants don’t cost us much, so if they’re useful,
> why not.

Is it not related to "package parameters" or the discussion we had
about rebuilding everything with another compiler?
In other words, '--with-input' will do the job for explicit packages but
not for the implicit ones.

One easy move would be to generalize -- if possible -- what is done in
'with-python2' or 'with-ocaml4.07'. But I am not convinced it is easy
because it clearly depends on the build system.

On the other hand, I had a look at Spack (after the discussion at
FOSDEM) and how they do it. The WIP branch [1] about the solver is
interesting: it could catch incompatibilities earlier using a solver
(SAT or other) and specifications. But I am not convinced either that
it is the way to go, because it adds a lot of complexity for a
debatable gain. ;-)

[1] https://github.com/spack/spack/tree/features/solver/lib/spack/spack/solver

Well, for these particular patches, the variants are ok.
But we should think about how to ease variant generation for the whole chain.


Cheers,
simon
Maurice Brémond wrote on 18 Feb 2020 18:58
(name . Ludovic Courtès)(address . ludo@gnu.org)
87k14j6amk.fsf@inria.fr
Hi Ludovic & Simon!

I agree, the *-mpich packages are not necessary, and this raises the
question of why those variants and not others. In the end, I could keep
them in my own channel for convenience.

If I understand correctly, in this case using --with-input is
possible because implicit packages are very unlikely to use MPI?

In guix packages, mpi input is usually declared as
(("openmpi" . ,openmpi))
or
(("mpi" . ,openmpi))

So two flags are necessary for the transformation.

Doing this, I ran into problems with your patch...

You can try with my original patch just a transformation of
mumps-openmpi into mumps-mpich:

guix time-machine --url=https://gitlab.inria.fr/bremond/guix.git \
--branch=add-mpich -- \
environment -K --pure --ad-hoc mumps-openmpi \
--with-input=mpi=mpich --with-input=openmpi=mpich --

This works for me, I can use a similar command to compile and execute a
program which uses mumps and I can see with ldd that mpich is used.

Then with the current mpich patch on savannah master:

guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- \
environment -K --pure --ad-hoc mumps-openmpi \
--with-input=mpi=mpich --with-input=openmpi=mpich --

This fails on my machine for the pt-scotch check (there is the same
error with scalapack check)

Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)

If I go into the build directory and launch the check manually after
sourcing the environment-variables file, it works...

So it seems that this is related to guix and the guixbuild environment in
the definition of the package.

Maurice
zimoun wrote on 18 Feb 2020 19:22
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
CAJ3okZ0wT_LxDcZ1LvyKN9-_QDUO9i+U_43Uhjdma5gAeWg=0g@mail.gmail.com
Hi Maurice,

On Tue, 18 Feb 2020 at 18:58, Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

> If I understand, in this case, the usage of --with-input is
> possible because implicit packages are very likely to not use mpi ?

Maybe I am missing the issue. I have not looked at mumps and related packages in... years. :-)
(Nor at your patches. :-D)

If mumps depends explicitly on openmpi, then '--with-input' can
rewrite the direct dependencies, by providing, say, mpich instead of
openmpi.
If petsc* depends explicitly on openmpi and on mumps (which depends
explicitly on openmpi too), then '--with-input=openmpi=mpich' will
*only* rewrite the dependency of petsc but not that of mumps. So we end
up with petsc compiled with mpich and mumps with openmpi.

Still considering this (fictitious) example, where:
- petsc depends on openmpi(1) and mumps
- mumps depends on openmpi(2)
openmpi(2) is an implicit dependency of petsc and '--with-input'
does not work.

*because I know PETSc better than Scotch. ;-)



> You can try with my original patch just a transformation of
> mumps-openmpi into mumps-mpich:
>
> guix time-machine --url=https://gitlab.inria.fr/bremond/guix.git \
> --branch=add-mpich -- \
> environment -K --pure --ad-hoc mumps-openmpi \
> --with-input=mpi=mpich --with-input=openmpi=mpich --
>
> This works for me, I can use a similar command to compile and execute a
> program which uses mumps and I can see with ldd that mpich is used.
>
> Then with the current mpich patch on savannah master:
>
> guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- \
> environment -K --pure --ad-hoc mumps-openmpi \
> --with-input=mpi=mpich --with-input=openmpi=mpich --
>
> This fails on my machine for the pt-scotch check (there is the same
> error with scalapack check)

Are 'pt-scotch' and 'scalapack' compiled with 'mpich' or 'openmpi'?

Because maybe "mumps-openmpi --with-input=openmpi=mpich" compiles
'mumps' using 'mpich' as MPI but compiles 'pt-scotch' or 'scalapack'
with the default implementation, which seems to be 'openmpi'.


Thank you for your work.

All the best,
simon
zimoun wrote on 19 Feb 2020 13:11
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
CAJ3okZ3kwGssKHw-A3Gc1Lr+n4BpbWzxrTH0eoUFpQ3urrTnDA@mail.gmail.com
Hi Maurice,

On Wed, 19 Feb 2020 at 12:46, Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

> >If mumps depends explicitly on openmpi, then '--with-inputs' can
> >rewrite the direct dependencies, by providing say mpich instead of
> >openmpi.
> >If petsc* depends explicitly on openmpi and on mumps (which depends
> >explicitly on openmpi too), then '--with-inputs=openmpi=mpich' will
> >*only* rewrite the dependency of petsc but not of mumps. So it ends
> >with petsc compiled with mpich and mumps with openmpi.
> >
> >Still considering this (fictive) example, where:
> > - petsc depends on openmpi(1) and mumps
> > - mumps depends on openmpi(2)
> >The openmpi(2) is an implicit dependency for petsc and '--with-inputs'
> >does not work.

Sorry for the confusion, because what I said is *wrong*.
That is not the definition of an implicit input. The definition is:

In addition, this build system ensures that the “standard” environment
for GNU packages is available. This includes tools such as GCC, libc,
Coreutils, Bash, Make, Diffutils, grep, and sed (see the (guix
build-system gnu) module for a complete list). We call these the
implicit inputs of a package, because package definitions do not have to
mention them.


> Ok thank you for the clarification, I understand better now.
>
> I misunderstood the documentation:
>
> https://guix.gnu.org/manual/en/html_node/Package-Transformation-Options.html
>
> --with-input=package=replacement
> [...]
> This is a recursive, deep replacement. [...]

Well, you understood correctly. I am the one who mixed things up and added confusion, sorry.


> In the scalapack input I can see:
> `(("mpi" ,openmpi)
> ("fortran" ,gfortran)
> ("lapack" ,lapack))) ;for testing only
>
> So my assumption is that the --with-input transformation should work
> here as neither gfortran or lapack depends on mpi and to just build
> scalapack with mpich I tried:
>
> guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- build scalapack --with-input=openmpi=mpich

Hum, my MUA trims the long message.


Well, my point was: maybe it does not work because of the implicit inputs.
Now, mpi has bitten me so I will try this afternoon. :-)

Cheers,
simon
zimoun wrote on 19 Feb 2020 14:34
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
CAJ3okZ0tCj3C42=XoC3D=9av+uTAM_AUO2Gr-QszHLOdYfb4yA@mail.gmail.com
Hi Maurice,

On Tue, 18 Feb 2020 at 18:58, Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

> guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- \
> environment -K --pure --ad-hoc mumps-openmpi \
> --with-input=mpi=mpich --with-input=openmpi=mpich --
>
> This fails on my machine for the pt-scotch check (there is the same
> error with scalapack check)
>
> Invalid error code (-2) (error ring index 127 invalid)
> INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
> Fatal error in PMPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(586)..............:
> MPID_Init(224).....................: channel initialization failed
> MPIDI_CH3_Init(105)................:
> MPID_nem_init(324).................:
> MPID_nem_tcp_init(175).............:
> MPID_nem_tcp_get_business_card(401):
> MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
>
> If I go into the build directory and launch the check manually after
> sourcing the environment-variables file, it works...
>
> So it seems that this is related to guix and the guixbuild environment in
> the definition of the package.

Considering mumps-openmpi (or scalapack), the package definition contains:

(arguments
`(#:configure-flags `("-DBUILD_SHARED_LIBS:BOOL=YES")
#:phases (modify-phases %standard-phases
(add-before 'check 'mpi-setup
,%openmpi-setup))))

so you are right. It seems to be an environment issue.

The flag '--with-input=openmpi=mpich' changes the MPI implementation
but then, in the check phase, the environment variables (see
%openmpi-setup in gnu/packages/mpi.scm) are not necessarily set for
mpich.

The digression about implicit inputs was not relevant, sorry. :-)

Well, to get back to the discussion about '*-mpich' variants instead
of '*-openmpi', we could use a 'with-mpich' similar to 'with-python2'
or 'with-ocaml4.07' that rewrites the definition correctly.


Hope that helps,
simon
Maurice Brémond wrote on 19 Feb 2020 12:45
(name . zimoun)(address . zimon.toutoune@gmail.com)
87eeuqby18.fsf@inria.fr
Hi Simon,

>If mumps depends explicitly on openmpi, then '--with-inputs' can
>rewrite the direct dependencies, by providing say mpich instead of
>openmpi.
>If petsc* depends explicitly on openmpi and on mumps (which depends
>explicitly on openmpi too), then '--with-inputs=openmpi=mpich' will
>*only* rewrite the dependency of petsc but not of mumps. So it ends
>with petsc compiled with mpich and mumps with openmpi.
>
>Still considering this (fictive) example, where:
> - petsc depends on openmpi(1) and mumps
> - mumps depends on openmpi(2)
>The openmpi(2) is an implicit dependency for petsc and '--with-inputs'
>does not work.
>
>*because I know better PETSc than Scotch. ;-)
>

Ok thank you for the clarification, I understand better now.

I misunderstood the documentation:

https://guix.gnu.org/manual/en/html_node/Package-Transformation-Options.html

--with-input=package=replacement
[...]
This is a recursive, deep replacement. [...]

So I thought that implicit packages had something to do with other
inputs like native-inputs, but it was not clear to me.
(I must admit that I still do not understand the term "recursive" very well here.)

I was also confused about the replacement: --with-input must be given
the name of the package and not the name of the dependency, so
--with-input=openmpi=mpich is sufficient.

In the scalapack input I can see:

`(("mpi" ,openmpi)
("fortran" ,gfortran)
("lapack" ,lapack))) ;for testing only

So my assumption is that the --with-input transformation should work
here, as neither gfortran nor lapack depends on mpi, and to build
just scalapack with mpich I tried:

guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- build scalapack --with-input=openmpi=mpich

This fails with the same error:
Attachment: file
I also tried to compile my *-mpich packages in my own branch with the new
mpich package and got the same results:

--branch=check-master-mpich -- \
environment --pure --ad-hoc mumps-mpich


Maurice
Ludovic Courtès wrote on 20 Feb 2020 10:08
(name . zimoun)(address . zimon.toutoune@gmail.com)
87eeupd3t1.fsf@gnu.org
Hello!

zimoun <zimon.toutoune@gmail.com> wrote:

> On Mon, 17 Feb 2020 at 18:27, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> As for the “-mpich” packages: they look good to me, though I’m not
>> entirely sure whether we should create “-mpich” variants for each of
>> them. Ideally ‘--with-inputs’ would be enough, but I don’t know. At
>> the same time, those variants don’t cost us much, so if they’re useful,
>> why not.
>
> Is it not related to "package parameters" or the discussion we had
> about rebuilding everything with another compiler?

There’s definitely a connection.

> Other said, '--with-inputs' will do the job for explicit packages but
> not the implicit ones.

Right, ‘--with-input’ could be “good enough”.

> One easy move should to generalize -- if possible -- what is done in
> 'with-python2' or 'with-ocaml4.07'. But I am not convinced it is easy
> because it is clearly dependant on the build system.
>
> On the other hand, I gave a look at spack (after the discussion at
> FOSDEM) and how they do. The WIP branch [1] about the solver is
> interesting: possibly catch incompatibilities earlier using solver
> (SAT or other) and specifications. But I am not convinced neither it
> is the way to go because it adds a lot of complexity for a gain that
> could be discussed. ;-)
>
>
> [1] https://github.com/spack/spack/tree/features/solver/lib/spack/spack/solver

I have yet to look more closely into this. However, overall, while I
agree that some flexibility is welcome and actually needed, I’m
skeptical about the goal of potentially allowing for any combination, at
the expense of QA (the solver can check for incompatible options,
provided option compatibility is well specified, but it cannot check
whether something will run or even build at all.)

> Well, for these particular patches, the variants are ok.
> But we should think about how to ease the variant generation of all the chain.

Well again there are things like ‘package-input-rewriting’ that could
help: we could define a ‘package-with-mpich’ procedure.
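
A minimal sketch of what such a procedure could look like, assuming
‘openmpi’ and ‘mpich’ are both exported by (gnu packages mpi) and
‘scalapack’ by (gnu packages maths); the name ‘scalapack-with-mpich’ is
only illustrative. It follows the usual ‘package-input-rewriting’
pattern and does not touch build phases such as ‘mpi-setup’:

```scheme
(use-modules (guix packages)
             (gnu packages maths)
             (gnu packages mpi))

;; Procedure that takes a package and returns a variant of it in which
;; 'openmpi' is replaced by 'mpich' in the dependency graph.
(define package-with-mpich
  (package-input-rewriting `((,openmpi . ,mpich))))

;; Illustrative use: a variant of scalapack built against MPICH.
(define scalapack-with-mpich
  (package-with-mpich scalapack))
```

Whether the resulting packages pass their test suites is a separate
question, since the rewriting only changes inputs.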

Thanks,
Ludo’.
Ludovic Courtès wrote on 20 Feb 2020 10:38
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
878skxd2es.fsf@gnu.org
Hi Maurice,

Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

> This fails on my machine for the pt-scotch check (there is the same
> error with scalapack check)
>
> Invalid error code (-2) (error ring index 127 invalid)
> INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
> Fatal error in PMPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(586)..............:
> MPID_Init(224).....................: channel initialization failed
> MPIDI_CH3_Init(105)................:
> MPID_nem_init(324).................:
> MPID_nem_tcp_init(175).............:
> MPID_nem_tcp_get_business_card(401):
> MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
>
> If I go into the build directory and launch the check manually after
> sourcing the environment-variables file, it works...

My version of the patch must have changed the default driver or
something along these lines.

tcp_init.c:373 in MPICH reads this:

/* If we don't have an IP address, try to get it from the name */
if (!ifaddrFound) {
mpi_errno = MPL_get_sockaddr(ifname_string, p_addr);
MPIR_ERR_CHKANDJUMP2(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**gethostbyname", "**gethostbyname %s %d", ifname_string, h_errno);
}

‘MPL_get_sockaddr’ uses ‘getifaddrs’ to get the list of local
interfaces, which in turn is implemented in terms of netlink requests in
libc.

I tried to reproduce it without success, but I guess my example MPI
program is too simple to trigger the issue:

$ guix build -f mpich.scm
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
La jena derivo estos konstruata:
/gnu/store/mgcwnmicw696i3g98rljdg92ra6ilq4n-mpi-init.drv
building /gnu/store/mgcwnmicw696i3g98rljdg92ra6ilq4n-mpi-init.drv...
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
successfully built /gnu/store/mgcwnmicw696i3g98rljdg92ra6ilq4n-mpi-init.drv
/gnu/store/cmdh27sg2hqh2p0jhxz33xgfmsxd5hmz-mpi-init
$ /gnu/store/cmdh27sg2hqh2p0jhxz33xgfmsxd5hmz-mpi-init
$ cat mpich.scm
(use-modules (guix) (gnu))

(define code
(plain-file "mpi.c" "
#include <mpi.h>

int main (int argc, char *argv[]) { return MPI_Init (&argc, &argv);} "))

(define toolchain (specification->package "gcc-toolchain"))
(define mpich (specification->package "mpich"))

(computed-file "mpi-init"
(with-imported-modules '((guix build utils))
#~(begin
(use-modules (guix build utils))

(setenv "PATH"
(string-append #$(file-append toolchain "/bin") ":"
#$(file-append mpich "/bin")))
(setenv "CPATH" #$(file-append mpich "/include"))
(setenv "LIBRARY_PATH"
(string-append #$(file-append mpich "/lib") ":"
#$(file-append toolchain "/lib")))
(invoke "mpicc" "-o" #$output "-Wall" "-g"
#$code)

;; Run the MPI code in the build environment.
(invoke #$output))))

Ideas?

Could you perhaps strace the pt-scotch test that fails so we can see if
there’s anything obvious, such as code that browses /sys or similar?

TIA,
Ludo’.
zimoun wrote on 20 Feb 2020 11:23
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ1DnP9ZZE2Ovxj_87Akazxw+743ahwTNDN24aBThGB+jg@mail.gmail.com
Hi Ludo,

On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo@gnu.org> wrote:

> > Other said, '--with-inputs' will do the job for explicit packages but
> > not the implicit ones.
>
> Right, ‘--with-input’ could be “good enough”.

About openmpi->mpich, I am not sure it will work because of:

#:phases (modify-phases %standard-phases
(add-before 'check 'mpi-setup
,%openmpi-setup))




> > On the other hand, I gave a look at spack (after the discussion at
> > FOSDEM) and how they do. The WIP branch [1] about the solver is
> > interesting: possibly catch incompatibilities earlier using solver
> > (SAT or other) and specifications. But I am not convinced neither it
> > is the way to go because it adds a lot of complexity for a gain that
> > could be discussed. ;-)
> >
> >
> > [1] https://github.com/spack/spack/tree/features/solver/lib/spack/spack/solver
>
> I have yet to look more closely into this. However, overall, while I
> agree that some flexibility is welcome and actually needed, I’m
> skeptical about the goal of potentially allowing for any combination, at
> the expense of QA (the solver can check for incompatible options,
> provided option compatibility is well specified, but it cannot check
> whether something will run or even build at all.)

I agree. Need more thoughts. :-)


> > One easy move should to generalize -- if possible -- what is done in
> > 'with-python2' or 'with-ocaml4.07'. But I am not convinced it is easy
> > because it is clearly dependant on the build system.

> > Well, for these particular patches, the variants are ok.
> > But we should think about how to ease the variant generation of all the chain.
>
> Well again there are things like ‘package-input-rewriting’ that could
> help: we could define a ‘package-with-mpich’ procedure.

Yes. 'with-python2' and 'with-ocaml4.07' rewrite the build system
(implicit inputs), while 'package-with-mpich' would rewrite packages
(with 'package-input-rewriting', so explicit ones) and tweak some
variables (environment and/or flags).
Sounds good. :-)


All the best,
simon
Ludovic Courtès wrote on 21 Feb 2020 09:03
(name . zimoun)(address . zimon.toutoune@gmail.com)
87pne8pdsl.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> wrote:

> On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> > Other said, '--with-inputs' will do the job for explicit packages but
>> > not the implicit ones.
>>
>> Right, ‘--with-input’ could be “good enough”.
>
> About openmpi->mpich, I am not sure it will work because of:
>
> #:phases (modify-phases %standard-phases
> (add-before 'check 'mpi-setup
> ,%openmpi-setup))

That phase just sets environment variables that MPICH will happily
ignore.

Ludo’.
zimoun wrote on 21 Feb 2020 09:40
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ04OpBVtUm07=B-J+zvn4cU09E4VVqATBUKnfizKqGKcQ@mail.gmail.com
On Fri, 21 Feb 2020 at 09:03, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> wrote:
> > On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo@gnu.org> wrote:
> >
> >> > Other said, '--with-inputs' will do the job for explicit packages but
> >> > not the implicit ones.
> >>
> >> Right, ‘--with-input’ could be “good enough”.
> >
> > About openmpi->mpich, I am not sure it will work because of:
> >
> > #:phases (modify-phases %standard-phases
> > (add-before 'check 'mpi-setup
> > ,%openmpi-setup))
>
> That phase just sets environment variables that MPICH will happily
> ignore.

Yes, "qui peut le plus peut le moins" (whoever can do more can do less). ;-)
But what if the mpich package requires environment variables too?

(I do not have a clean MPI installation at hand so it is difficult to
really test.)


Cheers,
simon
Maurice Brémond wrote on 21 Feb 2020 09:46
(name . Ludovic Courtès)(address . ludo@gnu.org)
875zg0jpjl.fsf@inria.fr
Hi,

I ran strace on the scalapack tests (blacs tests):
Attachment: XL.gz
I cannot see where it goes wrong, but it should be in the trace.

I also compiled another package I use with mpich, adjoinable-mpi, and it
is ok (as there are no checks in it...). I can use it to run an ocean
model and everything works. So I think it is the same as with your
example: the end user can use mpich, but the guix daemon cannot.

Maurice
Maurice Brémond wrote on 21 Feb 2020 10:01
(name . zimoun)(address . zimon.toutoune@gmail.com)
871rqojouv.fsf@inria.fr
Hi Simon,

>The digression about implicit inputs was not relevant, sorry. :-)
No worries, it taught me some Guix internals that were obscure to me!

>(arguments
> `(#:configure-flags `("-DBUILD_SHARED_LIBS:BOOL=YES")
> #:phases (modify-phases %standard-phases
> (add-before 'check 'mpi-setup
> ,%openmpi-setup))))

Even if it only sets variables and should be harmless here, I wonder:
would it be easy to turn this openmpi-setup into an mpi-setup with
ad-hoc initialisations?

Maurice
Ludovic Courtès wrote on 21 Feb 2020 12:32
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
87ftf4npk3.fsf@gnu.org
Hi,

I actually managed to reproduce it with a minimal test case (attached):

$ guix build -f mpich-test.scm
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
La jena derivo estos konstruata:
/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv
building /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv...
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
Backtrace:
1 (primitive-load "/gnu/store/iykxzg1n018sigd4c23kx1c4ngz?")
In guix/build/utils.scm:
652:6 0 (invoke _ . _)

guix/build/utils.scm:652:6: In procedure invoke:
Throw to key `srfi-34' with args `(#<condition &invoke-error [program: "mpiexec" arguments: ("-np" "2" "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init") exit-status: 15 term-signal: #f stop-signal: #f] 7ffff6022f40>)'.
builder for `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed with exit code 1
build of /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv failed
View build log at '/var/log/guix/drvs/rg/r7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv.bz2'.
guix build: error: build of `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed

The same program outside the container works just fine:

$ guix environment --ad-hoc mpich -- mpiexec -np 2 "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init"
np = 2, rank = 0
np = 2, rank = 1

‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.
Interestingly, ‘getaddrinfo’ fails in the build environment when passed
the flags that ‘MPL_get_sockaddr’ uses:

(computed-file "getaddrinfo"
#~(pk #$output
(getaddrinfo "localhost" #f
(logior AI_ADDRCONFIG AI_V4MAPPED)
AF_INET
SOCK_STREAM
IPPROTO_TCP)))

However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, it works.

Now we need to see why the ‘ai_family’ hint is causing trouble in
glibc, and perhaps in parallel try to work around it in MPICH…
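
For comparison, here is a small Guile sketch that performs the same
lookup with and without the family, socket type, and protocol hints, and
prints either the number of results or the resolver's error message.
The helper name ‘try-lookup’ is made up for this illustration. It can be
run with plain ‘guile’ outside the build environment, and what it prints
depends on the local network configuration:

```scheme
;; Look up "localhost" with the same flags as in the snippet above,
;; passing HINTS (family, socket type, protocol) when given.  Return the
;; number of results, or the error message if the lookup fails.
(define (try-lookup . hints)
  (catch 'getaddrinfo-error
    (lambda ()
      (length (apply getaddrinfo "localhost" #f
                     (logior AI_ADDRCONFIG AI_V4MAPPED)
                     hints)))
    (lambda (key errcode)
      (gai-strerror errcode))))

(format #t "with hints:    ~a~%"
        (try-lookup AF_INET SOCK_STREAM IPPROTO_TCP))
(format #t "without hints: ~a~%" (try-lookup))
```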

Ludo’.

PS: I’ll be mostly away from keyboard in the coming days.
(use-modules (guix) (gnu))

(define code
  (plain-file "mpi.c" "
#include <assert.h>
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
  int err, np, rank;

  err = MPI_Init (&argc, &argv);
  assert (err == 0);

  err = MPI_Comm_size(MPI_COMM_WORLD, &np);
  assert (err == 0);

  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  assert (err == 0);

  printf (\"np = %i, rank = %i\\n\", np, rank);
  return 0;
}
"))

(define toolchain (specification->package "gcc-toolchain"))
(define mpich (specification->package "mpich"))

(computed-file "mpi-init"
  (with-imported-modules '((guix build utils))
    #~(begin
        (use-modules (guix build utils))

        (setenv "PATH"
                (string-append #$(file-append toolchain "/bin") ":"
                               #$(file-append mpich "/bin")))
        (setenv "CPATH" #$(file-append mpich "/include"))
        (setenv "LIBRARY_PATH"
                (string-append #$(file-append mpich "/lib") ":"
                               #$(file-append toolchain "/lib")))
        (invoke "mpicc" "-o" #$output "-Wall" "-g"
                #$code)

        ;; Run the MPI code in the build environment.
        (invoke "mpiexec" "-np" "2" #$output))))
zimoun wrote on 25 Feb 2020 17:41
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ0Zug88h4V_yzkHEVkWLKwew8rG3RGg79=_p4AHm3_rpg@mail.gmail.com
Hi Ludo,

On Thu, 20 Feb 2020 at 11:23, zimoun <zimon.toutoune@gmail.com> wrote:
> On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo@gnu.org> wrote:

> > > One easy move should to generalize -- if possible -- what is done in
> > > 'with-python2' or 'with-ocaml4.07'. But I am not convinced it is easy
> > > because it is clearly dependant on the build system.
>
> > > Well, for these particular patches, the variants are ok.
> > > But we should think about how to ease the variant generation of all the chain.
> >
> > Well again there are things like ‘package-input-rewriting’ that could
> > help: we could define a ‘package-with-mpich’ procedure.
>
> Yes. 'with-python2' and 'with-ocaml4.07' rewrite the build-system
> (implicit inputs) and 'package-with-mpich' rewrites packages
> ('package-input-rewritting' so explicit ones) more tweak some
> variables (environment and/or flags).
> Sounds good. :-)

I do not know why I removed the "package-" in "package-with-python2".
Whatever! :-)
My remark was about maybe distinguishing between rewriting an input and
rewriting the build system. But after some thought, I do not know if
it is useful or if it just adds more complexity.

However, I do not know whether the right candidate is
'package-input-rewriting' or 'package-mapping', as in
'package-with-python2'. Well, I will try to experiment in the
meantime.


All the best,
simon
zimoun wrote on 15 Oct 2020 21:50
(name . Ludovic Courtès)(address . ludo@gnu.org)
861rhz1d7b.fsf@gmail.com
Hi,

Reviving the openmpi->mpich topic of #39588 [1].

1: <http://issues.guix.gnu.org/issue/39588>

On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo@gnu.org> wrote:

>> Well, for these particular patches, the variants are ok.
>> But we should think about how to ease the variant generation of all the chain.
>
> Well again there are things like ‘package-input-rewriting’ that could
> help: we could define a ‘package-with-mpich’ procedure.

Now that “#:deep?” exists, does it make sense to implement this
“package-with-mpich” procedure? It could be helpful for HPC people,
couldn’t it?


All the best,
simon
Ludovic Courtès wrote on 16 Oct 2020 11:32
(name . zimoun)(address . zimon.toutoune@gmail.com)
87o8l28qjh.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> wrote:

> Reviving the openmpi->mpich topic of #39588 [1].
>
> 1: <http://issues.guix.gnu.org/issue/39588>

Thanks for staying on top of things! :-)

> On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo@gnu.org> wrote:
>
>>> Well, for these particular patches, the variants are ok.
>>> But we should think about how to ease the variant generation of all the chain.
>>
>> Well again there are things like ‘package-input-rewriting’ that could
>> help: we could define a ‘package-with-mpich’ procedure.
>
> Now that “#:deep?” exists, does it make sense to implement this
> “package-with-mpich” procedure? It could be helpful for HPC people,
> couldn’t it?

Or does ‘--with-input=openmpi=mpich’ fit the bill now?

Ludo’.
zimoun wrote on 16 Oct 2020 13:46
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ1hnYkzwJ88=k6OvJ3TdU9uU_+DPFC-AqBDCDuaJc1J_g@mail.gmail.com
Dear,

On Fri, 16 Oct 2020 at 11:32, Ludovic Courtès <ludo@gnu.org> wrote:

> Thanks for staying on top of things! :-)

I have been a bit upset by recent discussions on a French HPC mailing
list about "modulefiles are awesome"; well, I am sure they are applying
a variant of "prêcher le faux pour savoir le vrai" (asserting something
false in order to learn the truth) [1]. :-)



> > On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo@gnu.org> wrote:

> > Now that “#:deep?” exists, does it make sense to implement this
> > “package-with-mpich” procedure? It could be helpful for HPC people,
> > couldn’t it?
>
> Or does ‘--with-input=openmpi=mpich’ fit the bill now?

I have neither an MPI infrastructure nor enough CPU power at hand to
rebuild a lot and fully test it. If one of you can try on the right
infrastructure and report back, that would be awesome.

BTW, it should work only if MPICH does not require extra phases or
environment variables.


All the best,
simon
Maurice Brémond wrote on 19 Oct 2020 15:46
(name . zimoun)(address . zimon.toutoune@gmail.com)
87lfg2pbv7.fsf@inria.fr
Hello,

A build of mumps-openmpi with mpich fails:

guix time-machine -- build mumps-openmpi --with-input=openmpi=mpich

[...]
mpirun -n 3 ./test_scotch_dgraph_check data/bump.grf
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)


This is what Ludo reproduced:
From: Ludovic Courtès <ludo@gnu.org>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich
To: Maurice Brémond <Maurice.Bremond@inria.fr>
Cc: 39588@debbugs.gnu.org, zimoun <zimon.toutoune@gmail.com>
Date: Fri, 21 Feb 2020 12:32:44 +0100 (34 weeks, 3 days, 2 hours ago)

Hi,

I actually managed to reproduce it with a minimal test case (attached):

$ guix build -f mpich-test.scm
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivation will be built:
/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv
building /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv...
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
Backtrace:
1 (primitive-load "/gnu/store/iykxzg1n018sigd4c23kx1c4ngz?")
In guix/build/utils.scm:
652:6 0 (invoke _ . _)

guix/build/utils.scm:652:6: In procedure invoke:
Throw to key `srfi-34' with args `(#<condition &invoke-error [program: "mpiexec" arguments: ("-np" "2" "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init") exit-status: 15 term-signal: #f stop-signal: #f] 7ffff6022f40>)'.
builder for `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed with exit code 1
build of /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv failed
View build log at '/var/log/guix/drvs/rg/r7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv.bz2'.
guix build: error: build of `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed


The same program outside the container works just fine:

$ guix environment --ad-hoc mpich -- mpiexec -np 2 "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init"
np = 2, rank = 0
np = 2, rank = 1


‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.
Interestingly, ‘getaddrinfo’ fails in the build environment when passed
the flags that ‘MPL_get_sockaddr’ uses:

(computed-file "getaddrinfo"
  #~(pk #$output
        (getaddrinfo "localhost" #f
                     (logior AI_ADDRCONFIG AI_V4MAPPED)
                     AF_INET
                     SOCK_STREAM
                     IPPROTO_TCP)))

However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, it works.

Now we need to see why the ‘ai_family’ hint is causing trouble in
glibc, and perhaps in parallel try to work around it in MPICH…
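
For anyone who wants to poke at this from C rather than Guile, here is a minimal sketch of the same lookup. It is not MPICH's code: `fill_hints` and `lookup_status` are names made up for illustration, and the hint values are simply the ones from the Guile snippet above.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/* Hypothetical helper: fill HINTS with the given ai_flags.  When STRICT is
   nonzero, also request AF_INET, SOCK_STREAM and IPPROTO_TCP, as in the
   Guile snippet above; otherwise leave family, socket type and protocol
   unspecified.  */
void fill_hints(struct addrinfo *hints, int flags, int strict)
{
    memset(hints, 0, sizeof *hints);
    hints->ai_flags = flags;
    hints->ai_family = strict ? AF_INET : AF_UNSPEC;
    hints->ai_socktype = strict ? SOCK_STREAM : 0;
    hints->ai_protocol = strict ? IPPROTO_TCP : 0;
}

/* Hypothetical helper: resolve HOST with those hints and return the status
   of getaddrinfo, that is 0 on success or an EAI_* error code.  */
int lookup_status(const char *host, int flags, int strict)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    fill_hints(&hints, flags, strict);
    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);   /* only the status matters here */
    return rc;
}
```

Calling `lookup_status("localhost", AI_ADDRCONFIG | AI_V4MAPPED, 1)` and then the same call with the last argument set to 0, inside and outside the build container, should show whether the family, socket type and protocol hints are what makes the difference. A nonzero return value can be turned into a message with `gai_strerror`.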

Ludo’.

PS: I’ll be mostly away from keyboard in the coming days.

(use-modules (guix) (gnu))

(define code
  (plain-file "mpi.c" "
#include <assert.h>
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[]) {
  int err, np, rank;
  err = MPI_Init (&argc, &argv);
  assert (err == 0);
  err = MPI_Comm_size(MPI_COMM_WORLD, &np);
  assert (err == 0);
  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  assert (err == 0);
  printf (\"np = %i, rank = %i\\n\", np, rank);
  return 0;
} "))

(define toolchain (specification->package "gcc-toolchain"))
(define mpich (specification->package "mpich"))

(computed-file "mpi-init"
  (with-imported-modules '((guix build utils))
    #~(begin
        (use-modules (guix build utils))

        (setenv "PATH"
                (string-append #$(file-append toolchain "/bin") ":"
                               #$(file-append mpich "/bin")))
        (setenv "CPATH" #$(file-append mpich "/include"))
        (setenv "LIBRARY_PATH"
                (string-append #$(file-append mpich "/lib") ":"
                               #$(file-append toolchain "/lib")))
        (invoke "mpicc" "-o" #$output "-Wall" "-g"
                #$code)

        ;; Run the MPI code in the build environment.
        (invoke "mpiexec" "-np" "2" #$output))))

Note that it works with the raw mpich patch:
guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=add-mpich -- build mumps-openmpi --with-input=openmpi=mpich

I tried a build with the same hwloc as the embedded commit f7b08df258c2e7d04ca2035ddd55a1de91f806d4
(the HEAD used for hwloc in mpich) but the result is the same:

guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=test-mpich -- build mumps-openmpi --with-input=openmpi=mpich

(why two time-machine steps are needed is another question...)


Maurice
Ludovic Courtès wrote on 20 Oct 2020 22:55
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
87v9f4bowq.fsf@gnu.org
Hi Maurice,

Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

> A build of mumps-openmpi with mpich fails:
>
> guix time-machine -- build mumps-openmpi --with-input=openmpi=mpich

[...]

> MPID_nem_tcp_get_business_card(401):
> MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)

[...]

> ‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.
> Interestingly, ‘getaddrinfo’ fails in the build environment when passed
> the flags that ‘MPL_get_sockaddr’ uses:
>
> (computed-file "getaddrinfo"
> #~(pk #$output
> (getaddrinfo "localhost" #f
> (logior AI_ADDRCONFIG AI_V4MAPPED)
> AF_INET
> SOCK_STREAM
> IPPROTO_TCP)))
>
> However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, it works.
>
> Now we need to see why the ‘ai_family’ hint is causing trouble in
> glibc, and perhaps in parallel try to work around it in MPICH…

Oh thanks for the reminder, I have yet to take a closer look… hopefully
soon.

Ludo’.
zimoun wrote on 21 Oct 2020 16:43
(off-topic) double time-machine explanations
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
CAJ3okZ1a6yevGiMHoWux6qcQvsXbWKw6muQwdThTDZGU+iT3tQ@mail.gmail.com
Dear Maurice,

Thank you for the tests. Ouch! I will try to take a deeper look next
week... even if it may be beyond my skills. Well, v1.2 is
coming and it would be nice to have both MPI implementations. :-)

On Mon, 19 Oct 2020 at 15:46, Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

[...]

> Note that it is ok with the raw mpich patch
> guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=add-mpich -- build mumps-openmpi --with-input=openmpi=mpich
>
> I tried a build with the same hwloc as the embedded commit f7b08df258c2e7d04ca2035ddd55a1de91f806d4
> (the HEAD used for hwloc in mpich) but the result is the same:
>
> guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=test-mpich -- build mumps-openmpi --with-input=openmpi=mpich
>
> (the 2 steps time-machine needed is another question...)

The two "time-machine" invocations are needed because the repo
https://gitlab.inria.fr/bremond/guix.git lags far behind master, I
guess. You can cut that down to a single one by using a channel file,
something like:

(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "398ec3c1e265a3f89ed07987f33b264db82e4080"))
      (channel
        (name 'yours)
        (url "https://gitlab.inria.fr/bremond/guix.git")))

and then "guix time-machine -C channels.scm -- build ..."


But yeah, that's another story. :-)


All the best,
simon
Maurice Brémond wrote on 23 Oct 2020 10:41
(name . zimoun)(address . zimon.toutoune@gmail.com)
871rhp49mb.fsf@inria.fr
Hello Simon,

Thank you for the explanation, and sorry for the digression. I'm going
to read the manual more carefully...

Maurice
Maurice Brémond wrote on 23 Oct 2020 11:33
Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich
(name . Ludovic Courtès)(address . ludo@gnu.org)
87v9f12so9.fsf@inria.fr
Hello Ludovic,

Apparently at the mpich configuration level, using the experimental
device ch4 instead of ch3 solves the problem: just uncomment
"--with-device=ch4:ucx". Conversely, with mpich 3.4a2 (for which ch4 is
the default) setting --with-device=ch3 leads to the same failure as with
3.3.2.

I also checked the sock channel for ch3 (--with-device=ch3:sock), but then
on my laptop the scotch tests hang at

mpirun -n 3 ./test_scotch_dgraph_check data/bump.grf

For the moment, there is no stable 3.4 release of mpich yet. I tried the
latest 3.4b1, but a test failed...


Maurice
Ludovic Courtès wrote on 23 Oct 2020 17:26
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
873625m09f.fsf@gnu.org
Hi Maurice,

Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

> Apparently at the mpich configuration level, using the experimental
> device ch4 instead of ch3 solves the problem: just uncomment
> "--with-device=ch4:ucx". Conversely, with mpich 3.4a2 (for which ch4 is
> the default) setting --with-device=ch3 leads to the same failure as with
> 3.3.2.

Nice, we have a way forward.

With the patch below, I have successfully built:

guix build mumps-openmpi --with-input=openmpi=mpich

and I confirm that despite the name it depends exclusively on MPICH.
:-)

If that’s fine with you I’ll go ahead and commit it; let me know!

> I also checked the sock channel for ch3 (--with-device=ch3:sock), but then
> on my laptop the scotch tests hang at
>
> mpirun -n 3 ./test_scotch_dgraph_check data/bump.grf
>
> For the moment, there is no stable 3.4 release of mpich yet. I tried the
> latest 3.4b1, but a test failed...

We’ll see, but having a solution that works with 3.3 and is likely to
work with 3.4 is good.

I guess we should also check whether we’re obtaining the expected
performance. This builds fine too:

guix build intel-mpi-benchmarks --with-input=openmpi=mpich

Thank you!

Ludo’.
diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index 06a82cce95..9035147441 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -436,7 +436,12 @@ arrays) that expose a buffer interface.")
`(#:configure-flags
(list "--disable-silent-rules" ;let's see what's happening
"--enable-debuginfo"
- ;; "--with-device=ch4:ucx" ; --with-device=ch4:ofi segfaults in tests
+
+ ;; Default to "ch4", as will be the case in 3.4. It also works
+ ;; around issues when running test suites of packages that use
+ ;; MPICH: <https://issues.guix.gnu.org/39588#15>.
+ "--with-device=ch4:ucx" ; --with-device=ch4:ofi segfaults in tests
+
(string-append "--with-hwloc-prefix="
(assoc-ref %build-inputs "hwloc"))
Maurice Brémond wrote on 23 Oct 2020 19:04
(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
87h7qkub4p.fsf@inria.fr
> If that’s fine with you I’ll go ahead and commit it; let me know!
It's fine for me and for what I do with it.

Have a good weekend!
Ludovic Courtès wrote on 2 Nov 2020 15:02
(name . Maurice Brémond)(address . Maurice.Bremond@inria.fr)
87h7q7luvz.fsf@inria.fr
Hi,

Maurice Brémond <Maurice.Bremond@inria.fr> wrote:

>> If that’s fine with you I’ll go ahead and commit it; let me know!
> It's fine for me and for what I do with it.
>
> Have a good weekend!

Finally pushed as c73496f433044a76003b33c3855bb35ecd0df87f, thanks!

I’m closing this bug, let’s open a new one if we need to further discuss
MPI support in Guix.

Ludo’.
Closed