To: Maurice Brémond <Maurice.Bremond@inria.fr>
Cc: 39588@debbugs.gnu.org, zimoun <zimon.toutoune@gmail.com>
Hi,
I actually managed to reproduce it with a minimal test case (attached):
$ guix build -f mpich-test.scm
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivation will be built:
/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv
building /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv...
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
Backtrace:
1 (primitive-load "/gnu/store/iykxzg1n018sigd4c23kx1c4ngz?")
In guix/build/utils.scm:
652:6 0 (invoke _ . _)
guix/build/utils.scm:652:6: In procedure invoke:
Throw to key `srfi-34' with args `(#<condition &invoke-error [program: "mpiexec" arguments: ("-np" "2" "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init") exit-status: 15 term-signal: #f stop-signal: #f] 7ffff6022f40>)'.
builder for `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed with exit code 1
build of /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv failed
View build log at '/var/log/guix/drvs/rg/r7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv.bz2'.
guix build: error: build of `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed
The same program outside the container works just fine:
$ guix environment --ad-hoc mpich -- mpiexec -np 2 "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init"
np = 2, rank = 0
np = 2, rank = 1
‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.
Interestingly, ‘getaddrinfo’ fails in the build environment when passed
the flags that ‘MPL_get_sockaddr’ uses:
(computed-file "getaddrinfo"
  #~(pk #$output
        (getaddrinfo "localhost" #f
                     (logior AI_ADDRCONFIG AI_V4MAPPED)
                     AF_INET
                     SOCK_STREAM
                     IPPROTO_TCP)))
However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, it works.
Now we need to see why the ‘ai_family’ hint is causing trouble in
glibc, and perhaps in parallel try to work around it in MPICH…
Ludo’.
PS: I’ll be mostly away from keyboard in the coming days.
(use-modules (guix) (gnu))
(define code
  (plain-file "mpi.c" "
#include <assert.h>
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[]) {
  int err, np, rank;
  err = MPI_Init (&argc, &argv);
  assert (err == 0);
  err = MPI_Comm_size(MPI_COMM_WORLD, &np);
  assert (err == 0);
  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  assert (err == 0);
  printf (\"np = %i, rank = %i\\n\", np, rank);
  return 0;
} "))
(define toolchain (specification->package "gcc-toolchain"))
(define mpich (specification->package "mpich"))

(computed-file "mpi-init"
  (with-imported-modules '((guix build utils))
    #~(begin
        (use-modules (guix build utils))

        (setenv "PATH"
                (string-append #$(file-append toolchain "/bin") ":"
                               #$(file-append mpich "/bin")))
        (setenv "CPATH" #$(file-append mpich "/include"))
        (setenv "LIBRARY_PATH"
                (string-append #$(file-append mpich "/lib") ":"
                               #$(file-append toolchain "/lib")))

        (invoke "mpicc" "-o" #$output "-Wall" "-g"
                #$code)

        ;; Run the MPI code in the build environment.
        (invoke "mpiexec" "-np" "2" #$output))))