Hi,

Mathieu Othacehe wrote:

>> Oh! That indicates that it’s failing to offload to one of the
>> ‘localhost’ build machines specified in /etc/guix/machines.scm.
>> Normally there’s an SSH tunnel set up for those, but I guess it broke.
>>
>> Perhaps we can update /etc/guix/machines.scm to refer to armhf-linux
>> machines by their WireGuard IP?
>
> Seems like the right thing to do. This bit is also an unstaged change in
> the berlin maintenance repository, we should commit it. Tobias, could
> you have a look :) ?
>
> +(define powerpc64le
> +  (list
> +   ;; A VM donated/hosted by OSUOSL & administered by nckx.
> +   ;; XXX: SSH tunnel via overdrive1:
> +   ;; ssh -L 2224:p9.tobias.gr:22 hydra@10.0.0.3
> +   #;(build-machine
> +     ;;(name "p9.tobias.gr")
> +     (name "localhost")
> +     (port 2224)
> +     (user "hydra")
> +     (systems '("powerpc64le-linux"))
> +     (host-key "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJEbRxJ6WqnNLYEMNDUKFcdMtyZ9V/6oEfBFSHY8xE6A nckx"))))

IIRC this machine is now running WireGuard, Tobias? If so, could you
change this to refer to its WireGuard IP and commit it?

> I also found that other machines were unreachable and commented them:
>
>   ;; CPU: 16 ARM Cortex-A72 cores
>   ;; RAM: 32 GB
> - (list (build-machine
> + (list #;(build-machine
>         ;;kreuzberg
>         (name "10.0.0.9")
>         (user "hydra")

Ricardo, could you check what’s wrong with kreuzberg?

> @@ -243,13 +256,13 @@
>   ;; BeagleBoard X15 kindly hosted by Simon Josefsson.
>   ;; CPU: Cortex A15 (2 cores)
>   ;; RAM: 2 GB
> - (build-machine
> + #;(build-machine
>     (name "10.0.0.5") ;guix-x15
>     (user "hydra")
>     (systems '("armhf-linux"))
>     (host-key "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOfXjwCAFWeGiUoOVXEgtIeXxbtymjOTg7ph1ObMAcJ0 root@beaglebone"))
>
> - (build-machine
> + #;(build-machine
>     (name "10.0.0.6") ;guix-x15b
>     (user "hydra")
>     (systems '("armhf-linux"))

Oops. Note that it’s not necessary to comment them all out. As long as
at least one machine is available for a given system type, we’re fine:
‘guix offload’ will pick it up.

> Nevertheless we are hitting an offload issue here, maybe an occurrence
> of #24496. The offload mechanism should timeout when a machine is
> unreachable instead of retrying over and over, causing all evaluation
> processes to hang.

Yes, though the problem here is that some architectures were left with
zero machines IIRC, so it would have failed one way or another.

Thanks!

Ludo’.