Allowing 'guix pull' to operate remotely

  • Open
Details
2 participants
  • Lars-Dominik Braun
  • Ludovic Courtès
Owner
unassigned
Submitted by
Lars-Dominik Braun
Severity
wishlist
Lars-Dominik Braun wrote on 5 Mar 2020 14:33
`guix pull` failure in multi-machine setup
(address . bug-guix@gnu.org)
20200305133318.GB2909@zpidnp36
Hi,

I’m using guix on a multi-machine setup with a single remote guix-daemon that
can be reached via SSH. Thus GUIX_DAEMON_SOCKET=ssh://master.<domain> on the
compute nodes. Running `guix pull` on master works fine (the variable is not
set here), but it does not on a compute node. Instead it fails with this error:

---snip---
Backtrace:
1 (primitive-load "/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation")
In ice-9/eval.scm:
293:34 0 (_ #(#(#(#(#(#(#(#(#(#(#(#(#<directory (guile-user) 7f19dd213140> (?)) #) # ?) ?) ?) ?) ?) ?) ?) ?) ?) ?))

ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "/var/guix/daemon-socket/socket" errno: 111] 7f19dba3a090>)'.
guix pull: error: You found a bug: the program '/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation'
failed to compute the derivation for Guix (version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; system: "x86_64-linux";
host version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; pull-version: 1).
Please report it by email to <bug-guix@gnu.org>.
---snap---

Obviously the socket on that compute machine is not working, because it’s on an
NFS share /var/guix belonging to master. But why is the socket considered in
the first place?

Cheers,
Lars
Ludovic Courtès wrote on 5 Mar 2020 18:20
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)(address . 39925@debbugs.gnu.org)
87ftem7m6d.fsf@gnu.org
Hi,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

> I’m using guix on a multi-machine setup with a single remote guix-daemon that
> can be reached via SSH. Thus GUIX_DAEMON_SOCKET=ssh://master.<domain> on the
> compute nodes. Running `guix pull` on master works fine (the variable is not
> set here), but it does not on a compute node. Instead it fails with this error:
>
> ---snip---
> Backtrace:
> 1 (primitive-load "/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation")
> In ice-9/eval.scm:
> 293:34 0 (_ #(#(#(#(#(#(#(#(#(#(#(#(#<directory (guile-user) 7f19dd213140> (?)) #) # ?) ?) ?) ?) ?) ?) ?) ?) ?) ?))
>
> ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "/var/guix/daemon-socket/socket" errno: 111] 7f19dba3a090>)'.
> guix pull: error: You found a bug: the program '/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation'
> failed to compute the derivation for Guix (version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; system: "x86_64-linux";
> host version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; pull-version: 1).
> Please report it by email to <bug-guix@gnu.org>.
> ---snap---
>
> Obviously the socket on that compute machine is not working, because it’s on an
> NFS share /var/guix belonging to master. But why is the socket considered in
> the first place?

This is a limitation in ‘build-aux/build-self.scm’:

  ;; Use the port beneath the current store as the stdin of BUILD. This
  ;; way, we know 'open-pipe*' will not close it on 'exec'. If PORT is
  ;; not a file port (e.g., it's an SSH channel), then the subprocess's
  ;; stdin will actually be /dev/null.
  (let* ((pipe (with-input-from-port port
                 (lambda ()
                   ;; …
                   (if (file-port? port)        ;<- here
                       (number->string
                        (logior major minor))
                       "none"))))))
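
For illustration only (not from the thread): a minimal shell sketch of why the subprocess ends up at the local socket. It assumes a client falls back to /var/guix/daemon-socket/socket, the path in the error above, whenever GUIX_DAEMON_SOCKET is not visible to it. `daemon_endpoint` is a hypothetical helper, not a Guix command, and master.example.org is a placeholder host name.

```shell
#!/bin/sh
# Hypothetical helper, not part of Guix: print the daemon endpoint a client
# would pick. The fallback path is the one from the error message above.
daemon_endpoint() {
    # Use GUIX_DAEMON_SOCKET when set and non-empty, else the local socket.
    printf '%s\n' "${GUIX_DAEMON_SOCKET:-/var/guix/daemon-socket/socket}"
}

# Variable visible, as in the shell on a compute node:
( GUIX_DAEMON_SOCKET=ssh://master.example.org; daemon_endpoint )

# Variable not visible, as assumed here for the subprocess:
( unset GUIX_DAEMON_SOCKET; daemon_endpoint )
```

On a node whose /var/guix is an NFS share, the second endpoint is a socket file with no daemon listening behind it locally, which would match the errno 111 (connection refused) in the report.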

We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
variable through, along these lines:
diff --git a/build-aux/build-self.scm b/build-aux/build-self.scm
index f2e785b7f1..18a78b5f41 100644
--- a/build-aux/build-self.scm
+++ b/build-aux/build-self.scm
@@ -400,6 +400,7 @@ files."
                                    #:pull-version pull-version))
                    (system (if system (return system) (current-system)))
                    (home -> (getenv "HOME"))
+                   (daemon-socket -> (getenv "GUIX_DAEMON_SOCKET"))

                    ;; Note: Use the deprecated names here because the
                    ;; caller might be Guix <= 0.16.0.
@@ -424,6 +425,8 @@ files."
                      (when home
                        ;; Inherit HOME so that 'xdg-directory' works.
                        (setenv "HOME" home))
+                     (when (and (not (file-port? port) daemon-socket))
+                       (setenv "GUIX_DAEMON_SOCKET" daemon-socket))
                      (open-pipe* OPEN_READ
                                  (derivation->output-path build)
                                  source system version
It’s a bit hacky though, and won’t work with old Guix revisions anyway.

However, for your use case, you could perhaps simply pull on one machine
and use ‘guix copy’ to send Guix elsewhere? Or even explicitly run
‘guix pull’ on each node?

Thanks,
Ludo’.
Lars-Dominik Braun wrote on 6 Mar 2020 08:40
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 39925@debbugs.gnu.org)
20200306074018.GC2909@zpidnp36
Hi Ludo,

> This is a limitation in ‘build-aux/build-self.scm’: […]
I don’t understand what’s going on there unfortunately. Is there a high-level
explanation somewhere in the manual?

> We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
> variable through, along these lines:
Nope, that does not seem to be enough. After pulling on master, doing the same
on a node (with a patched guix) yields:

---snip---
ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "ssh://master.<domain>" errno: 95] 7f0f325f77b0>)'.
---snap---

Any ideas?

> + (when (and (not (file-port? port) daemon-socket))
I assume this should read as follows, with ‘not’ closed right after ‘port’:
(when (and (not (file-port? port)) daemon-socket)

> […] and won’t work with old Guix revisions anyway.
That means `guix time-machine` could not go back beyond a commit that fixes the
issue, correct? Not a concern for me.

> However, for your use case, you could perhaps simply pull on one machine
> and use ‘guix copy’ to send Guix elsewhere?
The store is the same on all machines, since /gnu/store, /var/guix and /home
are all shared via NFS. As far as I understand the manual `guix copy` would be
useful for store to store transfers on different machines only.

Lars
Ludovic Courtès wrote on 6 Mar 2020 11:53
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)(address . 39925@debbugs.gnu.org)
87wo7xoiuj.fsf@gnu.org
Hello,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> This is a limitation in ‘build-aux/build-self.scm’: […]
> I don’t understand what’s going on there unfortunately. Is there a high-level
> explanation somewhere in the manual?
>
>> We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
>> variable through, along these lines:
> Nope, that does not seem to be enough. After pulling on master doing the same
> on a node (with a patched guix) yields:
>
> ---snip---
> ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "ssh://master.<domain>" errno: 95] 7f0f325f77b0>)'.
> ---snap---
>
> Any ideas?

Sounds like this ssh URI is not valid on the nodes, is that right?

>> + (when (and (not (file-port? port) daemon-socket))
> I assume this should read as follows, with ‘not’ closed right after ‘port’:
> (when (and (not (file-port? port)) daemon-socket)
>
>> […] and won’t work with old Guix revisions anyway.
> That means `guix time-machine` could not go back beyond a commit that fixes the
> issue, correct? Not a concern for me.

Correct.

>> However, for your use case, you could perhaps simply pull on one machine
>> and use ‘guix copy’ to send Guix elsewhere?
> The store is the same on all machines, since /gnu/store, /var/guix and /home
> are all shared via NFS. As far as I understand the manual `guix copy` would be
> useful for store to store transfers on different machines only.

Right. So perhaps I don’t quite understand the use case. What about
simply pulling from one of these machines, if everything is shared over
NFS?

HTH,
Ludo’.
Lars-Dominik Braun wrote on 6 Mar 2020 12:45
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 39925@debbugs.gnu.org)
20200306114548.GD2909@zpidnp36
Hi,

> Sounds like this ssh URI is not valid on the nodes, is that right?
I would consider it valid, since `ssh master.<domain>` and `guix build
<package>` both work just fine from the nodes. It’s just `guix pull`, which is
causing issues.

> Right. So perhaps I don’t quite understand the use case. What about
> simply pulling from one of these machines, if everything is shared over
> NFS?
Sure, that’s an option, but anyone who tries will get a strange error message.
And it breaks the appeal of having a remote guix daemon in the first place,
that is being able to run `guix <whatever>` on any machine I log into. If that
is not the case (i.e. not for `guix pull`) it would be more consistent to ask
users to SSH into a different machine every time they interact with guix. Does
that explain my use case?

Lars
Ludovic Courtès wrote on 8 Mar 2020 12:40
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)(address . 39925@debbugs.gnu.org)
87zhcraxce.fsf@gnu.org
Hi,

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> Sounds like this ssh URI is not valid on the nodes, is that right?
> I would consider it valid, since `ssh master.<domain>` and `guix build
> <package>` both work just fine from the nodes. It’s just `guix pull`, which is
> causing issues.

Oh it may be that we would also need to let ‘HOME’ through, so that
~/.ssh/config is found, for example. That could have undesirable side
effects that are best avoided, though (e.g., ~/.cache/guile would become
visible.)

>> Right. So perhaps I don’t quite understand the use case. What about
>> simply pulling from one of these machines, if everything is shared over
>> NFS?
> Sure, that’s an option, but anyone who tries will get a strange error message.

I agree that the error message is sub-optimal. Not sure how to improve
on it (how can ‘build-self.scm’ know that it’s failing because of
that?).

> And it breaks the appeal of having a remote guix daemon in the first place,
> that is being able to run `guix <whatever>` on any machine I log into. If that
> is not the case (i.e. not for `guix pull`) it would be more consistent to ask
> users to SSH into a different machine every time they interact with guix. Does
> that explain my use case?

Instead of:

  GUIX_DAEMON_SOCKET=ssh://host guix pull

you could run:

  ssh host guix pull

In fact, the former would probably not work because ‘guix pull’ modifies
the local /var/guix/profiles, not the one on the host that runs the
daemon.

So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
powerful as you thought. :-) It’s really just a way to talk to a remote
daemon, but ‘guix pull’, ‘guix package’, etc. also need to access
/var/guix/profiles.
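
The substitution suggested above can be sketched as a small wrapper (illustration only, not from the thread). `pull_command` is a hypothetical name; it only prints the command it would run, and it assumes URIs of the form ssh://[user@]host[:port], ignoring IPv6 literals. The host names are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of Guix: print the command that would run
# 'guix pull' on the daemon's host when GUIX_DAEMON_SOCKET is an ssh:// URI,
# and plain 'guix pull' otherwise.
pull_command() {
    case "${GUIX_DAEMON_SOCKET:-}" in
        ssh://*)
            # Strip the scheme, then anything after the first '/'.
            target=${GUIX_DAEMON_SOCKET#ssh://}
            target=${target%%/*}
            case "$target" in
                # host:port form: pass the port to ssh with -p.
                *:*) printf 'ssh -p %s %s guix pull\n' \
                         "${target##*:}" "${target%:*}" ;;
                *)   printf 'ssh %s guix pull\n' "$target" ;;
            esac
            ;;
        *)
            printf 'guix pull\n'
            ;;
    esac
}

( GUIX_DAEMON_SOCKET=ssh://master.example.org; pull_command )
```

Printing instead of executing keeps the sketch free of side effects; a real wrapper would also have to decide what to do with the remaining ‘guix pull’ arguments.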

Thanks,
Ludo’.
Lars-Dominik Braun wrote on 9 Mar 2020 09:22
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 39925@debbugs.gnu.org)
20200309082253.GA2917@zpidnp36
Hi Ludo,

> Oh it may be that we would also need to let ‘HOME’ through, so that
> ~/.ssh/config is found, for example. That could have undesirable side
> effects that are best avoided, though (e.g., ~/.cache/guile would become
> visible.)
Shouldn’t be a problem since ~/.ssh/config does not exist for that user and
known hosts are globally declared in /etc/ssh/ssh_known_hosts (strace indicates
that guile-ssh/libssh reads that file).

> I agree that the error message is sub-optimal. Not sure how to improve
> on it (how can ‘build-self.scm’ know that it’s failing because of
> that?).
If I stop the daemon and run `guix pull`, it just says “guix pull: error: failed to
connect to `/var/guix/daemon-socket/socket': Connection refused”. Something
similar should do. I don’t know whether that’s possible though.

> You could run:
> ssh host guix pull
Sure, that’s the only workaround I can think of right now.

> In fact, the former would probably not work because ‘guix pull’ modifies
> the local /var/guix/profiles, not the one on the host that runs the
> daemon.
Yes, /var/guix is shared via NFS too. Otherwise roaming between machines
wouldn’t work at all.

> So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
> powerful as you thought. :-)
It is, it’s just a bug we have to fix :) Can I help you debug this somehow,
i.e. figure out where exactly the error message is coming from?

Cheers,
Lars
Ludovic Courtès wrote on 9 Mar 2020 11:46
(name . Lars-Dominik Braun)(address . ldb@leibniz-psychology.org)(address . 39925@debbugs.gnu.org)
87fteh3iwv.fsf@gnu.org
Hi!

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> In fact, the former would probably not work because ‘guix pull’ modifies
>> the local /var/guix/profiles, not the one on the host that runs the
>> daemon.
> Yes, /var/guix is shared via NFS too. Otherwise roaming between machines
> wouldn’t work at all.
>
>> So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
>> powerful as you thought. :-)
> It is, it’s just a bug we have to fix :) Can I help you debug this somehow,
> i.e. figure out where exactly the error message is coming from?

Well, I think you’re really asking for a new feature; we need more than
just talking to a remote daemon.

Updating profiles, as ‘guix package’ and ‘guix pull’ do, involves two
things:

1. building the profile—this is done by talking to the daemon;

2. modifying things in /var/guix/profiles & co.

GUIX_DAEMON_SOCKET addresses #1 but not #2.

For #2, we would need to do something like Jakub did in (guix scripts
system reconfigure), where the effectful bits can be transparently
evaluated either locally or remotely.

But really, that’d be a brand new feature, so I’m marking it as a
wishlist if you don’t mind. :-)

Thanks,
Ludo’.
Ludovic Courtès wrote on 9 Mar 2020 11:46
control message for bug #39925
(address . control@debbugs.gnu.org)
87eeu13iw3.fsf@gnu.org
retitle 39925 Allowing 'guix pull' to operate remotely
quit
Ludovic Courtès wrote on 9 Mar 2020 11:47
(address . control@debbugs.gnu.org)
87d09l3ivu.fsf@gnu.org
severity 39925 wishlist
quit
Lars-Dominik Braun wrote on 10 Mar 2020 08:19
Re: bug#39925: `guix pull` failure in multi-machine setup
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 39925@debbugs.gnu.org)
20200310071901.GB2917@zpidnp36
Hey Ludo,

> For #2, we would need to do something like Jakub did in (guix scripts
> system reconfigure), where the effectful bits can be transparently
> evaluated either locally or remotely.
I don’t understand why #2 needs different mechanics. As I said, /var/guix is
mounted r/w on every machine and in fact `guix package -i` is working as
intended.
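
For illustration only (not from the thread): a minimal sketch of how one might check, from a node, the precondition for step #2, namely that the profile directory is writable there. `profiles_writable` is a hypothetical helper, and the per-user path under /var/guix/profiles is an assumption about the local layout. Over NFS, a permission test like this is only a hint; the server has the final say.

```shell
#!/bin/sh
# Hypothetical check, not part of Guix: succeed when DIR is an existing
# directory that the current user has write permission on.
profiles_writable() {
    dir=$1
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Assumed per-user profile location; adjust to the local setup.
profile_dir="/var/guix/profiles/per-user/${USER:-$(id -un)}"

if profiles_writable "$profile_dir"; then
    echo "writable: $profile_dir"
else
    echo "not writable: $profile_dir"
fi
```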

Maybe we’ve got a communication issue here and we’re talking about two
different things?

Lars