“guix copy” to host with daemon listening on TCP fails

  • Done
Details
4 participants
  • Bone Baboon
  • Ludovic Courtès
  • Ricardo Wurmus
  • Simon Streit
Owner
unassigned
Submitted by
Ricardo Wurmus
Severity
normal
Ricardo Wurmus wrote on 5 May 2021 18:04
“guix copy” to host with daemon listening on TCP fails
(address . bug-guix@gnu.org)
87a6p9rwu7.fsf@elephly.net
There are two hosts running Guix. The target host runs
“guix-daemon” with “--listen=0.0.0.0:9999”; it does not listen on
a local socket file. Trying to copy store items to the target
host fails with this backtrace:

[me@here:~] (1028) $ guix copy --to=there /gnu/store/…-profile
Backtrace:
12 (primitive-load
"/gnu/store/9qjkzhlwj2792iczsyfx9n7c23g…")
In guix/ui.scm:
2165:12 11 (run-guix-command _ . _)
In ice-9/boot-9.scm:
1736:10 10 (with-exception-handler _ _ #:unwind? _ # _)
1731:15 9 (with-exception-handler #<procedure 7fa3ef9d0150 at
ic…> …)
1736:10 8 (with-exception-handler _ _ #:unwind? _ # _)
In guix/store.scm:
636:37 7 (thunk)
1305:8 6 (call-with-build-handler _ _)
1305:8 5 (call-with-build-handler #<procedure 7fa3ef9db5d0 at
g…> …)
In guix/status.scm:
799:4 4 (call-with-status-report _ _)
In guix/scripts/copy.scm:
76:25 3 (_)
In guix/ssh.scm:
485:39 2 (send-files #<store-connection 256.99 7fa3ef9d6f00> _
#f …)
In ice-9/boot-9.scm:
1669:16 1 (raise-exception _ #:continuable? _)
1669:16 0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure struct-vtable: Wrong type argument in position 1
(expecting struct): #f

The (guix ssh) module appears to assume that the remote daemon listens on
a socket file. Telling the daemon to also listen on a socket file
works around this problem.

--
Ricardo
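
A sketch of the workaround described above, assuming the target runs Guix System and configures the daemon through guix-service-type (the report does not say how the daemon is started). On other setups the same two --listen options would go on the guix-daemon command line. The socket path is the default that (guix ssh) expects; the rest is illustrative:

```scheme
;; Fragment of an operating-system declaration, not a standalone program.
;; Assumption: the target uses %base-services from (gnu services base).
(services
 (modify-services %base-services
   (guix-service-type config =>
     (guix-configuration
      (inherit config)
      ;; '--listen' may be repeated; the daemon then accepts connections
      ;; on every listed endpoint, here TCP plus the default socket file.
      (extra-options '("--listen=0.0.0.0:9999"
                       "--listen=/var/guix/daemon-socket/socket"))))))
```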
Ludovic Courtès wrote on 5 May 2021 23:32
[PATCH 2/4] ssh: 'connect-to-remote-daemon' raises a nicer message upon error.
(address . 48240@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210505213205.28519-2-ludo@gnu.org
* guix/ssh.scm (connect-to-remote-daemon): Catch
'store-connection-error?' and rethrow.
---
guix/ssh.scm | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/guix/ssh.scm b/guix/ssh.scm
index 457d1890f9..b39b90f733 100644
--- a/guix/ssh.scm
+++ b/guix/ssh.scm
@@ -302,8 +302,13 @@ EXP never returns or calls 'primitive-exit' when it's done."
"/var/guix/daemon-socket/socket"))
"Connect to the remote build daemon listening on SOCKET-NAME over SESSION,
an SSH session. Return a <store-connection> object."
- (open-connection #:port (remote-daemon-channel session socket-name)))
-
+ (guard (c ((store-connection-error? c)
+ ;; Raise a more focused error condition.
+ (raise (formatted-message
+ (G_ "failed to connect over SSH to daemon at '~a', socket ~a")
+ (session-get session 'host)
+ socket-name))))
+ (open-connection #:port (remote-daemon-channel session socket-name))))
(define (store-import-channel session)
"Return an output port to which archives to be exported to SESSION's store
--
2.31.1
Ludovic Courtès wrote on 5 May 2021 23:32
[PATCH 1/4] store: 'open-connection' never returns #f.
(address . 48240@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210505213205.28519-1-ludo@gnu.org
* guix/store.scm (open-connection)[handshake-error]: New procedure.
Call it in code paths that would previously return #f.
---
guix/store.scm | 66 +++++++++++++++++++++++++++-----------------------
1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index 37ae6cfedd..315ae4cdce 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -548,13 +548,16 @@ space on the file system so that the garbage collector can still operate,
should the disk become full. When CPU-AFFINITY is true, it must be an integer
corresponding to an OS-level CPU number to which the daemon's worker process
for this connection will be pinned. Return a server object."
+ (define (handshake-error)
+ (raise (condition
+ (&store-connection-error (file (or port uri))
+ (errno EPROTO))
+ (&message (message "build daemon handshake failed")))))
+
(guard (c ((nar-error? c)
;; One of the 'write-' or 'read-' calls below failed, but this is
;; really a connection error.
- (raise (condition
- (&store-connection-error (file (or port uri))
- (errno EPROTO))
- (&message (message "build daemon handshake failed"))))))
+ (handshake-error)))
(let*-values (((port)
(or port (connect-to-daemon uri)))
((output flush)
@@ -562,32 +565,35 @@ for this connection will be pinned. Return a server object."
(make-bytevector 8192))))
(write-int %worker-magic-1 port)
(let ((r (read-int port)))
- (and (= r %worker-magic-2)
- (let ((v (read-int port)))
- (and (= (protocol-major %protocol-version)
- (protocol-major v))
- (begin
- (write-int %protocol-version port)
- (when (>= (protocol-minor v) 14)
- (write-int (if cpu-affinity 1 0) port)
- (when cpu-affinity
- (write-int cpu-affinity port)))
- (when (>= (protocol-minor v) 11)
- (write-int (if reserve-space? 1 0) port))
- (letrec* ((built-in-builders
- (delay (%built-in-builders conn)))
- (conn
- (%make-store-connection port
- (protocol-major v)
- (protocol-minor v)
- output flush
- (make-hash-table 100)
- (make-hash-table 100)
- vlist-null
- built-in-builders)))
- (let loop ((done? (process-stderr conn)))
- (or done? (process-stderr conn)))
- conn)))))))))
+ (unless (= r %worker-magic-2)
+ (handshake-error))
+
+ (let ((v (read-int port)))
+ (unless (= (protocol-major %protocol-version)
+ (protocol-major v))
+ (handshake-error))
+
+ (write-int %protocol-version port)
+ (when (>= (protocol-minor v) 14)
+ (write-int (if cpu-affinity 1 0) port)
+ (when cpu-affinity
+ (write-int cpu-affinity port)))
+ (when (>= (protocol-minor v) 11)
+ (write-int (if reserve-space? 1 0) port))
+ (letrec* ((built-in-builders
+ (delay (%built-in-builders conn)))
+ (conn
+ (%make-store-connection port
+ (protocol-major v)
+ (protocol-minor v)
+ output flush
+ (make-hash-table 100)
+ (make-hash-table 100)
+ vlist-null
+ built-in-builders)))
+ (let loop ((done? (process-stderr conn)))
+ (or done? (process-stderr conn)))
+ conn))))))
(define* (port->connection port
#:key (version %protocol-version))
--
2.31.1
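
The change in this patch can be illustrated outside of Guix. The following is a minimal sketch in plain Guile, not the actual (guix store) code: the names %expected-magic, &handshake-error, open/old, open/new and try-open are hypothetical. It contrasts the old 'and'-based style, which silently yields #f and leads to errors such as the "struct-vtable ... #f" one in the report, with raising a condition at the point of failure:

```scheme
(use-modules (srfi srfi-34)   ; 'guard' and 'raise'
             (srfi srfi-35))  ; condition types

;; Hypothetical condition type standing in for &store-connection-error.
(define-condition-type &handshake-error &error
  handshake-error?
  (reason handshake-error-reason))

;; Hypothetical stand-in for the daemon's expected reply.
(define %expected-magic 42)

;; Old style: a failed check makes the whole expression evaluate to #f,
;; which the caller may later mistake for a connection object.
(define (open/old reply)
  (and (= reply %expected-magic)
       'connection))

;; New style: a failed check raises right away; the procedure never
;; returns #f.
(define (open/new reply)
  (unless (= reply %expected-magic)
    (raise (condition
            (&handshake-error (reason "unexpected magic number")))))
  'connection)

;; Call PROC on REPLY, turning a handshake error into a tagged list.
(define (try-open proc reply)
  (guard (c ((handshake-error? c)
             (list 'error (handshake-error-reason c))))
    (proc reply)))
```

With these definitions, (try-open open/old 0) evaluates to #f, whereas (try-open open/new 0) yields a list describing the failure, so the caller can report it where it happened.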
Ludovic Courtès wrote on 5 May 2021 23:32
[PATCH 3/4] store: Export 'connect-to-daemon'.
(address . 48240@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210505213205.28519-3-ludo@gnu.org
* guix/store.scm (connect-to-daemon): Make public. Improve docstring.
---
guix/store.scm | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/guix/store.scm b/guix/store.scm
index 315ae4cdce..9d706ae590 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -90,6 +90,7 @@
hash-algo
build-mode
+ connect-to-daemon
open-connection
port->connection
close-connection
@@ -501,7 +502,10 @@
(define (connect-to-daemon uri)
"Connect to the daemon at URI, a string that may be an actual URI or a file
-name."
+name, and return an input/output port.
+
+This is a low-level procedure that does not perform the initial handshake with
+the daemon. Use 'open-connection' for that."
(define (not-supported)
(raise (condition (&store-connection-error
(file uri)
--
2.31.1
Ludovic Courtès wrote on 5 May 2021 23:32
[PATCH 4/4] ssh: Honor GUIX_DAEMON_SOCKET on the target machine.
(address . 48240@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210505213205.28519-4-ludo@gnu.org
Reported by Ricardo Wurmus <rekado@elephly.net>.

* guix/ssh.scm (remote-daemon-channel)[redirect]: Define
'connect-to-daemon'. Use the same-named procedure from (guix store)
when available, and honor GUIX_DAEMON_SOCKET.
---
guix/ssh.scm | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/guix/ssh.scm b/guix/ssh.scm
index b39b90f733..77a9732ce5 100644
--- a/guix/ssh.scm
+++ b/guix/ssh.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -253,7 +253,22 @@ EXP never returns or calls 'primitive-exit' when it's done."
(use-modules (ice-9 match) (rnrs io ports)
(rnrs bytevectors))
- (let ((sock (socket AF_UNIX SOCK_STREAM 0))
+ (define connect-to-daemon
+ ;; XXX: 'connect-to-daemon' used to be private and before that it
+ ;; didn't even exist, hence these shenanigans.
+ (let ((connect-to-daemon
+ (false-if-exception (module-ref (resolve-module '(guix store))
+ 'connect-to-daemon))))
+ (lambda (uri)
+ (if connect-to-daemon
+ (connect-to-daemon uri)
+ (let ((sock (socket AF_UNIX SOCK_STREAM 0)))
+ (connect sock AF_UNIX ,socket-name)
+ sock)))))
+
+ ;; Use 'connect-to-daemon' to honor GUIX_DAEMON_SOCKET.
+ (let ((sock (connect-to-daemon (or (getenv "GUIX_DAEMON_SOCKET")
+ socket-name)))
(stdin (current-input-port))
(stdout (current-output-port))
(select* (lambda (read write except)
@@ -272,8 +287,6 @@ EXP never returns or calls 'primitive-exit' when it's done."
(setvbuf stdin 'block 65536)
(setvbuf sock 'block 65536)
- (connect sock AF_UNIX ,socket-name)
-
(let loop ()
(match (select* (list stdin sock) '() '())
((reads () ())
--
2.31.1
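
The two idioms this patch relies on can be tried in isolation. This is a sketch in plain Guile under stated assumptions: optional-binding and daemon-endpoint are hypothetical helper names, and (srfi srfi-1) merely stands in for (guix store) so that the snippet runs without Guix installed. The first helper looks up a binding that may not exist in an older module; the second prefers GUIX_DAEMON_SOCKET over a default endpoint:

```scheme
;; Return the value bound to SYMBOL in the module named MODULE-NAME, or #f
;; when the lookup fails (for instance because an older version of the
;; module lacks that binding).
(define (optional-binding module-name symbol)
  (false-if-exception
   (module-ref (resolve-module module-name) symbol)))

;; Return the endpoint to connect to: the value of GUIX_DAEMON_SOCKET when
;; that variable is set, DEFAULT otherwise.
(define (daemon-endpoint default)
  (or (getenv "GUIX_DAEMON_SOCKET") default))

;; Demonstration with a module that ships with Guile.
(define fold* (optional-binding '(srfi srfi-1) 'fold))
(define nothing (optional-binding '(srfi srfi-1) 'no-such-procedure))
```

Here fold* is a procedure and nothing is #f. In the patch the same lookup runs on the target machine, which is why it has to cope with a (guix store) that predates 'connect-to-daemon'.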
Ludovic Courtès wrote on 8 May 2021 15:10
Re: bug#48240: “guix copy” to host with daemon listening on TCP fails
(name . Ricardo Wurmus)(address . rekado@elephly.net)(address . 48240-done@debbugs.gnu.org)
875yzt8j74.fsf@gnu.org
Hi,

Ricardo Wurmus <rekado@elephly.net> skribis:

> There are two hosts running Guix. The target host runs
> “guix-daemon” with “--listen=0.0.0.0:9999”; it does not listen on
> a local socket file. Trying to copy store items to the target
> host fails with this backtrace:

I pushed the four patches as 3270308eebe82075d2f02517c5a2b1599928495c.

Let me know if anything’s amiss!

Thanks,
Ludo’.
Closed
Simon Streit wrote on 11 May 2021 10:43
“guix copy” to host with daemon listening on TCP fails
(address . 48240@debbugs.gnu.org)
yguv97pslsh.fsf@netpanic.org
Hello!

After reinstalling my system last night, I ran into this problem too:
I couldn't offload.

Then it was suggested I checkout to commit
dd14678b9b9843be20e2bbb98ceb30d2433dab82 and force downgrade my new
system. While doing so, I noticed that guix-daemon would still offload,
while if I'd type in `guix offload test`, I'd get a response:
guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
guix offload: Guix is usable on 'host' (test returned "/gnu/store/883yjkl46dxw9mzykykmbs0yzwyxm17z-test")
guix offload: 'host' is running GNU Guile 3.0.5
guix offload: error: failed to connect over SSH to daemon at 'host', socket /var/guix/daemon-socket/socket

Anyway, back to this old commit offloading works for all users.

The commit with this broken behaviour is at:
87b4b0e4385149b40ee87ae2d57712679452746b.


Cheers
Simon
Ludovic Courtès wrote on 11 May 2021 11:56
Re: bug#48240: “guix copy” to host with daemon listening on TCP fails
(name . Simon Streit)(address . simon@netpanic.org)(address . 48240@debbugs.gnu.org)
87pmxx386l.fsf_-_@gnu.org
Hi,

Simon Streit <simon@netpanic.org> skribis:

> Then it was suggested I checkout to commit
> dd14678b9b9843be20e2bbb98ceb30d2433dab82 and force downgrade my new
> system. While doing so, I noticed that guix-daemon would still offload,
> while if I'd type in `guix offload test`, I'd get a response:
>
> guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
> guix offload: Guix is usable on 'host' (test returned "/gnu/store/883yjkl46dxw9mzykykmbs0yzwyxm17z-test")
> guix offload: 'host' is running GNU Guile 3.0.5
> guix offload: error: failed to connect over SSH to daemon at 'host', socket /var/guix/daemon-socket/socket
>
> Anyway, back to this old commit offloading works for all users.

Is the socket file name displayed above correct? Or did you specify
something else in the <build-machine> record?

Is the ‘GUIX_DAEMON_SOCKET’ environment variable defined on that
machine?

How do you run guix-daemon on the head node? The patches discussed here
haven’t made it into the ‘guix’ package yet AFAIK.

Thanks for reporting the issue!

Ludo’.
Ludovic Courtès wrote on 11 May 2021 12:52
(name . Simon Streit)(address . simon@netpanic.org)(address . 48240@debbugs.gnu.org)
87cztx1r1d.fsf_-_@gnu.org
Hi,

Simon Streit <simon@netpanic.org> skribis:

> Then it was suggested I checkout to commit
> dd14678b9b9843be20e2bbb98ceb30d2433dab82 and force downgrade my new
> system. While doing so, I noticed that guix-daemon would still offload,
> while if I'd type in `guix offload test`, I'd get a response:
>
> guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
> guix offload: Guix is usable on 'host' (test returned "/gnu/store/883yjkl46dxw9mzykykmbs0yzwyxm17z-test")
> guix offload: 'host' is running GNU Guile 3.0.5
> guix offload: error: failed to connect over SSH to daemon at 'host', socket /var/guix/daemon-socket/socket
>
> Anyway, back to this old commit offloading works for all users.
>
> The commit with this broken behaviour is at:
> 87b4b0e4385149b40ee87ae2d57712679452746b.

Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
works for you!

Thanks,
Ludo’.
Bone Baboon wrote on 11 May 2021 16:01
(name . Ludovic Courtès)(address . ludo@gnu.org)
87fsytfjxt.fsf@disroot.org
Ludovic Courtès writes:
> Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
> works for you!

This commit appears to have fixed a problem with guix copy that I was
having yesterday. I was getting this error "guix copy: error: failed to
connect over SSH to daemon at '<ip-address>', socket
/var/guix/daemon-socket/socket".

Now I can successfully run guix copy.
Ludovic Courtès wrote on 11 May 2021 23:22
(name . Bone Baboon)(address . bone.baboon@disroot.org)
87cztxufrz.fsf@gnu.org
Bone Baboon <bone.baboon@disroot.org> skribis:

> Ludovic Courtès writes:
>> Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
>> works for you!
>
> This commit appears to have fixed a problem with guix copy that I was
> having yesterday. I was getting this error "guix copy: error: failed to
> connect over SSH to daemon at '<ip-address>', socket
> /var/guix/daemon-socket/socket".
>
> Now I can successfully run guix copy.

Thanks for confirming!
Simon Streit wrote on 12 May 2021 09:48
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48240@debbugs.gnu.org)
yguzgx0mlzf.fsf@netpanic.org
Ludovic Courtès <ludo@gnu.org> writes:
> Simon Streit <simon@netpanic.org> skribis:
>> Anyway, back to this old commit offloading works for all users.
>
> Is the socket file name displayed above correct? Or did you specify
> something else in the <build-machine> record?

No, nothing that I'm aware of. I haven't made any special changes.
>
> Is the ‘GUIX_DAEMON_SOCKET’ environment variable defined on that
> machine?

No.
>
> How do you run guix-daemon on the head node? The patches discussed here
> haven’t made it into the ‘guix’ package yet AFAIK.

That is a Guix system, where I've got an extra user with no extra group
permissions that takes the offloading requests the clients make.
Thinking about it, the host isn't fully updated. Its current checkout
is, or was at the time of reporting this issue:
407e0af6aa465479d08dafb125d06d50109f1822


Cheers!
Simon Streit wrote on 12 May 2021 09:49
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48240@debbugs.gnu.org)
yguv97omlx1.fsf@netpanic.org
Ludovic Courtès <ludo@gnu.org> writes:
> Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
> works for you!

I'll try it later. I missed this mail yesterday.


Cheers!
Simon Streit wrote on 12 May 2021 21:44
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48240@debbugs.gnu.org)
ygueeeb3ff0.fsf@netpanic.org
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Simon Streit <simon@netpanic.org> skribis:
>
>> Then it was suggested I checkout to commit
>> dd14678b9b9843be20e2bbb98ceb30d2433dab82 and force downgrade my new
>> system. While doing so, I noticed that guix-daemon would still offload,
>> while if I'd type in `guix offload test`, I'd get a response:
>>
>> guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
>> guix offload: Guix is usable on 'host' (test returned "/gnu/store/883yjkl46dxw9mzykykmbs0yzwyxm17z-test")
>> guix offload: 'host' is running GNU Guile 3.0.5
>> guix offload: error: failed to connect over SSH to daemon at 'host', socket /var/guix/daemon-socket/socket
>>
>> Anyway, back to this old commit offloading works for all users.
>>
>> The commit with this broken behaviour is at:
>> 87b4b0e4385149b40ee87ae2d57712679452746b.
>
> Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
> works for you!

Offloading works with this commit! Thanks
>
> Thanks,
> Ludo’.