“guix copy” to host with daemon listening on TCP fails

Done
Submitted by Ricardo Wurmus.
Details
4 participants
  • Bone Baboon
  • Ludovic Courtès
  • Ricardo Wurmus
  • Simon Streit
Owner
unassigned
Severity
normal
Ricardo Wurmus wrote on 5 May 18:04 +0200
“guix copy” to host with daemon listening on TCP fails
(address . bug-guix@gnu.org)
87a6p9rwu7.fsf@elephly.net
There are two hosts running Guix. The target host runs “guix-daemon” with “--listen=0.0.0.0:9999”; it does not listen on a local socket file. Trying to copy store items to the target host fails with this backtrace:
[me@here:~] (1028) $ guix copy --to=there /gnu/store/…-profile
Backtrace:
          12 (primitive-load "/gnu/store/9qjkzhlwj2792iczsyfx9n7c23g…")
In guix/ui.scm:
  2165:12 11 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1736:10 10 (with-exception-handler _ _ #:unwind? _ # _)
  1731:15  9 (with-exception-handler #<procedure 7fa3ef9d0150 at ic…> …)
  1736:10  8 (with-exception-handler _ _ #:unwind? _ # _)
In guix/store.scm:
   636:37  7 (thunk)
   1305:8  6 (call-with-build-handler _ _)
   1305:8  5 (call-with-build-handler #<procedure 7fa3ef9db5d0 at g…> …)
In guix/status.scm:
    799:4  4 (call-with-status-report _ _)
In guix/scripts/copy.scm:
    76:25  3 (_)
In guix/ssh.scm:
   485:39  2 (send-files #<store-connection 256.99 7fa3ef9d6f00> _ #f …)
In ice-9/boot-9.scm:
  1669:16  1 (raise-exception _ #:continuable? _)
  1669:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure struct-vtable: Wrong type argument in position 1 (expecting struct): #f
The (guix ssh) module appears to assume that the remote daemon listens on a socket file. Telling the daemon to also listen on a socket file works around this problem.
-- Ricardo
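
The workaround mentioned above can be sketched as a daemon invocation with two endpoints. This is only an illustration under stated assumptions: the daemon is started by hand as root, the build group is named "guixbuild", and the default socket path is the one that appears later in this thread. The "--listen" option may be repeated, so the daemon accepts connections on both the socket file and the TCP port.

```shell
# Workaround sketch: listen on the Unix-domain socket file AND on TCP port 9999.
# Assumptions (not from the report): run as root; build group is "guixbuild".
guix-daemon --build-users-group=guixbuild \
  --listen=/var/guix/daemon-socket/socket \
  --listen=0.0.0.0:9999
```

On a system where the daemon is managed by a service, the same two "--listen" arguments would go wherever that service takes extra daemon options.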
Ludovic Courtès wrote on 5 May 23:32 +0200
[PATCH 2/4] ssh: 'connect-to-remote-daemon' raises a nicer message upon error.
(address . 48240@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210505213205.28519-2-ludo@gnu.org
* guix/ssh.scm (connect-to-remote-daemon): Catch
'store-connection-error?' and rethrow.
---
 guix/ssh.scm | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/guix/ssh.scm b/guix/ssh.scm
index 457d1890f9..b39b90f733 100644
--- a/guix/ssh.scm
+++ b/guix/ssh.scm
@@ -302,8 +302,13 @@ EXP never returns or calls 'primitive-exit' when it's done."
                                         "/var/guix/daemon-socket/socket"))
   "Connect to the remote build daemon listening on SOCKET-NAME over SESSION,
 an SSH session. Return a <store-connection> object."
-  (open-connection #:port (remote-daemon-channel session socket-name)))
-
+  (guard (c ((store-connection-error? c)
+             ;; Raise a more focused error condition.
+             (raise (formatted-message
+                     (G_ "failed to connect over SSH to daemon at '~a', socket ~a")
+                     (session-get session 'host)
+                     socket-name))))
+    (open-connection #:port (remote-daemon-channel session socket-name))))
 
 (define (store-import-channel session)
   "Return an output port to which archives to be exported to SESSION's store
-- 
2.31.1
Ludovic Courtès wrote on 5 May 23:32 +0200
[PATCH 1/4] store: 'open-connection' never returns #f.
(address . 48240@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210505213205.28519-1-ludo@gnu.org
* guix/store.scm (open-connection)[handshake-error]: New procedure.
Call it in code paths that would previously return #f.
---
 guix/store.scm | 66 +++++++++++++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/guix/store.scm b/guix/store.scm
index 37ae6cfedd..315ae4cdce 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -548,13 +548,16 @@ space on the file system so that the garbage collector can still operate,
 should the disk become full. When CPU-AFFINITY is true, it must be an integer
 corresponding to an OS-level CPU number to which the daemon's worker process
 for this connection will be pinned. Return a server object."
+  (define (handshake-error)
+    (raise (condition
+            (&store-connection-error (file (or port uri))
+                                     (errno EPROTO))
+            (&message (message "build daemon handshake failed")))))
+
   (guard (c ((nar-error? c)
              ;; One of the 'write-' or 'read-' calls below failed, but this is
              ;; really a connection error.
-             (raise (condition
-                     (&store-connection-error (file (or port uri))
-                                              (errno EPROTO))
-                     (&message (message "build daemon handshake failed"))))))
+             (handshake-error)))
     (let*-values (((port)
                    (or port (connect-to-daemon uri)))
                   ((output flush)
@@ -562,32 +565,35 @@ for this connection will be pinned. Return a server object."
                     (make-bytevector 8192))))
       (write-int %worker-magic-1 port)
       (let ((r (read-int port)))
-        (and (= r %worker-magic-2)
-             (let ((v (read-int port)))
-               (and (= (protocol-major %protocol-version)
-                       (protocol-major v))
-                    (begin
-                      (write-int %protocol-version port)
-                      (when (>= (protocol-minor v) 14)
-                        (write-int (if cpu-affinity 1 0) port)
-                        (when cpu-affinity
-                          (write-int cpu-affinity port)))
-                      (when (>= (protocol-minor v) 11)
-                        (write-int (if reserve-space? 1 0) port))
-                      (letrec* ((built-in-builders
-                                 (delay (%built-in-builders conn)))
-                                (conn
-                                 (%make-store-connection port
-                                                         (protocol-major v)
-                                                         (protocol-minor v)
-                                                         output flush
-                                                         (make-hash-table 100)
-                                                         (make-hash-table 100)
-                                                         vlist-null
-                                                         built-in-builders)))
-                        (let loop ((done? (process-stderr conn)))
-                          (or done? (process-stderr conn)))
-                        conn)))))))))
+        (unless (= r %worker-magic-2)
+          (handshake-error))
+
+        (let ((v (read-int port)))
+          (unless (= (protocol-major %protocol-version)
+                     (protocol-major v))
+            (handshake-error))
+
+          (write-int %protocol-version port)
+          (when (>= (protocol-minor v) 14)
+            (write-int (if cpu-affinity 1 0) port)
+            (when cpu-affinity
+              (write-int cpu-affinity port)))
+          (when (>= (protocol-minor v) 11)
+            (write-int (if reserve-space? 1 0) port))
+          (letrec* ((built-in-builders
+                     (delay (%built-in-builders conn)))
+                    (conn
+                     (%make-store-connection port
+                                             (protocol-major v)
+                                             (protocol-minor v)
+                                             output flush
+                                             (make-hash-table 100)
+                                             (make-hash-table 100)
+                                             vlist-null
+                                             built-in-builders)))
+            (let loop ((done? (process-stderr conn)))
+              (or done? (process-stderr conn)))
+            conn))))))
 
 (define* (port->connection port
                            #:key (version %protocol-version))
-- 
2.31.1
Ludovic Courtès wrote on 5 May 23:32 +0200
[PATCH 3/4] store: Export 'connect-to-daemon'.
(address . 48240@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210505213205.28519-3-ludo@gnu.org
* guix/store.scm (connect-to-daemon): Make public. Improve docstring.
---
 guix/store.scm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/guix/store.scm b/guix/store.scm
index 315ae4cdce..9d706ae590 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -90,6 +90,7 @@
             hash-algo
             build-mode
 
+            connect-to-daemon
             open-connection
             port->connection
             close-connection
@@ -501,7 +502,10 @@
 
 (define (connect-to-daemon uri)
   "Connect to the daemon at URI, a string that may be an actual URI or a file
-name."
+name, and return an input/output port.
+
+This is a low-level procedure that does not perform the initial handshake with
+the daemon. Use 'open-connection' for that."
   (define (not-supported)
     (raise (condition
             (&store-connection-error (file uri)
-- 
2.31.1
Ludovic Courtès wrote on 5 May 23:32 +0200
[PATCH 4/4] ssh: Honor GUIX_DAEMON_SOCKET on the target machine.
(address . 48240@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210505213205.28519-4-ludo@gnu.org
Fixes https://bugs.gnu.org/48240.
Reported by Ricardo Wurmus <rekado@elephly.net>.

* guix/ssh.scm (remote-daemon-channel)[redirect]: Define
'connect-to-daemon'. Use the same-named procedure from (guix store)
when available, and honor GUIX_DAEMON_SOCKET.
---
 guix/ssh.scm | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/guix/ssh.scm b/guix/ssh.scm
index b39b90f733..77a9732ce5 100644
--- a/guix/ssh.scm
+++ b/guix/ssh.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2016, 2017, 2018, 2019, 2020, 2021, 2021 Ludovic Courtès <ludo@gnu.org>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -253,7 +253,22 @@ EXP never returns or calls 'primitive-exit' when it's done."
        (use-modules (ice-9 match) (rnrs io ports)
                     (rnrs bytevectors))
 
-       (let ((sock    (socket AF_UNIX SOCK_STREAM 0))
+       (define connect-to-daemon
+         ;; XXX: 'connect-to-daemon' used to be private and before that it
+         ;; didn't even exist, hence these shenanigans.
+         (let ((connect-to-daemon
+                (false-if-exception (module-ref (resolve-module '(guix store))
+                                                'connect-to-daemon))))
+           (lambda (uri)
+             (if connect-to-daemon
+                 (connect-to-daemon uri)
+                 (let ((sock (socket AF_UNIX SOCK_STREAM 0)))
+                   (connect sock AF_UNIX ,socket-name)
+                   sock)))))
+
+       ;; Use 'connect-to-daemon' to honor GUIX_DAEMON_SOCKET.
+       (let ((sock    (connect-to-daemon (or (getenv "GUIX_DAEMON_SOCKET")
+                                             socket-name)))
              (stdin   (current-input-port))
              (stdout  (current-output-port))
              (select* (lambda (read write except)
@@ -272,8 +287,6 @@ EXP never returns or calls 'primitive-exit' when it's done."
          (setvbuf stdin 'block 65536)
          (setvbuf sock 'block 65536)
 
-         (connect sock AF_UNIX ,socket-name)
-
          (let loop ()
            (match (select* (list stdin sock) '() '())
              ((reads () ())
-- 
2.31.1
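
The endpoint selection that this patch adds, (or (getenv "GUIX_DAEMON_SOCKET") socket-name), can be mimicked in shell to see which endpoint would be chosen on the target machine. This is a rough sketch, not code from Guix: "daemon_endpoint" is a hypothetical helper, and "guix://localhost:9999" is an assumed value matching the port from the original report. How the variable gets set for the non-interactive process started over SSH depends on the target's shell setup and is not covered in this thread.

```shell
#!/bin/sh
# daemon_endpoint is a hypothetical helper, not part of Guix.
# It prints $GUIX_DAEMON_SOCKET whenever that variable is set (even to an
# empty string, just as getenv returns a string in Scheme), and otherwise
# prints the fallback socket name passed as $1.
daemon_endpoint() {
  printf '%s\n' "${GUIX_DAEMON_SOCKET-$1}"
}

default_socket=/var/guix/daemon-socket/socket

unset GUIX_DAEMON_SOCKET
daemon_endpoint "$default_socket"    # prints /var/guix/daemon-socket/socket

GUIX_DAEMON_SOCKET=guix://localhost:9999
daemon_endpoint "$default_socket"    # prints guix://localhost:9999
```

The "${VAR-default}" form (without the colon) is used on purpose: it falls back only when the variable is unset, which is the closest shell equivalent of the Scheme "or" around "getenv".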
Ludovic Courtès wrote on 8 May 15:10 +0200
Re: bug#48240: “guix copy” to host with daemon listening on TCP fails
(name . Ricardo Wurmus)(address . rekado@elephly.net)(address . 48240-done@debbugs.gnu.org)
875yzt8j74.fsf@gnu.org
Hi,
Ricardo Wurmus <rekado@elephly.net> skribis:
> There are two hosts running Guix. The target host runs
> “guix-daemon” with “--listen=0.0.0.0:9999”; it does not listen on
> a local socket file. Trying to copy store items to the target
> host fails with this backtrace:
I pushed the four patches as 3270308eebe82075d2f02517c5a2b1599928495c.
Let me know if anything’s amiss!
Thanks,
Ludo’.
Closed
Simon Streit wrote on 11 May 10:43 +0200
“guix copy” to host with daemon listening on TCP fails
(address . 48240@debbugs.gnu.org)
yguv97pslsh.fsf@netpanic.org
Hello!
After reinstalling my system last night, I ran into this problem too: I couldn't offload.
Then it was suggested I check out commit dd14678b9b9843be20e2bbb98ceb30d2433dab82 and force downgrade my new system. While doing so, I noticed that guix-daemon would still offload, while if I'd type in `guix offload test`, I'd get a response:

guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
guix offload: Guix is usable on 'host' (test returned "/gnu/store/883yjkl46dxw9mzykykmbs0yzwyxm17z-test")
guix offload: 'host' is running GNU Guile 3.0.5
guix offload: error: failed to connect over SSH to daemon at 'host', socket /var/guix/daemon-socket/socket

Anyway, back on this old commit, offloading works for all users.
The commit with this broken behaviour is at: 87b4b0e4385149b40ee87ae2d57712679452746b.

Cheers
Simon
Ludovic Courtès wrote on 11 May 11:56 +0200
Re: bug#48240: “guix copy” to host with daemon listening on TCP fails
(name . Simon Streit)(address . simon@netpanic.org)(address . 48240@debbugs.gnu.org)
87pmxx386l.fsf_-_@gnu.org
Hi,
Simon Streit <simon@netpanic.org> skribis:
> Then it was suggested I checkout to commit
> dd14678b9b9843be20e2bbb98ceb30d2433dab82 and force downgrade my new
> system. While doing so, I noticed that guix-daemon would still offload,
> while if I'd type in `guix offload test`, I'd get a response:
>
> guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
> guix offload: Guix is usable on 'host' (test returned "/gnu/store/883yjkl46dxw9mzykykmbs0yzwyxm17z-test")
> guix offload: 'host' is running GNU Guile 3.0.5
> guix offload: error: failed to connect over SSH to daemon at 'host', socket /var/guix/daemon-socket/socket
>
> Anyway, back to this old commit offloading works for all users.
Is the socket file name displayed above correct? Or did you specify something else in the <build-machine> record?
Is the ‘GUIX_DAEMON_SOCKET’ environment variable defined on that machine?
How do you run guix-daemon on the head node? The patches discussed here haven’t made it into the ‘guix’ package yet AFAIK.
Thanks for reporting the issue!
Ludo’.
Ludovic Courtès wrote on 11 May 12:52 +0200
(name . Simon Streit)(address . simon@netpanic.org)(address . 48240@debbugs.gnu.org)
87cztx1r1d.fsf_-_@gnu.org
Hi,
Simon Streit <simon@netpanic.org> skribis:
> Then it was suggested I checkout to commit
> dd14678b9b9843be20e2bbb98ceb30d2433dab82 and force downgrade my new
> system. While doing so, I noticed that guix-daemon would still offload,
> while if I'd type in `guix offload test`, I'd get a response:
>
> guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
> guix offload: Guix is usable on 'host' (test returned "/gnu/store/883yjkl46dxw9mzykykmbs0yzwyxm17z-test")
> guix offload: 'host' is running GNU Guile 3.0.5
> guix offload: error: failed to connect over SSH to daemon at 'host', socket /var/guix/daemon-socket/socket
>
> Anyway, back to this old commit offloading works for all users.
>
> The commit with this broken behaviour is at:
> 87b4b0e4385149b40ee87ae2d57712679452746b.
Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it works for you!
Thanks,
Ludo’.
Bone Baboon wrote on 11 May 16:01 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
87fsytfjxt.fsf@disroot.org
Ludovic Courtès writes:
> Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
> works for you!
This commit appears to have fixed a problem with guix copy that I was having yesterday. I was getting this error "guix copy: error: failed to connect over SSH to daemon at '<ip-address>', socket /var/guix/daemon-socket/socket".
Now I can successfully run guix copy.
Ludovic Courtès wrote on 11 May 23:22 +0200
(name . Bone Baboon)(address . bone.baboon@disroot.org)
87cztxufrz.fsf@gnu.org
Bone Baboon <bone.baboon@disroot.org> skribis:
> Ludovic Courtès writes:
>
>> Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
>> works for you!
>
> This commit appears to have fixed a problem with guix copy that I was
> having yesterday. I was getting this error "guix copy: error: failed to
> connect over SSH to daemon at '<ip-address>', socket
> /var/guix/daemon-socket/socket".
>
> Now I can successfully run guix copy.
Thanks for confirming!
Simon Streit wrote on 12 May 09:48 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48240@debbugs.gnu.org)
yguzgx0mlzf.fsf@netpanic.org
Ludovic Courtès <ludo@gnu.org> writes:
> Simon Streit <simon@netpanic.org> skribis:
>
>> Anyway, back to this old commit offloading works for all users.
>
> Is the socket file name displayed above correct? Or did you specify
> something else in the <build-machine> record?
No, nothing that I'm aware of. I haven't made any special changes.
> Is the ‘GUIX_DAEMON_SOCKET’ environment variable defined on that
> machine?
No.
> How do you run guix-daemon on the head node? The patches discussed here
> haven’t made it into the ‘guix’ package yet AFAIK.
That is a Guix system, where I've got an extra user with no extra group permissions that takes the offloading requests the clients make. Thinking about it, the host isn't fully updated. Its current checkout is, or was at the time of reporting to this issue: 407e0af6aa465479d08dafb125d06d50109f1822

Cheers!
Simon Streit wrote on 12 May 09:49 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48240@debbugs.gnu.org)
yguv97omlx1.fsf@netpanic.org
Ludovic Courtès <ludo@gnu.org> writes:
> Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
> works for you!
I'll try it later. I missed this mail yesterday.

Cheers!
Simon Streit wrote on 12 May 21:44 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 48240@debbugs.gnu.org)
ygueeeb3ff0.fsf@netpanic.org
Ludovic Courtès <ludo@gnu.org> writes:
> Hi,
>
> Simon Streit <simon@netpanic.org> skribis:
>
>> Then it was suggested I checkout to commit
>> dd14678b9b9843be20e2bbb98ceb30d2433dab82 and force downgrade my new
>> system. While doing so, I noticed that guix-daemon would still offload,
>> while if I'd type in `guix offload test`, I'd get a response:
>>
>> guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
>> guix offload: Guix is usable on 'host' (test returned "/gnu/store/883yjkl46dxw9mzykykmbs0yzwyxm17z-test")
>> guix offload: 'host' is running GNU Guile 3.0.5
>> guix offload: error: failed to connect over SSH to daemon at 'host', socket /var/guix/daemon-socket/socket
>>
>> Anyway, back to this old commit offloading works for all users.
>>
>> The commit with this broken behaviour is at:
>> 87b4b0e4385149b40ee87ae2d57712679452746b.
>
> Fixed in da28efef36af8925bcd9e40a81cbf552cf8c2d02. Let me know if it
> works for you!
Offloading works with this commit! Thanks
> Thanks,
> Ludo’.

To comment on this conversation send email to 48240@debbugs.gnu.org