[PATCH] store: Use a non-blocking socket for store connections.

  • Open
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Christopher Baines
Owner
unassigned
Submitted by
Christopher Baines
Severity
normal
C
C
Christopher Baines wrote on 17 Nov 2023 19:05
(address . guix-patches@gnu.org)
460cdfa67b473ea2f1593668b2d9d0fd159378d0.1700244314.git.mail@cbaines.net
For some applications, it's important to do this here rather than just making
the socket non-blocking after the connection is established because there can
be I/O on the socket that will block during the handshake.

I've noticed this blocking during the handshake causing issues in the build
coordinator for example.

* guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
SOCK_NONBLOCK when calling socket.

Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf
---
guix/store.scm | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

Toggle diff (29 lines)
diff --git a/guix/store.scm b/guix/store.scm
index f8e77b2cd9..216be98c05 100644
--- a/guix/store.scm
+++ b/guix/store.scm
@@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
'&store-connection-error' upon error."
(let ((s (with-fluids ((%default-port-encoding #f))
;; This trick allows use of the `scm_c_read' optimization.
- (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
+ (socket PF_UNIX
+ (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
+ 0)))
(a (make-socket-address PF_UNIX file)))
(system-error-to-connection-error file
@@ -488,7 +490,8 @@ (define (open-inet-socket host port)
((ai rest ...)
(let ((s (socket (addrinfo:fam ai)
;; TCP/IP only
- (logior SOCK_STREAM SOCK_CLOEXEC) IPPROTO_IP)))
+ (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
+ IPPROTO_IP)))
(catch 'system-error
(lambda ()

base-commit: e35b7c5386c1bfacf47ed31bac9b503373dd26fc
--
2.41.0
L
L
Ludovic Courtès wrote on 26 Nov 2023 23:16
(name . Christopher Baines)(address . mail@cbaines.net)
87fs0s17op.fsf@gnu.org
Hi Christopher,

Christopher Baines <mail@cbaines.net> skribis:

Toggle quote (12 lines)
> For some applications, it's important to do this here rather than just making
> the socket non-blocking after the connection is established because there can
> be I/O on the socket that will block during the handshake.
>
> I've noticed this blocking during the handshake causing issues in the build
> coordinator for example.
>
> * guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
> SOCK_NONBLOCK when calling socket.
>
> Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf

I feel we should really discuss on Guix + Fibers since we’ve apparently
been going through the exact same set of issues. :-)

(The other thing that comes to mind is the resource pool!)

Toggle quote (10 lines)
> +++ b/guix/store.scm
> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
> '&store-connection-error' upon error."
> (let ((s (with-fluids ((%default-port-encoding #f))
> ;; This trick allows use of the `scm_c_read' optimization.
> - (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
> + (socket PF_UNIX
> + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
> + 0)))

We cannot do this here because callers have to be prepared to deal with
non-blocking sockets, and that’s not the case in Guix itself.

In Cuirass, I have this:

Toggle snippet (25 lines)
(define (non-blocking-port port)
"Make PORT non-blocking and return it."
(let ((flags (fcntl port F_GETFL)))
(when (zero? (logand O_NONBLOCK flags))
(fcntl port F_SETFL (logior O_NONBLOCK flags)))
port))

(define (ensure-non-blocking-store-connection store)
"Mark the file descriptor that backs STORE, a <store-connection>, as
O_NONBLOCK."
(match (store-connection-socket store)
((? file-port? port)
(non-blocking-port port))
(_ #f)))

(define-syntax-rule (with-store/non-blocking store exp ...)
"Like 'with-store', bind STORE to a connection to the store, but ensure that
said connection is non-blocking (O_NONBLOCK). Evaluate EXP... in that
context."
(with-store store
(ensure-non-blocking-store-connection store)
(let ()
exp ...)))

Then ‘with-store/non-blocking’ is used in fiberized context where I know
this is fine.

I think it’ll have to remain this way until Guix itself is fiberized or
something.

Does that make sense?

Ludo’.
C
C
Christopher Baines wrote on 27 Nov 2023 10:48
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 67245@debbugs.gnu.org)
875y1nmrwf.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (21 lines)
> Hi Christopher,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> For some applications, it's important to do this here rather than just making
>> the socket non-blocking after the connection is established because there can
>> be I/O on the socket that will block during the handshake.
>>
>> I've noticed this blocking during the handshake causing issues in the build
>> coordinator for example.
>>
>> * guix/store.scm (open-unix-domain-socket, open-inet-socket): Pass
>> SOCK_NONBLOCK when calling socket.
>>
>> Change-Id: I8225762b78448bc1f7b698c8de5d736e13f577bf
>
> I feel we should really discuss on Guix + Fibers since we’ve apparently
> been going through the exact same set of issues. :-)
>
> (The other thing that comes to mind is the resource pool!)

I'm mostly ignoring these issues then coping the code once you write it
:)

Toggle quote (13 lines)
>> +++ b/guix/store.scm
>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>> '&store-connection-error' upon error."
>> (let ((s (with-fluids ((%default-port-encoding #f))
>> ;; This trick allows use of the `scm_c_read' optimization.
>> - (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
>> + (socket PF_UNIX
>> + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
>> + 0)))
>
> We cannot do this here because callers have to be prepared to deal with
> non-blocking sockets, and that’s not the case in Guix itself.

I can see potential problems for programs outside of Guix which use
suspendable ports, but given Guix doesn't use suspendable ports, this
won't change behaviour, right?

Obviously Guile will be working a bit differently, using poll when it
needs to wait for I/O, but at the scheme level within Guix, things
should be no different.

I tried guix weather with this change, and things seemed fine. Is there
a specific bit of Guix you're concerned about?
-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmVkakBfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9XcoJhAArMwV2vXM3qOOR01PaYfTHmQb/NcB3XiK
VXiusYS9WB15qLYF8v3qxhXtvhy/4ZBmmTCe9l/adwjqGqoHAeBqSRHNODvCZG3+
KXEclNOe1/7zWFlIkzP3bjHBQr9qOHdEq2NLl+Btyzn/9c8WGYeR0SWCgY4rjDtX
fUicq8czkuAURryhErg1wyDXgHdNwt4mez7sH7Tx3biBSC71PRmc/Cjbn5n3bhM9
cqJ8BWpel198VsDAJpCF+s6PvmfHRMCvuFie0IVl6O3wuvGkMoO6VofMg8i0mlsE
sBh97Cio+ShZRhtqjYP6L50cNnQNwH3WT7QfhG0qMOPJJCje21MRckmyfWiot4pp
BQmYVmyBxzOesKsfWvexuCeQN7BF9XAkFgf+vu2xfCmjxTNdCrqB/6YswVqa4/Z2
oIB0hiWEMhQxC0bCDU2dIExi1LkrXlAWQJPxPYxwaz3VnT3fUy7zHW5Ql2iXFd/r
2Sr8Iy7MvGVpmwyaQA3LgLcDji/h1vxQpzqniZDFaqFPMgdSEcxCG3wKGzZIsPAJ
MtyA5FPYxPHt042bWrGMNdyiVMpBpSZvzfgexCpJi96jikfD9tkWOyEjxgUnk7Jo
4QA03acFYUPxFkIiey6mJOFvzVzeTogFMqe6sV7/Bvz2fWv/dSczx8MbWNaEeoRP
qEmCKTgUzvE=
=iswm
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 30 Nov 2023 22:11
(name . Christopher Baines)(address . mail@cbaines.net)(address . 67245@debbugs.gnu.org)
87fs0nndy9.fsf@gnu.org
Hi Chris,

Christopher Baines <mail@cbaines.net> skribis:

Toggle quote (9 lines)
> Ludovic Courtès <ludo@gnu.org> writes:
>> I feel we should really discuss on Guix + Fibers since we’ve apparently
>> been going through the exact same set of issues. :-)
>>
>> (The other thing that comes to mind is the resource pool!)
>
> I'm mostly ignoring these issues then coping the code once you write it
> :)

Heh, so we’re already in sync maybe, not bad. :-)

Toggle quote (21 lines)
>>> +++ b/guix/store.scm
>>> @@ -460,7 +460,9 @@ (define (open-unix-domain-socket file)
>>> '&store-connection-error' upon error."
>>> (let ((s (with-fluids ((%default-port-encoding #f))
>>> ;; This trick allows use of the `scm_c_read' optimization.
>>> - (socket PF_UNIX (logior SOCK_STREAM SOCK_CLOEXEC) 0)))
>>> + (socket PF_UNIX
>>> + (logior SOCK_STREAM SOCK_CLOEXEC SOCK_NONBLOCK)
>>> + 0)))
>>
>> We cannot do this here because callers have to be prepared to deal with
>> non-blocking sockets, and that’s not the case in Guix itself.
>
> I can see potential problems for programs outside of Guix which use
> suspendable ports, but given Guix doesn't use suspendable ports, this
> won't change behaviour, right?
>
> Obviously Guile will be working a bit differently, using poll when it
> needs to wait for I/O, but at the scheme level within Guix, things
> should be no different.

Hmm yes, I think you’re right.

One issue is if we hand over the file descriptor to something that’s not
Guile. Off the top of my head, this happens with inferiors and in the
‘build’ procedure of ‘build-self.scm’ (well, the process that receives
that file descriptor is Guile, but if it’s older than 3.0 (?), then it
may behave differently.)

So it should be safe indeed, but adds a bit of overhead (hopping via
‘current-{read,write}-waiter’) and needs good testing.

Laziness gives an incentive for the status quo, but I’m not opposed to
the change if we get more confidence (test suite passing, tests with
inferiors and ‘time-machine’, and some more auditing.)

Ludo’.
?