Performance regression in narinfo fetching

  • Done
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Christopher Baines
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
important
L
L
Ludovic Courtès wrote on 20 Mar 2021 18:38
(address . bug-guix@gnu.org)
87ft0p67z4.fsf@inria.fr
Hello!

As reported on guix-devel, ‘guix weather’ has become extremely slow.
Specifically, in the narinfo-fetching phase, it runs at 100% CPU, even
though that part should be network-bound (pipelined HTTP GETs).

A profile of the ‘report-server-coverage’ call would show this:

Toggle snippet (13 lines)
% cumulative self
time seconds seconds procedure
62.50 1.06 1.06 fluid-ref*
6.25 0.11 0.11 regexp-exec
3.13 0.05 0.05 ice-9/boot-9.scm:1738:4:throw
2.08 0.04 0.04 string-index
2.08 0.04 0.04 write
1.04 568.08 0.02 ice-9/boot-9.scm:1673:4:with-exception-handler
1.04 0.02 0.02 %read-line
1.04 0.02 0.02 guix/ci.scm:78:0:json->build
1.04 0.02 0.02 string-append

More than half of the time spent in ‘fluid-ref*’—sounds fishy.

Where does that that call come from? There seems to be a single caller,
in boot-9.scm:

(define* (raise-exception exn #:key (continuable? #f))
(define (capture-current-exception-handlers)
;; FIXME: This is quadratic.
(let lp ((depth 0))
(let ((h (fluid-ref* %exception-handler depth)))
(if h
(cons h (lp (1+ depth)))
(list fallback-exception-handler)))))
;; …
)

We must be abusing exceptions somewhere…

Indeed, there’s one place on the hot path where we install exception
handlers: in ‘http-multiple-get’ (from commit
205833b72c5517915a47a50dbe28e7024dc74e57). I don’t think it’s needed,
is it? (But if it is, let’s find another approach, this one is
prohibitively expensive.)

A simple performance test is:

rm -rf ~/.cache/guix/substitute/
time ./pre-inst-env guix weather $(guix package -A|head -500| cut -f1)

After removing this ‘catch’ in ‘http-multiple-get’, the profile is
flatter:

Toggle snippet (12 lines)
% cumulative self
time seconds seconds procedure
8.33 0.07 0.07 string-index
8.33 0.07 0.07 regexp-exec
5.56 0.05 0.05 anon #x154af88
5.56 0.05 0.05 write
5.56 0.05 0.05 string-tokenize
5.56 0.05 0.05 read-char
5.56 0.05 0.05 set-certificate-credentials-x509-trust-data!
5.56 0.05 0.05 %read-line

There’s also this ‘call-with-connection-error-handling’ call in (guix
substitute), around an ‘http-multiple-get’ call, that may not be
justified.

Attached is a diff of the tweaks I made to test this.

WDYT, Chris?

Ludo’.
Attachment: file
C
C
Christopher Baines wrote on 20 Mar 2021 21:32
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 47283@debbugs.gnu.org)
874kh5d0rg.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (44 lines)
> As reported on guix-devel, ‘guix weather’ has become extremely slow.
> Specifically, in the narinfo-fetching phase, it runs at 100% CPU, even
> though that part should be network-bound (pipelined HTTP GETs).
>
> A profile of the ‘report-server-coverage’ call would show this:
>
> --8<---------------cut here---------------start------------->8---
> % cumulative self
> time seconds seconds procedure
> 62.50 1.06 1.06 fluid-ref*
> 6.25 0.11 0.11 regexp-exec
> 3.13 0.05 0.05 ice-9/boot-9.scm:1738:4:throw
> 2.08 0.04 0.04 string-index
> 2.08 0.04 0.04 write
> 1.04 568.08 0.02 ice-9/boot-9.scm:1673:4:with-exception-handler
> 1.04 0.02 0.02 %read-line
> 1.04 0.02 0.02 guix/ci.scm:78:0:json->build
> 1.04 0.02 0.02 string-append
> --8<---------------cut here---------------end--------------->8---
>
> More than half of the time spent in ‘fluid-ref*’—sounds fishy.
>
> Where does that that call come from? There seems to be a single caller,
> in boot-9.scm:
>
> (define* (raise-exception exn #:key (continuable? #f))
> (define (capture-current-exception-handlers)
> ;; FIXME: This is quadratic.
> (let lp ((depth 0))
> (let ((h (fluid-ref* %exception-handler depth)))
> (if h
> (cons h (lp (1+ depth)))
> (list fallback-exception-handler)))))
> ;; …
> )
>
> We must be abusing exceptions somewhere…
>
> Indeed, there’s one place on the hot path where we install exception
> handlers: in ‘http-multiple-get’ (from commit
> 205833b72c5517915a47a50dbe28e7024dc74e57). I don’t think it’s needed,
> is it? (But if it is, let’s find another approach, this one is
> prohibitively expensive.)

I think the exception handling has moved around, but I guess the
exceptions that could be caught in http-multiple-get could happen,
right? I am really just guessing here, as Guile doesn't help tell you
about possible exceptions, and I haven't spent enough time to read all
the possible code involved to find out if these are definitely possible.

Toggle quote (29 lines)
> A simple performance test is:
>
> rm -rf ~/.cache/guix/substitute/
> time ./pre-inst-env guix weather $(guix package -A|head -500| cut -f1)
>
> After removing this ‘catch’ in ‘http-multiple-get’, the profile is
> flatter:
>
> --8<---------------cut here---------------start------------->8---
> % cumulative self
> time seconds seconds procedure
> 8.33 0.07 0.07 string-index
> 8.33 0.07 0.07 regexp-exec
> 5.56 0.05 0.05 anon #x154af88
> 5.56 0.05 0.05 write
> 5.56 0.05 0.05 string-tokenize
> 5.56 0.05 0.05 read-char
> 5.56 0.05 0.05 set-certificate-credentials-x509-trust-data!
> 5.56 0.05 0.05 %read-line
> --8<---------------cut here---------------end--------------->8---
>
> There’s also this ‘call-with-connection-error-handling’ call in (guix
> substitute), around an ‘http-multiple-get’ call, that may not be
> justified.
>
> Attached is a diff of the tweaks I made to test this.
>
> WDYT, Chris?

I haven't looked in to this yet, but maybe it would be possible to
adjust the code so that it doesn't perform so badly, but still tries to
handle possible exceptions.

The two ideas I have is to rewrite the (let ...) bit in terms of a fold,
maybe that would perform better, or stop using let for iteration and
setup the exception handling, then process each request, using set! to
update the state. I haven't tested either of these.

It's good to know that Guile exception handling can be excessively
expensive though, I wouldn't have expected it to beat out anything over
the network in terms of the performance penalty.
-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmBWW+NfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9Xenow//RkiKPMtSiWGvnn/tEVC1d+o/4tK/SrKy
L5QbyF9k/oOCsCGWyZKlgFhrjctcoYs3He4KY9T8ba2xy3YHN3m5Xm9aZcTS7fgH
8AwjiMZ/Edw+qF55+wXYcxGF8ch8SfMib3rRnBH5K0yhQ1KH2wZzDTCp8MdBfMzl
5PwMbhLpKTlNW82ZuGqDGj50/Ca5QMg4XRr7r1ACQR4W8ee0VilF1K/Kp7QTT0B7
ikwOLvxaqsVwlSo7Kyvu3FIiSeIlWXFRjSTtAUSEtBzREoyp+oh+7YGWJ4buaz5x
WAl0/cfs6Blf5Y9MrEaCkHV0kvHzknelWYoR1IBMBAOhwksU1mOKD3I45fhixTYg
E4oNkFN9ug9WPzz522skbWyoHuQH6zRLh1aoXpC2xSydcu7hXSyA+NZ5bpBWIzRI
rMsHhryV/kA7ZhOQKUSbJIdpu3mFYYmuCAeAdK2oO5FF7oRZ3yb4samHebWFMog+
7U71Y+iR0ga12NmUzN1sBxfTFHa2ynlRlyuhF/H4KiFa09Z9xzsHRu02dPmh5u0q
UXz2twCd/pVg0OmnPpFvedw9NaBoxNViawfHRxLz5C8xqpbpXqM1j0OJcvSqfXC3
OJDhsRNLZrnusLFi31ydxkTr+R+f03PPgmJYAG9ay8iIpFKOt9HLDLQeCPReexdF
vBxzyBRUhtM=
=gCEB
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 20 Mar 2021 21:54
control message for bug #47283
(address . control@debbugs.gnu.org)
87y2eh4kbu.fsf@gnu.org
severity 47283 important
quit
C
C
Christopher Baines wrote on 21 Mar 2021 01:48
Re: bug#47283: Performance regression in narinfo fetching
(address . 47283@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
871rc9coxo.fsf@cbaines.net
Christopher Baines <mail@cbaines.net> writes:

Toggle quote (9 lines)
> I haven't looked in to this yet, but maybe it would be possible to
> adjust the code so that it doesn't perform so badly, but still tries to
> handle possible exceptions.
>
> The two ideas I have is to rewrite the (let ...) bit in terms of a fold,
> maybe that would perform better, or stop using let for iteration and
> setup the exception handling, then process each request, using set! to
> update the state. I haven't tested either of these.

I tried something, neither of these things, but just not calling (loop
...) within the catch block. I don't know why this might work, but it
seems to make guix weather much faster.

Here's the patch [1], I've just realised it's broken, as it'll loose the
result value (and use an old one) when the connection is closed. I'll
send a updated patch without this issue in a moment.

-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmBWl8NfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9XfIpw//QGNGAFecZaN9H8GjP/XTfT7dU2MB2kcB
qS6sIMtgPkBNRLLktK2HkQL9KQq2f8W0hTw+Kh7blzHPnBQwauwUDLukscSXxTH+
AOJMJ7WTOfgQoy2kAT3HLtzG7/Ww76arA6v1TzizttTSXz9dxbbFV9CnNRAtQqcW
r4St0bnT1rfAcW1SHDrbl1Bs2ORxWlxhAjjwRNM6psK0QG/BWzft00JyfbiQT8YF
qwG7zaJwiyOzhCJ2UWnrDlfapax0u4mzObbf+mSctIYynSFk3LJY/wEU35aWkkoZ
coZPBaf2udXN4qzwRb26Yj8JduC6liKjbvzzMQiU7oSZ9pSNqADhFUJYrjblRURV
GVA2h2pSk1fVOk/OikqEuyfPx5Kj0fsbQ1IMtmaGcbO1vNUhxISYRspwuFC1hCYv
Cqw2kJ+0MbqggMANrw2qqszhMYb4zIqYkJDOeLh4rjUNluyzxIV6TWK76B7O7dz1
t6HnI66AigNosEMyH3u8ViATIe3Od1jmz7w7Q62t5QqiPPkKnF4CVd87p4YUrv2z
oQ5piDwgNouAeBklsHNd6mXaetYLXtHg6GHEjBDZUeNRKciJ63+DAzAWd5ojLfVu
cxUqR6Ag7p9zRVlUSIC6BTpKmJD7gC6RtmjeAQeAtDbDUSfo1+2+iq+1QEhcoA65
Xd/zFiFlE0U=
=hdwE
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 21 Mar 2021 22:10
(name . Christopher Baines)(address . mail@cbaines.net)(address . 47283@debbugs.gnu.org)
87o8fc1acv.fsf@gnu.org
Hi!

Christopher Baines <mail@cbaines.net> skribis:

Toggle quote (15 lines)
> Christopher Baines <mail@cbaines.net> writes:
>
>> I haven't looked in to this yet, but maybe it would be possible to
>> adjust the code so that it doesn't perform so badly, but still tries to
>> handle possible exceptions.
>>
>> The two ideas I have is to rewrite the (let ...) bit in terms of a fold,
>> maybe that would perform better, or stop using let for iteration and
>> setup the exception handling, then process each request, using set! to
>> update the state. I haven't tested either of these.
>
> I tried something, neither of these things, but just not calling (loop
> ...) within the catch block. I don't know why this might work, but it
> seems to make guix weather much faster.

Oh yes, that’s also because calling ‘loop’ from within ‘catch’ made it a
non-tail call, so we kept accumulating exception handlers, and the ‘lp’
loop in ‘raise-exception’ would have an ever increasing list of handlers
to traverse.

Toggle quote (6 lines)
> Here's the patch [1], I've just realised it's broken, as it'll loose the
> result value (and use an old one) when the connection is closed. I'll
> send a updated patch without this issue in a moment.
>
> 1: https://issues.guix.gnu.org/47288

OK, thanks. I’ll reply to your other message first. :-)

Ludo’.
L
L
Ludovic Courtès wrote on 21 Mar 2021 22:22
(name . Christopher Baines)(address . mail@cbaines.net)(address . 47283@debbugs.gnu.org)
87czvs19tp.fsf@gnu.org
Christopher Baines <mail@cbaines.net> skribis:

Toggle quote (14 lines)
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Indeed, there’s one place on the hot path where we install exception
>> handlers: in ‘http-multiple-get’ (from commit
>> 205833b72c5517915a47a50dbe28e7024dc74e57). I don’t think it’s needed,
>> is it? (But if it is, let’s find another approach, this one is
>> prohibitively expensive.)
>
> I think the exception handling has moved around, but I guess the
> exceptions that could be caught in http-multiple-get could happen,
> right? I am really just guessing here, as Guile doesn't help tell you
> about possible exceptions, and I haven't spent enough time to read all
> the possible code involved to find out if these are definitely possible.

Yeah.

Commit 205833b72c5517915a47a50dbe28e7024dc74e57 added a ‘catch’ block
that catches the same things as ‘with-cached-connection’ did (it would
be better to not duplicate it IMO). That includes EPIPE, gnutls-error,
bad-response & co.

Earlier, commit be5a75ebb5988b87b2392e2113f6590f353dd6cd (“substitute:
Reuse connections for '--query'.”) did not add such a ‘catch’ block in
‘http-multiple-get’. Instead, it wrapped its call in ‘do-fetch’ in
‘fetch-narinfos’:
(define (do-fetch uri)
(case (and=> uri uri-scheme)
((http https)
- (let ((requests (map (cut narinfo-request url <>) paths)))
- (match (open-connection-for-uri/maybe uri)
- (#f
- '())
- (port
- (update-progress!)
;; Note: Do not check HTTPS server certificates to avoid depending
;; on the X.509 PKI. We can do it because we authenticate
;; narinfos, which provides a much stronger guarantee.
- (let ((result (http-multiple-get uri
+ (let* ((requests (map (cut narinfo-request url <>) paths))
+ (result (call-with-cached-connection uri
+ (lambda (port)
+ (if port
+ (begin
+ (update-progress!)
+ (http-multiple-get uri
handle-narinfo-response '()
requests
+ #:open-connection
+ open-connection-for-uri/cached
#:verify-certificate? #f
- #:port port)))
This bit is still there in current ‘master’, so I think it’s not
necessary to catch these exceptions in ‘http-multiple-get’ itself, and I
would just remove the ‘catch’ wrap altogether.

WDYT?

Thanks,
Ludo’.
C
C
Christopher Baines wrote on 23 Mar 2021 21:47
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 47283@debbugs.gnu.org)
87k0pxbnsf.fsf@cbaines.net
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (23 lines)
> Christopher Baines <mail@cbaines.net> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>>
>>> Indeed, there’s one place on the hot path where we install exception
>>> handlers: in ‘http-multiple-get’ (from commit
>>> 205833b72c5517915a47a50dbe28e7024dc74e57). I don’t think it’s needed,
>>> is it? (But if it is, let’s find another approach, this one is
>>> prohibitively expensive.)
>>
>> I think the exception handling has moved around, but I guess the
>> exceptions that could be caught in http-multiple-get could happen,
>> right? I am really just guessing here, as Guile doesn't help tell you
>> about possible exceptions, and I haven't spent enough time to read all
>> the possible code involved to find out if these are definitely possible.
>
> Yeah.
>
> Commit 205833b72c5517915a47a50dbe28e7024dc74e57 added a ‘catch’ block
> that catches the same things as ‘with-cached-connection’ did (it would
> be better to not duplicate it IMO). That includes EPIPE, gnutls-error,
> bad-response & co.

So, my intention here was to move the error handling, to allow
separating out the connection caching code from the code I wanted to
move out to the (guix substitutes) module. I don't think there's
currently duplication in the error handling for the code path involving
http-multiple-get currently, at least for the exceptions in question
here.

Toggle quote (38 lines)
> Earlier, commit be5a75ebb5988b87b2392e2113f6590f353dd6cd (“substitute:
> Reuse connections for '--query'.”) did not add such a ‘catch’ block in
> ‘http-multiple-get’. Instead, it wrapped its call in ‘do-fetch’ in
> ‘fetch-narinfos’:
>
> (define (do-fetch uri)
> (case (and=> uri uri-scheme)
> ((http https)
> - (let ((requests (map (cut narinfo-request url <>) paths)))
> - (match (open-connection-for-uri/maybe uri)
> - (#f
> - '())
> - (port
> - (update-progress!)
> ;; Note: Do not check HTTPS server certificates to avoid depending
> ;; on the X.509 PKI. We can do it because we authenticate
> ;; narinfos, which provides a much stronger guarantee.
> - (let ((result (http-multiple-get uri
> + (let* ((requests (map (cut narinfo-request url <>) paths))
> + (result (call-with-cached-connection uri
> + (lambda (port)
> + (if port
> + (begin
> + (update-progress!)
> + (http-multiple-get uri
> handle-narinfo-response '()
> requests
> + #:open-connection
> + open-connection-for-uri/cached
> #:verify-certificate? #f
> - #:port port)))
>
> This bit is still there in current ‘master’, so I think it’s not
> necessary to catch these exceptions in ‘http-multiple-get’ itself, and I
> would just remove the ‘catch’ wrap altogether.
>
> WDYT?

I'm not sure what you're referring to as still being there on the master
branch?

Looking at the changes to this particular code path resulting from the
changes I've made recently, starting at lookup-narinfos, before:

- lookup-narinfos calls fetch-narinfos, which calls do-fetch

- call-with-cached-connection is used, which catches a number of
exceptions relating to requests, and will retry PROC once upon a
matching exception

- open-connection-for-uri/maybe is also used, which is like
open-connection-for-uri/cached, except it includes error handling for
establishing connections to substitute servers

- http-multiple-get doesn't include error handling

After:

- lookup-narinfos calls fetch-narinfos, which calls do-fetch

- call-with-connection-error-handling is used, which performs the same
role as the error handling previously within
open-connection-for-uri/maybe, catching exceptions relating to
establishing connections to substitute servers

- http-multiple-get now includes error handling similar to what was
previously done by call-with-cached-connection, although it's more
complicated since it's done with knowledge of what http-multiple-get
is doing

I think that the error handling now in http-multiple-get isn't covered
elsewhere. Moving this error handling back in to fetch-narinfos is
possible, but then we'd be back to handling connection caching in that
code, and avoiding that led to this refactoring in the first place.

Also, apart from the implementation problems, I do think that the error
handling here is better than before. Previously, if you called
lookup-narinfos, and a connection problem occurred, processing all the
requests would start from scratch (as call-with-cached-connection calls
PROC a second time), if a second connection error was to happen, well,
call-with-cached-connection only handles one error, so that won't be
caught.

I think it's possible that http-multiple-get will be making thousands of
requests, running guix weather with no cached results for example. The
error handling in http-multiple-get is smarter than the previous
approach, doing as little as possible again. It's also not limited to
catching one exception.
-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmBaU9BfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9XdvCxAAtsUzNiHvJL2S0jf1/VJUdqZw9VSb91BM
GnyMd5hXQIe0ONd3r6XkdLdPpe9JT2Lh18U0F6fvJfk+zhDFWIAPZE9PiyweBarf
0ggJH45zr8nxbIrPp/dOuDvuCEhnEyovK0bPdvKQfzkC67ZQdlfkjnPjDKbtactA
l2hgWdtoT7wN0QdwFpa45fQhbP1zHWh5Wd0kItyxSjM3B9xpnoYHP/IryuTUhtVd
jx18SFdPKTHUYwWpESrjRM79tE53IfcChYz7b5toUgDnGSM63nZQHbLbmVcQXLrO
m8rRhTcf3ZbDqg5qQr7RAQfwucO/1C9NIkP6Gjn0Hnk2e/q6fHdhLpjIDwmQy2S2
yIx28JcpKB4ua+ToyVDsEZGuJiacJan5P306B7LjrDVx8BRWF36Xt3PZEV2kHKrd
uFwOlGe829zYsXtp45FnVRd4wHzP9uqDnQoh4f4vyLaoKqB/Wx4UNb0fqIJ4mY5Y
lHIlbXIUpkEAb4ON9jm2D81bWUCFcfLTJj5mkrXer3J8II2pcg/Md7MgmyhXcbOa
zYq5N3TyHaQs5PeWhuS3oXUePjnpxse0oAtH3V0cm8iGa/iggWUmC6+eu7b/X1v/
Fo3RXgu5/Zbx6xvVtrzSiq4s9CZKLh1qhUjSZBlG5SquHg2WJNSKZ3/Grwi+x78s
ITZ9f0QQh4E=
=O8Tt
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 24 Mar 2021 15:51
(name . Christopher Baines)(address . mail@cbaines.net)(address . 47283@debbugs.gnu.org)
87y2eceha4.fsf@gnu.org
Hi Chris,

Christopher Baines <mail@cbaines.net> skribis:

Toggle quote (2 lines)
> Ludovic Courtès <ludo@gnu.org> writes:

[...]

Toggle quote (41 lines)
>> Earlier, commit be5a75ebb5988b87b2392e2113f6590f353dd6cd (“substitute:
>> Reuse connections for '--query'.”) did not add such a ‘catch’ block in
>> ‘http-multiple-get’. Instead, it wrapped its call in ‘do-fetch’ in
>> ‘fetch-narinfos’:
>>
>> (define (do-fetch uri)
>> (case (and=> uri uri-scheme)
>> ((http https)
>> - (let ((requests (map (cut narinfo-request url <>) paths)))
>> - (match (open-connection-for-uri/maybe uri)
>> - (#f
>> - '())
>> - (port
>> - (update-progress!)
>> ;; Note: Do not check HTTPS server certificates to avoid depending
>> ;; on the X.509 PKI. We can do it because we authenticate
>> ;; narinfos, which provides a much stronger guarantee.
>> - (let ((result (http-multiple-get uri
>> + (let* ((requests (map (cut narinfo-request url <>) paths))
>> + (result (call-with-cached-connection uri
>> + (lambda (port)
>> + (if port
>> + (begin
>> + (update-progress!)
>> + (http-multiple-get uri
>> handle-narinfo-response '()
>> requests
>> + #:open-connection
>> + open-connection-for-uri/cached
>> #:verify-certificate? #f
>> - #:port port)))
>>
>> This bit is still there in current ‘master’, so I think it’s not
>> necessary to catch these exceptions in ‘http-multiple-get’ itself, and I
>> would just remove the ‘catch’ wrap altogether.
>>
>> WDYT?
>
> I'm not sure what you're referring to as still being there on the master
> branch?

On ‘master’, ‘do-fetch’ has its ‘http-multiple-get’ call wrapped in
‘call-with-connection-error-handling’, which is equivalent to the change
made in be5a75ebb5988b87b2392e2113f6590f353dd6cd and shown above.

Toggle quote (34 lines)
> Looking at the changes to this particular code path resulting from the
> changes I've made recently, starting at lookup-narinfos, before:
>
> - lookup-narinfos calls fetch-narinfos, which calls do-fetch
>
> - call-with-cached-connection is used, which catches a number of
> exceptions relating to requests, and will retry PROC once upon a
> matching exception
>
> - open-connection-for-uri/maybe is also used, which is like
> open-connection-for-uri/cached, except it includes error handling for
> establishing connections to substitute servers
>
> - http-multiple-get doesn't include error handling
>
> After:
>
> - lookup-narinfos calls fetch-narinfos, which calls do-fetch
>
> - call-with-connection-error-handling is used, which performs the same
> role as the error handling previously within
> open-connection-for-uri/maybe, catching exceptions relating to
> establishing connections to substitute servers
>
> - http-multiple-get now includes error handling similar to what was
> previously done by call-with-cached-connection, although it's more
> complicated since it's done with knowledge of what http-multiple-get
> is doing
>
> I think that the error handling now in http-multiple-get isn't covered
> elsewhere. Moving this error handling back in to fetch-narinfos is
> possible, but then we'd be back to handling connection caching in that
> code, and avoiding that led to this refactoring in the first place.

The ‘http-multiple-get’ call in ‘fetch-narinfos’ is already wrapped in
‘call-with-connection-error-handling’, so it seems we’re good?

Toggle quote (8 lines)
> Also, apart from the implementation problems, I do think that the error
> handling here is better than before. Previously, if you called
> lookup-narinfos, and a connection problem occurred, processing all the
> requests would start from scratch (as call-with-cached-connection calls
> PROC a second time), if a second connection error was to happen, well,
> call-with-cached-connection only handles one error, so that won't be
> caught.

Hmm true. However, can that second exception happen? Normally, if we
get a first exception, we open a new connection and that one should not
get another exception, unless something is wrong—in which case it’s
probably best to report it than to endlessly retry.

WDYT?

Even in tail call position, ‘catch’ calls introduce allocations and
extra work, so if we can avoid using one ‘catch’ per iteration, that’s
better.

Thank you,
Ludo’.
L
L
Ludovic Courtès wrote on 27 Mar 2021 22:58
Re: bug#47288: [PATCH] guix: http-client: Tweak http-multiple-get error handling.
(name . Christopher Baines)(address . mail@cbaines.net)
87mtuoxnqo.fsf_-_@gnu.org
Hi!

I went ahead and pushed 45fce38fb0b6c6796906149ade145b8d3594c1c6 along
these lines.

‘guix weather’ runs to completion and things seem to work fine. Let me
know if you notice anything wrong.

Thank you for your work and for your patience on this issue!

Ludo’.
Closed
?