HTTP pipelining of narinfo requests broken for https://ci.guix.gnu.org

  • Done
  • quality assurance status badge
Details
2 participants
  • Gábor Boskovits
  • Ludovic Courtès
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
serious
L
L
Ludovic Courtès wrote on 28 Nov 2019 10:22
(address . bug-Guix@gnu.org)
871rtsz77q.fsf@gnu.org
Starting from a couple of days ago (it seems; roughly around the same
time berlin hit ENOSPC), people have been experiencing issues during the
“updating list of substitutes” phase from https://ci.guix.gnu.org, where
they’d get an ugly backtrace when they’re at 80% or so.

Here’s a small reproducer:
(use-modules (guix scripts substitute) (srfi srfi-1) (srfi srfi-26) (web uri) (web request) (web response) (rnrs io ports)) (define http-multiple-get (@@ (guix scripts substitute) http-multiple-get)) (define %base-url "https://berlin.guix.gnu.org") (define %request-count ;; Number of requests to send. Starts failing at 85 (that is, we don't ;; receive the 85th response). 200) (http-multiple-get (string->uri %base-url) (lambda (request response port result) (let ((len (or (response-content-length response) 0))) (pk 'resp (length result) (uri-path (request-uri request))) (get-bytevector-n port len) (cons result result))) '() (unfold (cut >= <> %request-count) (lambda (n) (build-request (string->uri (string-append %base-url "/" (string-pad (number->string n) 32 #\a) ".narinfo")) #:method 'GET #:headers '((User-Agent . "GNU Guile")))) 1+ 0) #:verify-certificate? #f)
You’ll see that it hangs waiting for a response as soon as you pipeline
85 requests or more.

Note that:

1. https://bayfront.guix.gnu.orgdoesn’t have that problem;

2. http://ci.guix.gnu.orgdoesn’t have that problem;

3. when you send 85 requests, it hangs waiting for the 85th response;
but when you send 200 requests, it hangs waiting for the 160th
response; so it seems it’s not just a matter of TLS record size.

I suspected something having to do with TLS record size limits, but item
#3 above may invalidate that hypothesis.

The weird thing is that we haven’t change the config of berlin in recent
days. Berlin runs nginx 1.17.5 on openssl 1.1.1d, while bayfront runs
nginx 1.17.0 on openssl 1.0.2p.

I very much welcome any ideas you may have!

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 28 Nov 2019 10:24
control message for bug #38411
(address . control@debbugs.gnu.org)
87zhggxsj2.fsf@gnu.org
severity 38411 serious
quit
L
L
Ludovic Courtès wrote on 28 Nov 2019 10:59
Re: bug#38411: HTTP pipelining of narinfo requests broken for https://ci.guix.gnu.org
(address . 38411@debbugs.gnu.org)
87r21sxqxb.fsf@gnu.org
Toggle quote (4 lines)
> The weird thing is that we haven’t change the config of berlin in recent
> days. Berlin runs nginx 1.17.5 on openssl 1.1.1d, while bayfront runs
> nginx 1.17.0 on openssl 1.0.2p.

A new data point: nginx 1.17.4 from a month ago works fine:

Toggle snippet (19 lines)
root@berlin ~/maintenance/hydra# guix system list-generations 2m
[…]
Generation 238 Oct 23 2019 18:02:22
file name: /var/guix/profiles/system-238-link
canonical file name: /gnu/store/28xk8ny9qslyqgi7pjyz4d2x0xrxpw07-system
label: GNU with Linux-Libre 5.3.7
bootloader: grub
root device: label: "my-root"
kernel: /gnu/store/qpasq1pkzb47w5pjzs80pvslv1n7ja1m-linux-libre-5.3.7/bzImage
[…]
root@berlin ~/maintenance/hydra# guix gc -R /gnu/store/28xk8ny9qslyqgi7pjyz4d2x0xrxpw07-system|grep nginx
/gnu/store/4gbpgcr4zc4qf59yq2a008maycfwra4n-shepherd-nginx.scm
/gnu/store/3gqi5cahcwjfvv0bbfqv8ifir2vrqirh-nginx.conf
/gnu/store/s9fm4d5ii8bnh9zv5k78mzjvcl3dipbh-shepherd-nginx.go
/gnu/store/zj3mxk3r2dka56favm357kmgywnv5imk-nginx-1.17.4
root@berlin ~/maintenance/hydra# guix gc --references /gnu/store/zj3mxk3r2dka56favm357kmgywnv5imk-nginx-1.17.4/sbin/nginx |grep ssl
/gnu/store/1dvkm6b97667qd36rsnw4g6isnsmpym7-openssl-1.1.1d

So for now berlin is running this nginx. I’ll try with 1.17.6 and
1.17.4 on top of current master.

Ludo’.
L
L
Ludovic Courtès wrote on 28 Nov 2019 11:04
(address . 38411@debbugs.gnu.org)
87k17kxqp2.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (3 lines)
> You’ll see that it hangs waiting for a response as soon as you pipeline
> 85 requests or more.

When you look at /var/log/guix-publish.log, you see that nginx did not
forward the 85th request (and beyond) to ‘guix publish’.

Interestingly, if you C-c the client while it’s waiting for further
responses, nginx suddenly forwards all the pending requests to ‘guix
publish’.

Ludo’.
L
L
Ludovic Courtès wrote on 28 Nov 2019 11:18
(address . 38411@debbugs.gnu.org)
87fti8xq1g.fsf@gnu.org
nginx 1.17.6 is similarly broken:

Toggle snippet (12 lines)
root@berlin ~/maintenance/hydra# guix describe
Generation 45 Nov 28 2019 11:02:51 (current)
guix 18a5575
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: 18a5575ec530f0e7a2e27f2aa3b5addf20da0f87
root@berlin ~/maintenance/hydra# guix build nginx
/gnu/store/6q44kjf59rgkvn0ip8m0454ybszhjpy0-nginx-1.17.6
root@berlin ~/maintenance/hydra# guix gc --references $(guix build nginx)|grep ssl
/gnu/store/1dvkm6b97667qd36rsnw4g6isnsmpym7-openssl-1.1.1d

Ludo'.
L
L
Ludovic Courtès wrote on 28 Nov 2019 11:29
(address . 38411@debbugs.gnu.org)
87a78gxpjf.fsf@gnu.org
AFAICS everything is working fine with this config:

Toggle snippet (17 lines)
root@berlin ~/maintenance/hydra# guix describe
Generation 45 Nov 28 2019 11:02:51 (current)
guix 18a5575
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: 18a5575ec530f0e7a2e27f2aa3b5addf20da0f87
root@berlin ~/maintenance/hydra# git log |head
commit 9fdb990982006a4d0ddb68aa71351cc76ef50cdc
Author: Ludovic Courtès <ludo@gnu.org>
Date: Thu Nov 28 11:23:29 2019 +0100

nginx: berlin: Use nginx 1.17.4.
* hydra/nginx/berlin.scm (nginx-1.17.4): New variable.
(%nginx-configuration)[nginx]: New field.

Please let me know if you encounter any issues!

I’ll leave it at that for now, but we’ll have to pay attention to that
in future upgrades. (Ideally we’d report a bug to nginx but that’s
tricky enough that this alone would take much more time than I can
allocate to it.)

Ludo’.
G
G
Gábor Boskovits wrote on 25 Dec 2019 17:50
HTTP pipelining of narinfo requests broken for https://ci.guix.gnu.org
(address . 38411-done@debbugs.gnu.org)
CAE4v=ph4fLFgj47Ar3yYjmm9D7Z2x-cmmzxFiBScSa5wGaaiug@mail.gmail.com
This was an upstream regression introduced in nginx 1.17.5.
It is fixed in 1.17.7. Fixed by updating nginx to 1.17.7 in commit:
32dfde905229e593f9fe60795d2490f99c27aad5
and updating berlin config in maintenance on commit:
87d451e9c3381b21e6c7208372576abed84df1e6.

This is mentioned on then nginx 1.17.7 release notes as:

Bugfix: a timeout might occur while handling pipelined requests in an
SSL connection; the bug had appeared in 1.17.5.

g_bor
--
OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21
Closed
L
L
Ludovic Courtès wrote on 26 Dec 2019 19:14
Re: bug#38411: closed (HTTP pipelining of narinfo requests broken for https://ci.guix.gnu.org)
(name . Gábor Boskovits)(address . boskovits@gmail.com)(address . 38411@debbugs.gnu.org)
87sgl7gdiz.fsf@gnu.org
Hi Gábor,

help-debbugs@gnu.org (GNU bug Tracking System) skribis:

Toggle quote (11 lines)
> This was an upstream regression introduced in nginx 1.17.5.
> It is fixed in 1.17.7. Fixed by updating nginx to 1.17.7 in commit:
> 32dfde905229e593f9fe60795d2490f99c27aad5
> and updating berlin config in maintenance on commit:
> 87d451e9c3381b21e6c7208372576abed84df1e6.
>
> This is mentioned on then nginx 1.17.7 release notes as:
>
> Bugfix: a timeout might occur while handling pipelined requests in an
> SSL connection; the bug had appeared in 1.17.5.

I’m really happy you were able to pinpoint the issue and to help
upstream diagnose it. The power of free software!

Thank you,
Ludo’.
?