[1.2.1 pre-release testing] substitute downloading and TLS errors

  • Done
  • quality assurance status badge
Details
3 participants
  • Leo Famulari
  • Ludovic Courtès
  • pelzflorian (Florian Pelz)
Owner
unassigned
Submitted by
pelzflorian (Florian Pelz)
Severity
important
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 2021 11:38
(name . jcguu95)(address . jcguu95@gmail.com)
20210418093856.a6r2zqw2gks56dg5@pelzflorian.localdomain
On Fri, Apr 16, 2021 at 10:40:55PM -0500, jcguu95 wrote:
Toggle quote (15 lines)
> 2.2. Unexpected failure
>
> After running `guix system init /mnt/etc/config.scm /mnt' for 5 to 10
> minutes, I got the error
>
> ,----
> | substitute: updating substitutes from 'https://ci.guix.gnu.org'... 0.0%guix substitute: error: TSL error in procedure 'write_to_session_port': Resource temporarily unavailable, try again.
> |
> | guix system: error: `/gnu/store/1nmwil4cs...vc2p-guix-1.2.0-21.4dff6ec/bin/guix substitute` died unexpectedly
> | root@gnu ~# _
> `----
>
> Retrying it did resolve the problem. And I got a working guix system
> out of it :)

I’ve got the same TLS error (though the procedure for me is
‘write_to_session_record_port’.

I needed to restart the installer at the last step, this stopped while
downloading enlightenment with no error but it does not continue.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 2021 11:44
(name . jcguu95)(address . jcguu95@gmail.com)
20210418094425.cwkdrixhzwgd5qwt@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 11:38:56AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (3 lines)
> I needed to restart the installer at the last step, this stopped while
> downloading enlightenment with no error but it does not continue.

Note: The last message was

enlightenment-0.24.2 27.0MiB 3.7MiB/s 00:07 [################] 100.0%

But now after maybe 10 minutes it finally continued and died with the
same TLS error about write_to_session_record_port and Resource not
available.
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 2021 11:48
20210418094838.va3avptrgburuxlz@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 11:44:25AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (4 lines)
> But now after maybe 10 minutes it finally continued and died with the
> same TLS error about write_to_session_record_port and Resource not
> available.

The error happened while

substitute: updating substitutes from […]
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 2021 11:57
20210418095722.egbvpgey4okt5yrc@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 11:48:47AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (9 lines)
> On Sun, Apr 18, 2021 at 11:44:25AM +0200, pelzflorian (Florian Pelz) wrote:
> > But now after maybe 10 minutes it finally continued and died with the
> > same TLS error about write_to_session_record_port and Resource not
> > available.
>
> The error happened while
>
> substitute: updating substitutes from […]

After starting one more time from the last installer step it
downloaded the same enlightenment substitute again and got stuck the
same way.
L
L
Ludovic Courtès wrote on 18 Apr 2021 12:12
control message for bug #47867
(address . control@debbugs.gnu.org)
87fsznnbox.fsf@gnu.org
severity 47867 important
quit
L
L
Ludovic Courtès wrote on 18 Apr 2021 12:12
control message for bug #47297
(address . control@debbugs.gnu.org)
87eef7nbob.fsf@gnu.org
block 47297 by 47867
quit
L
L
Ludovic Courtès wrote on 18 Apr 2021 12:15
Re: bug#47867: [1.2.1 pre-release testing] substitute downloading and TLS errors
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87a6pvnbjg.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (22 lines)
> On Fri, Apr 16, 2021 at 10:40:55PM -0500, jcguu95 wrote:
>> 2.2. Unexpected failure
>>
>> After running `guix system init /mnt/etc/config.scm /mnt' for 5 to 10
>> minutes, I got the error
>>
>> ,----
>> | substitute: updating substitutes from 'https://ci.guix.gnu.org'... 0.0%guix substitute: error: TSL error in procedure 'write_to_session_port': Resource temporarily unavailable, try again.
>> |
>> | guix system: error: `/gnu/store/1nmwil4cs...vc2p-guix-1.2.0-21.4dff6ec/bin/guix substitute` died unexpectedly
>> | root@gnu ~# _
>> `----
>>
>> Retrying it did resolve the problem. And I got a working guix system
>> out of it :)
>
> I’ve got the same TLS error (though the procedure for me is
> ‘write_to_session_record_port’.
>
> I needed to restart the installer at the last step, this stopped while
> downloading enlightenment with no error but it does not continue.

Was this in a VM? Could it be that networking was unstable (e.g., you
were running this over a flaky WiFi connection)? How reproducible is
this? Any tips on how to reproduce it?

I did a full bare-bones OS install in a VM, which went fine, but maybe
that’s not representative of your setup.

Thanks,
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 2021 13:33
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210418113337.z3afnp4lmt55cjvw@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 12:15:47PM +0200, Ludovic Courtès wrote:
Toggle quote (2 lines)
> Was this in a VM?

No, an Asrock Beebox (real x86_64 hardware).


Toggle quote (3 lines)
> Could it be that networking was unstable (e.g., you
> were running this over a flaky WiFi connection)?

It is a stable Ethernet connection.


Toggle quote (9 lines)
> How reproducible is
> this? Any tips on how to reproduce it?
>
> I did a full bare-bones OS install in a VM, which went fine, but maybe
> that’s not representative of your setup.
>
> Thanks,
> Ludo’.

The installer completed just now after continuing from the last
install step another time (well with an unrelated error by
grub-install that it could not determine the canonical path of
/boot/efi). I will try reproducing, but Jin and me both had this
error.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 2021 14:10
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210418121006.4ihop3umb2oyofum@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 01:33:37PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (4 lines)
> (well with an unrelated error by
> grub-install that it could not determine the canonical path of
> /boot/efi).

Probably

Toggle quote (3 lines)
> I will try reproducing, but Jin and me both had this
> error.

I got the error again with enlightenment.

On IRC Ricardo/rekado suspected
c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa, I will try reverting it.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 19 Apr 2021 10:05
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210419080516.piqb52s32h3fe5ag@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 02:10:06PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (3 lines)
> On IRC Ricardo/rekado suspected
> c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa, I will try reverting it.

It was necessary to revert follow-up commits before reverting
c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa.

I tried on Guix master commit fef2f08bc640f78cc0a86fc7be3eccbc07b5e98c
the following commands:

sed -i '/vi/d' po/packages/LINGUAS #otherwise errors; Julien fixed this by now I think
git add po/packages/LINGUAS
git commit -m fixLINGUAS
git revert 2d73086262e1fb33cd0f0f16f74a495fe06b38aa
git revert 673e5276f6b4dda4bfa9dd5bb70220fc8b17abd2
git revert 45fce38fb0b6c6796906149ade145b8d3594c1c6
git revert 9da5ec7099b992a8969a17627548cd341c01bd90
git revert c5ab78f90b111839508d0ab10c0e5ac2038002ca
git revert b48204259aa9cad80c5b23a4060e2d796007ec7a
git revert c37e3b92ad0334ba2fe7ee4e98631f0a4edeee21
git revert 728c90862eb967a1679efdb69e26d7ba37cdc3a3
git revert fd5b77503e852b78a43e1bee4d6bdfbbb1f27e8f
git revert 112692c0d546d35cd67c5dc70dbd1dc609b18f64
git revert ee3226e9d54891c7e696912245e4904435be191c
git revert e2572aa95007558b008f04decff095f46d20e087
git revert 20c08a8a45d0f137ead7c05e720456b2aea44402
git revert 187e97096825d2bcceb144cead6eccc27385acd7
git revert 8116cc66733134a8fb6f9117d4648288b83c8356
git revert b9d058e3f7982eed04cc748cc0b24173a3969c52
git revert 7c85877fdf964694061e3192eac35723ebc047bf
git revert f50f5751fff4cfc6d5abba9681054569694b7a5c
git revert 05f38ca8dccc49d9ec70d1f63461976b97d74d52
git revert 7b812f7c84c43455cdd68a0e51b6ded018afcc8e
git revert f50b501a7472f4f237023831aa415a948115d1d1
git revert 205833b72c5517915a47a50dbe28e7024dc74e57
git revert e2e853ddb0d964e7b0300334ab51ab89950c2376
git revert 87734503a4a1761a4432b601352e5b02340a97f0
git revert 08acee2f98321d57c4faa3ba641a5543b74f10a4
git revert fbd61b5d3de353bfa468641d087bc53aaa1e63a5
git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd
git revert c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa
./bootstrap
./configure --localstatedir=/var
make
GUIX_ALLOW_ME_TO_USE_PRIVATE_COMMIT=y make update-guix-package
# disabling #:tests? on the guix package was *not* necessary once all reverts were in place
make
./pre-inst-env guix system image -t iso9660 gnu/system/install.scm

I dd’d the resulting image to a USB drive and the TLS errors no longer
appear (tried installing everything two times now; without reverts
downloading enlightenment needed retrys everytime).

I suppose if we leave this bug in the new Guix release, then some
substitutes like enlightenment get stuck and fail to download for some
people and therefore the installation fails. I suspect restarting the
install causes grub-install to fail because it cannot find /boot/efi
anymore, but maybe my grub-install error had other reasons.

I will bisect which reverts are needed, but it takes time …

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 19 Apr 2021 10:22
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210419082241.3jhmquw6t5jtfutp@pelzflorian.localdomain
On Mon, Apr 19, 2021 at 10:05:25AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (7 lines)
> I dd’d the resulting image to a USB drive and the TLS errors no longer
> appear (tried installing everything two times now; without reverts
> downloading enlightenment needed retrys everytime).
> […] I suspect restarting the
> install causes grub-install to fail because it cannot find /boot/efi
> anymore, but maybe my grub-install error had other reasons.

No sorry, grub-install still fails, even with no TLS errors no prior
installation restarts. grub-install failing is an unrelated bug. I
had not let the install run past the “copying everthing to /mnt” step
last time.
P
P
pelzflorian (Florian Pelz) wrote on 19 Apr 2021 13:02
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210419110258.vbryj6xapnyf2bas@pelzflorian.localdomain
On Mon, Apr 19, 2021 at 10:05:25AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (2 lines)
> I will bisect which reverts are needed, but it takes time …

My UEFI broke; the PC cannot display anything anymore. Don’t wait for
my debugging.
P
P
pelzflorian (Florian Pelz) wrote on 20 Apr 2021 03:21
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210420012113.jcwuzfx6k3by5nvo@pelzflorian.localdomain
After flashing the CMOS I can test again.

Without reverts TLS errors happen while or after downloading the
enlightenment substitute from the installer.

With all of those reverts *no* TLS errors happen:

On Mon, Apr 19, 2021 at 10:05:25AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (28 lines)
> git revert 2d73086262e1fb33cd0f0f16f74a495fe06b38aa
> git revert 673e5276f6b4dda4bfa9dd5bb70220fc8b17abd2
> git revert 45fce38fb0b6c6796906149ade145b8d3594c1c6
> git revert 9da5ec7099b992a8969a17627548cd341c01bd90
> git revert c5ab78f90b111839508d0ab10c0e5ac2038002ca
> git revert b48204259aa9cad80c5b23a4060e2d796007ec7a
> git revert c37e3b92ad0334ba2fe7ee4e98631f0a4edeee21
> git revert 728c90862eb967a1679efdb69e26d7ba37cdc3a3
> git revert fd5b77503e852b78a43e1bee4d6bdfbbb1f27e8f
> git revert 112692c0d546d35cd67c5dc70dbd1dc609b18f64
> git revert ee3226e9d54891c7e696912245e4904435be191c
> git revert e2572aa95007558b008f04decff095f46d20e087
> git revert 20c08a8a45d0f137ead7c05e720456b2aea44402
> git revert 187e97096825d2bcceb144cead6eccc27385acd7
> git revert 8116cc66733134a8fb6f9117d4648288b83c8356
> git revert b9d058e3f7982eed04cc748cc0b24173a3969c52
> git revert 7c85877fdf964694061e3192eac35723ebc047bf
> git revert f50f5751fff4cfc6d5abba9681054569694b7a5c
> git revert 05f38ca8dccc49d9ec70d1f63461976b97d74d52
> git revert 7b812f7c84c43455cdd68a0e51b6ded018afcc8e
> git revert f50b501a7472f4f237023831aa415a948115d1d1
> git revert 205833b72c5517915a47a50dbe28e7024dc74e57
> git revert e2e853ddb0d964e7b0300334ab51ab89950c2376
> git revert 87734503a4a1761a4432b601352e5b02340a97f0
> git revert 08acee2f98321d57c4faa3ba641a5543b74f10a4
> git revert fbd61b5d3de353bfa468641d087bc53aaa1e63a5
> git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd

The bad commit must be one of those.
c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa is not the problem.

I will bisect tomorrow.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 20 Apr 2021 17:27
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210420152754.a3ucylrdugrmxu57@pelzflorian.localdomain
On Tue, Apr 20, 2021 at 03:21:13AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (2 lines)
> > git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd

It seems this is the bad commit. Downloading the enlightenment
substitute got stuck and after a few minutes displayed the usual TLS
error.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 20 Apr 2021 19:03
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210420170326.vhgvlm63jve6oi4j@pelzflorian.localdomain
On Tue, Apr 20, 2021 at 05:27:54PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (7 lines)
> On Tue, Apr 20, 2021 at 03:21:13AM +0200, pelzflorian (Florian Pelz) wrote:
> > > git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd
>
> It seems this is the bad commit. Downloading the enlightenment
> substitute got stuck and after a few minutes displayed the usual TLS
> error.

P.S. There are no issues on my Macbook on the same internet
connection, the issue is only with the Beebox PC.

Regards,
Florian
L
L
Leo Famulari wrote on 20 Apr 2021 20:00
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
YH8W1mItLIfnQNsj@jasmine.lan
On Tue, Apr 20, 2021 at 07:03:26PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (8 lines)
> On Tue, Apr 20, 2021 at 05:27:54PM +0200, pelzflorian (Florian Pelz) wrote:
> > It seems this is the bad commit. Downloading the enlightenment
> > substitute got stuck and after a few minutes displayed the usual TLS
> > error.
>
> P.S. There are no issues on my Macbook on the same internet
> connection, the issue is only with the Beebox PC.

To clarify, you're running the same Guix on both machines, but it only
fails on one of them?
P
P
pelzflorian (Florian Pelz) wrote on 20 Apr 2021 21:41
(name . Leo Famulari)(address . leo@famulari.name)
20210420194130.j5x6p4wcnxlot6n5@pelzflorian.localdomain
On Tue, Apr 20, 2021 at 02:00:54PM -0400, Leo Famulari wrote:
Toggle quote (12 lines)
> On Tue, Apr 20, 2021 at 07:03:26PM +0200, pelzflorian (Florian Pelz) wrote:
> > On Tue, Apr 20, 2021 at 05:27:54PM +0200, pelzflorian (Florian Pelz) wrote:
> > > It seems this is the bad commit. Downloading the enlightenment
> > > substitute got stuck and after a few minutes displayed the usual TLS
> > > error.
> >
> > P.S. There are no issues on my Macbook on the same internet
> > connection, the issue is only with the Beebox PC.
>
> To clarify, you're running the same Guix on both machines, but it only
> fails on one of them?

No sorry I was mistaken, sorry for misguiding you.

I had thought so, but the actions were different on both machines.
Only full installs fail.

Doing a full install with the Enlightenment desktop environment fails
with TLS errors on the Beebox PC.

guix build enlightenment succeeds on the Macbook (with same Guix
commit), but

Running
mount /dev/sda2 /mnt
guile -c '((@ (gnu build install) mount-cow-store) "/mnt" "/tmp/guix-inst")'
guix build enlightenment
succeeds though. (Without mount-cow-store disk space is insufficient.)

So I will try to do a full reinstall with Enlightenment on the Macbook
now after a backup.

Regards,
Florian
L
L
Ludovic Courtès wrote on 21 Apr 2021 00:38
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
8735vk8ttp.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (7 lines)
> On Tue, Apr 20, 2021 at 03:21:13AM +0200, pelzflorian (Florian Pelz) wrote:
>> > git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd
>
> It seems this is the bad commit. Downloading the enlightenment
> substitute got stuck and after a few minutes displayed the usual TLS
> error.

Note that on master there have been changes in this area since this
commit, in particular 20c08a8a45d0f137ead7c05e720456b2aea44402.

I assume the error we’re after is still this one:

Toggle quote (2 lines)
>> | substitute: updating substitutes from 'https://ci.guix.gnu.org'... 0.0%guix substitute: error: TSL error in procedure 'write_to_session_port': Resource temporarily unavailable, try again.

I believe the attached patch “addresses” this problem.

Now, I’m a bit skeptical: the error above is GNUTLS_E_AGAIN, which
happens if sendmsg(2) returns EAGAIN, which supposedly only happens on
datagram (UDP) sockets or on non-blocking sockets, neither of which
applies here.

This change should be safe, though since I’m not sure why we’re getting
EAGAIN in the first place, I wonder what could happen (infinite loop?).

Could you give it a try?

Can you reproduce the substitute issue in a simpler environment?
For instance, by running:

rm -rf ~/.cache/guix/substitute/
./pre-inst-env guix weather $(guix package -A |head -2000 |cut -f1)

This makes 2000 pipelined GETs to https://ci.guix.gnu.org, which is
probably similar to what you observe during system installation.

Thanks for the thorough debugging and bisecting, as always!

Ludo’.
Toggle diff (46 lines)
diff --git a/guix/http-client.scm b/guix/http-client.scm
index a2e11a1b73..51bba250d5 100644
--- a/guix/http-client.scm
+++ b/guix/http-client.scm
@@ -38,7 +38,7 @@
#:use-module (guix utils)
#:use-module (guix base64)
#:autoload (gcrypt hash) (sha256)
- #:autoload (gnutls) (error/invalid-session)
+ #:autoload (gnutls) (error/invalid-session error/again)
#:use-module ((guix build utils)
#:select (mkdir-p dump-port))
#:use-module ((guix build download)
@@ -163,7 +163,8 @@ reusing stale cached connections."
(if (or (and (eq? key 'system-error)
(= EPIPE (system-error-errno `(,key ,@args))))
(and (eq? key 'gnutls-error)
- (eq? (first args) error/invalid-session))
+ (memq (first args)
+ (list error/again error/invalid-session)))
(memq key
'(bad-response bad-header bad-header-component)))
#f
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 48309f9b3a..65940591a9 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -45,7 +45,7 @@
#:select (uri-abbreviation nar-uri-abbreviation
(open-connection-for-uri
. guix:open-connection-for-uri)))
- #:autoload (gnutls) (error/invalid-session)
+ #:autoload (gnutls) (error/invalid-session error/again)
#:use-module (guix progress)
#:use-module ((guix build syscalls)
#:select (set-thread-name))
@@ -417,7 +417,8 @@ server certificates."
(if (or (and (eq? key 'system-error)
(= EPIPE (system-error-errno `(,key ,@args))))
(and (eq? key 'gnutls-error)
- (eq? (first args) error/invalid-session))
+ (memq (first args)
+ (list error/again error/invalid-session)))
(memq key '(bad-response bad-header bad-header-component)))
(proc (open-connection-for-uri/cached uri
#:verify-certificate? #f
P
P
pelzflorian (Florian Pelz) wrote on 21 Apr 2021 22:17
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210421201751.3mygidtwio2yz7j4@pelzflorian.localdomain
Sorry for the slow response.

On Wed, Apr 21, 2021 at 12:38:58AM +0200, Ludovic Courtès wrote:
Toggle quote (11 lines)
> Hi Florian,
>
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > On Tue, Apr 20, 2021 at 03:21:13AM +0200, pelzflorian (Florian Pelz) wrote:
> >> > git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd
> > It seems this is the bad commit. Downloading the enlightenment
> > substitute got stuck and after a few minutes displayed the usual TLS
> > error.
> Note that on master there have been changes in this area since this
> commit, in particular 20c08a8a45d0f137ead7c05e720456b2aea44402.

I have tested 20c08a in my reverts, it is neither the bad commit nor
does it fix it.
Toggle quote (4 lines)
> I assume the error we’re after is still this one:
>
> >> | substitute: updating substitutes from 'https://ci.guix.gnu.org'... 0.0%guix substitute: error: TSL error in procedure 'write_to_session_port': Resource temporarily unavailable, try again.

Yes.

Toggle quote (2 lines)
> I believe the attached patch “addresses” this problem.

It still gets stuck (sometimes with enlightenment, one time with
udisks, restarting the install fixed it once). After getting stuck,
this different error message is shown now; no TLS error (copied by
manual typing, there may be typos):

gtk-doc-1.28 653KiB 2.4MiB/s 00:00 [####################] 100.0%
udisks-2.8.4 842KiB 1.6MiB/s 00:00 [####################] 100.0%

substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%Backtrace:
substitute: In ice-9/boot-9.scm:
substitute: 1736:10 17 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In unknown file:
substitute: 16 (apply-smob/0 #<thunk 7fc2dbf60520>)
substitute: In ice-9/boot-9.scm:
substitute: 718:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
substitute: In ice-9/eval.scm:
substitute: 619:8 14 (_ #(#(#<directory (guile-user) 7fc2dbf63c80>)))
substitute: In guix/ui.scm:
substitute: 2164:12 13 (run-guix-command _ . _)
substitute: In ice-9/boot-9.scm:
substitute: 1736:10 12 (with-exception-handler _ _ #:unwind? _ # _)
substitute: 1736:10 11 (with-exception-handler _ _ #:unwind? _ # _)
substitute: 1731:15 10 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In guix/scripts/substitute.scm:
substitute: 745:18 9 (_)
substitute: 346:26 8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
substitute: In guix/substitutes.scm:
substitute: 358:27 7 (lookup-narinfos/diverse _ _ #<procedure 7fc2d3b15000 …>
substitute: 315:31 6 (lookup-narinfos _ _ #:open-connection _ # _)
substitute: 238:26 5 (fetch-narinfos _ _ #:open-connection _ # _)
substitute: In ice-9/boot-9.scm:
substitute: 1669:16 4 (raise-exception _ #:continuable? _)
substitute: 1669:16 3 (raise-exception _ #:continuable? _)
substitute: 1764:13 2 (_ #<&compound-exception _ components: assertion-fail…>)
substitute: 1669:16 1 (raise-exception _ #:continuable? _)
substitute: 1669:16 0 (raise-exception _ #:continuable? _)
substitute:
substitute: In ice-9/boot-9.scm:1669:16 In procedure raise-exception:
substitute: In procedure %read-line: Wrong type argument in position 1 (expecting open input port): #<closed: string 7fc2d8796070>
guix system: error: `/gnu/store/k3n98i1fk9awd5ydv4ry4k4rlpp7i13m7-guix-1.2.0-22.c467718/bin/guix substitute' died unexpectedly

Command failed with exit code 1.
Press Enter to continue.

Toggle quote (6 lines)
> Can you reproduce the substitute issue in a simpler environment?
> For instance, by running:
>
> rm -rf ~/.cache/guix/substitute/
> ./pre-inst-env guix weather $(guix package -A |head -2000 |cut -f1)

Nope, this always works without issue. Ricardo had written about TLS
errors with 0ad some two or three days ago, but I do not get errors
with 0ad. The issue is present when installing Enlightenment DE from
the installer on both my Macbook and my Beebox every time, but not
when installing to a VM (but maybe it was just luck)

Regards,
Florian
L
L
Ludovic Courtès wrote on 23 Apr 2021 11:19
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
874kfxtl27.fsf@gnu.org
Hi Florian,

(Cc: Chris who’s also familiar with (guix http-client).)

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (41 lines)
> It still gets stuck (sometimes with enlightenment, one time with
> udisks, restarting the install fixed it once). After getting stuck,
> this different error message is shown now; no TLS error (copied by
> manual typing, there may be typos):
>
> gtk-doc-1.28 653KiB 2.4MiB/s 00:00 [####################] 100.0%
> udisks-2.8.4 842KiB 1.6MiB/s 00:00 [####################] 100.0%
>
> substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%Backtrace:
> substitute: In ice-9/boot-9.scm:
> substitute: 1736:10 17 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: In unknown file:
> substitute: 16 (apply-smob/0 #<thunk 7fc2dbf60520>)
> substitute: In ice-9/boot-9.scm:
> substitute: 718:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
> substitute: In ice-9/eval.scm:
> substitute: 619:8 14 (_ #(#(#<directory (guile-user) 7fc2dbf63c80>)))
> substitute: In guix/ui.scm:
> substitute: 2164:12 13 (run-guix-command _ . _)
> substitute: In ice-9/boot-9.scm:
> substitute: 1736:10 12 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: 1736:10 11 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: 1731:15 10 (with-exception-handler _ _ #:unwind? _ # _)
> substitute: In guix/scripts/substitute.scm:
> substitute: 745:18 9 (_)
> substitute: 346:26 8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
> substitute: In guix/substitutes.scm:
> substitute: 358:27 7 (lookup-narinfos/diverse _ _ #<procedure 7fc2d3b15000 …>
> substitute: 315:31 6 (lookup-narinfos _ _ #:open-connection _ # _)
> substitute: 238:26 5 (fetch-narinfos _ _ #:open-connection _ # _)
> substitute: In ice-9/boot-9.scm:
> substitute: 1669:16 4 (raise-exception _ #:continuable? _)
> substitute: 1669:16 3 (raise-exception _ #:continuable? _)
> substitute: 1764:13 2 (_ #<&compound-exception _ components: assertion-fail…>)
> substitute: 1669:16 1 (raise-exception _ #:continuable? _)
> substitute: 1669:16 0 (raise-exception _ #:continuable? _)
> substitute:
> substitute: In ice-9/boot-9.scm:1669:16 In procedure raise-exception:
> substitute: In procedure %read-line: Wrong type argument in position 1 (expecting open input port): #<closed: string 7fc2d8796070>
> guix system: error: `/gnu/store/k3n98i1fk9awd5ydv4ry4k4rlpp7i13m7-guix-1.2.0-22.c467718/bin/guix substitute' died unexpectedly

I think I got it: commit 205833b72c5517915a47a50dbe28e7024dc74e57 (then
carried over in 45fce38fb0b6c6796906149ade145b8d3594c1c6) introduced a
call to ‘connect’ in non-tail position. Once that recursive call to
‘connect’ had completed, ‘http-multiple-get’ would go on in ‘loop’
trying to re-process responses, but at that point there aren’t any
responses left to process.

This problem could only happen if a networking exception would occur
while sending HTTP requests for narinfos. Thus, it was most likely to
occur when interleaving substitutions and queries, as in the snippet you
provided above, because then ‘http-multiple-get’ was more likely to be
passed a stale reused connection.

Florian, could you try again with the attached patch?

If you have the courage, it would be awesome if you could also try the
patch without the ‘error/again’ bits. It’s possible that they aren’t
needed now. I double-checked and the GnuTLS Guile bindings already
handle GNUTLS_E_AGAIN and GNUTLS_E_INTERRUPTED, so my guess is that this
was just a side effect of dealing with stale TLS sessions:


Thanks a lot for your help!

Ludo’.
Toggle diff (70 lines)
diff --git a/guix/http-client.scm b/guix/http-client.scm
index a2e11a1b73..b9cf0b1a4b 100644
--- a/guix/http-client.scm
+++ b/guix/http-client.scm
@@ -38,7 +38,7 @@
#:use-module (guix utils)
#:use-module (guix base64)
#:autoload (gcrypt hash) (sha256)
- #:autoload (gnutls) (error/invalid-session)
+ #:autoload (gnutls) (error/invalid-session error/again)
#:use-module ((guix build utils)
#:select (mkdir-p dump-port))
#:use-module ((guix build download)
@@ -163,7 +163,8 @@ reusing stale cached connections."
(if (or (and (eq? key 'system-error)
(= EPIPE (system-error-errno `(,key ,@args))))
(and (eq? key 'gnutls-error)
- (eq? (first args) error/invalid-session))
+ (memq (first args)
+ (list error/again error/invalid-session)))
(memq key
'(bad-response bad-header bad-header-component)))
#f
@@ -207,15 +208,14 @@ returning."
;; Inherit the HTTP proxying property from P.
(set-http-proxy-port?! buffer (http-proxy-port? p))
- (unless (false-if-networking-error
- (begin
- (for-each (cut write-request <> buffer) batch)
- (put-bytevector p (get))
- (force-output p)
- #t))
- ;; If PORT becomes unusable, open a fresh connection and retry.
- (close-port p) ; close the broken port
- (connect #f requests result)))
+ ;; Swallow networking errors that could occur due to connection reuse
+ ;; and the like; they will be handled down the road when trying to
+ ;; read responses.
+ (false-if-networking-error
+ (begin
+ (for-each (cut write-request <> buffer) batch)
+ (put-bytevector p (get))
+ (force-output p))))
;; Now start processing responses.
(let loop ((sent batch)
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 48309f9b3a..65940591a9 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -45,7 +45,7 @@
#:select (uri-abbreviation nar-uri-abbreviation
(open-connection-for-uri
. guix:open-connection-for-uri)))
- #:autoload (gnutls) (error/invalid-session)
+ #:autoload (gnutls) (error/invalid-session error/again)
#:use-module (guix progress)
#:use-module ((guix build syscalls)
#:select (set-thread-name))
@@ -417,7 +417,8 @@ server certificates."
(if (or (and (eq? key 'system-error)
(= EPIPE (system-error-errno `(,key ,@args))))
(and (eq? key 'gnutls-error)
- (eq? (first args) error/invalid-session))
+ (memq (first args)
+ (list error/again error/invalid-session)))
(memq key '(bad-response bad-header bad-header-component)))
(proc (open-connection-for-uri/cached uri
#:verify-certificate? #f
P
P
pelzflorian (Florian Pelz) wrote on 23 Apr 2021 16:07
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210423140725.jq67brcspvappuft@pelzflorian.localdomain
Success! Thank you.
65;6003;1c
On Fri, Apr 23, 2021 at 11:19:28AM +0200, Ludovic Courtès wrote:
Toggle quote (2 lines)
> Florian, could you try again with the attached patch?

It succeeds on two full installs of Enlightenment, no errors, no
prolonged getting stuck.


Toggle quote (3 lines)
> If you have the courage, it would be awesome if you could also try the
> patch without the ‘error/again’ bits.

This fails. I tried with

- (unless (false-if-networking-error
- (begin
- (for-each (cut write-request <> buffer) batch)
- (put-bytevector p (get))
- (force-output p)
- #t))
- ;; If PORT becomes unusable, open a fresh connection and retry.
- (close-port p) ; close the broken port
- (connect #f requests result)))
+ ;; Swallow networking errors that could occur due to connection reuse
+ ;; and the like; they will be handled down the road when trying to
+ ;; read responses.
+ (false-if-networking-error
+ (begin
+ (for-each (cut write-request <> buffer) batch)
+ (put-bytevector p (get))
+ (force-output p))))


only and not the rest of your patch, on Guix git master where the full
patch had worked, it fails again with TLS errors (the same error of
Resource temporarily unavailable in procedure
'write_to_session_record_port') after downloading the enlightenment
substitute.


Toggle quote (5 lines)
> I double-checked and the GnuTLS Guile bindings already
> handle GNUTLS_E_AGAIN and GNUTLS_E_INTERRUPTED, so my guess is that this
> was just a side effect of dealing with stale TLS sessions:
> https://gitlab.com/gnutls/gnutls/-/blob/master/guile/src/core.c#L1042

Strange,.

Regards,
Florian
L
L
Ludovic Courtès wrote on 24 Apr 2021 19:13
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87sg3fip2a.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (2 lines)
> Success! Thank you.

\o/

Toggle quote (6 lines)
> On Fri, Apr 23, 2021 at 11:19:28AM +0200, Ludovic Courtès wrote:
>> Florian, could you try again with the attached patch?
>
> It succeeds on two full installs of Enlightenment, no errors, no
> prolonged getting stuck.

Yay.

Toggle quote (7 lines)
>> I double-checked and the GnuTLS Guile bindings already
>> handle GNUTLS_E_AGAIN and GNUTLS_E_INTERRUPTED, so my guess is that this
>> was just a side effect of dealing with stale TLS sessions:
>> https://gitlab.com/gnutls/gnutls/-/blob/master/guile/src/core.c#L1042
>
> Strange,.

My bad: I was looking at ‘read_from_session_record_port’, but the
problem is in ‘write_to_session_record_port’. All is clear now.

I’ve pushed these as two separate patches:

c50db7156d http-client: Remove exception mishandling in 'http-multiple-get'.
02d62978f4 http-client, substitute: Gracefully handle GnuTLS EAGAIN/EINTR.

This bug should be gone now.

I’ll go ahead and fix ‘write_to_session_record_port’ in GnuTLS.

Thanks a lot for the quick and thorough testing!

Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 25 Apr 2021 02:20
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210425002016.d4vje2v45kh3rqmj@pelzflorian.localdomain
On Sat, Apr 24, 2021 at 07:13:01PM +0200, Ludovic Courtès wrote:
Toggle quote (5 lines)
> I’ve pushed these as two separate patches:
>
> c50db7156d http-client: Remove exception mishandling in 'http-multiple-get'.
> 02d62978f4 http-client, substitute: Gracefully handle GnuTLS EAGAIN/EINTR.

I assume they will be cherry-picked to the version-1.3.0 branch and
the guix package will be updated.

Toggle quote (2 lines)
> This bug should be gone now.

Yes, after and only after locally updating the guix package I can
create good installer images.

Closing.

Note though that "Guu, Jin-Cheng" <jcguu95@gmail.com> reported the TLS
errors before me, but to guix-devel and not as a bug report.

Toggle quote (2 lines)
> I’ll go ahead and fix ‘write_to_session_record_port’ in GnuTLS.

Thank you for all the work!

Regards,
Florian
Closed
L
L
Ludovic Courtès wrote on 21 Jun 2021 21:12
control message for bug #47867
(address . control@debbugs.gnu.org)
87eecv58x1.fsf@gnu.org
unarchive 47867
quit
L
L
Ludovic Courtès wrote on 21 Jun 2021 21:18
Re: bug#47867: [1.2.1 pre-release testing] substitute downloading and TLS errors
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87bl7z58mu.fsf@gnu.org
Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (19 lines)
>>> I double-checked and the GnuTLS Guile bindings already
>>> handle GNUTLS_E_AGAIN and GNUTLS_E_INTERRUPTED, so my guess is that this
>>> was just a side effect of dealing with stale TLS sessions:
>>> https://gitlab.com/gnutls/gnutls/-/blob/master/guile/src/core.c#L1042
>>
>> Strange,.
>
> My bad: I was looking at ‘read_from_session_record_port’, but the
> problem is in ‘write_to_session_record_port’. All is clear now.
>
> I’ve pushed these as two separate patches:
>
> c50db7156d http-client: Remove exception mishandling in 'http-multiple-get'.
> 02d62978f4 http-client, substitute: Gracefully handle GnuTLS EAGAIN/EINTR.
>
> This bug should be gone now.
>
> I’ll go ahead and fix ‘write_to_session_record_port’ in GnuTLS.

It took a while but this is now fixed in GnuTLS:


I *think* the fix will be included in 3.7.3 (it’s not in 3.7.2, released
a couple of weeks ago).

Ludo’.
?