[1.2.1 pre-release testing] substitute downloading and TLS errors

DoneSubmitted by pelzflorian (Florian Pelz).
Details
3 participants
  • Leo Famulari
  • Ludovic Courtès
  • pelzflorian (Florian Pelz)
Owner
unassigned
Severity
important
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 11:38 +0200
(name . jcguu95)(address . jcguu95@gmail.com)
20210418093856.a6r2zqw2gks56dg5@pelzflorian.localdomain
On Fri, Apr 16, 2021 at 10:40:55PM -0500, jcguu95 wrote:
Toggle quote (15 lines)> 2.2. Unexpected failure> > After running `guix system init /mnt/etc/config.scm /mnt' for 5 to 10> minutes, I got the error> > ,----> | substitute: updating substitutes from 'https://ci.guix.gnu.org'... 0.0%guix substitute: error: TSL error in procedure 'write_to_session_port': Resource temporarily unavailable, try again.> | > | guix system: error: `/gnu/store/1nmwil4cs...vc2p-guix-1.2.0-21.4dff6ec/bin/guix substitute` died unexpectedly> | root@gnu ~# _> `----> > Retrying it did resolve the problem. And I got a working guix system> out of it :)
I’ve got the same TLS error (though the procedure for me is‘write_to_session_record_port’.
I needed to restart the installer at the last step, this stopped whiledownloading enlightenment with no error but it does not continue.
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 11:44 +0200
(name . jcguu95)(address . jcguu95@gmail.com)
20210418094425.cwkdrixhzwgd5qwt@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 11:38:56AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (3 lines)> I needed to restart the installer at the last step, this stopped while> downloading enlightenment with no error but it does not continue.
Note: The last message was
enlightenment-0.24.2 27.0MiB 3.7MiB/s 00:07 [################] 100.0%
But now after maybe 10 minutes it finally continued and died with thesame TLS error about write_to_session_record_port and Resource notavailable.
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 11:48 +0200
20210418094838.va3avptrgburuxlz@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 11:44:25AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (4 lines)> But now after maybe 10 minutes it finally continued and died with the> same TLS error about write_to_session_record_port and Resource not> available.
The error happened while
substitute: updating substitutes from […]
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 11:57 +0200
20210418095722.egbvpgey4okt5yrc@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 11:48:47AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (9 lines)> On Sun, Apr 18, 2021 at 11:44:25AM +0200, pelzflorian (Florian Pelz) wrote:> > But now after maybe 10 minutes it finally continued and died with the> > same TLS error about write_to_session_record_port and Resource not> > available.> > The error happened while> > substitute: updating substitutes from […]
After starting one more time from the last installer step itdownloaded the same enlightenment substitute again and got stuck thesame way.
L
L
Ludovic Courtès wrote on 18 Apr 12:12 +0200
control message for bug #47867
(address . control@debbugs.gnu.org)
87fsznnbox.fsf@gnu.org
severity 47867 importantquit
L
L
Ludovic Courtès wrote on 18 Apr 12:12 +0200
control message for bug #47297
(address . control@debbugs.gnu.org)
87eef7nbob.fsf@gnu.org
block 47297 by 47867quit
L
L
Ludovic Courtès wrote on 18 Apr 12:15 +0200
Re: bug#47867: [1.2.1 pre-release testing] substitute downloading and TLS errors
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87a6pvnbjg.fsf@gnu.org
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (22 lines)> On Fri, Apr 16, 2021 at 10:40:55PM -0500, jcguu95 wrote:>> 2.2. Unexpected failure>> >> After running `guix system init /mnt/etc/config.scm /mnt' for 5 to 10>> minutes, I got the error>> >> ,---->> | substitute: updating substitutes from 'https://ci.guix.gnu.org'... 0.0%guix substitute: error: TSL error in procedure 'write_to_session_port': Resource temporarily unavailable, try again.>> | >> | guix system: error: `/gnu/store/1nmwil4cs...vc2p-guix-1.2.0-21.4dff6ec/bin/guix substitute` died unexpectedly>> | root@gnu ~# _>> `---->> >> Retrying it did resolve the problem. And I got a working guix system>> out of it :)>> I’ve got the same TLS error (though the procedure for me is> ‘write_to_session_record_port’.>> I needed to restart the installer at the last step, this stopped while> downloading enlightenment with no error but it does not continue.
Was this in a VM? Could it be that networking was unstable (e.g., youwere running this over a flaky WiFi connection)? How reproducible isthis? Any tips on how to reproduce it?
I did a full bare-bones OS install in a VM, which went fine, but maybethat’s not representative of your setup.
Thanks,Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 13:33 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210418113337.z3afnp4lmt55cjvw@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 12:15:47PM +0200, Ludovic Courtès wrote:
Toggle quote (2 lines)> Was this in a VM?
No, an Asrock Beebox (real x86_64 hardware).

Toggle quote (3 lines)> Could it be that networking was unstable (e.g., you> were running this over a flaky WiFi connection)?
It is a stable Ethernet connection.

Toggle quote (9 lines)> How reproducible is> this? Any tips on how to reproduce it?> > I did a full bare-bones OS install in a VM, which went fine, but maybe> that’s not representative of your setup.> > Thanks,> Ludo’.
The installer completed just now after continuing from the lastinstall step another time (well with an unrelated error bygrub-install that it could not determine the canonical path of/boot/efi). I will try reproducing, but Jin and me both had thiserror.
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 18 Apr 14:10 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210418121006.4ihop3umb2oyofum@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 01:33:37PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (4 lines)> (well with an unrelated error by> grub-install that it could not determine the canonical path of> /boot/efi).
Probably
Toggle quote (3 lines)> I will try reproducing, but Jin and me both had this> error.
I got the error again with enlightenment.
On IRC Ricardo/rekado suspectedc7c7f068c15e419aaf5ef616516aa5ad4e55c2fa, I will try reverting it.
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 19 Apr 10:05 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210419080516.piqb52s32h3fe5ag@pelzflorian.localdomain
On Sun, Apr 18, 2021 at 02:10:06PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (3 lines)> On IRC Ricardo/rekado suspected> c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa, I will try reverting it.
It was necessary to revert follow-up commits before revertingc7c7f068c15e419aaf5ef616516aa5ad4e55c2fa.
I tried on Guix master commit fef2f08bc640f78cc0a86fc7be3eccbc07b5e98cthe following commands:
sed -i '/vi/d' po/packages/LINGUAS #otherwise errors; Julien fixed this by now I thinkgit add po/packages/LINGUASgit commit -m fixLINGUASgit revert 2d73086262e1fb33cd0f0f16f74a495fe06b38aagit revert 673e5276f6b4dda4bfa9dd5bb70220fc8b17abd2git revert 45fce38fb0b6c6796906149ade145b8d3594c1c6git revert 9da5ec7099b992a8969a17627548cd341c01bd90git revert c5ab78f90b111839508d0ab10c0e5ac2038002cagit revert b48204259aa9cad80c5b23a4060e2d796007ec7agit revert c37e3b92ad0334ba2fe7ee4e98631f0a4edeee21git revert 728c90862eb967a1679efdb69e26d7ba37cdc3a3git revert fd5b77503e852b78a43e1bee4d6bdfbbb1f27e8fgit revert 112692c0d546d35cd67c5dc70dbd1dc609b18f64git revert ee3226e9d54891c7e696912245e4904435be191cgit revert e2572aa95007558b008f04decff095f46d20e087git revert 20c08a8a45d0f137ead7c05e720456b2aea44402git revert 187e97096825d2bcceb144cead6eccc27385acd7git revert 8116cc66733134a8fb6f9117d4648288b83c8356git revert b9d058e3f7982eed04cc748cc0b24173a3969c52git revert 7c85877fdf964694061e3192eac35723ebc047bfgit revert f50f5751fff4cfc6d5abba9681054569694b7a5cgit revert 05f38ca8dccc49d9ec70d1f63461976b97d74d52git revert 7b812f7c84c43455cdd68a0e51b6ded018afcc8egit revert f50b501a7472f4f237023831aa415a948115d1d1git revert 205833b72c5517915a47a50dbe28e7024dc74e57git revert e2e853ddb0d964e7b0300334ab51ab89950c2376git revert 87734503a4a1761a4432b601352e5b02340a97f0git revert 08acee2f98321d57c4faa3ba641a5543b74f10a4git revert fbd61b5d3de353bfa468641d087bc53aaa1e63a5git revert be5a75ebb5988b87b2392e2113f6590f353dd6cdgit revert c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa./bootstrap./configure --localstatedir=/varmakeGUIX_ALLOW_ME_TO_USE_PRIVATE_COMMIT=y make update-guix-package# disabling #:tests? on the guix package was *not* necessary once all reverts were in placemake./pre-inst-env guix system image -t iso9660 gnu/system/install.scm
I dd’d the resulting image to a USB drive and the TLS errors no longerappear (tried installing everything two times now; without revertsdownloading enlightenment needed retrys everytime).
I suppose if we leave this bug in the new Guix release, then somesubstitutes like enlightenment get stuck and fail to download for somepeople and therefore the installation fails. I suspect restarting theinstall causes grub-install to fail because it cannot find /boot/efianymore, but maybe my grub-install error had other reasons.
I will bisect which reverts are needed, but it takes timeĀ …
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 19 Apr 10:22 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210419082241.3jhmquw6t5jtfutp@pelzflorian.localdomain
On Mon, Apr 19, 2021 at 10:05:25AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (7 lines)> I dd’d the resulting image to a USB drive and the TLS errors no longer> appear (tried installing everything two times now; without reverts> downloading enlightenment needed retrys everytime).> […] I suspect restarting the> install causes grub-install to fail because it cannot find /boot/efi> anymore, but maybe my grub-install error had other reasons.
No sorry, grub-install still fails, even with no TLS errors no priorinstallation restarts. grub-install failing is an unrelated bug. Ihad not let the install run past the “copying everthing to /mnt” steplast time.
P
P
pelzflorian (Florian Pelz) wrote on 19 Apr 13:02 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210419110258.vbryj6xapnyf2bas@pelzflorian.localdomain
On Mon, Apr 19, 2021 at 10:05:25AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (2 lines)> I will bisect which reverts are needed, but it takes timeĀ …
My UEFI broke; the PC cannot display anything anymore. Don’t wait formy debugging.
P
P
pelzflorian (Florian Pelz) wrote on 20 Apr 03:21 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210420012113.jcwuzfx6k3by5nvo@pelzflorian.localdomain
After flashing the CMOS I can test again.
Without reverts TLS errors happen while or after downloading theenlightenment substitute from the installer.
With all of those reverts *no* TLS errors happen:
On Mon, Apr 19, 2021 at 10:05:25AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (28 lines)> git revert 2d73086262e1fb33cd0f0f16f74a495fe06b38aa> git revert 673e5276f6b4dda4bfa9dd5bb70220fc8b17abd2> git revert 45fce38fb0b6c6796906149ade145b8d3594c1c6> git revert 9da5ec7099b992a8969a17627548cd341c01bd90> git revert c5ab78f90b111839508d0ab10c0e5ac2038002ca> git revert b48204259aa9cad80c5b23a4060e2d796007ec7a> git revert c37e3b92ad0334ba2fe7ee4e98631f0a4edeee21> git revert 728c90862eb967a1679efdb69e26d7ba37cdc3a3> git revert fd5b77503e852b78a43e1bee4d6bdfbbb1f27e8f> git revert 112692c0d546d35cd67c5dc70dbd1dc609b18f64> git revert ee3226e9d54891c7e696912245e4904435be191c> git revert e2572aa95007558b008f04decff095f46d20e087> git revert 20c08a8a45d0f137ead7c05e720456b2aea44402> git revert 187e97096825d2bcceb144cead6eccc27385acd7> git revert 8116cc66733134a8fb6f9117d4648288b83c8356> git revert b9d058e3f7982eed04cc748cc0b24173a3969c52> git revert 7c85877fdf964694061e3192eac35723ebc047bf> git revert f50f5751fff4cfc6d5abba9681054569694b7a5c> git revert 05f38ca8dccc49d9ec70d1f63461976b97d74d52> git revert 7b812f7c84c43455cdd68a0e51b6ded018afcc8e> git revert f50b501a7472f4f237023831aa415a948115d1d1> git revert 205833b72c5517915a47a50dbe28e7024dc74e57> git revert e2e853ddb0d964e7b0300334ab51ab89950c2376> git revert 87734503a4a1761a4432b601352e5b02340a97f0> git revert 08acee2f98321d57c4faa3ba641a5543b74f10a4> git revert fbd61b5d3de353bfa468641d087bc53aaa1e63a5> git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd
The bad commit must be one of those.c7c7f068c15e419aaf5ef616516aa5ad4e55c2fa is not the problem.
I will bisect tomorrow.
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 20 Apr 17:27 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210420152754.a3ucylrdugrmxu57@pelzflorian.localdomain
On Tue, Apr 20, 2021 at 03:21:13AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (2 lines)> > git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd
It seems this is the bad commit. Downloading the enlightenmentsubstitute got stuck and after a few minutes displayed the usual TLSerror.
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 20 Apr 19:03 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210420170326.vhgvlm63jve6oi4j@pelzflorian.localdomain
On Tue, Apr 20, 2021 at 05:27:54PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (7 lines)> On Tue, Apr 20, 2021 at 03:21:13AM +0200, pelzflorian (Florian Pelz) wrote:> > > git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd> > It seems this is the bad commit. Downloading the enlightenment> substitute got stuck and after a few minutes displayed the usual TLS> error.
P.S. There are no issues on my Macbook on the same internetconnection, the issue is only with the Beebox PC.
Regards,Florian
L
L
Leo Famulari wrote on 20 Apr 20:00 +0200
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
YH8W1mItLIfnQNsj@jasmine.lan
On Tue, Apr 20, 2021 at 07:03:26PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (8 lines)> On Tue, Apr 20, 2021 at 05:27:54PM +0200, pelzflorian (Florian Pelz) wrote:> > It seems this is the bad commit. Downloading the enlightenment> > substitute got stuck and after a few minutes displayed the usual TLS> > error.> > P.S. There are no issues on my Macbook on the same internet> connection, the issue is only with the Beebox PC.
To clarify, you're running the same Guix on both machines, but it onlyfails on one of them?
P
P
pelzflorian (Florian Pelz) wrote on 20 Apr 21:41 +0200
(name . Leo Famulari)(address . leo@famulari.name)
20210420194130.j5x6p4wcnxlot6n5@pelzflorian.localdomain
On Tue, Apr 20, 2021 at 02:00:54PM -0400, Leo Famulari wrote:
Toggle quote (12 lines)> On Tue, Apr 20, 2021 at 07:03:26PM +0200, pelzflorian (Florian Pelz) wrote:> > On Tue, Apr 20, 2021 at 05:27:54PM +0200, pelzflorian (Florian Pelz) wrote:> > > It seems this is the bad commit. Downloading the enlightenment> > > substitute got stuck and after a few minutes displayed the usual TLS> > > error.> > > > P.S. There are no issues on my Macbook on the same internet> > connection, the issue is only with the Beebox PC.> > To clarify, you're running the same Guix on both machines, but it only> fails on one of them?
No sorry I was mistaken, sorry for misguiding you.
I had thought so, but the actions were different on both machines.Only full installs fail.
Doing a full install with the Enlightenment desktop environment failswith TLS errors on the Beebox PC.
guix build enlightenment succeeds on the Macbook (with same Guixcommit), but
Runningmount /dev/sda2 /mntguile -c '((@ (gnu build install) mount-cow-store) "/mnt" "/tmp/guix-inst")'guix build enlightenmentsucceeds though. (Without mount-cow-store disk space is insufficient.)
So I will try to do a full reinstall with Enlightenment on the Macbooknow after a backup.
Regards,Florian
L
L
Ludovic Courtès wrote on 21 Apr 00:38 +0200
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
8735vk8ttp.fsf@gnu.org
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (7 lines)> On Tue, Apr 20, 2021 at 03:21:13AM +0200, pelzflorian (Florian Pelz) wrote:>> > git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd>> It seems this is the bad commit. Downloading the enlightenment> substitute got stuck and after a few minutes displayed the usual TLS> error.
Note that on master there have been changes in this area since thiscommit, in particular 20c08a8a45d0f137ead7c05e720456b2aea44402.
I assume the error we’re after is still this one:
Toggle quote (2 lines)>> | substitute: updating substitutes from 'https://ci.guix.gnu.org'... 0.0%guix substitute: error: TSL error in procedure 'write_to_session_port': Resource temporarily unavailable, try again.
I believe the attached patch “addresses” this problem.
Now, I’m a bit skeptical: the error above is GNUTLS_E_AGAIN, whichhappens if sendmsg(2) returns EAGAIN, which supposedly only happens ondatagram (UDP) sockets or on non-blocking sockets, neither of whichapplies here.
This change should be safe, though since I’m not sure why we’re gettingEAGAIN in the first place, I wonder what could happen (infinite loop?).
Could you give it a try?
Can you reproduce the substitute issue in a simpler environment?For instance, by running:
rm -rf ~/.cache/guix/substitute/ ./pre-inst-env guix weather $(guix package -A |head -2000 |cut -f1)
This makes 2000 pipelined GETs to https://ci.guix.gnu.org, which isprobably similar to what you observe during system installation.
Thanks for the thorough debugging and bisecting, as always!
Ludo’.
Toggle diff (46 lines)diff --git a/guix/http-client.scm b/guix/http-client.scmindex a2e11a1b73..51bba250d5 100644--- a/guix/http-client.scm+++ b/guix/http-client.scm@@ -38,7 +38,7 @@ #:use-module (guix utils) #:use-module (guix base64) #:autoload (gcrypt hash) (sha256)- #:autoload (gnutls) (error/invalid-session)+ #:autoload (gnutls) (error/invalid-session error/again) #:use-module ((guix build utils) #:select (mkdir-p dump-port)) #:use-module ((guix build download)@@ -163,7 +163,8 @@ reusing stale cached connections." (if (or (and (eq? key 'system-error) (= EPIPE (system-error-errno `(,key ,@args)))) (and (eq? key 'gnutls-error)- (eq? (first args) error/invalid-session))+ (memq (first args)+ (list error/again error/invalid-session))) (memq key '(bad-response bad-header bad-header-component))) #fdiff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scmindex 48309f9b3a..65940591a9 100755--- a/guix/scripts/substitute.scm+++ b/guix/scripts/substitute.scm@@ -45,7 +45,7 @@ #:select (uri-abbreviation nar-uri-abbreviation (open-connection-for-uri . guix:open-connection-for-uri)))- #:autoload (gnutls) (error/invalid-session)+ #:autoload (gnutls) (error/invalid-session error/again) #:use-module (guix progress) #:use-module ((guix build syscalls) #:select (set-thread-name))@@ -417,7 +417,8 @@ server certificates." (if (or (and (eq? key 'system-error) (= EPIPE (system-error-errno `(,key ,@args)))) (and (eq? key 'gnutls-error)- (eq? (first args) error/invalid-session))+ (memq (first args)+ (list error/again error/invalid-session))) (memq key '(bad-response bad-header bad-header-component))) (proc (open-connection-for-uri/cached uri #:verify-certificate? #f
P
P
pelzflorian (Florian Pelz) wrote on 21 Apr 22:17 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210421201751.3mygidtwio2yz7j4@pelzflorian.localdomain
Sorry for the slow response.
On Wed, Apr 21, 2021 at 12:38:58AM +0200, Ludovic Courtès wrote:
Toggle quote (11 lines)> Hi Florian,> > "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:> > On Tue, Apr 20, 2021 at 03:21:13AM +0200, pelzflorian (Florian Pelz) wrote:> >> > git revert be5a75ebb5988b87b2392e2113f6590f353dd6cd> > It seems this is the bad commit. Downloading the enlightenment> > substitute got stuck and after a few minutes displayed the usual TLS> > error.> Note that on master there have been changes in this area since this> commit, in particular 20c08a8a45d0f137ead7c05e720456b2aea44402.
I have tested 20c08a in my reverts, it is neither the bad commit nordoes it fix it.
Toggle quote (4 lines)> I assume the error we’re after is still this one:> > >> | substitute: updating substitutes from 'https://ci.guix.gnu.org'... 0.0%guix substitute: error: TSL error in procedure 'write_to_session_port': Resource temporarily unavailable, try again.
Yes.
Toggle quote (2 lines)> I believe the attached patch “addresses” this problem.
It still gets stuck (sometimes with enlightenment, one time withudisks, restarting the install fixed it once). After getting stuck,this different error message is shown now; no TLS error (copied bymanual typing, there may be typos):
gtk-doc-1.28 653KiB 2.4MiB/s 00:00 [####################] 100.0%udisks-2.8.4 842KiB 1.6MiB/s 00:00 [####################] 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%Backtrace:substitute: In ice-9/boot-9.scm:substitute: 1736:10 17 (with-exception-handler _ _ #:unwind? _ # _)substitute: In unknown file:substitute: 16 (apply-smob/0 #<thunk 7fc2dbf60520>)substitute: In ice-9/boot-9.scm:substitute: 718:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)substitute: In ice-9/eval.scm:substitute: 619:8 14 (_ #(#(#<directory (guile-user) 7fc2dbf63c80>)))substitute: In guix/ui.scm:substitute: 2164:12 13 (run-guix-command _ . _)substitute: In ice-9/boot-9.scm:substitute: 1736:10 12 (with-exception-handler _ _ #:unwind? _ # _)substitute: 1736:10 11 (with-exception-handler _ _ #:unwind? _ # _)substitute: 1731:15 10 (with-exception-handler _ _ #:unwind? _ # _)substitute: In guix/scripts/substitute.scm:substitute: 745:18 9 (_)substitute: 346:26 8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)substitute: In guix/substitutes.scm:substitute: 358:27 7 (lookup-narinfos/diverse _ _ #<procedure 7fc2d3b15000 …>substitute: 315:31 6 (lookup-narinfos _ _ #:open-connection _ # _)substitute: 238:26 5 (fetch-narinfos _ _ #:open-connection _ # _)substitute: In ice-9/boot-9.scm:substitute: 1669:16 4 (raise-exception _ #:continuable? _)substitute: 1669:16 3 (raise-exception _ #:continuable? _)substitute: 1764:13 2 (_ #<&compound-exception _ components: assertion-fail…>)substitute: 1669:16 1 (raise-exception _ #:continuable? _)substitute: 1669:16 0 (raise-exception _ #:continuable? _)substitute:substitute: In ice-9/boot-9.scm:1669:16 In procedure raise-exception:substitute: In procedure %read-line: Wrong type argument in position 1 (expecting open input port): #<closed: string 7fc2d8796070>guix system: error: `/gnu/store/k3n98i1fk9awd5ydv4ry4k4rlpp7i13m7-guix-1.2.0-22.c467718/bin/guix substitute' died unexpectedly
Command failed with exit code 1.Press Enter to continue.
Toggle quote (6 lines)> Can you reproduce the substitute issue in a simpler environment?> For instance, by running:> > rm -rf ~/.cache/guix/substitute/> ./pre-inst-env guix weather $(guix package -A |head -2000 |cut -f1)
Nope, this always works without issue. Ricardo had written about TLSerrors with 0ad some two or three days ago, but I do not get errorswith 0ad. The issue is present when installing Enlightenment DE fromthe installer on both my Macbook and my Beebox every time, but notwhen installing to a VM (but maybe it was just luck)
Regards,Florian
L
L
Ludovic Courtès wrote on 23 Apr 11:19 +0200
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
874kfxtl27.fsf@gnu.org
Hi Florian,
(Cc: Chris who’s also familiar with (guix http-client).)
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (41 lines)> It still gets stuck (sometimes with enlightenment, one time with> udisks, restarting the install fixed it once). After getting stuck,> this different error message is shown now; no TLS error (copied by> manual typing, there may be typos):>> gtk-doc-1.28 653KiB 2.4MiB/s 00:00 [####################] 100.0%> udisks-2.8.4 842KiB 1.6MiB/s 00:00 [####################] 100.0%>> substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%Backtrace:> substitute: In ice-9/boot-9.scm:> substitute: 1736:10 17 (with-exception-handler _ _ #:unwind? _ # _)> substitute: In unknown file:> substitute: 16 (apply-smob/0 #<thunk 7fc2dbf60520>)> substitute: In ice-9/boot-9.scm:> substitute: 718:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)> substitute: In ice-9/eval.scm:> substitute: 619:8 14 (_ #(#(#<directory (guile-user) 7fc2dbf63c80>)))> substitute: In guix/ui.scm:> substitute: 2164:12 13 (run-guix-command _ . _)> substitute: In ice-9/boot-9.scm:> substitute: 1736:10 12 (with-exception-handler _ _ #:unwind? _ # _)> substitute: 1736:10 11 (with-exception-handler _ _ #:unwind? _ # _)> substitute: 1731:15 10 (with-exception-handler _ _ #:unwind? _ # _)> substitute: In guix/scripts/substitute.scm:> substitute: 745:18 9 (_)> substitute: 346:26 8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)> substitute: In guix/substitutes.scm:> substitute: 358:27 7 (lookup-narinfos/diverse _ _ #<procedure 7fc2d3b15000 …>> substitute: 315:31 6 (lookup-narinfos _ _ #:open-connection _ # _)> substitute: 238:26 5 (fetch-narinfos _ _ #:open-connection _ # _)> substitute: In ice-9/boot-9.scm:> substitute: 1669:16 4 (raise-exception _ #:continuable? _)> substitute: 1669:16 3 (raise-exception _ #:continuable? _)> substitute: 1764:13 2 (_ #<&compound-exception _ components: assertion-fail…>)> substitute: 1669:16 1 (raise-exception _ #:continuable? _)> substitute: 1669:16 0 (raise-exception _ #:continuable? _)> substitute:> substitute: In ice-9/boot-9.scm:1669:16 In procedure raise-exception:> substitute: In procedure %read-line: Wrong type argument in position 1 (expecting open input port): #<closed: string 7fc2d8796070>> guix system: error: `/gnu/store/k3n98i1fk9awd5ydv4ry4k4rlpp7i13m7-guix-1.2.0-22.c467718/bin/guix substitute' died unexpectedly
I think I got it: commit 205833b72c5517915a47a50dbe28e7024dc74e57 (thencarried over in 45fce38fb0b6c6796906149ade145b8d3594c1c6) introduced acall to ‘connect’ in non-tail position. Once that recursive call to‘connect’ had completed, ‘http-multiple-get’ would go on in ‘loop’trying to re-process responses, but at that point there aren’t anyresponses left to process.
This problem could only happen if a networking exception would occurwhile sending HTTP requests for narinfos. Thus, it was most likely tooccur when interleaving substitutions and queries, as in the snippet youprovided above, because then ‘http-multiple-get’ was more likely to bepassed a stale reused connection.
Florian, could you try again with the attached patch?
If you have the courage, it would be awesome if you could also try thepatch without the ‘error/again’ bits. It’s possible that they aren’tneeded now. I double-checked and the GnuTLS Guile bindings alreadyhandle GNUTLS_E_AGAIN and GNUTLS_E_INTERRUPTED, so my guess is that thiswas just a side effect of dealing with stale TLS sessions:
https://gitlab.com/gnutls/gnutls/-/blob/master/guile/src/core.c#L1042
Thanks a lot for your help!
Ludo’.
Toggle diff (70 lines)diff --git a/guix/http-client.scm b/guix/http-client.scmindex a2e11a1b73..b9cf0b1a4b 100644--- a/guix/http-client.scm+++ b/guix/http-client.scm@@ -38,7 +38,7 @@ #:use-module (guix utils) #:use-module (guix base64) #:autoload (gcrypt hash) (sha256)- #:autoload (gnutls) (error/invalid-session)+ #:autoload (gnutls) (error/invalid-session error/again) #:use-module ((guix build utils) #:select (mkdir-p dump-port)) #:use-module ((guix build download)@@ -163,7 +163,8 @@ reusing stale cached connections." (if (or (and (eq? key 'system-error) (= EPIPE (system-error-errno `(,key ,@args)))) (and (eq? key 'gnutls-error)- (eq? (first args) error/invalid-session))+ (memq (first args)+ (list error/again error/invalid-session))) (memq key '(bad-response bad-header bad-header-component))) #f@@ -207,15 +208,14 @@ returning." ;; Inherit the HTTP proxying property from P. (set-http-proxy-port?! buffer (http-proxy-port? p)) - (unless (false-if-networking-error- (begin- (for-each (cut write-request <> buffer) batch)- (put-bytevector p (get))- (force-output p)- #t))- ;; If PORT becomes unusable, open a fresh connection and retry.- (close-port p) ; close the broken port- (connect #f requests result)))+ ;; Swallow networking errors that could occur due to connection reuse+ ;; and the like; they will be handled down the road when trying to+ ;; read responses.+ (false-if-networking-error+ (begin+ (for-each (cut write-request <> buffer) batch)+ (put-bytevector p (get))+ (force-output p)))) ;; Now start processing responses. (let loop ((sent batch)diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scmindex 48309f9b3a..65940591a9 100755--- a/guix/scripts/substitute.scm+++ b/guix/scripts/substitute.scm@@ -45,7 +45,7 @@ #:select (uri-abbreviation nar-uri-abbreviation (open-connection-for-uri . guix:open-connection-for-uri)))- #:autoload (gnutls) (error/invalid-session)+ #:autoload (gnutls) (error/invalid-session error/again) #:use-module (guix progress) #:use-module ((guix build syscalls) #:select (set-thread-name))@@ -417,7 +417,8 @@ server certificates." (if (or (and (eq? key 'system-error) (= EPIPE (system-error-errno `(,key ,@args)))) (and (eq? key 'gnutls-error)- (eq? (first args) error/invalid-session))+ (memq (first args)+ (list error/again error/invalid-session))) (memq key '(bad-response bad-header bad-header-component))) (proc (open-connection-for-uri/cached uri #:verify-certificate? #f
P
P
pelzflorian (Florian Pelz) wrote on 23 Apr 16:07 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210423140725.jq67brcspvappuft@pelzflorian.localdomain
Success! Thank you.65;6003;1cOn Fri, Apr 23, 2021 at 11:19:28AM +0200, Ludovic Courtès wrote:
Toggle quote (2 lines)> Florian, could you try again with the attached patch?
It succeeds on two full installs of Enlightenment, no errors, noprolonged getting stuck.

Toggle quote (3 lines)> If you have the courage, it would be awesome if you could also try the> patch without the ‘error/again’ bits.
This fails. I tried with
- (unless (false-if-networking-error- (begin- (for-each (cut write-request <> buffer) batch)- (put-bytevector p (get))- (force-output p)- #t))- ;; If PORT becomes unusable, open a fresh connection and retry.- (close-port p) ; close the broken port- (connect #f requests result)))+ ;; Swallow networking errors that could occur due to connection reuse+ ;; and the like; they will be handled down the road when trying to+ ;; read responses.+ (false-if-networking-error+ (begin+ (for-each (cut write-request <> buffer) batch)+ (put-bytevector p (get))+ (force-output p))))

only and not the rest of your patch, on Guix git master where the fullpatch had worked, it fails again with TLS errors (the same error ofResource temporarily unavailable in procedure'write_to_session_record_port') after downloading the enlightenmentsubstitute.

Toggle quote (5 lines)> I double-checked and the GnuTLS Guile bindings already> handle GNUTLS_E_AGAIN and GNUTLS_E_INTERRUPTED, so my guess is that this> was just a side effect of dealing with stale TLS sessions:> https://gitlab.com/gnutls/gnutls/-/blob/master/guile/src/core.c#L1042
Strange,.
Regards,Florian
L
L
Ludovic Courtès wrote on 24 Apr 19:13 +0200
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87sg3fip2a.fsf@gnu.org
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (2 lines)> Success! Thank you.
\o/
Toggle quote (6 lines)> On Fri, Apr 23, 2021 at 11:19:28AM +0200, Ludovic Courtès wrote:>> Florian, could you try again with the attached patch?>> It succeeds on two full installs of Enlightenment, no errors, no> prolonged getting stuck.
Yay.
Toggle quote (7 lines)>> I double-checked and the GnuTLS Guile bindings already>> handle GNUTLS_E_AGAIN and GNUTLS_E_INTERRUPTED, so my guess is that this>> was just a side effect of dealing with stale TLS sessions:>> https://gitlab.com/gnutls/gnutls/-/blob/master/guile/src/core.c#L1042>> Strange,.
My bad: I was looking at ‘read_from_session_record_port’, but theproblem is in ‘write_to_session_record_port’. All is clear now.
I’ve pushed these as two separate patches:
c50db7156d http-client: Remove exception mishandling in 'http-multiple-get'. 02d62978f4 http-client, substitute: Gracefully handle GnuTLS EAGAIN/EINTR.
This bug should be gone now.
I’ll go ahead and fix ‘write_to_session_record_port’ in GnuTLS.
Thanks a lot for the quick and thorough testing!
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 25 Apr 02:20 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210425002016.d4vje2v45kh3rqmj@pelzflorian.localdomain
On Sat, Apr 24, 2021 at 07:13:01PM +0200, Ludovic Courtès wrote:
Toggle quote (5 lines)> I’ve pushed these as two separate patches:> > c50db7156d http-client: Remove exception mishandling in 'http-multiple-get'.> 02d62978f4 http-client, substitute: Gracefully handle GnuTLS EAGAIN/EINTR.
I assume they will be cherry-picked to the version-1.3.0 branch andthe guix package will be updated.
Toggle quote (2 lines)> This bug should be gone now.
Yes, after and only after locally updating the guix package I cancreate good installer images.
Closing.
Note though that "Guu, Jin-Cheng" <jcguu95@gmail.com> reported the TLSerrors before me, but to guix-devel and not as a bug report.
Toggle quote (2 lines)> I’ll go ahead and fix ‘write_to_session_record_port’ in GnuTLS.
Thank you for all the work!
Regards,Florian
Closed
?