guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.

  • Done
Details
4 participants
  • Thiago Jung Bauermann
  • Ludovic Courtès
  • Maxim Cournoyer
  • Maxime Devos
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
important
Merged with
Maxim Cournoyer wrote on 7 Jun 2021 17:10
(name . bug-guix)(address . bug-guix@gnu.org)
87bl8hvhgx.fsf@gmail.com
Hello,

Using Guix from the master branch at commit
b2122b07dc24007263b92247cc479713c2101390, with a system reconfigured on
the 2nd of June (Guix commit bb325c5611553a6db21ee7499ac07d5757d24fc3):

Generation 216 Jun 02 2021 10:14:19 (current)
file name: /var/guix/profiles/system-216-link
canonical file name: /gnu/store/apjg70083nc5xj816y0ff3r8ir9gh5py-system
label: GNU with Linux-Libre 5.11.20
bootloader: grub
root device: /dev/mapper/cryptroot
kernel: /gnu/store/ghijd80qabdyf0p6jcich9ggnpwrbwxw-linux-libre-5.11.20/bzImage
channels:
sfl-packages:
repository URL: https://gitlab.com/Apteryks/sfl-guix-channel
branch: master
commit: 37d017573350b64f8a8c992530153f42806b6a6f
guix:
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: bb325c5611553a6db21ee7499ac07d5757d24fc3
configuration file:
/gnu/store/qvhl7ya2xn4gr9mn29hg93p1dcbdlyfy-configuration.scm
with the guix-daemon running being:

/gnu/store/9zh3bg8d4y08jnkqyrk6xczahiahhcy4-guix-1.3.0-1.771b866/bin/guix-daemon
29920 guixbuild --max-silent-time 0 --timeout 0 --log-compression none
--discover=no --substitute-urls http://127.0.0.1:8080
https://ci.guix.gnu.org --max-jobs=4

Attempting to update my profile keeps failing with:

$ ./pre-inst-env guix package -m ~/stow/guix/manifest.scm -L ~/src/sfl-guix-channel/ --substitute-urls=https://ci.guix.gnu.org --no-offload
;;; note: source file /home/maxim/src/guix-master/gnu/packages/networking.scm
;;; newer than compiled /home/maxim/src/guix-master/gnu/packages/networking.go
;;; note: source file /home/maxim/src/guix-master/gnu/packages/networking.scm
;;; newer than compiled /run/current-system/profile/lib/guile/3.0/site-ccache/gnu/packages/networking.go
The following packages will be installed:
acpi 1.7
adb 7.1.2_r36
adwaita-icon-theme 3.34.3
alsa-utils 1.2.4
[...]
xrandr 1.5.1
xrdb 1.2.0
xsetroot 1.1.2
yelp 3.32.2

122.8 MB will be downloaded
libreoffice-6.4.7.2 117.1MiB 344KiB/s 03:04 [######### ] 52.7%guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.
substitution of /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 failed
guix package: error: some substitutes for the outputs of derivation `/gnu/store/9f8sffldy39mprihx6xgrs7hys9j75jm-libreoffice-6.4.7.2.drv' failed (usually happens due to networking issues); try `--fallback' to build derivation from source

I'm attaching my (large!) profile manifest. It depends on Python
packages from the sfl-guix-channel channel listed above. You could
comment out the "sflvault-client" package from the manifest to lift
that requirement.

Thanks,

Maxim
(use-modules (gnu packages)
             (gnu packages emacs)
             (guix build-system emacs)
             (guix profiles))

(concatenate-manifests
 (list
  ;;; Emacs packages.
  (specifications->manifest
   '("emacs"
     "emacs-auctex"
     "emacs-bash-completion"
     "emacs-bbdb"
     "emacs-cmake-mode"
     "emacs-company"
     "emacs-company-quickhelp"
     "emacs-counsel"
     "emacs-counsel-bbdb"
     "emacs-csv-mode"
     "emacs-debbugs"
     "emacs-diff-hl"
     "emacs-el-mock"
     "emacs-elpy"
     "emacs-emms"
     "emacs-ggtags"
     "emacs-go-mode"
     "emacs-grep-a-lot"
     "emacs-groovy-modes"
     "emacs-guix"
     "emacs-htmlize"
     "emacs-ivy"
     "emacs-magit"
     "emacs-markdown-mode"
     "emacs-nix-mode"
     "emacs-org"
     "emacs-org-reveal"
     "emacs-paredit"
     "emacs-php-mode"
     "emacs-pdf-tools"
     "emacs-qml-mode"
     "emacs-realgud"
     "emacs-rpm-spec-mode"
     "emacs-sr-speedbar"
     "emacs-string-inflection"
     "emacs-swiper"
     "emacs-w3m"
     "emacs-ws-butler"
     "emacs-yaml-mode"
     "emacs-yasnippet"
     "emacs-yasnippet-snippets"))

  ;; Other software.
  (specifications->manifest
   '("adb"
     "acpi"
     "adwaita-icon-theme"
     "alsa-utils"
     "anthy"
     "arc-icon-theme"
     "arc-theme"
     "aspell"
     "aspell-dict-en"
     "aspell-dict-fr"
     "autoconf"
     "automake"
     "autossh"
     "bash"
     "bc"
     "beep"
     "bind:utils"                       ;for 'dig'
     "bluez"
     "bridge-utils"
     "cheese"
     "compsize"
     "cqfd"
     "cryptsetup"
     "curl"
     "dbus"
     "dconf"
     "ddcutil"
     "diffoscope"
     "docker-cli"
     "dosfstools"
     "evince"
     "file"
     "font-adobe-source-han-sans"
     "font-dejavu"
     "font-google-roboto"
     "font-hack"
     "gcc-toolchain"
     "gdb"
     "geeqie"
     "ghostscript-with-x"
     "gimp"
     "git"
     "git:send-email"
     "glibc-locales"
     "global"
     "gnome-bluetooth"
     "gnome-boxes"
     "gnu-standards"
     "gnucash"
     "gnucash:doc"
     "gnupg"
     "graphviz"
     "grub"                             ;for the manual
     "gtk-engines"
     "guile"
     "guile-lib"
     "guile-readline"
     "guile-sqlite3"
     "guile-ssh"
     "hackneyed-x11-cursors"
     "hicolor-icon-theme"
     "hunspell"
     "hunspell-dict-fr"
     "ibus"
     "ibus-anthy"
     "icecat"
     "imagemagick"
     "inetutils"
     "inkscape"
     "iotop"
     "jack"
     "jami-gnome"
     "jami-qt"
     "nethogs"                          ;per-process bandwidth monitoring
     "jnettop"                          ;bandwidth monitoring
     "keepassxc"
     "libjpeg"
     "libmtp"
     "libpcap"
     "libreoffice"
     "libssh"
     "libx11"
     "linphone-desktop"
     "lm-sensors"
     "lsof"
     "ltrace"
     "lvm2"                             ;for dmsetup
     "maim"                             ;take screenshots
     "make"
     "man-pages"
     "mesa-utils"
     "moreutils"
     "mpv"
     "mtr"
     "ncftp"                            ;for gnupload
     "nmap"
     "openssh"
     "openvpn"
     "parted"
     "pavucontrol"
     "perl"
     "pinentry"
     "pkg-config"
     "poppler"
     "pulseaudio"
     "pv"
     "python"
     "python-wrapper"
     "qemu"
     "recutils"
     "rofi"
     "rsync"
     "rtorrent"
     "screen"
     "setxkbmap"
     "shepherd"
     "sicp"
     "smartmontools"
     "spacefm"
     "stow"
     "strace"
     "sysstat"                          ;for iostat
     "tcpdump"
     "the-silver-searcher"              ;ag
     "time"                             ;aliased to time+
     "transmission"
     "transmission:gui"
     "tree"
     "unzip"
     "vinagre"
     "vorbis-tools"
     "weechat"
     "wget"
     "workrave"
     "wpa-supplicant"
     "xclip"
     "xdpyinfo"
     "xdg-utils"
     "xev"
     "xmodmap"
     "xournal"
     "xrandr"
     "xrdb"
     "xsetroot"
     "yelp"
     "gxtuner"
     "shellcheck"
     "wireguard-tools"
     "wireshark"))

  ;; SFL stuff -- todo extract in separate manifest
  (specifications->manifest
   '("ansible"
     "docker-compose"
     "emacs-adoc-mode"
     "emacs-clang-format"
     "emacs-clang-rename"
     "emacs-feature-mode"
     "picocom"
     "python-git-review"
     "sflvault-client"
     "sshpass"
     "ungoogled-chromium"
     "ddrescue"))))
Ludovic Courtès wrote on 7 Jun 2021 17:38
control message for bug #48903
(address . control@debbugs.gnu.org)
87zgw1it1d.fsf@gnu.org
severity 48903 important
quit
Maxime Devos wrote on 7 Jun 2021 19:46
Re: bug#48903: guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.
bd60b06842be6645d2947e49b709ad358e9516bc.camel@telenet.be
> 122.8 MB will be downloaded
> libreoffice-6.4.7.2 117.1MiB 344KiB/s 03:04 [######### ] 52.7%guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.
> substitution of /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 failed
> guix package: error: some substitutes for the outputs of derivation `/gnu/store/9f8sffldy39mprihx6xgrs7hys9j75jm-libreoffice-6.4.7.2.drv' failed (usually happens due to networking issues); try `--fallback' to build derivation from source
>

I often have the same problem when I do "guix package -u".
(Same error message, same package libreoffice, same derivation)
(Usually libreoffice, sometimes with other packages as well.)

I don't know the cause though.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYL5bThccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7vdfAQDP7hYYsVYbCvm3aiDP7R0lxdSJ
hfH67m16fdVsGntpVgD/eOHTlj3H8LvnAZFI1ncxWKCNlbPIg8uMcgxKBa2hrwM=
=4APP
-----END PGP SIGNATURE-----


Ludovic Courtès wrote on 11 Jun 2021 17:09
(name . Maxime Devos)(address . maximedevos@telenet.be)
87wnr05tfz.fsf@gnu.org
Hi Maxim{,e}!

Maxime Devos <maximedevos@telenet.be> skribis:

>> 122.8 MB will be downloaded
>> libreoffice-6.4.7.2 117.1MiB 344KiB/s 03:04 [######### ] 52.7%guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.
>> substitution of /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 failed
>> guix package: error: some substitutes for the outputs of derivation `/gnu/store/9f8sffldy39mprihx6xgrs7hys9j75jm-libreoffice-6.4.7.2.drv' failed (usually happens due to networking issues); try `--fallback' to build derivation from source
>>
>
> I often have the same problem when I do "guix package -u".
> (Same error message, same package libreoffice, same derivation)
> (Usually libreoffice, sometimes with other packages as well.)

As a first step, can you reproduce the bug like this:

while echo substitute /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 /tmp/t | guix substitute --substitute ; do chmod -R +w /tmp/t && rm -rf /tmp/t; done

?

FWIW, I can’t seem to reproduce it with:

$ guix describe
Generacio 185 Jun 07 2021 15:07:46 (nuna)
guix e3611cc
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: e3611cc412e7b1c750a56d17fb1b7cde684baa3f

TIA,
Ludo’.
Ludovic Courtès wrote on 18 Jun 2021 09:47
control message for bug #48903
(address . control@debbugs.gnu.org)
87zgvnlile.fsf@gnu.org
merge 48903 49071
quit
Maxim Cournoyer wrote on 29 Jun 2021 15:23
Re: bug#48903: guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.
(name . Ludovic Courtès)(address . ludo@gnu.org)
87o8bovm7t.fsf@gmail.com
Hello Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> As a first step, can you reproduce the bug like this:
>
> while echo substitute
> /gnu/store/44h13hn5zssfppz67vydxcf95qsc8qfw-libreoffice-6.4.7.2 /tmp/t
> | guix substitute --substitute ; do chmod -R +w /tmp/t && rm -rf
> /tmp/t; done
>
> ?
>
> FWIW, I can’t seem to reproduce it with:
>
> $ guix describe
> Generacio 185 Jun 07 2021 15:07:46 (nuna)
> guix e3611cc
> repository URL: https://git.savannah.gnu.org/git/guix.git
> branch: master
> commit: e3611cc412e7b1c750a56d17fb1b7cde684baa3f

I can't seem to reproduce either. Perhaps the issue only arises when
there are many things happening concurrently. My daemon runs with:

$ sudo ps -eF | grep guix-daemon
root 25193 216 0 3074 1524 3 Jun28 ? 00:00:00 /gnu/store/vphx2839xv0qj9xwcwrb95592lzrrnx7-guix-1.3.0-3.50dfbbf/bin/guix-daemon 25178 guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=no --substitute-urls http://127.0.0.1:8080 https://ci.guix.gnu.org --max-jobs=4

I can rather easily (and annoyingly!) trigger the problem (and a few
variations of it, it seems) with something like:

$ packages=$(guix refresh -l protobuf | sed 's/^.*: //')
$ guix build -v3 --keep-going $packages

For example, running the above, I just got:

guix build: error: corrupt input while restoring archive from #<closed: file 7fc95acfc2a0>

Do the above commands succeed the first time on your end? If you
have already lots of things cached, you can try for an architecture you
don't often build for by adding the '--system=i686-linux' option; that
should cause a massive amount of downloads, likely to trigger the
problem. Perhaps also try to use --max-jobs=4.

If you have ideas of how to debug this when I hit the issue I'm all ears
:-).

Thank you!

Maxim
Ludovic Courtès wrote on 29 Jun 2021 23:18
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87sg10pdxs.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> $ sudo ps -eF | grep guix-daemon
> root 25193 216 0 3074 1524 3 Jun28 ? 00:00:00 /gnu/store/vphx2839xv0qj9xwcwrb95592lzrrnx7-guix-1.3.0-3.50dfbbf/bin/guix-daemon 25178 guixbuild --max-silent-time 0 --timeout 0 --log-compression none --discover=no --substitute-urls http://127.0.0.1:8080 https://ci.guix.gnu.org --max-jobs=4
>
> I can rather easily (and annoyingly!) trigger the problem (and a few
> variations of it, it seems) with something like:
>
> $ packages=$(guix refresh -l protobuf | sed 's/^.*: //')
> $ guix build -v3 --keep-going $packages
>
> For example, running the above, I just got:
>
> guix build: error: corrupt input while restoring archive from #<closed:
> file 7fc95acfc2a0>
>
> Does the above commands succeed on the first time on your end? If you
> have already lots of things cached, you can try for an architecture you
> don't often build for by adding the '--system=i686-linux' option; that
> should cause a massive amount of downloads, likely to trigger the
> problem. Perhaps also try to use --max-jobs=4.

I’ve tried that, with --max-jobs=4, and it fills my disk just fine. :-/

> If you have ideas of how to debug this when I hit the issue I'm all ears
> :-).

The attached patch substitutes a number of store items in a row; run:

guix repl -- substitute-stress.scm

and it’ll fill /tmp/substitute-test with 200 substitutes, which should
be equivalent to the kind of stress test you had above.

It doesn’t crash for me. There are a few “error: no valid substitute
for /gnu/store/…” errors, but these are expected: we ask for
substitutes for 200 packages without first checking whether substitutes
are available.

Could you run it and report back?

You can try with more packages, different substitute URLs, etc.

TIA!

Ludo’.
(use-modules (guix) (gnu packages)
             (guix scripts substitute)
             (guix grafts)
             (guix build utils)
             (srfi srfi-1)
             (ice-9 match)
             (ice-9 threads))

(define test-directory "/tmp/substitute-test")

(define packages
  ;; Subset of packages for which we request substitutes.
  (take (fold-packages cons '()) 200))

(define (spawn-substitution-thread input urls)
  "Spawn a 'guix substitute' thread that reads commands from INPUT and uses
URLS as the substitute servers."
  (call-with-new-thread
   (lambda ()
     (parameterize ((%reply-file-descriptor #f)
                    (current-input-port input))
       (setenv "_NIX_OPTIONS"
               (string-append "substitute-urls=" (string-join urls)))
       (let loop ()
         (format (current-error-port) "starting substituter~%")
         ;; Catch "no valid substitute" errors.
         (catch 'quit
           (lambda ()
             (guix-substitute "--substitute"))
           (const #f))
         (unless (eof-object? (peek-char input))
           (loop)))))))

(match (pipe)
  ((input . output)
   (let ((thread (spawn-substitution-thread input
                                            %default-substitute-urls)))
     ;; Remove the test directory.
     (when (file-exists? test-directory)
       (for-each make-file-writable
                 (find-files test-directory #:directories? #t))
       (delete-file-recursively test-directory))
     (mkdir-p test-directory)

     (parameterize ((%graft? #false))
       (with-store store
         ;; Ask for substitutes for PACKAGES.
         (for-each (lambda (package n)
                     (define item
                       (run-with-store store
                         (package-file package)))

                     (format output "substitute ~a ~a/~a~%"
                             item test-directory n))
                   packages
                   (iota (length packages))))
       (format #t "sent ~a substitution requests...~%"
               (length packages))
       (close-port output)

       ;; Wait for substitution to complete.
       (join-thread thread)))))
Maxim Cournoyer wrote on 30 Jun 2021 18:26
(name . Ludovic Courtès)(address . ludo@gnu.org)
871r8j5nev.fsf@gmail.com
Hello!

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> $ sudo ps -eF | grep guix-daemon
>> root 25193 216 0 3074 1524 3 Jun28 ? 00:00:00
>> /gnu/store/vphx2839xv0qj9xwcwrb95592lzrrnx7-guix-1.3.0-3.50dfbbf/bin/guix-daemon
>> 25178 guixbuild --max-silent-time 0 --timeout 0 --log-compression
>> none --discover=no --substitute-urls http://127.0.0.1:8080
>> https://ci.guix.gnu.org --max-jobs=4
>>
>> I can rather easily (and annoyingly!) trigger the problem (and a few
>> variations of it, it seems) with something like:
>>
>> $ packages=$(guix refresh -l protobuf | sed 's/^.*: //')
>> $ guix build -v3 --keep-going $packages
>>
>> For example, running the above, I just got:
>>
>> guix build: error: corrupt input while restoring archive from #<closed:
>> file 7fc95acfc2a0>
>>
>> Does the above commands succeed on the first time on your end? If you
>> have already lots of things cached, you can try for an architecture you
>> don't often build for by adding the '--system=i686-linux' option; that
>> should cause a massive amount of downloads, likely to trigger the
>> problem. Perhaps also try to use --max-jobs=4.
>
> I’ve tried that, with --max-jobs=4, and it fills my disk just fine. :-/
>
>> If you have ideas of how to debug this when I hit the issue I'm all ears
>> :-).
>
> The attached patch substitutes a number of store items in a row; run:
>
> guix repl -- substitute-stress.scm
>
> and it’ll fill /tmp/substitute-test with 200 substitutes, which should
> be equivalent to the kind of stress test you had above.
>
> It doesn’t crash for me. There are a few “error: no valid substitute
> for /gnu/store/…” errors, but these are expected: we ask for
> substitutes for 200 packages without first checking whether substitutes
> are available.
>
> Could you run it and report back?
>
> You can try with more packages, different substitute URLs, etc.
>
> TIA!
>
> Ludo’.

[...]

I've tried with the following modified version which runs multiple
threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
to trigger it, although the hard drive is grinding heavily:

(use-modules (guix) (gnu packages)
             (guix scripts substitute)
             (guix grafts)
             (guix build utils)
             (srfi srfi-1)
             (ice-9 match)
             (ice-9 threads))

(define test-directory "/tmp/substitute-test")

(define max-jobs 4)

(define packages
  ;; Subset of packages for which we request substitutes.
  (append (map specification->package '("libreoffice"
                                        "ungoogled-chromium"
                                        "openjdk"
                                        "texmacs"))
          (take (fold-packages cons '()) 1000)))

(define (spawn-substitution-thread input urls)
  "Spawn a 'guix substitute' thread that reads commands from INPUT and uses
URLS as the substitute servers."
  (call-with-new-thread
   (lambda ()
     (parameterize ((%reply-file-descriptor #f)
                    (current-input-port input))
       (setenv "_NIX_OPTIONS"
               (string-append "substitute-urls=" (string-join urls)))
       (let loop ()
         (format (current-error-port) "starting substituter~%")
         ;; Catch "no valid substitute" errors.
         (catch 'quit
           (lambda ()
             (guix-substitute "--substitute"))
           (const #f))
         (unless (eof-object? (peek-char input))
           (loop)))))))

(for-each (lambda (job)
            (match (pipe)
              ((input . output)
               (let ((test-directory* (string-append test-directory "-"
                                                     (number->string job)))
                     (thread (spawn-substitution-thread
                              input %default-substitute-urls)))
                 ;; Remove this job's test directory.
                 (when (file-exists? test-directory*)
                   (for-each (lambda (f)
                               (false-if-exception (make-file-writable f)))
                             (find-files test-directory* #:directories? #t))
                   (delete-file-recursively test-directory*))
                 (mkdir-p test-directory*)

                 (parameterize ((%graft? #false))
                   (with-store store
                     ;; Ask for substitutes for PACKAGES.
                     (for-each (lambda (package n)
                                 (define item
                                   (run-with-store store
                                     (package-file package)))

                                 (format output "substitute ~a ~a/~a~%"
                                         item test-directory* n))
                               packages
                               (iota (length packages))))
                   (format #t "sent ~a substitution requests...~%"
                           (length packages))
                   (close-port output)

                   ;; Wait for substitution to complete.
                   (join-thread thread))))))
          (iota max-jobs))

I wonder if there's something more happening in the real scenario
(validating signatures when putting things in the store? or something
similar) that may have a role in the failure.

That's a tough nut to crack!

I'll keep looking for clues.

Thanks for your time!

Maxim
Ludovic Courtès wrote on 1 Jul 2021 15:12
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
874kdemb4k.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I've tried with the following modified version which runs multiple
> threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
> to trigger it, although the hard drive is grinding heavily:

Note that ‘--max-jobs=4’ leads guix-daemon to spawn 4 ‘guix substitute’
processes, which is not what the script is doing here.

Are the other conditions the same, for instance same network, etc.?

Thanks,
Ludo’.
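
To make the processes-versus-threads point concrete, here is a minimal shell sketch that starts several separate substituter processes at once, which is closer to what guix-daemon does with --max-jobs=4. It reuses the "substitute ITEM DESTINATION" request line from the one-liner posted earlier in this thread. The names SUBSTITUTE_CMD, run_job and run_parallel are made up for illustration, and whether this load pattern is enough to trigger the TLS error is untested.

```shell
#!/bin/sh
# Sketch only: SUBSTITUTE_CMD, run_job and run_parallel are hypothetical names.

# Command that reads one "substitute ITEM DESTINATION" request on stdin,
# as in the one-liner posted earlier in this thread.
SUBSTITUTE_CMD=${SUBSTITUTE_CMD:-"guix substitute --substitute"}

# run_job JOB DEST_ROOT ITEM...
# Fetch every ITEM in turn, one substituter process per request.
# Succeeds only if every request succeeded.
run_job() {
    job=$1; dest_root=$2; shift 2
    mkdir -p "$dest_root/job-$job"
    failures=0
    n=0
    for item in "$@"; do
        echo "substitute $item $dest_root/job-$job/$n" | $SUBSTITUTE_CMD \
            || failures=$((failures + 1))
        n=$((n + 1))
    done
    [ "$failures" -eq 0 ]
}

# run_parallel COUNT DEST_ROOT ITEM...
# Start COUNT background processes (not threads) and wait for all of them.
run_parallel() {
    count=$1; dest_root=$2; shift 2
    pids=""
    job=0
    while [ "$job" -lt "$count" ]; do
        run_job "$job" "$dest_root" "$@" &
        pids="$pids $!"
        job=$((job + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed of $count jobs failed"
    [ "$failed" -eq 0 ]
}
```

For example, "run_parallel 4 /tmp/substitute-test ITEM1 ITEM2" would start four processes that each fetch the same store items into their own job-N directory. As in the original one-liner, each destination must not exist beforehand.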
Maxim Cournoyer wrote on 2 Jul 2021 16:26
(name . Ludovic Courtès)(address . ludo@gnu.org)
87h7hc3i6v.fsf@gmail.com
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> I've tried with the following modified version which runs multiple
>> threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
>> to trigger it, although the hard drive is grinding heavily:
>
> Note that ‘--max-jobs=4’ leads guix-daemon to spawn 4 ‘guix substitute’
> processes, which is not what the script is doing here.

Oh! I had overlooked that. What the modified version did is create
threads rather than processes, right?

> Are the other conditions the same, for instance same network, etc.?

Yes!

Maxim
Ludovic Courtès wrote on 5 Jul 2021 18:12
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87y2akspsh.fsf@gnu.org
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hi,
>>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>
>>> I've tried with the following modified version which runs multiple
>>> threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
>>> to trigger it, although the hard drive is grinding heavily:
>>
>> Note that ‘--max-jobs=4’ leads guix-daemon to spawn 4 ‘guix substitute’
>> processes, which is not what the script is doing here.
>
> Oh! I had overlooked that. What the modified version did is create
> threads rather than processes, right?

Yes.

So I’m not sure how to better test this. Perhaps you could try
introducing random delays in the loop (which could cause connections to
go stale), using different substitute URLs, things like that.

Thanks,
Ludo’.
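
The random-delay idea can be sketched in shell as a producer that writes request lines with pauses in between and pipes them into a single substituter process, so that any connection the process keeps open has time to go idle. This assumes that one "guix substitute --substitute" process serves several requests from its standard input, as the loop in the stress-test script suggests. random_delay and emit_requests are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: random_delay and emit_requests are hypothetical helpers.

# random_delay MAX
# Print a random whole number of seconds between 0 and MAX inclusive.
random_delay() {
    raw=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $((raw % ($1 + 1)))
}

# emit_requests MAX_DELAY DEST_ROOT ITEM...
# Print one "substitute ITEM DESTINATION" line per ITEM, sleeping a
# random amount of time before each line.
emit_requests() {
    max_delay=$1; dest_root=$2; shift 2
    n=0
    for item in "$@"; do
        sleep "$(random_delay "$max_delay")"
        echo "substitute $item $dest_root/$n"
        n=$((n + 1))
    done
}

# Intended use (not run here):
#   emit_requests 120 /tmp/substitute-test ITEM1 ITEM2 \
#       | guix substitute --substitute
```

If the substituter exits early, for instance on a "no valid substitute" error, the remaining requests are dropped; the Scheme script avoids that by restarting the substituter in a loop.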
Maxim Cournoyer wrote on 8 Jul 2021 15:59
(name . Ludovic Courtès)(address . ludo@gnu.org)
87pmvs52kl.fsf@gmail.com
Hi!

Ludovic Courtès <ludo@gnu.org> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>>
>>> Hi,
>>>
>>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>>
>>>> I've tried with the following modified version which runs multiple
>>>> threads in parallel (to mimic --max-jobs=4 on the daemon), and I've yet
>>>> to trigger it, although the hard drive is grinding heavily:
>>>
>>> Note that ‘--max-jobs=4’ leads guix-daemon to spawn 4 ‘guix substitute’
>>> processes, which is not what the script is doing here.
>>
>> Oh! I had overlooked that. What the modified version did is create
>> threads rather than processes, right?
>
> Yes.
>
> So I’m not sure how to better test this. Perhaps you could try
> introducing random delays in the loop (which could cause connections to
> go stale), using different substitute URLs, things like that.

I've made a few attempts to reproduce the issue with the modified
scripts below, but in vain. I'm not sure if my delay is inserted at the
right place.
I also suspect that my attempt to shuffle the substitute-urls is not
really useful, as that's probably what would have happened anyway
(although I haven't followed in the code deeply enough to confirm).
Thanks,

Maxim
Ludovic Courtès wrote on 10 Jul 2021 12:20
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
874kd2fp1y.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I've tried some to reproduce the issue with the modified scripts below,
> but in vain. I'm not sure if my delay is inserted at the right place.
> I also suspect that my attempt to shuffle the substitute-urls is not
> really useful, as that's probably what would have happened anyway
> (although I haven't followed in the code deeply enough to confirm).

Bah. :-/ Do the two of you still experience the bug initially reported
here in “real” conditions?

Are we sure we’re using the same Guix + Guile when running the stress
test and in real conditions?

Thanks for testing!

Ludo’.
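
One rough way to check the "same Guix" question is to compare the short commit embedded in the daemon's store file name (for instance guix-1.3.0-3.50dfbbf in the ps output earlier in this thread) with the full commit printed by "guix describe". The sketch below assumes that this file-name convention holds; commit_from_store_path and same_guix_commit are hypothetical helper names. It says nothing about the Guile version in use.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and rely on the store file
# name ending in ".SHORTCOMMIT", as in "guix-1.3.0-3.50dfbbf".

# commit_from_store_path PATH
# Print the short commit embedded in the first store item of PATH,
# or nothing if PATH does not follow the assumed convention.
commit_from_store_path() {
    printf '%s\n' "$1" \
        | sed -n 's|^/gnu/store/[^/]*\.\([0-9a-f]\{7,\}\)/.*$|\1|p'
}

# same_guix_commit SHORT FULL
# Succeed when the full commit FULL starts with the short commit SHORT.
same_guix_commit() {
    [ -n "$1" ] || return 1
    case $2 in
        "$1"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The first argument would come from the guix-daemon path shown by ps, and the second from the "commit:" line of "guix describe" in the environment where the stress test runs.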
Maxim Cournoyer wrote on 7 Aug 2021 06:23
(name . Ludovic Courtès)(address . ludo@gnu.org)
87czqpkhn8.fsf@gmail.com
Hello,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> I've tried some to reproduce the issue with the modified scripts below,
>> but in vain. I'm not sure if my delay is inserted at the right place.
>> I also suspect that my attempt to shuffle the substitute-urls is not
>> really useful, as that's probably what would have happened anyway
>> (although I haven't followed in the code deeply enough to confirm).
>
> Bah. :-/ Do the two of you still experience the bug initially reported
> here in “real” conditions?
>
> Are we sure we’re using the same Guix + Guile when running the stress
> test and in real conditions?
>
> Thanks for testing!
>
> Ludo’.

I've been doing builds on core-updates, which would previously trigger
this problem rather often, *without* suffering from the problem.

I consider it resolved. I'm not exactly sure how, which is not
satisfying, but I'm glad it's gone.

Thank you!

Closing.

Maxim
Closed
Thiago Jung Bauermann wrote on 13 Jan 2022 21:21
unarchive bug
(address . control@debbugs.gnu.org)
1840192.3LYBDE1tu0@popigai
unarchive 48903
--
Thanks,
Thiago
Thiago Jung Bauermann wrote on 13 Jan 2022 21:47
Re: bug#42902: texlive substitute TLS error: decoding the received packet
(address . 48903@debbugs.gnu.org)
2309789.4Gk3HTWCPZ@popigai
Hello,

[ Sending again to 48903 (and excluding everyone else from the recipients)
because this bug was archived and thus the original message bounced. ]

I’m sending this email to two bugs because they are both about the same
problem. Bug 48903 (which is currently closed) has fairly intense attempts
at debugging what is going on, but unfortunately without arriving at an
answer.

This problem is also affecting Cuirass builds. This powerpc64le-linux build:


failed because of the intermittent TLS error:

guix substitute: error: TLS error in procedure 'read_from_session_record_port': Error decoding the received TLS packet.
fetching path `/gnu/store/fkkfffvwrj103zs5cf6d8bf9as46ywhc-python-minimal-3.5.9' (empty status: '')
@ substituter-failed /gnu/store/fkkfffvwrj103zs5cf6d8bf9as46ywhc-python-minimal-3.5.9 fetching path `/gnu/store/fkkfffvwrj103zs5cf6d8bf9as46ywhc-python-minimal-3.5.9' (empty status: '')

The daemon in this case is:

/gnu/store/ny30pxjzv866m3w0v1vfbzdbqi17k8wn-guix-daemon-1.3.0-21.e427593/bin/guix-daemon

From this Guix version

guixp9: sudo -i guix describe
Generation 11 Jan 11 2022 02:07:42 (current)
guix 83abdc8
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: 83abdc8371d90b6d4591a69fae5585a2a99c1627

Not sure if this is related, but during that build Guix noticed that the substitute server is slow:

Downloading https://ci.guix.gnu.org/nar/lzip/fkkfffvwrj103zs5cf6d8bf9as46ywhc-python-minimal-3.5.9...
guix substitute: warning: while fetching https://ci.guix.gnu.org/nar/lzip/fkkfffvwrj103zs5cf6d8bf9as46ywhc-python-minimal-3.5.9: server is somewhat slow
guix substitute: warning: try `--no-substitutes' if the problem persists

There’s bug 46942 which is specifically about ci.guix.gnu.org being slow,
and the bug reporter there also hits this same TLS error, so there’s
probably at least some correlation between this problem and the network
being slow.

--
Thanks,
Thiago