Ludovic Courtès <ludovic.courtes@inria.fr> skribis:
Toggle quote (11 lines)
> Inside the childhurd, we get:
>
> root@childhurd ~# tail -1 /var/log/guix-daemon.log
> 2024-09-09 21:09:03 unexpected build daemon error: stoi
>
> Last time we got that error was in commit
> 21deb89e287b5821975544118bf137562a91d4e1: guix-daemon was running with
> incorrect locale data, and thus ‘std::stoi’ would throw (‘std::stoi’ in
> guix-daemon is used for the communication with the ‘guix authenticate’
> process, see ‘readAuthenticateReply’).
I wanted to ‘rpctrace’ to see what ‘guix authenticate’ and ‘guix-daemon’
are telling each other. The problem is that ‘rpctrace’ is kinda broken,
or at least it wrecks havoc when trying to follow forks or something.
But I found a trick that allowed me to trace just ‘guix authenticate’:
Toggle snippet (9 lines)
root@childhurd ~# cat intercept-guix.sh
#!/bin/sh
/usr/bin/env > /tmp/env.log
echo >> /tmp/env.log
echo "$@" >> /tmp/env.log
exec rpctrace -o /tmp/rpctrace.log -s 200 /gnu/store/cg64w7mv1v2r188rzcgksdva9cvj8vir-guix-1.4.0-24.9a2ddcc/bin/guix "$@"
root@childhurd ~# LC_ALL=fr_FR.utf8 GUIX=$PWD/intercept-guix.sh /gnu/store/cg64w7mv1v2r188rzcgksdva9cvj8vir-guix-1.4.0-24.9a2ddcc/bin/guix-daemon --build-users-group guixbuild --max-silent-time 3600 --timeout 86400 --log-compression gzip --discover=no --disable-chroot --disable-deduplication
(Here “LC_ALL=fr_FR.utf8” allows me to reproduce the stoi bug.)
We get this:
Toggle snippet (13 lines)
root@childhurd ~# tail /tmp/rpctrace.log
5<--35(pid166)->dir_lookup ("gnu/store/9ghq6s4mq5sff9cwqrmn26ivycn3p8ql-guile-3.0.9/lib/guile/3.0/ccache/system/foreign-library.go" 4194305 0) = 0 1 "" 67<--70(pid166)
67<--70(pid166)->io_seek_request (0 2) = 0 98141
67<--70(pid166)->io_map_request () = 0 69<--66(pid166) (null)
task30(pid166)-> 2089 (0 98141 0 1 69<--66(pid166) 0 1 1 7 1) = 0 61452288
task30(pid166)-> 3206 (pn{ 28}) = 0
task30(pid166)-> 3206 (pn{ 27}) = 0
task30(pid166)-> 2024 (61517824 18824 0 3) = 0
14<--31(pid166)->io_write_request ("GC Warning: Repeated allocation of very large block (appr. size 112 KiB):\n\tMay lead to memory leak and poor performance\n" -1) = 0 120
task30(pid166)-> 2012 (1 22) = 0 {0 25 195772416 15151104 0 0 0 0 1726072790 570000 0 0 0 0 0 0 0 0 1726072790 0 570000000 0}
task30(pid166)-> 2012 (3 12) = 0 {0 30000 0 20000 0 0 30000000 0 0 0 20000000 0}
That warning comes from libgc and there’s no such warning when LC_ALL is
left unset, which is why it works in that case.
The warning goes to stderr, but that’s what guix-daemon ends up reading,
hence the ‘stoi’ error. One can reproduce that by printing something on
(current-error-port) from ‘guix authenticate’ and running, say, “make
check TESTS=tests/store.scm”: a bunch of tests fail.
It’s not clear to me why libgc keeps printing that warning on i586-gnu
(I think it’s fine on i686-linux so there might be a portability issue),
and it’s not clear either why changing the locale triggers the warning
(it might have to do with the so-called GC “black lists”, since loading
a locale effectively changes the address space layout somewhat).
I see several possible things to do:
1. Disable GC warnings altogether in Guix by setting ‘GC_warn_proc’.
2. Make sure guix-daemon discards stderr from agents such as ‘guix
authenticate’.
3. Figure out why those “Repeated allocation” messages are so frequent
on i586-gnu and Do Something About It™.
Thoughts?
Ludo’.