Hi Mark, Mark H Weaver skribis: > Ludovic Courtès writes: > >> Mark H Weaver skribis: >> >>> Sometimes when compiling a package with --verbosity=1, some parts of the >>> compile output leak through. For example, see the transcript below. >> >> Weird. > > FWIW, a few observations, possibly relevant: > > (1) Each chunk of leaked output begins with 1 or 2 Unicode Replacement > characters (U+FFFD). In the transcript I provided, the first leak > began with 1 replacement char, and all later leaks began with 2. > > (2) The replacement characters are immediately followed by > "@ build-log 30033 4096\n", and that string is also sprinkled > throughout the leaked output, with approximately ~4060-4070 > characters of leaked output between each occurrence of > "@ build-log 30033 4096\n". Indeed. I managed to reproduce it while building modem-manager. I strace’d ‘guix build’ with an additional ‘pk’¹ to see what happens, and here’s what leads to the wrong “write(2, "�@ build-log…")” call: --8<---------------cut here---------------start------------->8--- read(13, "gmlo\0\0\0\0", 8) = 8 read(13, "\27\20\0\0\0\0\0\0", 8) = 8 read(13, "@ build-log 22090 4096\n […] warning: \342\200", 4119) = 4119 read(13, "\0", 1) = 1 write(1, "\n", 1) = 1 write(1, ";;; (write 1008 <> #f 0)\n", 25) = 25 write(1, "\n", 1) = 1 write(1, ";;; (write 985 <> 22090 4096)\n", 30) = 30 write(1, "\n", 1) = 1 write(1, ";;; (write 1008 <> 22090 3111)\n", 31) = 31 write(1, "\n", 1) = 1 write(1, ";;; (write 1008 <> 22090 2103)\n", 31) = 31 write(1, "\n", 1) = 1 write(1, ";;; (write 1008 <> 22090 1095)\n", 31) = 31 write(1, "\n", 1) = 1 write(1, ";;; (write 88 <> 22090 87)\n", 27) = 27 write(2, "\r\33[K\\ 'build' phase", 19) = 19 […] write(2, "\r\33[K\\ 'build' phase", 19) = 19 write(1, "\n", 1) = 1 write(1, ";;; (write 1 <> #f 0)\n", 22) = 22 read(13, "gmlo\0\0\0\0", 8) = 8 read(13, "\27\20\0\0\0\0\0\0", 8) = 8 read(13, "@ build-log 22090 4096\n\230g_simple_async_result_take_error\342\200\231 is deprecated[…]", 4119) = 4119 read(13, "\0", 1) = 1 write(1, "\n", 1) = 1 write(1, ";;; (write 1008 <> #f 0)\n", 25) = 25 write(2, "\357\277\275@ build-log 22090 4096\n", 26) = 26 --8<---------------cut here---------------end--------------->8--- The third read(2) call here ends on a partial UTF-8 sequence for LEFT SINGLE QUOTATION MARK (we get the first two bytes of a three byte sequence.) What happens is that ‘process-stderr’ in (guix store) gets that byte string from the daemon, passes it through ‘read-maybe-utf8-string’, which replaces the last two bytes with REPLACEMENT CHARACTER, which is itself a 3-byte sequence. Thus, we have this extra byte that’s being inserted. That confuses the whole machinery since the build log was announced as being 4096-byte long, and it’s now 4097-byte long. Internally, ‘build-event-output-port’ keeps the last byte of the REPLACEMENT CHARACTER sequence in the ‘%fragments’ buffer. Consequently, the “@ build-log” string that comes next doesn’t start on a newline, and thus it is considered build output. Since the first byte does not constitute a valid UTF-8 sequence, another REPLACEMENT CHARACTER is inserted there when it gets printed. So ‘build-event-output-port’ is working as expected. The problem is the first layer of UTF-8 decoding that happens in ‘process-stderr’, in the ‘%stderr-next’ case. We would need to disable it, but only if the build output port is ‘build-event-output-port’ (i.e., it’s capable of interpreting “multiplexed build output” correctly.) Thanks, Ludo’. ¹ pk: