Hi Mark,

Mark H Weaver skribis:

> Ludovic Courtès writes:

[...]

>> So there are two things.  To fix the issue you reported (build output
>> that goes through), I think we must simply turn off UTF-8 decoding from
>> ‘process-stderr’ and leave that entirely to ‘build-event-output-port’.
>
> Can we assume that UTF-8 is the appropriate encoding for
> (current-build-output-port)?  My interpretation of the Guix manual entry
> for 'current-build-output-port' suggests that the answer should be "no".

What goes to ‘current-build-output-port’ comes from build processes.
It’s usually UTF-8 but it can be anything, including binary garbage,
which should be gracefully handled.

That’s why ‘process-stderr’ currently uses ‘read-maybe-utf8-string’.

> Also, in your previous message you wrote:
>
>   The problem is the first layer of UTF-8 decoding that happens in
>   ‘process-stderr’, in the ‘%stderr-next’ case.  We would need to
>   disable it, but only if the build output port is
>   ‘build-event-output-port’ (i.e., it’s capable of interpreting
>   “multiplexed build output” correctly.)
>
> It sounds like you're suggesting that 'process-stderr' should look to
> see if (current-build-output-port) is a 'build-event-output-port', and
> in that case it should use binary I/O primitives to write raw binary
> data to it, otherwise it should use text I/O primitives and write
> characters to it.  Do I understand correctly?

Yes.

(Actually, rather than guessing if (current-build-output-port) is a
‘build-event-output-port’, there could be a fluid to ask for the use of
raw binary primitives.)

> IMO, it would be cleaner to treat 'build-event-output-port' uniformly,
> and specifically as a textual port of unknown encoding.

(You mean ‘current-build-output-port’, right?)

I think you’re right.  I’m not yet entirely sure what the implications
are.  There are a couple of tests in tests/store.scm for UTF-8
interpretation that describe behavior that I think we should preserve.

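To make the concern about a first layer of lenient decoding concrete, here is a rough sketch. It is Python, not Guix or Guile code, and the function names are made up for illustration. It assumes the failure mode is a multibyte UTF-8 sequence that happens to be split across two chunks, each of which is then decoded on its own with invalid bytes replaced.

```python
import codecs


def decode_per_chunk(chunks):
    """Decode every chunk independently, replacing invalid bytes with U+FFFD.

    A multibyte sequence cut at a chunk boundary is damaged, because neither
    half is valid UTF-8 on its own.
    """
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def decode_stateful(chunks):
    """Decode with a single incremental decoder shared by all chunks.

    The decoder buffers an incomplete trailing sequence and finishes it when
    the next chunk arrives; genuinely invalid bytes still become U+FFFD.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    # Flush: anything still buffered at the end is an incomplete sequence.
    return text + decoder.decode(b"", final=True)


data = "naïve build log\n".encode("utf-8")
# Cut inside the two-byte encoding of "ï" (0xC3 0xAF).
chunks = [data[:3], data[3:]]

if __name__ == "__main__":
    print(repr(decode_per_chunk(chunks)))
    print(repr(decode_stateful(chunks)))
```

The second function is the shape of what a single decoding layer at the end of the pipeline would need: it sees the raw bytes, keeps state across writes, and still degrades gracefully on binary garbage.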
> I would suggest changing 'build-event-output-port' to create an R6RS
> custom *textual* output port, so that it wouldn't have to worry about
> encodings at all, and it would only be given whole characters.
> Internally, it would be doing exactly what you suggest above, but those
> details would be encapsulated within the custom textual port.
>
> However, I don't think we can use Guile's current implementation of R6RS
> custom textual output ports, which are currently built on Guile's legacy
> soft ports, which I suspect have a similar bug with multibyte characters
> sometimes being split (see 'soft_port_write' in vports.c).
>
> Having said all of this, my suggestions would ultimately entail having
> two separate places along the stderr pipeline where 'utf8->string!'
> would be used, and maybe that's too much until we have a more optimized
> C implementation of it.

Yeah it looks like we don’t yet have custom textual output ports that we
could rely on, do we?  I support your work to add that in Guile proper!

Thanks,
Ludo’.