Hi Ludovic,

Thanks for investigating this.

Ludovic Courtès writes:

> The third read(2) call here ends on a partial UTF-8 sequence for LEFT
> SINGLE QUOTATION MARK (we get the first two bytes of a three byte
> sequence.)
>
> What happens is that ‘process-stderr’ in (guix store) gets that byte
> string from the daemon, passes it through ‘read-maybe-utf8-string’,
> which replaces the last two bytes with REPLACEMENT CHARACTER, which is
> itself a 3-byte sequence.

It seems to me that what's needed here is to save the UTF-8 decoder
state between calls to 'process-stderr'.

Coincidentally, I also needed something like this a week ago, when I
tried implementing R6RS custom textual input/output ports on top of R6RS
custom binary input/output ports.

To meet these needs, I've implemented a fairly efficient, purely
functional UTF-8 decoder in Scheme that accepts a decoder state and an
arbitrary range from a bytevector, and returns a new decoder state.
There's a macro that allows arbitrary actions to be performed when a
code point (or a maximal subpart, in the case of errors) is found.

This macro is then used to implement a decoder (utf8->string!) that
writes into an arbitrary range of an existing string. Of course,
'utf8->string!' is not purely functional, but it avoids heap allocation
when compiled with Guile. On my Thinkpad X200, it can process around
10 megabytes per second.

The state is represented as an exact integer between 0 and #xF48FBF
inclusive: it is simply the bytes that have been seen so far in the
current code sequence, in big-endian order, or 0 for the start state.
For example, #xF48FBF represents the state where the bytes (F4 8F BF)
have been read. The state is always either 0 or a proper prefix of a
valid UTF-8 byte sequence.

I also plan to implement an optimized C version of 'utf8->string!' and
add it to Guile, in order to implement fast custom textual ports. The
precise name and API are not yet finalized. At present, 'utf8->string!'
always replaces maximal subparts with the substitution character in case
of errors, but I intend to eventually support other error modes as well.

What would you think about using this code to replace the uses of
'read-maybe-utf8-string', and storing the UTF-8 decoder state in the
object? Would we need to store multiple states in case of
(max-jobs > 1)?

     Regards,
       Mark
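
To make the state encoding described above concrete, here is a minimal
sketch in Guile Scheme. It is not the decoder discussed in this message:
the names 'utf8-decode-chunk', 'continuation-range', 'lead-byte',
'bytes-seen', 'sequence-length' and 'sequence->char' are hypothetical,
it conses a list of characters instead of writing into an existing
string, and it makes no attempt at being fast. It only assumes the
representation given above (0 for the start state, otherwise the bytes
seen so far in big-endian order) and replacement of each maximal subpart
with U+FFFD.

```scheme
(use-modules (rnrs bytevectors)     ; bytevector-u8-ref, bytevector-length
             (srfi srfi-11))        ; let*-values

(define replacement (integer->char #xFFFD))

(define (lead-byte state)
  ;; First byte of the partial sequence stored in a non-zero STATE.
  (cond ((< state #x100)   state)
        ((< state #x10000) (ash state -8))
        (else              (ash state -16))))

(define (bytes-seen state)
  (cond ((< state #x100) 1) ((< state #x10000) 2) (else 3)))

(define (sequence-length lead)
  (cond ((< lead #xE0) 2) ((< lead #xF0) 3) (else 4)))

(define (continuation-range state)
  ;; Valid range (LO . HI) for the byte that may follow a non-zero STATE.
  (if (< state #x100)
      (case state
        ((#xE0) '(#xA0 . #xBF))     ; no overlong 3-byte forms
        ((#xED) '(#x80 . #x9F))     ; no surrogates
        ((#xF0) '(#x90 . #xBF))     ; no overlong 4-byte forms
        ((#xF4) '(#x80 . #x8F))     ; nothing above U+10FFFF
        (else   '(#x80 . #xBF)))
      '(#x80 . #xBF)))

(define (sequence->char seq len)
  ;; SEQ holds the LEN bytes of a complete, valid sequence, big-endian.
  (let loop ((i (- len 2))
             (cp (logand (ash seq (* -8 (- len 1)))
                         (case len ((2) #x1F) ((3) #x0F) (else #x07)))))
    (if (< i 0)
        (integer->char cp)
        (loop (- i 1)
              (logior (ash cp 6) (logand (ash seq (* -8 i)) #x3F))))))

(define (utf8-decode-chunk state bv)
  ;; Decode all of BV starting from STATE.  Return two values: the
  ;; decoded string and the state to pass to the next call.
  (let loop ((i 0) (state state) (chars '()))
    (if (= i (bytevector-length bv))
        (values (list->string (reverse chars)) state)
        (let ((b (bytevector-u8-ref bv i)))
          (if (zero? state)
              (cond ((< b #x80)             ; ASCII
                     (loop (+ i 1) 0 (cons (integer->char b) chars)))
                    ((<= #xC2 b #xF4)       ; valid lead byte
                     (loop (+ i 1) b chars))
                    (else                   ; stray byte
                     (loop (+ i 1) 0 (cons replacement chars))))
              (let ((range (continuation-range state)))
                (if (<= (car range) b (cdr range))
                    (let ((seq (logior (ash state 8) b))
                          (len (sequence-length (lead-byte state))))
                      (if (= (+ (bytes-seen state) 1) len)
                          (loop (+ i 1) 0
                                (cons (sequence->char seq len) chars))
                          (loop (+ i 1) seq chars)))
                    ;; Ill-formed: replace the maximal subpart, then
                    ;; re-examine B from the start state (I unchanged).
                    (loop i 0 (cons replacement chars)))))))))

;; The split sequence from the report: E2 80 | 98.
(let*-values (((s1 st1) (utf8-decode-chunk 0 #vu8(#xE2 #x80)))
              ((s2 st2) (utf8-decode-chunk st1 #vu8(#x98))))
  ;; s1 is "", st1 is #xE280, s2 is LEFT SINGLE QUOTATION MARK, st2 is 0
  (list s1 st1 s2 st2))
```

With something along these lines, the caller would keep the returned
state next to the connection and hand it back on the next read, rather
than decoding each chunk in isolation.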