mumi does not correctly display (some?) non-ascii characters

  • Done
Details
6 participants
  • Arun Isaac
  • Felix Lechner
  • Christopher Baines
  • Maxim Cournoyer
  • noe
  • Tomas Volf
Owner
unassigned
Submitted by
Tomas Volf
Severity
normal
Blocked by

Tomas Volf wrote 1 year ago
(address . bug-mumi@gnu.org)
Zds6yhPkZ0Id6SAT@ws
Hi,

when I compare mumi page[0] with debbugs page[1], the From field displays "???"
in mumi, but "宋文武" in debbugs.

Have a nice day,
Tomas Volf


--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


Felix Lechner wrote 10 months ago
[PATCH] Convert HTML to UTF-8 ourselves. (Closes: #69381)
(address . 69381@patchwise.org)
20240514231249.18303-1-felix.lechner@lease-up.com
This fixes a host of encoding issues in Mumi, including the diff
problems that are not mentioned in the bug. An example is here:


The procedure version may one day be more efficient, but it does not
work. Based on comments in the Guile source code, the procedure style may
one day enable more advanced response formats. The author is unsure
why the procedure does not work. There may be a complex
interaction involving the response headers.

A preview of this code is live at patchwise.org.

The solution of this bug may depend on the patch in Bug#70907. This
patch furthermore depends on the patch in Bug#70906, but the solution
of the bug may not.
---
mumi/web/render.scm | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mumi/web/render.scm b/mumi/web/render.scm
index 316ca4c..9b16f8d 100644
--- a/mumi/web/render.scm
+++ b/mumi/web/render.scm
@@ -28,6 +28,7 @@
#:use-module ((ice-9 textual-ports)
#:select (get-string-all put-string))
#:use-module (ice-9 match)
+ #:use-module (rnrs bytevectors)
#:use-module (web http)
#:use-module (web request)
#:use-module (web response)
@@ -104,13 +105,13 @@
(define* (render-html sxml #:key (extra-headers '()))
(values (append extra-headers
'((content-type . (text/html (charset . "utf-8")))))
- (lambda (port)
- (sxml->html sxml port))))
+ (string->utf8
+ (sxml->html-string sxml))))
(define (render-json json)
(values '((content-type . (application/json (charset . "utf-8"))))
- (lambda (port)
- (scm->json json port))))
+ (string->utf8
+ (scm->json-string json))))
(define (not-found uri)
(values (build-response #:code 404)
--
2.41.0
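Returning a bytevector means the UTF-8 encoding happens inside Mumi, and the body's length is already a byte count. A minimal Guile sketch, assuming only `(rnrs bytevectors)`; `render-utf8` is a hypothetical stand-in for the patched `render-html`, and a literal HTML string replaces the output of `sxml->html-string`:

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical stand-in for the patched render-html: the handler encodes
;; the body to UTF-8 itself, so the server receives plain bytes.
(define (render-utf8 html)
  (values '((content-type . (text/html (charset . "utf-8"))))
          (string->utf8 html)))

;; Keep only the body (second value) for inspection.
(define body
  (call-with-values (lambda () (render-utf8 "<p>宋文武</p>"))
    (lambda (headers body) body)))

;; A bytevector's length is its size in bytes, so a Content-Length
;; computed from it matches what is actually sent.
(bytevector-length body)   ; => 16 (3 + 9 + 4)
```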
Felix Lechner wrote 10 months ago
(no subject)
(address . control@patchwise.org)
87a5ksvvcc.fsf@lease-up.com
block 69381 by 70906 70907
tags 69381 + patch
thanks
noe wrote 4 months ago
[PATCH] web: Use string to avoid losing unicode characters.
(address . 69381@debbugs.gnu.org)(name . Noé Lopez)(address . noelopez@free.fr)
20241102000730.3330-1-noe@xn--no-cja.eu
From: Noé Lopez <noelopez@free.fr>

I don’t really understand why the Unicode characters were lost in the
first place; maybe something in the sanitize-response of (fibers web
server)? Specifically, strings and procedures don’t take the same
path there.

* mumi/web/render.scm (render-html): Return string instead of procedure.
---
mumi/web/render.scm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mumi/web/render.scm b/mumi/web/render.scm
index 168f3bc..c28a26f 100644
--- a/mumi/web/render.scm
+++ b/mumi/web/render.scm
@@ -105,8 +105,9 @@
(define* (render-html sxml #:key (extra-headers '()))
(values (append extra-headers
'((content-type . (text/html (charset . "utf-8")))))
- (lambda (port)
- (sxml->html sxml port))))
+ (call-with-output-string
+ (lambda (port)
+ (sxml->html sxml port)))))
(define (render-json json)
(values '((content-type . (application/json)))
--
2.46.0
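This patch takes a different route: `call-with-output-string` (core Guile) runs the existing port-writing procedure and returns a plain string, leaving the byte encoding to the server. As far as we understand, Guile's `sanitize-response` encodes string bodies using the charset declared in the Content-Type header; the `string->bytevector` call below only imitates that step and is not the server's code. `write-body` is a hypothetical stand-in for `(lambda (port) (sxml->html sxml port))`:

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Hypothetical body writer in the style of the original render-html:
;; it writes characters to whatever port it is given.
(define (write-body port)
  (display "<p>宋文武</p>" port))

;; call-with-output-string runs the writer against a string port and
;; returns the collected characters; no byte encoding happens here.
(define body-string (call-with-output-string write-body))

;; Imitation of the later step where a string body is encoded with the
;; charset from the Content-Type header (assumed here to be UTF-8).
(define body-bytes (string->bytevector body-string "UTF-8"))

(string-length body-string)     ; => 10 characters
(bytevector-length body-bytes)  ; => 16 bytes
```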
Noé Lopez wrote 4 months ago
(address . 69381@debbugs.gnu.org)
87ikt6pk4o.fsf@xn--no-cja.eu
Hi,

Wanted to send this patch separately but had this issue selected in mumi
so it sent it here, oops.

I recognize this solution is not optimal (a hack even), but it should be
heavily considered as the issue is rampant among international users.

I suspect the actual issue lies in fibers, as said in the commit message
and I’ll try to fix it there but this patch is still important in the
meanwhile.

Good night,
Noé
Noé Lopez wrote 4 months ago
(address . 69381@debbugs.gnu.org)
87froape5f.fsf@xn--no-cja.eu
Small update,

I’ve investigated the issue in fibers and I now blame the guile web
library for the issue. Apparently it sets the port to ISO-8859-1
encoding each time you call read-request, but it acts like « yeah don’t
worry just use utf-8 for your body » in the docs.

That’s fine UNLESS you use chunked transfers (omitting content-length in
fibers), in which case it just decides to blow up :///// (it assumes one
character = one byte)
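The failure can be reproduced outside the web stack. A minimal Guile 3 sketch using only standard port procedures; `encode-via-port` is a hypothetical helper, and the `'substitute` conversion strategy is set explicitly to mirror the question marks Mumi displayed. The From name in this bug is three characters but nine UTF-8 bytes, and writing it to an ISO-8859-1 port turns each character into a single `?`:

```scheme
(use-modules (rnrs bytevectors)
             (rnrs io ports))

;; Hypothetical helper: write TEXT to an in-memory port that uses ENCODING
;; and return the bytes produced.  With the 'substitute strategy, characters
;; the encoding cannot represent are written as "?" instead of raising an error.
(define (encode-via-port text encoding)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (set-port-encoding! port encoding)
      (set-port-conversion-strategy! port 'substitute)
      (display text port)
      (force-output port)
      (get-bytes))))

(define from "宋文武")

(string-length from)                               ; => 3 characters
(bytevector-length (string->utf8 from))            ; => 9 bytes in UTF-8
(utf8->string (encode-via-port from "ISO-8859-1")) ; => "???"
```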

In the end I’m pretty sure any of this could have been avoided by just
not replacing every character with question marks. Had it kept the
invalid bytes intact they would have translated back with no issue.
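The round-trip claim holds because ISO-8859-1 assigns a character to every byte value from 0 to 255, so decoding arbitrary bytes as Latin-1 and re-encoding them as Latin-1 is lossless. A small sketch using `(ice-9 iconv)`:

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

(define utf8-bytes (string->utf8 "宋文武"))   ; 9 bytes

;; ISO-8859-1 maps each byte to exactly one character (U+0000 to U+00FF),
;; so decoding with it never fails or loses information; the result is
;; a 9-character string of mojibake.
(define latin1-view (bytevector->string utf8-bytes "ISO-8859-1"))

;; Re-encoding as ISO-8859-1 gives back the original 9 bytes, which then
;; decode correctly as UTF-8.
(define recovered
  (utf8->string (string->bytevector latin1-view "ISO-8859-1")))

(string-length latin1-view)   ; => 9
recovered                     ; => "宋文武"
```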
Maxim Cournoyer wrote 1 month ago
Re: bug#69381: mumi does not correctly display (some?) non-ascii characters
(name . Noé Lopez)(address . noe@xn--no-cja.eu)(address . 69381@debbugs.gnu.org)
87bjvdajpq.fsf_-_@gmail.com
Hi Noé,

Noé Lopez <noe@noé.eu> writes:

> Small update,
>
> I’ve investigated the issue in fibers and I now blame the guile web
> library for the issue. Apparently it sets the port to ISO-8859-1
> encoding each time you call read-request, but it acts like « yeah don’t
> worry just use utf-8 for your body » in the docs.
>
> That’s fine UNLESS you use chunked transfers (omitting content-length in
> fibers), in which case it just decides to blow up :///// (it assumes one
> character = one byte)
>
> In the end I’m pretty sure any of this could have been avoided by just
> not replacing every character with question marks. Had it kept the
> invalid bytes intact they would have translated back with no issue.

Nice investigation! Did you create an issue at bug-guile@gnu.org? I
don't see it on the tracker. Or perhaps this could be tackled from the
angle of fibers? For example by adding a new failing test reproducing
the problem to its test suite, and going from there.

--
Thanks,
Maxim
Noé Lopez wrote 1 month ago
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
878qqhg57z.fsf@xn--no-cja.eu
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Noé,
>
> Noé Lopez <noe@noé.eu> writes:
>
>> Small update,
>>
>> I’ve investigated the issue in fibers and I now blame the guile web
>> library for the issue. Apparently it sets the port to ISO-8859-1
>> encoding each time you call read-request, but it acts like « yeah don’t
>> worry just use utf-8 for your body » in the docs.
>>
>> That’s fine UNLESS you use chunked transfers (omitting content-length in
>> fibers), in which case it just decides to blow up :///// (it assumes one
>> character = one byte)
>>
>> In the end I’m pretty sure any of this could have been avoided by just
>> not replacing every character with question marks. Had it kept the
>> invalid bytes intact they would have translated back with no issue.
>
> Nice investigation! Did you create an issue at bug-guile@gnu.org? I
> don't see it on the tracker. Or perhaps this could be tackled from the
> angle of fibers? For example by adding a new failing test reproducing
> the problem to its test suite, and going from there.
>

I talked about this with Christopher Baines at FOSDEM and he seemed to
know much more about it than me, so maybe he can suggest a way forward?

Starting with a failing test seems like a good idea.

Have a nice day,
Noé
Christopher Baines wrote 1 month ago
(name . Noé Lopez)(address . noe@xn--no-cja.eu)
87tt94lqp3.fsf@cbaines.net
Noé Lopez <noe@noé.eu> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hi Noé,
>>
>> Noé Lopez <noe@noé.eu> writes:
>>
>>> Small update,
>>>
>>> I’ve investigated the issue in fibers and I now blame the guile web
>>> library for the issue. Apparently it sets the port to ISO-8859-1
>>> encoding each time you call read-request, but it acts like « yeah don’t
>>> worry just use utf-8 for your body » in the docs.
>>>
>>> That’s fine UNLESS you use chunked transfers (omitting content-length in
>>> fibers), in which case it just decides to blow up :///// (it assumes one
>>> character = one byte)
>>>
>>> In the end I’m pretty sure any of this could have been avoided by just
>>> not replacing every character with question marks. Had it kept the
>>> invalid bytes intact they would have translated back with no issue.
>>
>> Nice investigation! Did you create an issue at bug-guile@gnu.org? I
>> don't see it on the tracker. Or perhaps this could be tackled from the
>> angle of fibers? For example by adding a new failing test reproducing
>> the problem to its test suite, and going from there.
>>
>
> I talked about this with Christopher Baines at FOSDEM and he seemed to
> know much more about it than me, so maybe he can suggest a way forward?
>
> Starting with a failing test seems like a good idea.

I've raised a Pull Request which I think should help in fibers:


I think this issue should be possible to work around in Mumi as well:
the encoding on the port needs to be set, and I think Guile 3.0.10 needs
to be used.
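Setting the port encoding changes how characters become bytes. A minimal sketch assuming Guile 3, with an in-memory port standing in for the real client socket (`bytes-written` is a hypothetical helper). It only illustrates the encoding step, not the chunked-transfer handling in fibers or the Guile 3.0.10 requirement mentioned above:

```scheme
(use-modules (rnrs bytevectors)
             (rnrs io ports))

;; Hypothetical helper: write TEXT to an in-memory port after explicitly
;; setting the port's encoding, then return the bytes that were produced.
(define (bytes-written text encoding)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (set-port-encoding! port encoding)
      (display text port)
      (force-output port)
      (get-bytes))))

(define written (bytes-written "宋文武" "UTF-8"))

(bytevector-length written)   ; => 9
(utf8->string written)        ; => "宋文武"
```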

Felix Lechner wrote 4 weeks ago
(name . Christopher Baines)(address . mail@cbaines.net)
87mses2kry.fsf@lease-up.com
Hi,

On Sat, Feb 08 2025, Christopher Baines wrote:

> this issue should be possible to work around in Mumi as well,

What is wrong with my proposed patch, please? Just because a lambda
will eventually save memory and enable chunking?

Kind regards,
Felix
Christopher Baines wrote 4 weeks ago
(name . Felix Lechner)(address . felix.lechner@lease-up.com)(address . 69381@debbugs.gnu.org)
87tt90je7f.fsf@cbaines.net
Felix Lechner <felix.lechner@lease-up.com> writes:

> Hi,
>
> On Sat, Feb 08 2025, Christopher Baines wrote:
>
>> this issue should be possible to work around in Mumi as well,
>
> What is wrong with my proposed patch, please? Just because a lambda
> will eventually save memory and enable chunking?

I'm late to this thread, but looking at your patch, it looks like it
should work. There's nothing wrong with it; I was just looking at fixing
the issue with chunked responses.

Maxim Cournoyer wrote 4 weeks ago
(name . Felix Lechner)(address . felix.lechner@lease-up.com)
87y0ybri6d.fsf@gmail.com
Hi Felix,

Felix Lechner <felix.lechner@lease-up.com> writes:

> Hi,
>
> On Sat, Feb 08 2025, Christopher Baines wrote:
>
>> this issue should be possible to work around in Mumi as well,
>
> What is wrong with my proposed patch, please? Just because a lambda
> will eventually save memory and enable chunking?

There's nothing wrong with it; I believe it could be pushed already,
while a more definitive fix in fibers or guile is pursued! I just
haven't gotten round to it yet. Anyone with commit access please feel
free to beat me to it, and update mumi on berlin with it.

--
Thanks,
Maxim
Arun Isaac wrote 4 weeks ago
87ldub2u6x.fsf@systemreboot.net
Hi Maxim,

I am redeploying mumi on berlin today for a bunch of other commits. I
can take this over, if you don't mind.

Regards,
Arun
Maxim Cournoyer wrote 4 weeks ago
(name . Arun Isaac)(address . arunisaac@systemreboot.net)
87mserp5t9.fsf@gmail.com
Hi Arun,

Arun Isaac <arunisaac@systemreboot.net> writes:

> Hi Maxim,
>
> I am redeploying mumi on berlin today for a bunch of other commits. I
> can take this over, if you don't mind.

Yes, please :-). Thanks a lot.

--
Thanks,
Maxim
Closed
Arun Isaac wrote 4 weeks ago
87y0ya12ny.fsf@systemreboot.net
Hi all,

I have updated mumi on berlin. But, berlin needs to be rebooted for a
shepherd update before the mumi update can take effect. And, someone
needs to be at the data center when this happens, just in case there's
any trouble with the reboot. So, this is going to take a little while,
hopefully within a few days or a week.

Regards,
Arun
Closed