[website] return 404 with HTTP header 'Accept-Language: zh-CN,zh'

  • Done
Details
5 participants
  • Julien Lepiller
  • Ludovic Courtès
  • Tobias Geerinckx-Rice
  • pelzflorian (Florian Pelz)
  • ylc991
Owner
unassigned
Submitted by
ylc991
Severity
normal
ylc991 wrote on 27 Feb 2021 03:18
(address . bug-guix@gnu.org)
evqkq4rbt46iooto6cjksjii.1614392292497@email.android.com
Attachment: file
Tobias Geerinckx-Rice wrote on 27 Feb 2021 13:31
Re: bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
(name . ylc991)(address . ylc991@163.com)
87czwl66ab.fsf@nckx
Ylc991,

Thanks for the report!

My verbose notes so far; I need to (finally!) set up a local build
of the Web site first.

ylc991 wrote:
> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
> default, and https://guix.gnu.org returns 404.

Indeed, handling of zh-CN specifically is broken. :-(

~ λ curl -LI -H 'Accept-Language: zh-cn' https://guix.gnu.org
HTTP/1.1 404 Not Found
[...]

This is because our nginx configuration
(maintenance/hydra/nginx/berlin.scm) does:

set_from_accept_language $lang en de es fr zh-CN;
[...]
try_files $uri /$lang/$uri /$lang/$uri/index.html =404;

i.e., it looks in /srv/guix.gnu.org/zh-CN, but our website uses...

nckx@berlin ~$ ls -d /srv/guix.gnu.org/zh*
/srv/guix.gnu.org/zh-cn/

...lowercase. This questionable choice comes from
artwork/po/ietf-tags.scm:

;;; This file contains an association list for each translation from
;;; the locale to an IETF language tag to be used in the URL path of
;;; translated pages. The language tag results from the translation
;;; team’s language code from
;;; <https://translationproject.org/team/index.html>. The underscore
;;; in the team’s code is replaced by a hyphen. For example, az would
;;; be used for the Azerbaijani language (not az-Latn) and zh-CN would
;;; be used for mainland Chinese (not zh-Hans-CN)
([...]
 ("zh_CN" . "zh-cn"))

Questionable only because, while a lowercase region is technically
valid, it's so rare that it's likely to cause problems -- as we
found out.
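
To make the mismatch concrete, here is a minimal Python sketch of the lookup, assuming a case-sensitive document root. The `try_files` function and the `DOCROOT` set are hypothetical stand-ins for illustration (the `/en/index.html` entry in particular is an assumption); this is a simplified model, not nginx code:

```python
# Hypothetical model of the try_files lookup described above.
# DOCROOT stands in for a case-sensitive file system under
# /srv/guix.gnu.org; its entries are illustrative assumptions.
DOCROOT = {
    "/en/index.html",
    "/zh-cn/index.html",  # lowercase, matching the `ls` output above
}


def try_files(uri: str, lang: str) -> int:
    """Mimic: try_files $uri /$lang/$uri /$lang/$uri/index.html =404;"""
    candidates = [
        uri,
        f"/{lang}/{uri}",
        f"/{lang}/{uri}/index.html",
    ]
    for path in candidates:
        # Collapse duplicate slashes so "/zh-CN///index.html" compares cleanly.
        normalized = "/" + "/".join(part for part in path.split("/") if part)
        if normalized in DOCROOT:
            return 200
    return 404


print(try_files("/", "zh-CN"))  # 404: no directory with an uppercase region
print(try_files("/", "zh-cn"))  # 200
print(try_files("/", "en"))     # 200
```

The only difference between the failing and the succeeding lookup is the case of the region subtag, which is the whole bug.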

> I have tested with curl, 'zh-CN,zh', 'zh-CN', [is 404]

These are valid, so the nginx accept-language module accepts them,
but then looks for a subdirectory that doesn't exist and returns
404.

> 'zh-cn' is 404

This is valid, but since we configure the accept-language module
to use ‘zh-CN’ it normalises $lang to the latter. Which is good,
but it causes the same 404 as above.

> 'zh_CN' is 200.

This is bogus (‘_’ is not valid), hence ignored, and so the site
falls back to English 200.

> 'zh' [is 200]

Valid but the accept-language module is not clever; we need to add
an explicit 'zh' entry for that to work:

set_from_accept_language $lang en de es fr zh-CN zh en;

I expect that adding it and changing ietf-tags.scm to use "zh-CN"
will fix both 404s, but need to check that it doesn't break
anything else.
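
The four cases above can be reproduced with a small Python model of the language selection. This is a sketch inferred from the behaviour reported in this thread, not taken from the accept-language module's source; `pick_language` is a hypothetical name, and the assumptions are that matching is exact and case-insensitive, that the configured spelling is returned, and that the first configured language is the default:

```python
# Simplified, hypothetical model of how $lang is chosen, inferred from
# the behaviour reported in this thread (not the module's source code).
def pick_language(accept_language: str, supported: list[str]) -> str:
    """Return the first supported tag matching the header, else the default.

    Matching is case-insensitive and exact (no "zh" -> "zh-CN" fallback);
    the result uses the spelling from the configuration. The first
    configured language is the default.
    """
    by_lowercase = {tag.lower(): tag for tag in supported}
    for item in accept_language.split(","):
        # Drop any quality value such as ";q=0.8" and surrounding spaces.
        tag = item.split(";")[0].strip().lower()
        if tag in by_lowercase:
            return by_lowercase[tag]
    return supported[0]


CURRENT = ["en", "de", "es", "fr", "zh-CN"]
PROPOSED = ["en", "de", "es", "fr", "zh-CN", "zh", "en"]

for header in ["zh-CN,zh", "zh-cn", "zh_CN", "zh"]:
    print(header, "->", pick_language(header, CURRENT))
print("zh ->", pick_language("zh", PROPOSED))
```

Under these assumptions 'zh-CN,zh' and 'zh-cn' both yield zh-CN (hence the 404), while 'zh_CN' and 'zh' fall back to en (hence the 200). With the explicit 'zh' entry, a bare 'zh' header is matched instead of falling back.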

The other untested solution is using lowercase

set_from_accept_language $lang en de es fr zh-cn zh en;

but, assuming that even works, I'm not fond of making the
unconventional the norm.

Kind regards,

T G-R

Julien Lepiller wrote on 27 Feb 2021 13:34
4577C0A0-0B5B-4808-8E51-D8E59301BE6B@lepiller.eu
It might be related to translations. When you use zh-cn, we have a translation for that language, so you're redirected to it. Not sure why you get a 404 though.

On 26 February 2021 21:18:12 GMT-05:00, ylc991 <ylc991@163.com> wrote:
>Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
>default, and https://guix.gnu.org returns 404. I have tested with curl,
>'zh-CN,zh', 'zh-CN', 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.
>
>
>The first time I found it was on 2021-02-23. And it didn't happen
>about one or two months ago. I think there may be something wrong with
>the web server.
Attachment: file
Ludovic Courtès wrote on 1 Mar 2021 11:06
(name . ylc991)(address . ylc991@163.com)
87im6btcfw.fsf@gnu.org
Hello,

ylc991 <ylc991@163.com> wrote:

> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by default, and https://guix.gnu.org returns 404. I have tested with curl, 'zh-CN,zh', 'zh-CN',
> 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.

Florian, could it be that we’re not normalizing language tags
appropriately? Does that ring a bell?

Thanks for your report!

Ludo’.
pelzflorian (Florian Pelz) wrote on 1 Mar 2021 11:49
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210301104747.wlibfapjjn3x3kut@pelzflorian.localdomain
Hello,

On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately? Does that ring a bell?

Tobias’ analysis is likely correct. I haven’t yet built a current
berlin virtual machine to test though.

We’re not normalizing language tags at all currently. Doing URL
redirects in nginx confuses me greatly; I have no idea how to
concisely specify redirects *and* have them execute in the right
order. The many lines

(redirect "/blog/2006/purely-functional-software-deployment-model" "/$lang/blog/2006/purely-functional-software-deployment-model/")

and similar in maintenance.git’s hydra/nginx/berlin.scm file are a bad
solution and are testament to my confusion. I would not like one line
for each package.

Regards,
Florian
pelzflorian (Florian Pelz) wrote on 4 Mar 2021 12:03
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
20210304110018.uhacou2tdyattzt6@pelzflorian.localdomain
On Sat, Feb 27, 2021 at 01:31:40PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> I expect that adding it and changing ietf-tags.scm to use "zh-CN" will fix
> both 404s, but need to check that it doesn't break anything else.

I made the tiny change to guix-artwork’s ietf-tags.scm as
04c96a370b8cae48ed162e4414b8950cc65c513b now (sorry for taking so
long):

diff --git a/website/po/ietf-tags.scm b/website/po/ietf-tags.scm
index 32b81ef..5bd22f4 100644
--- a/website/po/ietf-tags.scm
+++ b/website/po/ietf-tags.scm
@@ -10,4 +10,4 @@
  ("de_DE" . "de")
  ("es_ES" . "es")
  ("fr_FR" . "fr")
- ("zh_CN" . "zh-cn"))
+ ("zh_CN" . "zh-CN"))

Note that the prior zh-cn URLs will be broken.

I will play around with nginx’s map directive later to make zh-cn and
zh Accept-Language settings direct to the proper URL; afterwards I
will close this bug. zh-cn URLs remain invalid. Links to the manual
continue to use zh-cn.

For testing I dug out the VM code
<https://lists.gnu.org/archive/html/bug-guix/2020-04/msg00195.html>
where I had removed parts of berlin that are not relevant to the
website. The change breaks neither website nor manual.

Thanks ylc991 for the report!

Regards,
Florian
pelzflorian (Florian Pelz) wrote on 5 Mar 2021 12:54
(address . 46807@debbugs.gnu.org)
20210305115333.prvjomh2lre7rt5k@pelzflorian.localdomain
Hello all,

On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately? Does that ring a bell?

The attached patch to maintenance.git fixes the remaining minor issue:
Now Accept-Language language codes get normalized, zh to zh-CN, so web
browsers requesting any kind of Chinese get the website in mainland
Chinese. (This is a minor issue. The only valid URL is /zh-CN/ since
my last patch to guix-artwork because I don’t know how to
rewrite/redirect URLs in nginx.)

The patch was tested on a berlin VM.

There is no copyright header in maintenance.git’s
hydra/nginx/berlin.scm so I did not add a copyright. I hereby license
the patch CC0
<https://creativecommons.org/publicdomain/zero/1.0/legalcode>.

Shall I just push? A reconfigure of berlin will be necessary but is
not urgent.

Regards,
Florian
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 4 Mar 2021 20:29:27 +0100
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [PATCH] nginx: berlin: Normalize Accept-Language language code zh to
zh-CN.

Now web browsers requesting any kind of Chinese get the website in
mainland Chinese.

zh, zh-Hans, zh-Hans-CN all are synonymous with zh-CN now.

* hydra/nginx/berlin.scm (accept-languages): New procedure.
(%extra-content): Normalize $lang variable with it.
---
hydra/nginx/berlin.scm | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 85aaf38..4b9d297 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -995,12 +995,37 @@ PUBLISH-URL."
(uri "~ /(.*)")
(body (list "return 301 $scheme://guixwl.org/$1;"))))))))
+(define (accept-languages language-lists)
+  "Returns nginx configuration code to set up the $lang variable
+according to the Accept-Language header in the HTTP request. The
+requesting user agent will be served the files at /$lang/some/url.
+Each list in LANGUAGE-LISTS starts with the $lang and is followed by
+synonymous IETF language tags that should be mapped to the same $lang."
+  (define (language-mappings language-list)
+    (define (language-mapping language)
+      (string-join (list " " language (car language-list) ";")))
+    (string-join (map language-mapping language-list) "\n"))
+
+  (let ((directives
+         `(,(string-join
+             `("set_from_accept_language $lang_unmapped"
+               ,@(map string-join language-lists)
+               ";"))
+           "map $lang_unmapped $lang {"
+           ,@(map language-mappings language-lists)
+           "}")))
+    (string-join directives "\n")))
+
 (define %extra-content
   (list
    "default_type application/octet-stream;"
    "sendfile on;"
-   "set_from_accept_language $lang en de es fr zh-CN;"
+   (accept-languages '(("en")
+                       ("de")
+                       ("es")
+                       ("fr")
+                       ("zh-CN" "zh" "zh-Hans" "zh-Hans-CN")))
;; Maximum chunk size to send. Partly this is a workaround for
;; <http://bugs.gnu.org/19939>, but also the nginx docs mention that
--
2.30.1
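
For readers who do not read Scheme, the following is an illustrative Python port of the `accept-languages` procedure from the patch above, handy for previewing the nginx directives it generates. It is a sketch, not part of maintenance.git; the name `accept_languages` is a transliteration, and it relies on Guile's `string-join` using a single space as its default delimiter. String literals are copied as they appear in this rendering of the patch, so the exact leading whitespace of the generated lines may differ from the real output:

```python
# Illustrative Python port of the accept-languages procedure in the patch.
# Guile's (string-join lst) joins with a single space by default, which
# " ".join reproduces here.
def accept_languages(language_lists: list[list[str]]) -> str:
    def language_mappings(language_list: list[str]) -> str:
        target = language_list[0]  # the $lang every synonym maps to
        return "\n".join(
            " ".join([" ", language, target, ";"])
            for language in language_list
        )

    directives = [
        " ".join(
            ["set_from_accept_language $lang_unmapped"]
            + [" ".join(languages) for languages in language_lists]
            + [";"]
        ),
        "map $lang_unmapped $lang {",
        *[language_mappings(languages) for languages in language_lists],
        "}",
    ]
    return "\n".join(directives)


config = accept_languages(
    [["en"], ["de"], ["es"], ["fr"],
     ["zh-CN", "zh", "zh-Hans", "zh-Hans-CN"]]
)
print(config)
```

The result is one `set_from_accept_language` directive listing every accepted tag, followed by a `map` block with one entry per tag; each list's first element maps to itself and its synonyms map to that first element, so zh, zh-Hans and zh-Hans-CN all end up as zh-CN.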
ylc991 wrote
(address . 46807@debbugs.gnu.org)
3e7d57a0.5263.17801d76b44.Coremail.ylc991@163.com
Thank you for your help! Everything goes fine now.
Ludovic Courtès wrote on 8 Mar 2021 14:27
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87h7llg4ht.fsf@gnu.org
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> wrote:

> The attached patch to maintenance.git fixes the remaining minor issue:
> Now Accept-Language language codes get normalized, zh to zh-CN, so web
> browsers requesting any kind of Chinese get the website in mainland
> Chinese. (This is a minor issue. The only valid URL is /zh-CN/ since
> my last patch to guix-artwork because I don’t know how to
> rewrite/redirect URLs in nginx.)
>
> The patch was tested on a berlin VM.

Yay!

> There is no copyright header in maintenance.git’s
> hydra/nginx/berlin.scm so I did not add a copyright. I hereby license
> the patch CC0
> <https://creativecommons.org/publicdomain/zero/1.0/legalcode>.

Good point; I guess it was meant to be GPLv3+ like the rest, but thanks
for clarifying.

> Shall I just push? A reconfigure of berlin will be necessary but is
> not urgent.

Yes, sounds good!

We’ll reconfigure sooner or later, just ping if you don’t see it happen
within two weeks or so.

Thanks,
Ludo’.
pelzflorian (Florian Pelz) wrote on 11 Mar 2021 01:01
(address . 46807-done@debbugs.gnu.org)
20210311000150.cymvv2cdsaadyzep@pelzflorian.localdomain
Pushed to maintenance.git as 82b075685b6089c7f98acb0993c003936d833776.

Closing. Thank you all!
Closed

This issue is archived.

To comment on this conversation send an email to 46807@debbugs.gnu.org
