(address . bug-guix@gnu.org)
- Y on 27 Feb 2021 03:18
- T on 27 Feb 2021 13:31
- J on 27 Feb 2021 13:34
- L on 1 Mar 2021 11:06
- P on 1 Mar 2021 11:49
- P on 4 Mar 2021 12:03
- P on 5 Mar 2021 12:54
- Y on 5 Mar 2021 11:03
- L on 8 Mar 2021 14:27
- P on 11 Mar 2021 01:01
[website] return 404 with HTTP header 'Accept-Language: zh-CN,zh'
T
Re: bug#46807: [website] return 404 with HTTP header 'Accept-Language: zh-CN, zh'
(name . ylc991)(address . ylc991@163.com)
87czwl66ab.fsf@nckx
Ylc991,
Thanks for the report!
My verbose notes so far; I need to (finally!) set up a local build
of the Web site first.
ylc991 wrote:
> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
> default, and https://guix.gnu.org returns 404.
Indeed, handling of zh-CN specifically is broken. :-(
~ λ curl -LI -H 'Accept-Language: zh-cn' https://guix.gnu.org
HTTP/1.1 404 Not Found
[...]
This is because our nginx configuration
(maintenance/hydra/nginx/berlin.scm) does:
set_from_accept_language $lang en de es fr zh-CN;
[...]
try_files $uri /$lang/$uri /$lang/$uri/index.html =404;
i.e., it looks in /srv/guix.gnu.org/zh-CN, but our website uses...
nckx@berlin ~$ ls -d /srv/guix.gnu.org/zh*
/srv/guix.gnu.org/zh-cn/
...lowercase. This questionable choice comes from
artwork/po/ietf-tags.scm:
;;; This file contains an association list for each translation from
;;; the locale to an IETF language tag to be used in the URL path of
;;; translated pages. The language tag results from the translation
;;; team’s language code from
;;; <https://translationproject.org/team/index.html>. The underscore
;;; in the team’s code is replaced by a hyphen. For example, az would
;;; be used for the Azerbaijani language (not az-Latn) and zh-CN would
;;; be used for mainland Chinese (not zh-Hans-CN)
([...]
 ("zh_CN" . "zh-cn"))
Questionable only because, while a lowercase region is technically
valid, it's so rare that it's likely to cause problems -- as we
found out.
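The lookup failure described above can be modelled in a few lines of Python. This is only a sketch of the try_files order quoted from berlin.scm, not nginx itself; the file paths are hypothetical stand-ins for the contents of /srv/guix.gnu.org, and the point is simply that the comparison is case-sensitive.

```python
import posixpath


def try_files(uri, lang, existing_files):
    """Model `try_files $uri /$lang/$uri /$lang/$uri/index.html =404;`.

    Returns the first candidate path found in EXISTING_FILES, else 404.
    EXISTING_FILES is a hypothetical set of paths relative to the web root.
    """
    candidates = [
        uri,
        f"/{lang}/{uri}",
        f"/{lang}/{uri}/index.html",
    ]
    for candidate in candidates:
        path = posixpath.normpath(candidate)  # collapse duplicate slashes
        if path in existing_files:            # plain string match: case-sensitive
            return path
    return 404


# Hypothetical site tree: the Chinese pages live under a lowercase directory.
files_lowercase = {"/en/index.html", "/zh-cn/index.html"}
# Same tree after renaming the directory to the conventional casing.
files_renamed = {"/en/index.html", "/zh-CN/index.html"}

if __name__ == "__main__":
    print(try_files("/", "en", files_lowercase))     # found under /en
    print(try_files("/", "zh-CN", files_lowercase))  # no /zh-CN directory
    print(try_files("/", "zh-CN", files_renamed))    # found after the rename
```

With $lang set to zh-CN, none of the three candidates exists in the lowercase tree, so the fallback =404 is what the client sees.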
> I have tested with curl, 'zh-CN,zh', 'zh-CN', [is 404]
These are valid, so the nginx accept-language module accepts them,
but then looks for a subdirectory that doesn't exist and returns
404.
> 'zh-cn' is 404
This is valid, but since we configure the accept-language module
to use ‘zh-CN’ it normalises $lang to the latter. Which is good,
but it causes the same 404 as above.
> 'zh_CN' is 200.
This is bogus (‘_’ is not valid), hence ignored, and so the site
falls back to English 200.
> 'zh' [is 200]
Valid but the accept-language module is not clever; we need to add
an explicit 'zh' entry for that to work:
set_from_accept_language $lang en de es fr zh-CN zh en;
I expect that adding it and changing ietf-tags.scm to use "zh-CN"
will fix both 404s, but need to check that it doesn't break
anything else.
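The five cases from the report can be reproduced with a simplified model. This is a sketch under stated assumptions, not the accept-language module's real algorithm: it assumes exact, case-insensitive matching against the configured list, a fallback to the first configured language, and no handling of q-values. The names pick_lang, status, SUPPORTED and SITE_DIRS are made up for this illustration.

```python
# Languages as configured in `set_from_accept_language $lang en de es fr zh-CN;`
SUPPORTED = ["en", "de", "es", "fr", "zh-CN"]
# Directory names on disk at the time of the report (note the lowercase zh-cn).
SITE_DIRS = {"en", "de", "es", "fr", "zh-cn"}


def pick_lang(accept_language, supported=SUPPORTED):
    """Return the configured spelling of the first requested tag that matches.

    Assumption: tags match case-insensitively and only as whole tags, so
    'zh' does not match 'zh-CN' and 'zh_CN' matches nothing.
    """
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip()
        for candidate in supported:
            if tag.lower() == candidate.lower():
                return candidate
    return supported[0]  # fall back to the first configured language


def status(accept_language, site_dirs=SITE_DIRS):
    """200 if a directory named exactly like $lang exists, else 404."""
    return 200 if pick_lang(accept_language) in site_dirs else 404


if __name__ == "__main__":
    for header in ["zh-CN,zh", "zh-CN", "zh-cn", "zh_CN", "zh"]:
        print(header, pick_lang(header), status(header))
```

In this model 'zh_CN' and 'zh' return 200 only because they fall back to English, which matches the observations in the report.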
The other untested solution is using lowercase
set_from_accept_language $lang en de es fr zh-cn zh en;
but, assuming that even works, I'm not fond of making the
unconventional the norm.
Kind regards,
T G-R
J
4577C0A0-0B5B-4808-8E51-D8E59301BE6B@lepiller.eu
It might be related to translations. When you use zh-cn, we have a translation for that language, so you're redirected to it. Not sure why you get a 404 though.
On 26 February 2021 21:18:12 GMT-05:00, ylc991 <ylc991@163.com> wrote:
>Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by
>default, and https://guix.gnu.org returns 404. I have tested with curl,
>'zh-CN,zh', 'zh-CN', 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.
>
>
>The first time I found it is on 2021-02-23. And it didn't happen
>about one or two months ago. I think there may be something wrong with
>the web server.
L
(name . ylc991)(address . ylc991@163.com)
87im6btcfw.fsf@gnu.org
Hello,
ylc991 <ylc991@163.com> writes:
> Hello! My webbrowser has set ‘Accept-Language’ to 'zh-CN,zh' by default, and https://guix.gnu.org returns 404. I have tested with curl, 'zh-CN,zh', 'zh-CN',
> 'zh-cn' is 404 while 'zh', 'zh_CN' is 200.
Florian, could it be that we’re not normalizing language tags
appropriately? Does that ring a bell?
Thanks for your report!
Ludo’.
P
(name . Ludovic Courtès)(address . ludo@gnu.org)
20210301104747.wlibfapjjn3x3kut@pelzflorian.localdomain
Hello,
On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately? Does that ring a bell?
Tobias’ analysis is likely correct. I haven’t yet built a current
berlin virtual machine to test it, though.
We’re not normalizing language tags at all currently. Doing URL
redirects in nginx confuses me greatly; I have no idea how to
concisely specify redirects *and* have them execute in the right
order. The many lines
(redirect "/blog/2006/purely-functional-software-deployment-model" "/$lang/blog/2006/purely-functional-software-deployment-model/")
and similar in maintenance.git’s hydra/nginx/berlin.scm file are a bad
solution and are testament to my confusion. I would not like one line
for each package.
Regards,
Florian
P
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
20210304110018.uhacou2tdyattzt6@pelzflorian.localdomain
On Sat, Feb 27, 2021 at 01:31:40PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> I expect that adding it and changing ietf-tags.scm to use "zh-CN" will fix
> both 404s, but need to check that it doesn't break anything else.
I made the tiny change to guix-artwork’s ietf-tags.scm as
04c96a370b8cae48ed162e4414b8950cc65c513b now (sorry for taking so
long):
diff --git a/website/po/ietf-tags.scm b/website/po/ietf-tags.scm
index 32b81ef..5bd22f4 100644
--- a/website/po/ietf-tags.scm
+++ b/website/po/ietf-tags.scm
@@ -10,4 +10,4 @@
("de_DE" . "de")
("es_ES" . "es")
("fr_FR" . "fr")
- ("zh_CN" . "zh-cn"))
+ ("zh_CN" . "zh-CN"))
Note that the prior zh-cn URLs will be broken.
Later I will experiment with nginx’s map directive so that zh-cn and
zh Accept-Language settings lead to the proper URL; after that I will
close this bug. zh-cn URLs remain invalid. Links to the manual
continue to use zh-cn.
For testing I dug out the VM code
<https://lists.gnu.org/archive/html/bug-guix/2020-04/msg00195.html>
where I had removed parts of berlin that are not relevant to the
website. The change breaks neither website nor manual.
Thanks ylc991 for the report!
Regards,
Florian
P
(address . 46807@debbugs.gnu.org)
20210305115333.prvjomh2lre7rt5k@pelzflorian.localdomain
Hello all,
On Mon, Mar 01, 2021 at 11:06:59AM +0100, Ludovic Courtès wrote:
> Florian, could it be that we’re not normalizing language tags
> appropriately? Does that ring a bell?
The attached patch to maintenance.git fixes the remaining minor issue:
Now Accept-Language language codes get normalized, zh to zh-CN, so web
browsers requesting any kind of Chinese get the website in mainland
Chinese. (This is a minor issue. The only valid URL is /zh-CN/ since
my last patch to guix-artwork because I don’t know how to
rewrite/redirect URLs in nginx.)
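The effect of the normalization can be pictured as a lookup table from the negotiated tag to the directory name that is actually served. The Python sketch below is only an illustration of that idea; build_synonym_map and normalize are hypothetical names, and the language lists are copied from the patch.

```python
# Each list starts with the canonical $lang, followed by its synonyms.
LANGUAGE_LISTS = [
    ["en"],
    ["de"],
    ["es"],
    ["fr"],
    ["zh-CN", "zh", "zh-Hans", "zh-Hans-CN"],
]


def build_synonym_map(language_lists):
    """Map every tag (canonical or synonym) to its canonical $lang."""
    return {tag: lst[0] for lst in language_lists for tag in lst}


def normalize(lang_unmapped, synonym_map):
    """Return the canonical $lang for a negotiated tag.

    An nginx map block without a default entry yields an empty string for
    unknown keys, which is mirrored here.
    """
    return synonym_map.get(lang_unmapped, "")


if __name__ == "__main__":
    table = build_synonym_map(LANGUAGE_LISTS)
    for tag in ["zh", "zh-Hans", "zh-Hans-CN", "zh-CN", "fr"]:
        print(tag, "->", normalize(tag, table))
```

Every Chinese variant in the list ends up at the single directory /zh-CN/, while the other languages map to themselves.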
The patch was tested on a berlin VM.
There is no copyright header in maintenance.git’s
hydra/nginx/berlin.scm so I did not add a copyright. I hereby license
the patch CC0
<https://creativecommons.org/publicdomain/zero/1.0/legalcode>.
Shall I just push? A reconfigure of berlin will be necessary but is
not urgent.
Regards,
Florian
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Thu, 4 Mar 2021 20:29:27 +0100
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [PATCH] nginx: berlin: Normalize Accept-Language language code zh to
zh-CN.
Now web browsers requesting any kind of Chinese get the website in
mainland Chinese.
zh, zh-Hans, zh-Hans-CN all are synonymous with zh-CN now.
* hydra/nginx/berlin.scm (accept-languages): New procedure.
(%extra-content): Normalize $lang variable with it.
---
hydra/nginx/berlin.scm | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 85aaf38..4b9d297 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -995,12 +995,37 @@ PUBLISH-URL."
             (uri "~ /(.*)")
             (body (list "return 301 $scheme://guixwl.org/$1;"))))))))
+(define (accept-languages language-lists)
+  "Returns nginx configuration code to set up the $lang variable
+according to the Accept-Language header in the HTTP request. The
+requesting user agent will be served the files at /$lang/some/url.
+Each list in LANGUAGE-LISTS starts with the $lang and is followed by
+synonymous IETF language tags that should be mapped to the same $lang."
+  (define (language-mappings language-list)
+    (define (language-mapping language)
+      (string-join (list " " language (car language-list) ";")))
+    (string-join (map language-mapping language-list) "\n"))
+
+  (let ((directives
+         `(,(string-join
+             `("set_from_accept_language $lang_unmapped"
+               ,@(map string-join language-lists)
+               ";"))
+           "map $lang_unmapped $lang {"
+           ,@(map language-mappings language-lists)
+           "}")))
+    (string-join directives "\n")))
+
 (define %extra-content
   (list
    "default_type application/octet-stream;"
    "sendfile on;"
-   "set_from_accept_language $lang en de es fr zh-CN;"
+   (accept-languages '(("en")
+                       ("de")
+                       ("es")
+                       ("fr")
+                       ("zh-CN" "zh" "zh-Hans" "zh-Hans-CN")))
    ;; Maximum chunk size to send. Partly this is a workaround for
    ;; <http://bugs.gnu.org/19939>, but also the nginx docs mention that
--
2.30.1
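To see what kind of nginx text the accept-languages procedure in the patch produces, here is a rough Python transliteration. It is a sketch for reading along, not part of the patch: the spacing of the generated text is simplified, and the function name accept_languages is just the Python spelling of the Scheme one.

```python
def accept_languages(language_lists):
    """Return nginx configuration text that sets $lang from Accept-Language.

    Each inner list starts with the canonical $lang value, followed by
    synonymous tags that should be mapped to the same $lang.
    """
    def language_mappings(language_list):
        canonical = language_list[0]
        # One "source target;" line per tag, including canonical -> canonical.
        return "\n".join(f"  {tag} {canonical};" for tag in language_list)

    # Every tag, canonical or synonym, is offered to the negotiation step.
    all_tags = " ".join(tag for lst in language_lists for tag in lst)
    directives = [
        f"set_from_accept_language $lang_unmapped {all_tags};",
        "map $lang_unmapped $lang {",
        *(language_mappings(lst) for lst in language_lists),
        "}",
    ]
    return "\n".join(directives)


if __name__ == "__main__":
    print(accept_languages([["en"], ["de"], ["es"], ["fr"],
                            ["zh-CN", "zh", "zh-Hans", "zh-Hans-CN"]]))
```

The negotiation step now writes to $lang_unmapped, and the map block turns that into the $lang used by try_files, so only canonical names ever reach the file lookup.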
L
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
87h7llg4ht.fsf@gnu.org
Hi,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:
> The attached patch to maintenance.git fixes the remaining minor issue:
> Now Accept-Language language codes get normalized, zh to zh-CN, so web
> browsers requesting any kind of Chinese get the website in mainland
> Chinese. (This is a minor issue. The only valid URL is /zh-CN/ since
> my last patch to guix-artwork because I don’t know how to
> rewrite/redirect URLs in nginx.)
>
> The patch was tested on a berlin VM.
Yay!
> There is no copyright header in maintenance.git’s
> hydra/nginx/berlin.scm so I did not add a copyright. I hereby license
> the patch CC0
> <https://creativecommons.org/publicdomain/zero/1.0/legalcode>.
Good point; I guess it was meant to be GPLv3+ like the rest, but thanks
for clarifying.
> Shall I just push? A reconfigure of berlin will be necessary but is
> not urgent.
Yes, sounds good!
We’ll reconfigure sooner or later, just ping if you don’t see it happen
within two weeks or so.
Thanks,
Ludo’.
Closed