[PATCH] nls: Implement translation thresholds.

Status: Done
Participants: Julien Lepiller, Ludovic Courtès
Owner: unassigned
Submitted by: Julien Lepiller
Severity: normal
Julien Lepiller wrote on 8 Mar 2022 19:22
(address . guix-patches@gnu.org)
20220308192251.20094828@tachikoma.lepiller.eu
Hi Guix!

As discussed with Ludo on IRC, new translations for the manual and
cookbook are a bit annoying, because you need to build them regularly.
Ludo proposed to implement a threshold to ensure we're not compiling
what is mostly the English manual. Here's the proposal:

manual and cookbook: only include new languages when they reach 10%
completion. Remove languages when they fall below 5%.

website (unrelated to this repo, but still important): only include new
languages when they reach 80% completion. Remove languages when they
fall below 60%. The reason for the higher threshold is that the website
acts as some sort of advertisement, so we want a higher quality than
half English half translated.

guix and packages: having more languages is not a problem for
developers or users, so there is no threshold (other than requiring at
least one translated string). Removal of obsolete translations (those
that no longer have any relevant strings) is not implemented in this
series.
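Taken together, these rules are thresholds with hysteresis: a language enters at one completion level and is only dropped below a lower one. A minimal POSIX shell sketch of that decision follows. It assumes the translated and total message counts are already known. The function name `po_action` and the component labels are hypothetical, and the sketch models the policy as proposed here rather than the Makefile code in the patches:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed inclusion policy (not part of the
# patches).  The component labels "doc", "website", "guix" and "packages"
# are illustrative.
#
# Usage: po_action COMPONENT TRANSLATED TOTAL PRESENT
#   PRESENT is "yes" if the language is already included, else "no".
# Prints one of: add, skip, keep, remove.
po_action() {
    case $1 in
        doc)     add=10; keep=5 ;;    # manual and cookbook
        website) add=80; keep=60 ;;
        *)       add=0;  keep=0 ;;    # guix, packages: no percentage threshold
    esac
    translated=$2; total=$3; present=$4

    # Hysteresis: a new language must reach the higher threshold, while an
    # included one is only dropped once it falls below the lower one.
    if [ "$present" = yes ]; then min=$keep; else min=$add; fi

    # translated/total >= min/100 is checked as 100*translated >= min*total
    # to avoid integer-division truncation.  At least one translated
    # string is always required.
    if [ "$translated" -gt 0 ] && [ $((100 * translated)) -ge $((min * total)) ]; then
        if [ "$present" = yes ]; then echo keep; else echo add; fi
    else
        if [ "$present" = yes ]; then echo remove; else echo skip; fi
    fi
}
```

For example, `po_action doc 9 100 no` prints `skip`, while `po_action doc 7 100 yes` prints `keep`, because an already-included manual is only held to the 5% floor.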

The first patch documents the thresholds in the manual. The second
patch updates the download-po target so that it enforces the thresholds,
and the last patch removes the manual and cookbook files that are below
the 5% threshold.

In the long run, we might want to find a way to not build the
translated manuals by default...
From 37071410629a7d70c9b4e4f673f2c625d3ed4b47 Mon Sep 17 00:00:00 2001
Message-Id: <37071410629a7d70c9b4e4f673f2c625d3ed4b47.1646763369.git.julien@lepiller.eu>
From: Julien Lepiller <julien@lepiller.eu>
Date: Tue, 8 Mar 2022 13:14:58 +0100
Subject: [PATCH 1/3] doc: Document inclusion requirements for new
translations.

* doc/contributing.texi (Translating Guix)[Conditions for Inclusion]:
New section.
---
doc/contributing.texi | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/doc/contributing.texi b/doc/contributing.texi
index 207efc4ee6..dadd191a4b 100644
--- a/doc/contributing.texi
+++ b/doc/contributing.texi
@@ -1911,6 +1911,28 @@ Translating Guix
translated.
@end itemize
+@subsubheading Conditions for Inclusion
+
+There are no conditions for adding new translations of the guix and
+guix-packages components, other than they need at least one translated
+string. New languages will be added to Guix as soon as possible. The
+files may be removed if they fall out of sync and have no more translated
+strings.
+
+Given that the website is dedicated to new users, we want its translation
+to be as complete as possible before we include it in the language menu.
+For a new language to be included, it needs to reach at least 80% completion.
+When a language is included, it may be removed in the future, if it stays
+out of sync and falls below 60% completion.
+
+The manual and cookbook are automatically added in the default compilation
+target. Everytime we synchronise translations, developpers need to
+recompile all the translated manuals and cookbooks. This is useless for what
+is essentially the English manual or cookbook. Therefore, we will only
+include a new language when it reaches 10% completion in the component.
+When a language is included, it may be removed in the future, if it stays
+out of sync and falls below 5% completion.
+
@subsubheading Translation Infrastructure
Weblate is backed by a git repository from which it discovers new strings to
--
2.34.0
From 5cbb70ebcbf141cd05fa60bf0bfa806125a56381 Mon Sep 17 00:00:00 2001
Message-Id: <5cbb70ebcbf141cd05fa60bf0bfa806125a56381.1646763369.git.julien@lepiller.eu>
In-Reply-To: <37071410629a7d70c9b4e4f673f2c625d3ed4b47.1646763369.git.julien@lepiller.eu>
References: <37071410629a7d70c9b4e4f673f2c625d3ed4b47.1646763369.git.julien@lepiller.eu>
From: Julien Lepiller <julien@lepiller.eu>
Date: Tue, 8 Mar 2022 19:11:38 +0100
Subject: [PATCH 2/3] maint: Implement translation thresholds.

Do not download new translations for the cookbook and the manual when
they are below 10% completion, and remove existing translations when
they fall below 5%.

* Makefile.am (download-po): Implement translation thresholds.
---
Makefile.am | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 8850c4562c..164804d96a 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1066,21 +1066,35 @@ WEBLATE_REPO = https://framagit.org/tyreunom/guix-translations
# form.
download-po:
dir=$$(mktemp -d); \
- git clone --depth 1 "$(WEBLATE_REPO)" "$$dir/translations"; \
+ git clone --depth 1 "$(WEBLATE_REPO)" "$$dir/translations" && \
for domain in po/doc po/guix po/packages; do \
for po in "$$dir/translations/$$domain"/*.po; do \
translated=$$(LANG=en_US.UTF-8 msgfmt --statistics "$$po" 2>&1 | cut -f1 -d' '); \
+ untranslated=$$(LANG=en_US.UTF-8 msgfmt --statistics "$$po" 2>&1 | cut -f4 -d' '); \
+ untranslated=$${untranslated:-0}; \
+ total=$$(($$translated+$$untranslated)); \
target=$$(basename "$$po"); \
target="$$domain/$$target"; \
- if msgfmt -c "$$po" && [ "$$translated" != "0" ]; then \
+ msgfmt -c "$$po"; \
+ if msgfmt -c "$$po" && [ "$$translated" != "0" ] && ([ "$$domain" != "po/doc" ] || [ "$$translated" -gt $$(($$total/10)) ] || [ -f $$target ]); then \
msgfilter --no-wrap -i "$$po" cat > "$$po".tmp; \
mv "$$po".tmp "$$target"; \
echo "copied $$target."; \
else \
- echo "WARN: $$target ($$translated translated messages) was not added/updated."; \
+ echo "WARN: $$target ($$translated translated messages ($$((translated/total*100))%)) was not added/updated."; \
fi; \
done; \
done; \
+ for po in po/doc/*.po; do \
+ translated=$$(LANG=en_US.UTF-8 msgfmt --statistics "$$po" 2>&1 | cut -f1 -d' '); \
+ untranslated=$$(LANG=en_US.UTF-8 msgfmt --statistics "$$po" 2>&1 | cut -f4 -d' '); \
+ untranslated=$${untranslated:-0}; \
+ total=$$(($$translated + $$untranslated)); \
+ if [ "$$translated" -lt "$$(($$total/20))" ]; then \
+ echo "WARN: $$po was removed because it is below the 5% threshold: $$((translated/total*100))%"; \
+ rm $$po; \
+ fi; \
+ done; \
rm -rf "$$dir"
.PHONY: download-po
--
2.34.0
Ludovic Courtès wrote on 9 Mar 2022 11:12
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 54302@debbugs.gnu.org)
8735jr4eq0.fsf@gnu.org
Hi!

Julien Lepiller <julien@lepiller.eu> writes:

> manual and cookbook: only include new languages when they reach 10%
> completion. Remove languages when they fall below 5%.

SGTM (or even 15%/10%).

> website (unrelated to this repo, but still important): only include new
> languages when they reach 80% completion. Remove languages when they
> fall below 60%. The reason for the higher threshold is that the website
> acts as some sort of advertisement, so we want a higher quality than
> half English half translated.

SGTM.

> From 37071410629a7d70c9b4e4f673f2c625d3ed4b47 Mon Sep 17 00:00:00 2001
> Message-Id: <37071410629a7d70c9b4e4f673f2c625d3ed4b47.1646763369.git.julien@lepiller.eu>
> From: Julien Lepiller <julien@lepiller.eu>
> Date: Tue, 8 Mar 2022 13:14:58 +0100
> Subject: [PATCH 1/3] doc: Document inclusion requirements for new
> translations.
>
> * doc/contributing.texi (Translating Guix)[Conditions for Inclusion]:
> New section.

[...]

> +There are no conditions for adding new translations of the guix and
> +guix-packages components, other than they need at least one translated

@code{guix} and @code{guix-packages}

> +Given that the website is dedicated to new users, we want its translation

“web site” (two words).

> +target. Everytime we synchronise translations, developpers need to

“developers” and (if you feel overseas-inclined) “synchronize”.

> +When a language is included, it may be removed in the future, if it stays

Remove comma.

> From 5cbb70ebcbf141cd05fa60bf0bfa806125a56381 Mon Sep 17 00:00:00 2001
> Message-Id: <5cbb70ebcbf141cd05fa60bf0bfa806125a56381.1646763369.git.julien@lepiller.eu>
> In-Reply-To: <37071410629a7d70c9b4e4f673f2c625d3ed4b47.1646763369.git.julien@lepiller.eu>
> References: <37071410629a7d70c9b4e4f673f2c625d3ed4b47.1646763369.git.julien@lepiller.eu>
> From: Julien Lepiller <julien@lepiller.eu>
> Date: Tue, 8 Mar 2022 19:11:38 +0100
> Subject: [PATCH 2/3] maint: Implement translation thresholds.
>
> Do not download new translations for the cookbook and the manual when
> they are below 10% completion, and remove existing translations when
> they fall below 5%.
>
> * Makefile.am (download-po): Implement translation thresholds.
> ---
> Makefile.am | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile.am b/Makefile.am
> index 8850c4562c..164804d96a 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -1066,21 +1066,35 @@ WEBLATE_REPO = https://framagit.org/tyreunom/guix-translations
> # form.
> download-po:
> dir=$$(mktemp -d); \
> - git clone --depth 1 "$(WEBLATE_REPO)" "$$dir/translations"; \
> + git clone --depth 1 "$(WEBLATE_REPO)" "$$dir/translations" && \
> for domain in po/doc po/guix po/packages; do \
> for po in "$$dir/translations/$$domain"/*.po; do \
> translated=$$(LANG=en_US.UTF-8 msgfmt --statistics "$$po" 2>&1 | cut -f1 -d' '); \
> + untranslated=$$(LANG=en_US.UTF-8 msgfmt --statistics "$$po" 2>&1 | cut -f4 -d' '); \
> + untranslated=$${untranslated:-0}; \
> + total=$$(($$translated+$$untranslated)); \
> target=$$(basename "$$po"); \
> target="$$domain/$$target"; \
> - if msgfmt -c "$$po" && [ "$$translated" != "0" ]; then \
> + msgfmt -c "$$po"; \
> + if msgfmt -c "$$po" && [ "$$translated" != "0" ] && ([ "$$domain" != "po/doc" ] || [ "$$translated" -gt $$(($$total/10)) ] || [ -f $$target ]); then \
> msgfilter --no-wrap -i "$$po" cat > "$$po".tmp; \
> mv "$$po".tmp "$$target"; \
> echo "copied $$target."; \
> else \
> - echo "WARN: $$target ($$translated translated messages) was not added/updated."; \
> + echo "WARN: $$target ($$translated translated messages ($$((translated/total*100))%)) was not added/updated."; \
> fi; \
> done; \
> done; \
> + for po in po/doc/*.po; do \
> + translated=$$(LANG=en_US.UTF-8 msgfmt --statistics "$$po" 2>&1 | cut -f1 -d' '); \
> + untranslated=$$(LANG=en_US.UTF-8 msgfmt --statistics "$$po" 2>&1 | cut -f4 -d' '); \
> + untranslated=$${untranslated:-0}; \
> + total=$$(($$translated + $$untranslated)); \
> + if [ "$$translated" -lt "$$(($$total/20))" ]; then \
> + echo "WARN: $$po was removed because it is below the 5% threshold: $$((translated/total*100))%"; \
> + rm $$po; \
> + fi; \
> + done; \

LGTM, but this is getting a bit hairy. :-)

No concrete suggestions, but it would be great if we could somehow split
it and/or move it to a script in build-aux/ (that’d avoid double dollar
signs) and/or write it in Scheme. Future work…
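One possible shape for such a build-aux/ helper, sketched in plain POSIX shell so that no dollar signs need doubling. It assumes English (`LC_ALL=C`) `msgfmt --statistics` output of the form "N translated messages, N fuzzy translations, N untranslated messages.", where msgfmt omits the fuzzy and untranslated parts when they are zero. The script and function names are hypothetical. Two details are handled explicitly. First, counts are matched by label rather than by field position: with `cut -f4`, the fourth field is the fuzzy count whenever fuzzy entries exist. Second, the percentage multiplies before dividing, since `translated/total*100` is 0 in shell integer arithmetic for any incomplete translation.

```shell
#!/bin/sh
# Hypothetical standalone helper (e.g. somewhere under build-aux/), not
# part of the patch: print the completion of each PO file given as argument.

# Print the number preceding LABEL in `msgfmt --statistics` output,
# or 0 if that part is absent (msgfmt omits zero fuzzy/untranslated counts).
# $1: statistics output, $2: label such as "untranslated message".
stat_count() {
    # Prefix each line with a space so the first count also matches
    # " N LABEL".  Requiring digits, one space, then LABEL keeps
    # "translated message" from matching inside "untranslated message".
    n=$(printf '%s\n' "$1" | sed -n "s/^/ /; s/.* \([0-9][0-9]*\) $2.*/\1/p")
    echo "${n:-0}"
}

# Print the completion percentage (rounded down) for a statistics line.
# Fuzzy entries are counted as not translated (an assumption).
completion() {
    t=$(stat_count "$1" "translated message")
    f=$(stat_count "$1" "fuzzy translation")
    u=$(stat_count "$1" "untranslated message")
    total=$((t + f + u))
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    # Multiply before dividing: t / total * 100 is 0 in integer arithmetic
    # for anything short of a complete translation.
    echo $((t * 100 / total))
}

for po in "$@"; do
    # LC_ALL=C keeps msgfmt's messages in English; -o /dev/null avoids
    # writing a messages.mo file into the current directory.
    stats=$(LC_ALL=C msgfmt --statistics -o /dev/null "$po" 2>&1)
    echo "$po: $(completion "$stats")%"
done
```

Run with a list of PO files as arguments, it would print one `FILE: N%` line per file; the threshold decisions could then be layered on top of `completion` without touching Makefile.am's quoting.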

> From 726ef94f91d5dab25c3ccfb2986dcba6d39a4ab8 Mon Sep 17 00:00:00 2001
> Message-Id: <726ef94f91d5dab25c3ccfb2986dcba6d39a4ab8.1646763369.git.julien@lepiller.eu>
> In-Reply-To: <37071410629a7d70c9b4e4f673f2c625d3ed4b47.1646763369.git.julien@lepiller.eu>
> References: <37071410629a7d70c9b4e4f673f2c625d3ed4b47.1646763369.git.julien@lepiller.eu>
> From: Julien Lepiller <julien@lepiller.eu>
> Date: Tue, 8 Mar 2022 19:14:47 +0100
> Subject: [PATCH 3/3] nls: Enforce translation thresholds.
>
> * po/doc/guix-cookbook.es.po: Remove file.
> * po/doc/guix-cookbook.fa.po: Remove file.
> * po/doc/guix-cookbook.fi.po: Remove file.
> * po/doc/guix-cookbook.uk.po: Remove file.
> * po/doc/local.mk: Remove them.
> * doc/local.mk: Remove them.

[...]

> # If adding a language, update the following variables, and info_TEXINFOS.
> MANUAL_LANGUAGES = de es fa fi fr it ko pt_BR ru sk zh_CN
> -COOKBOOK_LANGUAGES = de es fa fi fr ko pt_BR ru sk uk zh_Hans
> +COOKBOOK_LANGUAGES = de fr ko pt_BR ru sk zh_Hans

Should we also remove fa, fi, it, ko, and sk from MANUAL_LANGUAGES and
info_TEXINFOS?

https://translate.fedoraproject.org/projects/guix/documentation-manual/

Thank you!

Ludo’.
Ludovic Courtès wrote on 29 Mar 2022 15:56
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 54302@debbugs.gnu.org)
87v8vwx3o7.fsf_-_@gnu.org
Ping! :-)

Ludovic Courtès <ludo@gnu.org> writes:

> [...]
Julien Lepiller wrote on 2 Apr 2022 18:58
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 54302-done@debbugs.gnu.org)
20220402185509.1c0328ec@tachikoma.lepiller.eu
Thanks for the ping. I applied the changes you suggested and pushed to
master, along with the usual monthly nls update. Thanks.

On Tue, 29 Mar 2022 15:56:56 +0200,
Ludovic Courtès <ludo@gnu.org> wrote:

> From: Ludovic Courtès <ludo@gnu.org>
> To: Julien Lepiller <julien@lepiller.eu>
> Cc: 54302@debbugs.gnu.org
> Subject: Re: bug#54302: [PATCH] nls: Implement translation thresholds.
> Date: Tue, 29 Mar 2022 15:56:56 +0200
> Mail client: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
>
> Ping! :-)
>
> [...]
Closed