guix lint: support for spellchecker or basic grammar

Status: Open
Submitted by Vagrant Cascadian.
Details
5 participants
  • Efraim Flashner
  • Ludovic Courtès
  • Maxime Devos
  • Vagrant Cascadian
  • zimoun
Owner: unassigned
Severity: normal
Vagrant Cascadian wrote on 16 Nov 2020 02:53
To: bug-guix@gnu.org
87ima6rrri.fsf@yucca
Please consider a guix lint description/synopsis check for basic spelling, typo and rudimentary grammar issues.
Most of the ones I've found were caught by Debian's "lintian" tool:
https://tracker.debian.org/lintian

Common issues appear to be:
  • "This packages" -> "This package"
  • "allows to X" -> "Xs" or "Xing"

I've fixed many of these in the past:
git log --author=vagrant --extended-regexp --grep='spelling|typo|grammar' --patch
But some of the very same patterns keep reappearing!

Many of these are likely to be caught by most spell-checking routines; I'm not sure if there is anything that would be implementable in pure Guile, or if it would make sense to call out to an external spellchecker.
Some of them might be harder, and obviously we do not want too many false positives, but there is no need to get perfectionist about solving this; even just checking for "This packages" would have detected many of these issues!
That is, of course, if "guix lint" is being used consistently... :)

live well, vagrant
zimoun wrote on 16 Nov 2020 06:55
868sb1u9pb.fsf@gmail.com
Hi Vagrant,
On Sun, 15 Nov 2020 at 17:53, Vagrant Cascadian <vagrant@debian.org> wrote:
> Please consider a guix lint description/synopsis check for basic
> spelling, typo and rudimentary grammar issues.
>
> Most of the ones I've found were caught by debian's "lintian" tool:
>
> https://tracker.debian.org/lintian
[...]
> Many of these are likely to be caught by most spell checking routines;
> I'm not sure if there is anything that would be implementable in pure
> guile, or it if would make sense to call out to an external
> spellchecker.
The tool is ‘spellintian’ [1], right? If so, the work seems to be done by [2], but I am not sure I understand whether it is only regexps and Perl or whether an external tool is called. And the list in debian/control is not very helpful.
1: https://salsa.debian.org/lintian/lintian/-/blob/master/bin/spellintian
2: https://salsa.debian.org/lintian/lintian/-/blob/master/lib/Lintian/Spelling.pm
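As a point of comparison, a pure-Guile approach does not necessarily need an external spellchecker: the sketch below looks up each word of a description in a small table of known misspellings. It is only an illustration under stated assumptions. It does not claim to mirror how spellintian or Lintian's Spelling.pm work, and `%misspellings`, `description-words` and `find-misspellings` are hypothetical names whose table holds only typos that occur in this thread.

```scheme
(use-modules (ice-9 regex)    ; list-matches, match:substring
             (srfi srfi-1))   ; filter-map

;; Hypothetical table of known misspellings and their corrections.  The
;; entries are illustrative only (all three typos appear in this thread).
(define %misspellings
  (let ((table (make-hash-table)))
    (for-each (lambda (pair)
                (hash-set! table (car pair) (cdr pair)))
              '(("occurances" . "occurrences")
                ("excercise" . "exercise")
                ("haved" . "have")))
    table))

(define (description-words description)
  "Return the alphabetic words of DESCRIPTION, lower-cased."
  (map (lambda (m) (string-downcase (match:substring m)))
       (list-matches "[A-Za-z]+" description)))

(define (find-misspellings description)
  "Return a (WORD . CORRECTION) pair for each known misspelling found in
DESCRIPTION, in order of appearance."
  (filter-map (lambda (word)
                ;; hash-ref compares string keys with equal?.
                (let ((correction (hash-ref %misspellings word)))
                  (and correction (cons word correction))))
              (description-words description)))
```

Word-level lookup keeps false positives low (only listed words are reported) at the cost of catching nothing outside the table.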

> That is, of course, if "guix lint" is being used consistently... :)
It should be! :-)

All the best,
simon
Ludovic Courtès wrote on 3 Dec 2020 18:06
control message for bug #44675
To: control@debbugs.gnu.org
87360mddk0.fsf@gnu.org
tags 44675 + easy
quit
Vagrant Cascadian wrote on 22 Apr 01:10 +0200
Re: bug#44675: guix lint: support for spellchecker or basic grammar
To: 44675@debbugs.gnu.org
87tunznsi7.fsf@yucca
Control: tags 44675 +patch
On 2020-11-15, Vagrant Cascadian wrote:
> Please consider a guix lint description/synopsis check for basic
> spelling, typo and rudimentary grammar issues.
...
> Many of these are likely to be caught by most spell checking routines;
> I'm not sure if there is anything that would be implementable in pure
> guile, or it if would make sense to call out to an external
> spellchecker.
>
> Some of them might be harder, and obviously we do not want too many
> false positives, but no need to get perfectionist on solving this; even
> just checking for "This packages" would haved detected many of these
> issues!
In the attached patch, I've implemented a simple lint check for "This packages", which has been fixed in ... 42 packages so far in the git repository, so maybe this could help catch future ones!
I haven't implemented a more complicated spellchecker or grammar checker or anything, but at least this is a start.
I think it is also within my skills to address "allows to" and "permits to", if I'm not heading down the wrong path here...
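Unlike "This packages", the "allows to"/"permits to" case has no single mechanical fix, so a check can only point at the phrase. A minimal sketch, assuming a regexp is acceptable here (alternation plus a word-boundary guard is a case where a regexp earns its cost); `%allows-to-regexp` and `find-allows-to` are hypothetical names.

```scheme
(use-modules (ice-9 regex))   ; match:substring

;; Hypothetical check for "allows to" / "permits to".  There is no single
;; mechanical fix ("allows users to X", "allows Xing", ...), so it only
;; reports the offending phrase.  Guile compiles patterns as POSIX
;; extended regexps by default, so "|" and groups need no backslashes.
(define %allows-to-regexp
  ;; Group 1 requires the start of the string or a non-letter before the
  ;; verb, so that e.g. "fallows to" is not flagged; group 2 is the verb.
  (make-regexp "(^|[^A-Za-z])(allows|permits) to "))

(define (find-allows-to description)
  "Return \"allows to\" or \"permits to\" if DESCRIPTION contains it, else #f."
  (let ((m (regexp-exec %allows-to-regexp description)))
    (and m (string-append (match:substring m 2) " to"))))
```

Phrases such as "allows users to" pass, since the verb is not immediately followed by "to".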

live well, vagrant
From d4b851f5722cd6f8d514a4254884d1f7a016b74f Mon Sep 17 00:00:00 2001
From: Vagrant Cascadian <vagrant@debian.org>
Date: Wed, 21 Apr 2021 09:26:45 -0700
Subject: [PATCH] lint: Add description check for check-pluralized-package
Fixes: https://issues.guix.gnu.org/44675
* guix/lint.scm: Check for occurances of "This packages" in package descriptions.
* tests/lint.scm: Add test.
---
 guix/lint.scm  | 9 +++++++++
 tests/lint.scm | 7 +++++++
 2 files changed, 16 insertions(+)

diff --git a/guix/lint.scm b/guix/lint.scm
index 1bebfe03d3..ffeac18077 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -221,6 +221,14 @@ markup is valid return a plain-text version of DESCRIPTION, otherwise #f."
                     (G_ "Texinfo markup in description is invalid")
                     #:field 'description))))
 
+  (define (check-pluralized-this-package description)
+    "Check that DESCRIPTION does not contain This packages"
+    (if (string-match "This packages" description)
+        (list
+         (make-warning package
+                       (G_ "description contains This Packages but should just be This package")))
+        '()))
+
   (define (check-trademarks description)
     "Check that DESCRIPTION does not contain '™' or '®' characters.  See
 http://www.gnu.org/prep/standards/html_node/Trademarks.html."
@@ -283,6 +291,7 @@ by two spaces; possible infraction~p at ~{~a~^, ~}")
                 (check-not-empty description)
                 (check-quotes description)
                 (check-trademarks description)
+                (check-pluralized-this-package description)
                 ;; Use raw description for this because Texinfo rendering
                 ;; automatically fixes end of sentence space.
                 (check-end-of-sentence-space description)
diff --git a/tests/lint.scm b/tests/lint.scm
index a2c8665142..6cb7a98686 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -160,6 +160,13 @@
                                (description "This is a 'quoted' thing."))))
       (check-description-style pkg))))
 
+(test-equal "description: pluralized this package"
+  "description contains This Packages but should just be This package"
+  (single-lint-warning-message
+   (let ((pkg (dummy-package "x"
+                             (description "This packages is a typo."))))
+     (check-description-style pkg))))
+
 (test-equal "synopsis: not a string"
   "invalid synopsis: #f"
   (single-lint-warning-message
-- 
2.30.2
Maxime Devos wrote on 22 Apr 18:42 +0200
2f3077c0d040e4b40db19d98195845e124b064d3.camel@telenet.be
+  (define (check-pluralized-this-package description)
+    "Check that DESCRIPTION does not contain This packages"
The sentence structure would be clearer if you used quotes here, something like "Check that DESCRIPTION does not contain ‘This packages’".
+    (if (string-match "This packages" description)
+        (list
+         (make-warning package
+                       (G_ "description contains This Packages but should just be This package")))
There are no package descriptions containing "This Packages". Did you mean "This packages"?
> +(test-equal "description: pluralized this package"
Quotes: "description: pluralized ‘this package’".
> +  "description contains This Packages but should just be This package"
Capitalisation error: This Packages --> This packages
Also, quotes: "description contains ‘This packages’ but should just be ‘This package’".
Greetings,
Maxime.

Vagrant Cascadian wrote on 22 Apr 19:57 +0200
87o8e6nqvv.fsf@yucca
On 2021-04-22, Maxime Devos wrote:
> +  (define (check-pluralized-this-package description)
> +    "Check that DESCRIPTION does not contain This packages"
>
> The sentence structure would be clearer if you used quotes here,
> something like "Check that DESCRIPTION does not contain ‘This packages’".
Any compelling reason to use ‘This packages’ vs. 'This packages'? It seems other quotes in guix/lint.scm use '' also, and I'm apparently not skilled enough with a keyboard to generate ‘’-style quotes... :)

> +    (if (string-match "This packages" description)
> +        (list
> +         (make-warning package
> +                       (G_ "description contains This Packages but should just be This package")))
>
> There are no package descriptions containing "This Packages".
> Did you mean "This packages"?
Nice catch, thanks!

>> +(test-equal "description: pluralized this package"
>
> Quotes: "description: pluralized ‘this package’".
Noted.

>> +  "description contains This Packages but should just be This package"
>
> Capitalisation error: This Packages --> This packages
> Also, quotes: "description contains ‘This packages’ but should just be ‘This package’".
Again, nice catch!

Updated the commit message and incorporated the above suggestions into the updated attached patch.

live well, vagrant
From 4e724fbe9815e1c27967b835f08d2259164538ba Mon Sep 17 00:00:00 2001
From: Vagrant Cascadian <vagrant@debian.org>
Date: Wed, 21 Apr 2021 09:26:45 -0700
Subject: [PATCH] lint: Add description check for pluralized "This package"
Partial fix for: https://issues.guix.gnu.org/44675
* guix/lint.scm (check-pluralized-this-package): Add check for
occurances of "This packages" in package descriptions.
* tests/lint.scm: Add test.
---
 guix/lint.scm  | 9 +++++++++
 tests/lint.scm | 7 +++++++
 2 files changed, 16 insertions(+)

diff --git a/guix/lint.scm b/guix/lint.scm
index 1bebfe03d3..e00048349b 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -221,6 +221,14 @@ markup is valid return a plain-text version of DESCRIPTION, otherwise #f."
                     (G_ "Texinfo markup in description is invalid")
                     #:field 'description))))
 
+  (define (check-pluralized-this-package description)
+    "Check that DESCRIPTION does not contain 'This packages'"
+    (if (string-match "This packages" description)
+        (list
+         (make-warning package
+                       (G_ "description contains 'This packages' but should just be 'This package'")))
+        '()))
+
   (define (check-trademarks description)
     "Check that DESCRIPTION does not contain '™' or '®' characters.  See
 http://www.gnu.org/prep/standards/html_node/Trademarks.html."
@@ -283,6 +291,7 @@ by two spaces; possible infraction~p at ~{~a~^, ~}")
                 (check-not-empty description)
                 (check-quotes description)
                 (check-trademarks description)
+                (check-pluralized-this-package description)
                 ;; Use raw description for this because Texinfo rendering
                 ;; automatically fixes end of sentence space.
                 (check-end-of-sentence-space description)
diff --git a/tests/lint.scm b/tests/lint.scm
index a2c8665142..3e1b95680a 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -160,6 +160,13 @@
                                (description "This is a 'quoted' thing."))))
       (check-description-style pkg))))
 
+(test-equal "description: pluralized 'This package'"
+  "description contains 'This packages' but should just be 'This package'"
+  (single-lint-warning-message
+   (let ((pkg (dummy-package "x"
+                             (description "This packages is a typo."))))
+     (check-description-style pkg))))
+
 (test-equal "synopsis: not a string"
   "invalid synopsis: #f"
   (single-lint-warning-message
-- 
2.30.2
Maxime Devos wrote on 22 Apr 20:05 +0200
1f1a7b54fce32d0241d9f689e00cf52b5c4d48fd.camel@telenet.be
Vagrant Cascadian wrote on Thu 22-04-2021 at 10:57 [-0700]:
> On 2021-04-22, Maxime Devos wrote:
> > +  (define (check-pluralized-this-package description)
> > +    "Check that DESCRIPTION does not contain This packages"
> >
> > The sentence structure would be clearer if you used quotes here,
> > something like "Check that DESCRIPTION does not contain ‘This packages’".
>
> Any compelling reason to use ‘This packages’ vs. 'This packages' ?
I find ‘curly quotes’ more aesthetically pleasing, though that's a bit subjective, I guess.
> It seems other quotes in guix/lint.scm use '' also,
I believe they should use ‘curly quotes’ as well, though I would like to hear what others think about that first.
> and I'm not apparently
> skilled enough with a keyboard to generate ‘’-style quotes... :)
If your keyboard is AZERTY, you could choose ‘Belgian alternative’, and type ‘ with alt-gr+f and ’ with alt-gr+g.
> Updated the commit message and incorporated the above suggestions into
> the updated attached patch.
One other suggestion: you used "string-match" in 'check-pluralized-this-package', which is a bit overkill. string-match interprets its first argument as a regex. The procedure "string-contains" is simpler and probably more efficient.
The patch looks good otherwise.
Greetings,
Maxime.
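To illustrate the difference Maxime points out, the sketch below compares the two procedures on the patch's own phrase and on a pattern containing a regexp metacharacter. The variable names are made up for the example.

```scheme
(use-modules (ice-9 regex))   ; string-match, match:start, match:substring

;; Note the argument order: (string-match PATTERN STR) but
;; (string-contains STR PATTERN).
(define text "This packages is a typo.")

;; For a plain phrase, both procedures find the same start position (0).
(define regex-position (match:start (string-match "This packages" text)))
(define literal-position (string-contains text "This packages"))

;; But string-match compiles its pattern as a regexp, where "." matches any
;; character, whereas string-contains looks for the literal characters.
(define regex-hit (string-match "e.g." "the eagle"))      ; matches "eagl"
(define literal-hit (string-contains "the eagle" "e.g.")) ; #f
```

For fixed phrases such as "This packages", the literal search gives the same answer without compiling a regexp.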

Efraim Flashner wrote on 25 Apr 09:27 +0200
To: Vagrant Cascadian <vagrant@debian.org>, 44675@debbugs.gnu.org
YIUZ4CqdTHGWtCTt@3900XT
On Wed, Apr 21, 2021 at 04:10:40PM -0700, Vagrant Cascadian wrote:
> Control: tags 44675 +patch
>
> On 2020-11-15, Vagrant Cascadian wrote:
> > Please consider a guix lint description/synopsis check for basic
> > spelling, typo and rudimentary grammar issues.
> ...
> > Many of these are likely to be caught by most spell checking routines;
> > I'm not sure if there is anything that would be implementable in pure
> > guile, or it if would make sense to call out to an external
> > spellchecker.
> >
> > Some of them might be harder, and obviously we do not want too many
> > false positives, but no need to get perfectionist on solving this; even
> > just checking for "This packages" would haved detected many of these
> > issues!
>
> In the attached patch, I've implemented a simple lint check for "This
> packages", which has been fixed in ... 42 packages so far in the git
> repository, so maybe this could help catch future ones!
>
> I haven't implemented a more complicated spellchecker or grammar checker
> or anything, but at least this is a start.
>
> I think it is also within my skills to address "allows to" and "permits
> to", if I'm not heading down the wrong path here...
>
>
> live well,
>   vagrant
It might make more sense to name it something more like 'catch-common-typos' and to search for 'This packages', 'allows to', 'permits to', 'file-name' and then print out the different mistakes in the description. Then we can add more as we find them, rather than one check per mistake.
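One way to read Efraim's suggestion is a single check that walks a list of phrases and reports all of them in one warning. A minimal sketch under that assumption: `%common-typos`, `catch-common-typos` and `common-typos-message` are hypothetical, and a real lint check would wrap the string in `make-warning` as the patch above does rather than return it.

```scheme
(use-modules (srfi srfi-1))   ; filter

;; Hypothetical list built from the phrases mentioned above; new entries
;; can be appended as further recurring typos are found.
(define %common-typos
  '("This packages" "allows to" "permits to" "file-name"))

(define (catch-common-typos description)
  "Return every entry of %COMMON-TYPOS that occurs in DESCRIPTION."
  ;; string-contains returns an index or #f; index 0 is a true value.
  (filter (lambda (typo) (string-contains description typo))
          %common-typos))

(define (common-typos-message description)
  "Return a single warning string listing all typos in DESCRIPTION, or #f."
  (let ((found (catch-common-typos description)))
    (and (pair? found)
         (string-append "description contains common typos: "
                        (string-join (map (lambda (typo)
                                            (string-append "'" typo "'"))
                                          found)
                                     ", ")))))
```

Adding a new typo then means adding one string to the list, not writing a new check.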

-- 
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
Vagrant Cascadian wrote on 25 Apr 18:43 +0200
To: Efraim Flashner <efraim@flashner.co.il>, 44675@debbugs.gnu.org
87eeeynwm2.fsf@yucca
On 2021-04-25, Efraim Flashner wrote:
> On Wed, Apr 21, 2021 at 04:10:40PM -0700, Vagrant Cascadian wrote:
>> Control: tags 44675 +patch
>>
>> On 2020-11-15, Vagrant Cascadian wrote:
>> > Please consider a guix lint description/synopsis check for basic
>> > spelling, typo and rudimentary grammar issues.
>> ...
>> > Many of these are likely to be caught by most spell checking routines;
>> > I'm not sure if there is anything that would be implementable in pure
>> > guile, or it if would make sense to call out to an external
>> > spellchecker.
>> >
>> > Some of them might be harder, and obviously we do not want too many
>> > false positives, but no need to get perfectionist on solving this; even
>> > just checking for "This packages" would haved detected many of these
>> > issues!
>>
>> In the attached patch, I've implemented a simple lint check for "This
>> packages", which has been fixed in ... 42 packages so far in the git
>> repository, so maybe this could help catch future ones!
>>
>> I haven't implemented a more complicated spellchecker or grammar checker
>> or anything, but at least this is a start.
>>
>> I think it is also within my skills to address "allows to" and "permits
>> to", if I'm not heading down the wrong path here...
>>
>>
>> live well,
>> vagrant
>
> It might make more sense to name it something more like
> 'catch-common-typos' and to search for 'This packages', 'allows to',
> 'permits to', 'file-name' and then print out the different mistakes in
> the description. Then we can add more as we find them, rather than one
> check per mistake.
That makes sense, though 'This packages' is very straightforward and has a simple recommendation to fix it, whereas 'allows to' requires more complicated English skills to come up with the correct solution... it could simply flag those cases as "wrong" without a solution.
Basically, I already stretched my cargo-culting, er, Guile skills just to get something obvious fixed that I keep seeing over and over again. It would be a good exercise for me to learn Guile better and extend this to further typos, though ... limited time.
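Vagrant's distinction between typos with a known replacement and phrases that can only be flagged could be encoded by pairing each phrase with an optional suggestion. A sketch, assuming a hypothetical `%typos` alist where `#f` means "flag only"; the function name `typo-messages` and the message wording are also made up.

```scheme
(use-modules (srfi srfi-1))   ; filter-map

;; Hypothetical table of (TYPO . SUGGESTION) pairs.  SUGGESTION is #f when
;; there is no simple replacement, in which case the phrase is only flagged.
(define %typos
  '(("This packages" . "This package")
    ("allows to" . #f)
    ("permits to" . #f)))

(define (typo-messages description)
  "Return one message for each entry of %TYPOS found in DESCRIPTION."
  (filter-map
   (lambda (entry)
     (let ((typo (car entry))
           (suggestion (cdr entry)))
       (and (string-contains description typo)
            (if suggestion
                (string-append "'" typo "' should be '" suggestion "'")
                (string-append "'" typo "' is probably wrong; please rephrase")))))
   %typos))
```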
Playing whack-a-mole with typos does get tiring :)

live well, vagrant
Ludovic Courtès wrote on 4 May 18:40 +0200
To: Vagrant Cascadian <vagrant@debian.org>
87o8dqmozo.fsf@gnu.org
Hi Vagrant,
Vagrant Cascadian <vagrant@debian.org> wrote:
> From 4e724fbe9815e1c27967b835f08d2259164538ba Mon Sep 17 00:00:00 2001
> From: Vagrant Cascadian <vagrant@debian.org>
> Date: Wed, 21 Apr 2021 09:26:45 -0700
> Subject: [PATCH] lint: Add description check for pluralized "This package"
>
> Partial fix for: https://issues.guix.gnu.org/44675
>
> * guix/lint.scm (check-pluralized-this-package): Add check for
> occurances of "This packages" in package descriptions.
> * tests/lint.scm: Add test.
I had missed this patch, nice!
> +  (define (check-pluralized-this-package description)
> +    "Check that DESCRIPTION does not contain 'This packages'"
> +    (if (string-match "This packages" description)
> +        (list
> +         (make-warning package
> +                       (G_ "description contains 'This packages' but should just be 'This package'")))
> +        '()))
How about making this ‘check-spelling’ and generalizing it a bit so that it iterates over a bunch of regexps or strings?
Like:
(if (any (cut string-contains description <>) patterns) …)
where ‘patterns’ is a list of strings.
(Note that ‘string-match’ invokes libc’s regcomp + regexec, so it’s more heavyweight than needed here.)
Thanks,
Ludo’.
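Ludovic's one-liner, expanded into a self-contained sketch. `patterns` holds phrases from this thread and `spelling-problem-index` is a hypothetical name; the point is only to show what `any` and `cut` do here, not to propose a final version of the check.

```scheme
(use-modules (srfi srfi-1)    ; any
             (srfi srfi-26))  ; cut

(define patterns '("This packages" "allows to" "permits to"))

(define (spelling-problem-index description)
  "Return the position in DESCRIPTION of the first entry of PATTERNS
(tried in list order) that occurs there, or #f."
  ;; (cut string-contains description <>) is shorthand for
  ;; (lambda (pattern) (string-contains description pattern)).
  ;; `any' returns the first true value it gets, i.e. a match index.
  (any (cut string-contains description <>) patterns))
```

Because a match at the very start returns 0, which is a true value in Scheme, the result can be used directly as the test of Ludovic's `(if (any …) …)` form.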