guix lint: support for spellchecker or basic grammar

DoneSubmitted by Vagrant Cascadian.
Details
5 participants
  • Efraim Flashner
  • Ludovic Courtès
  • Maxime Devos
  • Vagrant Cascadian
  • zimoun
Owner
unassigned
Severity
normal
V
V
Vagrant Cascadian wrote on 16 Nov 2020 02:53
(address . bug-guix@gnu.org)
87ima6rrri.fsf@yucca
Please consider a guix lint description/synopsis check for basic
spelling, typo and rudimentary grammar issues.

Most of the ones I've found were caught by debian's "lintian" tool:



Common issues appear to be:

"This packages" -> "This package"
"allows to X" -> "Xs" or "Xing"


I've fixed many of these in the past:

git log --author=vagrant --extended-regexp --grep='spelling|typo|grammar' --patch

But some of the very same patterns keep reappearing!


Many of these are likely to be caught by most spell checking routines;
I'm not sure if there is anything that would be implementable in pure
guile, or it if would make sense to call out to an external
spellchecker.

Some of them might be harder, and obviously we do not want too many
false positives, but no need to get perfectionist on solving this; even
just checking for "This packages" would haved detected many of these
issues!

That is, of course, if "guix lint" is being used consistently... :)


live well,
vagrant
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCX7HbsQAKCRDcUY/If5cW
qkGNAP9k5PHKWQUAar5lQzxIfjyZkqBArCd2xtcWvgAtrofqrgD+Prxswpjl9TST
rnfFB5SnKII3Ytwftt5aM5WHpw5BJQY=
=7RdI
-----END PGP SIGNATURE-----

Z
Z
zimoun wrote on 16 Nov 2020 06:55
868sb1u9pb.fsf@gmail.com
Hi Vagrant,

On Sun, 15 Nov 2020 at 17:53, Vagrant Cascadian <vagrant@debian.org> wrote:
Toggle quote (7 lines)
> Please consider a guix lint description/synopsis check for basic
> spelling, typo and rudimentary grammar issues.
>
> Most of the ones I've found were caught by debian's "lintian" tool:
>
> https://tracker.debian.org/lintian

[...]

Toggle quote (5 lines)
> Many of these are likely to be caught by most spell checking routines;
> I'm not sure if there is anything that would be implementable in pure
> guile, or it if would make sense to call out to an external
> spellchecker.

The tool is ’spellintian’ [1], right? If yes, the work seems done by
[2] but I am not sure to understand if it is only regexp and Perl or if
an external tool is called. And the list in debian/control is not very
helpful.

1:
2:


Toggle quote (2 lines)
> That is, of course, if "guix lint" is being used consistently... :)

It should be! :-)


All the best,
simon
L
L
Ludovic Courtès wrote on 3 Dec 2020 18:06
control message for bug #44675
(address . control@debbugs.gnu.org)
87360mddk0.fsf@gnu.org
tags 44675 + easy
quit
V
V
Vagrant Cascadian wrote on 22 Apr 2021 01:10
Re: bug#44675: guix lint: support for spellchecker or basic grammar
(address . 44675@debbugs.gnu.org)
87tunznsi7.fsf@yucca
Control: tags 44675 +patch

On 2020-11-15, Vagrant Cascadian wrote:
Toggle quote (2 lines)
> Please consider a guix lint description/synopsis check for basic
> spelling, typo and rudimentary grammar issues.
...
Toggle quote (10 lines)
> Many of these are likely to be caught by most spell checking routines;
> I'm not sure if there is anything that would be implementable in pure
> guile, or it if would make sense to call out to an external
> spellchecker.
>
> Some of them might be harder, and obviously we do not want too many
> false positives, but no need to get perfectionist on solving this; even
> just checking for "This packages" would haved detected many of these
> issues!

In the attached patch, I've implemented a simple lint check for "This
packages", which has been fixed in ... 42 packages so far in the git
repository, so maybe this could help catch future ones!

I haven't implemented a more complicated spellchecker or grammar checker
or anything, but at least this is a start.

I think it is also within my skills to address "allows to" and "permits
to", if I'm not heading down the wrong path here...


live well,
vagrant
From d4b851f5722cd6f8d514a4254884d1f7a016b74f Mon Sep 17 00:00:00 2001
From: Vagrant Cascadian <vagrant@debian.org>
Date: Wed, 21 Apr 2021 09:26:45 -0700
Subject: [PATCH] lint: Add description check for check-pluralized-package


* guix/lint.scm: Check for occurances of "This packages" in package
descriptions.
* tests/lint.scm: Add test.
---
guix/lint.scm | 9 +++++++++
tests/lint.scm | 7 +++++++
2 files changed, 16 insertions(+)

Toggle diff (47 lines)
diff --git a/guix/lint.scm b/guix/lint.scm
index 1bebfe03d3..ffeac18077 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -221,6 +221,14 @@ markup is valid return a plain-text version of DESCRIPTION, otherwise #f."
                       (G_ "Texinfo markup in description is invalid")
                       #:field 'description))))
 
+  (define (check-pluralized-this-package description)
+    "Check that DESCRIPTION does not contain This packages"
+    (if (string-match "This packages" description)
+	(list
+	 (make-warning package
+		       (G_ "description contains This Packages but should just be This package")))
+	'()))
+
   (define (check-trademarks description)
     "Check that DESCRIPTION does not contain '™' or '®' characters.  See
 http://www.gnu.org/prep/standards/html_node/Trademarks.html."
@@ -283,6 +291,7 @@ by two spaces; possible infraction~p at ~{~a~^, ~}")
          (check-not-empty description)
          (check-quotes description)
          (check-trademarks description)
+         (check-pluralized-this-package description)
          ;; Use raw description for this because Texinfo rendering
          ;; automatically fixes end of sentence space.
          (check-end-of-sentence-space description)
diff --git a/tests/lint.scm b/tests/lint.scm
index a2c8665142..6cb7a98686 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -160,6 +160,13 @@
                              (description "This is a 'quoted' thing."))))
      (check-description-style pkg))))
 
+(test-equal "description: pluralized this package"
+  "description contains This Packages but should just be This package"
+  (single-lint-warning-message
+   (let ((pkg (dummy-package "x"
+                             (description "This packages is a typo."))))
+     (check-description-style pkg))))
+
 (test-equal "synopsis: not a string"
   "invalid synopsis: #f"
   (single-lint-warning-message
-- 
2.30.2
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCYICw8AAKCRDcUY/If5cW
ql31AQDhw/mVDLHlM7VAiztoP8oGG+hY2Zkmksac+kaAFKTEiQD9Gmi+llqfzsUz
2rP03GAKD8s9BKyuMDu9havpzQXHpAo=
=M7KV
-----END PGP SIGNATURE-----

M
M
Maxime Devos wrote on 22 Apr 2021 18:42
2f3077c0d040e4b40db19d98195845e124b064d3.camel@telenet.be
+ (define (check-pluralized-this-package description)
+ "Check that DESCRIPTION does not contain This packages"

The sentence structure would be clearer if you used quotes here,
something like "Check that DESCRIPTION does not contain ‘This packages’".

+ (if (string-match "This packages" description)
+ (list
+ (make-warning package
+ (G_ "description contains This Packages but should just be This package")))

There are no package descriptions containing "This Packages".
Did you mean "This packages"?
Toggle quote (1 lines)
> +(test-equal "description: pluralized this package"
Quotes: "description: pluralized ‘this package’".

Toggle quote (1 lines)
> + "description contains This Packages but should just be This package"
Capitalisation error: This Packages --> This packages
Also, quotes: "description contains ‘This packages’ but should just be ‘This package’".

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYIGnbxccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7pxIAP4xHGk1slA5mIN7UDipnEXShvCc
r2MyB+hdGP3FeD5GVAD5AUDkrtMMqNir0cYacA/mlJz/l0L6sQyOG2MF0yP1TQY=
=s0SE
-----END PGP SIGNATURE-----


V
V
Vagrant Cascadian wrote on 22 Apr 2021 19:57
87o8e6nqvv.fsf@yucca
On 2021-04-22, Maxime Devos wrote:
Toggle quote (6 lines)
> + (define (check-pluralized-this-package description)
> + "Check that DESCRIPTION does not contain This packages"
>
> The sentence structure would be clearer if you used quotes here,
> something like "Check that DESCRIPTION does not contain ‘This packages’".

Any compelling reason to use ‘This packages’ vs. 'This packages' ? It
seems other quotes in guix/lint.scm use '' also, and I'm not apparently
skilled enough with a keyboard to generate ‘’-style quotes... :)


Toggle quote (8 lines)
> + (if (string-match "This packages" description)
> + (list
> + (make-warning package
> + (G_ "description contains This Packages but should just be This package")))
>
> There are no package descriptions containing "This Packages".
> Did you mean "This packages"?

Nice catch, thanks!


Toggle quote (3 lines)
>> +(test-equal "description: pluralized this package"
> Quotes: "description: pluralized ‘this package’".

Noted.


Toggle quote (4 lines)
>> + "description contains This Packages but should just be This package"
> Capitalisation error: This Packages --> This packages
> Also, quotes: "description contains ‘This packages’ but should just be ‘This package’".

Again, nice catch!


Updated the commit message and incorporated the above suggestions into
the updated attached patch.


live well,
vagrant
From 4e724fbe9815e1c27967b835f08d2259164538ba Mon Sep 17 00:00:00 2001
From: Vagrant Cascadian <vagrant@debian.org>
Date: Wed, 21 Apr 2021 09:26:45 -0700
Subject: [PATCH] lint: Add description check for pluralized "This package"


* guix/lint.scm (check-pluralized-this-package): Add check for
occurances of "This packages" in package descriptions.
* tests/lint.scm: Add test.
---
guix/lint.scm | 9 +++++++++
tests/lint.scm | 7 +++++++
2 files changed, 16 insertions(+)

Toggle diff (47 lines)
diff --git a/guix/lint.scm b/guix/lint.scm
index 1bebfe03d3..e00048349b 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -221,6 +221,14 @@ markup is valid return a plain-text version of DESCRIPTION, otherwise #f."
                       (G_ "Texinfo markup in description is invalid")
                       #:field 'description))))
 
+  (define (check-pluralized-this-package description)
+    "Check that DESCRIPTION does not contain 'This packages'"
+    (if (string-match "This packages" description)
+	(list
+	 (make-warning package
+		       (G_ "description contains 'This packages' but should just be 'This package'")))
+	'()))
+
   (define (check-trademarks description)
     "Check that DESCRIPTION does not contain '™' or '®' characters.  See
 http://www.gnu.org/prep/standards/html_node/Trademarks.html."
@@ -283,6 +291,7 @@ by two spaces; possible infraction~p at ~{~a~^, ~}")
          (check-not-empty description)
          (check-quotes description)
          (check-trademarks description)
+         (check-pluralized-this-package description)
          ;; Use raw description for this because Texinfo rendering
          ;; automatically fixes end of sentence space.
          (check-end-of-sentence-space description)
diff --git a/tests/lint.scm b/tests/lint.scm
index a2c8665142..3e1b95680a 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -160,6 +160,13 @@
                              (description "This is a 'quoted' thing."))))
      (check-description-style pkg))))
 
+(test-equal "description: pluralized 'This package'"
+  "description contains 'This packages' but should just be 'This package'"
+  (single-lint-warning-message
+   (let ((pkg (dummy-package "x"
+                             (description "This packages is a typo."))))
+     (check-description-style pkg))))
+
 (test-equal "synopsis: not a string"
   "invalid synopsis: #f"
   (single-lint-warning-message
-- 
2.30.2
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCYIG5JAAKCRDcUY/If5cW
qkSZAQCLAsyse5gMhpIQdnBHmq2g75AeRUC7sYA+k2X5YS+pZgD+JWMjxaj/9CAn
WYlkeiP/Rviat5RG6a63NTlHq96zUQo=
=r0Zx
-----END PGP SIGNATURE-----

M
M
Maxime Devos wrote on 22 Apr 2021 20:05
1f1a7b54fce32d0241d9f689e00cf52b5c4d48fd.camel@telenet.be
Vagrant Cascadian schreef op do 22-04-2021 om 10:57 [-0700]:
Toggle quote (9 lines)
> On 2021-04-22, Maxime Devos wrote:
> > + (define (check-pluralized-this-package description)
> > + "Check that DESCRIPTION does not contain This packages"
> >
> > The sentence structure would be clearer if you used quotes here,
> > something like "Check that DESCRIPTION does not contain ‘This packages’".
>
> Any compelling reason to use ‘This packages’ vs. 'This packages' ?

I find ‘curly quotes’ more aesthetically pleasing, though that's a bit subjective
I guess.

Toggle quote (1 lines)
> It seems other quotes in guix/lint.scm use '' also,
I believe they should use ‘curly quotes’ as well, though I would like to hear
what other things about that first.

Toggle quote (3 lines)
> and I'm not apparently
> skilled enough with a keyboard to generate ‘’-style quotes... :)

If your keyboard is azerty, you could choose ‘Belgian alternative’, and type
‘ with alt-gr+f and ’ with alt-gr+g.

Toggle quote (3 lines)
> Updated the commit message and incorporated the above suggestions into
> the updated attached patch.

One other suggestion: you used "string-match" in 'check-pluralized-this-package',
which is a bit overkill. string-match interprets its first argument as a regex.
The procedure "string-contains" is simpler and probably more efficient.

The patch looks good otherwise.

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYIG64hccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7s1SAP49qSu0Cb+lpiG9d3u/bp6jldfK
1bFVLabTd1mjvILUgQD+OQQL38Y3qweGxWKqGg2njcvMGu3cDi3mQTz/GkXxzw0=
=/QsW
-----END PGP SIGNATURE-----


E
E
Efraim Flashner wrote on 25 Apr 2021 09:27
(name . Vagrant Cascadian)(address . vagrant@debian.org)(address . 44675@debbugs.gnu.org)
YIUZ4CqdTHGWtCTt@3900XT
On Wed, Apr 21, 2021 at 04:10:40PM -0700, Vagrant Cascadian wrote:
Toggle quote (30 lines)
> Control: tags 44675 +patch
>
> On 2020-11-15, Vagrant Cascadian wrote:
> > Please consider a guix lint description/synopsis check for basic
> > spelling, typo and rudimentary grammar issues.
> ...
> > Many of these are likely to be caught by most spell checking routines;
> > I'm not sure if there is anything that would be implementable in pure
> > guile, or it if would make sense to call out to an external
> > spellchecker.
> >
> > Some of them might be harder, and obviously we do not want too many
> > false positives, but no need to get perfectionist on solving this; even
> > just checking for "This packages" would haved detected many of these
> > issues!
>
> In the attached patch, I've implemented a simple lint check for "This
> packages", which has been fixed in ... 42 packages so far in the git
> repository, so maybe this could help catch future ones!
>
> I haven't implemented a more complicated spellchecker or grammar checker
> or anything, but at least this is a start.
>
> I think it is also within my skills to address "allows to" and "permits
> to", if I'm not heading down the wrong path here...
>
>
> live well,
> vagrant

It might make more sense to name it something more like
'catch-common-typos' and to search for 'This packages', 'allows to',
'permits to', 'file-name' and then print out the different mistakes in
the description. Then we can add more as we find them, rather than one
check per mistake.


--
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEoov0DD5VE3JmLRT3Qarn3Mo9g1EFAmCFGd0ACgkQQarn3Mo9
g1Gbyg//UlUXH/f0eliEyVtjqJQt+8hNoW0RMgHITqW11z9JZ/n/2Ahff+Jb//83
O4o/9CELlgKZKb9sLCXJFGFAHBQlZVY1zVxTd2QbzFebdN/l5Fya2stJarynHFUJ
HWORKHkoq0pC7kSKKxOmIklfxV8sC9EjmiiTKnYlTSWUiex7hlHpTW39PBoXoDn8
xUeHk8SstbO00oYyLN89mmcgjWuUSOFFO0K7fHJxgWeL8F/m298KaJTCbteRS0D7
3o69Hi2+aNpj/742DiWhNCv1CuwNY+aI0Q0V5whJFiZONMJHqQbTqctNn1n5ujjh
Zkl5CK5DnYoMKLIg91Oujp+s2BvP9MSiIHXw6EqkaTryG3LQmUZAXcaTTBtBttHb
fbbacjiam2JRfaUCL2h/31dZJ71zi095Taeqlah+OjUxNi9+BYacxM+w02iHUxlc
j34PJSs30waldU+84+LH+lbN/hZ23qFbvKZ5Meq9NdRsAjGaeceZIO605K7WqLbr
A9eP5iRHagsS1OsVc9crkU8h2Q9lLIWMmxq8ZZSuxZJh7fee+BEcyMaK+aa8fwZg
7no5wlSJaoIybvflO15GrMbaWyBoGD3Q6Po8QAp7Hif5uEMIEy0Xj8wEIVd0VcB3
7KDKhWnvJ3iwv3vZlS8VlZXNKiB7jMq1SYqhUmdULwNnc7fogFc=
=DYJF
-----END PGP SIGNATURE-----


V
V
Vagrant Cascadian wrote on 25 Apr 2021 18:43
(name . Efraim Flashner)(address . efraim@flashner.co.il)(address . 44675@debbugs.gnu.org)
87eeeynwm2.fsf@yucca
On 2021-04-25, Efraim Flashner wrote:
Toggle quote (37 lines)
> On Wed, Apr 21, 2021 at 04:10:40PM -0700, Vagrant Cascadian wrote:
>> Control: tags 44675 +patch
>>
>> On 2020-11-15, Vagrant Cascadian wrote:
>> > Please consider a guix lint description/synopsis check for basic
>> > spelling, typo and rudimentary grammar issues.
>> ...
>> > Many of these are likely to be caught by most spell checking routines;
>> > I'm not sure if there is anything that would be implementable in pure
>> > guile, or it if would make sense to call out to an external
>> > spellchecker.
>> >
>> > Some of them might be harder, and obviously we do not want too many
>> > false positives, but no need to get perfectionist on solving this; even
>> > just checking for "This packages" would haved detected many of these
>> > issues!
>>
>> In the attached patch, I've implemented a simple lint check for "This
>> packages", which has been fixed in ... 42 packages so far in the git
>> repository, so maybe this could help catch future ones!
>>
>> I haven't implemented a more complicated spellchecker or grammar checker
>> or anything, but at least this is a start.
>>
>> I think it is also within my skills to address "allows to" and "permits
>> to", if I'm not heading down the wrong path here...
>>
>>
>> live well,
>> vagrant
>
> It might make more sense to name it something more like
> 'catch-common-typos' and to search for 'This packages', 'allows to',
> 'permits to', 'file-name' and then print out the different mistakes in
> the description. Then we can add more as we find them, rather than one
> check per mistake.

That makes sense, though 'This packages' is very straightforward and has
a simple recommendation to fix it, whereas 'allows to' requires more
complicated english skills to come up with the correct solution... it
could just simply flag those cases as "wrong" without a solution.

Basically, I already stretched my cargo-culting, er, guile skills just
to get something obvious fixed that I keep seeing over and over
again. It would be a good excercise for me to better learn guile to
extend to further typos, though ... limited time.

Playing whack-a-mole with typos does get tiring :)


live well,
vagrant
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCYIWcJQAKCRDcUY/If5cW
qnAFAP4qgz82H73sev7l5ghpUl6hb+G9KLOH/HtdY5XPVUxVIAD+J1bPekfF9yJq
J4kM0wW5WF7HTJN2ZbE2X2ITOCgMsgk=
=7Ckg
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 4 May 2021 18:40
(name . Vagrant Cascadian)(address . vagrant@debian.org)
87o8dqmozo.fsf@gnu.org
Hi Vagrant,

Vagrant Cascadian <vagrant@debian.org> skribis:

Toggle quote (11 lines)
> From 4e724fbe9815e1c27967b835f08d2259164538ba Mon Sep 17 00:00:00 2001
> From: Vagrant Cascadian <vagrant@debian.org>
> Date: Wed, 21 Apr 2021 09:26:45 -0700
> Subject: [PATCH] lint: Add description check for pluralized "This package"
>
> Partial fix for: https://issues.guix.gnu.org/44675
>
> * guix/lint.scm (check-pluralized-this-package): Add check for
> occurances of "This packages" in package descriptions.
> * tests/lint.scm: Add test.

I had missed this patch, nice!

Toggle quote (8 lines)
> + (define (check-pluralized-this-package description)
> + "Check that DESCRIPTION does not contain 'This packages'"
> + (if (string-match "This packages" description)
> + (list
> + (make-warning package
> + (G_ "description contains 'This packages' but should just be 'This package'")))
> + '()))

How about making this ‘check-spelling’ and generalizing a bit so that it
iterates over a bunch of regexps or strings?

Like:

(if (any (cut string-contains description <>) patterns)
…)

where ‘patterns’ is a list of strings.

(Note that ‘string-match’ invokes libc’s regcomp + regexec, so it’s more
heavyweight than needed here.)

Thanks,
Ludo’.
V
V
Vagrant Cascadian wrote on 9 Jun 2021 17:33
(name . Ludovic Courtès)(address . ludo@gnu.org)
87zgvz129e.fsf@yucca
On 2021-05-04, Ludovic Courtès wrote:
Toggle quote (33 lines)
> Vagrant Cascadian <vagrant@debian.org> skribis:
>
>> From 4e724fbe9815e1c27967b835f08d2259164538ba Mon Sep 17 00:00:00 2001
>> From: Vagrant Cascadian <vagrant@debian.org>
>> Date: Wed, 21 Apr 2021 09:26:45 -0700
>> Subject: [PATCH] lint: Add description check for pluralized "This package"
>>
>> Partial fix for: https://issues.guix.gnu.org/44675
>>
>> * guix/lint.scm (check-pluralized-this-package): Add check for
>> occurances of "This packages" in package descriptions.
>> * tests/lint.scm: Add test.
>
> I had missed this patch, nice!
>
>> + (define (check-pluralized-this-package description)
>> + "Check that DESCRIPTION does not contain 'This packages'"
>> + (if (string-match "This packages" description)
>> + (list
>> + (make-warning package
>> + (G_ "description contains 'This packages' but should just be 'This package'")))
>> + '()))
>
> How about making this ‘check-spelling’ and generalizing a bit so that it
> iterates over a bunch of regexps or strings?
>
> Like:
>
> (if (any (cut string-contains description <>) patterns)
> …)
>
> where ‘patterns’ is a list of strings.

Love the idea, but would need some help in implementing!

There have been at least three newly added "This packages" since I
submitted this patch, so wondering if we can at least get the simple
case merged before getting too caught up in all the potential
improvements?


Toggle quote (3 lines)
> (Note that ‘string-match’ invokes libc’s regcomp + regexec, so it’s more
> heavyweight than needed here.)

I can probably manage that! (I'll dig up where the simpler suggestion was)


live well,
vagrant
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCYMDfTQAKCRDcUY/If5cW
qgzgAP9D/QK00msHp9nAnF49CJThYxuajJrLHjts0/9KLXy9JQEAw4CVPdEbgH67
AGhsq8QOmGHm6cdfa21MIKqdP/zv7Q8=
=iO+2
-----END PGP SIGNATURE-----

V
V
Vagrant Cascadian wrote on 22 Oct 2021 01:18
(address . 44675@debbugs.gnu.org)
8735ouotkl.fsf@yucca
On 2021-06-09, Vagrant Cascadian wrote:
Toggle quote (5 lines)
> There have been at least three newly added "This packages" since I
> submitted this patch, so wondering if we can at least get the simple
> case merged before getting too caught up in all the potential
> improvements?

And up until today, that list grew to 7! (fixed by

Long delayed updated patch ... I think it addresses almost all of the
issues brought up, and maybe introduces a few new ones!

It has been rewritten to easily add new typo checks, but this one so far
only addresses pluralized "This packages". Would be easy enough to add
"allows to" but hard to add a suggested fix...


Big thanks to rekado, vivien and nckx who helped via #guix IRC!


live well,
vagrant
From 3ab46ca7932614ab4c699512c2fbfa8207ffa964 Mon Sep 17 00:00:00 2001
From: Vagrant Cascadian <vagrant@debian.org>
Date: Thu, 21 Oct 2021 15:51:11 -0700
Subject: [PATCH] lint: Add description check for pluralized "This package"


* guix/lint.scm (check-description-typo): Add check for
occurances of "This packages" in package descriptions.
* tests/lint.scm: Add test.
---
guix/lint.scm | 12 ++++++++++++
tests/lint.scm | 7 +++++++
2 files changed, 19 insertions(+)

Toggle diff (50 lines)
diff --git a/guix/lint.scm b/guix/lint.scm
index 7b02b9cec0..b22454fd31 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -321,6 +321,17 @@ markup is valid return a plain-text version of DESCRIPTION, otherwise #f."
                       (G_ "Texinfo markup in description is invalid")
                       #:field 'description))))
 
+  (define (check-description-typo description typo correction)
+    "Check that DESCRIPTION does not contain typo, with optional correction"
+    (if (string-contains description typo)
+        (list
+	 (make-warning package
+		       (G_
+                        (format #false
+                                "description contains typo '~a'~@[, should be '~a'~]"
+		                typo correction))))
+        '()))
+
   (define (check-trademarks description)
     "Check that DESCRIPTION does not contain '™' or '®' characters.  See
 http://www.gnu.org/prep/standards/html_node/Trademarks.html."
@@ -401,6 +412,7 @@ by two spaces; possible infraction~p at ~{~a~^, ~}")
          (check-not-empty description)
          (check-quotes description)
          (check-trademarks description)
+         (check-description-typo description "This packages" "This package")
          ;; Use raw description for this because Texinfo rendering
          ;; automatically fixes end of sentence space.
          (check-end-of-sentence-space description)
diff --git a/tests/lint.scm b/tests/lint.scm
index 699a750eb9..1902a87354 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -177,6 +177,13 @@
                               (description "Whitespace. "))))
      (check-description-style pkg))))
 
+(test-equal "description: pluralized 'This package'"
+  "description contains typo 'This packages', should be 'This package'"
+  (single-lint-warning-message
+   (let ((pkg (dummy-package "x"
+                             (description "This packages is a typo."))))
+     (check-description-style pkg))))
+
 (test-equal "synopsis: not a string"
   "invalid synopsis: #f"
   (single-lint-warning-message
-- 
2.30.2
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCYXH1SwAKCRDcUY/If5cW
qvtEAP9hlcihBOeEnrbS6zZjgdxG84VvbfSVPK6KtvLt0Di1/wD+MxXNMtIeRn7h
uJZQ50L0D3MKsoJoZnfKQcG9CprXjQ0=
=YK9i
-----END PGP SIGNATURE-----

Z
Z
zimoun wrote on 22 Oct 2021 10:33
865ytpzcfc.fsf@gmail.com
Hi Vagrant,

On Thu, 21 Oct 2021 at 16:18, Vagrant Cascadian <vagrant@debian.org> wrote:

Toggle quote (4 lines)
> It has been rewritten to easily add new typo checks, but this one so far
> only addresses pluralized "This packages". Would be easy enough to add
> "allows to" but hard to add a suggested fix...

If I remember correctly the previous discussion, Debian uses an external
tool for spellchecking. Here, the patch uses a list of “common”
mistakes. It is seems an easy good start. :-)


Toggle quote (2 lines)
> + (define (check-description-typo description typo correction)

Instead, I would use a list of alist ’typo-corrections’ as argument.
For instance,

Toggle snippet (11 lines)
(define (check-description-typo description typo-corrections)
(for-each
(match-lambda
((typo . correction)
(if (string-contains description typo)
(list
(make-warning ...))
'())))
typo-corrections))

Toggle quote (12 lines)
> + "Check that DESCRIPTION does not contain typo, with optional
> correction"
> + (if (string-contains description typo)
> + (list
> + (make-warning package
> + (G_
> + (format #false
> + "description contains typo '~a'~@[, should be '~a'~]"
> + typo correction))))
> + '()))
> +

[...]

Toggle quote (2 lines)
> + (check-description-typo description "This packages" "This package")

And the call reads,

(check-description-typo description '(("This packages" . "This package")))

which allows easily to add new pattern; such as,

'(("This packages" . "This package")
("this packages" . "this package")
("This modules" . "This module"))

Then, as a second step, depending on the patterns listed, let see if
there is a pattern inside these patterns. ;-) (Check both capitalize and
lower-case, etc.)


Cheers,
simon
V
V
Vagrant Cascadian wrote on 24 Oct 2021 13:22
(name . zimoun)(address . zimon.toutoune@gmail.com)
87mtmyr7kp.fsf@yucca
On 2021-10-22, zimoun wrote:
Toggle quote (10 lines)
> On Thu, 21 Oct 2021 at 16:18, Vagrant Cascadian <vagrant@debian.org> wrote:
>
>> It has been rewritten to easily add new typo checks, but this one so far
>> only addresses pluralized "This packages". Would be easy enough to add
>> "allows to" but hard to add a suggested fix...
>
> If I remember correctly the previous discussion, Debian uses an external
> tool for spellchecking. Here, the patch uses a list of “common”
> mistakes. It is seems an easy good start. :-)

Yes, I definitely went for "catch some useful things" rather than trying
to be comprehensive... these checks should catch the majority of
historical ones I've noticed, but I didn't bother adding one-off typos.


Toggle quote (15 lines)
> Instead, I would use a list of alist ’typo-corrections’ as argument.
> For instance,
>
> --8<---------------cut here---------------start------------->8---
> (define (check-description-typo description typo-corrections)
> (for-each
> (match-lambda
> ((typo . correction)
> (if (string-contains description typo)
> (list
> (make-warning ...))
> '())))
> typo-corrections))
> --8<---------------cut here---------------end--------------->8---

So close! Ludo spotted that the for-each needed to be replaced with
append-map, and that basically did it!


Toggle quote (12 lines)
> (check-description-typo description '(("This packages" . "This package")))
>
> which allows easily to add new pattern; such as,
>
> '(("This packages" . "This package")
> ("this packages" . "this package")
> ("This modules" . "This module"))
>
> Then, as a second step, depending on the patterns listed, let see if
> there is a pattern inside these patterns. ;-) (Check both capitalize and
> lower-case, etc.)

I haven't seen case issues in practice for these.

I've added "This packages", "This modules", "allows to" and "permits to"
to the list while I was at it. I only added tests for "This packages"
and "allows to" as the others are so similar I'm not sure it's worth
testing...

Here's to hoping for a guix with fewer glaring typos with which to craft
guix poetry. :)


New patch attached!


live well,
vagrant
From a171769da0f737468e06866164eadf1e720764ba Mon Sep 17 00:00:00 2001
From: Vagrant Cascadian <vagrant@debian.org>
Date: Sun, 24 Oct 2021 04:00:15 -0700
Subject: [PATCH] lint: Add description check for common typos.


* guix/lint.scm (check-description-typo): Add check for occurances of
"This packages", "This modules", "allows to" and "permits to" in package
descriptions.
* tests/lint.scm: Add tests for "This packages" and "allows to".
---
guix/lint.scm | 19 +++++++++++++++++++
tests/lint.scm | 14 ++++++++++++++
2 files changed, 33 insertions(+)

Toggle diff (64 lines)
diff --git a/guix/lint.scm b/guix/lint.scm
index 7b02b9cec0..ac2e7b3841 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -321,6 +321,21 @@ markup is valid return a plain-text version of DESCRIPTION, otherwise #f."
                       (G_ "Texinfo markup in description is invalid")
                       #:field 'description))))
 
+  (define (check-description-typo description typo-corrections)
+    "Check that DESCRIPTION does not contain typo, with optional correction"
+    (append-map
+     (match-lambda
+      ((typo . correction)
+       (if (string-contains description typo)
+           (list
+            (make-warning package
+                          (G_
+                           (format #false
+                                   "description contains typo '~a'~@[, should be '~a'~]"
+                                   typo correction))))
+           '())))
+     typo-corrections))
+
   (define (check-trademarks description)
     "Check that DESCRIPTION does not contain '™' or '®' characters.  See
 http://www.gnu.org/prep/standards/html_node/Trademarks.html."
@@ -401,6 +416,10 @@ by two spaces; possible infraction~p at ~{~a~^, ~}")
          (check-not-empty description)
          (check-quotes description)
          (check-trademarks description)
+         (check-description-typo description '(("This packages" . "This package")
+                                               ("This modules" . "This module")
+                                               ("allows to" . #f)
+                                               ("permits to" . #f)))
          ;; Use raw description for this because Texinfo rendering
          ;; automatically fixes end of sentence space.
          (check-end-of-sentence-space description)
diff --git a/tests/lint.scm b/tests/lint.scm
index 699a750eb9..6a7eed02e0 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -177,6 +177,20 @@
                               (description "Whitespace. "))))
      (check-description-style pkg))))
 
+(test-equal "description: pluralized 'This package'"
+  "description contains typo 'This packages', should be 'This package'"
+  (single-lint-warning-message
+   (let ((pkg (dummy-package "x"
+                             (description "This packages is a typo."))))
+     (check-description-style pkg))))
+
+(test-equal "description: grammar 'allows to'"
+  "description contains typo 'allows to'"
+  (single-lint-warning-message
+   (let ((pkg (dummy-package "x"
+                             (description "This package allows to do stuff."))))
+     (check-description-style pkg))))
+
 (test-equal "synopsis: not a string"
   "invalid synopsis: #f"
   (single-lint-warning-message
-- 
2.30.2
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCYXVB5wAKCRDcUY/If5cW
qikvAQDiaQqa8kHiYS4AWxIus3x1VkdMUhmBTLOMgolSxqIkcAD8CmhSqhprnrfp
8P3vwacCFwk7lCGy4soPAYKsBFPJuQc=
=mfD9
-----END PGP SIGNATURE-----

Z
Z
zimoun wrote on 24 Oct 2021 13:56
(name . Vagrant Cascadian)(address . vagrant@debian.org)
86ilxmab6p.fsf@gmail.com
Hi,

On Sun, 24 Oct 2021 at 04:22, Vagrant Cascadian <vagrant@debian.org> wrote:

Toggle quote (3 lines)
> So close! Ludo spotted that the for-each needed to be replaced with
> append-map, and that basically did it!

Indeed, I have not checked the type of full chain. :-) And the return
value probably needs to be strongly a list, instead of not-specified. ;-)


Toggle quote (3 lines)
> Here's to hoping for a guix with fewer glaring typos with which to craft
> guix poetry. :)

Guix poetry, oh well. :-)


Toggle quote (8 lines)
> From a171769da0f737468e06866164eadf1e720764ba Mon Sep 17 00:00:00 2001
> From: Vagrant Cascadian <vagrant@debian.org>
> Date: Sun, 24 Oct 2021 04:00:15 -0700
> Subject: [PATCH] lint: Add description check for common typos.
>
> Fixes: https://issues.guix.gnu.org/44675
>
> * guix/lint.scm (check-description-typo): Add check for occurances of
----^^

Occurrence, no? Well, maybe a US vs UK vs French-glish. :-)


Toggle quote (8 lines)
> "This packages", "This modules", "allows to" and "permits to" in package
> descriptions.
> * tests/lint.scm: Add tests for "This packages" and "allows to".
> ---
> guix/lint.scm | 19 +++++++++++++++++++
> tests/lint.scm | 14 ++++++++++++++
> 2 files changed, 33 insertions(+)

Otherwise, LGTM.


Cheers,
simon
V
V
Vagrant Cascadian wrote on 24 Oct 2021 21:02
(name . zimoun)(address . zimon.toutoune@gmail.com)
87fssqqma0.fsf@yucca
On 2021-10-24, zimoun wrote:
Toggle quote (13 lines)
> On Sun, 24 Oct 2021 at 04:22, Vagrant Cascadian <vagrant@debian.org> wrote:
>> From a171769da0f737468e06866164eadf1e720764ba Mon Sep 17 00:00:00 2001
>> From: Vagrant Cascadian <vagrant@debian.org>
>> Date: Sun, 24 Oct 2021 04:00:15 -0700
>> Subject: [PATCH] lint: Add description check for common typos.
>>
>> Fixes: https://issues.guix.gnu.org/44675
>>
>> * guix/lint.scm (check-description-typo): Add check for occurances of
> ----^^
>
> Occurrence, no? Well, maybe a US vs UK vs French-glish. :-)

You are correct! French-glish FTW! :)

Toggle quote (10 lines)
>> "This packages", "This modules", "allows to" and "permits to" in package
>> descriptions.
>> * tests/lint.scm: Add tests for "This packages" and "allows to".
>> ---
>> guix/lint.scm | 19 +++++++++++++++++++
>> tests/lint.scm | 14 ++++++++++++++
>> 2 files changed, 33 insertions(+)
>
> Otherwise, LGTM.

Shall I fix the above typo (the irony!) and push before we get any more
of these typos in the repository? :)


live well,
vagrant
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCYXWtuAAKCRDcUY/If5cW
qmTDAP93Z9HX8mkdrpkp2zxU/CGQOcMOvez8dUhl223rZdA7nQEAwB0bEOBx1QKs
ggU3lETzZ6YIHeOv1L75xkdmcj2nowM=
=03vP
-----END PGP SIGNATURE-----

V
V
Vagrant Cascadian wrote on 24 Oct 2021 23:41
(name . zimoun)(address . zimon.toutoune@gmail.com)
87bl3eqewv.fsf@yucca
On 2021-10-24, Vagrant Cascadian wrote:
Toggle quote (15 lines)
> On 2021-10-24, zimoun wrote:
>> On Sun, 24 Oct 2021 at 04:22, Vagrant Cascadian <vagrant@debian.org> wrote:
>>> "This packages", "This modules", "allows to" and "permits to" in package
>>> descriptions.
>>> * tests/lint.scm: Add tests for "This packages" and "allows to".
>>> ---
>>> guix/lint.scm | 19 +++++++++++++++++++
>>> tests/lint.scm | 14 ++++++++++++++
>>> 2 files changed, 33 insertions(+)
>>
>> Otherwise, LGTM.
>
> Shall I fix the above typo (the irony!) and push before we get any more
> of these typos in the repository? :)

Pushed to master as b5f45a21c27b80210753e184e52708bb75a347bb.

Thanks for all your help!


live well,
vagrant
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQRlgHNhO/zFx+LkXUXcUY/If5cWqgUCYXXTAAAKCRDcUY/If5cW
qr/HAQCBiXxMCF9RtAed7NaWXn/vVEQIgFDZBZDtoz12IpuWGQD9E8yi3i3Ix3RI
ZZDH3uxa9z9h6DrvwMkUm8dKIAgG+QU=
=qzn3
-----END PGP SIGNATURE-----

Closed
?
Your comment

This issue is archived.

To comment on this conversation send email to 44675@debbugs.gnu.org