[PATCH 0/2] Detect early and gracefully handle invalid Texinfo

Status: Done
Participants (1): Ludovic Courtès
Owner: unassigned
Submitted by: Ludovic Courtès
Severity: normal
Ludovic Courtès wrote on 22 Oct 2021 14:40
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211022124052.28197-1-ludo@gnu.org
Hello!

It’s a fact that we occasionally push invalid Texinfo markup in
package descriptions and synopses, probably even more so in external
channels, despite the fact that ‘guix lint’ flags it.

The problem is that some of the tools were designed around the idea
that invalid Texinfo “does not happen”. For example, if a single
package contains invalid markup, ‘guix search’ and ‘guix show’ crash
badly:

$ guix search ghc citations
name: ghc-citeproc
version: 0.4.0.1
outputs: out
systems: x86_64-linux i686-linux
dependencies: ghc-aeson-pretty@0.8.8 ghc-aeson@1.5.6.0 ghc-attoparsec@0.13.2.5
+ ghc-base-compat@0.11.2 ghc-case-insensitive@1.2.1.0 ghc-data-default@0.7.1.1 ghc-diff@0.4.0
+ ghc-file-embed@0.0.15.0 ghc-pandoc-types@1.22 ghc-safe@0.3.19 ghc-scientific@0.3.7.0
+ ghc-timeit@2.0 ghc-unicode-collation@0.1.3 ghc-uniplate@1.6.13 ghc-vector@0.12.3.0
+ ghc-xml-conduit@1.9.1.1
location: gnu/packages/haskell-xyz.scm:15823:2
homepage: https://hackage.haskell.org/package/citeproc
license: FreeBSD
synopsis: Generate citations and bibliography from CSL styles
Backtrace:
13 (primitive-load "/home/ludo/.config/guix/current/bin/gu…")
In guix/ui.scm:
2185:7 12 (run-guix . _)
2148:10 11 (run-guix-command _ . _)
In ice-9/boot-9.scm:
1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
In guix/scripts/package.scm:
896:9 9 (_)
In ice-9/boot-9.scm:
1747:15 8 (with-exception-handler #<procedure 7fb7f469a6c0 at ic…> …)
In guix/ui.scm:
1677:23 7 (call-with-paginated-output-port _ #:less-options _)
1712:11 6 (_ #<output: #{write pipe}# 15>)
1558:14 5 (package->recutils _ #<output: #{write pipe}# 15> _ # _ …)
1432:23 4 (texi->plain-text _)
In texinfo.scm:
1132:22 3 (parse _)
967:36 2 (loop #<input: string 7fb7f4a4bc40> (*fragment*) #<pro…> …)
92:2 1 (command-spec _)
In ice-9/boot-9.scm:
1685:16 0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `parser-error' with args `(#f "Unknown command" urefhttps)'.

(This one was fixed in c3c502896b1454b345ee9f17d20063853652a35a.)

This series does two things:

1. Emit a warning when invalid markup is encountered but keep going.

2. Raise a syntax error, at macro-expansion time, when invalid markup
is encountered.
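Both items rest on the same check. As a minimal standalone sketch (not part of this series), Guile’s ‘texi-fragment->stexi’ from the (texinfo) module throws to the ‘parser-error’ key on markup it cannot parse, such as the ‘@urefhttps:’ typo in the backtrace above. The helper name ‘valid-texinfo?’ is hypothetical and used only for illustration.

```scheme
(use-modules (texinfo))   ;provides 'texi-fragment->stexi'

;; Hypothetical helper, for illustration only: return #t if STR parses
;; as a Texinfo fragment, #f if the parser throws to 'parser-error'.
(define (valid-texinfo? str)
  (catch 'parser-error
    (lambda ()
      (texi-fragment->stexi str)
      #t)
    (lambda _
      #f)))

(valid-texinfo? "See @uref{https://citationstyles.org/}.")  ;⇒ #t
(valid-texinfo? "See @urefhttps://citationstyles.org/}.")   ;⇒ #f
```

Item 1 amounts to catching that error at display time, printing a warning and falling back to the raw string; item 2 runs the same parse inside a macro transformer, so a failure surfaces as a syntax error when the package module is compiled.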

Obviously #2 incurs some overhead, since it parses Texinfo strings at
expansion time, so it’s enabled only when ‘GUIX_UNINSTALLED’ is set, that
is, when working on a checkout with ./pre-inst-env. The expanded code
is exactly the same as before, though, so there is no run-time overhead.
Concretely, that means that ‘make’ fails and you just don’t see the package
until the error has been fixed:

$ make
[…]
[ 78%] LOAD gnu/packages/haskell-xyz.scm
;;; note: source file ./gnu/packages/haskell-xyz.scm
;;; newer than compiled /home/ludo/src/guix/gnu/packages/haskell-xyz.go
;;; note: source file ./gnu/packages/haskell-xyz.scm
;;; newer than compiled /home/ludo/src/guix/gnu/packages/haskell-xyz.go
gnu/packages/haskell-xyz.scm:15855:5: error: "@code{ghc-citeproc} parses @acronym{Citation Style Language, CSL} style files\nand uses them to generate a list of formatted citations and bibliography\nentries. For more information about CSL, see @urefhttps://citationstyles.org/}.": invalid Texinfo markup
make[2]: *** [Makefile:7131: make-packages-go] Error 1

Feedback welcome!

Ludo’.

Ludovic Courtès (2):
ui: Gracefully handle invalid Texinfo markup in package blurbs.
packages: Optionally validate Texinfo markup at expansion time.

guix/packages.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++---
guix/ui.scm | 17 ++++++++++++++--
2 files changed, 64 insertions(+), 5 deletions(-)


base-commit: e1261ddd38cf02a0f046f3a5360502d659b4e7d4
--
2.33.0
Ludovic Courtès wrote on 22 Oct 2021 14:45
[PATCH 1/2] ui: Gracefully handle invalid Texinfo markup in package blurbs.
(address . 51332@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211022124519.28473-1-ludo@gnu.org
Previously 'guix search' & co. would crash when encountering invalid
Texinfo.

* guix/ui.scm (texi->plain-text*): New procedure.
(package-field-string, package->recutils): Use it.
---
guix/ui.scm | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/guix/ui.scm b/guix/ui.scm
index 1428c254b3..eb7f0afcfd 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -1431,10 +1431,22 @@ (define (texi->plain-text str)
   (with-fluids ((%default-port-encoding "UTF-8"))
     (stexi->plain-text (texi-fragment->stexi str))))
 
+(define (texi->plain-text* package str)
+  "Same as 'texi->plain-text', but gracefully handle Texinfo errors."
+  (catch 'parser-error
+    (lambda ()
+      (texi->plain-text str))
+    (lambda args
+      (warning (package-location package)
+               (G_ "~a: invalid Texinfo markup~%")
+               (package-full-name package))
+      str)))
+
 (define (package-field-string package field-accessor)
   "Return a plain-text representation of PACKAGE field."
   (and=> (field-accessor package)
-         (compose texi->plain-text P_)))
+         (lambda (str)
+           (texi->plain-text* package (P_ str)))))
 
 (define (package-description-string package)
   "Return a plain-text representation of PACKAGE description field."
@@ -1555,7 +1567,8 @@ (define (package<? p1 p2)
              (parameterize ((%text-width width*))
                ;; Call 'texi->plain-text' on the concatenated string to account
                ;; for the width of "description:" in paragraph filling.
-               (texi->plain-text
+               (texi->plain-text*
+                p
                 (string-append "description: "
                                (or (and=> (package-description p) P_)
                                    ""))))
--
2.33.0
Ludovic Courtès wrote on 22 Oct 2021 14:45
[PATCH 2/2] packages: Optionally validate Texinfo markup at expansion time.
(address . 51332@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20211022124519.28473-2-ludo@gnu.org
* guix/packages.scm (validate-texinfo): New macro.
(<package>)[synopsis, description]: Add 'sanitize' property.
---
guix/packages.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/guix/packages.scm b/guix/packages.scm
index e5a9d08bce..394f6aa39e 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -49,6 +49,7 @@ (define-module (guix packages)
   #:use-module (srfi srfi-35)
   #:use-module (rnrs bytevectors)
   #:use-module (web uri)
+  #:autoload (texinfo) (texi-fragment->stexi)
   #:re-export (%current-system
                %current-target-system
                search-path-specification)         ;for convenience
@@ -437,6 +438,49 @@ (define location
                                  (lambda (s) #,location)))
             body ...))))))
 
+(define-syntax validate-texinfo
+  (let ((validate? (getenv "GUIX_UNINSTALLED")))
+    (define ensure-thread-safe-texinfo-parser!
+      ;; Work around <https://issues.guix.gnu.org/51264> for Guile <= 3.0.7.
+      (let ((patched? (or (> (string->number (major-version)) 3)
+                          (> (string->number (minor-version)) 0)
+                          (> (string->number (micro-version)) 7)))
+            (next-token-of/thread-safe
+             (lambda (pred port)
+               (let loop ((chars '()))
+                 (match (read-char port)
+                   ((? eof-object?)
+                    (list->string (reverse! chars)))
+                   (chr
+                    (let ((chr* (pred chr)))
+                      (if chr*
+                          (loop (cons chr* chars))
+                          (begin
+                            (unread-char chr port)
+                            (list->string (reverse! chars)))))))))))
+        (lambda ()
+          (unless patched?
+            (set! (@@ (texinfo) next-token-of) next-token-of/thread-safe)
+            (set! patched? #t)))))
+
+    (lambda (s)
+      "Raise a syntax error when passed a literal string that is not valid
+Texinfo. Otherwise, return the string."
+      (syntax-case s ()
+        ((_ str)
+         (string? (syntax->datum #'str))
+         (if validate?
+             (catch 'parser-error
+               (lambda ()
+                 (ensure-thread-safe-texinfo-parser!)
+                 (texi-fragment->stexi (syntax->datum #'str))
+                 #'str)
+               (lambda _
+                 (syntax-violation 'package "invalid Texinfo markup" #'str)))
+             #'str))
+        ((_ obj)
+         #'obj)))))
+
 ;; A package.
 (define-record-type* <package>
   package make-package
@@ -471,9 +515,11 @@ (define-record-type* <package>
   (replacement package-replacement                ; package | #f
                (default #f) (thunked) (innate))
 
-  (synopsis package-synopsis)                     ; one-line description
-  (description package-description)               ; one or two paragraphs
-  (license package-license)                       ; <license> instance or list
+  (synopsis package-synopsis
+            (sanitize validate-texinfo))          ; one-line description
+  (description package-description
+               (sanitize validate-texinfo))       ; one or two paragraphs
+  (license package-license)                       ; <license> instance or list
   (home-page package-home-page)
   (supported-systems package-supported-systems    ; list of strings
                      (default %supported-systems))
--
2.33.0
Ludovic Courtès wrote on 28 Oct 2021 21:46
Re: bug#51332: [PATCH 0/2] Detect early and gracefully handle invalid Texinfo
(address . 51332-done@debbugs.gnu.org)
87fssl3pbt.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> ui: Gracefully handle invalid Texinfo markup in package blurbs.
> packages: Optionally validate Texinfo markup at expansion time.

Pushed as e171182a20962c4119e12439b92bbbfd59b1495e!

Ludo'.
Closed