[PATCH 0/7] Add 'generic-html' updater

Done. Submitted by Ludovic Courtès.
Details
2 participants
  • Léo Le Bouter
  • Ludovic Courtès
Owner
unassigned
Severity
normal
Ludovic Courtès wrote on 13 Mar 22:43 +0100
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214326.28052-1-ludo@gnu.org
Hi!
These patches allow ‘guix refresh’ coverage to go from 78% to 88% as reported by ‘guix refresh --list-updaters’ (both are probably slightly overestimated) by adding a new ‘generic-html’ updater.
The updater crawls the web page where the package’s source tarball is stored, using Guile-Lib’s (htmlprag), which we depend on since commit 02e2e093e858e8a0ca7bd66c1f1f6fd0a1705edb. Among other things, it handles freedesktop.org packages.
Feedback welcome!
Thanks,
Ludo’.
Ludovic Courtès (7):
  gnu-maintenance: Use (htmlprag) for 'latest-html-release'.
  gnu-maintenance: 'latest-html-release' considers non-relative URLs.
  gnu-maintenance: 'release-file?' rejects checksum files.
  gnu-maintenance: 'latest-html-release' can determine signature file name.
  gnu-maintenance: 'latest-html-release' better computes version number.
  gnu-maintenance: Add 'generic-html' updater.
  gnu: hwloc: Add 'release-monitoring-url' property.

 doc/guix.texi            |   6 +-
 gnu/packages/mpi.scm     |   6 ++
 guix/gnu-maintenance.scm | 136 ++++++++++++++++++++++++++++-----------
 3 files changed, 108 insertions(+), 40 deletions(-)

-- 
2.30.1
Ludovic Courtès wrote on 13 Mar 22:46 +0100
[PATCH 1/7] gnu-maintenance: Use (htmlprag) for 'latest-html-release'.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-1-ludo@gnu.org
* guix/gnu-maintenance.scm (html->sxml): Remove.  Autoload (htmlprag)
instead.
* doc/guix.texi (Requirements): Mention 'guix refresh' for the Guile-Lib
dependency.
---
 doc/guix.texi            |  3 ++-
 guix/gnu-maintenance.scm | 13 +------------
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 4cf241c56a..97094a7d0a 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -865,7 +865,8 @@ the @code{crate} importer (@pxref{Invoking guix import}).
 
 @item
 @uref{https://www.nongnu.org/guile-lib/doc/ref/htmlprag/, Guile-Lib} for
-the @code{go} importer (@pxref{Invoking guix import}).
+the @code{go} importer (@pxref{Invoking guix import}) and for some of
+the ``updaters'' (@pxref{Invoking guix refresh}).
 
 @item
 When @url{http://www.bzip.org, libbz2} is available,
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 9e393d18cd..febed57c3a 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -38,6 +38,7 @@
   #:use-module (guix upstream)
   #:use-module (guix packages)
   #:autoload   (zlib) (call-with-gzip-input-port)
+  #:autoload   (htmlprag) (html->sxml)            ;from Guile-Lib
   #:export (gnu-package-name
             gnu-package-mundane-name
             gnu-package-copyright-holder
@@ -447,18 +448,6 @@ hosted on ftp.gnu.org, or not under that name (this is the case for
 ;;; Latest HTTP release.
 ;;;
 
-(define (html->sxml port)
-  "Read HTML from PORT and return the corresponding SXML tree."
-  (let ((str (get-string-all port)))
-    (catch #t
-      (lambda ()
-        ;; XXX: This is the poor developer's HTML-to-XML converter.  It's good
-        ;; enough for directory listings at <https://kernel.org/pub> but if
-        ;; needed we could resort to (htmlprag) from Guile-Lib.
-        (call-with-input-string (string-replace-substring str "<hr>" "<hr />")
-          xml->sxml))
-      (const '(html)))))                          ;parse error
-
 (define (html-links sxml)
   "Return the list of links found in SXML, the SXML tree of an HTML page."
   (let loop ((sxml sxml)
-- 
2.30.1
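For readers unfamiliar with SXML, here is a minimal sketch of how links can be collected from a parsed page. It is not the 'html-links' procedure from guix/gnu-maintenance.scm (only its first lines appear in the diff above); 'sketch-html-links' and the sample tree are hypothetical, and the tree is written by hand so that the example does not require Guile-Lib.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

;; Hypothetical helper: walk an SXML tree and collect every 'href'
;; attribute found on an <a> element.
(define (sketch-html-links sxml)
  (let loop ((sxml sxml)
             (links '()))
    (match sxml
      ;; An <a> element with an attribute list.
      (('a ('@ attributes ...) body ...)
       (match (assq 'href attributes)
         (#f          (fold loop links body))
         (('href url) (fold loop (cons url links) body))))
      ;; Any other element with an attribute list: skip the attributes.
      ((tag ('@ _ ...) body ...)
       (fold loop links body))
      ;; An element without attributes.
      ((tag body ...)
       (fold loop links body))
      ;; Text nodes and anything else.
      (_ links))))

;; Hand-written stand-in for what an HTML parser would return for a
;; small directory listing.
(define sample-page
  '(*TOP* (html (body (a (@ (href "foo-1.0.tar.gz")) "foo-1.0.tar.gz")
                      (hr)
                      (a (@ (href "foo-1.0.tar.gz.sig")) "signature")))))

(display (reverse (sketch-html-links sample-page)))
(newline)
```

Links are accumulated with 'cons', so the most recently visited link comes first; hence the 'reverse' when displaying them in page order.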
Ludovic Courtès wrote on 13 Mar 22:46 +0100
[PATCH 2/7] gnu-maintenance: 'latest-html-release' considers non-relative URLs.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-2-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): Allow for URL to be an
arbitrary URL rather than a relative URL reference.
---
 guix/gnu-maintenance.scm | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index febed57c3a..98d326e500 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2012, 2013 Nikita Karetnikov <nikita@karetnikov.org>
 ;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
 ;;;
@@ -479,19 +479,21 @@ return the corresponding signature URL, or #f it signatures are unavailable."
          (port (http-fetch/cached uri #:ttl 3600))
          (sxml (html->sxml port)))
     (define (url->release url)
-      (and (string=? url (basename url))          ;relative reference?
-           (release-file? package url)
-           (let-values (((name version)
-                         (package-name->name+version
-                          (tarball-sans-extension url)
-                          #\-)))
-             (upstream-source
-              (package name)
-              (version version)
-              (urls (list (string-append base-url directory "/" url)))
-              (signature-urls
-               (list (file->signature
-                      (string-append base-url directory "/" url))))))))
+      (let* ((base (basename url))
+             (url  (if (string=? base url)
+                       (string-append base-url directory "/" url)
+                       url)))
+        (and (release-file? package base)
+             (let-values (((name version)
+                           (package-name->name+version
+                            (tarball-sans-extension base)
+                            #\-)))
+               (upstream-source
+                (package name)
+                (version version)
+                (urls (list url))
+                (signature-urls
+                 (list (file->signature url))))))))
 
     (define candidates
       (filter-map url->release (html-links sxml)))
-- 
2.30.1
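The link handling introduced by this patch can be illustrated in isolation. The following is a minimal sketch, assuming the same test as the patch (a link equal to its own basename is treated as relative to the crawled page); 'sketch-resolve-link' and the example.org URLs are hypothetical.

```scheme
;; Hypothetical helper mirroring the test used in 'url->release': a link
;; that equals its own basename has no directory or scheme part, so it is
;; resolved against the crawled page; any other link is kept unchanged.
(define (sketch-resolve-link base-url directory link)
  (if (string=? (basename link) link)
      (string-append base-url directory "/" link)
      link))

(display (sketch-resolve-link "https://example.org" "/pub/foo"
                              "foo-1.0.tar.gz"))
(newline)
(display (sketch-resolve-link "https://example.org" "/pub/foo"
                              "https://cdn.example.org/foo-1.0.tar.gz"))
(newline)
```

Note that, in this sketch, a host-relative link such as "/pub/foo/foo-1.0.tar.gz" is passed through unchanged, since it differs from its basename.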
Ludovic Courtès wrote on 13 Mar 22:46 +0100
[PATCH 3/7] gnu-maintenance: 'release-file?' rejects checksum files.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-3-ludo@gnu.org
* guix/gnu-maintenance.scm (release-file?): Reject ".md5sum",
".sha1sum", and ".sha256sum".
---
 guix/gnu-maintenance.scm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 98d326e500..a8b24fa336 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -247,7 +247,9 @@ network to check in GNU's database."
 (define (release-file? project file)
   "Return #f if FILE is not a release tarball of PROJECT, otherwise return
 true."
-  (and (not (member (file-extension file) '("sig" "sign" "asc")))
+  (and (not (member (file-extension file)
+                    '("sig" "sign" "asc"
+                      "md5sum" "sha1sum" "sha256sum")))
        (and=> (regexp-exec %tarball-rx file)
               (lambda (match)
                 ;; Filter out unrelated files, like `guile-www-1.1.1'.
-- 
2.30.1
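To show the effect of the extended extension list on a typical directory listing, here is a small sketch. It covers only the extension test, not the rest of 'release-file?'; 'sketch-file-extension' is a hypothetical stand-in for the 'file-extension' procedure used in the patch, and the file names are made up.

```scheme
;; Extensions rejected after this patch.
(define %sketch-rejected-extensions
  '("sig" "sign" "asc" "md5sum" "sha1sum" "sha256sum"))

;; Hypothetical stand-in for 'file-extension': return the text after the
;; last dot of FILE, or #f when FILE contains no dot.
(define (sketch-file-extension file)
  (let ((dot (string-rindex file #\.)))
    (and dot (substring file (+ dot 1)))))

;; True when FILE is not a signature or checksum file.
(define (sketch-candidate? file)
  (not (member (sketch-file-extension file)
               %sketch-rejected-extensions)))

(display (filter sketch-candidate?
                 '("foo-1.0.tar.gz"
                   "foo-1.0.tar.gz.sig"
                   "foo-1.0.tar.gz.sha256sum")))
(newline)
```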
Ludovic Courtès wrote on 13 Mar 22:46 +0100
[PATCH 4/7] gnu-maintenance: 'latest-html-release' can determine signature file name.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-4-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): #:file->signature
defaults to #f.
[file->signature/guess]: New procedure.
[url->release]: Use it when FILE->SIGNATURE is #f.
Introduce 'links' variable.
(url-prefix-rewrite): Check whether URL is true before calling
'string-prefix?'.
(latest-savannah-release): Adjust comment about detached signatures.
---
 guix/gnu-maintenance.scm | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index a8b24fa336..3bffa4d11e 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -470,16 +470,29 @@ hosted on ftp.gnu.org, or not under that name (this is the case for
                               #:key
                               (base-url "https://kernel.org/pub")
                               (directory (string-append "/" package))
-                              (file->signature (cut string-append <> ".sig")))
+                              file->signature)
   "Return an <upstream-source> for the latest release of PACKAGE (a string) on
 SERVER under DIRECTORY, or #f.  BASE-URL should be the URL of an HTML page,
 typically a directory listing as found on 'https://kernel.org/pub'.
 
-FILE->SIGNATURE must be a procedure; it is passed a source file URL and must
-return the corresponding signature URL, or #f it signatures are unavailable."
-  (let* ((uri (string->uri (string-append base-url directory "/")))
-         (port (http-fetch/cached uri #:ttl 3600))
-         (sxml (html->sxml port)))
+When FILE->SIGNATURE is omitted or #f, guess the detached signature file name,
+if any.  Otherwise, FILE->SIGNATURE must be a procedure; it is passed a source
+file URL and must return the corresponding signature URL, or #f it signatures
+are unavailable."
+  (let* ((uri   (string->uri (string-append base-url directory "/")))
+         (port  (http-fetch/cached uri #:ttl 3600))
+         (sxml  (html->sxml port))
+         (links (delete-duplicates (html-links sxml))))
+    (define (file->signature/guess url)
+      (let ((base (basename url)))
+        (any (lambda (link)
+               (any (lambda (extension)
+                      (and (string=? (string-append base extension)
+                                     (basename link))
+                           (string-append url extension)))
+                    '(".asc" ".sig" ".sign")))
+             links)))
+
     (define (url->release url)
       (let* ((base (basename url))
              (url  (if (string=? base url)
@@ -495,10 +508,10 @@ return the corresponding signature URL, or #f it signatures are unavailable."
                 (version version)
                 (urls (list url))
                 (signature-urls
-                 (list (file->signature url))))))))
+                 (list ((or file->signature file->signature/guess) url))))))))
 
     (define candidates
-      (filter-map url->release (html-links sxml)))
+      (filter-map url->release links))
 
     (close-port port)
     (match candidates
@@ -614,7 +627,7 @@ releases are on gnu.org."
 (define (url-prefix-rewrite old new)
   "Return a one-argument procedure that rewrites URL prefix OLD to NEW."
   (lambda (url)
-    (if (string-prefix? old url)
+    (if (and url (string-prefix? old url))
         (string-append new (string-drop url (string-length old)))
         url)))
 
@@ -646,9 +659,8 @@ releases are on gnu.org."
          (directory (dirname (uri-path uri)))
          (rewrite   (url-prefix-rewrite %savannah-base
                                         "mirror://savannah")))
-    ;; Note: We use the default 'file->signature', which adds ".sig", but not
-    ;; all projects on Savannah follow that convention: some use ".asc" and
-    ;; perhaps some lack signatures altogether.
+    ;; Note: We use the default 'file->signature', which adds ".sig", ".asc",
+    ;; or whichever detached signature naming scheme PACKAGE uses.
     (and=> (latest-html-release package
                                 #:base-url %savannah-base
                                 #:directory directory)
-- 
2.30.1
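The guessing logic can be tried outside of 'latest-html-release'. Below is a standalone sketch that follows the structure of 'file->signature/guess' from the patch but takes the list of links as an explicit argument; 'sketch-guess-signature' and the example.org URLs are hypothetical.

```scheme
(use-modules (srfi srfi-1))

;; Return the URL of a detached signature for URL if LINKS contains a file
;; named like URL's basename plus a known signature extension, else #f.
(define (sketch-guess-signature url links)
  (let ((base (basename url)))
    (any (lambda (link)
           (any (lambda (extension)
                  (and (string=? (string-append base extension)
                                 (basename link))
                       (string-append url extension)))
                '(".asc" ".sig" ".sign")))
         links)))

(define sample-links
  '("foo-1.0.tar.gz" "foo-1.0.tar.gz.asc" "foo-0.9.tar.gz"))

(display (sketch-guess-signature "https://example.org/pub/foo-1.0.tar.gz"
                                 sample-links))
(newline)
(display (sketch-guess-signature "https://example.org/pub/foo-0.9.tar.gz"
                                 sample-links))
(newline)
```

The second call finds no matching signature link and therefore yields #f, which is why the same patch makes 'url-prefix-rewrite' tolerate a false URL.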
Ludovic Courtès wrote on 13 Mar 22:46 +0100
[PATCH 5/7] gnu-maintenance: 'latest-html-release' better computes version number.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-5-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): Use 'tarball->version'
rather than 'package-name->name+version' to extract the version number.
This fixes problems with packages like 'netsurf' and 'libdom' that have
"-src" in their tarball name, where "src" would be taken as the new
version number.
---
 guix/gnu-maintenance.scm | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 3bffa4d11e..5aa16acfde 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -499,12 +499,9 @@ are unavailable."
                        (string-append base-url directory "/" url)
                        url)))
         (and (release-file? package base)
-             (let-values (((name version)
-                           (package-name->name+version
-                            (tarball-sans-extension base)
-                            #\-)))
+             (let ((version (tarball->version base)))
                (upstream-source
-                (package name)
+                (package package)
                 (version version)
                 (urls (list url))
                 (signature-urls
-- 
2.30.1
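The "-src" problem described in the commit log is easy to reproduce. The sketch below contrasts a naive split at the last hyphen with a pattern that tolerates a "-src" suffix. Neither procedure is Guix code: 'sketch-split-last-hyphen', 'sketch-version' and the regular expression are assumptions made for illustration, and the real 'tarball->version' may use a different pattern. Inputs are tarball names with the extension already removed.

```scheme
(use-modules (ice-9 regex))

;; Naive approach: whatever follows the last "-" is taken as the version.
(define (sketch-split-last-hyphen name)
  (let ((index (string-rindex name #\-)))
    (and index (substring name (+ index 1)))))

;; Hypothetical pattern: a name, a separator, a version made of digits and
;; dots, and an optional "-src" suffix.
(define %sketch-version-rx
  (make-regexp "^(.*)[-_]([0-9][0-9.]*)(-src)?$"))

(define (sketch-version name)
  (let ((m (regexp-exec %sketch-version-rx name)))
    (and m (match:substring m 2))))

(display (sketch-split-last-hyphen "netsurf-3.10-src")) ;the bogus "src"
(newline)
(display (sketch-version "netsurf-3.10-src"))
(newline)
```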
Ludovic Courtès wrote on 13 Mar 22:46 +0100
[PATCH 7/7] gnu: hwloc: Add 'release-monitoring-url' property.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-7-ludo@gnu.org
* gnu/packages/mpi.scm (hwloc-1)[properties]: New field.
---
 gnu/packages/mpi.scm | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index 53ee6ef1cd..a8ebd8aeb8 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -66,6 +66,12 @@
               (sha256
                (base32
                 "0za1b9lvrm3rhn0lrxja5f64r0aq1qs4m0pxn1ji2mbi8ndppyyx"))))
+
+    (properties
+     ;; Tell the 'generic-html' updater to monitor this URL for updates.
+     `((release-monitoring-url
+        . "https://www-lb.open-mpi.org/software/hwloc/current")))
+
     (build-system gnu-build-system)
     (outputs '("out"           ;'lstopo' & co., depends on Cairo, libx11, etc.
                "lib"           ;small closure
-- 
2.30.1
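How this property changes the page that gets crawled can be sketched as follows, based on the way patch 6/7 of this series computes 'base' and 'directory'. 'sketch-crawl-location' and the example.org source URL are hypothetical; the procedure works on plain strings rather than on package objects.

```scheme
(use-modules (web uri))

;; Return the URL of the page to crawl: MONITORING-URL when it is given,
;; otherwise the directory that contains SOURCE-URL.
(define (sketch-crawl-location source-url monitoring-url)
  (let ((uri (string->uri source-url)))
    (if monitoring-url
        (string-append monitoring-url "/")
        (string-append (symbol->string (uri-scheme uri))
                       "://" (uri-host uri)
                       (dirname (uri-path uri)) "/"))))

;; Without the property, the directory of the source tarball is crawled.
(display (sketch-crawl-location
          "https://example.org/pub/foo/foo-1.0.tar.gz" #f))
(newline)

;; With the property, the given page is crawled instead.
(display (sketch-crawl-location
          "https://example.org/pub/foo/foo-1.0.tar.gz"
          "https://www-lb.open-mpi.org/software/hwloc/current"))
(newline)
```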
Ludovic Courtès wrote on 13 Mar 22:46 +0100
[PATCH 6/7] gnu-maintenance: Add 'generic-html' updater.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-6-ludo@gnu.org
This brings total updater coverage, as reported by 'guix refresh --list-updaters', from 78% to 88.3%. Among many other things, it covers freedesktop.org packages.
* guix/gnu-maintenance.scm (html-updatable-package?)
(latest-html-updatable-release): New procedures.
(%generic-html-updater): New variable.
* doc/guix.texi (Invoking guix refresh): Document it.
---
 doc/guix.texi            |  3 +++
 guix/gnu-maintenance.scm | 58 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 97094a7d0a..89c8c58295 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -11693,6 +11693,9 @@ the updater for @uref{https://www.stackage.org, Stackage} packages.
 the updater for @uref{https://crates.io, Crates} packages.
 @item launchpad
 the updater for @uref{https://launchpad.net, Launchpad} packages.
+@item generic-html
+a generic updater that crawls the HTML page where the source tarball of
+the package is hosted, when applicable.
 @end table
 
 For instance, the following command only checks for updates of Emacs
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 5aa16acfde..ced5497b37 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -28,6 +28,7 @@
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-26)
+  #:use-module (srfi srfi-34)
   #:use-module (rnrs io ports)
   #:use-module (system foreign)
   #:use-module (guix http-client)
@@ -66,7 +67,8 @@
             %gnu-ftp-updater
             %savannah-updater
             %xorg-updater
-            %kernel.org-updater))
+            %kernel.org-updater
+            %generic-html-updater))
 
 ;;; Commentary:
 ;;;
@@ -697,6 +699,53 @@ releases are on gnu.org."
                                 #:file->signature file->signature)
            (cut adjusted-upstream-source <> rewrite))))
 
+(define html-updatable-package?
+  ;; Return true if the given package may be handled by the generic HTML
+  ;; updater.
+  (let ((hosting-sites '("github.com" "github.io" "gitlab.com"
+                         "notabug.org" "sr.ht"
+                         "gforge.inria.fr" "gitlab.inria.fr"
+                         "ftp.gnu.org" "download.savannah.gnu.org"
+                         "pypi.org" "crates.io" "rubygems.org"
+                         "bioconductor.org")))
+    (url-predicate (lambda (url)
+                     (match (string->uri url)
+                       (#f #f)
+                       (uri
+                        (let ((scheme (uri-scheme uri))
+                              (host   (uri-host uri)))
+                          (and (memq scheme '(http https))
+                               (not (member host hosting-sites))))))))))
+
+(define (latest-html-updatable-release package)
+  "Return the latest release of PACKAGE.  Do that by crawling the HTML page of
+the directory containing its source tarball."
+  (let* ((uri       (string->uri
+                     (match (origin-uri (package-source package))
+                       ((? string? url) url)
+                       ((url _ ...) url))))
+         (custom    (assoc-ref (package-properties package)
+                               'release-monitoring-url))
+         (base      (or custom
+                        (string-append (symbol->string (uri-scheme uri))
+                                       "://" (uri-host uri))))
+         (directory (if custom
+                        ""
+                        (dirname (uri-path uri))))
+         (package   (package-upstream-name package)))
+    (catch #t
+      (lambda ()
+        (guard (c ((http-get-error? c) #f))
+          (latest-html-release package
+                               #:base-url base
+                               #:directory directory)))
+      (lambda (key . args)
+        ;; Return false and move on upon connection failures.
+        (unless (memq key '(gnutls-error tls-certificate-error
+                                         system-error))
+          (apply throw key args))
+        #f))))
+
 (define %gnu-updater
   ;; This is for everything at ftp.gnu.org.
   (upstream-updater
@@ -737,4 +786,11 @@ releases are on gnu.org."
    (pred (url-prefix-predicate "mirror://kernel.org/"))
    (latest latest-kernel.org-release)))
 
+(define %generic-html-updater
+  (upstream-updater
+   (name 'generic-html)
+   (description "Updater that crawls HTML pages.")
+   (pred html-updatable-package?)
+   (latest latest-html-updatable-release)))
+
 ;;; gnu-maintenance.scm ends here
-- 
2.30.1
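The selection rule in 'html-updatable-package?' can be exercised on plain URL strings. The following is a minimal sketch of that rule alone; 'sketch-html-updatable-url?' is hypothetical, the list of hosting sites is abbreviated for illustration, and the sketch does not go through 'url-predicate' or package objects as the patch does.

```scheme
(use-modules (web uri))

;; Abbreviated, illustrative list; the patch excludes more hosts.
(define %sketch-hosting-sites
  '("github.com" "gitlab.com" "pypi.org"))

;; True when URL is an HTTP(S) URL whose host is not one of the hosting
;; sites that have (or could have) a dedicated updater.
(define (sketch-html-updatable-url? url)
  (let ((uri (string->uri url)))
    (and uri
         (memq (uri-scheme uri) '(http https))
         (not (member (uri-host uri) %sketch-hosting-sites)))))

(for-each (lambda (url)
            (format #t "~a: ~a~%" url (sketch-html-updatable-url? url)))
          '("https://example.org/pub/foo-1.0.tar.gz"
            "https://github.com/foo/foo/archive/v1.0.tar.gz"
            "mirror://gnu/hello/hello-2.10.tar.gz"))
```

The 'mirror://' URL is rejected because its scheme is neither 'http' nor 'https', which leaves such packages to the dedicated updaters.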
Léo Le Bouter wrote on 17 Mar 11:18 +0100
[PATCH 0/7] Add 'generic-html' updater
(address . 47126@debbugs.gnu.org)
5a2391930ed890f8cf61da88ccda6df9bb874630.camel@zaclys.net
Hello!
That's awesome thanks a lot Ludo!!
I am wondering, does this handle cases where there's a subfolder with version and then another tarball with version as well?
Like GNOME for example: https://download.gnome.org/sources/NetworkManager/1.31/NetworkManager-1.31.1.tar.xz
I see this is a generic solution, I see you made available some options to customize per-package as needed but can we get as precise/reliable as Debian's watch/uscan with that?
Or if I understand correctly, we should always point it to a page where the link for the latest release is always published? That last thing really sounds nice!
Léo
Ludovic Courtès wrote on 17 Mar 14:52 +0100
(name . Léo Le Bouter)(address . lle-bout@zaclys.net)(address . 47126@debbugs.gnu.org)
87lfal51lf.fsf_-_@gnu.org
Hi Léo,
Léo Le Bouter <lle-bout@zaclys.net> writes:
> That's awesome thanks a lot Ludo!!
Just pushed this series as fe96f64110676f28b948f0d31a1726501abdae0e.
Unleash your update powers, comrades! :-)
> I am wondering, does this handle cases where there's a subfolder with
> version and then another tarball with version as well?
>
> Like GNOME for example:
> https://download.gnome.org/sources/NetworkManager/1.31/NetworkManager-1.31.1.tar.xz
>
> I see this is a generic solution, I see you made available some options
> to customize per-package as needed but can we get as precise/reliable
> as Debian's watch/uscan with that?
There’s a ‘gnome’ updater for GNOME:
https://guix.gnu.org/manual/en/html_node/Invoking-guix-refresh.html
And yes, it actually works. :-)
In the case of NetworkManager, there’s a bug right now:
$ guix refresh network-manager
following redirection to 'https://download.gnome.org/sources/NetworkManager/cache.json'...
following redirection to 'https://fr2.rpmfind.net/linux/gnome.org/sources/NetworkManager/cache.json'...
gnu/packages/gnome.scm:7648:13: network-manager would be upgraded from 1.24.0 to rc2
I’ll see what’s up. But otherwise ‘guix refresh -t gnome’ produces sensible results.
At any rate, updaters sometimes bitrot and produce buggy results, as in the example above. Please do use ‘guix refresh’ and report any issues!
Also, there are still ~12% of packages for which none of the updaters apply. We should investigate and see how we can bring that down to zero.
Thanks for your feedback!
Ludo’.
L
L
Ludovic Courtès wrote on 17 Mar 14:53 +0100
control message for bug #47126
(address . control@debbugs.gnu.org)
87k0q551ky.fsf@gnu.org
tags 47126 fixed
close 47126
quit
To comment on this conversation send email to 47126@debbugs.gnu.org