[PATCH 0/4] Add 'archival' checker for 'guix lint'

DoneSubmitted by Ludovic Courtès.
Details
2 participants
  • Ludovic Courtès
  • zimoun
Owner
unassigned
Severity
normal
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:16
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829231653.7607-1-ludo@gnu.org
Hello Guix!
This patch series adds an ‘archival’ checker for ‘guix lint’, documentedlike this:
Checks whether the package’s source code is archived at Software Heritage (https://www.softwareheritage.org).
When the source code that is not archived comes from a version-control system (VCS)—e.g., it’s obtained with ‘git-fetch’, send Software Heritage a “save” request so that it eventually archives it. This ensures that the source will remain available in the long term, and that Guix can fall back to Software Heritage should the source code disappear from its original host. The status of recent “save” requests can be viewed on-line (https://archive.softwareheritage.org/save/#requests).
When source code is a tarball obtained with ‘url-fetch’, simply print a message when it is not archived. As of this writing Software Heritage does not allow requests to save arbitrary tarballs; we are working on ways to ensure that non-VCS source code is also archived.
Software Heritage limits the request rate per IP address (https://archive.softwareheritage.org/api/#rate-limiting). When the limit is reached, ‘guix lint’ prints a message and the ‘archival’ checker stops doing anything until that limit has been reset.
Currently, only 25% of our packages are not fetched with ‘url-fetch’.For the remaining 75%, this checker can only report whether the tarballis missing (and apart from ftp.gnu.org and a few other exceptions, itusually _is_ missing) and cannot actually save it.
Anyway, it’s a first step in that direction. Feedback welcome!
The second step will be to write a “lister” for Software Heritage thatgrabs the list of source code URLs fromhttps://guix.gnu.org/packages.json. That could would run at SWHand it could potentially grab the tarballs, not just the VCS checkouts.Here’s are examples:
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/packagist/lister.py https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gnu/lister.py
It should be quite easy for a Pythonista to write something similarfor our ‘packages.json’. Any takers? :-)
Ludo’.
Ludovic Courtès (4): tests: 'with-http-server' accepts multiple responses. swh: Add hooks for rate limiting handling. swh: Make 'commit-id?' public. lint: Add 'archival' checker.
doc/guix.texi | 25 ++++++ guix/lint.scm | 96 +++++++++++++++++++++- guix/swh.scm | 88 ++++++++++++++++----- guix/tests/http.scm | 39 +++++---- tests/derivations.scm | 12 +-- tests/lint.scm | 179 ++++++++++++++++++++++++++++++++---------- tests/swh.scm | 41 +++++++++- 7 files changed, 395 insertions(+), 85 deletions(-)
-- 2.23.0
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:20
[PATCH 1/4] tests: 'with-http-server' accepts multiple responses.
(address . 37224@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829232101.8153-1-ludo@gnu.org
* guix/tests/http.scm (call-with-http-server): Replace 'code' and 'data'parameters with 'responses+data'. Compute RESPONSES as a function ofthat. Remove #:headers parameter.[http-write]: Quit only when RESPONSES is empty.[server-body]: Get the response and data from RESPONSES, and set it topoint to the rest.(with-http-server): Adjust accordingly.* tests/derivations.scm ("'download' built-in builder")("'download' built-in builder, invalid hash")("'download' built-in builder, not found")("'download' built-in builder, check mode"): Adjust to new'with-http-server' interface.* tests/lint.scm ("home-page: 200")("home-page: 200 but short length")("home-page: 404", "home-page: 301, invalid"):("home-page: 301 -> 200", "home-page: 301 -> 404")("source: 200", "source: 200 but short length")("source: 404", "source: 404 and 200")("source: 301 -> 200", "source: 301 -> 404"):("github-url", github-url): Likewise.* tests/swh.scm (with-json-result)("lookup-origin, not found"): Likewise.--- guix/tests/http.scm | 39 ++++++++++------- tests/derivations.scm | 12 +++--- tests/lint.scm | 98 +++++++++++++++++++++++++------------------ tests/swh.scm | 5 ++- 4 files changed, 91 insertions(+), 63 deletions(-)
Toggle diff (375 lines)diff --git a/guix/tests/http.scm b/guix/tests/http.scmindex a56d6f213d..05ce39bca2 100644--- a/guix/tests/http.scm+++ b/guix/tests/http.scm@@ -1,5 +1,5 @@ ;;; GNU Guix --- Functional package management for GNU-;;; Copyright © 2014, 2015, 2016, 2017 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2014, 2015, 2016, 2017, 2019 Ludovic Courtès <ludo@gnu.org> ;;; ;;; This file is part of GNU Guix. ;;;@@ -22,6 +22,7 @@ #:use-module (web server http) #:use-module (web response) #:use-module (srfi srfi-39)+ #:use-module (ice-9 match) #:export (with-http-server call-with-http-server %http-server-port@@ -69,10 +70,20 @@ needed." (string-append "http://localhost:" (number->string (%http-server-port)) "/foo/bar")) -(define* (call-with-http-server code data thunk- #:key (headers '()))- "Call THUNK with an HTTP server running and returning CODE and DATA (a-string) on HTTP requests."+(define* (call-with-http-server responses+data thunk)+ "Call THUNK with an HTTP server running and returning RESPONSES+DATA on HTTP+requests. Each elements of RESPONSES+DATA must be a tuple containing a+response and a string, or an HTTP response code and a string."+ (define responses+ (map (match-lambda+ (((? response? response) data)+ (list response data))+ (((? integer? code) data)+ (list (build-response #:code code+ #:reason-phrase "Such is life")+ data)))+ responses+data))+ (define (http-write server client response body) "Write RESPONSE." (let* ((response (write-response response client))@@ -82,7 +93,8 @@ string) on HTTP requests." (else (write-response-body response body))) (close-port port)- (quit #t) ;exit the server thread+ (when (null? responses)+ (quit #t)) ;exit the server thread (values))) ;; Mutex and condition variable to synchronize with the HTTP server.@@ -105,10 +117,10 @@ string) on HTTP requests." (define (server-body) (define (handle request body)- (values (build-response #:code code- #:reason-phrase "Such is life"- #:headers headers)- data))+ (match responses+ (((response data) rest ...)+ (set! responses rest)+ (values response data)))) (let ((socket (open-http-server-socket))) (catch 'quit@@ -126,10 +138,7 @@ string) on HTTP requests." (define-syntax with-http-server (syntax-rules ()- ((_ (code headers) data body ...)- (call-with-http-server code data (lambda () body ...)- #:headers headers))- ((_ code data body ...)- (call-with-http-server code data (lambda () body ...)))))+ ((_ responses+data body ...)+ (call-with-http-server responses+data (lambda () body ...))))) ;;; http.scm ends herediff --git a/tests/derivations.scm b/tests/derivations.scmindex db73d19b3a..00cedef32c 100644--- a/tests/derivations.scm+++ b/tests/derivations.scm@@ -210,7 +210,7 @@ (test-skip 1)) (test-assert "'download' built-in builder" (let ((text (random-text)))- (with-http-server 200 text+ (with-http-server `((200 ,text)) (let* ((drv (derivation %store "world" "builtin:download" '() #:env-vars `(("url"@@ -225,7 +225,7 @@ (unless (http-server-can-listen?) (test-skip 1)) (test-assert "'download' built-in builder, invalid hash"- (with-http-server 200 "hello, world!"+ (with-http-server `((200 "hello, world!")) (let* ((drv (derivation %store "world" "builtin:download" '() #:env-vars `(("url"@@ -240,7 +240,7 @@ (unless (http-server-can-listen?) (test-skip 1)) (test-assert "'download' built-in builder, not found"- (with-http-server 404 "not found"+ (with-http-server '((404 "not found")) (let* ((drv (derivation %store "will-never-be-found" "builtin:download" '() #:env-vars `(("url"@@ -275,9 +275,9 @@ . ,(object->string (%local-url)))) #:hash-algo 'sha256 #:hash (sha256 (string->utf8 text)))))- (and (with-http-server 200 text+ (and (with-http-server `((200 ,text)) (build-derivations %store (list drv)))- (with-http-server 200 text+ (with-http-server `((200 ,text)) (build-derivations %store (list drv) (build-mode check))) (string=? (call-with-input-file (derivation->output-path drv)@@ -1264,5 +1264,5 @@ (test-end) ;; Local Variables:-;; eval: (put 'with-http-server 'scheme-indent-function 2)+;; eval: (put 'with-http-server 'scheme-indent-function 1) ;; End:diff --git a/tests/lint.scm b/tests/lint.scmindex db6dd6dbe1..c8b88136f4 100644--- a/tests/lint.scm+++ b/tests/lint.scm@@ -390,7 +390,7 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "home-page: 200" '()- (with-http-server 200 %long-string+ (with-http-server `((200 ,%long-string)) (let ((pkg (package (inherit (dummy-package "x")) (home-page (%local-url)))))@@ -399,7 +399,7 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "home-page: 200 but short length" "URI http://localhost:9999/foo/bar returned suspiciously small file (18 bytes)"- (with-http-server 200 "This is too small."+ (with-http-server `((200 "This is too small.")) (let ((pkg (package (inherit (dummy-package "x")) (home-page (%local-url)))))@@ -410,7 +410,7 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "home-page: 404" "URI http://localhost:9999/foo/bar not reachable: 404 (\"Such is life\")"- (with-http-server 404 %long-string+ (with-http-server `((404 ,%long-string)) (let ((pkg (package (inherit (dummy-package "x")) (home-page (%local-url)))))@@ -420,7 +420,7 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "home-page: 301, invalid" "invalid permanent redirect from http://localhost:9999/foo/bar"- (with-http-server 301 %long-string+ (with-http-server `((301 ,%long-string)) (let ((pkg (package (inherit (dummy-package "x")) (home-page (%local-url)))))@@ -430,12 +430,14 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "home-page: 301 -> 200" "permanent redirect from http://localhost:10000/foo/bar to http://localhost:9999/foo/bar"- (with-http-server 200 %long-string- (let ((initial-url (%local-url)))+ (with-http-server `((200 ,%long-string))+ (let* ((initial-url (%local-url))+ (redirect (build-response #:code 301+ #:headers+ `((location+ . ,(string->uri initial-url)))))) (parameterize ((%http-server-port (+ 1 (%http-server-port))))- (with-http-server (301 `((location- . ,(string->uri initial-url))))- ""+ (with-http-server `((,redirect "")) (let ((pkg (package (inherit (dummy-package "x")) (home-page (%local-url)))))@@ -445,12 +447,14 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "home-page: 301 -> 404" "URI http://localhost:10000/foo/bar not reachable: 404 (\"Such is life\")"- (with-http-server 404 "booh!"- (let ((initial-url (%local-url)))+ (with-http-server '((404 "booh!"))+ (let* ((initial-url (%local-url))+ (redirect (build-response #:code 301+ #:headers+ `((location+ . ,(string->uri initial-url)))))) (parameterize ((%http-server-port (+ 1 (%http-server-port))))- (with-http-server (301 `((location- . ,(string->uri initial-url))))- ""+ (with-http-server `((,redirect "")) (let ((pkg (package (inherit (dummy-package "x")) (home-page (%local-url)))))@@ -583,7 +587,7 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "source: 200" '()- (with-http-server 200 %long-string+ (with-http-server `((200 ,%long-string)) (let ((pkg (package (inherit (dummy-package "x")) (source (origin@@ -595,7 +599,7 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "source: 200 but short length" "URI http://localhost:9999/foo/bar returned suspiciously small file (18 bytes)"- (with-http-server 200 "This is too small."+ (with-http-server '((200 "This is too small.")) (let ((pkg (package (inherit (dummy-package "x")) (source (origin@@ -610,7 +614,7 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "source: 404" "URI http://localhost:9999/foo/bar not reachable: 404 (\"Such is life\")"- (with-http-server 404 %long-string+ (with-http-server `((404 ,%long-string)) (let ((pkg (package (inherit (dummy-package "x")) (source (origin@@ -625,10 +629,10 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "source: 404 and 200" '()- (with-http-server 404 %long-string+ (with-http-server `((404 ,%long-string)) (let ((bad-url (%local-url))) (parameterize ((%http-server-port (+ 1 (%http-server-port))))- (with-http-server 200 %long-string+ (with-http-server `((200 ,%long-string)) (let ((pkg (package (inherit (dummy-package "x")) (source (origin@@ -642,11 +646,14 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "source: 301 -> 200" "permanent redirect from http://localhost:10000/foo/bar to http://localhost:9999/foo/bar"- (with-http-server 200 %long-string- (let ((initial-url (%local-url)))+ (with-http-server `((200 ,%long-string))+ (let* ((initial-url (%local-url))+ (redirect (build-response #:code 301+ #:headers+ `((location+ . ,(string->uri initial-url)))))) (parameterize ((%http-server-port (+ 1 (%http-server-port))))- (with-http-server (301 `((location . ,(string->uri initial-url))))- ""+ (with-http-server `((,redirect "")) (let ((pkg (package (inherit (dummy-package "x")) (source (origin@@ -661,11 +668,14 @@ (test-skip (if (http-server-can-listen?) 0 1)) (test-equal "source: 301 -> 404" "URI http://localhost:10000/foo/bar not reachable: 404 (\"Such is life\")"- (with-http-server 404 "booh!"- (let ((initial-url (%local-url)))+ (with-http-server '((404 "booh!"))+ (let* ((initial-url (%local-url))+ (redirect (build-response #:code 301+ #:headers+ `((location+ . ,(string->uri initial-url)))))) (parameterize ((%http-server-port (+ 1 (%http-server-port))))- (with-http-server (301 `((location . ,(string->uri initial-url))))- ""+ (with-http-server `((,redirect "")) (let ((pkg (package (inherit (dummy-package "x")) (source (origin@@ -697,7 +707,7 @@ (test-equal "github-url" '()- (with-http-server 200 %long-string+ (with-http-server `((200 ,%long-string)) (check-github-url (dummy-package "x" (source (origin@@ -709,17 +719,25 @@ (test-equal "github-url: one suggestion" (string-append "URL should be '" github-url "'")- (with-http-server (301 `((location . ,(string->uri github-url)))) ""- (let ((initial-uri (%local-url)))- (parameterize ((%http-server-port (+ 1 (%http-server-port))))- (with-http-server (302 `((location . ,(string->uri initial-uri)))) ""- (single-lint-warning-message- (check-github-url- (dummy-package "x" (source- (origin- (method url-fetch)- (uri (%local-url))- (sha256 %null-sha256)))))))))))+ (let ((redirect (build-response #:code 301+ #:headers+ `((location+ . ,(string->uri github-url))))))+ (with-http-server `((,redirect ""))+ (let* ((initial-url (%local-url))+ (redirect (build-response #:code 302+ #:headers+ `((location+ . ,(string->uri initial-url))))))+ (parameterize ((%http-server-port (+ 1 (%http-server-port))))+ (with-http-server `((,redirect ""))+ (single-lint-warning-message+ (check-github-url+ (dummy-package "x" (source+ (origin+ (method url-fetch)+ (uri (%local-url))+ (sha256 %null-sha256)))))))))))) (test-equal "github-url: already the correct github url" '() (check-github-url@@ -844,6 +862,6 @@ (test-end "lint") ;; Local Variables:-;; eval: (put 'with-http-server 'scheme-indent-function 2)+;; eval: (put 'with-http-server 'scheme-indent-function 1) ;; eval: (put 'with-warnings 'scheme-indent-function 0) ;; End:diff --git a/tests/swh.scm b/tests/swh.scmindex 07f0fda37b..9a0da07ae1 100644--- a/tests/swh.scm+++ b/tests/swh.scm@@ -40,7 +40,7 @@ \"dir_id\": 2 } ]") (define-syntax-rule (with-json-result str exp ...)- (with-http-server 200 str+ (with-http-server `((200 ,str)) (parameterize ((%swh-base-url (%local-url))) exp ...))) @@ -56,7 +56,7 @@ (test-equal "lookup-origin, not found" #f- (with-http-server 404 "Nope."+ (with-http-server `((404 "Nope.")) (parameterize ((%swh-base-url (%local-url))) (lookup-origin "http://example.org/whatever")))) @@ -72,5 +72,6 @@ ;; Local Variables: ;; eval: (put 'with-json-result 'scheme-indent-function 1)+;; eval: (put 'with-http-server 'scheme-indent-function 1) ;; End: -- 2.23.0
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:20
[PATCH 2/4] swh: Add hooks for rate limiting handling.
(address . 37224@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829232101.8153-2-ludo@gnu.org
* guix/swh.scm (%allow-request?, %save-rate-limit-reset-time)(%general-rate-limit-reset-time): New variables.(request-rate-limit-reached?, update-rate-limit-reset-time!): Newprocedures.(call): Call '%allow-request?'. Change 'swh-error' protocol to passMETHOD in addition to URL.* tests/swh.scm ("rate limit reached")("%allow-request? and request-rate-limit-reached?"): New tests.--- guix/swh.scm | 84 +++++++++++++++++++++++++++++++++++++++------------ tests/swh.scm | 36 ++++++++++++++++++++++ 2 files changed, 100 insertions(+), 20 deletions(-)
Toggle diff (170 lines)diff --git a/guix/swh.scm b/guix/swh.scmindex c253e217da..42f38ee048 100644--- a/guix/swh.scm+++ b/guix/swh.scm@@ -20,6 +20,7 @@ #:use-module (guix base16) #:use-module (guix build utils) #:use-module ((guix build syscalls) #:select (mkdtemp!))+ #:use-module (web uri) #:use-module (web client) #:use-module (web response) #:use-module (json)@@ -32,6 +33,9 @@ #:use-module (ice-9 popen) #:use-module ((ice-9 ftw) #:select (scandir)) #:export (%swh-base-url+ %allow-request?++ request-rate-limit-reached? origin? origin-id@@ -196,31 +200,71 @@ Software Heritage." ((? string? str) str) ((? null?) #f))) +(define %allow-request?+ ;; Takes a URL and method (e.g., the 'http-get' procedure) and returns true+ ;; to keep going. This can be used to disallow a requests when+ ;; 'request-rate-limit-reached?' returns true, for instance.+ (make-parameter (const #t)))++;; The time when the rate limit for "/origin/save" POST requests and that of+;; other requests will be reset.+;; See <https://archive.softwareheritage.org/api/#rate-limiting>.+(define %save-rate-limit-reset-time 0)+(define %general-rate-limit-reset-time 0)++(define (request-rate-limit-reached? url method)+ "Return true if the rate limit has been reached for URI."+ (define uri+ (string->uri url))++ (define reset-time+ (if (and (eq? method http-post)+ (string-prefix? "/api/1/origin/save/" (uri-path uri)))+ %save-rate-limit-reset-time+ %general-rate-limit-reset-time))++ (< (car (gettimeofday)) reset-time))++(define (update-rate-limit-reset-time! url method response)+ "Update the rate limit reset time for URL and METHOD based on the headers in+RESPONSE."+ (let ((uri (string->uri url)))+ (match (assq-ref (response-headers response) 'x-ratelimit-reset)+ ((= string->number (? number? reset))+ (if (and (eq? method http-post)+ (string-prefix? "/api/1/origin/save/" (uri-path uri)))+ (set! %save-rate-limit-reset-time reset)+ (set! %general-rate-limit-reset-time reset)))+ (_+ #f))))+ (define* (call url decode #:optional (method http-get) #:key (false-if-404? #t)) "Invoke the endpoint at URL using METHOD. Decode the resulting JSON body using DECODE, a one-argument procedure that takes an input port. When FALSE-IF-404? is true, return #f upon 404 responses."- (let*-values (((response port)- (method url #:streaming? #t)))- ;; See <https://archive.softwareheritage.org/api/#rate-limiting>.- (match (assq-ref (response-headers response) 'x-ratelimit-remaining)- (#f #t)- ((? (compose zero? string->number))- (throw 'swh-error url response))- (_ #t))-- (cond ((= 200 (response-code response))- (let ((result (decode port)))- (close-port port)- result))- ((and false-if-404?- (= 404 (response-code response)))- (close-port port)- #f)- (else- (close-port port)- (throw 'swh-error url response)))))+ (and ((%allow-request?) url method)+ (let*-values (((response port)+ (method url #:streaming? #t)))+ ;; See <https://archive.softwareheritage.org/api/#rate-limiting>.+ (match (assq-ref (response-headers response) 'x-ratelimit-remaining)+ (#f #t)+ ((? (compose zero? string->number))+ (update-rate-limit-reset-time! url method response)+ (throw 'swh-error url method response))+ (_ #t))++ (cond ((= 200 (response-code response))+ (let ((result (decode port)))+ (close-port port)+ result))+ ((and false-if-404?+ (= 404 (response-code response)))+ (close-port port)+ #f)+ (else+ (close-port port)+ (throw 'swh-error url method response)))))) (define-syntax define-query (syntax-rules (path)diff --git a/tests/swh.scm b/tests/swh.scmindex 9a0da07ae1..e36c54e5fb 100644--- a/tests/swh.scm+++ b/tests/swh.scm@@ -19,6 +19,7 @@ (define-module (test-swh) #:use-module (guix swh) #:use-module (guix tests http)+ #:use-module (web response) #:use-module (srfi srfi-64)) ;; Test the JSON mapping machinery used in (guix swh).@@ -68,6 +69,41 @@ (directory-entry-length entry))) (lookup-directory "123")))) +(test-equal "rate limit reached"+ 3000000000+ (let ((too-many (build-response+ #:code 429+ #:reason-phrase "Too many requests"++ ;; Pretend we've reached the limit and it'll be reset in+ ;; June 2065.+ #:headers '((x-ratelimit-remaining . "0")+ (x-ratelimit-reset . "3000000000")))))+ (with-http-server `((,too-many "Too bad."))+ (parameterize ((%swh-base-url (%local-url)))+ (catch 'swh-error+ (lambda ()+ (lookup-origin "http://example.org/guix.git"))+ (lambda (key url method response)+ ;; Ensure the reset time was recorded.+ (@@ (guix swh) %general-rate-limit-reset-time)))))))++(test-assert "%allow-request? and request-rate-limit-reached?"+ ;; Here we test two things: that the rate limit set above is in effect and+ ;; that %ALLOW-REQUEST? is called, and that 'request-rate-limit-reached?'+ ;; returns true.+ (let* ((key (gensym "skip-request"))+ (skip-if-limit-reached+ (lambda (url method)+ (or (not (request-rate-limit-reached? url method))+ (throw key #t)))))+ (parameterize ((%allow-request? skip-if-limit-reached))+ (catch key+ (lambda ()+ (lookup-origin "http://example.org/guix.git")+ #f)+ (const #t)))))+ (test-end "swh") ;; Local Variables:-- 2.23.0
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:21
[PATCH 3/4] swh: Make 'commit-id?' public.
(address . 37224@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829232101.8153-3-ludo@gnu.org
* guix/swh.scm (commit-id?): Make public.--- guix/swh.scm | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
Toggle diff (24 lines)diff --git a/guix/swh.scm b/guix/swh.scmindex 42f38ee048..01648a1ebe 100644--- a/guix/swh.scm+++ b/guix/swh.scm@@ -105,6 +105,8 @@ request-cooking vault-fetch + commit-id?+ swh-download)) ;;; Commentary:@@ -568,7 +570,7 @@ requested bundle cooking, waiting for completion...~%")) (define (commit-id? reference) "Return true if REFERENCE is likely a commit ID, false otherwise---e.g., if-it is a tag name."+it is a tag name. This is based on a simple heuristic so use with care!" (and (= (string-length reference) 40) (string-every char-set:hex-digit reference))) -- 2.23.0
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:21
[PATCH 4/4] lint: Add 'archival' checker.
(address . 37224@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829232101.8153-4-ludo@gnu.org
* guix/lint.scm (check-archival): New procedure.(%network-dependent-checkers): Add 'archival' checker.* tests/lint.scm ("archival: missing content")("archival: content available")("archival: missing revision")("archival: revision available")("archival: rate limit reached"): New tests.* doc/guix.texi (Invoking guix lint): Document it.--- doc/guix.texi | 25 +++++++++++++ guix/lint.scm | 96 +++++++++++++++++++++++++++++++++++++++++++++++++- tests/lint.scm | 81 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 201 insertions(+), 1 deletion(-)
Toggle diff (272 lines)diff --git a/doc/guix.texi b/doc/guix.texiindex 707c2ba700..582f3a124b 100644--- a/doc/guix.texi+++ b/doc/guix.texi@@ -9233,6 +9233,31 @@ Parse the @code{source} URL to determine if a tarball from GitHub is autogenerated or if it is a release tarball. Unfortunately GitHub's autogenerated tarballs are sometimes regenerated. +@item archival+@cindex Software Heritage, source code archive+@cindex archival of source code, Software Heritage+Checks whether the package's source code is archived at+@uref{https://www.softwareheritage.org, Software Heritage}.++When the source code that is not archived comes from a version-control system+(VCS)---e.g., it's obtained with @code{git-fetch}, send Software Heritage a+``save'' request so that it eventually archives it. This ensures that the+source will remain available in the long term, and that Guix can fall back to+Software Heritage should the source code disappear from its original host.+The status of recent ``save'' requests can be+@uref{https://archive.softwareheritage.org/save/#requests, viewed on-line}.++When source code is a tarball obtained with @code{url-fetch}, simply print a+message when it is not archived. As of this writing, Software Heritage does+not allow requests to save arbitrary tarballs; we are working on ways to+ensure that non-VCS source code is also archived.++Software Heritage+@uref{https://archive.softwareheritage.org/api/#rate-limiting, limits the+request rate per IP address}. When the limit is reached, @command{guix lint}+prints a message and the @code{archival} checker stops doing anything until+that limit has been reset.+ @item cve @cindex security vulnerabilities @cindex CVE, Common Vulnerabilities and Exposuresdiff --git a/guix/lint.scm b/guix/lint.scmindex 2bf5097403..98ac77556e 100644--- a/guix/lint.scm+++ b/guix/lint.scm@@ -44,6 +44,8 @@ #:use-module ((guix ui) #:select (texi->plain-text fill-paragraph)) #:use-module (guix gnu-maintenance) #:use-module (guix cve)+ #:use-module ((guix swh) #:hide (origin?))+ #:autoload (guix git-download) (git-reference?) #:use-module (ice-9 match) #:use-module (ice-9 regex) #:use-module (ice-9 format)@@ -80,6 +82,7 @@ check-vulnerabilities check-for-updates check-formatting+ check-archival lint-warning lint-warning?@@ -1023,6 +1026,93 @@ the NIST server non-fatal." '())) (#f '()))) ; cannot find newer upstream release ++(define (check-archival package)+ "Check whether PACKAGE's source code is archived on Software Heritage. If+it's not, and if its source code is a VCS snapshot, then send a \"save\"+request to Software Heritage.++Software Heritage imposes limits on the request rate per client IP address.+This checker prints a notice and stops doing anything once that limit has been+reached."+ (define (response->warning url method response)+ (if (request-rate-limit-reached? url method)+ (list (make-warning package+ (G_ "Software Heritage rate limit reached; \+try again later")+ #:field 'source))+ (list (make-warning package+ (G_ "'~a' returned ~a")+ (list url (response-code response))+ #:field 'source))))++ (define skip-key (gensym "skip-archival-check"))++ (define (skip-when-limit-reached url method)+ (or (not (request-rate-limit-reached? url method))+ (throw skip-key #t)))++ (parameterize ((%allow-request? skip-when-limit-reached))+ (catch #t+ (lambda ()+ (match (and (origin? (package-source package))+ (package-source package))+ (#f ;no source+ '())+ ((= origin-uri (? git-reference? reference))+ (define url+ (git-reference-url reference))+ (define commit+ (git-reference-commit reference))++ (match (if (commit-id? commit)+ (or (lookup-revision commit)+ (lookup-origin-revision url commit))+ (lookup-origin-revision url commit))+ ((? revision? revision)+ '())+ (#f+ ;; Revision is missing from the archive, attempt to save it.+ (catch 'swh-error+ (lambda ()+ (save-origin (git-reference-url reference) "git")+ (list (make-warning+ package+ ;; TRANSLATORS: "Software Heritage" is a proper noun+ ;; that must remain untranslated. See+ ;; <https://www.softwareheritage.org>.+ (G_ "scheduled Software Heritage archival")+ #:field 'source)))+ (lambda (key url method response . _)+ (cond ((= 429 (response-code response))+ (list (make-warning+ package+ (G_ "archival rate limit exceeded; \+try again later")+ #:field 'source)))+ (else+ (response->warning url method response))))))))+ ((? origin? origin)+ ;; Since "save" origins are not supported for non-VCS source, all+ ;; we can do is tell whether a given tarball is available or not.+ (if (origin-sha256 origin) ;XXX: for ungoogled-chromium+ (match (lookup-content (origin-sha256 origin) "sha256")+ (#f+ (list (make-warning package+ (G_ "source not archived on Software \+Heritage")+ #:field 'source)))+ ((? content?)+ '()))+ '()))))+ (match-lambda*+ ((key url method response)+ (response->warning url method response))+ ((key . args)+ (if (eq? key skip-key)+ '()+ (apply throw key args)))))))+ ;;; ;;; Source code formatting.@@ -1227,7 +1317,11 @@ or a list thereof") (lint-checker (name 'refresh) (description "Check the package for new upstream releases")- (check check-for-updates))))+ (check check-for-updates))+ (lint-checker+ (name 'archival)+ (description "Ensure source code archival on Software Heritage")+ (check check-archival)))) (define %all-checkers (append %local-checkersdiff --git a/tests/lint.scm b/tests/lint.scmindex c8b88136f4..1b92f02b85 100644--- a/tests/lint.scm+++ b/tests/lint.scm@@ -35,6 +35,7 @@ #:use-module (guix packages) #:use-module (guix lint) #:use-module (guix ui)+ #:use-module (guix swh) #:use-module (gnu packages) #:use-module (gnu packages glib) #:use-module (gnu packages pkg-config)@@ -47,6 +48,7 @@ #:use-module (ice-9 regex) #:use-module (ice-9 getopt-long) #:use-module (ice-9 pretty-print)+ #:use-module (rnrs bytevectors) #:use-module (srfi srfi-1) #:use-module (srfi srfi-9 gnu) #:use-module (srfi srfi-26)@@ -859,6 +861,85 @@ '() (check-formatting (dummy-package "x"))) +(test-assert "archival: missing content"+ (let* ((origin (origin+ (method url-fetch)+ (uri "http://example.org/foo.tgz")+ (sha256 (make-bytevector 32))))+ (warnings (with-http-server '((404 "Not archived."))+ (parameterize ((%swh-base-url (%local-url)))+ (check-archival (dummy-package "x"+ (source origin)))))))+ (warning-contains? "not archived" warnings)))++(test-equal "archival: content available"+ '()+ (let* ((origin (origin+ (method url-fetch)+ (uri "http://example.org/foo.tgz")+ (sha256 (make-bytevector 32))))+ ;; https://archive.softwareheritage.org/api/1/content/+ (content "{ \"checksums\": {}, \"data_url\": \"xyz\",+ \"length\": 42 }"))+ (with-http-server `((200 ,content))+ (parameterize ((%swh-base-url (%local-url)))+ (check-archival (dummy-package "x" (source origin)))))))++(test-assert "archival: missing revision"+ (let* ((origin (origin+ (method git-fetch)+ (uri (git-reference+ (url "http://example.org/foo.git")+ (commit "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")))+ (sha256 (make-bytevector 32))))+ ;; https://archive.softwareheritage.org/api/1/origin/save/+ (save "{ \"origin_url\": \"http://example.org/foo.git\",+ \"save_request_date\": \"2014-11-17T22:09:38+01:00\",+ \"save_request_status\": \"accepted\",+ \"save_task_status\": \"scheduled\" }")+ (warnings (with-http-server `((404 "No revision.") ;lookup-revision+ (404 "No origin.") ;lookup-origin+ (200 ,save)) ;save-origin+ (parameterize ((%swh-base-url (%local-url)))+ (check-archival (dummy-package "x" (source origin)))))))+ (warning-contains? "scheduled" warnings)))++(test-equal "archival: revision available"+ '()+ (let* ((origin (origin+ (method git-fetch)+ (uri (git-reference+ (url "http://example.org/foo.git")+ (commit "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")))+ (sha256 (make-bytevector 32))))+ ;; https://archive.softwareheritage.org/api/1/revision/+ (revision "{ \"author\": {}, \"parents\": [],+ \"date\": \"2014-11-17T22:09:38+01:00\" }"))+ (with-http-server `((200 ,revision))+ (parameterize ((%swh-base-url (%local-url)))+ (check-archival (dummy-package "x" (source origin)))))))++(test-assert "archival: rate limit reached"+ ;; We should get a single warning stating that the rate limit was reached,+ ;; and nothing more, in particular no other HTTP requests.+ (let* ((origin (origin+ (method url-fetch)+ (uri "http://example.org/foo.tgz")+ (sha256 (make-bytevector 32))))+ (too-many (build-response+ #:code 429+ #:reason-phrase "Too many requests"+ #:headers '((x-ratelimit-remaining . "0")+ (x-ratelimit-reset . "3000000000"))))+ (warnings (with-http-server `((,too-many "Rate limit reached."))+ (parameterize ((%swh-base-url (%local-url)))+ (append-map (lambda (name)+ (check-archival+ (dummy-package name (source origin))))+ '("x" "y" "z"))))))+ (string-contains (single-lint-warning-message warnings)+ "rate limit reached")))+ (test-end "lint") ;; Local Variables:-- 2.23.0
L
L
Ludovic Courtès wrote on 2 Sep 2019 15:28
Re: [bug#37224] [PATCH 0/4] Add 'archival' checker for 'guix lint'
(address . 37224-done@debbugs.gnu.org)
87h85u96am.fsf@gnu.org
Hello,
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (5 lines)> tests: 'with-http-server' accepts multiple responses.> swh: Add hooks for rate limiting handling.> swh: Make 'commit-id?' public.> lint: Add 'archival' checker.
I went ahead and pushed these at commit55549c7b9b778a79d3e1f3d085861ef36aabdca6.
I asked for feedback on #swh-devel and olasd (Nicolas Dandrimont), oneof the SWH developers, replied:
Toggle snippet (25 lines)<olasd> civodul: this seems like a sensible design to me; Does `guix lint` automatically call other network services? maybe the save request should be an optional flag [13:55]<olasd> (automatically _checking_ is fine; automatically _saving_, I don't know)<civodul> olasd: there's a 'refresh' checker that calls out to services to determine whether a newer version of the package is available, for instance [14:01]<civodul> initially i thought about not saving at all, and just writing "you should save this"<civodul> but then i thought it's more convenient to just do it right away<civodul> it's unlikely to send garbage anyway, and it'll necessarily send only public code, and very likely only free code [14:02]<civodul> or did you have other concerns?<olasd> I don't think it's going to be an issue for us [14:08]<olasd> I would just (personally) be surprised if a lint tool I'm using started to have side effects on somewhat unrelated systems :) [14:09][...]
<civodul> olasd: ah true, though i guess we just got used to that ;-) [14:12]<civodul> anyway, thanks for your feedback!<olasd> civodul: feel free to quote me by mail if you want to keep it archived
Ludo’.
Closed
Z
Z
zimoun wrote on 11 Sep 2019 12:20
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 37224@debbugs.gnu.org)
CAJ3okZ20pTrSJ5z4vi2eMBqMPzE=wvbbPdRAGVUapsSR0YcwBQ@mail.gmail.com
Hi,
Nice !And it is so aligned with their recent announcement [1] ;-)
[1] https://www.softwareheritage.org/2019/08/05/saving-and-referencing-research-software-in-software-heritage/
On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (5 lines)> Currently, only 25% of our packages are not fetched with ‘url-fetch’.> For the remaining 75%, this checker can only report whether the tarball> is missing (and apart from ftp.gnu.org and a few other exceptions, it> usually _is_ missing) and cannot actually save it.
Maybe I miss something, but for example guile-2.0 is not yet archived.I am not able to find it with their search resources. And `guix lint-c archival guile@2.0' reports "guile@2.0.14: source not archived onSoftware Heritage".

Toggle quote (2 lines)> Anyway, it’s a first step in that direction. Feedback welcome!
I agree with the words on #swh-deve by olasd (Nicolas Dandrimont) fromSWH that the automatic "save" should be optional (even if the defaultis save=true).

Toggle quote (12 lines)> The second step will be to write a “lister” for Software Heritage that> grabs the list of source code URLs from> <https://guix.gnu.org/packages.json>. That could would run at SWH> and it could potentially grab the tarballs, not just the VCS checkouts.> Here’s are examples:>> https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/packagist/lister.py> https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gnu/lister.py>> It should be quite easy for a Pythonista to write something similar> for our ‘packages.json’. Any takers? :-)
I am not sure to understand all but I will give a look... I am readingtheir GSoC about this topic [2].
[2] https://wiki.softwareheritage.org/wiki/Google_Summer_of_Code_2019/Increase_archive_coverage

All the best,simon
L
L
Ludovic Courtès wrote on 12 Sep 2019 09:41
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 37224@debbugs.gnu.org)
87blvqrmfr.fsf@gnu.org
Hello!
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (12 lines)> On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:>>> Currently, only 25% of our packages are not fetched with ‘url-fetch’.>> For the remaining 75%, this checker can only report whether the tarball>> is missing (and apart from ftp.gnu.org and a few other exceptions, it>> usually _is_ missing) and cannot actually save it.>> Maybe I miss something, but for example guile-2.0 is not yet archived.> I am not able to find it with their search resources. And `guix lint> -c archival guile@2.0' reports "guile@2.0.14: source not archived on> Software Heritage".
Yeah, most not-too-recent tarballs from ftp.gnu.org are archived, so Idon’t know why this one is missing. We’d have to check with them.
Toggle quote (4 lines)> I agree with the words on #swh-deve by olasd (Nicolas Dandrimont) from> SWH that the automatic "save" should be optional (even if the default> is save=true).
Maybe we could have a flag somewhere to turn it off? The good thing ofhaving it on (or opt-out) is that we increase the chances that the codewe care about is archived. :-)
Toggle quote (15 lines)>> The second step will be to write a “lister” for Software Heritage that>> grabs the list of source code URLs from>> <https://guix.gnu.org/packages.json>. That could would run at SWH>> and it could potentially grab the tarballs, not just the VCS checkouts.>> Here’s are examples:>>>> https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/packagist/lister.py>> https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gnu/lister.py>>>> It should be quite easy for a Pythonista to write something similar>> for our ‘packages.json’. Any takers? :-)>> I am not sure to understand all but I will give a look... I am reading> their GSoC about this topic [2].
Awesome, thank you! Having a “guix” lister in place would be perfect.
Ludo’.
Z
Z
zimoun wrote on 12 Sep 2019 11:52
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 37224@debbugs.gnu.org)
CAJ3okZ3TTx3+RjdFCrj6WpNBAOq7CWn_gR7VApEjWpH3R9GRyQ@mail.gmail.com
Hi Ludo,
On Thu, 12 Sep 2019 at 09:41, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (9 lines)> zimoun <zimon.toutoune@gmail.com> skribis:>> > On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:> >> >> Currently, only 25% of our packages are not fetched with ‘url-fetch’.> >> For the remaining 75%, this checker can only report whether the tarball> >> is missing (and apart from ftp.gnu.org and a few other exceptions, it> >> usually _is_ missing) and cannot actually save it.
And it is interesting that Nix has the same stats. ;-)
https://sympa.inria.fr/sympa/arc/swh-devel/2019-08/msg00024.html

Toggle quote (8 lines)> > Maybe I miss something, but for example guile-2.0 is not yet archived.> > I am not able to find it with their search resources. And `guix lint> > -c archival guile@2.0' reports "guile@2.0.14: source not archived on> > Software Heritage".>> Yeah, most not-too-recent tarballs from ftp.gnu.org are archived, so I> don’t know why this one is missing. We’d have to check with them.
Maybe I have wrong, but bunch of GNU packages seems missing. :-)

Toggle quote (8 lines)> > I agree with the words on #swh-deve by olasd (Nicolas Dandrimont) from> > SWH that the automatic "save" should be optional (even if the default> > is save=true).>> Maybe we could have a flag somewhere to turn it off? The good thing of> having it on (or opt-out) is that we increase the chances that the code> we care about is archived. :-)
I agree. :-)

Speaking of UI, I would expect 2 different commands:
- one to check if the package is in SWH, say: guix package <name> --is-in-swh - one to send a "save" request guix lint <name> -c archival
And adding an option to turn "the push" off, say: guix lint <name> --no-archival
Because when linting the process is generally iterative: guix lint <name> # fix mistake guix lint <name> # fix other mistake etc.and it will save network resource (latency, etc.) by avoiding to checkagain and again in this lint process; I guess.
Or even something in this flavour should be a better UI:
guix lint <name> --checkers=description,synopsis--no-checkers=license,archival
What do you think?


Cheers,simon
L
L
Ludovic Courtès wrote on 13 Sep 2019 10:49
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 37224@debbugs.gnu.org)
87lfus8tt2.fsf@gnu.org
Hi!
zimoun <zimon.toutoune@gmail.com> skribis:
Toggle quote (7 lines)> Or even something in this flavour should be a better UI:>> guix lint <name> --checkers=description,synopsis> --no-checkers=license,archival>> What do you think?
Good idea, this would be simple and effective!
Thanks,Ludo’.
?
Your comment

This issue is archived.

To comment on this conversation send email to 37224@debbugs.gnu.org