[PATCH 0/4] Add 'archival' checker for 'guix lint'

  • Done
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • zimoun
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:16
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829231653.7607-1-ludo@gnu.org
Hello Guix!

This patch series adds an ‘archival’ checker for ‘guix lint’, documented
like this:

Checks whether the package’s source code is archived at Software
Heritage (https://www.softwareheritage.org).

When the source code that is not archived comes from a
version-control system (VCS)—e.g., it’s obtained with ‘git-fetch’,
send Software Heritage a “save” request so that it eventually
archives it. This ensures that the source will remain available in
the long term, and that Guix can fall back to Software Heritage
should the source code disappear from its original host. The
status of recent “save” requests can be viewed on-line

When source code is a tarball obtained with ‘url-fetch’, simply
print a message when it is not archived. As of this writing
Software Heritage does not allow requests to save arbitrary
tarballs; we are working on ways to ensure that non-VCS source code
is also archived.

Software Heritage limits the request rate per IP address
the limit is reached, ‘guix lint’ prints a message and the
‘archival’ checker stops doing anything until that limit has been
reset.

Currently, only 25% of our packages are not fetched with ‘url-fetch’.
For the remaining 75%, this checker can only report whether the tarball
is missing (and apart from ftp.gnu.org and a few other exceptions, it
usually _is_ missing) and cannot actually save it.

Anyway, it’s a first step in that direction. Feedback welcome!

The second step will be to write a “lister” for Software Heritage that
grabs the list of source code URLs from
https://guix.gnu.org/packages.json. That could would run at SWH
and it could potentially grab the tarballs, not just the VCS checkouts.
Here’s are examples:


It should be quite easy for a Pythonista to write something similar
for our ‘packages.json’. Any takers? :-)

Ludo’.

Ludovic Courtès (4):
tests: 'with-http-server' accepts multiple responses.
swh: Add hooks for rate limiting handling.
swh: Make 'commit-id?' public.
lint: Add 'archival' checker.

doc/guix.texi | 25 ++++++
guix/lint.scm | 96 +++++++++++++++++++++-
guix/swh.scm | 88 ++++++++++++++++-----
guix/tests/http.scm | 39 +++++----
tests/derivations.scm | 12 +--
tests/lint.scm | 179 ++++++++++++++++++++++++++++++++----------
tests/swh.scm | 41 +++++++++-
7 files changed, 395 insertions(+), 85 deletions(-)

--
2.23.0
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:20
[PATCH 1/4] tests: 'with-http-server' accepts multiple responses.
(address . 37224@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829232101.8153-1-ludo@gnu.org
* guix/tests/http.scm (call-with-http-server): Replace 'code' and 'data'
parameters with 'responses+data'. Compute RESPONSES as a function of
that. Remove #:headers parameter.
[http-write]: Quit only when RESPONSES is empty.
[server-body]: Get the response and data from RESPONSES, and set it to
point to the rest.
(with-http-server): Adjust accordingly.
* tests/derivations.scm ("'download' built-in builder")
("'download' built-in builder, invalid hash")
("'download' built-in builder, not found")
("'download' built-in builder, check mode"): Adjust to new
'with-http-server' interface.
* tests/lint.scm ("home-page: 200")
("home-page: 200 but short length")
("home-page: 404", "home-page: 301, invalid"):
("home-page: 301 -> 200", "home-page: 301 -> 404")
("source: 200", "source: 200 but short length")
("source: 404", "source: 404 and 200")
("source: 301 -> 200", "source: 301 -> 404"):
("github-url", github-url): Likewise.
* tests/swh.scm (with-json-result)
("lookup-origin, not found"): Likewise.
---
guix/tests/http.scm | 39 ++++++++++-------
tests/derivations.scm | 12 +++---
tests/lint.scm | 98 +++++++++++++++++++++++++------------------
tests/swh.scm | 5 ++-
4 files changed, 91 insertions(+), 63 deletions(-)

Toggle diff (375 lines)
diff --git a/guix/tests/http.scm b/guix/tests/http.scm
index a56d6f213d..05ce39bca2 100644
--- a/guix/tests/http.scm
+++ b/guix/tests/http.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2014, 2015, 2016, 2017 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2014, 2015, 2016, 2017, 2019 Ludovic Courtès <ludo@gnu.org>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -22,6 +22,7 @@
#:use-module (web server http)
#:use-module (web response)
#:use-module (srfi srfi-39)
+ #:use-module (ice-9 match)
#:export (with-http-server
call-with-http-server
%http-server-port
@@ -69,10 +70,20 @@ needed."
(string-append "http://localhost:" (number->string (%http-server-port))
"/foo/bar"))
-(define* (call-with-http-server code data thunk
- #:key (headers '()))
- "Call THUNK with an HTTP server running and returning CODE and DATA (a
-string) on HTTP requests."
+(define* (call-with-http-server responses+data thunk)
+ "Call THUNK with an HTTP server running and returning RESPONSES+DATA on HTTP
+requests. Each elements of RESPONSES+DATA must be a tuple containing a
+response and a string, or an HTTP response code and a string."
+ (define responses
+ (map (match-lambda
+ (((? response? response) data)
+ (list response data))
+ (((? integer? code) data)
+ (list (build-response #:code code
+ #:reason-phrase "Such is life")
+ data)))
+ responses+data))
+
(define (http-write server client response body)
"Write RESPONSE."
(let* ((response (write-response response client))
@@ -82,7 +93,8 @@ string) on HTTP requests."
(else
(write-response-body response body)))
(close-port port)
- (quit #t) ;exit the server thread
+ (when (null? responses)
+ (quit #t)) ;exit the server thread
(values)))
;; Mutex and condition variable to synchronize with the HTTP server.
@@ -105,10 +117,10 @@ string) on HTTP requests."
(define (server-body)
(define (handle request body)
- (values (build-response #:code code
- #:reason-phrase "Such is life"
- #:headers headers)
- data))
+ (match responses
+ (((response data) rest ...)
+ (set! responses rest)
+ (values response data))))
(let ((socket (open-http-server-socket)))
(catch 'quit
@@ -126,10 +138,7 @@ string) on HTTP requests."
(define-syntax with-http-server
(syntax-rules ()
- ((_ (code headers) data body ...)
- (call-with-http-server code data (lambda () body ...)
- #:headers headers))
- ((_ code data body ...)
- (call-with-http-server code data (lambda () body ...)))))
+ ((_ responses+data body ...)
+ (call-with-http-server responses+data (lambda () body ...)))))
;;; http.scm ends here
diff --git a/tests/derivations.scm b/tests/derivations.scm
index db73d19b3a..00cedef32c 100644
--- a/tests/derivations.scm
+++ b/tests/derivations.scm
@@ -210,7 +210,7 @@
(test-skip 1))
(test-assert "'download' built-in builder"
(let ((text (random-text)))
- (with-http-server 200 text
+ (with-http-server `((200 ,text))
(let* ((drv (derivation %store "world"
"builtin:download" '()
#:env-vars `(("url"
@@ -225,7 +225,7 @@
(unless (http-server-can-listen?)
(test-skip 1))
(test-assert "'download' built-in builder, invalid hash"
- (with-http-server 200 "hello, world!"
+ (with-http-server `((200 "hello, world!"))
(let* ((drv (derivation %store "world"
"builtin:download" '()
#:env-vars `(("url"
@@ -240,7 +240,7 @@
(unless (http-server-can-listen?)
(test-skip 1))
(test-assert "'download' built-in builder, not found"
- (with-http-server 404 "not found"
+ (with-http-server '((404 "not found"))
(let* ((drv (derivation %store "will-never-be-found"
"builtin:download" '()
#:env-vars `(("url"
@@ -275,9 +275,9 @@
. ,(object->string (%local-url))))
#:hash-algo 'sha256
#:hash (sha256 (string->utf8 text)))))
- (and (with-http-server 200 text
+ (and (with-http-server `((200 ,text))
(build-derivations %store (list drv)))
- (with-http-server 200 text
+ (with-http-server `((200 ,text))
(build-derivations %store (list drv)
(build-mode check)))
(string=? (call-with-input-file (derivation->output-path drv)
@@ -1264,5 +1264,5 @@
(test-end)
;; Local Variables:
-;; eval: (put 'with-http-server 'scheme-indent-function 2)
+;; eval: (put 'with-http-server 'scheme-indent-function 1)
;; End:
diff --git a/tests/lint.scm b/tests/lint.scm
index db6dd6dbe1..c8b88136f4 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -390,7 +390,7 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "home-page: 200"
'()
- (with-http-server 200 %long-string
+ (with-http-server `((200 ,%long-string))
(let ((pkg (package
(inherit (dummy-package "x"))
(home-page (%local-url)))))
@@ -399,7 +399,7 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "home-page: 200 but short length"
"URI http://localhost:9999/foo/bar returned suspiciously small file (18 bytes)"
- (with-http-server 200 "This is too small."
+ (with-http-server `((200 "This is too small."))
(let ((pkg (package
(inherit (dummy-package "x"))
(home-page (%local-url)))))
@@ -410,7 +410,7 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "home-page: 404"
"URI http://localhost:9999/foo/bar not reachable: 404 (\"Such is life\")"
- (with-http-server 404 %long-string
+ (with-http-server `((404 ,%long-string))
(let ((pkg (package
(inherit (dummy-package "x"))
(home-page (%local-url)))))
@@ -420,7 +420,7 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "home-page: 301, invalid"
"invalid permanent redirect from http://localhost:9999/foo/bar"
- (with-http-server 301 %long-string
+ (with-http-server `((301 ,%long-string))
(let ((pkg (package
(inherit (dummy-package "x"))
(home-page (%local-url)))))
@@ -430,12 +430,14 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "home-page: 301 -> 200"
"permanent redirect from http://localhost:10000/foo/bar to http://localhost:9999/foo/bar"
- (with-http-server 200 %long-string
- (let ((initial-url (%local-url)))
+ (with-http-server `((200 ,%long-string))
+ (let* ((initial-url (%local-url))
+ (redirect (build-response #:code 301
+ #:headers
+ `((location
+ . ,(string->uri initial-url))))))
(parameterize ((%http-server-port (+ 1 (%http-server-port))))
- (with-http-server (301 `((location
- . ,(string->uri initial-url))))
- ""
+ (with-http-server `((,redirect ""))
(let ((pkg (package
(inherit (dummy-package "x"))
(home-page (%local-url)))))
@@ -445,12 +447,14 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "home-page: 301 -> 404"
"URI http://localhost:10000/foo/bar not reachable: 404 (\"Such is life\")"
- (with-http-server 404 "booh!"
- (let ((initial-url (%local-url)))
+ (with-http-server '((404 "booh!"))
+ (let* ((initial-url (%local-url))
+ (redirect (build-response #:code 301
+ #:headers
+ `((location
+ . ,(string->uri initial-url))))))
(parameterize ((%http-server-port (+ 1 (%http-server-port))))
- (with-http-server (301 `((location
- . ,(string->uri initial-url))))
- ""
+ (with-http-server `((,redirect ""))
(let ((pkg (package
(inherit (dummy-package "x"))
(home-page (%local-url)))))
@@ -583,7 +587,7 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "source: 200"
'()
- (with-http-server 200 %long-string
+ (with-http-server `((200 ,%long-string))
(let ((pkg (package
(inherit (dummy-package "x"))
(source (origin
@@ -595,7 +599,7 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "source: 200 but short length"
"URI http://localhost:9999/foo/bar returned suspiciously small file (18 bytes)"
- (with-http-server 200 "This is too small."
+ (with-http-server '((200 "This is too small."))
(let ((pkg (package
(inherit (dummy-package "x"))
(source (origin
@@ -610,7 +614,7 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "source: 404"
"URI http://localhost:9999/foo/bar not reachable: 404 (\"Such is life\")"
- (with-http-server 404 %long-string
+ (with-http-server `((404 ,%long-string))
(let ((pkg (package
(inherit (dummy-package "x"))
(source (origin
@@ -625,10 +629,10 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "source: 404 and 200"
'()
- (with-http-server 404 %long-string
+ (with-http-server `((404 ,%long-string))
(let ((bad-url (%local-url)))
(parameterize ((%http-server-port (+ 1 (%http-server-port))))
- (with-http-server 200 %long-string
+ (with-http-server `((200 ,%long-string))
(let ((pkg (package
(inherit (dummy-package "x"))
(source (origin
@@ -642,11 +646,14 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "source: 301 -> 200"
"permanent redirect from http://localhost:10000/foo/bar to http://localhost:9999/foo/bar"
- (with-http-server 200 %long-string
- (let ((initial-url (%local-url)))
+ (with-http-server `((200 ,%long-string))
+ (let* ((initial-url (%local-url))
+ (redirect (build-response #:code 301
+ #:headers
+ `((location
+ . ,(string->uri initial-url))))))
(parameterize ((%http-server-port (+ 1 (%http-server-port))))
- (with-http-server (301 `((location . ,(string->uri initial-url))))
- ""
+ (with-http-server `((,redirect ""))
(let ((pkg (package
(inherit (dummy-package "x"))
(source (origin
@@ -661,11 +668,14 @@
(test-skip (if (http-server-can-listen?) 0 1))
(test-equal "source: 301 -> 404"
"URI http://localhost:10000/foo/bar not reachable: 404 (\"Such is life\")"
- (with-http-server 404 "booh!"
- (let ((initial-url (%local-url)))
+ (with-http-server '((404 "booh!"))
+ (let* ((initial-url (%local-url))
+ (redirect (build-response #:code 301
+ #:headers
+ `((location
+ . ,(string->uri initial-url))))))
(parameterize ((%http-server-port (+ 1 (%http-server-port))))
- (with-http-server (301 `((location . ,(string->uri initial-url))))
- ""
+ (with-http-server `((,redirect ""))
(let ((pkg (package
(inherit (dummy-package "x"))
(source (origin
@@ -697,7 +707,7 @@
(test-equal "github-url"
'()
- (with-http-server 200 %long-string
+ (with-http-server `((200 ,%long-string))
(check-github-url
(dummy-package "x" (source
(origin
@@ -709,17 +719,25 @@
(test-equal "github-url: one suggestion"
(string-append
"URL should be '" github-url "'")
- (with-http-server (301 `((location . ,(string->uri github-url)))) ""
- (let ((initial-uri (%local-url)))
- (parameterize ((%http-server-port (+ 1 (%http-server-port))))
- (with-http-server (302 `((location . ,(string->uri initial-uri)))) ""
- (single-lint-warning-message
- (check-github-url
- (dummy-package "x" (source
- (origin
- (method url-fetch)
- (uri (%local-url))
- (sha256 %null-sha256)))))))))))
+ (let ((redirect (build-response #:code 301
+ #:headers
+ `((location
+ . ,(string->uri github-url))))))
+ (with-http-server `((,redirect ""))
+ (let* ((initial-url (%local-url))
+ (redirect (build-response #:code 302
+ #:headers
+ `((location
+ . ,(string->uri initial-url))))))
+ (parameterize ((%http-server-port (+ 1 (%http-server-port))))
+ (with-http-server `((,redirect ""))
+ (single-lint-warning-message
+ (check-github-url
+ (dummy-package "x" (source
+ (origin
+ (method url-fetch)
+ (uri (%local-url))
+ (sha256 %null-sha256))))))))))))
(test-equal "github-url: already the correct github url"
'()
(check-github-url
@@ -844,6 +862,6 @@
(test-end "lint")
;; Local Variables:
-;; eval: (put 'with-http-server 'scheme-indent-function 2)
+;; eval: (put 'with-http-server 'scheme-indent-function 1)
;; eval: (put 'with-warnings 'scheme-indent-function 0)
;; End:
diff --git a/tests/swh.scm b/tests/swh.scm
index 07f0fda37b..9a0da07ae1 100644
--- a/tests/swh.scm
+++ b/tests/swh.scm
@@ -40,7 +40,7 @@
\"dir_id\": 2 } ]")
(define-syntax-rule (with-json-result str exp ...)
- (with-http-server 200 str
+ (with-http-server `((200 ,str))
(parameterize ((%swh-base-url (%local-url)))
exp ...)))
@@ -56,7 +56,7 @@
(test-equal "lookup-origin, not found"
#f
- (with-http-server 404 "Nope."
+ (with-http-server `((404 "Nope."))
(parameterize ((%swh-base-url (%local-url)))
(lookup-origin "http://example.org/whatever"))))
@@ -72,5 +72,6 @@
;; Local Variables:
;; eval: (put 'with-json-result 'scheme-indent-function 1)
+;; eval: (put 'with-http-server 'scheme-indent-function 1)
;; End:
--
2.23.0
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:20
[PATCH 2/4] swh: Add hooks for rate limiting handling.
(address . 37224@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829232101.8153-2-ludo@gnu.org
* guix/swh.scm (%allow-request?, %save-rate-limit-reset-time)
(%general-rate-limit-reset-time): New variables.
(request-rate-limit-reached?, update-rate-limit-reset-time!): New
procedures.
(call): Call '%allow-request?'. Change 'swh-error' protocol to pass
METHOD in addition to URL.
* tests/swh.scm ("rate limit reached")
("%allow-request? and request-rate-limit-reached?"): New tests.
---
guix/swh.scm | 84 +++++++++++++++++++++++++++++++++++++++------------
tests/swh.scm | 36 ++++++++++++++++++++++
2 files changed, 100 insertions(+), 20 deletions(-)

Toggle diff (170 lines)
diff --git a/guix/swh.scm b/guix/swh.scm
index c253e217da..42f38ee048 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -20,6 +20,7 @@
#:use-module (guix base16)
#:use-module (guix build utils)
#:use-module ((guix build syscalls) #:select (mkdtemp!))
+ #:use-module (web uri)
#:use-module (web client)
#:use-module (web response)
#:use-module (json)
@@ -32,6 +33,9 @@
#:use-module (ice-9 popen)
#:use-module ((ice-9 ftw) #:select (scandir))
#:export (%swh-base-url
+ %allow-request?
+
+ request-rate-limit-reached?
origin?
origin-id
@@ -196,31 +200,71 @@ Software Heritage."
((? string? str) str)
((? null?) #f)))
+(define %allow-request?
+ ;; Takes a URL and method (e.g., the 'http-get' procedure) and returns true
+ ;; to keep going. This can be used to disallow a requests when
+ ;; 'request-rate-limit-reached?' returns true, for instance.
+ (make-parameter (const #t)))
+
+;; The time when the rate limit for "/origin/save" POST requests and that of
+;; other requests will be reset.
+;; See <https://archive.softwareheritage.org/api/#rate-limiting>.
+(define %save-rate-limit-reset-time 0)
+(define %general-rate-limit-reset-time 0)
+
+(define (request-rate-limit-reached? url method)
+ "Return true if the rate limit has been reached for URI."
+ (define uri
+ (string->uri url))
+
+ (define reset-time
+ (if (and (eq? method http-post)
+ (string-prefix? "/api/1/origin/save/" (uri-path uri)))
+ %save-rate-limit-reset-time
+ %general-rate-limit-reset-time))
+
+ (< (car (gettimeofday)) reset-time))
+
+(define (update-rate-limit-reset-time! url method response)
+ "Update the rate limit reset time for URL and METHOD based on the headers in
+RESPONSE."
+ (let ((uri (string->uri url)))
+ (match (assq-ref (response-headers response) 'x-ratelimit-reset)
+ ((= string->number (? number? reset))
+ (if (and (eq? method http-post)
+ (string-prefix? "/api/1/origin/save/" (uri-path uri)))
+ (set! %save-rate-limit-reset-time reset)
+ (set! %general-rate-limit-reset-time reset)))
+ (_
+ #f))))
+
(define* (call url decode #:optional (method http-get)
#:key (false-if-404? #t))
"Invoke the endpoint at URL using METHOD. Decode the resulting JSON body
using DECODE, a one-argument procedure that takes an input port. When
FALSE-IF-404? is true, return #f upon 404 responses."
- (let*-values (((response port)
- (method url #:streaming? #t)))
- ;; See <https://archive.softwareheritage.org/api/#rate-limiting>.
- (match (assq-ref (response-headers response) 'x-ratelimit-remaining)
- (#f #t)
- ((? (compose zero? string->number))
- (throw 'swh-error url response))
- (_ #t))
-
- (cond ((= 200 (response-code response))
- (let ((result (decode port)))
- (close-port port)
- result))
- ((and false-if-404?
- (= 404 (response-code response)))
- (close-port port)
- #f)
- (else
- (close-port port)
- (throw 'swh-error url response)))))
+ (and ((%allow-request?) url method)
+ (let*-values (((response port)
+ (method url #:streaming? #t)))
+ ;; See <https://archive.softwareheritage.org/api/#rate-limiting>.
+ (match (assq-ref (response-headers response) 'x-ratelimit-remaining)
+ (#f #t)
+ ((? (compose zero? string->number))
+ (update-rate-limit-reset-time! url method response)
+ (throw 'swh-error url method response))
+ (_ #t))
+
+ (cond ((= 200 (response-code response))
+ (let ((result (decode port)))
+ (close-port port)
+ result))
+ ((and false-if-404?
+ (= 404 (response-code response)))
+ (close-port port)
+ #f)
+ (else
+ (close-port port)
+ (throw 'swh-error url method response))))))
(define-syntax define-query
(syntax-rules (path)
diff --git a/tests/swh.scm b/tests/swh.scm
index 9a0da07ae1..e36c54e5fb 100644
--- a/tests/swh.scm
+++ b/tests/swh.scm
@@ -19,6 +19,7 @@
(define-module (test-swh)
#:use-module (guix swh)
#:use-module (guix tests http)
+ #:use-module (web response)
#:use-module (srfi srfi-64))
;; Test the JSON mapping machinery used in (guix swh).
@@ -68,6 +69,41 @@
(directory-entry-length entry)))
(lookup-directory "123"))))
+(test-equal "rate limit reached"
+ 3000000000
+ (let ((too-many (build-response
+ #:code 429
+ #:reason-phrase "Too many requests"
+
+ ;; Pretend we've reached the limit and it'll be reset in
+ ;; June 2065.
+ #:headers '((x-ratelimit-remaining . "0")
+ (x-ratelimit-reset . "3000000000")))))
+ (with-http-server `((,too-many "Too bad."))
+ (parameterize ((%swh-base-url (%local-url)))
+ (catch 'swh-error
+ (lambda ()
+ (lookup-origin "http://example.org/guix.git"))
+ (lambda (key url method response)
+ ;; Ensure the reset time was recorded.
+ (@@ (guix swh) %general-rate-limit-reset-time)))))))
+
+(test-assert "%allow-request? and request-rate-limit-reached?"
+ ;; Here we test two things: that the rate limit set above is in effect and
+ ;; that %ALLOW-REQUEST? is called, and that 'request-rate-limit-reached?'
+ ;; returns true.
+ (let* ((key (gensym "skip-request"))
+ (skip-if-limit-reached
+ (lambda (url method)
+ (or (not (request-rate-limit-reached? url method))
+ (throw key #t)))))
+ (parameterize ((%allow-request? skip-if-limit-reached))
+ (catch key
+ (lambda ()
+ (lookup-origin "http://example.org/guix.git")
+ #f)
+ (const #t)))))
+
(test-end "swh")
;; Local Variables:
--
2.23.0
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:21
[PATCH 3/4] swh: Make 'commit-id?' public.
(address . 37224@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829232101.8153-3-ludo@gnu.org
* guix/swh.scm (commit-id?): Make public.
---
guix/swh.scm | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

Toggle diff (24 lines)
diff --git a/guix/swh.scm b/guix/swh.scm
index 42f38ee048..01648a1ebe 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -105,6 +105,8 @@
request-cooking
vault-fetch
+ commit-id?
+
swh-download))
;;; Commentary:
@@ -568,7 +570,7 @@ requested bundle cooking, waiting for completion...~%"))
(define (commit-id? reference)
"Return true if REFERENCE is likely a commit ID, false otherwise---e.g., if
-it is a tag name."
+it is a tag name. This is based on a simple heuristic so use with care!"
(and (= (string-length reference) 40)
(string-every char-set:hex-digit reference)))
--
2.23.0
L
L
Ludovic Courtès wrote on 30 Aug 2019 01:21
[PATCH 4/4] lint: Add 'archival' checker.
(address . 37224@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20190829232101.8153-4-ludo@gnu.org
* guix/lint.scm (check-archival): New procedure.
(%network-dependent-checkers): Add 'archival' checker.
* tests/lint.scm ("archival: missing content")
("archival: content available")
("archival: missing revision")
("archival: revision available")
("archival: rate limit reached"): New tests.
* doc/guix.texi (Invoking guix lint): Document it.
---
doc/guix.texi | 25 +++++++++++++
guix/lint.scm | 96 +++++++++++++++++++++++++++++++++++++++++++++++++-
tests/lint.scm | 81 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 201 insertions(+), 1 deletion(-)

Toggle diff (272 lines)
diff --git a/doc/guix.texi b/doc/guix.texi
index 707c2ba700..582f3a124b 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -9233,6 +9233,31 @@ Parse the @code{source} URL to determine if a tarball from GitHub is
autogenerated or if it is a release tarball. Unfortunately GitHub's
autogenerated tarballs are sometimes regenerated.
+@item archival
+@cindex Software Heritage, source code archive
+@cindex archival of source code, Software Heritage
+Checks whether the package's source code is archived at
+@uref{https://www.softwareheritage.org, Software Heritage}.
+
+When the source code that is not archived comes from a version-control system
+(VCS)---e.g., it's obtained with @code{git-fetch}, send Software Heritage a
+``save'' request so that it eventually archives it. This ensures that the
+source will remain available in the long term, and that Guix can fall back to
+Software Heritage should the source code disappear from its original host.
+The status of recent ``save'' requests can be
+@uref{https://archive.softwareheritage.org/save/#requests, viewed on-line}.
+
+When source code is a tarball obtained with @code{url-fetch}, simply print a
+message when it is not archived. As of this writing, Software Heritage does
+not allow requests to save arbitrary tarballs; we are working on ways to
+ensure that non-VCS source code is also archived.
+
+Software Heritage
+@uref{https://archive.softwareheritage.org/api/#rate-limiting, limits the
+request rate per IP address}. When the limit is reached, @command{guix lint}
+prints a message and the @code{archival} checker stops doing anything until
+that limit has been reset.
+
@item cve
@cindex security vulnerabilities
@cindex CVE, Common Vulnerabilities and Exposures
diff --git a/guix/lint.scm b/guix/lint.scm
index 2bf5097403..98ac77556e 100644
--- a/guix/lint.scm
+++ b/guix/lint.scm
@@ -44,6 +44,8 @@
#:use-module ((guix ui) #:select (texi->plain-text fill-paragraph))
#:use-module (guix gnu-maintenance)
#:use-module (guix cve)
+ #:use-module ((guix swh) #:hide (origin?))
+ #:autoload (guix git-download) (git-reference?)
#:use-module (ice-9 match)
#:use-module (ice-9 regex)
#:use-module (ice-9 format)
@@ -80,6 +82,7 @@
check-vulnerabilities
check-for-updates
check-formatting
+ check-archival
lint-warning
lint-warning?
@@ -1023,6 +1026,93 @@ the NIST server non-fatal."
'()))
(#f '()))) ; cannot find newer upstream release
+
+(define (check-archival package)
+ "Check whether PACKAGE's source code is archived on Software Heritage. If
+it's not, and if its source code is a VCS snapshot, then send a \"save\"
+request to Software Heritage.
+
+Software Heritage imposes limits on the request rate per client IP address.
+This checker prints a notice and stops doing anything once that limit has been
+reached."
+ (define (response->warning url method response)
+ (if (request-rate-limit-reached? url method)
+ (list (make-warning package
+ (G_ "Software Heritage rate limit reached; \
+try again later")
+ #:field 'source))
+ (list (make-warning package
+ (G_ "'~a' returned ~a")
+ (list url (response-code response))
+ #:field 'source))))
+
+ (define skip-key (gensym "skip-archival-check"))
+
+ (define (skip-when-limit-reached url method)
+ (or (not (request-rate-limit-reached? url method))
+ (throw skip-key #t)))
+
+ (parameterize ((%allow-request? skip-when-limit-reached))
+ (catch #t
+ (lambda ()
+ (match (and (origin? (package-source package))
+ (package-source package))
+ (#f ;no source
+ '())
+ ((= origin-uri (? git-reference? reference))
+ (define url
+ (git-reference-url reference))
+ (define commit
+ (git-reference-commit reference))
+
+ (match (if (commit-id? commit)
+ (or (lookup-revision commit)
+ (lookup-origin-revision url commit))
+ (lookup-origin-revision url commit))
+ ((? revision? revision)
+ '())
+ (#f
+ ;; Revision is missing from the archive, attempt to save it.
+ (catch 'swh-error
+ (lambda ()
+ (save-origin (git-reference-url reference) "git")
+ (list (make-warning
+ package
+ ;; TRANSLATORS: "Software Heritage" is a proper noun
+ ;; that must remain untranslated. See
+ ;; <https://www.softwareheritage.org>.
+ (G_ "scheduled Software Heritage archival")
+ #:field 'source)))
+ (lambda (key url method response . _)
+ (cond ((= 429 (response-code response))
+ (list (make-warning
+ package
+ (G_ "archival rate limit exceeded; \
+try again later")
+ #:field 'source)))
+ (else
+ (response->warning url method response))))))))
+ ((? origin? origin)
+ ;; Since "save" origins are not supported for non-VCS source, all
+ ;; we can do is tell whether a given tarball is available or not.
+ (if (origin-sha256 origin) ;XXX: for ungoogled-chromium
+ (match (lookup-content (origin-sha256 origin) "sha256")
+ (#f
+ (list (make-warning package
+ (G_ "source not archived on Software \
+Heritage")
+ #:field 'source)))
+ ((? content?)
+ '()))
+ '()))))
+ (match-lambda*
+ ((key url method response)
+ (response->warning url method response))
+ ((key . args)
+ (if (eq? key skip-key)
+ '()
+ (apply throw key args)))))))
+
;;;
;;; Source code formatting.
@@ -1227,7 +1317,11 @@ or a list thereof")
(lint-checker
(name 'refresh)
(description "Check the package for new upstream releases")
- (check check-for-updates))))
+ (check check-for-updates))
+ (lint-checker
+ (name 'archival)
+ (description "Ensure source code archival on Software Heritage")
+ (check check-archival))))
(define %all-checkers
(append %local-checkers
diff --git a/tests/lint.scm b/tests/lint.scm
index c8b88136f4..1b92f02b85 100644
--- a/tests/lint.scm
+++ b/tests/lint.scm
@@ -35,6 +35,7 @@
#:use-module (guix packages)
#:use-module (guix lint)
#:use-module (guix ui)
+ #:use-module (guix swh)
#:use-module (gnu packages)
#:use-module (gnu packages glib)
#:use-module (gnu packages pkg-config)
@@ -47,6 +48,7 @@
#:use-module (ice-9 regex)
#:use-module (ice-9 getopt-long)
#:use-module (ice-9 pretty-print)
+ #:use-module (rnrs bytevectors)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-9 gnu)
#:use-module (srfi srfi-26)
@@ -859,6 +861,85 @@
'()
(check-formatting (dummy-package "x")))
+(test-assert "archival: missing content"
+ (let* ((origin (origin
+ (method url-fetch)
+ (uri "http://example.org/foo.tgz")
+ (sha256 (make-bytevector 32))))
+ (warnings (with-http-server '((404 "Not archived."))
+ (parameterize ((%swh-base-url (%local-url)))
+ (check-archival (dummy-package "x"
+ (source origin)))))))
+ (warning-contains? "not archived" warnings)))
+
+(test-equal "archival: content available"
+ '()
+ (let* ((origin (origin
+ (method url-fetch)
+ (uri "http://example.org/foo.tgz")
+ (sha256 (make-bytevector 32))))
+ ;; https://archive.softwareheritage.org/api/1/content/
+ (content "{ \"checksums\": {}, \"data_url\": \"xyz\",
+ \"length\": 42 }"))
+ (with-http-server `((200 ,content))
+ (parameterize ((%swh-base-url (%local-url)))
+ (check-archival (dummy-package "x" (source origin)))))))
+
+(test-assert "archival: missing revision"
+ (let* ((origin (origin
+ (method git-fetch)
+ (uri (git-reference
+ (url "http://example.org/foo.git")
+ (commit "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")))
+ (sha256 (make-bytevector 32))))
+ ;; https://archive.softwareheritage.org/api/1/origin/save/
+ (save "{ \"origin_url\": \"http://example.org/foo.git\",
+ \"save_request_date\": \"2014-11-17T22:09:38+01:00\",
+ \"save_request_status\": \"accepted\",
+ \"save_task_status\": \"scheduled\" }")
+ (warnings (with-http-server `((404 "No revision.") ;lookup-revision
+ (404 "No origin.") ;lookup-origin
+ (200 ,save)) ;save-origin
+ (parameterize ((%swh-base-url (%local-url)))
+ (check-archival (dummy-package "x" (source origin)))))))
+ (warning-contains? "scheduled" warnings)))
+
+(test-equal "archival: revision available"
+ '()
+ (let* ((origin (origin
+ (method git-fetch)
+ (uri (git-reference
+ (url "http://example.org/foo.git")
+ (commit "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")))
+ (sha256 (make-bytevector 32))))
+ ;; https://archive.softwareheritage.org/api/1/revision/
+ (revision "{ \"author\": {}, \"parents\": [],
+ \"date\": \"2014-11-17T22:09:38+01:00\" }"))
+ (with-http-server `((200 ,revision))
+ (parameterize ((%swh-base-url (%local-url)))
+ (check-archival (dummy-package "x" (source origin)))))))
+
+(test-assert "archival: rate limit reached"
+ ;; We should get a single warning stating that the rate limit was reached,
+ ;; and nothing more, in particular no other HTTP requests.
+ (let* ((origin (origin
+ (method url-fetch)
+ (uri "http://example.org/foo.tgz")
+ (sha256 (make-bytevector 32))))
+ (too-many (build-response
+ #:code 429
+ #:reason-phrase "Too many requests"
+ #:headers '((x-ratelimit-remaining . "0")
+ (x-ratelimit-reset . "3000000000"))))
+ (warnings (with-http-server `((,too-many "Rate limit reached."))
+ (parameterize ((%swh-base-url (%local-url)))
+ (append-map (lambda (name)
+ (check-archival
+ (dummy-package name (source origin))))
+ '("x" "y" "z"))))))
+ (string-contains (single-lint-warning-message warnings)
+ "rate limit reached")))
+
(test-end "lint")
;; Local Variables:
--
2.23.0
L
L
Ludovic Courtès wrote on 2 Sep 2019 15:28
Re: [bug#37224] [PATCH 0/4] Add 'archival' checker for 'guix lint'
(address . 37224-done@debbugs.gnu.org)
87h85u96am.fsf@gnu.org
Hello,

Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (5 lines)
> tests: 'with-http-server' accepts multiple responses.
> swh: Add hooks for rate limiting handling.
> swh: Make 'commit-id?' public.
> lint: Add 'archival' checker.

I went ahead and pushed these at commit
55549c7b9b778a79d3e1f3d085861ef36aabdca6.

I asked for feedback on #swh-devel and olasd (Nicolas Dandrimont), one
of the SWH developers, replied:

Toggle snippet (25 lines)
<olasd> civodul: this seems like a sensible design to me; Does `guix lint`
automatically call other network services? maybe the save request
should be an optional flag [13:55]
<olasd> (automatically _checking_ is fine; automatically _saving_, I don't
know)
<civodul> olasd: there's a 'refresh' checker that calls out to services to
determine whether a newer version of the package is available, for
instance [14:01]
<civodul> initially i thought about not saving at all, and just writing "you
should save this"
<civodul> but then i thought it's more convenient to just do it right away
<civodul> it's unlikely to send garbage anyway, and it'll necessarily send
only public code, and very likely only free code [14:02]
<civodul> or did you have other concerns?
<olasd> I don't think it's going to be an issue for us [14:08]
<olasd> I would just (personally) be surprised if a lint tool I'm using
started to have side effects on somewhat unrelated systems :)
[14:09]
[...]

<civodul> olasd: ah true, though i guess we just got used to that ;-) [14:12]
<civodul> anyway, thanks for your feedback!
<olasd> civodul: feel free to quote me by mail if you want to keep it archived

Ludo’.
Closed
Z
Z
zimoun wrote on 11 Sep 2019 12:20
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 37224@debbugs.gnu.org)
CAJ3okZ20pTrSJ5z4vi2eMBqMPzE=wvbbPdRAGVUapsSR0YcwBQ@mail.gmail.com
Hi,

Nice !
And it is so aligned with their recent announcement [1] ;-)


On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (5 lines)
> Currently, only 25% of our packages are not fetched with ‘url-fetch’.
> For the remaining 75%, this checker can only report whether the tarball
> is missing (and apart from ftp.gnu.org and a few other exceptions, it
> usually _is_ missing) and cannot actually save it.

Maybe I miss something, but for example guile-2.0 is not yet archived.
I am not able to find it with their search resources. And `guix lint
-c archival guile@2.0' reports "guile@2.0.14: source not archived on
Software Heritage".


Toggle quote (2 lines)
> Anyway, it’s a first step in that direction. Feedback welcome!

I agree with the words on #swh-deve by olasd (Nicolas Dandrimont) from
SWH that the automatic "save" should be optional (even if the default
is save=true).


Toggle quote (12 lines)
> The second step will be to write a “lister” for Software Heritage that
> grabs the list of source code URLs from
> <https://guix.gnu.org/packages.json>. That could would run at SWH
> and it could potentially grab the tarballs, not just the VCS checkouts.
> Here’s are examples:
>
> https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/packagist/lister.py
> https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gnu/lister.py
>
> It should be quite easy for a Pythonista to write something similar
> for our ‘packages.json’. Any takers? :-)

I am not sure to understand all but I will give a look... I am reading
their GSoC about this topic [2].



All the best,
simon
L
L
Ludovic Courtès wrote on 12 Sep 2019 09:41
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 37224@debbugs.gnu.org)
87blvqrmfr.fsf@gnu.org
Hello!

zimoun <zimon.toutoune@gmail.com> skribis:

Toggle quote (12 lines)
> On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Currently, only 25% of our packages are not fetched with ‘url-fetch’.
>> For the remaining 75%, this checker can only report whether the tarball
>> is missing (and apart from ftp.gnu.org and a few other exceptions, it
>> usually _is_ missing) and cannot actually save it.
>
> Maybe I miss something, but for example guile-2.0 is not yet archived.
> I am not able to find it with their search resources. And `guix lint
> -c archival guile@2.0' reports "guile@2.0.14: source not archived on
> Software Heritage".

Yeah, most not-too-recent tarballs from ftp.gnu.org are archived, so I
don’t know why this one is missing. We’d have to check with them.

Toggle quote (4 lines)
> I agree with the words on #swh-deve by olasd (Nicolas Dandrimont) from
> SWH that the automatic "save" should be optional (even if the default
> is save=true).

Maybe we could have a flag somewhere to turn it off? The good thing of
having it on (or opt-out) is that we increase the chances that the code
we care about is archived. :-)

Toggle quote (15 lines)
>> The second step will be to write a “lister” for Software Heritage that
>> grabs the list of source code URLs from
>> <https://guix.gnu.org/packages.json>. That could would run at SWH
>> and it could potentially grab the tarballs, not just the VCS checkouts.
>> Here’s are examples:
>>
>> https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/packagist/lister.py
>> https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gnu/lister.py
>>
>> It should be quite easy for a Pythonista to write something similar
>> for our ‘packages.json’. Any takers? :-)
>
> I am not sure to understand all but I will give a look... I am reading
> their GSoC about this topic [2].

Awesome, thank you! Having a “guix” lister in place would be perfect.

Ludo’.
Z
Z
zimoun wrote on 12 Sep 2019 11:52
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 37224@debbugs.gnu.org)
CAJ3okZ3TTx3+RjdFCrj6WpNBAOq7CWn_gR7VApEjWpH3R9GRyQ@mail.gmail.com
Hi Ludo,

On Thu, 12 Sep 2019 at 09:41, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (9 lines)
> zimoun <zimon.toutoune@gmail.com> skribis:
>
> > On Fri, 30 Aug 2019 at 01:18, Ludovic Courtès <ludo@gnu.org> wrote:
> >
> >> Currently, only 25% of our packages are not fetched with ‘url-fetch’.
> >> For the remaining 75%, this checker can only report whether the tarball
> >> is missing (and apart from ftp.gnu.org and a few other exceptions, it
> >> usually _is_ missing) and cannot actually save it.

And it is interesting that Nix has the same stats. ;-)



Toggle quote (8 lines)
> > Maybe I miss something, but for example guile-2.0 is not yet archived.
> > I am not able to find it with their search resources. And `guix lint
> > -c archival guile@2.0' reports "guile@2.0.14: source not archived on
> > Software Heritage".
>
> Yeah, most not-too-recent tarballs from ftp.gnu.org are archived, so I
> don’t know why this one is missing. We’d have to check with them.

Maybe I have wrong, but bunch of GNU packages seems missing. :-)


Toggle quote (8 lines)
> > I agree with the words on #swh-deve by olasd (Nicolas Dandrimont) from
> > SWH that the automatic "save" should be optional (even if the default
> > is save=true).
>
> Maybe we could have a flag somewhere to turn it off? The good thing of
> having it on (or opt-out) is that we increase the chances that the code
> we care about is archived. :-)

I agree. :-)


Speaking of UI, I would expect 2 different commands:

- one to check if the package is in SWH, say:
guix package <name> --is-in-swh
- one to send a "save" request
guix lint <name> -c archival

And adding an option to turn "the push" off, say:
guix lint <name> --no-archival

Because when linting the process is generally iterative:
guix lint <name>
# fix mistake
guix lint <name>
# fix other mistake
etc.
and it will save network resource (latency, etc.) by avoiding to check
again and again in this lint process; I guess.

Or even something in this flavour should be a better UI:

guix lint <name> --checkers=description,synopsis
--no-checkers=license,archival

What do you think?



Cheers,
simon
L
L
Ludovic Courtès wrote on 13 Sep 2019 10:49
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 37224@debbugs.gnu.org)
87lfus8tt2.fsf@gnu.org
Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

Toggle quote (7 lines)
> Or even something in this flavour should be a better UI:
>
> guix lint <name> --checkers=description,synopsis
> --no-checkers=license,archival
>
> What do you think?

Good idea, this would be simple and effective!

Thanks,
Ludo’.
?