[PATCH 0/4] 'github' importer gracefully handles rate limiting

DoneSubmitted by Ludovic Courtès.
Details
2 participants
  • Ludovic Courtès
  • Maxime Devos
Owner
unassigned
Severity
normal
L
L
Ludovic Courtès wrote on 3 Mar 22:13 +0100
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220303211326.19884-1-ludo@gnu.org
Hi Guix!

These patches address a famous complaint about “the GitHub problem”
that affects ‘guix refresh’¹, shown here in its naked awfulness:

Toggle snippet (24 lines)
$ ./pre-inst-env guix refresh
gnu/packages/zig.scm:32:13: zig would be upgraded from 0.9.0 to 0.9.1

[...]

In guix/scripts/refresh.scm:
578:14 5 (_ _)
In srfi/srfi-1.scm:
634:9 4 (for-each #<procedure 7fe85c9a8e00 at guix/scripts/refresh.scm:578:24 (t-916fdc98f4be2f1-1d48)> _)
In guix/scripts/refresh.scm:
378:2 3 (check-for-package-update #<package redshift-wayland@1.12-1.7da875d gnu/packages/xdisorg.scm:1425 7fe85879e790> (#<<upstream…>) …)
In guix/import/github.scm:
232:12 2 (latest-release _)
In ice-9/boot-9.scm:
1685:16 1 (raise-exception _ #:continuable? _)
1685:16 0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Error downloading release information through the GitHub
API. This may be fixed by using an access token and setting the environment
variable GUIX_GITHUB_TOKEN, for instance one procured from
https://github.com/settings/tokens

With this change, ‘guix refresh’ warns you when the GitHub rate limit
is reached, but it keeps going, falling back to the ‘generic-git’
updater if it’s among the applicable updaters:

Toggle snippet (17 lines)
$ ./pre-inst-env guix refresh -t github,generic-git

[...]

guix refresh: warning: GitHub rate limit exceeded; disallowing requests for 1477 seconds
hint: You can waive the rate limit by setting the `GUIX_GITHUB_TOKEN' environment variable to a
token obtained from `https://github.com/settings/tokens' with your GitHub account.

Alternatively, you can wait until your rate limit is reset, or use the `generic-git' updater
instead.

gnu/packages/zile.scm:113:14: warning: no tags were found for zile-on-guile
gnu/packages/zig.scm:32:13: zig would be upgraded from 0.9.0 to 0.9.1
gnu/packages/xorg.scm:2830:7: warning: no valid tags found for xf86-video-freedreno
gnu/packages/xml.scm:2132:13: java-kxml2 would be upgraded from 2.4.2 to 2.5.0

The GitHub updater becomes functional again once the rate limit has
been reset.

The code to deal with rate limiting is similar to that in (guix swh).

Thoughts?

Thanks,
Ludo’.


Ludovic Courtès (4):
http-client: Add response headers to '&http-get-error'.
import: github: Gracefully handle rate limit exhaustion.
http-client: Correctly handle redirects when #:keep-alive? #t.
import: github: Reuse HTTP connection for the /tags URL fallback.

.dir-locals.el | 1 +
guix/http-client.scm | 39 +++++++++-----
guix/import/github.scm | 119 +++++++++++++++++++++++++++++------------
3 files changed, 112 insertions(+), 47 deletions(-)


base-commit: be84fb701bf7a36a0eb50147ccbb988aa3f41209
--
2.34.0
L
L
Ludovic Courtès wrote on 3 Mar 22:14 +0100
[PATCH 1/4] http-client: Add response headers to '&http-get-error'.
(address . 54241@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220303211444.19928-1-ludo@gnu.org
* guix/http-client.scm (&http-get-error)[headers]: New field.
(http-fetch): Initialize the 'headers' field.
---
guix/http-client.scm | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

Toggle diff (45 lines)
diff --git a/guix/http-client.scm b/guix/http-client.scm
index 10bc278023..4b01e31165 100644
--- a/guix/http-client.scm
+++ b/guix/http-client.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2012-2018, 2020-2022 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
 ;;; Copyright © 2012, 2015 Free Software Foundation, Inc.
 ;;; Copyright © 2017 Tobias Geerinckx-Rice <me@tobias.gr>
@@ -52,6 +52,7 @@ (define-module (guix http-client)
             http-get-error-uri
             http-get-error-code
             http-get-error-reason
+            http-get-error-headers
 
             http-fetch
             http-multiple-get
@@ -69,9 +70,10 @@ (define-module (guix http-client)
 ;; HTTP GET error.
 (define-condition-type &http-get-error &error
   http-get-error?
-  (uri    http-get-error-uri)                     ; URI
-  (code   http-get-error-code)                    ; integer
-  (reason http-get-error-reason))                 ; string
+  (uri      http-get-error-uri)                   ;URI
+  (code     http-get-error-code)                  ;integer
+  (reason   http-get-error-reason)                ;string
+  (headers  http-get-error-headers))              ;alist
 
 
 (define* (http-fetch uri #:key port (text? #f) (buffered? #t)
@@ -138,7 +140,8 @@ (define* (http-fetch uri #:key port (text? #f) (buffered? #t)
            (raise (condition (&http-get-error
                               (uri uri)
                               (code code)
-                              (reason (response-reason-phrase resp)))
+                              (reason (response-reason-phrase resp))
+                              (headers (response-headers resp)))
                              (&message
                               (message
                                (format
-- 
2.34.0
L
L
Ludovic Courtès wrote on 3 Mar 22:14 +0100
[PATCH 2/4] import: github: Gracefully handle rate limit exhaustion.
(address . 54241@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220303211444.19928-2-ludo@gnu.org
Previously, 'guix refresh' would literally crash when the rate limit was
reached due to the call to 'error'. With this change, the updater
notices when the rate limit is reached and it turns itself into a no-op
until the rate limit has been reset.

When running "guix refresh" (with no arguments), the 'github' updater
gets used until the rate limit has been reached, after which "guix
refresh" automatically picks up the next valid updater, typically
'generic-git'.

* guix/import/github.scm (fetch-releases-or-tags): Use 'http-fetch'
directly instead of 'json-fetch' to let 'http-get-error?' exceptions
through. Handle 403 errors with an 'X-RateLimit-Remaining' header.
(%rate-limit-reset-time): New variable.
(update-rate-limit-reset-time!, request-rate-limit-reached?): New
procedures.
(latest-released-version): Remove calls to 'error'.
---
guix/import/github.scm | 113 +++++++++++++++++++++++++++++------------
1 file changed, 80 insertions(+), 33 deletions(-)

Toggle diff (161 lines)
diff --git a/guix/import/github.scm b/guix/import/github.scm
index 8c1898c0c5..5062bad80d 100644
--- a/guix/import/github.scm
+++ b/guix/import/github.scm
@@ -1,6 +1,6 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2016 Ben Woodcroft <donttrustben@gmail.com>
-;;; Copyright © 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2017-2020, 2022 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2018 Eric Bavier <bavier@member.fsf.org>
 ;;; Copyright © 2019 Arun Isaac <arunisaac@systemreboot.net>
 ;;; Copyright © 2019 Efraim Flashner <efraim@flashner.co.il>
@@ -30,15 +30,16 @@ (define-module (guix import github)
   #:use-module (guix utils)
   #:use-module (guix i18n)
   #:use-module (guix diagnostics)
+  #:use-module ((guix ui) #:select (display-hint))
   #:use-module ((guix download) #:prefix download:)
   #:use-module ((guix git-download) #:prefix download:)
   #:use-module (guix import utils)
-  #:use-module (guix import json)
   #:use-module (json)
   #:use-module (guix packages)
   #:use-module (guix upstream)
   #:use-module (guix http-client)
   #:use-module (web uri)
+  #:use-module (web response)
   #:export (%github-api %github-updater))
 
 ;; For tests.
@@ -140,6 +141,30 @@ (define %github-token
   ;; limit, or #f.
   (make-parameter (getenv "GUIX_GITHUB_TOKEN")))
 
+(define %rate-limit-reset-time
+  ;; Time (seconds since the Epoch, UTC) when the rate limit for GitHub
+  ;; requests will be reset, or #f if the rate limit hasn't been reached.
+  #f)
+
+(define (update-rate-limit-reset-time! headers)
+  "Update the rate limit reset time based on HEADERS, the HTTP response
+headers."
+  (match (assq-ref headers 'x-ratelimit-reset)
+    ((= string->number (? number? reset))
+     (set! %rate-limit-reset-time reset)
+     reset)
+    (_
+     0)))
+
+(define (request-rate-limit-reached?)
+  "Return true if the rate limit has been reached."
+  (and %rate-limit-reset-time
+       (match (< (car (gettimeofday)) %rate-limit-reset-time)
+         (#t #t)
+         (#f
+          (set! %rate-limit-reset-time #f)
+          #f))))
+
 (define (fetch-releases-or-tags url)
   "Fetch the list of \"releases\" or, if it's empty, the list of tags for the
 repository at URL.  Return the corresponding JSON dictionaries (alists),
@@ -170,20 +195,49 @@ (define headers
             `((Authorization . ,(string-append "token " (%github-token))))
             '())))
 
-  (guard (c ((and (http-get-error? c)
-                  (= 404 (http-get-error-code c)))
-             (warning (G_ "~a is unreachable (~a)~%")
-                      release-url (http-get-error-code c))
-             '#()))                               ;return an empty release set
-    (let* ((port   (http-fetch release-url #:headers headers))
-           (result (json->scm port)))
-      (close-port port)
-      (match result
-        (#()
-         ;; We got the empty list, presumably because the user didn't use GitHub's
-         ;; "release" mechanism, but hopefully they did use Git tags.
-         (json-fetch tag-url #:headers headers))
-        (x x)))))
+  (and (not (request-rate-limit-reached?))
+       (guard (c ((and (http-get-error? c)
+                       (= 404 (http-get-error-code c)))
+                  (warning (G_ "~a is unreachable (~a)~%")
+                           (uri->string (http-get-error-uri c))
+                           (http-get-error-code c))
+                  '#())                           ;return an empty release set
+                 ((and (http-get-error? c)
+                       (= 403 (http-get-error-code c)))
+                  ;; See
+                  ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>.
+                  (match (assq-ref (http-get-error-headers c)
+                                   'x-ratelimit-remaining)
+                    (#f
+                     (raise c))
+                    ((? (compose zero? string->number))
+                     (let ((reset (update-rate-limit-reset-time!
+                                   (http-get-error-headers c))))
+                       (warning (G_ "GitHub rate limit exceeded; \
+disallowing requests for ~a seconds~%")
+                                (- reset (car (gettimeofday))))
+                       (display-hint (G_ "You can waive the rate limit by
+setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained
+from @url{https://github.com/settings/tokens} with your GitHub account.
+
+Alternatively, you can wait until your rate limit is reset, or use the
+@code{generic-git} updater instead."))
+                       #f))                       ;bail out
+                    (_
+                     (raise c)))))
+
+         (let* ((port   (http-fetch release-url #:headers headers))
+                (result (json->scm port)))
+           (close-port port)
+           (match result
+             (#()
+              ;; We got the empty list, presumably because the user didn't use GitHub's
+              ;; "release" mechanism, but hopefully they did use Git tags.
+              (let* ((port (http-fetch tag-url #:headers headers))
+                     (json (json->scm port)))
+                (close-port port)
+                json))
+             (x x))))))
 
 (define (latest-released-version url package-name)
   "Return the newest released version and its tag given a string URL like
@@ -223,23 +277,16 @@ (define (release->version release)
         (cons tag tag))
        (else #f))))
 
-  (let* ((json (and=> (fetch-releases-or-tags url)
-                      vector->list)))
-    (if (eq? json #f)
-        (if (%github-token)
-            (error "Error downloading release information through the GitHub
-API when using a GitHub token")
-            (error "Error downloading release information through the GitHub
-API. This may be fixed by using an access token and setting the environment
-variable GUIX_GITHUB_TOKEN, for instance one procured from
-https://github.com/settings/tokens"))
-        (match (sort (filter-map release->version
-                                 (match (remove pre-release? json)
-                                   (() json) ; keep everything
-                                   (releases releases)))
-                     (lambda (x y) (version>? (car x) (car y))))
-          (((latest-version . tag) . _) (values latest-version tag))
-          (() (values #f #f))))))
+  (match (and=> (fetch-releases-or-tags url) vector->list)
+    (#f (values #f #f))
+    (json
+     (match (sort (filter-map release->version
+                              (match (remove pre-release? json)
+                                (() json)         ; keep everything
+                                (releases releases)))
+                  (lambda (x y) (version>? (car x) (car y))))
+       (((latest-version . tag) . _) (values latest-version tag))
+       (() (values #f #f))))))
 
 (define (latest-release pkg)
   "Return an <upstream-source> for the latest release of PKG."
-- 
2.34.0
L
L
Ludovic Courtès wrote on 3 Mar 22:14 +0100
[PATCH 3/4] http-client: Correctly handle redirects when #:keep-alive? #t.
(address . 54241@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220303211444.19928-3-ludo@gnu.org
Previously PORT would be closed unconditionally, which broke redirects
when #:keep-alive? #t is given.

* guix/http-client.scm (http-fetch): Make 'port' a parameter of 'loop'.
Upon 3xx responses, do not close PORT is KEEP-ALIVE? is true, but consume
RESP's body. Add second argument to 'loop'.
---
guix/http-client.scm | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)

Toggle diff (48 lines)
diff --git a/guix/http-client.scm b/guix/http-client.scm
index 4b01e31165..4784497e5f 100644
--- a/guix/http-client.scm
+++ b/guix/http-client.scm
@@ -102,12 +102,12 @@ (define* (http-fetch uri #:key port (text? #f) (buffered? #t)
 Raise an '&http-get-error' condition if downloading fails."
   (let loop ((uri (if (string? uri)
                       (string->uri uri)
-                      uri)))
-    (let ((port (or port (open-connection uri
-                                          #:verify-certificate?
-                                          verify-certificate?
-                                          #:timeout timeout)))
-          (headers (match (uri-userinfo uri)
+                      uri))
+             (port (or port (open-connection uri
+                                             #:verify-certificate?
+                                             verify-certificate?
+                                             #:timeout timeout))))
+    (let ((headers (match (uri-userinfo uri)
                      ((? string? str)
                       (cons (cons 'Authorization
                                   (string-append "Basic "
@@ -131,11 +131,19 @@ (define* (http-fetch uri #:key port (text? #f) (buffered? #t)
             303                                   ; see other
             307                                   ; temporary redirection
             308)                                  ; permanent redirection
-           (let ((uri (resolve-uri-reference (response-location resp) uri)))
-             (close-port port)
+           (let ((host (uri-host uri))
+                 (uri  (resolve-uri-reference (response-location resp) uri)))
+             (if keep-alive?
+                 (dump-port data (%make-void-port "w0")
+                            (response-content-length resp))
+                 (close-port port))
              (format log-port (G_ "following redirection to `~a'...~%")
                      (uri->string uri))
-             (loop uri)))
+             (loop uri
+                   (and keep-alive?
+                        (or (not (uri-host uri))
+                            (string=? host (uri-host uri)))
+                        port))))
           (else
            (raise (condition (&http-get-error
                               (uri uri)
-- 
2.34.0
L
L
Ludovic Courtès wrote on 3 Mar 22:14 +0100
[PATCH 4/4] import: github: Reuse HTTP connection for the /tags URL fallback.
(address . 54241@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20220303211444.19928-4-ludo@gnu.org
* guix/import/github.scm (fetch-releases-or-tags): Call
'open-connection-for-uri' and reuse the same connection for the two
'http-fetch' calls.
* .dir-locals.el (scheme-mode): Add 'call-with-port'.
---
.dir-locals.el | 1 +
guix/import/github.scm | 30 ++++++++++++++++++------------
2 files changed, 19 insertions(+), 12 deletions(-)

Toggle diff (62 lines)
diff --git a/.dir-locals.el b/.dir-locals.el
index 0edf2a8d23..bee745cc27 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -52,6 +52,7 @@
    (eval . (put 'test-equal 'scheme-indent-function 1))
    (eval . (put 'test-eq 'scheme-indent-function 1))
    (eval . (put 'call-with-input-string 'scheme-indent-function 1))
+   (eval . (put 'call-with-port 'scheme-indent-function 1))
    (eval . (put 'guard 'scheme-indent-function 1))
    (eval . (put 'lambda* 'scheme-indent-function 1))
    (eval . (put 'substitute* 'scheme-indent-function 1))
diff --git a/guix/import/github.scm b/guix/import/github.scm
index 5062bad80d..84f70eed0f 100644
--- a/guix/import/github.scm
+++ b/guix/import/github.scm
@@ -33,6 +33,7 @@ (define-module (guix import github)
   #:use-module ((guix ui) #:select (display-hint))
   #:use-module ((guix download) #:prefix download:)
   #:use-module ((guix git-download) #:prefix download:)
+  #:autoload   (guix build download) (open-connection-for-uri)
   #:use-module (guix import utils)
   #:use-module (json)
   #:use-module (guix packages)
@@ -226,18 +227,23 @@ (define headers
                     (_
                      (raise c)))))
 
-         (let* ((port   (http-fetch release-url #:headers headers))
-                (result (json->scm port)))
-           (close-port port)
-           (match result
-             (#()
-              ;; We got the empty list, presumably because the user didn't use GitHub's
-              ;; "release" mechanism, but hopefully they did use Git tags.
-              (let* ((port (http-fetch tag-url #:headers headers))
-                     (json (json->scm port)))
-                (close-port port)
-                json))
-             (x x))))))
+         (let ((release-uri (string->uri release-url)))
+           (call-with-port (open-connection-for-uri release-uri)
+             (lambda (connection)
+               (let* ((result (json->scm
+                               (http-fetch release-uri
+                                           #:port connection
+                                           #:keep-alive? #t
+                                           #:headers headers))))
+                 (match result
+                   (#()
+                    ;; We got the empty list, presumably because the user didn't use GitHub's
+                    ;; "release" mechanism, but hopefully they did use Git tags.
+                    (json->scm (http-fetch tag-url
+                                           #:port connection
+                                           #:keep-alive? #t
+                                           #:headers headers)))
+                   (x x)))))))))
 
 (define (latest-released-version url package-name)
   "Return the newest released version and its tag given a string URL like
-- 
2.34.0
M
M
Maxime Devos wrote on 4 Mar 13:07 +0100
Re: [bug#54241] [PATCH 0/4] 'github' importer gracefully handles rate limiting
(name . Nicolas Goaziou)(address . mail@nicolasgoaziou.fr)
95b755ed648ffc15d7dcedb09f538a026fcfeb10.camel@telenet.be
Ludovic Courtès schreef op do 03-03-2022 om 22:13 [+0100]:
Toggle quote (5 lines)
> With this change, ‘guix refresh’ warns you when the GitHub rate limit
> is reached, but it keeps going, falling back to the ‘generic-git’
> updater if it’s among the applicable updaters:
> [...]

WDYT of avoiding the rate limit by caching, using 'http-fetch/cached'?
GitHub does not count requests setting If-Modified-Since against the
rate limit (assuming the answer hasn't changed).

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiIBCxccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7t5yAQC+3yNlyI6OSvD+d9Nk6/bCLR4U
re/eXuU1WtqQYUQTbgD/ciWDY+7ouehsvsUS2Nm4XoZXgoC4yDagQtzuibXj9AE=
=SckA
-----END PGP SIGNATURE-----


M
M
Maxime Devos wrote on 4 Mar 13:39 +0100
Re: [bug#54241] [PATCH 3/4] http-client: Correctly handle redirects when #:keep-alive? #t.
69a48935f3f03ad72ac49eb7b529cf10eb7db96e.camel@telenet.be
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
Toggle quote (8 lines)
> Previously PORT would be closed unconditionally, which broke redirects
> when #:keep-alive? #t is given.
>
> * guix/http-client.scm (http-fetch): Make 'port' a parameter of 'loop'.
> Upon 3xx responses, do not close PORT is KEEP-ALIVE? is true, but consume
> RESP's body.  Add second argument to 'loop'.
> ---

HTTP things can become complicated, with lots of edge cases. Could an
appropriate test be added to prevent this becoming buggy in the future,
in case of future changed to 'http-fetch'? And a few source code
comments about what is going on exactly?

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiIIaRccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7mY4AP9b1RFl4t/PnFRNRi+y/HfBqtKn
EXsxXoUozot4vTcXLAD/SotzdhfuyoAdJjYmV+BGZek8ka8JBFQhtmclYdBzWwo=
=fH/p
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 4 Mar 21:45 +0100
Re: bug#54241: [PATCH 0/4] 'github' importer gracefully handles rate limiting
(name . Maxime Devos)(address . maximedevos@telenet.be)
87tucdqwen.fsf_-_@gnu.org
Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (10 lines)
> Ludovic Courtès schreef op do 03-03-2022 om 22:13 [+0100]:
>> With this change, ‘guix refresh’ warns you when the GitHub rate limit
>> is reached, but it keeps going, falling back to the ‘generic-git’
>> updater if it’s among the applicable updaters:
>> [...]
>
> WDYT of avoiding the rate limit by caching, using 'http-fetch/cached'?
> GitHub does not count requests setting If-Modified-Since against the
> rate limit (assuming the answer hasn't changed).

My concern is that we’d end up caching one or two little files in
~/.cache for each candidate package, and (rate limit aside) the overhead
of dealing with the cache might outweigh the benefits. I’d rather use
‘http-fetch/cached’ for bigger files, like in (guix cve).

WDYT?

My goal here was to ensure the ‘github’ updater doesn’t get in the way
of those who don’t want to specify a token.

Thanks,
Ludo’.
M
M
Maxime Devos wrote on 5 Mar 10:35 +0100
Re: [bug#54241] [PATCH 2/4] import: github: Gracefully handle rate limit exhaustion.
81f212aeddb345789105c8b8b90bce032adb6c1a.camel@telenet.be
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
Toggle quote (5 lines)
> When running "guix refresh" (with no arguments), the 'github' updater
> gets used until the rate limit has been reached, after which "guix
> refresh" automatically picks up the next valid updater, typically
> 'generic-git'.

Wouldn't this make "guix refresh" non-deterministic (though this might
be an acceptable trade-off)?

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiMu0xccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7rXRAPwLumHwTqOkNKP+sZJTFaAEP9Oc
jSM+ZTWfocFIVNXcLQD8DcVpUQEGajClDrGKwkQ/0PiFUxpef1j7lvyA6O6V9g4=
=VSK6
-----END PGP SIGNATURE-----


M
M
Maxime Devos wrote on 5 Mar 10:37 +0100
ec20410ae84a28c59994e558735ee874aac3ece7.camel@telenet.be
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
Toggle quote (18 lines)
> +  (and (not (request-rate-limit-reached?))
> +       (guard (c ([...]
> +                  ;; See
> +                  ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>.
> +                  (match (assq-ref (http-get-error-headers c)
> +                                   'x-ratelimit-remaining)
> +                    (#f
> +                     (raise c))
> +                    ((? (compose zero? string->number))
> +                     (let ((reset (update-rate-limit-reset-time!
> +                                   (http-get-error-headers c))))
> +                       (warning (G_ "GitHub rate limit exceeded; \
> +disallowing requests for ~a seconds~%")
> +                                (- reset (car (gettimeofday))))
> +                       (display-hint (G_ "You can waive the rate limit by
> +setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained
> +from @url{https://github.com/settings/tokens} with your GitHub account.

IIRC, the GitHub documentation doesn't state that this waives the rate
limit, rather it increases the rate limit a lot, but there's still a
limit.

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiMvPxccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7m8xAQCtCloUphtmRR0fYGTzLob60MUI
Gb6+4KGnfF85do0zkQD/T7zMh2/Q9PfdvhUTbS3kl0jxH1CnP1MiJtCr7ArofAg=
=YkRQ
-----END PGP SIGNATURE-----


M
M
Maxime Devos wrote on 5 Mar 10:44 +0100
Re: bug#54241: [PATCH 0/4] 'github' importer gracefully handles rate limiting
(name . Ludovic Courtès)(address . ludo@gnu.org)
9ec3d594aaf4e2ed835a15999c07398bb7c96029.camel@telenet.be
Ludovic Courtès schreef op vr 04-03-2022 om 21:45 [+0100]:
Toggle quote (7 lines)
> My concern is that we’d end up caching one or two little files in
> ~/.cache for each candidate package, and (rate limit aside) the overhead
> of dealing with the cache might outweigh the benefits.  I’d rather use
> ‘http-fetch/cached’ for bigger files, like in (guix cve).
>
> WDYT?

If the overhead of caching little files is a concern, then perhaps a
SQLite (or GDBM) database could be used instead of the filesystem-based
cache? The number of packages in Guix was about 150 000 IIRC, if we
assume something around the magnitude of 200 bytes per package, then
we end up with about 29 MiB for the entirity of Guix. And there might
be some opportunities for compression, reducing this number.

Something like this could be left for later though.

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiMxEBccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7rlrAQCEUL9l/X7qI4RTPnAN4QTORipl
HbeWWKeUBjBFdritsAEAx5BXSS1xnYcyioOiI845303n1EX1gat09Nm7oVWT2QY=
=KduL
-----END PGP SIGNATURE-----


M
M
Maxime Devos wrote on 5 Mar 10:48 +0100
Re: [bug#54241] [PATCH 2/4] import: github: Gracefully handle rate limit exhaustion.
61869824d6b6f27ee1b8e71456e9e5c21b3dea5e.camel@telenet.be
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
Toggle quote (2 lines)
> [...]

It would be nice to have some tests for the rate limiting code,
in tests/import-github.scm.

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiMx5BccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7hoaAQDoOMGqBrglGPX1x5Rir5GU6y7/
Km73KN/Sqb+V7vUXmwEAhmqxf4nwrHaDY/pnLI6fqJ8qxBrFZmLI4EdBRjetfQg=
=Cd8J
-----END PGP SIGNATURE-----


M
M
Maxime Devos wrote on 5 Mar 10:48 +0100
5ddbea58affe4bdb9a8c8d138bb25ccac9aca65c.camel@telenet.be
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
Toggle quote (10 lines)
> +(define (update-rate-limit-reset-time! headers)
> +  "Update the rate limit reset time based on HEADERS, the HTTP response
> +headers."
> +  (match (assq-ref headers 'x-ratelimit-reset)
> +    ((= string->number (? number? reset))
> +     (set! %rate-limit-reset-time reset)
> +     reset)
> +    (_
> +     0)))

When can this second case happen?

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiMyCBccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7hDMAQD1e/L4koL7hN9AvjEIS+A8cjr/
OEjrzHutBRxbEhIOHQEAgCCT/Hn14dudn3MxVJAHvYHwZCHVm7+xrbK2haZ9Ngw=
=vjUY
-----END PGP SIGNATURE-----


M
M
Maxime Devos wrote on 5 Mar 10:52 +0100
2321d1fa880d51a5938b14ab0bd006e966fa8ac1.camel@telenet.be
Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
Toggle quote (9 lines)
> +(define (request-rate-limit-reached?)
> +  "Return true if the rate limit has been reached."
> +  (and %rate-limit-reset-time
> +       (match (< (car (gettimeofday)) %rate-limit-reset-time)
> +         (#t #t)
> +         (#f
> +          (set! %rate-limit-reset-time #f)
> +          #f))))

The clocks used by the GitHub server cannot exactly be the clock of the
local Guix (at least, not in a realistic setting). WDYT of adding a
little margin, accounting for the impossibility of clocks exactly
matching and allowing for some clock skew?

(< (car (gettimeofday)) (+ [5 minutes] %rate-limit-reset-time))

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiMy9hccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7pLaAP9so91eVeGOK0n94JDoUpI3IGvO
y8tjJaseJjvG8cnMzgEA4Yn5lGp4unZC1YC6Lh1BNcKAHM7+yjjQRiFiX18chAA=
=AxYS
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 5 Mar 22:58 +0100
Re: bug#54241: [PATCH 0/4] 'github' importer gracefully handles rate limiting
(name . Maxime Devos)(address . maximedevos@telenet.be)
87a6e4njrz.fsf_-_@gnu.org
Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (15 lines)
> Ludovic Courtès schreef op vr 04-03-2022 om 21:45 [+0100]:
>> My concern is that we’d end up caching one or two little files in
>> ~/.cache for each candidate package, and (rate limit aside) the overhead
>> of dealing with the cache might outweigh the benefits.  I’d rather use
>> ‘http-fetch/cached’ for bigger files, like in (guix cve).
>>
>> WDYT?
>
> If the overhead of caching little files is a concern, then perhaps a
> SQLite (or GDBM) database could be used instead of the filesystem-based
> cache? The number of packages in Guix was about 150 000 IIRC, if we
> assume something around the magnitude of 200 bytes per package, then
> we end up with about 29 MiB for the entirity of Guix. And there might
> be some opportunities for compression, reducing this number.

I think this would be going overboard in terms of complexity :-), and it
wouldn’t radically change the run-time overhead (you still potentially
have to do an HTTP round trip with ‘If-Modified-Since’, you’re just
saving a few hundred bytes on the response in the best case.)

Toggle quote (2 lines)
> Something like this could be left for later though.

Yup!

Ludo’.
L
L
Ludovic Courtès wrote on 5 Mar 23:00 +0100
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54241@debbugs.gnu.org)
875yosnjot.fsf_-_@gnu.org
Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (9 lines)
> Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
>> When running "guix refresh" (with no arguments), the 'github' updater
>> gets used until the rate limit has been reached, after which "guix
>> refresh" automatically picks up the next valid updater, typically
>> 'generic-git'.
>
> Wouldn't this make "guix refresh" non-deterministic (though this might
> be an acceptable trade-off)?

Correct. It could be said that ‘guix refresh’ is “non-deterministic”
anyway since its result depends on external services. I think it’s an
acceptable tradeoff.

Ludo’.
L
L
Ludovic Courtès wrote on 5 Mar 23:01 +0100
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54241@debbugs.gnu.org)
871qzgnjns.fsf_-_@gnu.org
Maxime Devos <maximedevos@telenet.be> skribis:
Toggle quote (22 lines)
> Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
>> +  (and (not (request-rate-limit-reached?))
>> +       (guard (c ([...]
>> +                  ;; See
>> +                  ;; <https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting>.
>> +                  (match (assq-ref (http-get-error-headers c)
>> +                                   'x-ratelimit-remaining)
>> +                    (#f
>> +                     (raise c))
>> +                    ((? (compose zero? string->number))
>> +                     (let ((reset (update-rate-limit-reset-time!
>> +                                   (http-get-error-headers c))))
>> +                       (warning (G_ "GitHub rate limit exceeded; \
>> +disallowing requests for ~a seconds~%")
>> +                                (- reset (car (gettimeofday))))
>> +                       (display-hint (G_ "You can waive the rate limit by
>> +setting the @env{GUIX_GITHUB_TOKEN} environment variable to a token obtained
>> +from @url{https://github.com/settings/tokens} with your GitHub account.
>
> IIRC, the GitHub documentation doesn't state that this waives the rate
> limit, rather it increases the rate limit a lot, but there's still a
> limit.
Good point, I’ll adjust the message accordingly.
L
L
Ludovic Courtès wrote on 5 Mar 23:03 +0100
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54241@debbugs.gnu.org)
87wnh8m4zw.fsf_-_@gnu.org
Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (13 lines)
> Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
>> +(define (update-rate-limit-reset-time! headers)
>> +  "Update the rate limit reset time based on HEADERS, the HTTP response
>> +headers."
>> +  (match (assq-ref headers 'x-ratelimit-reset)
>> +    ((= string->number (? number? reset))
>> +     (set! %rate-limit-reset-time reset)
>> +     reset)
>> +    (_
>> +     0)))
>
> When can this second case happen?

I don’t know if it’s supposed to happen. It’s defensive programming:
better keep going than crash if the server starts behaving slightly
differently.
M
M
Maxime Devos wrote on 5 Mar 23:04 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
7d4255f8670a135420ff526262459035adddb0d5.camel@telenet.be
Ludovic Courtès schreef op za 05-03-2022 om 22:58 [+0100]:
Toggle quote (4 lines)
> [...] and it wouldn’t radically change the run-time overhead (you still
> potentially have to do an HTTP round trip with ‘If-Modified-Since’,
> you’re just saving a few hundred bytes on the response in the best case.)

IIUC, when the TTL hasn't been exceeded, then the file from the file
system is served without contacting the remote server at all. So in
the best case, you only ‘round-trip’ to the disk instead of the HTTP
server. So I think there's some potential benefits to be had here.

That assumes a sufficiently large TTL though.

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiPeexccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7vUcAP94Itr7wlL8pLLDKKx7dR1VYbKd
WKECqLQqpwntYDY/CwD8CdKWcCfsYnIcYQNlJsgn0WwmxVIFyUUrYHoQU5W5XA0=
=cbRT
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 5 Mar 23:06 +0100
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54241@debbugs.gnu.org)
87r17gm4v6.fsf_-_@gnu.org
Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (17 lines)
> Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
>> +(define (request-rate-limit-reached?)
>> +  "Return true if the rate limit has been reached."
>> +  (and %rate-limit-reset-time
>> +       (match (< (car (gettimeofday)) %rate-limit-reset-time)
>> +         (#t #t)
>> +         (#f
>> +          (set! %rate-limit-reset-time #f)
>> +          #f))))
>
> The clocks used by the GitHub server cannot exactly be the clock of the
> local Guix (at least, not in a realistic setting). WDYT of adding a
> little margin, accounting for the impossibility of clocks exactly
> matching and allowing for some clock skew?
>
> (< (car (gettimeofday)) (+ [5 minutes] %rate-limit-reset-time))

I don’t think it’s necessary. The worst that can happen is that we
retry too early, get another 403 response, and retry later. (In
practice, on my NTP-synchronized laptop, things just work.)
L
L
Ludovic Courtès wrote on 5 Mar 23:09 +0100
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54241@debbugs.gnu.org)
87mti4m4q9.fsf_-_@gnu.org
Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (3 lines)
> It would be nice to have some tests for the rate limiting code,
> in tests/import-github.scm.

It would but… I think I’ll punt on that one. :-)
M
M
Maxime Devos wrote on 5 Mar 23:11 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
48d960ad6ec09604d9271cfae7d57796b564e1c0.camel@telenet.be
Ludovic Courtès schreef op za 05-03-2022 om 22:58 [+0100]:
Toggle quote (2 lines)
> I think this would be going overboard in terms of complexity :-)

There's some complexity here, but assuming a sufficient amount of
tests, I believe it would be worth it if it allows side-stepping the
rate limit to some degree. And the extra complexity would mostly
disappear if the overhead of tiny files was accepted (*).

There are also some other benefits, e.g. a kind of ‘download
resumption’ but for linters, reducing network traffic after retrying
"guix lint" on a lossy network (or because the terminal tab was closed
to early, etc.).

All stuff that can be left for later though!

Greetings,
Maxime.

(*) Assuming 150 000 packages and 1 KiB per package (this would be
file-system dependent!), I end up with 150 MiB. That's a bit on the
large size though ...
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiPf+RccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7huQAP9s5wWhgz3jLsn+2DSC3Atx+4Xq
eWeeg2PFGrMeFBIYlwEA+2iz94aC0r3JK607TD53EzXK1LYv52tp8nkdrKbnHws=
=3jco
-----END PGP SIGNATURE-----


M
M
Maxime Devos wrote on 5 Mar 23:16 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 54241@debbugs.gnu.org)
c66d948313fd9451e50761fc707fcbcdc6d2bff6.camel@telenet.be
Ludovic Courtès schreef op za 05-03-2022 om 23:03 [+0100]:
Toggle quote (4 lines)
> I don’t know if it’s supposed to happen.  It’s defensive programming:
> better keep going than crash if the server starts behaving slightly
> differently.

That's called total programming I think? From a OOP I'm following:

* total: handle all cases without complaints (no throwing exceptions
or such), assign every case a well-defined (and documented!)
behaviour
* nominal: document the preconditions, but don't bother checking them
* defensive: check inputs, if they are wrong, throw an exception

(it was probably formulated a bit differently but that was the gist of
it)

At least according to this classification, this
'update-rate-limit-reset-time!' would be total (except for the lack of
documentation), not defensive.

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiPhJBccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7s2AAP90etfPQ3L09auXAbp+Kv+8/1Q/
euTM5+gp29o6M+AazQEAg66z8EnIYmmKIFYD2kEjEdmRm3R6auhhmIbgIpmiQwk=
=/aVa
-----END PGP SIGNATURE-----


M
M
Maxime Devos wrote on 5 Mar 23:21 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 54241@debbugs.gnu.org)
73b2e8d49a3c178d06e2d7c2984f499a66e249d4.camel@telenet.be
Ludovic Courtès schreef op za 05-03-2022 om 23:03 [+0100]:
Toggle quote (19 lines)
> Maxime Devos <maximedevos@telenet.be> skribis:
>
> > Ludovic Courtès schreef op do 03-03-2022 om 22:14 [+0100]:
> > > +(define (update-rate-limit-reset-time! headers)
> > > +  "Update the rate limit reset time based on HEADERS, the HTTP response
> > > +headers."
> > > +  (match (assq-ref headers 'x-ratelimit-reset)
> > > +    ((= string->number (? number? reset))
> > > +     (set! %rate-limit-reset-time reset)
> > > +     reset)
> > > +    (_
> > > +     0)))
> >
> > When can this second case happen?
>
> I don’t know if it’s supposed to happen.  It’s defensive programming:
> better keep going than crash if the server starts behaving slightly
> differently.

If it's not supposed to happen, can it at least be reported with a
warning, such that we then know that 'update-rate-limit-reset-time!'
needs to be extended or GitHub needs to be contacted?

FWIW, I think crashing in case of bogus HTTP answers is fine, as long
as it crashes with a _nice_ error message ("guix: error: HTTP server
foo.com returned an unrecoginised X-Ratelimit-Reset $SOME_STRING" or
something like that) instead of some vague backtrace.

Greetings,
Maxime.
-----BEGIN PGP SIGNATURE-----

iI0EABYKADUWIQTB8z7iDFKP233XAR9J4+4iGRcl7gUCYiPiTxccbWF4aW1lZGV2
b3NAdGVsZW5ldC5iZQAKCRBJ4+4iGRcl7iaOAQDLWPqjDOJIM5ceCo62t3ELxIHu
burvTnsk3Y+pMnYLMwD+InZCjUVt40CTioWzHL5moDsQZbljmV7BXn1/SOj5nwQ=
=3JJp
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 6 Mar 18:18 +0100
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54241@debbugs.gnu.org)
874k4bknjh.fsf_-_@gnu.org
Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (8 lines)
> That's called total programming I think? From a OOP I'm following:
>
> * total: handle all cases without complaints (no throwing exceptions
> or such), assign every case a well-defined (and documented!)
> behaviour
> * nominal: document the preconditions, but don't bother checking them
> * defensive: check inputs, if they are wrong, throw an exception

Interesting; I stand corrected!

Toggle quote (4 lines)
> If it's not supposed to happen, can it at least be reported with a
> warning, such that we then know that 'update-rate-limit-reset-time!'
> needs to be extended or GitHub needs to be contacted?

Yes, sounds reasonable.

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 6 Mar 18:21 +0100
(name . Maxime Devos)(address . maximedevos@telenet.be)
87zgm3j8te.fsf_-_@gnu.org
Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (7 lines)
> Ludovic Courtès schreef op za 05-03-2022 om 22:58 [+0100]:
>> I think this would be going overboard in terms of complexity :-)
>
> There's some complexity here, but assuming a sufficient amount of
> tests, I believe it would be worth it if it allows side-stepping the
> rate limit to some degree.

What should also be taken into account is the usefulness of the ‘github’
updater—investment should be proportionate. I suspect it’s much less
useful now that we have the ‘generic-git’ updater. Maybe, maybe it
gives slightly more accurate data in some cases, maybe it can be
slightly faster, but that’s not entirely clear to me.
L
L
Ludovic Courtès wrote on 6 Mar 22:55 +0100
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54241-done@debbugs.gnu.org)
875yoqiw46.fsf_-_@gnu.org
Hello,

I committed an updated version of these patches, taking some of your
suggestions into account:

a8d3033da6 import: github: Reuse HTTP connection for the /tags URL fallback.
8786c2e8d7 http-client: Correctly handle redirects when #:keep-alive? #t.
55e8e283ae import: github: Gracefully handle rate limit exhaustion.
ecad9b2213 http-client: Add response headers to '&http-get-error'.
049aefddb2 tests: Add (guix http-client) tests.

The first patch adds tests for ‘http-fetch’, as you suggested, which
allowed me to find a thinko in the keep-alive patch. The test doesn’t
test keep-alive support though, because the server in (guix tests http)
doesn’t support it currently (I tried to retrofit it based on (web
server http) but that’s not really possible because we’d need to either
access internals of the <http> record type, specifically it’s poll set,
or re-implement it entirely.)

Thanks a lot for your help!

Ludo’.
Closed
?
Your comment

This issue is archived.

To comment on this conversation send email to 54241@debbugs.gnu.org