Substitute requests fail if URL has trailing slash

  • Done
Details
6 participants
  • Hartmut Goebel
  • Hartmut Goebel
  • Ludovic Courtès
  • Mark H Weaver
  • Mathieu Othacehe
  • zimoun
Owner
unassigned
Submitted by
Hartmut Goebel
Severity
normal
Hartmut Goebel wrote on 27 Nov 2020 22:19
(name . bug-guix)(address . bug-guix@gnu.org)
3848e5d4-3694-e7f4-cb42-f97a51bde5b4@crazy-compilers.com
If the substitute URL ends with a slash, API requests fail.

Expected behavior:

Substitute URLs with and without a trailing slash should behave the same.
This is especially true for substitute URLs with an empty path
("http://server") and the path "/" ("http://server/"), which the RFCs
explicitly state to be equivalent.

According to RFC 7230, sec 2.7.3 "http and https URI Normalization and
Comparison" [1]:

[…] an empty
path component is equivalent to an absolute path of "/", so the
normal form is to provide a path of "/" instead.

[1] https://tools.ietf.org/html/rfc7230#section-2.7.3



How to reproduce:

no trailing slash:

$ guix weather --substitute-urls="https://ci.guix.gnu.org" gcc-toolchain
  100.0% substitutes available (3 out of 3)

Trailing slash:

$ guix weather --substitute-urls="https://ci.guix.gnu.org/" gcc-toolchain
  0.0% substitutes available (0 out of 3)
  'https://ci.guix.gnu.org//api/queue?nr=1000' returned 400 ("Bad Request")
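
The doubled slash in the failing URL is what plain string concatenation would produce. A minimal Guile sketch of that effect, assuming the request URL is built with string-append (naive-queue-url is a hypothetical helper written for illustration, not a Guix function):

```scheme
(use-modules (web uri))

;; Hypothetical helper mirroring the suspected pattern: plain concatenation.
(define (naive-queue-url base)
  (string-append base "/api/queue?nr=1000"))

(define without-slash (naive-queue-url "https://ci.guix.gnu.org"))
(define with-slash (naive-queue-url "https://ci.guix.gnu.org/"))

(display without-slash) (newline)  ; single slash before "api"
(display with-slash) (newline)     ; doubled slash before "api"

;; The two parsed base URLs differ only in their path component,
;; which RFC 7230 treats as equivalent.
(write (uri-path (string->uri "https://ci.guix.gnu.org")))   ; empty path
(newline)
(write (uri-path (string->uri "https://ci.guix.gnu.org/")))  ; path "/"
(newline)
```

The second displayed URL matches the one in the error message above.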

--
Regards
Hartmut Goebel

| Hartmut Goebel | h.goebel@crazy-compilers.com |
| www.crazy-compilers.com | compilers which you thought are impossible |
zimoun wrote on 28 Nov 2020 00:37
86a6v2qslp.fsf@gmail.com
Dear,

Thank you for the report.


Tweaking the function as follows:

(define (narinfo-request cache-url path)
  "Return an HTTP request for the narinfo of PATH at CACHE-URL."
  (let ((url (string-append cache-url "/" (store-path-hash-part path)
                            ".narinfo"))
        (headers '((User-Agent . "GNU Guile"))))
    (format #t "~%Narinfo request: ~a~%~%" url)
    (build-request (string->uri url) #:method 'GET #:headers headers)))

and removing the cache adequately, then running:

./pre-inst-env guix weather \
--substitute-urls="https://ci.guix.gnu.org/ https://ci.guix.gnu.org" \
hello
computing 1 package derivations for x86_64-linux...

looking for 1 store items on https://ci.guix.gnu.org/...

Narinfo request: https://ci.guix.gnu.org//a462kby1q51ndvxdv3b6p0rsixxrgx1h.narinfo

updating substitutes from 'https://ci.guix.gnu.org/'... 100.0%
https://ci.guix.gnu.org/
0.0% substitutes available (0 out of 1)

[...]

'https://ci.guix.gnu.org//api/queue?nr=1000' returned 400 ("Bad Request")

looking for 1 store items on https://ci.guix.gnu.org...

Narinfo request: https://ci.guix.gnu.org/a462kby1q51ndvxdv3b6p0rsixxrgx1h.narinfo

updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
https://ci.guix.gnu.org
100.0% substitutes available (1 out of 1)

[...]

at least 1,000 queued builds

[...]

build rate: 36.89 builds per hour

[...]


On Fri, 27 Nov 2020 at 22:19, Hartmut Goebel <h.goebel@crazy-compilers.com> wrote:

> According to RFC 7230, sec 2.7.3 "http and https URI Normalization and
> Comparison" [1]:
>
> […] an empty
> path component is equivalent to an absolute path of "/", so the
> normal form is to provide a path of "/" instead.
>
> [1] https://tools.ietf.org/html/rfc7230#section-2.7.3

Now, the question is where should the fix go? “guix publish” exposing
the narinfos or “guix weather”? Or both?


From my understanding, one fix should go to ‘guix publish’ exposing the
narinfos since:


should be a valid URL and return the narinfo file. However, taking this
road, it means that the cache folder will not be the same:

~/.cache/guix/substitute/x2wcz6gz3evwlqcrz3fqstmezkfcfnpfb5kfyxbz7kjikc7upkiq/
~/.cache/guix/substitute/4refhwxbjmeua2kwg2nmzhv4dg4d3dorpjefq7kiciw2pfhaf26a/

for https://ci.guix.gnu.org/ and https://ci.guix.gnu.org, respectively.
Therefore, ‘guix weather’ should be fixed too.


WDYT?

All the best,
simon
Hartmut Goebel wrote on 28 Nov 2020 10:47
7b367712-d759-ba3d-3ffc-d3323cb859c7@crazy-compilers.com
On 28.11.20 at 00:37, zimoun wrote:
> Now, the question is where should the fix go? “guix publish” exposing
> the narinfos or “guix weather”? Or both?

I propose fixing all places where string-append is used to join URLs,
since joining URLs is not the same as string concatenation. We might
restrict our algorithm to only joining a path.
<https://tools.ietf.org/html/rfc3986#section-5.2.2> shows the complete
algorithm, where this is the relevant part for only joining a path
(R.path) to a base URL's path (T.path).

    if (R.path starts-with "/") then
       T.path = remove_dot_segments(R.path);
    else
       T.path = merge(Base.path, R.path);
       T.path = remove_dot_segments(T.path);

(Side note: the Guile module (web uri) lacks corresponding, easy-to-use
functions.)
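
The path-joining part of that algorithm can be sketched in Guile as follows. This is only an illustration under stated assumptions: join-path and merge-paths are hypothetical names, the base URL is assumed to have an authority (host), and remove_dot_segments is deliberately left out, so "." and ".." segments are not handled.

```scheme
(use-modules (web uri))

;; Hypothetical helper: the "merge" step of RFC 3986 section 5.2.3,
;; assuming the base URI has an authority component.
(define (merge-paths base-path ref-path)
  (if (string-null? base-path)
      ;; Empty base path: the result is "/" followed by the reference.
      (string-append "/" ref-path)
      ;; Otherwise keep everything up to and including the last "/".
      (let ((last-slash (string-rindex base-path #\/)))
        (string-append (if last-slash
                           (substring base-path 0 (+ last-slash 1))
                           "")
                       ref-path))))

;; Hypothetical helper: join REF-PATH to the base URL string BASE.
;; Dot-segment removal is omitted in this sketch.
(define (join-path base ref-path)
  (let ((base-uri (string->uri base)))
    (build-uri (uri-scheme base-uri)
               #:userinfo (uri-userinfo base-uri)
               #:host (uri-host base-uri)
               #:port (uri-port base-uri)
               #:path (if (string-prefix? "/" ref-path)
                          ref-path
                          (merge-paths (uri-path base-uri) ref-path)))))

(display (uri->string (join-path "https://ci.guix.gnu.org" "abc.narinfo")))
(newline)
(display (uri->string (join-path "https://ci.guix.gnu.org/" "abc.narinfo")))
(newline)
;; Both calls print https://ci.guix.gnu.org/abc.narinfo
```

One consequence of the RFC's merge rule is worth keeping in mind: for a base such as "https://example.org/cache" (no trailing slash), a relative reference replaces the last segment, so "cache" is dropped, whereas "https://example.org/cache/" keeps it.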

--
Regards
Hartmut Goebel

| Hartmut Goebel | h.goebel@crazy-compilers.com |
| www.crazy-compilers.com | compilers which you thought are impossible |
Ludovic Courtès wrote on 3 Dec 2020 18:01
(name . Hartmut Goebel)(address . h.goebel@crazy-compilers.com)
87im9iddta.fsf@gnu.org
Hi,

Hartmut Goebel <h.goebel@crazy-compilers.com> writes:

> I propose fixing all places where string-append is used to join URLs,
> since joining URLs is not the same as string concatenation. We might
> restrict our algorithm to only joining a
> path. <https://tools.ietf.org/html/rfc3986#section-5.2.2> shows the
> complete algorithm, where this is the relevant part for only joining a
> path (R.path) to a base URL's path (T.path).
>
> if (R.path starts-with "/") then
> T.path = remove_dot_segments(R.path);
> else
> T.path = merge(Base.path, R.path);
> T.path = remove_dot_segments(T.path);

To begin with, we could define ‘url-append’ in (guix http-client), say,
and use it in (guix scripts substitute).

Eventually it would be nice to have that in (web uri).
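
For illustration only, such a helper might look roughly like the sketch below. The name url-append comes from the suggestion above, but its signature and behaviour here are assumptions, not an existing (guix http-client) procedure. Unlike full RFC 3986 resolution, it simply normalizes the slashes at the seams.

```scheme
;; Hypothetical sketch of 'url-append': join BASE and COMPONENTS with
;; exactly one "/" at each seam.  Slashes at either end of a component
;; are trimmed, so a trailing slash on the last component is lost.
(define (url-append base . components)
  (string-join (cons (string-trim-right base #\/)
                     (map (lambda (component)
                            (string-trim-both component #\/))
                          components))
               "/"))

(display (url-append "https://ci.guix.gnu.org/" "/api/queue"))
(newline)
(display (url-append "https://ci.guix.gnu.org" "api" "queue"))
(newline)
;; Both calls print https://ci.guix.gnu.org/api/queue
```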

Thoughts?

Ludo’.
Mark H Weaver wrote on 4 Dec 2020 05:15
(address . 44906@debbugs.gnu.org)
87zh2udx4j.fsf@netris.org
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Hartmut Goebel <h.goebel@crazy-compilers.com> writes:
>
>> I propose fixing all places where string-append is used to join URLs,
>> since joining URLs is not the same as string concatenation. We might
>> restrict our algorithm to only joining a
>> path. <https://tools.ietf.org/html/rfc3986#section-5.2.2> shows the
>> complete algorithm, where this is the relevant part for only joining a
>> path (R.path) to a base URL's path (T.path).
>>
>> if (R.path starts-with "/") then
>> T.path = remove_dot_segments(R.path);
>> else
>> T.path = merge(Base.path, R.path);
>> T.path = remove_dot_segments(T.path);
>
> To begin with, we could define ‘url-append’ in (guix http-client), say,
> and use it in (guix scripts substitute).
>
> Eventually it would be nice to have that in (web uri).

Note that 'resolve-uri-reference' in (guix build download) implements
the algorithm specified in RFC 3986 section 5.2.2, for purposes of
supporting HTTP redirects. Perhaps some of that code will be useful.

Regards,
Mark
Hartmut Goebel wrote on 9 Jul 2021 10:38
[PATCH 0/3] Properly construct URLs if base-url has trailing slash
cover.1625818877.git.h.goebel@crazy-compilers.com
Here is an attempt to solve the issue. It only had to be fixed in
guix/substitutes.scm and guix/ci.scm. In guix/scripts/publish.scm I did
not spot any place where wrong URLs are constructed.

Many thanks to Mark for pointing to 'resolve-uri-reference'.

Regarding CI: I did some tests, so these changes should work. However, I did
not find a test suite for fully testing this part.

Hartmut Goebel (3):
  substitute: Fix handling of short option "-h".
  substitutes: Properly construct URLs.
  ci: Properly construct URLs.

 guix/ci.scm                 | 79 +++++++++++++++++++++----------------
 guix/scripts/substitute.scm |  2 +-
 guix/substitutes.scm        | 13 +++---
 3 files changed, 55 insertions(+), 39 deletions(-)

--
2.30.2
Hartmut Goebel wrote on 9 Jul 2021 10:38
[PATCH 1/3] substitute: Fix handling of short option "-h".
d9544a84ebfc8a7bf3bc628aa15ab8c81b4a7bbd.1625819848.git.h.goebel@crazy-compilers.com
The short option was listed in the help-text, but not recognized.
---
guix/scripts/substitute.scm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 03115ffe44..c044e1d47a 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -777,7 +777,7 @@ default value."
                (loop))))))
        ((or ("-V") ("--version"))
         (show-version-and-exit "guix substitute"))
-       (("--help")
+       ((or ("-h") ("--help"))
         (show-help))
        (opts
         (leave (G_ "~a: unrecognized options~%") opts))))))
--
2.30.2
Hartmut Goebel wrote on 9 Jul 2021 10:38
[PATCH 2/3] substitutes: Properly construct URLs.
26c8bb31c4d468e770e35cd743aca736f5ccd093.1625819848.git.h.goebel@crazy-compilers.com
Use relative URIs and "resolve-uri-reference" (which implements the algorithm
specified in RFC 3986 section 5.2.2) for building the URL, instead of just
appending strings. This avoids issues if the cache-url ends with a slash.

* guix/substitutes.scm (narinfo-request): Use resolve-uri-reference for
constructing the url.
---
guix/substitutes.scm | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/guix/substitutes.scm b/guix/substitutes.scm
index 4987cda165..a5c554acff 100644
--- a/guix/substitutes.scm
+++ b/guix/substitutes.scm
@@ -37,7 +37,8 @@
   #:use-module ((guix build utils) #:select (mkdir-p dump-port))
   #:use-module ((guix build download)
                 #:select ((open-connection-for-uri
-                           . guix:open-connection-for-uri)))
+                           . guix:open-connection-for-uri)
+                          resolve-uri-reference))
   #:use-module (guix progress)
   #:use-module (ice-9 rdelim)
   #:use-module (ice-9 regex)
@@ -155,10 +156,12 @@ indicates that PATH is unavailable at CACHE-URL."
 
 (define (narinfo-request cache-url path)
   "Return an HTTP request for the narinfo of PATH at CACHE-URL."
-  (let ((url (string-append cache-url "/" (store-path-hash-part path)
-                            ".narinfo"))
-        (headers '((User-Agent . "GNU Guile"))))
-    (build-request (string->uri url) #:method 'GET #:headers headers)))
+  (let* ((base (string->uri cache-url))
+         (ref (build-relative-ref
+               #:path (string-append (store-path-hash-part path) ".narinfo")))
+         (url (resolve-uri-reference ref base))
+         (headers '((User-Agent . "GNU Guile"))))
+    (build-request url #:method 'GET #:headers headers)))
 
 (define (narinfo-from-file file url)
   "Attempt to read a narinfo from FILE, using URL as the cache URL. Return #f
--
2.30.2
Hartmut Goebel wrote on 9 Jul 2021 10:38
[PATCH 3/3] ci: Properly construct URLs.
d576a732f6bca1a530116a8665f72c413f8b3c0f.1625819848.git.h.goebel@crazy-compilers.com
Implement a new function "api-url", which constructs URLs using relative URIs
and "resolve-uri-reference" (which implements the algorithm specified in RFC
3986 section 5.2.2) for building the URL, instead of just appending
strings. This avoids issues if the server-url ends with a slash.

Since "api-url" uses URI-objects, it makes sense to also construct the
query-part of the URL here. For this "api-url" accepts optional
key-value-pairs.

New function "json-api-fetch" is a wrapper using "api-url".

* guix/ci.scm (api-url): New function. (build): Use it.
(json-api-fetch): New function. (queued-builds, latest-builds,
evaluation, latest-evaluations, evaluation-jobs): Use it.
---
guix/ci.scm | 79 +++++++++++++++++++++++++++++++----------------------
1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/guix/ci.scm b/guix/ci.scm
index dde93bbd53..cf39744567 100644
--- a/guix/ci.scm
+++ b/guix/ci.scm
@@ -20,9 +20,12 @@
 (define-module (guix ci)
   #:use-module (guix http-client)
   #:use-module (guix utils)
+  #:use-module ((guix build download)
+                #:select (resolve-uri-reference))
   #:use-module (json)
   #:use-module (srfi srfi-1)
   #:use-module (ice-9 match)
+  #:use-module (web uri)
   #:use-module (guix i18n)
   #:use-module (guix diagnostics)
   #:autoload (guix channels) (channel)
@@ -146,16 +149,41 @@
   ;; Max number of builds requested in queries.
   1000)
 
+(define* (api-url base-url path #:rest query)
+  "Build a proper API url, taking into account BASE_URL's trailing slashes."
+
+  (define (build-query-string query)
+    (let lp ((query (or (reverse query) '())) (acc '()))
+      (match query
+        (() (string-concatenate acc))
+        (((_ #f) . rest) (lp rest acc))
+        (((name val) . rest)
+         (lp rest (cons*
+                   name "="
+                   (if (string? val) (uri-encode val) (number->string val))
+                   (if (null? acc) "" "&")
+                   acc))))))
+
+  (let* ((query-string (build-query-string query))
+         (base (string->uri base-url))
+         (ref (build-relative-ref #:path path #:query query-string)))
+    (resolve-uri-reference ref base)))
+
+
 (define (json-fetch url)
   (let* ((port (http-fetch url))
          (json (json->scm port)))
     (close-port port)
     json))
 
+(define* (json-api-fetch base-url path #:rest query)
+  (json-fetch (apply api-url base-url path query)))
+
+
 (define* (queued-builds url #:optional (limit %query-limit))
   "Return the list of queued derivations on URL."
-  (let ((queue (json-fetch (string-append url "/api/queue?nr="
-                                          (number->string limit)))))
+  (let ((queue
+         (json-api-fetch url "/api/queue" `("nr" ,limit))))
     (map json->build (vector->list queue))))
 
 (define* (latest-builds url #:optional (limit %query-limit)
@@ -163,28 +191,21 @@
   "Return the latest builds performed by the CI server at URL. If EVALUATION
 is an integer, restrict to builds of EVALUATION. If SYSTEM is true (a system
 string such as \"x86_64-linux\"), restrict to builds for SYSTEM."
-  (define* (option name value #:optional (->string identity))
-    (if value
-        (string-append "&" name "=" (->string value))
-        ""))
-
-  (let ((latest (json-fetch (string-append url "/api/latestbuilds?nr="
-                                           (number->string limit)
-                                           (option "evaluation" evaluation
-                                                   number->string)
-                                           (option "system" system)
-                                           (option "job" job)
-                                           (option "status" status
-                                                   number->string)))))
+  (let ((latest (json-api-fetch
+                 url "/api/latestbuilds"
+                 `("nr" ,limit)
+                 `("evaluation" ,evaluation)
+                 `("system" ,system)
+                 `("job" ,job)
+                 `("status" ,status))))
     ;; Note: Hydra does not provide a "derivation" field for entries in
     ;; 'latestbuilds', but Cuirass does.
     (map json->build (vector->list latest))))
 
 (define (evaluation url evaluation)
   "Return the given EVALUATION performed by the CI server at URL."
-  (let ((evaluation (json-fetch
-                     (string-append url "/api/evaluation?id="
-                                    (number->string evaluation)))))
+  (let ((evaluation
+         (json-api-fetch url "/api/evaluation" `("id" ,evaluation))))
     (json->evaluation evaluation)))
 
 (define* (latest-evaluations url
@@ -192,16 +213,10 @@ string such as \"x86_64-linux\"), restrict to builds for SYSTEM."
                              #:key spec)
   "Return the latest evaluations performed by the CI server at URL. If SPEC
 is passed, only consider the evaluations for the given SPEC specification."
-  (let ((spec (if spec
-                  (format #f "&spec=~a" spec)
-                  "")))
-    (map json->evaluation
-         (vector->list
-          (json->scm
-           (http-fetch
-            (string-append url "/api/evaluations?nr="
-                           (number->string limit)
-                           spec)))))))
+  (map json->evaluation
+       (vector->list
+        (json-api-fetch
+         url "/api/evaluations" `("nr" ,limit) `("spec" ,spec)))))
 
 (define* (evaluations-for-commit url commit #:optional (limit %query-limit))
   "Return the evaluations among the latest LIMIT evaluations that have COMMIT
@@ -216,16 +231,14 @@ as one of their inputs."
   "Return the list of jobs of evaluation EVALUATION-ID."
   (map json->job
        (vector->list
-        (json->scm (http-fetch
-                    (string-append url "/api/jobs?evaluation="
-                                   (number->string evaluation-id)))))))
+        (json-api-fetch url "/api/jobs" `("evaluation" ,evaluation-id)))))
 
 (define (build url id)
   "Look up build ID at URL and return it. Raise &http-get-error if it is not
 found (404)."
   (json->build
-   (http-fetch (string-append url "/build/" ;note: no "/api" here
-                              (number->string id)))))
+   (http-fetch (api-url url (string-append "/build/" ;note: no "/api" here
+                                           (number->string id))))))
 
 (define (job-build url job)
   "Return the build associated with JOB."
--
2.30.2
Mathieu Othacehe wrote on 15 Jul 2021 09:35
(name . Hartmut Goebel)(address . h.goebel@crazy-compilers.com)
871r80do70.fsf@gnu.org
Hello Hartmut,

Thanks for this patchset!

> +(define* (api-url base-url path #:rest query)
> + "Build a proper API url, taking into account BASE_URL's trailing slashes."

s/BASE_URL/BASE-URL/

You could also indicate what the expected format for query is: '("name"
"value") lists.

> + (((_ #f) . rest) (lp rest acc))
> + (((name val) . rest)
> + (lp rest (cons*
> + name "="
> + (if (string? val) (uri-encode val) (number->string val))

What about booleans? False is filtered above but true will throw an
exception.

> + (resolve-uri-reference ref base)))
> +
> +

There's an extra new line here.

> +(define* (json-api-fetch base-url path #:rest query)
> + (json-fetch (apply api-url base-url path query)))
> +
> +

Here also.

Otherwise, it looks nice :)

Thanks,

Mathieu
Hartmut Goebel wrote on 16 Jul 2021 19:55
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
2d5afdf0-5e6b-b949-057e-2a1cb24b9703@crazy-compilers.com
Hi Mathieu,
thanks for the review. I updated the doc-string, fixed the other parts
and pushed as 3ee0f170c8bd883728d8abb2c2e00f445c13f17d.

> What about booleans? False is filtered above but true will throw an
> exception.

False is used to omit elements from the query-string.

Booleans and other types are not handled, since this low-level function
doesn't know how to convert them into a string to be put into the query.
#t could be "1", "t", "true", depending on the API used.
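
To make that behaviour concrete, here is a small standalone sketch, not the code that was pushed: query-string is a hypothetical name, and it only mirrors the rules described above (a value of #f drops the pair, strings are URI-encoded, numbers are converted, anything else is rejected).

```scheme
(use-modules (ice-9 match) (srfi srfi-1) (web uri))

;; Hypothetical helper: build a query string from ("name" value) lists.
(define (query-string pairs)
  (string-join
   (filter-map
    (match-lambda
      ;; #f means "omit this parameter".
      ((_ #f) #f)
      ((name (? string? val))
       (string-append name "=" (uri-encode val)))
      ((name (? number? val))
       (string-append name "=" (number->string val))))
    ;; Any other value type (for example #t) matches no clause,
    ;; so 'match-lambda' raises an error.
    pairs)
   "&"))

(display (query-string '(("nr" 10) ("system" "x86_64-linux") ("job" #f))))
(newline)
;; prints nr=10&system=x86_64-linux
```

A caller that needs a boolean parameter would convert it to the string its API expects ("1", "true", ...) before passing it in.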

--
Regards
Hartmut Goebel

| Hartmut Goebel | h.goebel@crazy-compilers.com |
| www.crazy-compilers.com | compilers which you thought are impossible |
Hartmut Goebel wrote on 16 Jul 2021 19:58
close
(name . control)(address . control@debbugs.gnu.org)
be36da6a-b14b-3d13-0a39-9c3af71ed82f@goebel-consult.de
close 44906 49483 49482