[PATCH] guix: Support partial download

OpenSubmitted by Julien Lepiller.
Details
2 participants
  • Julien Lepiller
  • Ludovic Courtès
Owner
unassigned
Severity
normal
J
J
Julien Lepiller wrote on 9 Feb 2020 20:23
(address . guix-patches@gnu.org)
20200209202358.17bb4a39@tachikoma.lepiller.eu
Hi Guix!

This patch adds support for partial download of fixed-output
derivations. I'm not very confident it can be pushed as-is, and it has
some shortcomings.

First, I make sure that the guix daemon will not remove previously
failed attempts when trying to build something again, when that is a
fixed-output derivation. Then, I add a Range HTTP header when
performing an HTTP fetch; this ensures that we only query for the part
we don't already have, and append it to the target file.

If a partial download fails, the same mirror/url is tried again, but
the partial file is removed first, ensuring we do a complete fetch this
time around. If that failed too, we try with the following url. If we
only perform a complete fetch, we proceed as usual. The next url will
be a partial fetch if there is already something locally.

The use-case is: I have a very unreliable wifi currently and when
downloading a big source (or substitute, but this patch doesn't address
that use-case), the connection is sometimes dropped in the middle and i
have to fetch everything from scratch. With this patch, the download
resumes.

Some issues that might need to be fixed: progress only shows for the
rest of the file, it would be nicer if it could start again where it
was before (say the connection dropped at 34%, then the progress bar
should start from 34%). When there are at least two urls it goes like
that: fetch a partial file, connection drops. Remove the file and try
again, connection drops. Go to the next mirror, fetch a partial file.
The first mirror restarted the download from the beginning, but we'd
like it not to, and skip to the following mirror instead. When there is
a hash mismatch, the file is fetched twice on a further attempt.

When testing locally with guix build -S ghostscript (and running the
daemon from ./pre-inst-env), the download went fine. Cancelling it in
the middle and restarting it did continue the download instead of
starting again, which is nice :).

However, with that daemon there was a lot of new builds required to run
guix environment guix as my user (and nothing was substituted, which
is weird), whereas with the system's daemon, there was nothing to
build. Maybe there's something fishy in that patch...
From 332793b7f29ea68ac9a1af22e3d1c4745200da7e Mon Sep 17 00:00:00 2001
From: Julien Lepiller <julien@lepiller.eu>
Date: Sun, 9 Feb 2020 19:47:27 +0100
Subject: [PATCH] guix: download: Add partial download support.

* nix/libstore/build.cc (tryToBuild): Do not remove invalid fixed-output
derivations.
* guix/build/download.scm (http-fetch): Add a range argument.
(url-fetch): Performa partial download if a file already exists.
---
guix/build/download.scm | 36 +++++++++++++++++++++++++++---------
nix/libstore/build.cc | 1 +
2 files changed, 28 insertions(+), 9 deletions(-)

Toggle diff (99 lines)
diff --git a/guix/build/download.scm b/guix/build/download.scm
index 0f2d5f402a..c2b710becc 100644
--- a/guix/build/download.scm
+++ b/guix/build/download.scm
@@ -654,11 +654,13 @@ Return the resulting target URI."
                     #:query    (uri-query    ref)
                     #:fragment (uri-fragment ref)))))
 
-(define* (http-fetch uri #:key timeout (verify-certificate? #t))
+(define* (http-fetch uri #:key timeout (verify-certificate? #t) range)
   "Return an input port containing the data at URI, and the expected number of
 bytes available or #f.  When TIMEOUT is true, bail out if the connection could
 not be established in less than TIMEOUT seconds.  When VERIFY-CERTIFICATE? is
-true, verify HTTPS certificates; otherwise simply ignore them."
+true, verify HTTPS certificates; otherwise simply ignore them.  When RANGE is
+a number, it is the number of bytes we want to skip from the data at URI;
+otherwise the full document is requested."
 
   (define headers
     `(;; Some web sites, such as http://dist.schmorp.de, would block you if
@@ -670,6 +672,10 @@ true, verify HTTPS certificates; otherwise simply ignore them."
       ;; Acceptable" when not explicitly told that everything is accepted.
       (Accept . "*/*")
 
+      ,@(if range
+            `((Range . ,(string-append "bytes=" (number->string range) "-")))
+            '())
+
       ;; Basic authentication, if needed.
       ,@(match (uri-userinfo uri)
           ((? string? str)
@@ -690,7 +696,8 @@ true, verify HTTPS certificates; otherwise simply ignore them."
                 ((code)
                  (response-code resp)))
     (case code
-      ((200)                                      ; OK
+      ((200                                       ; OK
+        206)                                      ; Partial content
        (values port (response-content-length resp)))
       ((301                                       ; moved permanently
         302                                       ; found (redirection)
@@ -703,7 +710,8 @@ true, verify HTTPS certificates; otherwise simply ignore them."
          (close connection)
          (http-fetch uri
                      #:timeout timeout
-                     #:verify-certificate? verify-certificate?)))
+                     #:verify-certificate? verify-certificate?
+                     #:range range)))
       (else
        (error "download failed" (uri->string uri)
               code (response-reason-phrase resp))))))
@@ -775,10 +783,15 @@ otherwise simply ignore them."
       ((http https)
        (false-if-exception*
         (let-values (((port size)
-                      (http-fetch uri
-                                  #:verify-certificate? verify-certificate?
-                                  #:timeout timeout)))
-          (call-with-output-file file
+                      (if (file-exists? file)
+                        (http-fetch uri
+                                    #:verify-certificate? verify-certificate?
+                                    #:timeout timeout
+                                    #:range (stat:size (stat file)))
+                        (http-fetch uri
+                                    #:verify-certificate? verify-certificate?
+                                    #:timeout timeout))))
+          (call-with-port (open-file file (if (file-exists? file) "a" "w"))
             (lambda (output)
               (dump-port* port output
                           #:buffer-size %http-receive-buffer-size
@@ -817,7 +830,12 @@ otherwise simply ignore them."
   (let try ((uri (append uri content-addressed-uris)))
     (match uri
       ((uri tail ...)
-       (or (fetch uri file)
+       (or (if (file-exists? file)
+	       ;; If the file already exists, we do a partial fetch for the
+	       ;; remainder.  If something went wrong, we remove the file
+	       ;; and try to fetch the whole file instead.
+               (or (fetch uri file) (begin (delete-file file) (fetch uri file)))
+               (fetch uri file))
            (try tail)))
       (()
        (format (current-error-port) "failed to download ~s from ~s~%"
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index 17e92c68a7..176ab40226 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -1320,6 +1320,7 @@ void DerivationGoal::tryToBuild()
         Path path = i->second.path;
         if (worker.store.isValidPath(path)) continue;
         if (!pathExists(path)) continue;
+	if (fixedOutput) continue;
         debug(format("removing invalid path `%1%'") % path);
         deletePath(path);
     }
-- 
2.24.0
L
L
Ludovic Courtès wrote on 19 Feb 2020 17:04
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 39530@debbugs.gnu.org)
874kvmmulr.fsf@gnu.org
Hi,

Julien Lepiller <julien@lepiller.eu> skribis:

Toggle quote (12 lines)
> First, I make sure that the guix daemon will not remove previously
> failed attempts when trying to build something again, when that is a
> fixed-output derivation. Then, I add a Range HTTP header when
> performing an HTTP fetch; this ensures that we only query for the part
> we don't already have, and append it to the target file.
>
> If a partial download fails, the same mirror/url is tried again, but
> the partial file is removed first, ensuring we do a complete fetch this
> time around. If that failed too, we try with the following url. If we
> only perform a complete fetch, we proceed as usual. The next url will
> be a partial fetch if there is already something locally.

Nice!

Toggle quote (5 lines)
> However, with that daemon there was a lot of new builds required to run
> guix environment guix as my user (and nothing was substituted, which
> is weird), whereas with the system's daemon, there was nothing to
> build. Maybe there's something fishy in that patch...

Hmm, that sounds really weird. Could you clarify what you did?

Toggle quote (5 lines)
>>From 332793b7f29ea68ac9a1af22e3d1c4745200da7e Mon Sep 17 00:00:00 2001
> From: Julien Lepiller <julien@lepiller.eu>
> Date: Sun, 9 Feb 2020 19:47:27 +0100
> Subject: [PATCH] guix: download: Add partial download support.

Nitpick: you can remove “guix:” from the subject.

Toggle quote (5 lines)
> * nix/libstore/build.cc (tryToBuild): Do not remove invalid fixed-output
> derivations.
> * guix/build/download.scm (http-fetch): Add a range argument.
> (url-fetch): Performa partial download if a file already exists.

[...]

Toggle quote (10 lines)
> -(define* (http-fetch uri #:key timeout (verify-certificate? #t))
> +(define* (http-fetch uri #:key timeout (verify-certificate? #t) range)
> "Return an input port containing the data at URI, and the expected number of
> bytes available or #f. When TIMEOUT is true, bail out if the connection could
> not be established in less than TIMEOUT seconds. When VERIFY-CERTIFICATE? is
> -true, verify HTTPS certificates; otherwise simply ignore them."
> +true, verify HTTPS certificates; otherwise simply ignore them. When RANGE is
> +a number, it is the number of bytes we want to skip from the data at URI;
> +otherwise the full document is requested."

I’d suggest to rename #:range to #:offset because it denotes the start
offset.

What response do we get if the server doesn’t support “Range”?

Can servers silently ignore “Range”?

Toggle quote (9 lines)
> + (if (file-exists? file)
> + (http-fetch uri
> + #:verify-certificate? verify-certificate?
> + #:timeout timeout
> + #:range (stat:size (stat file)))
> + (http-fetch uri
> + #:verify-certificate? verify-certificate?
> + #:timeout timeout))))

I’d remove the ‘if’:

(http-fetch …
#:offset (and=> (stat file #f) stat:size))

Toggle quote (8 lines)
> --- a/nix/libstore/build.cc
> +++ b/nix/libstore/build.cc
> @@ -1320,6 +1320,7 @@ void DerivationGoal::tryToBuild()
> Path path = i->second.path;
> if (worker.store.isValidPath(path)) continue;
> if (!pathExists(path)) continue;
> + if (fixedOutput) continue;

Please add a comment above explaining why fixed outputs are not deleted.

Also please: not tabs. :-)

Thanks!

Ludo’.
J
J
Julien Lepiller wrote on 19 Feb 2020 17:25
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 39530@debbugs.gnu.org)
8F45CBA3-9DCE-4BDD-A23E-543F035F87DC@lepiller.eu
Le 19 février 2020 11:04:32 GMT-05:00, "Ludovic Courtès" <ludo@gnu.org> a écrit :
Toggle quote (28 lines)
>Hi,
>
>Julien Lepiller <julien@lepiller.eu> skribis:
>
>> First, I make sure that the guix daemon will not remove previously
>> failed attempts when trying to build something again, when that is a
>> fixed-output derivation. Then, I add a Range HTTP header when
>> performing an HTTP fetch; this ensures that we only query for the
>part
>> we don't already have, and append it to the target file.
>>
>> If a partial download fails, the same mirror/url is tried again, but
>> the partial file is removed first, ensuring we do a complete fetch
>this
>> time around. If that failed too, we try with the following url. If we
>> only perform a complete fetch, we proceed as usual. The next url will
>> be a partial fetch if there is already something locally.
>
>Nice!
>
>> However, with that daemon there was a lot of new builds required to
>run
>> guix environment guix as my user (and nothing was substituted, which
>> is weird), whereas with the system's daemon, there was nothing to
>> build. Maybe there's something fishy in that patch...
>
>Hmm, that sounds really weird. Could you clarify what you did?

Actually I'm not sure how to test the new daemon. I ran the guix daemon from a checkout and probably forgot to set sysconfdir to /etc, so it was missing authorized keys, hence no substitutes.

Toggle quote (73 lines)
>
>>>From 332793b7f29ea68ac9a1af22e3d1c4745200da7e Mon Sep 17 00:00:00
>2001
>> From: Julien Lepiller <julien@lepiller.eu>
>> Date: Sun, 9 Feb 2020 19:47:27 +0100
>> Subject: [PATCH] guix: download: Add partial download support.
>
>Nitpick: you can remove “guix:” from the subject.
>
>> * nix/libstore/build.cc (tryToBuild): Do not remove invalid
>fixed-output
>> derivations.
>> * guix/build/download.scm (http-fetch): Add a range argument.
>> (url-fetch): Performa partial download if a file already exists.
>
>[...]
>
>> -(define* (http-fetch uri #:key timeout (verify-certificate? #t))
>> +(define* (http-fetch uri #:key timeout (verify-certificate? #t)
>range)
>> "Return an input port containing the data at URI, and the expected
>number of
>> bytes available or #f. When TIMEOUT is true, bail out if the
>connection could
>> not be established in less than TIMEOUT seconds. When
>VERIFY-CERTIFICATE? is
>> -true, verify HTTPS certificates; otherwise simply ignore them."
>> +true, verify HTTPS certificates; otherwise simply ignore them. When
>RANGE is
>> +a number, it is the number of bytes we want to skip from the data at
>URI;
>> +otherwise the full document is requested."
>
>I’d suggest to rename #:range to #:offset because it denotes the start
>offset.
>
>What response do we get if the server doesn’t support “Range”?
>
>Can servers silently ignore “Range”?
>
>> + (if (file-exists? file)
>> + (http-fetch uri
>> + #:verify-certificate?
>verify-certificate?
>> + #:timeout timeout
>> + #:range (stat:size (stat file)))
>> + (http-fetch uri
>> + #:verify-certificate?
>verify-certificate?
>> + #:timeout timeout))))
>
>I’d remove the ‘if’:
>
> (http-fetch …
> #:offset (and=> (stat file #f) stat:size))
>
>> --- a/nix/libstore/build.cc
>> +++ b/nix/libstore/build.cc
>> @@ -1320,6 +1320,7 @@ void DerivationGoal::tryToBuild()
>> Path path = i->second.path;
>> if (worker.store.isValidPath(path)) continue;
>> if (!pathExists(path)) continue;
>> + if (fixedOutput) continue;
>
>Please add a comment above explaining why fixed outputs are not
>deleted.
>
>Also please: not tabs. :-)
>
>Thanks!
>
>Ludo’.

Thanks!
L
L
Ludovic Courtès wrote on 19 Feb 2020 23:07
(name . Julien Lepiller)(address . julien@lepiller.eu)(address . 39530@debbugs.gnu.org)
87v9o2kz8c.fsf@gnu.org
Julien Lepiller <julien@lepiller.eu> skribis:

Toggle quote (2 lines)
> Actually I'm not sure how to test the new daemon. I ran the guix daemon from a checkout and probably forgot to set sysconfdir to /etc, so it was missing authorized keys, hence no substitutes.

Ah yes. This should work:

sudo -E ./pre-inst-env guix-daemon --build-users-group=… …

Ludo’.
?