Channel clones lack SWH fallback

  • Done
Details
2 participants
  • Ludovic Courtès
  • zimoun
Owner
unassigned
Submitted by
zimoun
Severity
important
zimoun wrote on 24 Oct 2020 00:17
wishlist: time-machine --channel falls back to SWH
(address . bug-guix@gnu.org)
86pn581t9s.fsf@gmail.com
Dear,

Let’s describe the use case. Consider that:

guix time-machine -C channels -- install foo

is provided in some documentation, say a scientific paper, where the
channels.scm file is completely described:

(list (channel
        (name 'kikoo)
        (url "https://example.org/that-great.git")
        (commit
          "353bdae32f72b720c7ddd706576ccc40e2b43f95")))

In the future, if https://example.org/that-great.git disappears, then
building/installing the package ’foo’ becomes difficult, if not impossible.

However, suppose that the repo ’that-great’ had been saved in SWH
(say, manually). Since it is a regular Git repo, Guix should be able to
fall back to it transparently.
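A minimal sketch of the "transparent fallback" idea, in Guile Scheme like the rest of this thread: try each source in turn and keep the first one that succeeds. The procedure names (fetch-with-fallback, fetch-from-upstream, fetch-from-archive) are hypothetical stand-ins, not Guix APIs; a real fetcher would clone from a URL or query the SWH archive.

```scheme
(use-modules (srfi srfi-1)    ;any
             (srfi srfi-13))  ;string-take

;; Hypothetical: a "fetcher" takes a commit ID and returns a checkout
;; location (a string) on success, or #f on failure.
(define (fetch-with-fallback commit fetchers)
  "Try each procedure in FETCHERS on COMMIT, in order, and return the first
non-#f result, or #f if all of them fail."
  (any (lambda (fetch) (fetch commit)) fetchers))

;; Stand-ins: the upstream host is gone, but an archive kept a copy.
(define (fetch-from-upstream commit)
  #f)

(define (fetch-from-archive commit)
  (string-append "/tmp/archive-copy-" (string-take commit 7)))

(display (fetch-with-fallback "353bdae32f72b720c7ddd706576ccc40e2b43f95"
                              (list fetch-from-upstream fetch-from-archive)))
(newline)
;; prints /tmp/archive-copy-353bdae
```

Ordering matters here: the archive is only consulted once upstream has failed, so the common case (upstream still alive) costs nothing extra.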


Obviously, another wish would be something to ease submitting a save
request for the channel to SWH. Maybe this could be part of the
oft-discussed “guix channel” subcommand. :-)


All the best,
simon
Ludovic Courtès wrote on 5 Mar 2021 15:14
control message for bug #44187
(address . control@debbugs.gnu.org)
878s71lmbq.fsf@gnu.org
severity 44187 important
quit
Ludovic Courtès wrote on 5 Mar 2021 15:14
(address . control@debbugs.gnu.org)
877dmllmas.fsf@gnu.org
retitle 44187 Channel clones lack SWH fallback
quit
Ludovic Courtès wrote on 5 Mar 2021 15:51
Re: bug#44187: wishlist: time-machine --channel falls back to SWH
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 44187@debbugs.gnu.org)
87pn0dk61v.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> Let’s describe the use case. Consider that:
>
> guix time-machine -C channels -- install foo
>
> is provided in some documentation, say scientific paper. Where the
> channels.scm file is completly described:
>
> (list (channel
> (name 'kikoo)
> (url "https://example.org/that-great.git")
> (commit
> "353bdae32f72b720c7ddd706576ccc40e2b43f95")))
>
> In the future, if https://example.org/that-great.git disappears, then
> build/install the package ’foo’ is becoming difficult, nor impossible.
>
> However, let’s consider that the repo ’that-great’ had been saved in SWH
> (say manually); since it is a regular Git repo. Guix should be able to
> fallback to it transparently.

I went head-down to add SWH fallback to ‘latest-repository-commit’… but
that’s of no use because (guix channels) wants a complete clone so that
it can determine commit relations (to detect downgrades).
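To illustrate why a flat checkout is not enough, here is a toy model of downgrade detection: deciding whether moving from one commit to another goes backwards requires walking the commit graph, which only a clone with history provides. The graph below is an invented alist, not data read from a real repository, and the procedure names are made up for illustration.

```scheme
(use-modules (ice-9 match))

;; Invented commit graph: each entry is (COMMIT PARENT ...).
(define %commit-graph
  '(("c3" "c2")
    ("c2" "c1")
    ("c1")))

(define (commit-parents commit)
  (match (assoc commit %commit-graph)
    ((_ . parents) parents)
    (#f '())))

(define (ancestor? old new)
  "Return #t if OLD is NEW itself or reachable from NEW through parents."
  (let loop ((pending (list new))
             (seen '()))
    (match pending
      (() #f)
      ((head . rest)
       (cond ((string=? head old) #t)
             ((member head seen) (loop rest seen))
             (else (loop (append (commit-parents head) rest)
                         (cons head seen))))))))

(define (downgrade? current target)
  "Moving from CURRENT to TARGET is a downgrade if TARGET is a strict
ancestor of CURRENT."
  (and (not (string=? current target))
       (ancestor? target current)))
```

With only a checkout of a single commit and no history, there would be no parent links to follow, so downgrade? could not tell these cases apart.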

The SWH vault gives access to checkouts primarily, but it’s also
possible to get a full repo in ‘git fast-import’ format, which is what
we need:


However, according to the SWH developers, this API will eventually be
replaced by some other solution, possibly a bare Git repo export, so it
may not be a good idea to build upon it.

If we were able, using the SWH API, to map “revisions” to “origins”, we
could find potential mirrors hosting a given commit, but apparently
that’s not possible.

To be continued…

Ludo’.
diff --git a/guix/git.scm b/guix/git.scm
index a5103547d3..449011c51a 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -32,6 +32,7 @@
#:use-module (guix records)
#:use-module (guix gexp)
#:use-module (guix sets)
+ #:autoload (guix swh) (swh-download)
#:use-module ((guix diagnostics) #:select (leave))
#:use-module (guix progress)
#:use-module (rnrs bytevectors)
@@ -459,22 +460,43 @@ Log progress and checkout info to LOG-PORT."
(eq? 'regular (stat:type stat))))))
(format log-port "updating checkout of '~a'...~%" url)
- (let*-values
- (((checkout commit _)
- (update-cached-checkout url
- #:recursive? recursive?
- #:ref ref
- #:cache-directory
- (url-cache-directory url cache-directory
- #:recursive?
- recursive?)
- #:log-port log-port))
- ((name)
- (url+commit->name url commit)))
- (format log-port "retrieved commit ~a~%" commit)
- (values (add-to-store store name #t "sha256" checkout
- #:select? (negate dot-git?))
- commit)))
+
+ (catch 'git-error
+ (lambda ()
+ (let*-values
+ (((checkout commit _)
+ (update-cached-checkout (pk 'l-r-c url)
+ #:recursive? recursive?
+ #:ref ref
+ #:cache-directory
+ (url-cache-directory url cache-directory
+ #:recursive?
+ recursive?)
+ #:log-port log-port))
+ ((name)
+ (url+commit->name url commit)))
+ (format log-port "retrieved commit ~a~%" commit)
+ (values (add-to-store store name #t "sha256" checkout
+ #:select? (negate dot-git?))
+ commit)))
+ (lambda (key err . rest)
+ ;; XXX: 'swh-download' currently doesn't support submodules.
+ (when recursive?
+ (apply throw key err rest))
+
+ (pk 'err key err rest)
+ (match ref
+ (('commit . commit)
+ ;; Attempt to fetch COMMIT from SWH.
+ (call-with-temporary-directory
+ (lambda (directory)
+ (unless (swh-download url commit directory)
+ (apply throw key err rest))
+ (values (add-to-store store (url+commit->name url commit)
+ #t "sha256" directory)
+ commit))))
+ (_
+ (apply throw key err rest))))))
(define (print-git-error port key args default-printer)
(match args
Ludovic Courtès wrote on 10 Sep 2021 16:34
[PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones
(address . 44187@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210910143415.14783-1-ludo@gnu.org
Hi!

A bit of context: we already had automatic SWH fallback for Git checkouts,
which is to say that any origin that uses ‘git-fetch’ would have its
checkout transparently fetched from SWH if upstream vanished (this
dates back to commit 608d3dca89d73fe7260e97a284a8aeea756a3e11, Nov. 2018).

What this patch series provides is SWH fallback for full Git clones (as
opposed to flat checkouts). It works for anything that uses (guix git).
That includes <git-checkout>, used by transformation options:

$ ./pre-inst-env guix build footswitch --with-git-url=footswitch=http://example.org/sdf --with-commit=footswitch=1eabc563ca5692b3e08d84f1f0e6fd2283284469 -n
updating checkout of 'http://example.org/sdf'...
SWH: found revision 1eabc563ca5692b3e08d84f1f0e6fd2283284469 with directory at 'https://archive.softwareheritage.org/api/1/directory/ad8976564375ee55f645387bbcdf4b66e6582fbf/'
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/HEAD
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/branches/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/config
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/description
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/applypatch-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/commit-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/fsmonitor-watchman.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/post-update.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-applypatch.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-commit.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-push.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-rebase.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/pre-receive.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/prepare-commit-msg.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/hooks/update.sample
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/exclude
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/info/refs
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/info/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/info/packs
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/pack-ed28f44a2599fe2d0a5f1b1a84c247c43afd14a1.idx
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/objects/pack/pack-ed28f44a2599fe2d0a5f1b1a84c247c43afd14a1.pack
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/heads/
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/heads/master
swh:1:rev:1eabc563ca5692b3e08d84f1f0e6fd2283284469.git/refs/tags/
retrieved commit 1eabc563ca5692b3e08d84f1f0e6fd2283284469
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://bayfront.guix.gnu.org'... 100.0%
The following derivation would be built:
/gnu/store/39kzsy5kgj5150q6zgckc2hbxp999adw-footswitch-git.1eabc56.drv

In the example above, we pass a bogus Git URL, but since the target
commit is known, (guix git) automatically fetches a bare Git repository
from the SWH vault.
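The decision behind this fallback can be sketched as a pure function: only a reference that names a specific commit (or something that may be one) is worth looking up on SWH, and only when the upstream URL is unreachable. This simplification represents Guile-Git error classes as the symbols http and net, which is an assumption made to keep the sketch self-contained; the actual patch inspects git-error-class values, and swh-fallback-commit is a hypothetical name.

```scheme
(use-modules (ice-9 match))

;; ERROR-CLASS stands in for a Guile-Git error class: 'http (404 or
;; similar), 'net (unknown host, etc.), or any other symbol.
(define (swh-fallback-commit ref error-class)
  "Return the commit to request from SWH for REF, or #f if the original
error should be re-raised instead."
  (match ref
    (((or 'commit 'tag-or-commit) . commit)
     (and (memq error-class '(http net))
          commit))
    (_ #f)))
```

A branch reference never takes this path: a branch is a moving pointer, so there is no single archived revision it designates.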

It also works for channels, which is what zimoun reported here:

$ cat /tmp/chan.scm
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "f91ae9425bb385b60396a544afe27933896b8fa3")
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))
      (channel
        (name 'guix-past)
        (url "https://does-not-exist.inria.fr/guix-hpc/guix-past")
        (commit "77e183dc7ade307ad3409fad4b71f12e266de910")
        #;(introduction
          (make-channel-introduction
            "0c119db2ea86a389769f4d2b9c6f5c41c027e336"
            (openpgp-fingerprint
              "3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5")))))
$ ./pre-inst-env guix time-machine -C /tmp/chan.scm -- describe
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Updating channel 'guix-past' from Git repository at 'https://does-not-exist.inria.fr/guix-hpc/guix-past'...
SWH: found revision 77e183dc7ade307ad3409fad4b71f12e266de910 with directory at 'https://archive.softwareheritage.org/api/1/directory/7c6aa10e1e0fa54199566145c6a453731872b87d/'
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/HEAD
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/branches/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/config
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/description
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/hooks/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/exclude
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/info/refs
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/info/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/info/packs
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/pack-e6c0a4813509178eed735708dd60503353a50b9c.idx
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/objects/pack/pack-e6c0a4813509178eed735708dd60503353a50b9c.pack
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/heads/
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/heads/master
swh:1:rev:77e183dc7ade307ad3409fad4b71f12e266de910.git/refs/tags/
Computing Guix derivation for 'x86_64-linux'... \ C-c C-c

Here, the ‘guix-past’ channel is transparently cloned from SWH. This
is pretty cool, because having the whole repo around is what permits
things like downgrade prevention¹ and news support².

Finally, we can enjoy content-addressability, and brittle URLs
are becoming a thing of the past!*


Limitations
~~~~~~~~~~~~

Yes, there are a couple of them.

First, fallback is implemented only for fresh clones, not for updates.
Thus, if I rerun the first example with a different commit, now that the
clone is in ~/.cache/guix/checkouts, I get:

$ ./pre-inst-env guix build footswitch --with-git-url=footswitch=http://example.org/sdf --with-commit=footswitch=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -n
updating checkout of 'http://example.org/sdf'...
guix build: error: Git failure while fetching http://example.org/sdf: unexpected http status code: 404

Second, clones from SWH only contain the one branch that the revision
is on. For channels, that means that the ‘keyring’ branch is not fetched,
which is why I commented out ‘introduction’ in /tmp/chan.scm above.
If I uncomment it, I get:

$ ./pre-inst-env guix time-machine -C /tmp/chan.scm -- describe
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Updating channel 'guix-past' from Git repository at 'https://does-not-exist.inria.fr/guix-hpc/guix-past'...
guix time-machine: error: Git error: cannot locate remote-tracking branch 'origin/keyring'

The SWH folks tell me it’ll eventually be possible to map a revision
to its containing snapshot(s) via the HTTP API, and to obtain entire
snapshots (i.e., the repo and all its branches) from the vault. That’s
what we need to fix this issue.

*Third, and this answers the asterisk above, we must keep in mind that
this is content-addressability *with SHA1*. Generating a chosen-prefix
collision is becoming affordable³, so users absolutely need an additional
mechanism to authenticate code they fetched.

For origins, we have the content SHA256, so we’re fine. For channels,
we have Guix’s authentication mechanism¹, except it’s not available yet
via SWH, as I wrote above. For the footswitch example above using
‘--with-commit’, we don’t have any authentication method, but in fact,
that’s the situation of Git repositories in general: they can rarely be
authenticated.

Overall, I think it’s a step in the right direction.

Thoughts?

Thanks to vlorentz and olasd on #swh-devel for their support!

Thanks,
Ludo’.


Ludovic Courtès (3):
swh: Support downloads of bare Git repositories.
git: 'update-cached-checkout' can fall back to SWH when cloning.
git: 'reference-available?' recognizes 'tag-or-commit'.

guix/git.scm | 45 +++++++++++++++++++++++++++++++++++++++++++--
guix/swh.scm | 52 ++++++++++++++++++++++++++++++++++++++++------------
2 files changed, 83 insertions(+), 14 deletions(-)

--
2.33.0
Ludovic Courtès wrote on 10 Sep 2021 16:34
[PATCH 1/3] swh: Support downloads of bare Git repositories.
(address . 44187@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
20210910143415.14783-2-ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

* guix/swh.scm (swh-download-archive): New procedure.
(swh-download-directory): Rewrite in terms of 'swh-download-archive'.
(swh-download): Add #:archive-type and honor it. Use
'swh-download-archive' instead of 'swh-download-directory'.
---
guix/swh.scm | 52 ++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/guix/swh.scm b/guix/swh.scm
index 3d5d2a410a..707551a799 100644
--- a/guix/swh.scm
+++ b/guix/swh.scm
@@ -645,20 +645,29 @@ delete it when leaving the dynamic extent of this call."
(lambda ()
(false-if-exception (delete-file-recursively tmp-dir))))))
-(define* (swh-download-directory id output
- #:key (log-port (current-error-port)))
- "Download from Software Heritage the directory with the given ID, and
-unpack it to OUTPUT. Return #t on success and #f on failure"
+(define* (swh-download-archive swhid output
+ #:key
+ (archive-type 'flat)
+ (log-port (current-error-port)))
+ "Download from Software Heritage the directory or revision with the given
+SWHID, in the ARCHIVE-TYPE format (one of 'flat or 'git-bare), and unpack it to
+OUTPUT. Return #t on success and #f on failure."
(call-with-temporary-directory
(lambda (directory)
- (match (vault-fetch id 'directory #:log-port log-port)
+ (match (vault-fetch swhid
+ #:archive-type archive-type
+ #:log-port log-port)
(#f
(format log-port
- "SWH: directory ~a could not be fetched from the vault~%"
- id)
+ "SWH: object ~a could not be fetched from the vault~%"
+ swhid)
#f)
((? port? input)
- (let ((tar (open-pipe* OPEN_WRITE "tar" "-C" directory "-xzvf" "-")))
+ (let ((tar (open-pipe* OPEN_WRITE "tar" "-C" directory
+ (match archive-type
+ ('flat "-xzvf") ;gzipped
+ ('git-bare "-xvf")) ;uncompressed
+ "-")))
(dump-port input tar)
(close-port input)
(let ((status (close-pipe tar)))
@@ -672,6 +681,14 @@ unpack it to OUTPUT. Return #t on success and #f on failure"
#:log (%make-void-port "w"))
#t))))))))
+(define* (swh-download-directory id output
+ #:key (log-port (current-error-port)))
+ "Download from Software Heritage the directory with the given ID, and
+unpack it to OUTPUT. Return #t on success and #f on failure."
+ (swh-download-archive (string-append "swh:1:dir:" id) output
+ #:archive-type 'flat
+ #:log-port log-port))
+
(define (commit-id? reference)
"Return true if REFERENCE is likely a commit ID, false otherwise---e.g., if
it is a tag name. This is based on a simple heuristic so use with care!"
@@ -679,8 +696,11 @@ it is a tag name. This is based on a simple heuristic so use with care!"
(string-every char-set:hex-digit reference)))
(define* (swh-download url reference output
- #:key (log-port (current-error-port)))
- "Download from Software Heritage a checkout of the Git tag or commit
+ #:key
+ (archive-type 'flat)
+ (log-port (current-error-port)))
+ "Download from Software Heritage a checkout (if ARCHIVE-TYPE is 'flat) or a
+full Git repository (if ARCHIVE-TYPE is 'git-bare) of the Git tag or commit
REFERENCE originating from URL, and unpack it in OUTPUT. Return #t on success
and #f on failure.
@@ -694,7 +714,15 @@ wait until it becomes available, which could take several minutes."
(format log-port "SWH: found revision ~a with directory at '~a'~%"
(revision-id revision)
(swh-url (revision-directory-url revision)))
- (swh-download-directory (revision-directory revision) output
- #:log-port log-port))
+ (swh-download-archive (match archive-type
+ ('flat
+ (string-append
+ "swh:1:dir:" (revision-directory revision)))
+ ('git-bare
+ (string-append
+ "swh:1:rev:" (revision-id revision))))
+ output
+ #:archive-type archive-type
+ #:log-port log-port))
(#f
#f)))
--
2.33.0
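The two places where ARCHIVE-TYPE matters in this patch can be isolated as follows: which SWHID is requested from the vault (the revision's root directory for a flat checkout, the revision itself for a bare repository) and which tar flags unpack the result. This is a sketch extracted for illustration; vault-swhid and tar-arguments are hypothetical names, and the IDs in the tests come from the footswitch log earlier in the thread.

```scheme
(use-modules (ice-9 match))

;; REVISION-ID and DIRECTORY-ID are plain hexadecimal strings.
(define (vault-swhid archive-type revision-id directory-id)
  "Return the SWHID to request from the vault for ARCHIVE-TYPE."
  (match archive-type
    ('flat     (string-append "swh:1:dir:" directory-id))  ;checkout only
    ('git-bare (string-append "swh:1:rev:" revision-id)))) ;whole repo

(define (tar-arguments archive-type directory)
  "Return the command line that unpacks the vault bundle into DIRECTORY,
reading the archive from standard input."
  (list "tar" "-C" directory
        (match archive-type
          ('flat "-xzvf")      ;gzipped tarball
          ('git-bare "-xvf"))  ;uncompressed tarball
        "-"))
```

This matches the "swh:1:rev:...git/" prefix visible in the transcripts above: a git-bare bundle is named after the revision, not after its directory.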
Ludovic Courtès wrote on 10 Sep 2021 16:34
[PATCH 3/3] git: 'reference-available?' recognizes 'tag-or-commit'.
(address . 44187@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210910143415.14783-4-ludo@gnu.org
* guix/git.scm (reference-available?): Handle 'tag-or-commit' with a
40-digit hex string.
---
guix/git.scm | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/guix/git.scm b/guix/git.scm
index 377e09888a..33a111b84a 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -36,7 +36,7 @@
#:use-module (guix sets)
#:use-module ((guix diagnostics) #:select (leave))
#:use-module (guix progress)
- #:autoload (guix swh) (swh-download)
+ #:autoload (guix swh) (swh-download commit-id?)
#:use-module (rnrs bytevectors)
#:use-module (ice-9 format)
#:use-module (ice-9 match)
@@ -340,7 +340,8 @@ dynamic extent of EXP."
"Return true if REF, a reference such as '(commit . \"cabba9e\"), is
definitely available in REPOSITORY, false otherwise."
(match ref
- (('commit . commit)
+ ((or ('commit . commit)
+ ('tag-or-commit . (? commit-id? commit)))
(let ((len (string-length commit))
(oid (string->oid commit)))
(false-if-git-not-found
--
2.33.0
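The effect of the new match clause can be tried in isolation. The commit-id? below follows the heuristic described in the commit message (a 40-digit hexadecimal string) and is written here for illustration rather than copied from (guix swh); commit-to-look-up is a hypothetical name.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-13))  ;string-every

(define (commit-id? reference)
  "Heuristic: REFERENCE looks like a full SHA1 commit ID."
  (and (= 40 (string-length reference))
       (string-every char-set:hex-digit reference)))

(define (commit-to-look-up ref)
  "Return the commit ID that REF designates unambiguously, or #f."
  (match ref
    ((or ('commit . commit)
         ('tag-or-commit . (? commit-id? commit)))
     commit)
    (_ #f)))
```

A tag name such as "v1.2.3" falls through to the catch-all clause and yields #f, which keeps the check conservative.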
Ludovic Courtès wrote on 10 Sep 2021 16:34
[PATCH 2/3] git: 'update-cached-checkout' can fall back to SWH when cloning.
(address . 44187@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludovic.courtes@inria.fr)
20210910143415.14783-3-ludo@gnu.org
From: Ludovic Courtès <ludovic.courtes@inria.fr>

Reported by zimoun <zimon.toutoune@gmail.com>.

* guix/git.scm (GITERR_HTTP): New variable.
(clone-from-swh, clone/swh-fallback): New procedures.
(update-cached-checkout): Use 'clone/swh-fallback' instead of 'clone*'.
---
guix/git.scm | 42 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/guix/git.scm b/guix/git.scm
index acc48fd12f..377e09888a 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -36,6 +36,7 @@
#:use-module (guix sets)
#:use-module ((guix diagnostics) #:select (leave))
#:use-module (guix progress)
+ #:autoload (guix swh) (swh-download)
#:use-module (rnrs bytevectors)
#:use-module (ice-9 format)
#:use-module (ice-9 match)
@@ -180,6 +181,13 @@ the 'SSL_CERT_FILE' and 'SSL_CERT_DIR' environment variables."
(lambda args
(make-fetch-options auth-method)))))
+(define GITERR_HTTP
+ ;; Guile-Git <= 0.5.2 lacks this constant.
+ (let ((errors (resolve-interface '(git errors))))
+ (if (module-defined? errors 'GITERR_HTTP)
+ (module-ref errors 'GITERR_HTTP)
+ 34)))
+
(define (clone* url directory)
"Clone git repository at URL into DIRECTORY. Upon failure,
make sure no empty directory is left behind."
@@ -342,6 +350,38 @@ definitely available in REPOSITORY, false otherwise."
(_
#f)))
+(define (clone-from-swh url tag-or-commit output)
+ "Attempt to clone TAG-OR-COMMIT (a string), which originates from URL, using
+a copy archived at Software Heritage."
+ (call-with-temporary-directory
+ (lambda (bare)
+ (and (swh-download url tag-or-commit bare
+ #:archive-type 'git-bare)
+ (let ((repository (clone* bare output)))
+ (remote-set-url! repository "origin" url)
+ repository)))))
+
+(define (clone/swh-fallback url ref cache-directory)
+ "Like 'clone', but fallback to Software Heritage if the repository cannot be
+found at URL."
+ (define (inaccessible-url-error? err)
+ (let ((class (git-error-class err))
+ (code (git-error-code err)))
+ (or (= class GITERR_HTTP) ;404 or similar
+ (= class GITERR_NET)))) ;unknown host, etc.
+
+ (catch 'git-error
+ (lambda ()
+ (clone* url cache-directory))
+ (lambda (key err)
+ (match ref
+ (((or 'commit 'tag-or-commit) . commit)
+ (if (inaccessible-url-error? err)
+ (or (clone-from-swh url commit cache-directory)
+ (throw key err))
+ (throw key err)))
+ (_ (throw key err))))))
+
(define cached-checkout-expiration
;; Return the expiration time procedure for a cached checkout.
;; TODO: Honor $GUIX_GIT_CACHE_EXPIRATION.
@@ -408,7 +448,7 @@ it unchanged."
(let* ((cache-exists? (openable-repository? cache-directory))
(repository (if cache-exists?
(repository-open cache-directory)
- (clone* url cache-directory))))
+ (clone/swh-fallback url ref cache-directory))))
;; Only fetch remote if it has not been cloned just before.
(when (and cache-exists?
(not (reference-available? repository ref)))
--
2.33.0
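The GITERR_HTTP definition uses a common Guile pattern for supporting several versions of a dependency: look the symbol up in the module's public interface at load time and fall back to a hard-coded value when it is missing. Here is the same pattern as a reusable procedure; module-constant is a hypothetical name, and (srfi srfi-1) is used only because it ships with Guile, which makes the example easy to run.

```scheme
(define (module-constant module-name symbol default)
  "Return the value of SYMBOL exported by MODULE-NAME, or DEFAULT if this
version of the module does not export it."
  (let ((interface (resolve-interface module-name)))
    (if (module-defined? interface symbol)
        (module-ref interface symbol)
        default)))

;; 'xcons' is exported by SRFI-1; 'no-such-binding' is not.
(module-constant '(srfi srfi-1) 'xcons #f)           ;a procedure
(module-constant '(srfi srfi-1) 'no-such-binding 34) ;=> 34
```

Resolving the value once in a top-level define, as the patch does, means the lookup happens when (guix git) is loaded rather than each time an error is inspected.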
zimoun wrote on 13 Sep 2021 18:07
Re: bug#44187: [PATCH 0/3] Fall back to Software Heritage (SWH) for Git clones
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44187@debbugs.gnu.org)
CAJ3okZ3-6dXBjampGu9UTtY8f3xnmKoqGsJvsfu1-wrSZ2zUZQ@mail.gmail.com
Hi Ludo,

Cool! However, the patch does not apply on top of 53f54d4aa2.
That's why the '--base' option of "git format-patch" is really helpful. ;-)

On top of which commit does the patch set apply? I'd like to try and review it. :-)

Cheers,
simon
Ludovic Courtès wrote on 14 Sep 2021 15:37
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 44187@debbugs.gnu.org)
871r5r5ldr.fsf@gnu.org
Hello,

zimoun <zimon.toutoune@gmail.com> skribis:

> Cool! However, the patch does not apply on the top of 53f54d4aa2.
> That's why the option '--base' of "git format-patch" is really helpful. ;-)

Ah! It should apply on top of ff613c2b68aac539262822490448e637d8f315ba.

If not, I can rebase it and send an updated patch (I’ve been fiddling
with code in this area lately…).

Thanks,
Ludo’.
zimoun wrote on 17 Sep 2021 10:02
Re: bug#44187: Channel clones lack SWH fallback
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44187@debbugs.gnu.org)
86o88r1vfe.fsf@gmail.com
Hi,

On Fri, 10 Sep 2021 at 16:34, Ludovic Courtès <ludo@gnu.org> wrote:

> Finally we can enjoy content-addressability and brittle URLs
> are becoming a thing of the past!*

Yeah, it is awesome!

The original URL of the channel was:
defines a package where the upstream has also disappeared
package definition is not bogus… but using one was already working. :-)

Everything is saved on SWH, so now it is all transparent! From my point
of view, this is a killer feature for scientists. :-)

$ cat /tmp/channels.scm
(list (channel
        (name 'guix)
        (url "/home/sitour/src/guix/guix")
        (branch "fix-44187")
        (commit
          "cdea76a2fdaf7705583a02081a6468d436b8df05"))
      (channel
        (name 'example)
        (url "https://example.org/foo.git")
        (commit
          "67c9f2143aa6f545419ae913b4ae02af4cd3effc")))

$ ./pre-inst-env guix time-machine -C /tmp/channels.scm --disable-authentication -- build hi
Updating channel 'guix' from Git repository at '/home/sitour/src/guix/guix'...
guix time-machine: warning: channel authentication disabled
Updating channel 'example' from Git repository at 'https://example.org/foo.git'...
SWH: found revision 67c9f2143aa6f545419ae913b4ae02af4cd3effc with directory at 'https://archive.softwareheritage.org/api/1/directory/fe423e88ce277d3fc230c88d408e42b14a3a458c/'
SWH vault: requested bundle cooking, waiting for completion...
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/HEAD
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/branches/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/config
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/description
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/hooks/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/info/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/info/exclude
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/info/refs
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/info/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/info/packs
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/pack/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/pack/pack-4e9279a1b64e4dda7bd9d84bb6b50bb1f80def08.idx
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/objects/pack/pack-4e9279a1b64e4dda7bd9d84bb6b50bb1f80def08.pack
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/refs/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/refs/heads/
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/refs/heads/master
swh:1:rev:67c9f2143aa6f545419ae913b4ae02af4cd3effc.git/refs/tags/
guix time-machine: warning: channel authentication disabled

[...]

Computing Guix derivation for 'x86_64-linux'... -

[...]

building /gnu/store/6g9qlysbbk7p4609xrv82j0wzbib1y4r-git-checkout.drv...
guile: warning: failed to install locale
environment variable `PATH' set to `/gnu/store/378zjf2kgajcfd7mfr98jn5xyc5wa3qv-gzip-1.10/bin:/gnu/store/sf3rbvb6iqcphgm1afbplcs72hsywg25-tar-1.32/bin'
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint: git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint: git branch -m <name>
Initialized empty Git repository in /gnu/store/884nsva9r8wkp40kbqyvpj1ad57jc5dd-git-checkout/.git/
fatal: could not read Username for 'https://github.com': No such device or address
Failed to do a shallow fetch; retrying a full fetch...
fatal: could not read Username for 'https://github.com': No such device or address
git-fetch: '/gnu/store/5vai7bfrfkzv22dx13bxpszjrqyi78x6-git-minimal-2.33.0/bin/git fetch origin' failed with exit code 128
Trying content-addressed mirror at berlin.guix.gnu.org...
Trying content-addressed mirror at berlin.guix.gnu.org...
Trying to download from Software Heritage...
SWH: found revision e1eefd033b8a2c4c81babc6fde08ebb116c6abb8 with directory at 'https://archive.softwareheritage.org/api/1/directory/c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/'
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/ABOUT-NLS
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/AUTHORS
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/COPYING

[...]

swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/tests/hello-1
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/tests/last-1
swh:1:dir:c3e538ed2de412d54c567ed7c8cfc46cbbc35d07/tests/traditional-1
successfully built /gnu/store/6g9qlysbbk7p4609xrv82j0wzbib1y4r-git-checkout.drv
building /gnu/store/jx1r7w8xaw768176pjl0j0q1l1529w75-hi-2.10.drv...
starting phase `set-SOURCE-DATE-EPOCH'
phase `set-SOURCE-DATE-EPOCH' succeeded after 0.0 seconds

[...]

successfully built /gnu/store/jx1r7w8xaw768176pjl0j0q1l1529w75-hi-2.10.drv
/gnu/store/jn8d031zx4znxy7s5zhj4dbr6xjsfq9v-hi-2.10

Well, it still lacks fallback for tarballs and non-Git fetch methods;
with that, the story would be more than awesome! :-)

> Limitations
> ~~~~~~~~~~~~
>
> Yes, there’s a couple of them.

Well, yes, some limitations, but not that many. ;-)


> First, fallback is implemented only for fresh clones, not for updates.
> Thus, if I rerun the first example, having now the clone in
> ~/.cache/guix/checkouts, with a different commit, I get:

SWH is not a forge but an archive. :-) Therefore, this update case does
not make sense to me. I mean,

$ git -C ~/.cache/guix/checkouts/6k7wvrcpbdsw3pje5b4squybw3jfn3viyrj7gcl7fipa5yjflaza fetch
fatal: repository 'http://example.org/sdf/' not found

Well, maybe this cache could be removed when the commit is not found
inside it, and the fetch retried from SWH. Obviously, the downgrade
case works.
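The suggestion above amounts to one more clause in the decision made when a cached checkout already exists. Here is a hypothetical decision table; cache-action and the returned symbols are made up for illustration. The last clause is the proposal, while the others approximate the behavior described in this thread.

```scheme
(define (cache-action cache-exists? commit-in-cache? upstream-reachable?)
  "Return a symbol naming what to do to obtain the requested commit."
  (cond ((not cache-exists?) 'clone-with-swh-fallback) ;fresh clone (patch 2/3)
        (commit-in-cache? 'use-cache)                   ;nothing to fetch
        (upstream-reachable? 'fetch-from-upstream)      ;regular update
        ;; Proposed: today this case fails with a Git error instead.
        (else 'drop-cache-and-clone-from-swh)))
```

Dropping the cache is safe in this scenario because everything it contained is, by assumption, also reachable from the archived revision being requested.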

Note that on a fresh clone, the error message could be improved:

$ ./pre-inst-env guix build guix --with-git-url=guix=https://example.org --with-commit=guix=ff613c2b68aac539262822490448e637d8f315ba -n
updating checkout of 'https://example.org'...
guix build: error: Git failure while fetching https://example.org: unexpected http status code: 404

where https://example.org is bogus and
ff613c2b68aac539262822490448e637d8f315ba is not yet archived on SWH. It
would be nice to warn, in addition to the 404, that the commit was not
found on SWH either. WDYT?
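One way to do that would be to append a second line whenever the SWH lookup also comes up empty. The procedure name and wording below are hypothetical; they only show the shape such a message could take.

```scheme
(define (fetch-failure-message url upstream-error commit)
  "Return an error message mentioning both the upstream failure and the
unsuccessful Software Heritage lookup for COMMIT."
  (string-append
   (format #f "Git failure while fetching ~a: ~a~%" url upstream-error)
   (format #f "hint: commit ~a was not found on Software Heritage either~%"
           commit)))

(display (fetch-failure-message "https://example.org"
                                "unexpected http status code: 404"
                                "ff613c2b68aac539262822490448e637d8f315ba"))
```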


> Second, clones from SWH only contain the one branch that the revision
> is on. For channels, that means that the ‘keyring’ branch is not fetched,
> which is why I commented out ‘introduction’ in /tmp/chan.scm above.

To me, it is not an issue, because you reach a commit from the past by
knowing its hash.

Opinion aside, I wanted to know what kind of metadata we get back
from the Git repo, so I tried:

$ guix build guix --with-git-url=guix=https://example.org --with-commit=guix=c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada -n
updating checkout of 'https://example.org'...
SWH: found revision c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada with directory at 'https://archive.softwareheritage.org/api/1/directory/ca2e8a7222b4850c7bea935dff86b9c2a905efd6/'
SWH vault: requested bundle cooking, waiting for completion...
SWH vault: Processing...
[...]

Then, after several hours, I got this:

SWH vault: failure: Internal Server Error. This incident will be reported.
SWH vault: retrying...
SWH vault: requested bundle cooking, waiting for completion...
SWH vault: Processing...

And after more than 12 hours, the status is still «SWH vault: Processing...»
and nothing has completed.

About this ’keyring’ branch: it could somehow be treated as a separate
repo, so why not actually do that? :-) I mean, take the branch as it is,
mirror it in another Git repo saved on SWH, and fall back to it if the
’keyring’ branch is not there. I do not know… Or simply wait for SWH to
improve things on their side. :-)


> *Third, and this answers the asterisk above, we must keep in mind that
> this is content-addressability *with SHA1*. Generating a chosen-prefix
> collision is becoming affordable³, so users absolutely need an additional
> mechanism to authenticate code they fetched.
>
> For origins, we have the content SHA256, so we’re fine. For channels,
> we have Guix’s authentication mechanism¹, except it’s not available yet
> via SWH, as I wrote above. For the footswitch example above using
> ‘--with-commit’, we don’t have any authentication method, but in fact,
> that’s the situation of Git repositories in general: they can rarely be
> authenticated.

How could a chosen-prefix attack work here? I understand why a
second-preimage attack is an issue. But I do not see how a SHA-1
chosen-prefix attack could be exploited here to compromise the user,
because the hash is provided by that very same user.


> Ludovic Courtès (3):
> swh: Support downloads of bare Git repositories.
> git: 'update-cached-checkout' can fall back to SWH when cloning.
> git: 'reference-available?' recognizes 'tag-or-commit'.

LGTM!

Cheers,
simon
zimoun wrote on 17 Sep 2021 19:31
(name . Ludovic Courtès)(address . ludo@gnu.org)
874kajglbo.fsf_-_@gmail.com
Hi Ludo,

The patch LGTM, although there is some redundancy, from my understanding.

On Fri, 10 Sep 2021 at 16:34, Ludovic Courtès <ludo@gnu.org> wrote:

> @@ -694,7 +714,15 @@ wait until it becomes available, which could take several minutes."
> (format log-port "SWH: found revision ~a with directory at '~a'~%"
> (revision-id revision)
> (swh-url (revision-directory-url revision)))
> - (swh-download-directory (revision-directory revision) output
> - #:log-port log-port))
> + (swh-download-archive (match archive-type
> + ('flat
> + (string-append
> + "swh:1:dir:" (revision-directory revision)))
> + ('git-bare
> + (string-append
> + "swh:1:rev:" (revision-id revision))))

Here the ’swhid’ depends on the ’archive-type’…

> + output
> + #:archive-type archive-type

…which is also passed. Then this is propagated. For instance,
’swh-download-directory’:

> +(define* (swh-download-directory id output
> + #:key (log-port (current-error-port)))
> + "Download from Software Heritage the directory with the given ID, and
> +unpack it to OUTPUT. Return #t on success and #f on failure."
> + (swh-download-archive (string-append "swh:1:dir:" id) output
> + #:archive-type 'flat
> + #:log-port log-port))
> +

Does it make sense to pass this ’swhid’ equal to ’swh:1:rev’ with the
’flat’ archive-type? Another instance is,

> + (match (vault-fetch swhid
> + #:archive-type archive-type
> + #:log-port log-port)

and from my understanding, again ’swhid’ depends on ’archive-type’.
Therefore, it is error-prone. The best option seems to be to pass
’(archive-type . swhid)’ and pattern-match on that. Yeah, it potentially
breaks the public API… but there is no claim of stability (and I am not
convinced this (guix swh) module is used outside Guix :-)).
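To illustrate the suggestion above, here is a minimal Guile sketch, assuming the single argument pairs the archive type with a bare SHA1 so that the SWHID prefix is derived from the type instead of being passed separately. The procedure name 'archive-request' is hypothetical and not part of (guix swh); it only shows the pattern-matching idea.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper, not part of (guix swh).  SPEC is a pair
;; (ARCHIVE-TYPE . HASH) where HASH is a bare hexadecimal SHA1.  The
;; SWHID prefix is derived from ARCHIVE-TYPE, so the two cannot disagree.
(define (archive-request spec)
  "Return the SWHID to request from the vault for SPEC."
  (match spec
    (('flat . directory-hash)
     ;; A 'flat' bundle holds the contents of a directory.
     (string-append "swh:1:dir:" directory-hash))
    (('git-bare . revision-hash)
     ;; A 'git-bare' bundle is a bare Git repository for a revision.
     (string-append "swh:1:rev:" revision-hash))
    (_
     (error "unsupported archive request:" spec))))
```

For instance, with the revision from the log earlier in this thread, (archive-request '(git-bare . "c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada")) returns "swh:1:rev:c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada". With this shape, a 'flat' request for a revision or a 'git-bare' request for a directory simply cannot be written; the cost is the API change mentioned above.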



Cheers,
simon
Ludovic Courtès wrote on 18 Sep 2021 12:05
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 44187@debbugs.gnu.org)
8735q2urjv.fsf@gnu.org
Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

> The patch LGTM, although there is some redundancy, from my understanding.
>
> On Fri, 10 Sep 2021 at 16:34, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> @@ -694,7 +714,15 @@ wait until it becomes available, which could take several minutes."
>> (format log-port "SWH: found revision ~a with directory at '~a'~%"
>> (revision-id revision)
>> (swh-url (revision-directory-url revision)))
>> - (swh-download-directory (revision-directory revision) output
>> - #:log-port log-port))
>> + (swh-download-archive (match archive-type
>> + ('flat
>> + (string-append
>> + "swh:1:dir:" (revision-directory revision)))
>> + ('git-bare
>> + (string-append
>> + "swh:1:rev:" (revision-id revision))))
>
> Here the ’swhid’ depends on the ’archive-type’…
>
>> + output
>> + #:archive-type archive-type
>
> …which is also passed. Then this is propagated. For instance,
> ’swh-download-directory’:
>
>> +(define* (swh-download-directory id output
>> + #:key (log-port (current-error-port)))
>> + "Download from Software Heritage the directory with the given ID, and
>> +unpack it to OUTPUT. Return #t on success and #f on failure."
>> + (swh-download-archive (string-append "swh:1:dir:" id) output
>> + #:archive-type 'flat
>> + #:log-port log-port))
>> +
>
> Does it make sense to pass this ’swhid’ equal to ’swh:1:rev’ with the
> ’flat’ archive-type? Another instance is,
>
>> + (match (vault-fetch swhid
>> + #:archive-type archive-type
>> + #:log-port log-port)
>
> and from my understanding, again ’swhid’ depends on ’archive-type’.
> Therefore, it is error-prone.

‘git-bare’ only makes sense for a revision, not a directory, but I
wonder if ‘flat’ can be used for a revision (in which case it’d be
equivalent to getting the corresponding directory)?

I agree there’s some redundancy between directory/revision and
flat/git-bare, but it’s the SWH API that looks like this, so I’d be
tempted to just keep it as is. Maybe we could ask for guidance on
#swh-devel.

Thanks!

Ludo’.
zimoun wrote on 18 Sep 2021 12:27
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44187@debbugs.gnu.org)
CAJ3okZ3cVgLopG5GMFXJVomxKWCfgjmWxVt1cC0oV_S32ewOTw@mail.gmail.com
Hi,

On Sat, 18 Sept 2021 at 12:05, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:

> > Does it make sense to pass this ’swhid’ equal to ’swh:1:rev’ with the
> > ’flat’ archive-type? Another instance is,

[...]

> > and from my understanding, again ’swhid’ depends on ’archive-type’.
> > Therefore, it is error-prone.
>
> ‘git-bare’ only makes sense for a revision, not a directory, but I

So it does not seem possible to form a 'swhid' as "swh:1:dir" and pass
'archive-type' as 'git-bare', and conversely with 'swh:1:rev' and
'flat'. Right?
I have not tried though. :-)
If so, it means that both arguments, 'swhid' and 'archive-type', are
linked, so the function should accept only one unified argument and
not two independent ones. IMHO.

> wonder if ‘flat’ can be used for a revision (in which case it’d be
> equivalent to getting the corresponding directory)?
>
> I agree there’s some redundancy between directory/revision and
> flat/git-bare, but it’s the SWH API that looks like this, so I’d be
> tempted to just keep it as is. Maybe we could ask for guidance on
> #swh-devel.

Well, let's postpone the refactoring. :-) However, if it works as I
understand, then refactoring seems the correct way, so I would not
accept a backward-compatibility argument. ;-)

Have a nice week-end,
simon
Ludovic Courtès wrote on 18 Sep 2021 23:10
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 44187-done@debbugs.gnu.org)
87pmt5si8b.fsf@gnu.org
Hello!

zimoun <zimon.toutoune@gmail.com> skribis:

> The original URL of the channel was:
> <https://github.com/zimoun/channel-example.git>. And this channel
> defines a package where the upstream has also disappeared
> <https://github.com/zimoun/hello-example.git>. Note the URL in the
> package definition is not bogus… but using one was already working. :-)
>
> All is saved on SWH, so now all is transparent! From my point of view,
> this is a killer feature for scientific folks. :-)

Yay! Great that you came up with a nice example to test it on!

>> First, fallback is implemented only for fresh clones, not for updates.
>> Thus, if I rerun the first example, having now the clone in
>> ~/.cache/guix/checkouts, with a different commit, I get:
>
> SWH is not a forge but an archive. :-) Therefore, this update case does
> not make sense to me. I mean,
>
> $ git -C ~/.cache/guix/checkouts/6k7wvrcpbdsw3pje5b4squybw3jfn3viyrj7gcl7fipa5yjflaza fetch
> fatal: repository 'http://example.org/sdf/' not found

Right, that’s a reasonable limitation.

> Well, maybe this cache could be removed when the commit is not found in
> it, and the fetch retried from SWH. Obviously, the downdate case works.

It’s still useful to keep it cached around in case the user is going to
use it several times in a row.
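The policy agreed on above (fall back to SWH on a fresh clone, but not when updating an existing cached checkout) can be modelled in a few lines. This is a simplified sketch, not the actual 'update-cached-checkout' code: it assumes each fetch attempt is a thunk returning a checkout directory on success and #f on failure, and the names 'fetch-with-fallback', 'fetch-upstream' and 'fetch-from-swh' are hypothetical.

```scheme
;; Hypothetical model of the fallback policy discussed in this thread,
;; not the real (guix git) code.  FETCH-UPSTREAM and FETCH-FROM-SWH are
;; thunks returning a checkout directory (a string) or #f on failure.
(define (fetch-with-fallback fresh-clone? fetch-upstream fetch-from-swh)
  (or (fetch-upstream)        ; always try the original repository first
      (and fresh-clone?       ; consult SWH only when nothing is cached yet
           (fetch-from-swh))))
```

Keeping the cached clone around, as said above, means repeated use of the same commit never needs the network; the price is that an update against a vanished upstream fails instead of reaching SWH.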

> Note that on a fresh clone, the error message could be improved:
>
> $ ./pre-inst-env guix build guix --with-git-url=guix=https://example.org --with-commit=guix=ff613c2b68aac539262822490448e637d8f315ba -n
> updating checkout of 'https://example.org'...
> guix build: error: Git failure while fetching https://example.org: unexpected http status code: 404
>
>
> where https://example.org is bogus and
> ff613c2b68aac539262822490448e637d8f315ba is not yet archived on SWH. It
> could be nice to warn in addition to the 404 that it is not found in
> SWH. WDYT?

Agreed; I’ve made this change (actually ‘swh-download’ prints something
upon failure since commit 60b42bec8413aa9844e625fb1903257f1bc1e55c, but
it looks more like a debugging message.)

> $ guix build guix --with-git-url=guix=https://example.org --with-commit=guix=c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada -n
> updating checkout of 'https://example.org'...
> SWH: found revision c75b30d58f0becb0a5cd6a8bfe69d1063b0d1ada with directory at 'https://archive.softwareheritage.org/api/1/directory/ca2e8a7222b4850c7bea935dff86b9c2a905efd6/'
> SWH vault: requested bundle cooking, waiting for completion...
> SWH vault: Processing...
> [...]
>
>
> then after several hours, I get this:
>
> SWH vault: failure: Internal Server Error. This incident will be reported.
> SWH vault: retrying...
> SWH vault: requested bundle cooking, waiting for completion...
> SWH vault: Processing...
>
> and after more than 12h, the status is still: «SWH vault: Processing...»
> and nothing is complete.

Did it eventually succeed? We obviously have no guarantee as to how
long it might take to cook a bundle.

> About this ’keyring’ branch: it could somehow be treated as a separate
> repo, so why not actually do that? :-) I mean, take the branch as it is,
> mirror it in another Git repo saved on SWH, and fall back to it if the
> ’keyring’ branch is not there. I do not know… Or simply wait for SWH to
> improve things on their side. :-)

Yeah, they’re planning to support it eventually.

>> *Third, and this answers the asterisk above, we must keep in mind that
>> this is content-addressability *with SHA1*. Generating a chosen-prefix
>> collision is becoming affordable³, so users absolutely need an additional
>> mechanism to authenticate code they fetched.

[...]

> How could a chosen-prefix attack work here? I understand why a
> second-preimage attack is an issue. But I do not see how a SHA-1
> chosen-prefix attack could be exploited here to compromise the user,
> because the hash is provided by that very same user.

I think you’re right, it’s rather second-preimage attacks that would be
a serious problem. My point is: as time passes, assuming that a SHA1
resolves to a single revision on SWH is becoming more and more
questionable.

>> swh: Support downloads of bare Git repositories.
>> git: 'update-cached-checkout' can fall back to SWH when cloning.
>> git: 'reference-available?' recognizes 'tag-or-commit'.

I’ve pushed this after adding the warning as you suggested:

dce2cf311b * git: 'reference-available?' recognizes 'tag-or-commit'.
05f44c2d85 * git: 'update-cached-checkout' can fall back to SWH when cloning.
6ec81c31c0 * swh: Support downloads of bare Git repositories.

Thanks a lot for reviewing and testing on real-world examples!

Ludo’.
Closed
zimoun wrote on 20 Sep 2021 11:27
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44187-done@debbugs.gnu.org)
CAJ3okZ2XNXecc-Q0xCGcmJk99kJvk0coHSp+zpUwRGwbJSdOhg@mail.gmail.com
Hi,

On Sat, 18 Sept 2021 at 23:10, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:

> > and after more than 12h, the status is still: «SWH vault: Processing...»
> > and nothing is complete.
>
> Did it eventually succeed? We obviously have no guarantee as to how
> long it might take to cook a bundle.

No, I stopped, and I reported it to #swh-devel. Something might be
wrong on their side.
Yeah, cooking a bundle can take long... especially with a large repo
like Guix (lots of commits and quite a few files).
I think it is OK to leave the code as it is now.


> >> *Third, and this answers the asterisk above, we must keep in mind that
> >> this is content-addressability *with SHA1*. Generating a chosen-prefix
> >> collision is becoming affordable³, so users absolutely need an additional
> >> mechanism to authenticate code they fetched.
>
> [...]
>
> > How could a chosen-prefix attack work here? I understand why a
> > second-preimage attack is an issue. But I do not see how a SHA-1
> > chosen-prefix attack could be exploited here to compromise the user,
> > because the hash is provided by that very same user.
>
> I think you’re right, it’s rather second-preimage attacks that would be
> a serious problem. My point is: as time passes, assuming that a SHA1
> resolves to a single revision on SWH is becoming more and more
> questionable.

Well, SHA-1 has 2^160 (~10^48.2) possible values, to be compared with
~10^50, the estimated number of atoms in the Earth. Speaking about
content-addressability, SHA-1 seems fine. However, for security, yeah,
time flies. :-)
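The order of magnitude quoted above follows from a change of base (this counts possible hash values only; it says nothing about the cost of the attacks discussed in this thread):

```latex
2^{160} = 10^{160 \log_{10} 2} \approx 10^{160 \times 0.30103} \approx 10^{48.16} \approx 1.46 \times 10^{48}
```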


> >> swh: Support downloads of bare Git repositories.
> >> git: 'update-cached-checkout' can fall back to SWH when cloning.
> >> git: 'reference-available?' recognizes 'tag-or-commit'.
>
> I’ve pushed this after adding the warning as you suggested:
>
> dce2cf311b * git: 'reference-available?' recognizes 'tag-or-commit'.
> 05f44c2d85 * git: 'update-cached-checkout' can fall back to SWH when cloning.
> 6ec81c31c0 * swh: Support downloads of bare Git repositories.

Cool! It would deserve a --news entry. ;-)

Cheers,
simon
Closed
Ludovic Courtès wrote on 22 Sep 2021 12:03
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 44187-done@debbugs.gnu.org)
87fstxhqq3.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> On Sat, 18 Sept 2021 at 23:10, Ludovic Courtès <ludo@gnu.org> wrote:

[...]

>> > How could a chosen-prefix attack work here? I understand why a
>> > second-preimage attack is an issue. But I do not see how a SHA-1
>> > chosen-prefix attack could be exploited here to compromise the user,
>> > because the hash is provided by that very same user.
>>
>> I think you’re right, it’s rather second-preimage attacks that would be
>> a serious problem. My point is: as time passes, assuming that a SHA1
>> resolves to a single revision on SWH is becoming more and more
>> questionable.
>
> Well, SHA-1 has 2^160 (~10^48.2) possible values, to be compared with
> ~10^50, the estimated number of atoms in the Earth. Speaking about
> content-addressability, SHA-1 seems fine. However, for security, yeah,
> time flies. :-)

True!

>> >> swh: Support downloads of bare Git repositories.
>> >> git: 'update-cached-checkout' can fall back to SWH when cloning.
>> >> git: 'reference-available?' recognizes 'tag-or-commit'.
>>
>> I’ve pushed this after adding the warning as you suggested:
>>
>> dce2cf311b * git: 'reference-available?' recognizes 'tag-or-commit'.
>> 05f44c2d85 * git: 'update-cached-checkout' can fall back to SWH when cloning.
>> 6ec81c31c0 * swh: Support downloads of bare Git repositories.
>
> Cool! It would deserve a --news entry. ;-)

That’s a good idea, I’ve added one.

Thanks,
Ludo’.
Closed