Content-addressed mirror is not used upon invalid hash

  • Open
Details
6 participants
  • Jan Nieuwenhuizen
  • Leo Famulari
  • Ludovic Courtès
  • Maxim Cournoyer
  • ng0
  • zimoun
Owner
unassigned
Submitted by
Jan Nieuwenhuizen
Severity
important
Jan Nieuwenhuizen wrote on 1 Oct 2017 12:16
v0.13: guix pull fails; libgit2-0.26.0 and 0.25.1 content hashes fail
(address . bug-guix@gnu.org)
877ewf18d4.fsf@gnu.org
Hi!

As reported by laertus on irc[0]: guix pull on 0.13 without substitutes fails

guix pull

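Starting download of /tmp/guix-file.3r6cH0
From https://git.savannah.gnu.org/cgit/guix.git/snapshot/master.tar.gz...
….tar.gz 5.7MiB/s 00:02 | 13.6MiB transferred
unpacking '/gnu/store/sginfwnrcfqn1far31gmzlaffd8xlxyy-guix-latest.tar.gz'...

Starting download of /gnu/store/c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar.gz
From https://github.com/libgit2/libgit2/archive/v0.25.1.tar.gz...
following redirection to `https://codeload.github.com/libgit2/libgit2/tar.gz/v0.25.1'...
v0.25.1 6.1MiB/s 00:01 | 4.1MiB transferred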
output path `/gnu/store/c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar.gz' should have sha256 hash `1cdwcw38frc1wf28x5ppddazv9hywc718j92f3xa3ybzzycyds3s', instead has `0ywcxw1mwd56c8qc14hbx31bf198gxck3nja3laxyglv7l57qp26'
cannot build derivation `/gnu/store/z1ky970mnamnbairnpyxxb72qnc485zq-libgit2-0.25.1.drv': 1 dependencies couldn't be built
cannot build derivation `/gnu/store/rl7ms8rmbywvydy4qf656g1sdfxafb7r-guile-git-0.0-2.06f9fc3.drv': 1 dependencies couldn't be built
guix pull: error: build failed: build of `/gnu/store/rl7ms8rmbywvydy4qf656g1sdfxafb7r-guile-git-0.0-2.06f9fc3.drv' failed

because the libgit2-0.25.1 content hash does not check out.

I verified this on version-0.13. The same goes for 0.26.0 on master:

$ guix build -S libgit2 --no-substitutes
The following derivations will be built:
/gnu/store/5szrmzmfgxk6pylk5fh9bk8apj4x8axf-libgit2-0.26.0.tar.xz.drv
/gnu/store/mgh4yjxkxfyqmc7c61vwq4vs8v837602-libgit2-0.26.0.tar.gz.drv
@ build-started /gnu/store/mgh4yjxkxfyqmc7c61vwq4vs8v837602-libgit2-0.26.0.tar.gz.drv - x86_64-linux /var/log/guix/drvs/mg//h4yjxkxfyqmc7c61vwq4vs8v837602-libgit2-0.26.0.tar.gz.drv.bz2

Starting download of /gnu/store/53lj4z9cavl7n27r89zjnvyd8fk854kj-libgit2-0.26.0.tar.gz
v0.26.0 4.5MiB 3.1MiB/s 00:01 [####################] 100.0%
sha256 hash mismatch for output path `/gnu/store/53lj4z9cavl7n27r89zjnvyd8fk854kj-libgit2-0.26.0.tar.gz'
expected: 1fdk9yhwvl1w1z71ykzcvgh4nsf8scxcbclz5anh98zpplmhmisa
actual: 1b3figbhp5l83vd37vq6j2narrq4yl9pfw6mw0px0dzb1hz3jqka
@ build-failed /gnu/store/mgh4yjxkxfyqmc7c61vwq4vs8v837602-libgit2-0.26.0.tar.gz.drv - 1 sha256 hash mismatch for output path `/gnu/store/53lj4z9cavl7n27r89zjnvyd8fk854kj-libgit2-0.26.0.tar.gz'
expected: 1fdk9yhwvl1w1z71ykzcvgh4nsf8scxcbclz5anh98zpplmhmisa
actual: 1b3figbhp5l83vd37vq6j2narrq4yl9pfw6mw0px0dzb1hz3jqka
cannot build derivation `/gnu/store/5szrmzmfgxk6pylk5fh9bk8apj4x8axf-libgit2-0.26.0.tar.xz.drv': 1 dependencies couldn't be built
guix build: error: build failed: build of `/gnu/store/5szrmzmfgxk6pylk5fh9bk8apj4x8axf-libgit2-0.26.0.tar.xz.drv' failed

I found no apparent difference in the content

-r--r--r-- 1 janneke janneke 4252130 Oct 1 09:08 c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar.gz
-rw-r--r-- 1 janneke janneke 4252139 Oct 1 09:09 NEW-c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar.gz
-rw-r--r-- 1 janneke janneke 16363520 Oct 1 09:14 c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar
-rw-r--r-- 1 janneke janneke 16363520 Oct 1 09:14 NEW-c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar

but there's this difference between the tarballs...

12:13:57 janneke@dundal:~/src/guix-0.13
$ cmp -l c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar NEW-c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar
13122049 0 157
13122050 0 162
13122051 0 151
13122052 0 147
13122053 0 151
13122054 0 156
13122055 0 57
13122490 57 0
13122491 157 0
13122492 162 0
13122493 151 0
13122494 147 0
13122495 151 0
13122496 156 0
13270529 0 157
13270530 0 162
13270531 0 151
13270532 0 147
13270533 0 151
13270534 0 156
13270535 0 57
13270972 57 0
13270973 157 0
13270974 162 0
13270975 151 0
13270976 147 0
13270977 151 0
13270978 156 0
13294081 0 157
13294082 0 162
13294083 0 151
13294084 0 147
13294085 0 151
13294086 0 156
13294087 0 57
13294519 57 0
13294520 157 0
13294521 162 0
13294522 151 0
13294523 147 0
13294524 151 0
13294525 156 0
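
As an aside on reading the `cmp -l` listing above: each row gives a byte offset, then the octal value of that byte in the old file and in the NEW file. A minimal Guile sketch that decodes the non-zero columns into ASCII; the helper name `octal->string` is made up for illustration and is not part of Guix or Guile:

```scheme
;; Decode a list of octal byte values, as printed by `cmp -l', into a string.
;; `octal->string' is an illustrative helper, not part of Guix or Guile.
(define (octal->string octals)
  (list->string
   (map (lambda (o) (integer->char (string->number o 8)))
        octals)))

;; Bytes present only in the NEW tarball at offsets 13122049-13122055.
(define new-only
  (octal->string '("157" "162" "151" "147" "151" "156" "57")))

;; Bytes present only in the old tarball at offsets 13122490-13122496.
(define old-only
  (octal->string '("57" "157" "162" "151" "147" "151" "156")))

(display new-only) (newline)            ;prints origin/
(display old-only) (newline)            ;prints /origin
```

In each of the three differing regions, a seven-byte fragment containing `origin` sits at one offset in one file and a few hundred bytes further on in the other, with NUL bytes on the opposite side. That is consistent with the two .tar files having the same size.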

janneke


--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
Jan Nieuwenhuizen wrote on 1 Oct 2017 21:20
(address . 28659@debbugs.gnu.org)
87wp4e8yk5.fsf@gnu.org
Jan Nieuwenhuizen writes:

The changing of the libgit2-0.26.0 checksum was already reported about 3
weeks ago (github seems to only show relative dates)

https://github.com/libgit2/libgit2/issues/4343

and the bug is still open. It seems to be a github thing. As I
understand it, currently our options are to update the hash and pray it
won't happen again or host libgit2 tarballs ourselves.

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
Leo Famulari wrote on 1 Oct 2017 22:42
(name . Jan Nieuwenhuizen)(address . janneke@gnu.org)(address . 28659@debbugs.gnu.org)
20171001204237.GA11804@jasmine.lan
On Sun, Oct 01, 2017 at 09:20:42PM +0200, Jan Nieuwenhuizen wrote:
> Jan Nieuwenhuizen writes:
>
> The changing of the libgit-0.26.0 checksum was already reported about 3
> weeks ago (github seems to only show relative dates)
>
> https://github.com/libgit2/libgit2/issues/4343
>
> and the bug is still open. It seems to be a github thing. As I
> understand it, currently our options are to update the hash and pray it
> won't happen again or host libgit2 tarballs ourselves.

I contacted GitHub about this issue a few weeks ago and they said that:

1) They do not guarantee bit-reproducibility of the snapshots they
generate automatically for each release tag, and they wish that people
would not rely on them as we do. However, since people *are* relying on
them, they are discussing this issue internally.
2) This is the relevant code change:
https://git.kernel.org/pub/scm/git/git.git/commit/?id=22f0dcd9634a818a0c83f23ea1a48f2d620c0546

In the meantime, we can add this to the list of reasons that
reproducibility is difficult in the long term.

I don't have any solutions in mind besides keeping substitutes available
for as long as possible and, for users, using substitutes. We might also
petition upstream projects to offer a "real" release tarball.


ng0 wrote on 1 Oct 2017
(name . Leo Famulari)(address . leo@famulari.name)
20171001210527.ym24ubylu7mh5huv@abyayala
Leo Famulari transcribed 2.3K bytes:
> On Sun, Oct 01, 2017 at 09:20:42PM +0200, Jan Nieuwenhuizen wrote:
> > Jan Nieuwenhuizen writes:
> >
> > The changing of the libgit-0.26.0 checksum was already reported about 3
> > weeks ago (github seems to only show relative dates)
> >
> > https://github.com/libgit2/libgit2/issues/4343
> >
> > and the bug is still open. It seems to be a github thing. As I
> > understand it, currently our options are to update the hash and pray it
> > won't happen again or host libgit2 tarballs ourselves.
>
> I contacted GitHub about this issue a few weeks ago and they said that:
>
> 1) They do not guarantee bit-reproducibility of the snapshots they
> generate automatically for each release tag, and they wish that people
> would not rely on them as we do. However, since people *are* relying on
> them, they are discussing this issue internally.
> 2) This is the relevant code change:
> https://git.kernel.org/pub/scm/git/git.git/commit/?id=22f0dcd9634a818a0c83f23ea1a48f2d620c0546
>
> In the meantime, we can add this to the list of reasons that
> reproducibility is difficult in the long term.
>
> I don't have any solutions in mind besides keeping substitutes available
> for as long as possible and, for users, using substitutes. We might also
> petition upstream projects to offer a "real" release tarball.

Given that we depend on this for our core functionality,
can't we just keep this on our ftp directory at gnu.org
as a fall-back source in a list?

--
ng0
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588


Ludovic Courtès wrote on 2 Oct 2017 16:57
(name . Leo Famulari)(address . leo@famulari.name)
87vajxoavx.fsf@gnu.org
Hi!

Leo Famulari <leo@famulari.name> skribis:

> I contacted GitHub about this issue a few weeks ago and they said that:
>
> 1) They do not guarantee bit-reproducibility of the snapshots they
> generate automatically for each release tag, and they wish that people
> would not rely on them as we do. However, since people *are* relying on
> them, they are discussing this issue internally.

Oh?! Then we’re in trouble.

Perhaps we should start using ‘git-fetch’ more, with Software Heritage
as a fallback content-addressed mirror? Though again the difficulty is
that SWH uses Git’s method to hash directory contents, so we’d end up
having to provide both a Nix hash and a Git hash in ‘origin’. :-/

> In the meantime, we can add this to the list of reasons that
> reproducibility is difficult in the long term.

Heh.

Ludo’.
Ludovic Courtès wrote on 2 Oct 2017 17:09
(name . Jan Nieuwenhuizen)(address . janneke@gnu.org)(address . 28659@debbugs.gnu.org)
87o9ppoabw.fsf@gnu.org
Hello,

Jan Nieuwenhuizen <janneke@gnu.org> skribis:

> As reported by laertus on irc[0]: guix pull on 0.13 without substitutes fails

I just checked and we do have substitutes, but I understand it doesn’t
help here.

> guix pull
>
> Starting download of /tmp/guix-file.3r6cH0
> From https://git.savannah.gnu.org/cgit/guix.git/snapshot/master.tar.gz...
> ….tar.gz 5.7MiB/s 00:02 | 13.6MiB transferred
> unpacking '/gnu/store/sginfwnrcfqn1far31gmzlaffd8xlxyy-guix-latest.tar.gz'...
>
> Starting download of /gnu/store/c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar.gz
> From https://github.com/libgit2/libgit2/archive/v0.25.1.tar.gz...
> following redirection to `https://codeload.github.com/libgit2/libgit2/tar.gz/v0.25.1'...
> v0.25.1 6.1MiB/s 00:01 | 4.1MiB transferred
> output path `/gnu/store/c3npgqn9ag2ypi9bda1g779wwwlcqqrf-libgit2-0.25.1.tar.gz' should have sha256 hash `1cdwcw38frc1wf28x5ppddazv9hywc718j92f3xa3ybzzycyds3s', instead has `0ywcxw1mwd56c8qc14hbx31bf198gxck3nja3laxyglv7l57qp26'

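What’s sad here is that we do have the right tarball at:

https://mirror.hydra.gnu.org/file/libgit2-0.25.1.tar.gz/sha256/1cdwcw38frc1wf28x5ppddazv9hywc718j92f3xa3ybzzycyds3s
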
The problem is that the hash check is performed by guix-daemon itself,
not by “guix perform-download”. So when guix-daemon diagnoses a hash
mismatch, it’s too late and we cannot try again and use the
content-addressed mirror.

A crude but helpful fix would be to have perform-download compute the
hash by itself and act accordingly. It’s crude because that means that
we’d be computing the hash twice: once in ‘guix perform-download’ and a
second time in guix-daemon. For archives below ~20 MiB it’s probably OK
though.
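
A rough Guile sketch of that idea, purely illustrative: try each source in order and keep the first one whose content matches the expected hash, so that a modified upstream file leads to the next mirror instead of a failed build. `fetch-verified`, `fetch` and `compute-hash` are hypothetical names, with the last two passed in as parameters; this is not the actual ‘guix perform-download’ code.

```scheme
(use-modules (srfi srfi-1))             ;for `any'

;; SOURCES is a list of locations, upstream first, mirrors last.
;; FETCH maps a location to its content, or #f on failure (hypothetical).
;; COMPUTE-HASH maps content to a hash string (hypothetical).
(define (fetch-verified sources expected-hash fetch compute-hash)
  "Return the content of the first of SOURCES whose hash is EXPECTED-HASH,
or #f if none matches."
  (any (lambda (source)
         (let ((content (fetch source)))
           (and content
                (string=? (compute-hash content) expected-hash)
                content)))
       sources))

;; Toy stand-ins so the sketch runs without network access.
(define fake-store
  '(("upstream" . "regenerated tarball")
    ("mirror"   . "original tarball")))

(define (fake-fetch source)
  (assoc-ref fake-store source))

;; Not a real hash: the content length stands in for one.
(define (fake-hash content)
  (number->string (string-length content)))

;; "upstream" no longer matches, so the content comes from "mirror".
(define result
  (fetch-verified '("upstream" "mirror") "16" fake-fetch fake-hash))
```

The cost mentioned above remains: the content is hashed once here and once more by guix-daemon.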

Thoughts?

In the future, with the daemon written in Guile, it’s one area where we
could achieve better integration and coordination among the various
pieces.

Ludo’.
Ludovic Courtès wrote on 2 Oct 2017 17:16
control message for bug #28659
(address . control@debbugs.gnu.org)
87fub1oa0j.fsf@gnu.org
retitle 28659 Content-addressed mirror is not used upon invalid hash
Ludovic Courtès wrote on 2 Oct 2017 17:16
(address . control@debbugs.gnu.org)
87efqloa0f.fsf@gnu.org
severity 28659 important
Jan Nieuwenhuizen wrote on 2 Oct 2017 19:05
Re: bug#28659: v0.13: guix pull fails; libgit2-0.26.0 and 0.25.1 content hashes fail
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 28659@debbugs.gnu.org)
87infx8oqr.fsf@gnu.org
Ludovic Courtès writes:

> What’s sad here is that we do have the right tarball at:
>
> https://mirror.hydra.gnu.org/file/libgit2-0.25.1.tar.gz/sha256/1cdwcw38frc1wf28x5ppddazv9hywc718j92f3xa3ybzzycyds3s

Sad indeed!

> The problem is that the hash check is performed by guix-daemon itself,
> not by “guix perform-download”. So when guix-daemon diagnoses a hash
> mismatch, it’s too late and we cannot try again and use the
> content-addressed mirror.

Why don't we try our content-addressed mirror first?

> A crude but helpful fix would be to have perform-download compute the
> hash by itself and act accordingly. It’s crude because that means that
> we’d be computing the hash twice: once in ‘guix perform-download’ and a
> second time in guix-daemon. For archives below ~20 MiB it’s probably OK
> though.
>
> Thoughts?

We may want more guix hackers' viewpoints here, I don't feel very
qualified... As this would be a temporary workaround only until we have

> In the future, with the daemon written in Guile, it’s one area where we
> could achieve better integration and coordination among the various
> pieces.

...it might be fine?

Do we want/need to bring out a new release for this, e.g. 0.13.1, or
even 0.14? I'm not sure how bad it is that --no-substitutes does not
work. I think working on guix pull to not compile everything locally
may have priority?

janneke

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
Leo Famulari wrote on 2 Oct 2017 20:19
(name . Ludovic Courtès)(address . ludo@gnu.org)
20171002181929.GA10773@jasmine.lan
On Mon, Oct 02, 2017 at 04:57:38PM +0200, Ludovic Courtès wrote:
> Hi!
>
> Leo Famulari <leo@famulari.name> skribis:
>
> > I contacted GitHub about this issue a few weeks ago and they said that:
> >
> > 1) They do not guarantee bit-reproducibility of the snapshots they
> > generate automatically for each release tag, and they wish that people
> > would not rely on them as we do. However, since people *are* relying on
> > them, they are discussing this issue internally.
>
> Oh?! Then we’re in trouble.

I wonder, are there really that many affected packages? My sense is that
most GitHub-hosted projects offer their own release tarballs in addition
to the problematic auto-generated snapshots, and we tend to prefer the
upstream-provided tarballs in this case.

We'd need to survey our package sources to know what sort of reaction is
most appropriate.

In general, we should try to make Guix as resilient as possible to
unstable upstream sources, since the problem is not limited to GitHub.

> Perhaps we should start using ‘git-fetch’ more, with Software Heritage
> as a fallback content-addressed mirror? Though again the difficulty is
> that SWH uses Git’s method to hash directory contents, so we’d end up
> having to provide both a Nix hash and a Git hash in ‘origin’. :-/

And the Git hashes will change from SHA1 to SHA256 sooner or later, and
SHA1 hashes will become less reliable as CPUs get faster (collision
attacks), compounding the problem...


Leo Famulari wrote on 2 Oct 2017 20:22
(name . Ludovic Courtès)(address . ludo@gnu.org)
20171002182208.GB10773@jasmine.lan
On Mon, Oct 02, 2017 at 05:09:39PM +0200, Ludovic Courtès wrote:
> What’s sad here is that we do have the right tarball at:
>
> https://mirror.hydra.gnu.org/file/libgit2-0.25.1.tar.gz/sha256/1cdwcw38frc1wf28x5ppddazv9hywc718j92f3xa3ybzzycyds3s

It seems to me that there are several reasons someone may choose not to
use substitutes. Some of those reasons (reproducibility and security
concerns) are obviated for fixed-output derivations like upstream
sources, and I think it would be fine to still use substitutes for these
derivations.

But the motivations of privacy, self-sufficiency, etc are not addressed
by that idea.


Ludovic Courtès wrote on 2 Oct 2017 22:00
(name . Leo Famulari)(address . leo@famulari.name)
878tgt721q.fsf@gnu.org
Leo Famulari <leo@famulari.name> skribis:

> On Mon, Oct 02, 2017 at 05:09:39PM +0200, Ludovic Courtès wrote:
>> What’s sad here is that we do have the right tarball at:
>>
>> https://mirror.hydra.gnu.org/file/libgit2-0.25.1.tar.gz/sha256/1cdwcw38frc1wf28x5ppddazv9hywc718j92f3xa3ybzzycyds3s

Just to be clear: this URL is not that of a substitute, but that of a
content-addressed file (corresponding to the output of a fixed-output
derivation.)
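
Judging only from the one URL quoted above, such an address is made of the file name, the hash algorithm, and the hash. A small Guile sketch of that layout follows; `content-addressed-url` is a made-up helper, and the real mirror logic in (guix download) may well differ.

```scheme
;; Build a content-addressed URL following the pattern of the URL quoted above.
;; Illustrative only: the layout is inferred from that single example.
(define (content-addressed-url base file algorithm hash)
  (string-append base "/file/" file "/" algorithm "/" hash))

(define libgit2-url
  (content-addressed-url
   "https://mirror.hydra.gnu.org"
   "libgit2-0.25.1.tar.gz"
   "sha256"
   "1cdwcw38frc1wf28x5ppddazv9hywc718j92f3xa3ybzzycyds3s"))

(display libgit2-url)
(newline)
```

Since the expected hash is part of the address, the lookup does not depend on what upstream currently serves under the original URL.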

> It seems to me that there are several reasons someone may choose not to
> use substitutes. Some of those reasons (reproducibility and security
> concerns) are obviated for fixed-output derivations like upstream
> sources, and I think it would be fine to still use substitutes for these
> derivations.
>
> But the motivations of privacy, self-sufficiency, etc are not addressed
> by that idea.

Right. Jan suggested checking the content-addressed mirrors *before*
the real upstream address. That would address the problem of upstream
sources modified in-place, but at the cost of privacy/self-sufficiency
as you note. (Though it’s not really making “privacy” any worse in this
case: it’s gnu.org vs. github.com.)

Perhaps we should make content-addressed mirrors configurable in a way
that’s orthogonal to derivations, something similar in spirit to
--substitute-urls? The difficulty is that content-addressed mirrors are
not just URLs; see (guix download).

Thoughts?

Ludo’.
Jan Nieuwenhuizen wrote on 2 Oct 2017 22:22
(name . Ludovic Courtès)(address . ludo@gnu.org)
87a8198fli.fsf@gnu.org
Ludovic Courtès writes:

> Right. Jan suggested checking the content-addressed mirrors *before*
> the real upstream address. That would address the problem of upstream
> sources modified in-place, but at the cost of privacy/self-sufficiency
> as you note. (Though it’s not really making “privacy” any worse in this
> case: it’s gnu.org vs. github.com.)

Yes, that may not be preferable in general without an override.

> Perhaps we should make content-addressed mirrors configurable in a way
> that’s orthogonal to derivations, something similar in spirit to
> --substitute-urls? The difficulty is that content-addressed mirrors are
> not just URLs; see (guix download).

Hmm. I'm not sure what problem we are solving. Should we only do this
for github(-like) tarballs? Do we see this problem with other sources,
should we prevent it? Possibly github will never do something like this
again. Or we could banish github/gitlab(?) auto-generated tarballs and
go for git checkouts+commits?

janneke

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
Leo Famulari wrote on 2 Oct 2017 22:29
(name . Jan Nieuwenhuizen)(address . janneke@gnu.org)
20171002202907.GA23960@jasmine.lan
On Mon, Oct 02, 2017 at 10:22:33PM +0200, Jan Nieuwenhuizen wrote:
> Hmm. I'm not sure what problem we are solving. Should we only do this
> for github(-like) tarballs? Do we see this problem with other sources,
> should we prevent it? Possibly github will never do something like this
> again. Or we could banish github/gitlab(?) auto-generated tarballs and
> go for git checkouts+commits?

Files referenced by URL (location-addressing vs content-addressing) have
been changed in place by a variety of hosters and upstream projects
since I've started paying attention to these issues. I don't think we
need to do anything special regarding GitHub.


Maxim Cournoyer wrote on 3 Oct 2017 00:47
(name . Leo Famulari)(address . leo@famulari.name)
87infx2mmt.fsf@gmail.com
Leo Famulari <leo@famulari.name> writes:

> On Mon, Oct 02, 2017 at 04:57:38PM +0200, Ludovic Courtès wrote:
>> Hi!
>>
>> Leo Famulari <leo@famulari.name> skribis:
>>
>> > I contacted GitHub about this issue a few weeks ago and they said that:
>> >
>> > 1) They do not guarantee bit-reproducibility of the snapshots they
>> > generate automatically for each release tag, and they wish that people
>> > would not rely on them as we do. However, since people *are* relying on
>> > them, they are discussing this issue internally.
>>
>> Oh?! Then we’re in trouble.
>
> I wonder, are there really that many affected packages?

There's a list here:
https://github.com/Homebrew/homebrew-core/issues/18044, compiled by one
of the homebrew project's maintainers.

Maxim
Ludovic Courtès wrote on 3 Oct 2017 14:30
(name . Jan Nieuwenhuizen)(address . janneke@gnu.org)
871smk9zx9.fsf@gnu.org
Jan Nieuwenhuizen <janneke@gnu.org> skribis:

> Ludovic Courtès writes:

[...]

>> Perhaps we should make content-addressed mirrors configurable in a way
>> that’s orthogonal to derivations, something similar in spirit to
>> --substitute-urls? The difficulty is that content-addressed mirrors are
>> not just URLs; see (guix download).
>
> Hmm. I'm not sure what problem we are solving. Should we only do this
> for github(-like) tarballs? Do we see this problem with other sources,
> should we prevent it? Possibly github will never do something like this
> again. Or we could banish github/gitlab(?) auto-generated tarballs and
> go for git checkouts+commits?

Content-addressed mirrors help with disappearing and modified tarballs
in general; it’s not just GitHub.

Occasionally we see that problem with tarballs coming from elsewhere:
404 is quite frequent, and in-place modification happens from time to
time (even on ftp.gnu.org…).

Ludo’.
Ludovic Courtès wrote on 3 Oct 2017 14:31
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87wp4c8lbc.fsf@gnu.org
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Leo Famulari <leo@famulari.name> writes:
>
>> On Mon, Oct 02, 2017 at 04:57:38PM +0200, Ludovic Courtès wrote:
>>> Hi!
>>>
>>> Leo Famulari <leo@famulari.name> skribis:
>>>
>>> > I contacted GitHub about this issue a few weeks ago and they said that:
>>> >
>>> > 1) They do not guarantee bit-reproducibility of the snapshots they
>>> > generate automatically for each release tag, and they wish that people
>>> > would not rely on them as we do. However, since people *are* relying on
>>> > them, they are discussing this issue internally.
>>>
>>> Oh?! Then we’re in trouble.
>>
>> I wonder, are there really that many affected packages?
>
> There's a list here:
> https://github.com/Homebrew/homebrew-core/issues/18044, compiled by one
> of the homebrew project's maintainers.

Interesting. Thanks for the link!

Ludo’.
Leo Famulari wrote on 3 Oct 2017 16:24
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
20171003142449.GB23431@jasmine.lan
On Mon, Oct 02, 2017 at 06:47:06PM -0400, Maxim Cournoyer wrote:
> Leo Famulari <leo@famulari.name> writes:
> > I wonder, are there really that many affected packages?
>
> There's a list here:
> https://github.com/Homebrew/homebrew-core/issues/18044, compiled by one
> of the homebrew project's maintainers.

I meant, how many Guix packages use the auto-generated GitHub snapshots?

I believe the tell-tale sign is that the download link will have the
link text 'Source code', as for this release:

https://github.com/libgit2/libgit2/releases/tag/v0.26.0


Maxim Cournoyer wrote on 4 Oct 2017 06:22
(name . Leo Famulari)(address . leo@famulari.name)
874lrfee45.fsf@gmail.com
Leo Famulari <leo@famulari.name> writes:

> On Mon, Oct 02, 2017 at 06:47:06PM -0400, Maxim Cournoyer wrote:
>> Leo Famulari <leo@famulari.name> writes:
>> > I wonder, are there really that many affected packages?
>>
>> There's a list here:
>> https://github.com/Homebrew/homebrew-core/issues/18044, compiled by one
>> of the homebrew project's maintainers.
>
> I meant, how many Guix packages use the auto-generated GitHub snapshots?
>
> I believe the tell-tale sign is that the download link will have the
> link text 'Source code', as for this release:
>
> https://github.com/libgit2/libgit2/releases/tag/v0.26.0

The following script:

;;; A script to find packages possibly affected by GitHub
;;; infrastructure update that caused minor changes in the
;;; automatically generated tarballs.

(use-modules (ice-9 match)
             (gnu packages)
             (guix download)
             (guix packages))

(define (problematic-uri? uri)

  (define (contains-github-archive? uri)
    (string-match "github.com/.*/archive/" uri))

  ;; URI can be a string or a list of string.
  (match uri
    ((uri1 uri2 ...)                    ;match list of strings
     (filter contains-github-archive? uri))
    (uri1                               ;match string
     (contains-github-archive? uri1))))

(define (problematic-github-package? package)
  (let ((source (package-source package)))
    (and (origin? source)
         (eq? (origin-method source) url-fetch)
         (problematic-uri? (origin-uri source)))))

(define (problematic-github-packages)
  "List of all the potentially problematic GitHub packages."
  (fold-packages (lambda (p r)
                   (if (problematic-github-package? p)
                       (cons p r)
                       r))
                 '()))

(define (main)
  "Find and print the names of the potentially problematic GitHub packages."
  (let ((packages (problematic-github-packages)))
    (format #t "Number of potentially problematic GitHub packages:~a~%"
            (length packages))
    (for-each (lambda (p)
                (format #t "~a~%" (package-name p)))
              packages)))

;;; Run the program.
(main)

outputs that there could be up to 1011 affected packages.

The script checks for a url-fetch uri of the form
"github.com/.*/archive/", which seems to be the one used for the
dynamically generated archives.

Here are the first 10 lines of the output:
Number of potentially problematic GitHub packages:1011
fdupes
cbatticon
sedsed
cpulimit
autojump
sudo
thermald
progress
dstat
[...]

I've checked the first few with for example:
guix build --source --no-substitutes sedsed

and they were OK though.

Maxim
Leo Famulari wrote on 4 Oct 2017 18:54
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
20171004165413.GA4596@jasmine.lan
On Wed, Oct 04, 2017 at 12:22:34AM -0400, Maxim Cournoyer wrote:
> Here are the first 10 lines of the output:
> --8<---------------cut here---------------start------------->8---
> Number of potentially problematic GitHub packages:1011
> fdupes
> cbatticon
> sedsed
> cpulimit
> autojump
> sudo

I think the script is buggy; sudo's source is not downloaded from GitHub
as far as I can tell.


Maxim Cournoyer wrote on 5 Oct 2017 01:53
(name . Leo Famulari)(address . leo@famulari.name)
87r2uih3lx.fsf@gmail.com
Leo Famulari <leo@famulari.name> writes:

> On Wed, Oct 04, 2017 at 12:22:34AM -0400, Maxim Cournoyer wrote:
>> Here are the first 10 lines of the output:
>> --8<---------------cut here---------------start------------->8---
>> Number of potentially problematic GitHub packages:1011
>> fdupes
>> cbatticon
>> sedsed
>> cpulimit
>> autojump
>> sudo
>
> I think the script is buggy; sudo's source is not downloaded from GitHub
> as far as I can tell.

Good catch! I was assuming empty lists were falsy, but that's not the
case! I've ensured purely boolean predicates now and it gets the list
down to 650.
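
For readers less used to Scheme, the pitfall described above can be sketched as follows. This is a minimal illustration and not the original buggy script: in Guile only #f is false, so the empty list returned by 'filter' still counts as true in a conditional. The helper 'truthiness' and the example.org URL are made up for the illustration.

```scheme
;; Minimal sketch: in Scheme only #f is false; the empty list is true.
;; `truthiness' is a hypothetical helper used only for illustration.
(define (truthiness x)
  (if x 'true 'false))

;; No URL matches the predicate, so `filter' returns the empty list.
;; The URL is a placeholder, not sudo's real download location.
(define no-match
  (filter (lambda (url) (string-prefix? "https://github.com/" url))
          '("https://example.org/sudo.tar.gz")))

(display (truthiness no-match))               ;the empty list counts as true
(newline)
(display (truthiness (not (null? no-match)))) ;explicit boolean test
(newline)
```

Wrapping the result in (not (null? ...)), as the corrected script does, turns the list into a real boolean.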

Here's the corrected script:
;;; A script to find packages possibly affected by GitHub
;;; infrastructure update that caused minor changes in the
;;; automatically generated tarballs.

(use-modules (ice-9 match)
             (gnu packages)
             (guix download)
             (guix packages))

(define (problematic-uri? uri)

  (define (contains-github-archive? uri)
    (regexp-match? (string-match "github.com/.*/archive/" uri)))

  ;; URI can be a string or a list of string.
  (match uri
    ((uri1 uri2 ...)                    ;match list of strings
     (not (null? (filter contains-github-archive? uri))))
    (uri1                               ;match string
     (contains-github-archive? uri1))))

(define (problematic-github-package? package)
  (let ((source (package-source package)))
    (and (origin? source)
         (eq? (origin-method source) url-fetch)
         (problematic-uri? (origin-uri source)))))

(define (problematic-github-packages)
  "List of all the potentially problematic GitHub packages."
  (fold-packages (lambda (p r)
                   (if (problematic-github-package? p)
                       (cons p r)
                       r))
                 '()))

(define (main)
  "Find and print the names of the potentially problematic GitHub packages."
  (let ((packages (problematic-github-packages)))
    (format #t "Number of potentially problematic GitHub packages: ~a~%"
            (length packages))
    (for-each (lambda (p)
                (format #t "~a~%" (package-name p)))
              packages)))

;;; Run the program.
(main)

And sample output:
Number of potentially problematic GitHub packages: 650
fdupes
cbatticon
cpulimit
thefuck
thermald
neofetch
autojump
progress
nnn
[...]
wxwidgets
xclip
xcape
sxhkd
maim
slop
tinyxml2
xlsx2csv

Maxim
Maxim Cournoyer wrote on 5 Oct 2017 06:52
(name . Leo Famulari)(address . leo@famulari.name)(address . 28659@debbugs.gnu.org)
87h8vegpqu.fsf@gmail.com
I've modified the script to sort the packages it prints:
-    (for-each (lambda (p)
-                (format #t "~a~%" (package-name p)))
-              packages)))
+    (for-each (lambda (name)
+                (format #t "~a~%" name))
+              (sort (map package-name packages) string<?))))


If we can trust the Homebrew list to be extensive, it seems we got
lucky; there's only one affected package that we share which is
yaml-cpp. Here's how it fails on our side:

guix build -S --no-substitutes yaml-cpp
The following derivation will be built:
/gnu/store/mlap8jmadirnbii6sppb6vj9x56s8azw-yaml-cpp-0.5.3.tar.gz.drv
@ build-started /gnu/store/mlap8jmadirnbii6sppb6vj9x56s8azw-yaml-cpp-0.5.3.tar.gz.drv - x86_64-linux /var/log/guix/drvs/ml//ap8jmadirnbii6sppb6vj9x56s8azw-yaml-cpp-0.5.3.tar.gz.drv.bz2

Starting download of /gnu/store/qwflwafrzjbr2b7dy4nv18nxykghhmnk-yaml-cpp-0.5.3.tar.gz
From https://github.com/jbeder/yaml-cpp/archive/yaml-cpp-0.5.3.tar.gz...
following redirection to `https://codeload.github.com/jbeder/yaml-cpp/tar.gz/yaml-cpp-0.5.3'...
...p-0.5.3 1.7MiB/s 00:01 | 1.9MiB transferred
sha256 hash mismatch for output path `/gnu/store/qwflwafrzjbr2b7dy4nv18nxykghhmnk-yaml-cpp-0.5.3.tar.gz'
expected: 1vk6pjh0f5k6jwk2sszb9z5169whmiha9ainbdpa1arxlkq7v3b6
actual: 1ck7jk0wjfigrf4cgcjqsir4yp1s6vamhhxhpsgfvs46pgm5pk6y
@ build-failed /gnu/store/mlap8jmadirnbii6sppb6vj9x56s8azw-yaml-cpp-0.5.3.tar.gz.drv - 1 sha256 hash mismatch for output path `/gnu/store/qwflwafrzjbr2b7dy4nv18nxykghhmnk-yaml-cpp-0.5.3.tar.gz'
expected: 1vk6pjh0f5k6jwk2sszb9z5169whmiha9ainbdpa1arxlkq7v3b6
actual: 1ck7jk0wjfigrf4cgcjqsir4yp1s6vamhhxhpsgfvs46pgm5pk6y
guix build: error: build failed: build of
`/gnu/store/mlap8jmadirnbii6sppb6vj9x56s8azw-yaml-cpp-0.5.3.tar.gz.drv'
failed

Maxim
Jan Nieuwenhuizen wrote on 5 Oct 2017 08:08
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87infu6sah.fsf@gnu.org
Maxim Cournoyer writes:

> If we can trust the Homebrew list to be extensive, it seems we got
> lucky; there's only one affected package that we share which is
> yaml-cpp. Here's how it fails on our side:

I needed to also use (ice-9 regex) and then I found these to fail

antlr3
csound
erlang
font-google-material-design-icons
fritzing
libgit2
lxqt-common
ogre
plexus-interpolation
red-eclipse
yaml-cpp

out of 646 packages it's not many but it includes our core dependency
libgit2 which breaks guix pull --no-substitutes; that's hardly being
lucky?

janneke

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
Leo Famulari wrote on 20 Oct 2017 23:17
(name . Ludovic Courtès)(address . ludo@gnu.org)
20171020211700.GA32355@jasmine.lan
On Mon, Oct 02, 2017 at 10:00:33PM +0200, Ludovic Courtès wrote:
> Right. Jan suggested checking the content-addressed mirrors *before*
> the real upstream address. That would address the problem of upstream
> sources modified in-place, but at the cost of privacy/self-sufficiency
> as you note. (Though it’s not really making “privacy” any worse in this
> case: it’s gnu.org vs. github.com.)

Yeah, I don't personally think there is a privacy issue with fetching
sources from our mirrors at gnu.org, or other domains we control.

> Perhaps we should make content-addressed mirrors configurable in a way
> that’s orthogonal to derivations, something similar in spirit to
> --substitute-urls? The difficulty is that content-addressed mirrors are
> not just URLs; see (guix download).
>
> Thoughts?

I do think we should make it so that users don't suffer from unreliable
upstream sources when we know the sources are available on our servers
(or the Nix mirror), even with --no-substitutes.
Ludovic Courtès wrote on 28 Nov 2017 14:30
(name . Leo Famulari)(address . leo@famulari.name)
87d1421qek.fsf@gnu.org
Leo Famulari <leo@famulari.name> skribis:

> On Mon, Oct 02, 2017 at 10:00:33PM +0200, Ludovic Courtès wrote:
>> Right. Jan suggested checking the content-addressed mirrors *before*
>> the real upstream address. That would address the problem of upstream
>> sources modified in-place, but at the cost of privacy/self-sufficiency
>> as you note. (Though it’s not really making “privacy” any worse in this
>> case: it’s gnu.org vs. github.com.)
>
> Yeah, I don't personally think there is a privacy issue with fetching
> sources from our mirrors at gnu.org, or other domains we control.
>
>> Perhaps we should make content-addressed mirrors configurable in a way
>> that’s orthogonal to derivations, something similar in spirit to
>> --substitute-urls? The difficulty is that content-addressed mirrors are
>> not just URLs; see (guix download).
>>
>> Thoughts?
>
> I do think we should make it so that users don't suffer from unreliable
> upstream sources when we know the sources are available on our servers
> (or the Nix mirror), even with --no-substitutes.

The more I think about it, the more I’m inclined to simply move
content-addressed mirrors to the front of the list. This means that
users, in practice, would be fetching all the source from
mirror.hydra.gnu.org.

The main issue is making it configurable. Currently the
content-addressed mirror configuration for regular files in (guix
download) looks like this:

(define %content-addressed-mirrors
  ;; List of content-addressed mirrors. Each mirror is represented as a
  ;; procedure that takes a file name, an algorithm (symbol) and a hash
  ;; (bytevector), and returns a URL or #f.
  ;; Note: Avoid 'https' to mitigate <http://bugs.gnu.org/22774>.
  ;; TODO: Add more.
  '(list (lambda (file algo hash)
           ;; Files served by 'guix publish' are accessible under a single
           ;; hash algorithm.
           (string-append "http://mirror.hydra.gnu.org/file/"
                          file "/" (symbol->string algo) "/"
                          (bytevector->nix-base32-string hash)))
         (lambda (file algo hash)
           ;; 'tarballs.nixos.org' supports several algorithms.
           (string-append "http://tarballs.nixos.org/"
                          (symbol->string algo) "/"
                          (bytevector->nix-base32-string hash)))))

That for VCS checkouts in (guix build download-nar) looks like this:

(define (urls-for-item item)
  "Return the fallback nar URL for ITEM--e.g.,
\"/gnu/store/cabbag3…-foo-1.2-checkout\"."
  ;; Here we hard-code nar URLs without checking narinfos. That's probably OK
  ;; though.
  ;; TODO: Use HTTPS? The downside is the extra dependency.
  (let ((bases '("http://mirror.hydra.gnu.org/guix"
                 "http://berlin.guixsd.org"))
        (item (basename item)))
    (append (map (cut string-append <> "/nar/gzip/" item) bases)
            (map (cut string-append <> "/nar/" item) bases))))

The latter could be expressed by a command-line flag. In fact it’s the
same as --substitute-urls.
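
To make the two schemes concrete, here is a small standalone sketch of the URLs they would produce. It is an illustration under assumptions, not the code in Guix: 'hydra-file-url' is a hypothetical simplification of the first mirror procedure that takes an already nix-base32-encoded hash string (so it does not need (guix base32)), and the inputs are the libgit2 tarball hash and the guile-git checkout name that appear elsewhere in this thread. Whether those files are actually served at these URLs is not something the sketch checks.

```scheme
(use-modules (srfi srfi-26))            ;for `cut'

;; Hypothetical simplification of the first content-addressed mirror.
;; Assumption: HASH is already a nix-base32 string.
(define (hydra-file-url file algo hash)
  (string-append "http://mirror.hydra.gnu.org/file/"
                 file "/" (symbol->string algo) "/" hash))

;; Same logic as `urls-for-item' quoted above.
(define (urls-for-item item)
  (let ((bases '("http://mirror.hydra.gnu.org/guix"
                 "http://berlin.guixsd.org"))
        (item (basename item)))
    (append (map (cut string-append <> "/nar/gzip/" item) bases)
            (map (cut string-append <> "/nar/" item) bases))))

;; A flat file is looked up by name, algorithm and hash.
(display (hydra-file-url
          "libgit2-0.25.1.tar.gz" 'sha256
          "1cdwcw38frc1wf28x5ppddazv9hywc718j92f3xa3ybzzycyds3s"))
(newline)

;; A checkout is looked up by the base name of its store item.
(for-each (lambda (url) (display url) (newline))
          (urls-for-item
           "/gnu/store/plx9848n6waj6zghn3d54ybx8ihcn23k-guile-git-0.0-4.951a32c-checkout"))
```

The first form embeds the content hash in the URL; the second only needs the store item name and a base URL, which is why it maps so directly onto a list of substitute URLs.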

(Time passes…)

Thinking more about it, why not simply always enable substitutes for
fixed-output derivations, like this:
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index d68e8b2bc..03a8f5080 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -1034,8 +1034,10 @@ void DerivationGoal::haveDerivation()
     /* We are first going to try to create the invalid output paths
        through substitutes. If that doesn't work, we'll build
-       them. */
-    if (settings.useSubstitutes && substitutesAllowed(drv))
+       them. Always enable substitutes for fixed-output derivations to
+       protect against disappearing files and in-place modifications on
+       upstream sites. */
+    if ((fixedOutput || settings.useSubstitutes) && substitutesAllowed(drv))
         foreach (PathSet::iterator, i, invalidOutputs)
             addWaitee(worker.makeSubstitutionGoal(*i, buildMode == bmRepair));

This solves all our problems and makes download-nar.scm useless.

As an added bonus, it improves the UI since we now always
see:

0.1 MB will be downloaded:
/gnu/store/plx9848n6waj6zghn3d54ybx8ihcn23k-guile-git-0.0-4.951a32c-checkout

… instead of:

The following derivation will be built:
/gnu/store/y86rlb6pdm35im7q02y6479ca84zwylz-guile-git-0.0-4.951a32c-checkout.drv

The downside is that it still requires one to authorize the server’s
key, although it’s in theory unnecessary since it’s content addressed.
I’m not sure how to solve that because ‘guix substitute’ doesn’t know
that it’s substituting a fixed-output derivation. I suppose we’d need
to modify the “protocol” between guix-daemon and ‘guix substitute’.

Thoughts?

Ludo’.
Ludovic Courtès wrote on 14 Dec 2017 17:53
(name . Leo Famulari)(address . leo@famulari.name)
874lot9rou.fsf@gnu.org
ludo@gnu.org (Ludovic Courtès) skribis:

> Thinking more about it, why not simply always enable substitutes for
> fixed-output derivations, like this:
>
> diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
> index d68e8b2bc..03a8f5080 100644
> --- a/nix/libstore/build.cc
> +++ b/nix/libstore/build.cc
> @@ -1034,8 +1034,10 @@ void DerivationGoal::haveDerivation()
>
> /* We are first going to try to create the invalid output paths
> through substitutes. If that doesn't work, we'll build
> - them. */
> - if (settings.useSubstitutes && substitutesAllowed(drv))
> + them. Always enable substitutes for fixed-output derivations to
> + protect against disappearing files and in-place modifications on
> + upstream sites. */
> + if ((fixedOutput || settings.useSubstitutes) && substitutesAllowed(drv))
> foreach (PathSet::iterator, i, invalidOutputs)
> addWaitee(worker.makeSubstitutionGoal(*i, buildMode == bmRepair));

[...]

> The downside is that it still requires one to authorize the server’s
> key, although it’s in theory unnecessary since it’s content addressed.
> I’m not sure how to solve that because ‘guix substitute’ doesn’t know
> that it’s substituting a fixed-output derivation. I suppose we’d need
> to modify the “protocol” between guix-daemon and ‘guix substitute’.

I looked at how to address this by having ‘guix substitute’
automatically determine whether it’s being asked for a content-addressed
item or not. The guts of it is this procedure:

(define* (content-addressed-item? item hash
                                  #:key (hash-algo 'sha256))
  "Return true if ITEM, a store file name, is definitely a content-addressed
item (result of a fixed-output derivation) with the given HASH of type
HASH-ALGO, false otherwise.

Note: This procedure is useful when the deriver of ITEM is unknown. In other
cases, the recommended approach is to check 'fixed-output-derivation?' on the
deriver."
  ;; XXX: This returns #f for "text" items produced by 'add-text-to-store'.
  ;; There's not much we can do because the file name for these is a function
  ;; of their content.
  (let ((name (store-path-package-name item)))
    (or (string=? item (fixed-output-path name hash #:recursive? #f
                                          #:hash-algo hash-algo))
        (string=? item (fixed-output-path name hash #:recursive? #t
                                          #:hash-algo hash-algo)))))

It works as expected for the result of “recursive fixed-output
derivations”—i.e., fixed-output derivations that produce a directory,
such as VCS checkouts.

However it doesn’t work for fixed-output derivations that produce a flat
file, such as origins with the ‘url-fetch’ method. The reason is that,
in the case of non-recursive derivations, the store file name is
computed as a function of the file hash, not as a function of the nar
hash, whereas narinfos only contain the nar hash (the thing that ‘guix
hash -r’ computes).
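
The difference between the two hashes comes from the fact that they are computed over different byte streams. The sketch below frames a flat file as a nar by hand to show that. It assumes the nar layout for a single non-executable regular file as I understand it (the real serializer lives in (guix serialization)); 'nar-string' and 'flat-file-nar' are hypothetical names, and no actual hash is computed.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Frame BV as a nar "string": 64-bit little-endian length, the bytes,
;; then zero padding up to a multiple of 8 bytes.
(define (nar-string bv)
  (let* ((len    (bytevector-length bv))
         (padded (* 8 (quotient (+ len 7) 8)))
         (out    (make-bytevector (+ 8 padded) 0)))
    (bytevector-u64-set! out 0 len (endianness little))
    (bytevector-copy! bv 0 out 8 len)
    out))

;; Assumed nar layout for one regular, non-executable file.
(define (flat-file-nar contents)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (for-each (lambda (bv) (put-bytevector port (nar-string bv)))
                (list (string->utf8 "nix-archive-1")
                      (string->utf8 "(")
                      (string->utf8 "type")
                      (string->utf8 "regular")
                      (string->utf8 "contents")
                      contents
                      (string->utf8 ")")))
      (get-bytes))))

(define contents (string->utf8 "hello\n"))
(define nar (flat-file-nar contents))

;; The file hash covers CONTENTS; the nar hash covers NAR.
(format #t "file bytes: ~a, nar bytes: ~a, identical? ~a~%"
        (bytevector-length contents)
        (bytevector-length nar)
        (equal? contents nar))
```

Since the nar wraps the file contents in extra framing, a hash over the nar cannot be compared directly with the flat-file hash that went into the store file name, which is the mismatch described above.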

So I think we have to communicate more info from the daemon to ‘guix
substitute’.

Ludo’.
Ludovic Courtès wrote on 15 Dec 2017 10:30
Always enable substitutes for fixed-output derivations
(name . Leo Famulari)(address . leo@famulari.name)
87a7ykmj7k.fsf_-_@gnu.org
ludo@gnu.org (Ludovic Courtès) skribis:

> So I think we have to communicate more info from the daemon to ‘guix
> substitute’.

The attached patch addresses that by simply calling out to the daemon to
determine whether we’re dealing with a content-addressed item.

To summarize, the new behavior is that substitutes are always enabled
for fixed-output derivations. That way, people willing to build
everything from source can still use ‘--no-substitutes’ and yet be able
to retrieve source code without being penalized compared to someone
enabling substitutes wholesale.

Of course, when substitutes are missing, we fall back to regular
downloads or VCS checkouts. It is also still possible to choose where
substitutes are downloaded from, using ‘--substitute-urls’, or even to
pass an empty list of URLs.
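
The behavior summarized above follows from the condition changed in build.cc. Transcribed to Scheme as a rough sketch (the procedure name 'try-substitutes?' is hypothetical, and 'substitutes-allowed?' simply stands for the result of substitutesAllowed(drv)):

```scheme
;; Hypothetical transcription of the patched condition in build.cc:
;; (fixedOutput || settings.useSubstitutes) && substitutesAllowed(drv)
(define (try-substitutes? fixed-output? use-substitutes? substitutes-allowed?)
  (and (or fixed-output? use-substitutes?)
       substitutes-allowed?))

;; With '--no-substitutes' (use-substitutes? is #f), only fixed-output
;; derivations still go through the substitution attempt.
(for-each (lambda (fixed-output?)
            (format #t "fixed-output? ~a, substitutes tried? ~a~%"
                    fixed-output?
                    (try-substitutes? fixed-output? #f #t)))
          '(#t #f))
```

When the substitution attempt finds nothing, the daemon builds the derivation as before, which for sources means the regular download or VCS checkout.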

Feedback welcome!

Ludo’.
Ludovic Courtès wrote on 14 Feb 2020 22:34
Re: bug#39575: guix time-machine fails when a tarball was modified in-place
(name . Jan Nieuwenhuizen)(address . janneke@gnu.org)
878sl47t0q.fsf@gnu.org
Jan Nieuwenhuizen <janneke@gnu.org> skribis:

> Ludovic Courtès writes:

[...]

>> The problem here is really that we fall back to content-addressed
>> mirrors instead of using them directly:
>>
>> https://issues.guix.gnu.org/issue/28659
>
> Wait, what happened here; you finally proposed a patch two years ago and
> nothing happened/we all forgot to follow up?

I think we forgot, indeed.

One thing I don’t quite like about the patch is the fact that ‘guix
substitutes’ connects to the daemon in ‘content-addressed-item?’.

Also, one could argue that we’d steer users towards downloading from our
server, which could be a privacy concern (probably not a strong argument
since one can easily change the substitute URLs.)

Thoughts?

Ludo’.
zimoun wrote on 15 Feb 2020 16:43
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ0-zPcs+pC4tQEymD-On-aN_-hgKRkRzBJzusdbtdYdAg@mail.gmail.com
Hi,

On Fri, 14 Feb 2020 at 22:34, Ludovic Courtès <ludo@gnu.org> wrote:

> Also, one could argue that we’d steer users towards downloading from our
> server, which could be a privacy concern (probably not a strong argument
> since one can easily change the substitute URLs.)

I am not following the privacy concern.
What do you mean?

Cheers,
simon
Ludovic Courtès wrote on 16 Feb 2020 11:59
(name . zimoun)(address . zimon.toutoune@gmail.com)
87k14m3iiy.fsf@gnu.org
Hi!

zimoun <zimon.toutoune@gmail.com> skribis:

> On Fri, 14 Feb 2020 at 22:34, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Also, one could argue that we’d steer users towards downloading from our
>> server, which could be a privacy concern (probably not a strong argument
>> since one can easily change the substitute URLs.)
>
> I am not following the privacy concern.
> What do you mean?

I mean that by default, someone who’s disabled substitutes (presumably
out of security or privacy concerns) would find themself downloading
source code from ci.guix.gnu.org instead of various upstream sites.

Ludo’.
zimoun wrote on 17 Feb 2020 11:18
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ08ibXTBqsZMwnuEVdhpyXgHVp6+rNGXB02gsHVqwu53A@mail.gmail.com
Hi Ludo,

On Sun, 16 Feb 2020 at 11:59, Ludovic Courtès <ludo@gnu.org> wrote:
> zimoun <zimon.toutoune@gmail.com> skribis:
> > On Fri, 14 Feb 2020 at 22:34, Ludovic Courtès <ludo@gnu.org> wrote:

> >> Also, one could argue that we’d steer users towards downloading from our
> >> server, which could be a privacy concern (probably not a strong argument
> >> since one can easily change the substitute URLs.)
> >
> > I am not following the privacy concern.
> > What do you mean?
>
> I mean that by default, someone who’s disabled substitutes (presumably
> out of security or privacy concerns) would find themself downloading
> source code from ci.guix.gnu.org instead of various upstream sites.

I do not see the difference between mirroring and traveling back in
time with missing upstream sources.
And because it is content-addressed, it seems even more secure than
downloading from an upstream URL, IMHO.
If one trusts Guix, then an attacker needs to corrupt at the same time
the Guix history and Berlin (and/or any other farm).
If one does not trust Guix, why do they use the recipe coming from
Guix? To be precise, this person has to check all the recipes of all
the dependencies.

Well, I do not see a security concern because we are talking about
serving the sources.
It is another story when the substitutes serve the results of the
build (binaries); because one does not have any strong guarantee that
the substitute serves the expected binaries.

By privacy concern, do you mean that Guix could collect who downloads
what; in a central fashion? Which is not the case when one downloads
from several distributed upstream sources. Right?
Well, I am not convinced because the case of missing upstream source
is rare. And it is easy to protect oneself against such data
collection.
In paranoid mode, traveling back in time is becoming difficult because
of the reliability of the sources; I mean if the sources were
reliable, SWH would not exist. ;-) The solution should be an IPFS /
GNUnet / full distributed archive... which is not ready... yet! :-)


Well, maybe for the TODO list of the time-machine: add an option to
allow substitutes *only* for the sources (substitutes meaning
ci.guix.gnu.org and/or SWH). If this option does not exist yet. ;-)


Cheers,
simon
Ludovic Courtès wrote on 17 Feb 2020 15:40
(name . zimoun)(address . zimon.toutoune@gmail.com)
87pned6zw2.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> On Sun, 16 Feb 2020 at 11:59, Ludovic Courtès <ludo@gnu.org> wrote:
>> zimoun <zimon.toutoune@gmail.com> skribis:
>> > On Fri, 14 Feb 2020 at 22:34, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> >> Also, one could argue that we’d steer users towards downloading from our
>> >> server, which could be a privacy concern (probably not a strong argument
>> >> since one can easily change the substitute URLs.)
>> >
>> > I am not following the privacy concern.
>> > What do you mean?
>>
>> I mean that by default, someone who’s disabled substitutes (presumably
>> out of security or privacy concerns) would find themself downloading
>> source code from ci.guix.gnu.org instead of various upstream sites.

[...]

> By privacy concern, do you mean that Guix could collect who downloads
> what; in a central fashion? Which is not the case when one downloads
> from several distributed upstream sources. Right?

Exactly. But like I wrote above, I don’t think it’s a strong argument.

What remains is the issue with ‘content-addressed-item?’, then.

Ludo’.
zimoun wrote on 17 Feb 2020 16:04
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAJ3okZ0BrbfqmrsmfJykwqPA8PQFtSUB3JBdNaN1=npZNa36Eg@mail.gmail.com
On Mon, 17 Feb 2020 at 15:40, Ludovic Courtès <ludo@gnu.org> wrote:

> Exactly. But like I wrote above, I don’t think it’s a strong argument.

I agree and the big picture depends on the audience.
Scientific communities would be fine with centralized archives such as
SWH. And only centralized archives IMHO can provide a reliable "long
term" support which is the point for those communities. (Quotes because
it is not clearly defined what it is. :-))
Other communities would prefer a distributed archive such as IPFS or
GNUnet but 1. it still needs some work and 2. the "long term" is not
guaranteed by nature, IMHO. But it is probably not an issue for those
communities.


> What remains is the issue with ‘content-addressed-item?’, then.

I agree.
The bridge with SWH is in good shape, IMHO.
And the pending IPFS patch would deserve more love. :-) Maybe soon...



Cheers,
simon
zimoun wrote on 9 Sep 2020 16:31
Re: bug#28659: Content-addressed mirror is not used upon invalid hash
(name . Ludovic Courtès)(address . ludo@gnu.org)
87eenbuim8.fsf_-_@gmail.com
Hi,

On Fri, 14 Feb 2020 at 22:34, Ludovic Courtès <ludo@gnu.org> wrote:

> One thing I don’t quite like about the patch is the fact that ‘guix
> substitutes’ connects to the daemon in ‘content-addressed-item?’.

What is the status of this patch [1] following the recent discussion about
tar “disarchive” and SWH?

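Related:
- http://issues.guix.gnu.org/issue/39575
- http://issues.guix.gnu.org/42162
- https://git.ngyro.com/disarchive/

All the best,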
simon

Ludovic Courtès wrote on 10 Sep 2020 10:14
(name . zimoun)(address . zimon.toutoune@gmail.com)
87363qjbez.fsf@gnu.org
Hello,

zimoun <zimon.toutoune@gmail.com> skribis:

> On Fri, 14 Feb 2020 at 22:34, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> One thing I don’t quite like about the patch is the fact that ‘guix
>> substitutes’ connects to the daemon in ‘content-addressed-item?’.
>
> What is the status of this patch [1] following the recent discussion about
> tar “disarchive” and SWH?
>
> Related:
> - http://issues.guix.gnu.org/issue/39575
> - http://issues.guix.gnu.org/42162
> - https://git.ngyro.com/disarchive/

Thanks for the reminder. I don’t think Timothy’s work changes anything
wrt. this issue: it would still need to be addressed.

Ludo’.
zimoun wrote on 3 Feb 2022 03:58
(name . Ludovic Courtès)(address . ludo@gnu.org)
86zgn8zmel.fsf_-_@gmail.com
Hi Ludo,

On Fri, 15 Dec 2017 at 10:30, ludo@gnu.org (Ludovic Courtès) wrote:

>> So I think we have to communicate more info from the daemon to ‘guix
>> substitute’.
>
> The attached patch addresses that by simply calling out to the daemon to
> determine whether we’re dealing with a content-addressed item.

WDYT to rebase this patch [1] and resubmit to guix-patches in order to
get more attention and so potential feedback and/or review?



Cheers,
simon
Ludovic Courtès wrote on 1 May 12:35 +0200
control message for bug #28659
(address . control@debbugs.gnu.org)
87le4t7rb8.fsf@gnu.org
merge 28659 70588
quit