On Sun, Jun 05, 2016 at 15:39:04 -0500, Christopher Allan Webber wrote:
> One theoretical optimization: if I verify the DAG, could I store
> somewhere that I've verified from commit cabba6e and upward already, so
> the next time I verify it only has to verify the new commits?

tbh, I haven't given this the amount of thought/research that I feel it
needs.  Unfortunately, you got me thinking, so here's another long
message.

In essence, this is equivalent to Ludo's suggestion of stopping at
the last tag (if you envision, say, tagging the last processed commit)
_provided that_ you also verify the commit that the tag is pointing to.

My short answer is: practically speaking, it's probably fine, because
you're more than likely trying to defend against an attacker that gains
access to the repo, not a second-preimage attack.

*   *
  *

Long answer (braindump):

When I consider the potential threats, I consider that the integrity of
each blob, tree, commit, etc are fairly well assured by their hashes,
but depend entirely on the security of SHA-1, whose future is
increasingly grim.  SHA-1 does just fine for uniquely identifying
objects---and if it didn't, hashes offending preimages would just be
blacklisted.  But it was never intended for security.

The problem is pretty bad: signed commits will ensure the integrity of
the commit itself (the object---as in `git cat-file -p COMMIT`); the
problem is that you don't just have to find a preimage for the hashes
signed in that commit: the tree hash is what really dictates the
content, and that tree hash in turn identifies other trees and blobs:

  $ git cat-file -p 'HEAD^{tree}'
  ...
  100644 blob 9b9481deea8cee4cc61971a752d02c04d5f0654e    configure.ac
  040000 tree f2b4528e1f66f3bbc4742dc4a11bd1283cd475b9    doc
  ...

That blob contains the actual file contents.

So in a large project like Guix, you have so many opportunities!  You
can try to find preimages for any of the trees or blobs _without having
to worry about any signatures_; neither trees nor blobs are signed.

With that said, if I recall correctly (and after a very brief glance at
fetch-pack.c), a successful preimage attack would only affect users who
haven't already fetched the legitimate object---otherwise Git wouldn't
bother fetching it.  I'm not sure if I find comfort in this or not: it's
been used by some to dismiss the problem of collisions, but (assuming
git is silent about it---and why wouldn't it be, as it wouldn't know
better) that's worse, since maintainers and common contributors wouldn't
notice anything wrong at all.  But someone who clones fresh and compiles
would be screwed.

So signing commits almost certainly protects you against someone who
gains access to the repository on a common origin or a
maintainer/contributor's PC, provided that nobody's private key is
compromised.

But there doesn't seem to be any way to secure a git repository against
a second-preimage attack.

So given that, it doesn't really matter if you re-verify all the commits
or not: an attacker doesn't need to even bother with the commit
object.  I guess one option is to keep a local copy of the repository,
clone a fresh copy, and occasionally diff _every_ object (commit, tag,
tree, blob) for differences.

So if Git wants to take this issue seriously, changes have to be
made.  In the meantime, in addition to commit verification, you can
always keep around a local copy of the repository, always clone a copy,
and ensure that builds between the two are reproducible.