On Sun, Jun 05, 2016 at 15:39:04 -0500, Christopher Allan Webber wrote:
> One theoretical optimization: if I verify the DAG, could I store
> somewhere that I've verified from commit cabba6e and upward already, so
> the next time I verify it only has to verify the new commits?

tbh, I haven't given this the amount of thought/research that I feel it
needs.  Unfortunately, you got me thinking, so here's another long
message.

In essence, this is equivalent to Ludo's suggestion of stopping at the
last tag (if you envision, say, tagging the last processed commit)
_provided that_ you also verify the commit that the tag is pointing to.

My short answer is: practically speaking, it's probably fine, because
you're more than likely trying to defend against an attacker that gains
access to the repo, not a second-preimage attack.

* * *

Long answer (braindump):

When I consider the potential threats, I consider that the integrity of
each blob, tree, commit, etc. is fairly well assured by its hash, but
depends entirely on the security of SHA-1, whose future is increasingly
grim.

SHA-1 does just fine for uniquely identifying objects---and if it
didn't, the hashes of offending preimages would just be blacklisted.
But Git's use of it was never meant primarily as a security feature.

The problem is pretty bad: a signed commit ensures the integrity of the
commit object itself (as in `git cat-file -p COMMIT`), but an attacker
isn't limited to finding a preimage for the hashes signed in that
commit: the tree hash is what really dictates the content, and that
tree hash in turn identifies other trees and blobs:

  $ git cat-file -p 'HEAD^{tree}'
  ...
  100644 blob 9b9481deea8cee4cc61971a752d02c04d5f0654e    configure.ac
  040000 tree f2b4528e1f66f3bbc4742dc4a11bd1283cd475b9    doc
  ...

That blob contains the actual file contents.  So in a large project like
Guix, you have so many opportunities!
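
To make that chain concrete, here is a rough Python sketch of how the
object IDs are derived.  It is only an illustration: the file name,
author line, and helper names are made up, and real commits carry more
metadata.  The point is that the commit body (the only thing a
signature covers) names the tree by its SHA-1 and nothing else.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over the header '<kind> <size>' + NUL, followed by the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode: str, name: str, object_id: str) -> bytes:
    """One tree entry: '<mode> <name>' + NUL + the 20 raw bytes of the id."""
    return f"{mode} {name}".encode() + b"\0" + bytes.fromhex(object_id)


# Hypothetical one-file repository, for illustration only.
blob_id = git_object_id("blob", b"hello\n")

# The tree refers to the blob purely by its hash.
tree_body = tree_entry("100644", "hello.txt", blob_id)
tree_id = git_object_id("tree", tree_body)

# The commit refers to the tree purely by its hash.  A signature would
# cover these bytes and nothing below them.
commit_body = (
    f"tree {tree_id}\n"
    "author A U Thor <author@example.com> 0 +0000\n"
    "committer A U Thor <author@example.com> 0 +0000\n"
    "\n"
    "example\n"
).encode()
commit_id = git_object_id("commit", commit_body)

print("blob  ", blob_id)
print("tree  ", tree_id)
print("commit", commit_id)
```

Any other blob content that hashed to the same blob_id would leave
tree_id, commit_id, and therefore the signature untouched.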
You can try to find preimages for any of the trees or blobs _without
having to worry about any signatures_; neither trees nor blobs are
signed.

With that said, if I recall correctly (and after a very brief glance at
fetch-pack.c), a successful preimage attack would only affect users who
haven't already fetched the legitimate object---otherwise Git wouldn't
bother fetching it.  I'm not sure if I find comfort in this or not: it's
been used by some to dismiss the problem of collisions, but (assuming
git is silent about it---and why wouldn't it be, as it wouldn't know
better) that's worse, since maintainers and common contributors wouldn't
notice anything wrong at all.  But someone who clones fresh and compiles
would be screwed.

So signing commits almost certainly protects you against someone who
gains access to the repository on a common origin or a
maintainer/contributor's PC, provided that nobody's private key is
compromised.  But there doesn't seem to be any way to secure a git
repository against a second-preimage attack.

So given that, it doesn't really matter if you re-verify all the commits
or not: an attacker doesn't even need to bother with the commit object.

I guess one option is to keep a local copy of the repository, clone a
fresh copy, and occasionally compare _every_ object (commit, tag, tree,
blob) between the two.

So if Git wants to take this issue seriously, changes have to be made.
In the meantime, in addition to commit verification, you can always keep
around a local copy of the repository, always clone a fresh copy, and
ensure that builds between the two are reproducible.
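
For what it's worth, the "compare every object" idea might look roughly
like the Python below.  This is a sketch under assumptions, not a tool:
it takes each clone as a plain mapping from object ID to raw content
(how you extract that, e.g. via `git cat-file --batch`, is left out),
the function names are made up, and the IDs in the demo are
placeholders standing in for a simulated SHA-1 collision.  Since both
copies would share the same SHA-1, the comparison uses a second,
independent hash (SHA-256 here).

```python
import hashlib
from typing import Dict, List


def fingerprint(objects: Dict[str, bytes]) -> Dict[str, str]:
    """Map each object id to a SHA-256 digest of its raw content."""
    return {oid: hashlib.sha256(data).hexdigest()
            for oid, data in objects.items()}


def differing_objects(trusted: Dict[str, bytes],
                      fresh: Dict[str, bytes]) -> List[str]:
    """Return ids present in both clones whose content differs."""
    a = fingerprint(trusted)
    b = fingerprint(fresh)
    return sorted(oid for oid in a.keys() & b.keys() if a[oid] != b[oid])


# Placeholder ids: pretend "id-configure" is one SHA-1 shared by two
# different contents (a simulated second preimage).
trusted_clone = {"id-configure": b"AC_INIT([guix])\n", "id-doc": b"docs\n"}
fresh_clone = {"id-configure": b"AC_INIT([evil])\n", "id-doc": b"docs\n"}

suspicious = differing_objects(trusted_clone, fresh_clone)
print(suspicious)
```

Objects that exist in only one of the two clones are ignored here; new
commits upstream would otherwise show up as noise.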