Hi Chris,

On Mon, 13 Jul 2020 at 20:20, Christopher Baines wrote:

> Going forward, being methodical as a project about storing the tarballs
> and source material for the packages is probably the way to ensure it's
> available for the future. I'm not sure the data storage cost is
> significant, the cost of doing this is probably in working out what to
> store, doing so in a redundant manner, and making the data available.

A really rough estimate is 120KB on average* per raw tarball. So if we
consider 14,000 packages and assume 70% of them use url-fetch, that gives
14k * 0.7 * 120KB = ~1.2GB, which is not significant. Moreover, if we
extrapolate the numbers, between v1.0.0 and now there have been about 23
commits per day modifying gnu/packages/, so 0.7 * 23 * 120KB * 365 =
~700MB per year.

However, the ~120KB of metadata needed to re-assemble the tarball has to
be compared with the 712KB of the raw compressed tarball; both figures
are for the hello package.

*based on the hello package; the size depends on the number of files in
the tarball. The file is stored uncompressed, as a plain sexp.

Therefore, in addition to what to store, redundancy, and availability,
one question is how to store it: a Git repository? An SQL database? etc.

> The Guix Data Service knows about fixed output derivations, so it might
> be possible to backfill such a store by just attempting to build those
> derivations. It might also be possible to use the Guix Data Service to
> work out what's available, and what tarballs are missing.

Missing from where? The substitutes farm or SWH?

Cheers,
simon
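
P.S. To make the back-of-the-envelope arithmetic above easy to tweak,
here it is as a small Guile snippet. The 120KB per-tarball figure, the
70% url-fetch ratio and the 23 commits/day are the rough assumptions
from above (the 120KB average is measured on hello only), so the output
is only an order-of-magnitude estimate:

(use-modules (ice-9 format))

;; Rough assumptions, taken from the estimate above.
(define metadata-size (* 120 1024))   ;~120 KiB of sexp metadata per tarball
(define packages 14000)               ;approximate number of packages
(define url-fetch-ratio 0.7)          ;share of packages using url-fetch
(define commits-per-day 23)           ;commits touching gnu/packages/ per day

;; One-off cost to cover the current set of url-fetch packages, in bytes.
(define one-off
  (* packages url-fetch-ratio metadata-size))

;; Yearly growth, assuming each commit adds one new tarball's metadata.
(define per-year
  (* url-fetch-ratio commits-per-day metadata-size 365))

(format #t "one-off:  ~,1f GiB~%" (/ one-off (expt 1024 3)))
(format #t "per year: ~,1f MiB~%" (/ per-year (expt 1024 2)))

which prints roughly 1.1 GiB one-off and ~690 MiB per year, i.e. the
same ballpark as the 1.2GB and 700MB quoted above.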