Hello,

Mark H Weaver skribis:

> Actually, IIUC, the build slaves are _already_ compressing everything,
> and they always have.  They compress the build outputs for
> transmission back to the master machine.  In the current framework,
> the master machine immediately decompresses them upon receipt, and
> this compression and decompression is considered an internal detail of
> the network transport.
>
> Currently, the master machine stores all build outputs uncompressed in
> /gnu/store, and then later recompresses them for transmission to users
> and other build slaves.  The needless decompression and recompression
> is a tremendous amount of wasted work on our master machine.  That
> it's all stored uncompressed is also a significant waste of disk
> space, which leads to significant additional costs during garbage
> collection.
>
> Essentially, my proposal is for the build slaves to be modified to
> prepare the compressed NARs in a form suitable for delivery to end
> users (and other build slaves) with minimal processing by our master
> node.  The master node would be significantly modified to receive,
> store, and forward NARs explicitly, without ever decompressing them.
> As far as I can tell, this would mean strictly less work to do and
> less data to store for every machine and in every case.

I agree that the redundant compression/decompression is terrible.  Yet
I’m not sure how to architect a solution where compression is performed
by build machines.  The main issue is that offloading and publication
are two independent mechanisms, as things are.

Maybe, for the build farm use case, we could have a “semi-offloading”
mechanism whereby the master spawns a remote build on each build
machine without retrieving its result, something akin to:

  GUIX_DAEMON_SOCKET=ssh://build-machine.example.org \
    guix build /gnu/store/…-foo.drv

In addition, the build machine would publish its result via ‘guix
publish’, which the master could then simply mirror and cache with
nginx.

There’s the issue of signatures, but perhaps we could have a more
sophisticated PKI and have the master delegate to build machines…

Then there are other issues, such as synchronizing the TTL of a narinfo
and its corresponding nar, which --cache addresses.  Tricky!

> Ludovic has pointed out that we cannot do this because Hydra must add
> its digital signature, and that this digital signature is stored
> within the compressed NAR.  Therefore, we cannot avoid having the
> master machine decompress and recompress every NAR that is delivered
> to users.
>
> In my opinion, we should change the way we sign NARs.  Signatures
> should be external to the NARs, not internal.  Not only would this
> allow us to decentralize production of our NARs, but more importantly,
> it would enable a community of independent builders to add their
> signatures to a common pool of NARs.  Having a common pool of NARs
> enables us to store these NARs in a shared distribution network
> without duplication.  We cannot even have a common pool of NARs if
> they contain build-farm-specific data such as signatures.

Currently the signature is in the narinfos, not in the nars proper¹.
So we can already add signatures on an externally provided nar, for
instance.

There’s a silly limitation currently, which is that the signature is
computed over all the fields of the narinfo.  That’s silly because it
means that if you change, say, the compression format or the URL of the
nar, then the signature becomes invalid.  We should fix that at some
point.

Ludo’.

¹ For ‘guix publish’.  ‘guix archive --export’ appends a signature to
  the nar set.
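
For illustration (a sketch, not part of the original message; the store
path is elided on purpose), a signed export/import round trip with
‘guix archive’ looks roughly like this:

  guix archive --export -r /gnu/store/…-foo > foo.nar
  guix archive --import < foo.nar

‘--export’ appends the signature mentioned in the footnote above;
‘--import’ verifies it against keys previously authorized with
‘guix archive --authorize’.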
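
Likewise, to make the point about narinfo signatures concrete, here is
roughly what a narinfo served by ‘guix publish’ looks like (a sketch
from memory; the host name, field order, and elided values are purely
illustrative):

  StorePath: /gnu/store/…-hello-2.10
  URL: nar/gzip/…-hello-2.10
  Compression: gzip
  NarHash: sha256:…
  NarSize: …
  References: …-glibc-2.25 …-hello-2.10
  System: x86_64-linux
  Deriver: …-hello-2.10.drv
  Signature: 1;hydra.gnu.org;…

The nar fetched from the URL field carries no signature of its own; the
‘Signature’ line in the narinfo is what substitute clients verify, which
is why a signature can in principle be added for an externally provided
nar.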