Hello,

Here's a patch adding remote build support to Cuirass, as presented
during Guix Days[1]. The concept is the following:

* Cuirass talks to a "remote server" instead of the "guix-daemon" to
  build derivations when the "--build-remote" option is passed.

* The "remote server" is advertised using Avahi. It queues the received
  build requests. It also starts a "publish" server.

* The "remote workers" discover the "remote server" using Avahi, connect
  to it and request some builds. The "remote server" "publish" server is
  added to the workers' "guix-daemon" substitute URLs list.

* On build completion, the "remote server" downloads the build outputs
  as nar and narinfo files from the worker "publish" server and stores
  them in a cache directory. It can also add them to the store if the
  "--add-to-store" option is passed.

* Cuirass is notified by the "remote server" when a build starts, fails
  or completes and can update its database accordingly.

* The communication between Cuirass, the "remote server" and the "remote
  workers" is done by sending SEXP over ZMQ.

This is still a bit rough around the edges, but I have tested it on
berlin, spawning ~30 workers and building ~10K derivations, and it seems
to work fine.

The corresponding patch and an architecture overview diagram are
attached.

Thanks,

Mathieu

[1]: https://xana.lepiller.eu/guix-days-2020/guix-days-2020-mathieu-otacehe-fixing-the-ci.mp4
From 94898f67e1dca6152c434ff50e860691ce813018 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <firstname.lastname@example.org>
Date: Wed, 2 Dec 2020 11:13:33 +0100
Subject: [PATCH] Add remote build support.

* src/cuirass/remote.scm: New file.
* src/cuirass/remote-server.scm: New file.
* src/cuirass/remote-worker.scm: New file.
* bin/remote-server.in: New file.
* bin/remote-worker.in: New file.
* Makefile.am (bin_SCRIPTS): Add new binaries,
(dist_pkgmodule_DATA): add new files,
(EXTRA_DIST): add new binaries,
(bin/remote-server, bin/remote-worker): new targets.
* .gitignore: Add new binaries.
* bin/cuirass.in (%options): Add "--build-remote" option,
(show-help): document it,
(main): honor it.
* src/cuirass/base.scm (with-build-offload-thread): New macro,
(%build-remote?, %build-offload-channel): new parameters,
(make-build-offload-thread): new procedure,
(build-derivations/offload): new procedure,
(restart-builds): use it to offload builds when %build-remote? is set,
(build-packages): ditto.
---
 .gitignore                    |   2 +
 Makefile.am                   |  16 +-
 bin/cuirass.in                | 162 ++++++-----
 bin/remote-server.in          |  29 ++
 bin/remote-worker.in          |  29 ++
 src/cuirass/base.scm          |  65 ++++-
 src/cuirass/remote-server.scm | 518 ++++++++++++++++++++++++++++++++++
 src/cuirass/remote-worker.scm | 286 +++++++++++++++++++
 src/cuirass/remote.scm        | 292 +++++++++++++++++++
 9 files changed, 1318 insertions(+), 81 deletions(-)
 create mode 100644 bin/remote-server.in
 create mode 100644 bin/remote-worker.in
 create mode 100644 src/cuirass/remote-server.scm
 create mode 100644 src/cuirass/remote-worker.scm
 create mode 100644 src/cuirass/remote.scm
Hello,

There's a new variant of this patch on the wip-offload branch of
Cuirass. Quite a few things have changed since the first version:

* The "remote-server" no longer communicates directly with Cuirass; all
  the exchanges are done through the database.

* The "remote-worker" now honors the "timeout" and "max-silent-time"
  package parameters.

* I have added build priorities support. The build priority is computed
  this way:

  build_priority = specification_priority * 10 + package_priority

* There's a new "worker" table that stores what workers are currently
  building, and an associated "/workers" page.

* The "remote-worker" can connect to a "remote-server" specified on the
  command line.

* The substitutes are downloaded and stored in the publish cache
  directory.

I have deployed another Cuirass instance on berlin using this mechanism,
with workers on all build machines. It has been building the
master/core-updates/staging/modular specifications for a few days, for
the x86_64-linux and i686-linux architectures.

The results are much better than with the current implementation, and it
should be possible to transition to this new architecture soon.

Thanks,

Mathieu
> Where are these specification_priority and package_priority configured?
specification_priority comes from the new "#:priority" field in the
Cuirass specification file and package_priority comes from the new
"#:priority" field in the job structure.

specification_priority ∈ [0, 9]
package_priority ∈ [0, 9]

⇒ build_priority ∈ [0, 99]

where 0 is the highest priority. When both specification_priority and
package_priority are unset, the priority defaults to 99.

I'm currently using the following priorities:

modular: 1
guix-master: 2
staging: 3
core-updates: 4

The builds are picked according to their priority and then their
timestamp, so that the most recent builds are picked first when the
priorities are identical.

I don't have a strategy regarding package_priority yet.
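To make the ordering concrete, here is a minimal Python sketch of the priority formula and the pick order described above. It is only an illustration, not the Cuirass code (which is written in Guile and goes through the database): the names `build_priority`, `Build` and `pick_order` are hypothetical, and treating each unset component as 9 is an assumption that simply reproduces the stated default of 99 when both are unset.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: an unset component counts as 9 (lowest), so that two unset
# components yield the documented default of 99.
UNSET_COMPONENT = 9


def build_priority(specification_priority: Optional[int] = None,
                   package_priority: Optional[int] = None) -> int:
    """Return specification_priority * 10 + package_priority (0 is highest)."""
    spec = UNSET_COMPONENT if specification_priority is None else specification_priority
    pkg = UNSET_COMPONENT if package_priority is None else package_priority
    if not (0 <= spec <= 9 and 0 <= pkg <= 9):
        raise ValueError("priorities must be in [0, 9]")
    return spec * 10 + pkg


@dataclass(frozen=True)
class Build:
    """Hypothetical stand-in for a queued build."""
    derivation: str
    timestamp: int
    specification_priority: Optional[int] = None
    package_priority: Optional[int] = None


def pick_order(builds: List[Build]) -> List[Build]:
    """Sort by ascending priority number, then by most recent timestamp."""
    return sorted(
        builds,
        key=lambda b: (
            build_priority(b.specification_priority, b.package_priority),
            -b.timestamp,
        ),
    )


if __name__ == "__main__":
    queue = [
        Build("core-updates-old", timestamp=100, specification_priority=4),
        Build("modular", timestamp=50, specification_priority=1),
        Build("core-updates-new", timestamp=200, specification_priority=4),
    ]
    for build in pick_order(queue):
        print(build.derivation)
```

With the priorities listed above, a "modular" build is always picked before a "core-updates" one, and among two "core-updates" builds the newer one goes first.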
> This removes part of issues about GC on The Big Store, right?
Yes, this should allow garbage collecting the store on berlin much more
aggressively and hopefully reduce the GC duration.

Thanks,

Mathieu
Hi Mathieu,

Thanks for the explanations.

On Mon, 21 Dec 2020 at 16:08, Mathieu Othacehe <email@example.com> wrote:
>> Where are these specification_priority and package_priority configured?
>
> specification_priority comes from the new "#:priority" field in the
> Cuirass specification file and package_priority comes from the new
> "#:priority" field in the job structure.
>
> specification_priority ∈ [0, 9]
> package_priority ∈ [0, 9]
>
> ⇒ build_priority ∈ [0, 99]
>
> where 0 is the maximal priority. When both specification_priority and
> package_priority are unset, the priority defaults to 99.
>
> I'm currently using the following priorities:
>
> modular: 1
> guix-master:2
> staging:3
> core-updates:4
Is this the specification_priority?

Where would the package_priority be defined? A file mapping the package
name to the priority number? Something else?
> The builds are picked according to their priority and then their
> timestamp, so that the most recent builds are picked first when the
> priorities are identical.
High-priority builds come first, whatever their timestamp, right?
> I don't have a strategy regarding package_priority yet.
Why do you need a package_priority strategy? You only need a #:priority
strategy and a formula to compute it from the relevant parameters, here
specification_priority and package_priority. Or am I missing something?

In my understanding, the priority is given by something like:

#:priority = f(specification_priority, package_priority, timestamp)

where, for example, once a week the queue is reevaluated to raise the
priority of old builds; otherwise some could stay blocked.
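One possible shape for such an f, sketched in Python purely as an illustration of the idea (nothing like this is in the patch, as far as this thread shows): start from the specification_priority * 10 + package_priority value and move a build one step toward 0, the highest priority, for every full week it has been waiting. The function name `aged_priority`, the one-week period and the step size are all hypothetical choices.

```python
WEEK_SECONDS = 7 * 24 * 3600  # hypothetical reevaluation period


def aged_priority(specification_priority: int,
                  package_priority: int,
                  timestamp: int,
                  now: int,
                  step: int = 1) -> int:
    """Hypothetical f(specification_priority, package_priority, timestamp).

    The base value is specification_priority * 10 + package_priority,
    where 0 is the highest priority.  Each full week spent in the queue
    lowers the number by `step`, so old builds cannot wait forever.
    """
    base = specification_priority * 10 + package_priority
    weeks_waiting = max(0, now - timestamp) // WEEK_SECONDS
    # Clamp at 0: a build cannot rank higher than the highest priority.
    return max(0, base - step * weeks_waiting)


if __name__ == "__main__":
    queued_at = 0
    for weeks in (0, 1, 3):
        now = queued_at + weeks * WEEK_SECONDS
        print(weeks, aged_priority(4, 0, queued_at, now))
```

The trade-off is that a long-waiting low-priority build eventually competes with fresh high-priority ones, so the step would need tuning against the queue length.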
>> This removes part of issues about GC on The Big Store, right?
>
> Yes, this should allow to garbage collect way more aggressively the
> store on berlin and hopefully reduce the GC duration.