cuirass: Add remote build support.

  • Done
Details
3 participants
  • Mathieu Othacehe
  • zimoun
Owner
unassigned
Submitted by
Mathieu Othacehe
Severity
normal
Mathieu Othacehe wrote on 2 Dec 2020 12:04
(address . guix-patches@gnu.org)
87czzso4dj.fsf@gnu.org
Hello,

Here's a patch adding remote build support to Cuirass, as presented
during Guix Days[1]. The concept is the following:

* Cuirass talks to a "remote server" instead of the "guix-daemon" to
build derivations when the "--build-remote" option is passed.

* The "remote server" is advertised using Avahi. It queues the received
build requests. It also starts a "publish" server.

* The "remote workers" discover the "remote server" using Avahi, connect
to it and request some builds. The "publish" server of the "remote
server" is added to the substitute URL list of each worker's
"guix-daemon".

* On build completion, the "remote server" downloads the build outputs
as nar and narinfo files from the worker's "publish" server and stores
them in a cache directory. It can also add them to the store if the
"--add-to-store" option is passed.

* Cuirass is notified by the "remote server" when a build starts, fails
or completes and can update its database accordingly.

* The communication between Cuirass, the "remote server" and the "remote
workers" is done by sending SEXP over ZMQ.
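As a rough illustration of the flow described above, here is a minimal in-memory sketch of the queueing and notification logic, with messages rendered as S-expressions. It is not the Cuirass implementation: the message names (build, no-build, build-started, build-succeeded, build-failed), the RemoteServer class and the store path are hypothetical, and the ZMQ transport and Avahi discovery are left out entirely.

```python
from collections import deque


class Symbol(str):
    """A bare S-expression symbol, as opposed to a quoted string."""


def to_sexp(obj):
    """Serialize symbols, strings, numbers and nested lists to an S-expression."""
    if isinstance(obj, Symbol):
        return str(obj)
    if isinstance(obj, str):
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(obj, (list, tuple)):
        return "(" + " ".join(to_sexp(item) for item in obj) + ")"
    return str(obj)


class RemoteServer:
    """Hypothetical stand-in for the "remote server": queues build requests
    and records the notifications that Cuirass would receive."""

    def __init__(self):
        self.pending = deque()  # derivations waiting for a worker
        self.events = []        # notifications destined for Cuirass

    def submit(self, derivation):
        """Called on behalf of Cuirass to queue a derivation."""
        self.pending.append(derivation)

    def request_work(self, worker):
        """Called by a worker asking for something to build."""
        if not self.pending:
            return to_sexp([Symbol("no-build")])
        derivation = self.pending.popleft()
        self.events.append(
            to_sexp([Symbol("build-started"), derivation, worker]))
        return to_sexp([Symbol("build"), derivation])

    def report(self, worker, derivation, success):
        """Called by a worker when a build completes or fails."""
        status = "build-succeeded" if success else "build-failed"
        self.events.append(to_sexp([Symbol(status), derivation, worker]))
```

In this sketch a worker loops on request_work until it gets a build message, runs the build, then calls report. In the real setup each of these calls would be a message exchanged over a ZMQ socket.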

This is still a bit rough around the edges, but I have tested it on
berlin, spawning ~30 workers and building ~10K derivations, and it seems
to work fine.

The corresponding patch and an architecture overview diagram are attached.

Thanks,

Mathieu

[1]:
Attachment: remote.png
zimoun wrote on 2 Dec 2020 12:25
86pn3s1mbu.fsf@gmail.com
Hi Mathieu,

On Wed, 02 Dec 2020 at 12:04, Mathieu Othacehe <othacehe@gnu.org> wrote:

> Here's a patch adding remote build support to Cuirass, as presented
> during Guix Days[1]. The concept is the following:

Neat! You implemented the “dynamic offloading” in Cuirass. \o/

What about the store? And the outputs?


> This is still a bit rough around the edges, but I have tested it on
> berlin, spawning ~30 workers and building ~10K derivations, and it seems
> to work fine.

~30 workers on ~30 different machines? Or are some workers running on
the same node?


All the best,
simon
Mathieu Othacehe wrote on 21 Dec 2020 14:40
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 45006@debbugs.gnu.org)
87wnxb2s6j.fsf@gnu.org
Hello,

There's a new variant of this patch on the wip-offload branch of
Cuirass. Quite a few things have changed since the first version:

* The "remote-server" no longer communicates directly with Cuirass; all
exchanges go through the database.

* The "remote-worker" now honors "timeout" and "max-silent-time" package
parameters.

* I have added build priorities support. The build priority is computed
this way:

build_priority = specification_priority * 10 + package_priority

* There's a new "worker" table that stores what workers are currently
building, and an associated "/workers" page.

* The "remote-worker" can connect to a "remote-server" specified on the
command line.

* The substitutes are downloaded and stored in the publish cache directory.

I have deployed another Cuirass instance on berlin using this mechanism,
with workers on all build machines. It has been building the
master/core-updates/staging/modular specifications for a few days, for
the x86_64-linux and i686-linux architectures.

The results are much better than with the current implementation, and
it should be possible to transition to this new architecture soon.

Thanks,

Mathieu
zimoun wrote on 21 Dec 2020 15:13
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 45006@debbugs.gnu.org)
86czz39rhg.fsf@gmail.com
Hi Mathieu,

On Mon, 21 Dec 2020 at 14:40, Mathieu Othacehe <othacehe@gnu.org> wrote:

> * I have added build priorities support. The build priority is computed
> this way:
>
> build_priority = specification_priority * 10 + package_priority

Where are these specification_priority and package_priority configured?


> * The substitutes are downloaded and stored in the publish cache
> directory.

This removes some of the GC issues on The Big Store, right?


> The results are much better than with the current implementation, and
> it should be possible to transition to this new architecture soon.

Cool! Thanks for working on this. :-)


Cheers,
simon
Mathieu Othacehe wrote on 21 Dec 2020 16:08
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 45006@debbugs.gnu.org)
87pn332o4o.fsf@gnu.org
Hey zimoun,

> Where are these specification_priority and package_priority configured?

specification_priority comes from the new "#:priority" field in the
Cuirass specification file and package_priority comes from the new
"#:priority" field in the job structure.

specification_priority ∈ [0, 9]
package_priority ∈ [0, 9]

⇒ build_priority ∈ [0, 99]

where 0 is the maximal priority. When both specification_priority and
package_priority are unset, the priority defaults to 99.

I'm currently using the following priorities:

modular: 1
guix-master: 2
staging: 3
core-updates: 4

The builds are picked according to their priority and then their
timestamp, so that the most recent builds are picked first when the
priorities are identical.
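The priority computation and the pick order can be sketched as follows. This is only an illustration, not the Cuirass code: the function and class names are hypothetical, and defaulting each unset component to 9 is an assumption made so that two unset values give the default of 99.

```python
from dataclasses import dataclass

# Assumed per-component default: 9 * 10 + 9 == 99, the documented default
# when both priorities are unset.
LOWEST_COMPONENT = 9


def build_priority(specification_priority=None, package_priority=None):
    """Return specification_priority * 10 + package_priority (0 is highest)."""
    spec = (LOWEST_COMPONENT if specification_priority is None
            else specification_priority)
    pkg = LOWEST_COMPONENT if package_priority is None else package_priority
    for value in (spec, pkg):
        if not 0 <= value <= 9:
            raise ValueError("priorities must be in [0, 9]")
    return spec * 10 + pkg


@dataclass
class Build:
    name: str
    priority: int   # result of build_priority, in [0, 99]
    timestamp: int  # seconds since the epoch; larger means more recent


def pick_order(builds):
    """Sort by ascending priority value, then most recent timestamp first."""
    return sorted(builds, key=lambda b: (b.priority, -b.timestamp))
```

For instance, with a package priority of 0, two "modular" builds (specification priority 1) both get priority 10 and are picked before any "core-updates" build (priority 40), the newer of the two first.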

I don't have a strategy regarding package_priority yet.

> This removes some of the GC issues on The Big Store, right?

Yes, this should allow garbage collecting the store on berlin much more
aggressively and hopefully reduce the GC duration.

Thanks,

Mathieu
zimoun wrote on 21 Dec 2020 17:41
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 45006@debbugs.gnu.org)
867dpb9kni.fsf@gmail.com
Hi Mathieu,

Thanks for the explanations.

On Mon, 21 Dec 2020 at 16:08, Mathieu Othacehe <othacehe@gnu.org> wrote:

>> Where are these specification_priority and package_priority configured?
>
> specification_priority comes from the new "#:priority" field in the
> Cuirass specification file and package_priority comes from the new
> "#:priority" field in the job structure.
>
> specification_priority ∈ [0, 9]
> package_priority ∈ [0, 9]
>
> ⇒ build_priority ∈ [0, 99]
>
> where 0 is the maximal priority. When both specification_priority and
> package_priority are unset, the priority defaults to 99.
>
> I'm currently using the following priorities:
>
> modular: 1
> guix-master: 2
> staging: 3
> core-updates: 4

Is this the specification_priority?

Where would the package_priority be defined? A file mapping the package
name to the priority number? Something else?


> The builds are picked according to their priority and then their
> timestamp, so that the most recent builds are picked first when the
> priorities are identical.

High-priority builds come first, whatever their timestamp, right?


> I don't have a strategy regarding package_priority yet.

Why do you need a package_priority strategy? You only need a #:priority
strategy and a formula to compute it from the relevant parameters, here
specification_priority and package_priority. Or am I missing something?

In my understanding, the priority is given by something like:

#:priority = f(specification_priority, package_priority, timestamp)

where, for example, once a week the queue is reevaluated to raise the
priority of old builds; otherwise some could stay blocked.
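
A minimal sketch of such a function, purely to illustrate the idea: the name aged_priority, the one-point-per-week step and the floor at 0 are all hypothetical choices, not something Cuirass implements.

```python
WEEK = 7 * 24 * 3600  # seconds


def aged_priority(specification_priority, package_priority,
                  queued_at, now, step=1):
    """Hypothetical f(specification_priority, package_priority, timestamp).

    Starts from specification_priority * 10 + package_priority and improves
    the value (moves it toward 0, the highest priority) by `step` for every
    full week the build has spent in the queue.
    """
    base = specification_priority * 10 + package_priority
    weeks_waiting = max(0, now - queued_at) // WEEK
    return max(0, base - step * weeks_waiting)
```

With these choices, a build queued at priority 40 would be treated as priority 37 after three full weeks of waiting, so it eventually overtakes newer builds from higher-priority specifications.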


>> This removes some of the GC issues on The Big Store, right?
>
> Yes, this should allow garbage collecting the store on berlin much more
> aggressively and hopefully reduce the GC duration.

Really cool!

Cheers,
simon
Mathieu Othacehe wrote on 29 Jan 2021 11:58
control message for bug #45006
(address . control@debbugs.gnu.org)
87a6ssj9hh.fsf@cervin.i-did-not-set--mail-host-address--so-tickle-me
close 45006
quit