Cuirass rebuilds the same package multiple times

Open. Submitted by Julien Lepiller.
Details
3 participants
  • Julien Lepiller
  • Ludovic Courtès
  • Mathieu Othacehe
Owner: unassigned
Severity: normal
Julien Lepiller wrote on 9 Feb 14:19 +0100
(address . bug-guix@gnu.org)
20210209141915.40114e57@tachikoma.lepiller.eu
Hi!
I updated php yesterday and I noticed that Cuirass is now busy building dependents on aarch64. Looking at the logs of some of them, it seems that the workers are independently building the same php derivation at the same time, rather than the dependents themselves. This is extremely wasteful, as php takes a very long time to build (and might even eventually fail).
Here is an example: https://ci.guix.gnu.org/build/287478/details and https://ci.guix.gnu.org/build/287476/details are being built at the same time, and the logs currently show they are both running the test phase of the php package.
Shouldn't Cuirass first schedule builds for dependencies before it builds dependents?
Mathieu Othacehe wrote on 9 Feb 16:42 +0100
(name . Julien Lepiller)(address . julien@lepiller.eu)
87lfbxs0w9.fsf@gnu.org
Hello Julien,
> Here is an example: https://ci.guix.gnu.org/build/287478/details and
> https://ci.guix.gnu.org/build/287476/details are being built at the
> same time, and the logs currently show they are both running the test
> phase of the php package.
Thanks for the report. This problem was briefly discussed yesterday. It was introduced by the new remote building mechanism in Cuirass. Hydra solves it by breaking each build into build steps corresponding to the derivation inputs.
The build steps are then submitted to the workers in a logical order. I proposed introducing a similar mechanism in Cuirass, but Ludo expressed doubts. Ludo, do you think this problem could be solved otherwise?
Thanks,
Mathieu
Ludovic Courtès wrote on 10 Feb 11:46 +0100
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
87zh0c8ajo.fsf@gnu.org
Hi,
Mathieu Othacehe <othacehe@gnu.org> skribis:
>> Here is an example: https://ci.guix.gnu.org/build/287478/details and
>> https://ci.guix.gnu.org/build/287476/details are being built at the
>> same time, and the logs currently show they are both running the test
>> phase of the php package.
>
> Thanks for the report. This problem was briefly discussed
> yesterday. It was introduced by the new remote building mechanism
> in Cuirass. Hydra solves it by breaking each build into build steps
> corresponding to the derivation inputs.
>
> The build steps are then submitted to the workers in a logical order. I
> proposed introducing a similar mechanism in Cuirass, but Ludo expressed
> doubts. Ludo, do you think this problem could be solved otherwise?
I’m not sure exactly but I can share my feelings. :-)
Seems to me that ‘BuildSteps’ is an orthogonal concern that has little to do with Cuirass’ job and with its data model. In Hydra I saw that as a (necessary) kludge.
I like the way the Coordinator does it, and AIUI it’s pretty much the same as what the daemon is doing: submit build requests in topological order, such that when a derivation build is submitted, its prerequisites are known to be built already.
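To make the topological-order idea concrete, here is a minimal Python sketch that groups derivations into "waves": everything in one wave can be handed to different workers at once, and no derivation is handed out before all of its inputs are built. This is not code from Cuirass, the daemon, or the Coordinator; the function name `build_waves` and the derivation names are hypothetical, loosely modeled on the php report above.

```python
from collections import defaultdict


def build_waves(inputs):
    """Group derivations into waves; every input of a derivation is in an earlier wave.

    `inputs` maps a derivation name to the names of its direct inputs.
    Derivations in the same wave can be built concurrently by different
    workers.  Raises ValueError if the graph contains a cycle.
    """
    # Every derivation mentioned anywhere, including leaves with no entry.
    nodes = set(inputs)
    for deps in inputs.values():
        nodes.update(deps)

    # Inputs of each derivation that have not been built yet.
    pending = {d: set(inputs.get(d, ())) for d in nodes}
    # Reverse edges: which derivations are waiting on a given one.
    dependents = defaultdict(set)
    for drv, deps in inputs.items():
        for dep in deps:
            dependents[dep].add(drv)

    waves = []
    ready = sorted(d for d in nodes if not pending[d])
    while ready:
        waves.append(ready)
        unlocked = set()
        for done in ready:
            for child in dependents[done]:
                pending[child].discard(done)
                if not pending[child]:
                    unlocked.add(child)
        ready = sorted(unlocked)

    if sum(len(wave) for wave in waves) != len(nodes):
        raise ValueError("cycle in the derivation graph")
    return waves


if __name__ == "__main__":
    # Hypothetical derivation names, for illustration only.
    graph = {
        "php.drv": ["libxml2.drv", "openssl.drv"],
        "php-dependent-a.drv": ["php.drv"],
        "php-dependent-b.drv": ["php.drv"],
    }
    for number, wave in enumerate(build_waves(graph)):
        print(number, wave)
```

With this ordering, php.drv appears in exactly one wave, after its inputs and before both dependents, so two workers never end up building it side by side.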
I suppose what makes it more difficult here is that we have this extra “job” abstraction on top of derivations; everything in Cuirass revolves around jobs, which leads to this impedance mismatch.
If Cuirass instead delegated derivation build requests to a Coordinator/daemon-like thing, it wouldn’t have to worry about those details. That would better separate concerns.
This is quite a hand-wavy reply but I hope it’s useful!
Thanks,
Ludo’.
Mathieu Othacehe wrote on 10 Feb 12:24 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)
87o8gs2mjq.fsf@gnu.org
Hey Ludo,
Thanks for sharing your thoughts, it's always useful :).
> Seems to me that ‘BuildSteps’ is an orthogonal concern that has little
> to do with Cuirass’ job and with its data model. In Hydra I saw that as
> a (necessary) kludge.
I'm not sure I follow you here. Cuirass and Hydra have an almost identical database schema and are now working very similarly, from what I understand.
In Hydra, a JobSet (Specification in Cuirass) has several Builds. Each Build can be broken into several BuildSteps, corresponding to transitive derivation inputs that must be built.
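A minimal Python sketch of the decomposition described above, under stated assumptions: each build is identified by its top-level derivation, the derivation graph is a plain dictionary, and anything already built is skipped. This is not Hydra's actual schema or code; `build_steps`, the build identifiers, and the derivation names are hypothetical. The point it illustrates is that a derivation needed by several builds becomes one shared step rather than one per build.

```python
def build_steps(builds, inputs, already_built):
    """Break builds into steps, one per derivation that still has to be built.

    `builds` maps a build identifier to its top-level derivation, `inputs`
    maps a derivation to its direct inputs, and `already_built` is the set
    of derivations whose outputs are already available.
    Returns (steps_per_build, builds_per_step).
    """
    steps_per_build = {}
    builds_per_step = {}
    for build, top in builds.items():
        needed = set()
        stack = [top]
        # Walk the transitive inputs, stopping at anything already built.
        while stack:
            drv = stack.pop()
            if drv in needed or drv in already_built:
                continue
            needed.add(drv)
            stack.extend(inputs.get(drv, ()))
        steps_per_build[build] = needed
        # A derivation needed by several builds becomes a single shared step.
        for drv in needed:
            builds_per_step.setdefault(drv, set()).add(build)
    return steps_per_build, builds_per_step


if __name__ == "__main__":
    # Hypothetical graph and build identifiers, for illustration only.
    inputs = {
        "php.drv": ["libxml2.drv", "openssl.drv"],
        "php-dependent-a.drv": ["php.drv"],
        "php-dependent-b.drv": ["php.drv"],
    }
    builds = {"build-a": "php-dependent-a.drv", "build-b": "php-dependent-b.drv"}
    per_build, per_step = build_steps(builds, inputs, {"libxml2.drv", "openssl.drv"})
    print(sorted(per_step["php.drv"]))  # ['build-a', 'build-b']
    print(len(per_step))                # 3
```

Here the two builds need four steps between them, but only three distinct ones, because php.drv is shared; a scheduler working on steps rather than builds would build it once.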
Hydra manages to get those BuildSteps built in topological order, in the same way as the Guix Build Coordinator.
This makes me think that we could implement this exact same mechanism in Cuirass, but maybe I'm missing something.
> If Cuirass instead delegated derivation build requests to a
> Coordinator/daemon-like thing, it wouldn’t have to worry about those
> details. That would better separate concerns.
I think that having Cuirass delegate its builds to the Coordinator is not the right move. That would mean doubling the size of the CI code base and doubling the number of databases, for a feature that we could implement in Cuirass just by making it catch up with Hydra.
Thanks,
Mathieu
Ludovic Courtès wrote on 17 Feb 15:22 +0100
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
87k0r6wz8i.fsf@gnu.org
Howdy!
Mathieu Othacehe <othacehe@gnu.org> skribis:
>> Seems to me that ‘BuildSteps’ is an orthogonal concern that has little
>> to do with Cuirass’ job and with its data model. In Hydra I saw that as
>> a (necessary) kludge.
>
> I'm not sure I follow you here. Cuirass and Hydra have an almost
> identical database schema and are now working very similarly, from what I
> understand.
>
> In Hydra, a JobSet (Specification in Cuirass) has several Builds. Each
> Build can be broken into several BuildSteps, corresponding to transitive
> derivation inputs that must be built.
>
> Hydra manages to get those BuildSteps built in topological
> order, in the same way as the Guix Build Coordinator.
>
> This makes me think that we could implement this exact same mechanism in
> Cuirass, but maybe I'm missing something.
When Cuirass was started, I wanted to avoid what I perceived as a shortcoming of Hydra’s design: one daemon connection per job, and build steps, which kinda replicate what the daemon is doing.
So I suggested going for one connection for all the jobs and passing all the derivations to the daemon, so that the daemon can see the big picture and make better scheduling decisions, and so we don’t have to re-implement “build steps”.
But as you know, this strategy didn’t work out as expected because of scalability issues in the daemon.

Regardless, it seems to me that ‘BuildSteps’ is a low-level thing compared to the rest of the Cuirass database: it reifies part of the derivation graph, whereas the rest of the database is all about “jobs” and “builds” thereof. It’s not the same abstraction level.
I realize it’s somewhat subjective though, and I don’t want to impede progress!
>> If Cuirass instead delegated derivation build requests to a
>> Coordinator/daemon-like thing, it wouldn’t have to worry about those
>> details. That would better separate concerns.
>
> I think that having Cuirass delegate its builds to the Coordinator is
> not the right move. That would mean doubling the size of the CI code
> base and doubling the number of databases, for a feature that we could
> implement in Cuirass just by making it catch up with Hydra.
I see. Generally speaking, I think better separation of concerns may sometimes be worth extra code, insomuch as it makes it easier to reason about things, to debug, and to add new features. Of course it’s a tradeoff; adding too much code just for the beauty of abstractions isn’t reasonable either.
I wonder if having two databases instead of a single one (which would essentially be the union of those two databases) is a problem. I guess one problem would be if that makes it hard to do commonly needed “joins” across the two databases.
Regarding features, one thing I like about the Coordinator is its support for retrying builds, which could serve to detect flaky builds or build processes that are kernel- or hardware-dependent. I think it’s a feature we’d want eventually, but I wonder if it should be Cuirass’s job.
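As a rough illustration of how retries can expose flaky builds, here is a minimal Python sketch: run the same build a few times (possibly on different machines) and call it flaky when the outcomes disagree. This does not reproduce the Coordinator's actual retry mechanism; `classify_build` and the `run_build` callable are hypothetical stand-ins for "perform one build attempt and report success".

```python
from collections import Counter
from typing import Callable


def classify_build(run_build: Callable[[], bool], attempts: int = 3) -> str:
    """Run a build several times and classify it from the outcomes.

    `run_build` is a hypothetical callable that performs one build attempt
    (possibly on a different machine each time) and returns True on success.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    outcomes = Counter(run_build() for _ in range(attempts))
    if outcomes[True] == attempts:
        return "succeeded"
    if outcomes[False] == attempts:
        return "failed"
    # Mixed results: the outcome likely depends on timing, the kernel, or
    # the hardware rather than only on the derivation's inputs.
    return "flaky"


if __name__ == "__main__":
    results = iter([True, False, True])  # canned outcomes for illustration
    print(classify_build(lambda: next(results)))  # flaky
```

Spreading the attempts across machines with different kernels or CPUs is what would let such a check distinguish hardware-dependent builds from ones that are merely nondeterministic on a single machine.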

It’d be nice to focus on a single code base for “distributed builds” in general, and I was hoping for a Coordinator/Cuirass convergence on this aspect. But at the end of the day, what matters most is what we achieve. Cuirass has been doing so much better on many fronts over the last few weeks, including reliability, build throughput, and monitoring. At the same time, the Coordinator proves useful and easy to deploy in more experimental setups; I think Chris’s instance now aggregates results from a variety of machines, including POWER and GNU/Hurd, and that seemed quite easy to do. I’m not going to complain about over-success in this area! :-)
Ludo’.