Hey Ludo!

> Did you manage to come up with a way to synthetically reproduce the
> problem?

Yes, but it is rather complex. I managed to restart a lot of builds at
the same time by running SQL queries on the Cuirass database.

> 2. It does I/O when it calls ‘read-derivation-from-file’. Under high
> I/O load, that could be relatively expensive, though I’d expect it
> to be measured in tenths of a second at worst?

It looks like read-derivation-from-file is indeed quite expensive.
There's an attached strace log that shows a bunch of derivation file
reads caused by read-derivation-from-file. I count 336 derivation file
reads in 30 seconds, which is not much but could get worse, I think.

> But look, ‘read-derivation-from-file’ is called just to fill in the
> “System” field, which is not used anywhere (not a single caller of
> ‘narinfo-system’), so we could just as well remove it and see how
> it behaves.

Yes, I'll propose a patch to remove it.

> Anyway, that the main thread is blocking while this happens is certainly
> a problem, so this patch looks like an improvement. That we have to use
> the ‘http-write’ hack isn’t great, but I think it’s OK, unless we want
> to switch to Fibers.

I think that applying this patchset, removing the
read-derivation-from-file call, and increasing the Nginx timeouts could
be a good start.

However, I will be mostly afk for about 3 weeks, so I won't be able to
monitor the publish server on Berlin and fix potential regressions.
Maybe we should wait until I'm back.

Thanks,

Mathieu