Module system thread unsafety and .go compilation

OpenSubmitted by Taylan Ulrich Bayırlı /Kammer.
Details
2 participants
  • Ludovic Courtès
  • Taylan Ulrich Bayırlı /Kammer
Owner
unassigned
Severity
important
T
T
Taylan Ulrich Bayırlı /Kammer wrote on 9 Feb 2016 21:02
(address . bug-guix@gnu.org)(address . guile-devel@gnu.org)
8737t1k5yk.fsf@T420.taylan
To speed up the compilation of the many Scheme files in Guix, we use ascript that first loads all modules to be compiled into the Guileprocess (by calling 'resolve-interface' on the module names), and thenthe corresponding Scheme files are compiled in a par-for-each.
While Guile's module system is known to be thread unsafe, the idea wasthat all mutation should happen in the serial loading phase, and theparallel compile-file calls should then be thread safe.
Sadly that assumption isn't met when autoloads are involved.Minimal-ish test-case:
- Check out 0889321.
- Build it.
- Edit gnu/build/activation.scm and gnu/build/linux-boot.scm to contain merely the following expressions, respectively:
(define-module (gnu build activation) #:use-module (gnu build linux-boot))
(define-module (gnu build linux-boot) #:autoload (system base compile) (compile-file))
- Run make again.
If you're on a multi-core system, you will probably get an error sayingsomething weird like "no such language scheme".
Note: when you then run make *again* it succeeds.

Solution proposals:
1. s/par-for-each/for-each/. Will make compilation slower on multi-core machines. We would do the same for guix pull, which is a bit sad because it's so fast right now. Very simple solution though.
2. We find out some partitioning of the Scheme modules such that there is minimal overlap in total loaded modules when the modules in one subset are each loaded by one Guile process. Then each Guile process loads & compiles the modules in its given subset serially, but these Guile processes run in parallel. This could speed things up even more than now because the module-loading phases of the processes would be parallel too. It also has the side-effect that less memory is consumed the fewer cores you have (because less Scheme modules loaded into memory at once). If someone (Ludo?) has a good general overview of Guix's module graph then maybe they can come up with a sensible partitioning of the modules, say into 4 subsets (maxing out benefits at quad-core), such that loading all modules in one subset loads a minimal amount of modules that are outside that subset. That should be the only challenging part of this solution.
3. We do nothing for now since this bug triggers rarely, and can be worked around by simply re-running make. (We just have to hope that it doesn't trigger on guix pull or on clean builds after some commit; there's no "just rerun make" in guix pull or an automated build of Guix.) AFAIU Wingo expressed motivation to make Guile's module system thread safe, so this problem would then truly disappear.
I think #2 is a pretty good solution. The only thing worrying me isthat we might not be able to sensibly partition the Scheme modulesaccording to any simple logic that can be automated (like guix/ is onesubset, gnu/packages/ is another, etc.). Maintaining the subsetsmanually in the Makefile would be pretty ugly. But maybe some simplelogic, possibly combined with few special-cases in the code, would begood enough.
Thoughts?
Taylan
L
L
Ludovic Courtès wrote on 10 Feb 2016 14:50
(name . Taylan Ulrich "Bayırlı /Kammer")(address . taylanbayirli@gmail.com)
87fux0n07o.fsf@gnu.org
taylanbayirli@gmail.com (Taylan Ulrich "Bayırlı/Kammer") skribis:
Toggle quote (21 lines)> Sadly that assumption isn't met when autoloads are involved.> Minimal-ish test-case:>> - Check out 0889321.>> - Build it.>> - Edit gnu/build/activation.scm and gnu/build/linux-boot.scm to contain> merely the following expressions, respectively:>> (define-module (gnu build activation)> #:use-module (gnu build linux-boot))>> (define-module (gnu build linux-boot)> #:autoload (system base compile) (compile-file))>> - Run make again.>> If you're on a multi-core system, you will probably get an error saying> something weird like "no such language scheme".
Do you have a clear explanation of why this happens? I would expect(system base compile) to already be loaded for instance, so it’s notclear to me what’s going on. Or is it just the mutation of (gnu buildlinux-boot) that’s causing problems?
Toggle quote (28 lines)> Solution proposals:>> 1. s/par-for-each/for-each/. Will make compilation slower on multi-core> machines. We would do the same for guix pull, which is a bit sad> because it's so fast right now. Very simple solution though.>> 2. We find out some partitioning of the Scheme modules such that there> is minimal overlap in total loaded modules when the modules in one> subset are each loaded by one Guile process. Then each Guile process> loads & compiles the modules in its given subset serially, but these> Guile processes run in parallel. This could speed things up even> more than now because the module-loading phases of the processes> would be parallel too. It also has the side-effect that less memory> is consumed the fewer cores you have (because less Scheme modules> loaded into memory at once). If someone (Ludo?) has a good general> overview of Guix's module graph then maybe they can come up with a> sensible partitioning of the modules, say into 4 subsets (maxing out> benefits at quad-core), such that loading all modules in one subset> loads a minimal amount of modules that are outside that subset. That> should be the only challenging part of this solution.>> 3. We do nothing for now since this bug triggers rarely, and can be> worked around by simply re-running make. (We just have to hope that> it doesn't trigger on guix pull or on clean builds after some commit;> there's no "just rerun make" in guix pull or an automated build of> Guix.) AFAIU Wingo expressed motivation to make Guile's module> system thread safe, so this problem would then truly disappear.
Short-term, I’d do #1 or #3; probably #1 though, because random failuresare no fun, and we know they can happen.
Longer-term, I’m not convinced by #2. I think I would instead buildpackages in reverse topological order, probably serially at first, whichwould address http://bugs.gnu.org/15602 (with the caveat that the (gnupackages …) modules cannot be topologically-sorted, but OTOH theytypically don’t use macros, so we’re fine.)
That would require a tool to extract and the ‘define-module’ forms andbuild a graph from there.
But really, we must fix http://bugs.gnu.org/15602, an in particular,‘compile-file’ should not mutate the global module name space. I thinkwe could do something like:
(define (compile-file* …) (let ((root the-root-module) (compile-root (copy-module the-root-module))) (dynamic-wind (lambda () (set! the-root-module compile-root) ;; ditto with the-scm-module ) (lambda () (compile-file …)) (lambda () (set! the-root-module root) ;; … ))))
It’s unclear how costly ‘copy-module’ would be, and the whole strategydepends on it.
Eventually it seems clear that Guile proper needs to address this usecase, and needs to provide thread-safe modules.
Ludo’.
L
L
Ludovic Courtès wrote on 10 Feb 2016 14:50
(name . Taylan Ulrich "Bayırlı /Kammer")(address . taylanbayirli@gmail.com)
87egckn07h.fsf@gnu.org
taylanbayirli@gmail.com (Taylan Ulrich "Bayırlı/Kammer") skribis:
Toggle quote (21 lines)> Sadly that assumption isn't met when autoloads are involved.> Minimal-ish test-case:>> - Check out 0889321.>> - Build it.>> - Edit gnu/build/activation.scm and gnu/build/linux-boot.scm to contain> merely the following expressions, respectively:>> (define-module (gnu build activation)> #:use-module (gnu build linux-boot))>> (define-module (gnu build linux-boot)> #:autoload (system base compile) (compile-file))>> - Run make again.>> If you're on a multi-core system, you will probably get an error saying> something weird like "no such language scheme".
Do you have a clear explanation of why this happens? I would expect(system base compile) to already be loaded for instance, so it’s notclear to me what’s going on. Or is it just the mutation of (gnu buildlinux-boot) that’s causing problems?
Toggle quote (28 lines)> Solution proposals:>> 1. s/par-for-each/for-each/. Will make compilation slower on multi-core> machines. We would do the same for guix pull, which is a bit sad> because it's so fast right now. Very simple solution though.>> 2. We find out some partitioning of the Scheme modules such that there> is minimal overlap in total loaded modules when the modules in one> subset are each loaded by one Guile process. Then each Guile process> loads & compiles the modules in its given subset serially, but these> Guile processes run in parallel. This could speed things up even> more than now because the module-loading phases of the processes> would be parallel too. It also has the side-effect that less memory> is consumed the fewer cores you have (because less Scheme modules> loaded into memory at once). If someone (Ludo?) has a good general> overview of Guix's module graph then maybe they can come up with a> sensible partitioning of the modules, say into 4 subsets (maxing out> benefits at quad-core), such that loading all modules in one subset> loads a minimal amount of modules that are outside that subset. That> should be the only challenging part of this solution.>> 3. We do nothing for now since this bug triggers rarely, and can be> worked around by simply re-running make. (We just have to hope that> it doesn't trigger on guix pull or on clean builds after some commit;> there's no "just rerun make" in guix pull or an automated build of> Guix.) AFAIU Wingo expressed motivation to make Guile's module> system thread safe, so this problem would then truly disappear.
Short-term, I’d do #1 or #3; probably #1 though, because random failuresare no fun, and we know they can happen.
Longer-term, I’m not convinced by #2. I think I would instead buildpackages in reverse topological order, probably serially at first, whichwould address http://bugs.gnu.org/15602 (with the caveat that the (gnupackages …) modules cannot be topologically-sorted, but OTOH theytypically don’t use macros, so we’re fine.)
That would require a tool to extract and the ‘define-module’ forms andbuild a graph from there.
But really, we must fix http://bugs.gnu.org/15602, an in particular,‘compile-file’ should not mutate the global module name space. I thinkwe could do something like:
(define (compile-file* …) (let ((root the-root-module) (compile-root (copy-module the-root-module))) (dynamic-wind (lambda () (set! the-root-module compile-root) ;; ditto with the-scm-module ) (lambda () (compile-file …)) (lambda () (set! the-root-module root) ;; … ))))
It’s unclear how costly ‘copy-module’ would be, and the whole strategydepends on it.
Eventually it seems clear that Guile proper needs to address this usecase, and needs to provide thread-safe modules.
Ludo’.
L
L
Ludovic Courtès wrote on 23 Feb 2016 14:26
control message for bug #22608
(address . control@debbugs.gnu.org)
87lh6bk169.fsf@gnu.org
severity 22608 important
L
L
Ludovic Courtès wrote on 4 Jul 2018 00:10
Re: bug#22608: Module system thread unsafety and .go compilation
(name . Taylan Ulrich "Bayırlı /Kammer")(address . taylanbayirli@gmail.com)
87d0w4ezst.fsf@gnu.org
Hello,
taylanbayirli@gmail.com (Taylan Ulrich "Bayırlı/Kammer") skribis:
Toggle quote (11 lines)> To speed up the compilation of the many Scheme files in Guix, we use a> script that first loads all modules to be compiled into the Guile> process (by calling 'resolve-interface' on the module names), and then> the corresponding Scheme files are compiled in a par-for-each.>> While Guile's module system is known to be thread unsafe, the idea was> that all mutation should happen in the serial loading phase, and the> parallel compile-file calls should then be thread safe.>> Sadly that assumption isn't met when autoloads are involved.
For the record, these issues should be fixed in Guile 2.2.4:
533e3ff17 * Serialize accesses to submodule hash tables.46bcbfa56 * Module import obarrays are accessed in a critical section.761cf0fb8 * Make module autoloading thread-safe.
‘guix pull’ now defaults to 2.2.4, so we’ll see if indeed those crashesdisappear.
Ludo’.
?