Knot: Linker runs very slowly and crashes during build

Done. Submitted by Simon South.
Details
3 participants
  • Ludovic Courtès
  • Tobias Geerinckx-Rice
  • Simon South
Owner
unassigned
Severity
normal
Simon South wrote on 4 Oct 2020 22:56
(address . bug-guix@gnu.org)
87a6x1g17f.fsf@simonsouth.net
Building Knot 3.0.0 using "guix build knot" consistently appears to hang for me when it gets to this point during the linking stage:

  CCLD knsec3hash
  ar: `u' modifier ignored since `D' is the default (see `U')
  CCLD kdig
  CCLD khost

While it sits here the compiler is tying up 100% of a single CPU core. On my ROCK64 with 4 GB of RAM, it eventually crashes with an internal error:

  gcc: internal compiler error: Killed (program cc1)
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See https://gcc.gnu.org/bugs/ for instructions.
  make[3]: *** [Makefile:5381: libzscanner/la-scanner.lo] Error 1
  make[3]: Leaving directory '/tmp/guix-build-knot-3.0.0.drv-0/knot-3.0.0/src'

dmesg shows the compiler was killed for running out of memory:

  cc1 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
  CPU: 2 PID: 22340 Comm: cc1 Not tainted 5.8.11-gnu #1
  (...)
  oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=cc1,pid=22340,uid=999
  Out of memory: Killed process 22340 (cc1) total-vm:2573780kB, anon-rss:2540708kB, file-rss:0kB, shmem-rss:0kB, UID:999 pgtables:5044kB oom_score_adj:0
  oom_reaper: reaped process 22340 (cc1), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

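For anyone reading similar kernel output: the figure that matters in the OOM report is anon-rss, the anonymous memory cc1 held when it was killed. A minimal sketch in POSIX shell, assuming a log line of the form quoted in this report (the function name oom_anon_rss_mib is made up for illustration), converts that figure to MiB:

```shell
#!/bin/sh
# Sample line copied from the dmesg output quoted in this report.
oom_line='Out of memory: Killed process 22340 (cc1) total-vm:2573780kB, anon-rss:2540708kB, file-rss:0kB, shmem-rss:0kB, UID:999 pgtables:5044kB oom_score_adj:0'

# Hypothetical helper: print the anon-rss value of an OOM-killer line in MiB
# (rounded down). Returns non-zero if the line has no anon-rss field.
oom_anon_rss_mib() {
    kb=$(printf '%s\n' "$1" | sed -n 's/.*anon-rss:\([0-9][0-9]*\)kB.*/\1/p')
    [ -n "$kb" ] || return 1
    echo $((kb / 1024))
}

oom_anon_rss_mib "$oom_line"
```

For the line above this prints 2481, that is, roughly 2.4 GiB held by a single cc1 process. On a live system the input could come from something like dmesg | grep 'Out of memory'.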
On my x86_64 machine the build eventually completes (that machine has much more memory), but there is the same, weirdly long delay during linking while the compiler runs.
I see no such delay however when I build the code "manually", using "guix environment --pure knot" or even "guix environment --no-grafts --container knot" as the manual suggests. The build then completes quickly and successfully on either machine; the problem appears to happen only when guix-daemon is involved.
Is there a known issue that can cause the linker to consume orders of magnitude more resources when run by the Guix build process?
Apart from rebuilding gcc with debugging symbols (which seems to make Guix want to rebuild every other package in the system as well) and trying to understand what the compiler is doing, how might I go about diagnosing this?
--
Simon South
simon@simonsouth.net
Simon South wrote on 5 Oct 2020 01:01
(address . 43802@debbugs.gnu.org)
87mu11egul.fsf@simonsouth.net
So naturally, as soon as I submit the bug report something occurs to me that gets me unstuck.
The delay and crash are occurring while libtool is using gcc to compile src/libzscanner/scanner.c, which appears to be generated at build time from the file scanner.c.t0 in the same directory.
When I build Knot on my own, scanner.c has a size of 272 KB. When guix builds it, scanner.c somehow balloons out to 1.9 MB! So naturally gcc is going to need some time and space to make its way through all that code.
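The size comparison can be repeated with a few lines of POSIX shell. This is only a sketch: file_kib is a made-up helper, and the default path assumes the script is run from the top of an unpacked and built Knot source tree.

```shell
#!/bin/sh
# Hypothetical helper: print a file's size in KiB (rounded down).
# Returns non-zero if the file does not exist.
file_kib() {
    [ -f "$1" ] || return 1
    echo $(( $(wc -c < "$1") / 1024 ))
}

# Assumed location of the generated scanner; pass another path as the
# first argument to check a different build tree.
scanner=${1:-src/libzscanner/scanner.c}
if [ -f "$scanner" ]; then
    echo "$scanner: $(file_kib "$scanner") KiB"
fi
```

Running it once in a manual build tree and once in a kept failed build directory should show which of the two scanner.c variants each build picked up.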
In fact the build process actually points out
NOTE: Compilation of scanner.c can take several minutes!
So perhaps all this is completely expected. Still... 1.9 MB. Of C code. It's tempting to think something is going wrong here. (And anyway, why the huge discrepancy in file size?)
I'm investigating.
--
Simon South
simon@simonsouth.net
Simon South wrote on 5 Oct 2020 02:09
(address . 43802@debbugs.gnu.org)
878scledol.fsf@simonsouth.net
Turns out this is not a bug. Knot ships with two parser implementations: A smaller, slower one (272 KB) and a larger, faster one (1.9 MB). The larger one is a bit too big to build reliably on systems with 4 GB or less of available memory.
To test Knot on these machines, you can run "configure" with "--disable-fastparser" as an argument (or edit gnu/packages/dns.scm to do so) to force it to use the smaller parser. This also allows the build to complete more quickly on systems that can use either.
So how was I getting the smaller implementation in my own builds without realizing it? The configure script has some magical behaviour: It will automatically select the faster-building implementation if it finds a ".git" folder in the current directory. This is presumably meant to help developers, but the confusion it caused me demonstrates why I think this sort of magical programming is bad practice.
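The selection logic described here can be sketched in POSIX shell. This is an illustration of the behaviour as reported in this thread, not Knot's actual configure code; the function name choose_parser and the "fast"/"slow" labels are invented for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the parser selection described above.
# Usage: choose_parser DIR [FLAG]
# Prints "slow" for the smaller, faster-building parser and "fast" for the
# larger, faster-running one.
choose_parser() {
    dir=$1
    flag=${2:-}
    if [ "$flag" = "--disable-fastparser" ]; then
        # An explicit flag forces the smaller parser.
        echo slow
    elif [ -d "$dir/.git" ]; then
        # A Git checkout is treated as a developer build.
        echo slow
    else
        # Anything else, such as an unpacked release tarball.
        echo fast
    fi
}

choose_parser "${1:-.}"
```

Under these assumptions a build from a release tarball, which is what guix-daemon performs, takes the last branch, while a build inside a Git checkout takes the second; that matches the difference between the two builds observed above.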
At any rate, this bug report can be closed.
--
Simon South
simon@simonsouth.net
Simon South wrote on 5 Oct 2020 02:16
(address . control@debbugs.gnu.org)
87362tedcn.fsf@simonsouth.net
tags 43802 + notabug
close 43802
thanks
--
Simon South
simon@simonsouth.net
Ludovic Courtès wrote on 5 Oct 2020 16:15
Re: bug#43802: Knot: Linker runs very slowly and crashes during build
(name . Simon South)(address . simon@simonsouth.net)(address . 43802@debbugs.gnu.org)
87ft6swyg6.fsf@gnu.org
Hi,
Simon South <simon@simonsouth.net> writes:
> Building Knot 3.0.0 using "guix build knot" consistently appears to hang
> for me when it gets to this point during the linking stage:
>
>   CCLD knsec3hash
>   ar: `u' modifier ignored since `D' is the default (see `U')
>   CCLD kdig
>   CCLD khost
>
> While it sits here the compiler is tying up 100% of a single CPU
> core. On my ROCK64 with 4 GB of RAM, it eventually crashes with an
> internal error:
>
>   gcc: internal compiler error: Killed (program cc1)
>   Please submit a full bug report,
>   with preprocessed source if appropriate.
>   See <https://gcc.gnu.org/bugs/> for instructions.
>   make[3]: *** [Makefile:5381: libzscanner/la-scanner.lo] Error 1
>   make[3]: Leaving directory '/tmp/guix-build-knot-3.0.0.drv-0/knot-3.0.0/src'
>
> dmesg shows the compiler was killed for running out of memory:
>
>   cc1 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
>   CPU: 2 PID: 22340 Comm: cc1 Not tainted 5.8.11-gnu #1
>   (...)
>   oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=cc1,pid=22340,uid=999
>   Out of memory: Killed process 22340 (cc1) total-vm:2573780kB, anon-rss:2540708kB, file-rss:0kB, shmem-rss:0kB, UID:999 pgtables:5044kB oom_score_adj:0
>   oom_reaper: reaped process 22340 (cc1), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> On my x86_64 machine the build eventually completes (that machine has
> much more memory), but there is the same, weirdly long delay during
> linking while the compiler runs.
Is this an LTO build (with ‘-flto’ in the compile and link flags)? That could explain the memory requirements.
Ludo’.
Tobias Geerinckx-Rice wrote on 5 Oct 2020 17:26
(address . 43802@debbugs.gnu.org)
875z7oae3z.fsf@nckx
Simon,
Would it make sense to provide a faster-building slower-starting Knot variant alongside the main package?
Ludovic Courtès wrote:
> Is this an LTO build (with ‘-flto’ in the compile and link
> flags)? That could explain the memory requirements.
No, but good guess.
Simon South wrote:
> Turns out this is not a bug.
The fast parser is written in Ragel[0], which compiles down to almost 2 MiB of ‘C’, which is then thrown at GCC to sort out. I know to put the kettle on before hacking on Knot locally.
What I didn't know was that these generated C files were included in the release tarball. We have the Ragel, we can rebuild them, and we now do so in commit 2b73e50c31a61b5dcef35a1e4b9484d9dbcb0fbc. Thanks for bringing it to my attention.
Kind regards,
T G-R
[0]: http://www.colm.net/open-source/ragel/
Simon South wrote on 5 Oct 2020 17:44
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
87o8lgbrtq.fsf@simonsouth.net
Tobias Geerinckx-Rice <me@tobias.gr> writes:
> Would it make sense to provide a faster-building slower-starting Knot
> variant alongside the main package?
I'm inclined to say "no", especially if we assume a substitute will (nearly always) be available.
Unless someone is hacking on the scanner directly it ought to be safe to add "--disable-fastparser" to dns.scm temporarily during testing, then remove it before submitting a patch. If it isn't then probably _that_ is the bug to be fixed.
> What I didn't know was that these generated C files were included in
> the release tarball. We have the Ragel, we can rebuild them, and we
> now do so in commit 2b73e50c31a61b5dcef35a1e4b9484d9dbcb0fbc.
Neat!
--
Simon South
simon@simonsouth.net
Ludovic Courtès wrote on 8 Oct 2020 00:06
(name . Simon South)(address . simon@simonsouth.net)
87o8ldsnc4.fsf@gnu.org
Simon South <simon@simonsouth.net> writes:
>> What I didn't know was that these generated C files were included in
>> the release tarball. We have the Ragel, we can rebuild them, and we
>> now do so in commit 2b73e50c31a61b5dcef35a1e4b9484d9dbcb0fbc.
>
> Neat!
+1, yay for bootstrapping!
Ludo’.