Heap corruption leads to random crashes

  • Done
  • quality assurance status badge
Details
4 participants
  • Brian Woodcox
  • Ludovic Courtès
  • Pierre Langlois
  • shtwzrd
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
important
Merged with
L
L
Ludovic Courtès wrote on 24 Jan 2020 16:14
Finalization thread hits wrong-type-arg on weak vector (AArch64)
(address . bug-Guile@gnu.org)
87tv4kdgyy.fsf@inria.fr
Hello!

While building the “guix-system.drv” derivation on AArch64, I got this
crash (not fully deterministic but quite frequent). Here the
finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
accessing a one-element weak vector):

Toggle snippet (92 lines)
$ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
[…]
loading 'gnu/services/cups.scm'...
Backtrace:
[Switching to Thread 0xffffbebec1d0 (LWP 22464)]

Thread 2 "guile" hit Breakpoint 3, scm_display_backtrace_with_highlights (
stack=stack@entry="#<struct stack>" = {...}, port=port@entry=#<port #<port-type file 4c3b40> 510040>,
first=first@entry=#f, depth=depth@entry=#f, highlights=highlights@entry=()) at backtrace.c:269
269 {
(gdb) bt
#0 scm_display_backtrace_with_highlights (stack=stack@entry="#<struct stack>" = {...},
port=port@entry=#<port #<port-type file 4c3b40> 510040>, first=first@entry=#f, depth=depth@entry=#f,
highlights=highlights@entry=()) at backtrace.c:269
#1 0x0000ffffbf5ef8c4 in print_exception_and_backtrace (
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60, tag=wrong-type-arg,
port=#<port #<port-type file 4c3b40> 510040>) at continuations.c:409
#2 pre_unwind_handler (error_port=0x510040, tag=wrong-type-arg,
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60) at continuations.c:453
#3 0x0000ffffbf672588 in catch_pre_unwind_handler (data=0xffffbebeb850,
exn=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70ced40) at throw.c:135
#4 0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#5 0x0000ffffbf67d10c in scm_call_n (proc=proc@entry=#<unmatched-tag 10045>, argv=<optimized out>, nargs=5)
at vm.c:1589
#6 0x0000ffffbf5f3c10 in scm_apply_0 (proc=#<unmatched-tag 10045>, args=()) at eval.c:603
#7 0x0000ffffbf5f4654 in scm_apply_1 (proc=<optimized out>, arg1=arg1@entry=wrong-type-arg,
args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at eval.c:609
#8 0x0000ffffbf6729e0 in scm_throw (key=key@entry=wrong-type-arg,
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at throw.c:262
#9 0x0000ffffbf672b44 in scm_ithrow (key=key@entry=wrong-type-arg, args=<optimized out>, no_return=no_return@entry=1)
at throw.c:457
#10 0x0000ffffbf5f1dec in scm_error_scm (key=key@entry=wrong-type-arg, subr=subr@entry="weak-vector-ref",
message=<optimized out>,
args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0,
data=data@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:90
#11 0x0000ffffbf5f1ea0 in scm_error (key=key@entry=wrong-type-arg,
subr=subr@entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref",
message=message@entry=0xffffbf696e98 "Wrong type argument in position ~A (expecting ~A): ~S",
args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0,
rest=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:62
#12 0x0000ffffbf5f22cc in scm_wrong_type_arg_msg (
subr=subr@entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos@entry=1,
bad_value=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30ff880,
szMessage=szMessage@entry=0xffffbf6a5300 "weak vector") at error.c:282
#13 0x0000ffffbf680050 in scm_c_weak_vector_ref (wv=<optimized out>, k=k@entry=0) at weak-vector.c:193
#14 0x0000ffffbf67eff4 in scm_i_weak_car (
pair=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830) at weak-list.h:39
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
#16 vacuum_all_weak_tables () at weak-table.c:494
#17 0x0000ffffbf5fda44 in async_gc_finalizer (ptr=0x494ec0, data=0x0) at finalizers.c:316
#18 0x0000ffffbf549f74 in GC_invoke_finalizers ()
from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#19 0x0000ffffbf5fdf64 in scm_run_finalizers () at finalizers.c:398
#20 0x0000ffffbf5fdff4 in finalization_thread_proc (unused=<optimized out>) at finalizers.c:233
#21 0x0000ffffbf5ef6e0 in c_body (d=0xffffbebeb918) at continuations.c:430
#22 0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#23 0x0000ffffbf67d10c in scm_call_n (proc=#<unmatched-tag 10045>, argv=argv@entry=0xffffbebeb660,
nargs=nargs@entry=2) at vm.c:1589
#24 0x0000ffffbf5f3930 in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at eval.c:503
#25 0x0000ffffbf5f4f38 in scm_c_with_exception_handler (type=type@entry=#t, handler=0xffffbebeb670,
handler@entry=0xffffbf6724b0 <catch_post_unwind_handler>, handler_data=0x510040,
handler_data@entry=0xffffbebeb850, thunk=0x0, thunk@entry=0xffffbf6725f8 <catch_body>,
thunk_data=0x1dce42683dff4d67, thunk_data@entry=0xffffbebeb850) at exceptions.c:170
#26 0x0000ffffbf672850 in scm_c_catch (tag=tag@entry=#t, body=body@entry=0xffffbf5ef6c8 <c_body>,
body_data=body_data@entry=0xffffbebeb918, handler=handler@entry=0xffffbf5ef970 <c_handler>,
handler_data=handler_data@entry=0xffffbebeb918,
pre_unwind_handler=pre_unwind_handler@entry=0xffffbf5ef7b8 <pre_unwind_handler>,
pre_unwind_handler_data=pre_unwind_handler_data@entry=0x510040) at throw.c:168
#27 0x0000ffffbf5efbf4 in scm_i_with_continuation_barrier (body=body@entry=0xffffbf5ef6c8 <c_body>,
body_data=body_data@entry=0xffffbebeb918, handler=handler@entry=0xffffbf5ef970 <c_handler>,
handler_data=handler_data@entry=0xffffbebeb918,
pre_unwind_handler=pre_unwind_handler@entry=0xffffbf5ef7b8 <pre_unwind_handler>, pre_unwind_handler_data=0x510040)
at continuations.c:368
#28 0x0000ffffbf5efca0 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
at continuations.c:464
#29 0x0000ffffbf671148 in with_guile (base=0xffffbebeb988, data=0xffffbebeb9a8) at threads.c:645
#30 0x0000ffffbf551618 in GC_call_with_stack_base ()
from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#31 0x0000ffffbf6714a8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>, func=<optimized out>)
at threads.c:688
#32 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
#33 0x0000ffffbf50e7f4 in start_thread ()
from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libpthread.so.0
#34 0x0000ffffbf136edc in thread_start () from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libc.so.6
(gdb) info threads
Id Target Id Frame
1 Thread 0xffffbf6f4010 (LWP 22463) "guile" resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
* 2 Thread 0xffffbebec1d0 (LWP 22464) "guile" scm_display_backtrace_with_highlights (
stack=stack@entry="#<struct stack>" = {...}, port=port@entry=#<port #<port-type file 4c3b40> 510040>,
first=first@entry=#f, depth=depth@entry=#f, highlights=highlights@entry=()) at backtrace.c:269

The problem appears to be that the type tag of the weak-vector got
zeroed:

Toggle snippet (13 lines)
(gdb) frame 15
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
49 SCM car = scm_i_weak_car (in);
(gdb) p in
$48 = <error reading variable: ERROR: Cannot access memory at address 0x0>(SCM) <error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830
(gdb) p *(void**)in
$49 = (void *) 0x30ff880
(gdb) p ((void**)$49)[0]@2
$50 = {0x0, 0x30f8840}
(gdb) p (void**)in
$51 = (void **) 0x30f8830

There’s normally no disappearing link registered on the first element of
the weak vector (type tag + length) so I don’t know how this can happen.
Here’s the other thread (with surprisingly broken stack frames):

Toggle snippet (15 lines)
(gdb) thread 1
[Switching to thread 1 (Thread 0xffffbf6f4010 (LWP 22463))]
#0 resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
272 scm_t_weak_entry *next = entry->next;
(gdb) bt
#0 resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
#1 0x0000ffffbf67ef00 in vacuum_weak_table (table=0x4a2cb0) at weak-table.c:318
#2 0x0000ffffbf67f374 in scm_c_weak_table_ref (table=<optimized out>, raw_hash=2016212919028049524,
pred=0xffffbf67eb10 <assq_predicate>, closure=0x72b6a80, dflt=()) at weak-table.c:533
#3 0x0000ffffbf653824 in scm_source_properties (obj=<optimized out>) at srcprop.c:195
#4 0x0000ffffbeebcf14 in ?? ()
#5 0x0000ffffbf03c130 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thoughts?

This code is the same as in 2.2.

Ludo’.
L
L
Ludovic Courtès wrote on 15 Feb 2020 17:37
control message for bug #39266
(address . control@debbugs.gnu.org)
87lfp36c3j.fsf@gnu.org
severity 39266 important
quit
L
L
Ludovic Courtès wrote on 19 Feb 2020 14:50
Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
(address . 39266@debbugs.gnu.org)
87k14in0sk.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (10 lines)
> While building the “guix-system.drv” derivation on AArch64, I got this
> crash (not fully deterministic but quite frequent). Here the
> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
> accessing a one-element weak vector):
>
> $ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
> […]
> loading 'gnu/services/cups.scm'...
> Backtrace:

Apparently this bug does not occur with v3.0.0-23-g7dc90a17e¹. It may
be that 00fbdfa7345765168e14438eed0b0b8c64c27ab9 reduces GC pressure,
which as a side effect makes the problem vanish.

It’s not satisfactory, but as a stop-gap measure, we could release 3.0.1
like this, which could make Guile 3 usable for Guix on AArch64.

Thoughts?

Ludo’.

¹ Specifically, I tested by (1) building a tarball with “make dist”, (2)
running “guix build guile-next
--with-source=guile-next=the-tarball.tar.gz”, and (3) running that
Guile in the code above. For some reason, Guile 3.0.0 built “by hand”
would not reproduce the original bug, which is why I built it through
Guix.
B
B
Brian Woodcox wrote on 19 Feb 2020 15:19
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 39266@debbugs.gnu.org)
90401426-AF73-418E-89DD-1A22FC704BD7@InSkyData.com
Well, I would be happy, because right now I can’t do guix pull.

I tried multiple timed on aarch64 with no success.

Brian.


Toggle quote (34 lines)
> On Feb 19, 2020, at 6:50 AM, Ludovic Courtès <ludo@gnu.org> wrote:
>
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> While building the “guix-system.drv” derivation on AArch64, I got this
>> crash (not fully deterministic but quite frequent). Here the
>> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
>> accessing a one-element weak vector):
>>
>> $ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
>> […]
>> loading 'gnu/services/cups.scm'...
>> Backtrace:
>
> Apparently this bug does not occur with v3.0.0-23-g7dc90a17e¹. It may
> be that 00fbdfa7345765168e14438eed0b0b8c64c27ab9 reduces GC pressure,
> which as a side effect makes the problem vanish.
>
> It’s not satisfactory, but as a stop-gap measure, we could release 3.0.1
> like this, which could make Guile 3 usable for Guix on AArch64.
>
> Thoughts?
>
> Ludo’.
>
> ¹ Specifically, I tested by (1) building a tarball with “make dist”, (2)
> running “guix build guile-next
> --with-source=guile-next=the-tarball.tar.gz”, and (3) running that
> Guile in the code above. For some reason, Guile 3.0.0 built “by hand”
> would not reproduce the original bug, which is why I built it through
> Guix.
>
>
>
S
S
shtwzrd wrote on 29 Feb 2020 16:09
BGkf_lA45xCWdeI_JnnNYalFXiAmfLFIQIBNlxuSUh2lEzXev530ERO-hS5p0dM1lvGuT8mo_WaR1ApaqMCmBkR0EZtcngH2i2rZBKxHOCo=@protonmail.com
Seconding what Brian said, I would also be happy with this solution.

Right now, guix users on aarch64 simply aren't using guile 3 because they can't pull, so this one bug probably acts as a blocker to encountering other potential bugs.

So even though it's not satisfactory, getting aarch64 working again could result in more and varied bug reports for guile on that platform, so that there may eventually be enough information to find the actual root cause of the problem.
Attachment: file
L
L
Ludovic Courtès wrote on 9 Mar 2020 10:11
Re: bug#39988: [3.0.1] Segfault in GC
87mu8p6gf4.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (2 lines)
> Apparently a deadlock on ‘all_weak_tables_lock’:

I reproduced the deadlock:

Toggle snippet (196 lines)
(gdb) thread 1
[Switching to thread 1 (Thread 0x7faee3633b80 (LWP 5809))]
#0 0x00007faee3bcb0bc in __lll_lock_wait ()
from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
(gdb) bt
#0 0x00007faee3bcb0bc in __lll_lock_wait ()
from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
#1 0x00007faee3bc4674 in pthread_mutex_lock ()
from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
#2 0x00007faee3d22a2f in scm_c_make_weak_table (k=<optimized out>, kind=SCM_WEAK_TABLE_KIND_KEY)
at weak-table.c:505
#3 0x00007faee139814b in ?? ()
#4 0x00007faee330fd80 in ?? ()
#5 0x00007faee3d89860 in ?? ()
from /gnu/store/s5p2yja08zcg6j56y1wfvnm6nxiyllz1-guile-next-3.0.1/lib/libguile-3.0.so.1
#6 0x00007faee330fd80 in ?? ()
#7 0x00007faee3cc46eb in scm_jit_enter_mcode (thread=0x7faee330fd80,
mcode=0x7faedb2a87f0 "I\211\314I)\304I\203\374\020\017\217", <incomplete sequence \344>) at jit.c:5725
#8 0x00007faee3d1ff88 in vm_regular_engine (thread=0x7faed63eab30) at vm-engine.c:374
#9 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff3f56ce98, nargs=nargs@entry=1)
at vm.c:1600
#10 0x00007faee3c9d1e7 in scm_primitive_eval (exp=<optimized out>) at eval.c:671
#11 0x00007faee3cc63ab in scm_primitive_load (filename=<optimized out>) at load.c:131
#12 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fd80) at vm-engine.c:972
#13 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff3f56d068, nargs=nargs@entry=1)
at vm.c:1600
#14 0x00007faee3c9d1e7 in scm_primitive_eval (exp=<optimized out>,
exp@entry=((@ (ice-9 control) %) (begin (set! %load-path (cons "." %load-path)) (set! %load-path (cons "." %load-path)) ((@@ (ice-9 command-line) load/lang) "./build-aux/compile-all.scm") (quit)))) at eval.c:671
#15 0x00007faee3c9d243 in scm_eval (
exp=((@ (ice-9 control) %) (begin (set! %load-path (cons "." %load-path)) (set! %load-path (cons "." %load-path)) ((@@ (ice-9 command-line) load/lang) "./build-aux/compile-all.scm") (quit))),
module_or_state=module_or_state@entry="#<struct module>" = {...}) at eval.c:705
#16 0x00007faee3cf6860 in scm_shell (argc=834, argv=0x7fff3f56d6c8) at script.c:357
#17 0x00007faee3cb4bed in invoke_main_func (body_data=0x7fff3f56d570) at init.c:308
#18 0x00007faee3c97e3a in c_body (d=0x7fff3f56d4b0) at continuations.c:430
#19 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fd80) at vm-engine.c:972
#20 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7fff3f56d270, nargs=nargs@entry=2)
at vm.c:1600
#21 0x00007faee3c9c07a in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
at eval.c:503
#22 0x00007faee3c9d87a in scm_c_with_exception_handler (type=type@entry=#t,
handler=handler@entry=0x7faee3d15d60 <catch_post_unwind_handler>,
handler_data=handler_data@entry=0x7fff3f56d3e0, thunk=thunk@entry=0x7faee3d15ea0 <catch_body>,
thunk_data=thunk_data@entry=0x7fff3f56d3e0) at exceptions.c:170
#23 0x00007faee3d1609d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7faee3c97e30 <c_body>,
body_data=body_data@entry=0x7fff3f56d4b0, handler=handler@entry=0x7faee3c980d0 <c_handler>,
handler_data=handler_data@entry=0x7fff3f56d4b0,
pre_unwind_handler=pre_unwind_handler@entry=0x7faee3c97f30 <pre_unwind_handler>,
pre_unwind_handler_data=0x7faee15e63c0) at throw.c:168
#24 0x00007faee3c983e3 in scm_i_with_continuation_barrier (body=body@entry=0x7faee3c97e30 <c_body>,
body_data=body_data@entry=0x7fff3f56d4b0, handler=handler@entry=0x7faee3c980d0 <c_handler>,
handler_data=handler_data@entry=0x7fff3f56d4b0,
pre_unwind_handler=pre_unwind_handler@entry=0x7faee3c97f30 <pre_unwind_handler>,
pre_unwind_handler_data=0x7faee15e63c0) at continuations.c:368
#25 0x00007faee3c98475 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
at continuations.c:464
#26 0x00007faee3d14b3f in with_guile (base=base@entry=0x7fff3f56d518, data=data@entry=0x7fff3f56d540)
at threads.c:645
#27 0x00007faee3bf9a68 in GC_call_with_stack_base (fn=fn@entry=0x7faee3d14af0 <with_guile>,
arg=arg@entry=0x7fff3f56d540) at misc.c:1941
#28 0x00007faee3d14e58 in scm_i_with_guile (dynamic_state=<optimized out>, data=data@entry=0x7fff3f56d540,
func=func@entry=0x7faee3cb4bd0 <invoke_main_func>) at threads.c:688
#29 scm_with_guile (func=func@entry=0x7faee3cb4bd0 <invoke_main_func>, data=data@entry=0x7fff3f56d570)
at threads.c:694
#30 0x00007faee3cb4d62 in scm_boot_guile (argc=argc@entry=834, argv=argv@entry=0x7fff3f56d6c8,
main_func=main_func@entry=0x401240 <inner_main>, closure=closure@entry=0x0) at init.c:291
#31 0x0000000000401100 in main (argc=834, argv=0x7fff3f56d6c8) at guile.c:95
(gdb) thread 5
[Switching to thread 5 (Thread 0x7faee0f93700 (LWP 5815))]
#0 0x00007faee3bc7efc in pthread_cond_wait@@GLIBC_2.3.2 ()
from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
(gdb) bt
#0 0x00007faee3bc7efc in pthread_cond_wait@@GLIBC_2.3.2 ()
from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
#1 0x00007faee3d15355 in scm_pthread_cond_wait (cond=<optimized out>, mutex=<optimized out>) at threads.c:1605
#2 0x00007faee3d15523 in block_self (queue=((#<unmatched-tag 277>) #<unmatched-tag 277>),
mutex=mutex@entry=0x7faee1201f80, waittime=waittime@entry=0x0) at threads.c:312
#3 0x00007faee3d15657 in lock_mutex (current_thread=0x7faee330fb40, waittime=0x0, m=0x7faee1201f80,
kind=SCM_MUTEX_RECURSIVE) at threads.c:1021
#4 scm_timed_lock_mutex (mutex=#<unmatched-tag 10377>, timeout=<optimized out>) at threads.c:1085
#5 0x00007faee13a663f in ?? ()
#6 0x00007faee330fb40 in ?? ()
#7 0x00007faee3d89860 in ?? ()
from /gnu/store/s5p2yja08zcg6j56y1wfvnm6nxiyllz1-guile-next-3.0.1/lib/libguile-3.0.so.1
#8 0x00007faee330fb40 in ?? ()
#9 0x00007faee3cc46eb in scm_jit_enter_mcode (thread=0x7faee330fb40,
mcode=0x7faee1396410 "I\211\314I)\304I\203\374\020\017\214\272\002") at jit.c:5725
#10 0x00007faee3d1fc99 in vm_regular_engine (thread=0x7faee155ace8) at vm-engine.c:360
#11 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7faee0f92090, nargs=nargs@entry=3)
at vm.c:1600
#12 0x00007faee3c9c09f in scm_call_3 (proc=<optimized out>, arg1=arg1@entry=(guile), arg2=<optimized out>,
arg3=arg3@entry=#f) at eval.c:510
#13 0x00007faee3ccbf2f in scm_maybe_resolve_module (name=name@entry=(guile)) at modules.c:195
#14 0x00007faee3cb8898 in resolve_module (name=(guile), public_p=<optimized out>) at intrinsics.c:317
#15 0x00007faee3d1ef94 in vm_regular_engine (thread=0x7faee330fb40) at vm-engine.c:1583
#16 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7faee0f92278, nargs=nargs@entry=1)
at vm.c:1600
#17 0x00007faee3c9c058 in scm_call_1 (proc=<optimized out>, arg1=<optimized out>) at eval.c:496
#18 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fb40) at vm-engine.c:972
#19 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7faee0f92420, nargs=nargs@entry=4)
at vm.c:1600
#20 0x00007faee3c9c0d4 in scm_call_4 (proc=<optimized out>, arg1=arg1@entry="#<vector>" = {...},
arg2=arg2@entry=#<port #<port-type file 7faee159ab40> 7faee15e63c0>, arg3=arg3@entry=#:count,
arg4=arg4@entry=20) at eval.c:517
#21 0x00007faee3c8f5f9 in display_backtrace_body (a=<optimized out>) at backtrace.c:239
#22 0x00007faee3c9d87a in scm_c_with_exception_handler (type=type@entry=#t,
handler=handler@entry=0x7faee3d15d60 <catch_post_unwind_handler>,
handler_data=handler_data@entry=0x7faee0f925d0, thunk=thunk@entry=0x7faee3d15ea0 <catch_body>,
thunk_data=thunk_data@entry=0x7faee0f925d0) at exceptions.c:170
#23 0x00007faee3d1609d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7faee3c8f4d0 <display_backtrace_body>,
body_data=body_data@entry=0x7faee0f92640, handler=handler@entry=0x7faee3c8f8b0 <error_during_backtrace>,
handler_data=handler_data@entry=0x7faee15e63c0, pre_unwind_handler=pre_unwind_handler@entry=0x0,
pre_unwind_handler_data=0x0) at throw.c:168
#24 0x00007faee3d160be in scm_internal_catch (tag=tag@entry=#t,
body=body@entry=0x7faee3c8f4d0 <display_backtrace_body>, body_data=body_data@entry=0x7faee0f92640,
handler=handler@entry=0x7faee3c8f8b0 <error_during_backtrace>, handler_data=handler_data@entry=0x7faee15e63c0)
at throw.c:177
#25 0x00007faee3c8f4c5 in scm_display_backtrace_with_highlights (stack=stack@entry="#<struct stack>" = {...},
port=port@entry=#<port #<port-type file 7faee159ab40> 7faee15e63c0>, first=first@entry=#f,
depth=depth@entry=#f, highlights=highlights@entry=()) at backtrace.c:277
#26 0x00007faee3c9801f in print_exception_and_backtrace (
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee11f7f50, tag=wrong-type-arg,
port=#<port #<port-type file 7faee159ab40> 7faee15e63c0>) at continuations.c:409
#27 pre_unwind_handler (error_port=0x7faee15e63c0, tag=wrong-type-arg,
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee11f7f50)
at continuations.c:453
#28 0x00007faee3d15e1b in catch_pre_unwind_handler (data=0x7faee0f92d80,
exn=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee11f7980) at throw.c:135
#29 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fb40) at vm-engine.c:972
#30 0x00007faee3d20935 in scm_call_n (proc=proc@entry=#<unmatched-tag 10045>, argv=<optimized out>, nargs=5)
at vm.c:1600
#31 0x00007faee3c9c3d4 in scm_apply_0 (proc=#<unmatched-tag 10045>, args=()) at eval.c:603
#32 0x00007faee3c9d07d in scm_apply_1 (proc=<optimized out>, arg1=arg1@entry=wrong-type-arg,
args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d5610)
at eval.c:609
#33 0x00007faee3d16259 in scm_throw (key=key@entry=wrong-type-arg,
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d5610) at throw.c:262
#34 0x00007faee3d163a9 in scm_ithrow (key=key@entry=wrong-type-arg, args=<optimized out>,
no_return=no_return@entry=1) at throw.c:457
#35 0x00007faee3c9a585 in scm_error_scm (key=key@entry=wrong-type-arg, subr=<optimized out>,
message=message@entry="Wrong type argument in position ~A (expecting ~A): ~S",
args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d59a0,
data=data@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d5ab0)
at error.c:90
#36 0x00007faee3c9a61f in scm_error (key=wrong-type-arg,
subr=subr@entry=0x7faee3d4bf60 <s_scm_weak_vector_ref> "weak-vector-ref",
message=message@entry=0x7faee3d3d490 "Wrong type argument in position ~A (expecting ~A): ~S",
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d59a0,
rest=rest@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d5ab0)
at error.c:62
#37 0x00007faee3c9a9e0 in scm_wrong_type_arg_msg (
subr=subr@entry=0x7faee3d4bf60 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos@entry=1,
bad_value=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faec8cc8980,
szMessage=szMessage@entry=0x7faee3d4bee0 "weak vector") at error.c:282
#38 0x00007faee3d23716 in scm_c_weak_vector_ref (wv=<optimized out>, k=k@entry=0) at weak-vector.c:193
#39 0x00007faee3d22838 in scm_i_weak_car (
pair=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faec7c7d7e0) at weak-list.h:39
#40 scm_i_visit_weak_list (list_loc=0x7faee3d8a868 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
#41 vacuum_all_weak_tables () at weak-table.c:494
#42 0x00007faee3ca5f2e in async_gc_finalizer (ptr=0x7faee3312ea0, data=0x0) at finalizers.c:316
#43 0x00007faee3bf26ef in GC_invoke_finalizers () at finalize.c:1276
#44 0x00007faee3ca63c9 in scm_run_finalizers () at finalizers.c:398
#45 0x00007faee3ca643d in finalization_thread_proc (unused=<optimized out>) at finalizers.c:233
#46 0x00007faee3c97e3a in c_body (d=0x7faee0f92e50) at continuations.c:430
#47 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fb40) at vm-engine.c:972
#48 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv@entry=0x7faee0f92c10, nargs=nargs@entry=2)
at vm.c:1600
#49 0x00007faee3c9c07a in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
at eval.c:503
#50 0x00007faee3c9d87a in scm_c_with_exception_handler (type=type@entry=#t,
handler=handler@entry=0x7faee3d15d60 <catch_post_unwind_handler>,
handler_data=handler_data@entry=0x7faee0f92d80, thunk=thunk@entry=0x7faee3d15ea0 <catch_body>,
thunk_data=thunk_data@entry=0x7faee0f92d80) at exceptions.c:170
#51 0x00007faee3d1609d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7faee3c97e30 <c_body>,
body_data=body_data@entry=0x7faee0f92e50, handler=handler@entry=0x7faee3c980d0 <c_handler>,
handler_data=handler_data@entry=0x7faee0f92e50,
pre_unwind_handler=pre_unwind_handler@entry=0x7faee3c97f30 <pre_unwind_handler>,
pre_unwind_handler_data=0x7faee15e63c0) at throw.c:168
#52 0x00007faee3c983e3 in scm_i_with_continuation_barrier (body=body@entry=0x7faee3c97e30 <c_body>,
body_data=body_data@entry=0x7faee0f92e50, handler=handler@entry=0x7faee3c980d0 <c_handler>,
handler_data=handler_data@entry=0x7faee0f92e50,
pre_unwind_handler=pre_unwind_handler@entry=0x7faee3c97f30 <pre_unwind_handler>,
pre_unwind_handler_data=0x7faee15e63c0) at continuations.c:368
#53 0x00007faee3c98475 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
at continuations.c:464
#54 0x00007faee3d14b3f in with_guile (base=base@entry=0x7faee0f92eb8, data=data@entry=0x7faee0f92ee0)
at threads.c:645
#55 0x00007faee3bf9a68 in GC_call_with_stack_base (fn=fn@entry=0x7faee3d14af0 <with_guile>,
arg=arg@entry=0x7faee0f92ee0) at misc.c:1941
#56 0x00007faee3d14e58 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>,
func=<optimized out>) at threads.c:688
#57 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
#58 0x00007faee3bc2015 in start_thread ()
from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
#59 0x00007faee372891f in clone () from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libc.so.6

It stems from the bug described in

Ludo’.
L
L
Ludovic Courtès wrote on 9 Mar 2020 15:38
Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
(address . 39266@debbugs.gnu.org)
874kux385m.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (5 lines)
> While building the “guix-system.drv” derivation on AArch64, I got this
> crash (not fully deterministic but quite frequent). Here the
> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
> accessing a one-element weak vector):

With 3.0.1, I can reproduce the bug on x86_64. With rr (thanks, Andy!),
I found this (starting from the point where the type cell of the weak
vector is zeroed, and reverse-continuing until its gets its original
value of 0x10f):

Toggle snippet (41 lines)
(rr) frame 40
#40 0x00007ffff7f2e66d in scm_i_weak_car (pair=0x7fffe15af690) at ../libguile/pairs.h:190
190 return SCM_CAR (x);
(rr) down
#39 0x00007ffff7f2f576 in scm_c_weak_vector_ref (wv=<optimized out>, k=k@entry=0) at weak-vector.c:193
193 SCM_VALIDATE_WEAK_VECTOR (1, wv);
(rr)
#38 0x00007ffff7ea7ba0 in scm_wrong_type_arg_msg (
subr=subr@entry=0x7ffff7f56f00 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos@entry=1,
bad_value=0x7fffec472b90, szMessage=szMessage@entry=0x7ffff7f56e80 "weak vector") at error.c:282
282 scm_error (scm_arg_type_key,
(rr) p *((void**)0x7fffec472b90)
$1 = (void *) 0x0
(rr) watch *((void**)0x7fffec472b90)
Hardware watchpoint 1: *((void**)0x7fffec472b90)
(rr) reverse-cont
Continuing.

Thread 1 received signal SIGCONT, Continued.
[Switching to Thread 27074.27074]
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:101
101 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Dosiero a? dosierujo ne ekzistas.
(rr)
Continuing.

Thread 1 hit Hardware watchpoint 1: *((void**)0x7fffec472b90)

Old value = (void *) 0x0
New value = (void *) 0x10f
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
259 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Dosiero a? dosierujo ne ekzistas.
(rr) bt
#0 __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
#1 0x00007ffff7f1d499 in set_vtable_access_fields (vtable=vtable@entry=0x7fffeb48ee80) at struct.c:143
#2 0x00007ffff7f1dd8d in scm_i_struct_inherit_vtable_magic (vtable=vtable@entry=0x7ffff4e32fa0,
obj=obj@entry=0x7fffeb48ee80) at struct.c:215
#3 0x00007ffff7f1dfea in scm_c_make_structv (vtable=0x7ffff4e32fa0, n_tail=<optimized out>, n_init=8,
init=0x7fffffff50d0) at struct.c:364
#4 0x00007ffff7f1e0b9 in scm_make_struct_no_tail (vtable=0x7ffff4e32fa0, init=0x304) at struct.c:491

Bingo! There’s a mismatch in struct.c:

Toggle snippet (5 lines)
bitmask_size = (nfields + 31U) / 32U;
unboxed_fields = scm_gc_malloc_pointerless (bitmask_size, "unboxed fields");
memset (unboxed_fields, 0, bitmask_size * sizeof(*unboxed_fields));

Pushed a fix as 7c17655cd3d859bf0c5a86d9782a7788205fc05a.

Thanks, rr! You made my day! :-)

Now testing Guix builds on x86_64, i686, ARMv7, and AArch64 to see if
that addresses seemingly related issues.

Ludo’.
P
P
Pierre Langlois wrote on 9 Mar 2020 23:19
(address . bug-guile@gnu.org)(address . 39266@debbugs.gnu.org)
87wo7tdvcq.fsf@gmx.com
Hi Ludo,

Ludovic Courtès writes:

Toggle quote (62 lines)
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> While building the “guix-system.drv” derivation on AArch64, I got this
>> crash (not fully deterministic but quite frequent). Here the
>> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
>> accessing a one-element weak vector):
>
> With 3.0.1, I can reproduce the bug on x86_64. With rr (thanks, Andy!),
> I found this (starting from the point where the type cell of the weak
> vector is zeroed, and reverse-continuing until its gets its original
> value of 0x10f):
>
> --8<---------------cut here---------------start------------->8---
> (rr) frame 40
> #40 0x00007ffff7f2e66d in scm_i_weak_car (pair=0x7fffe15af690) at ../libguile/pairs.h:190
> 190 return SCM_CAR (x);
> (rr) down
> #39 0x00007ffff7f2f576 in scm_c_weak_vector_ref (wv=<optimized out>, k=k@entry=0) at weak-vector.c:193
> 193 SCM_VALIDATE_WEAK_VECTOR (1, wv);
> (rr)
> #38 0x00007ffff7ea7ba0 in scm_wrong_type_arg_msg (
> subr=subr@entry=0x7ffff7f56f00 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos@entry=1,
> bad_value=0x7fffec472b90, szMessage=szMessage@entry=0x7ffff7f56e80 "weak vector") at error.c:282
> 282 scm_error (scm_arg_type_key,
> (rr) p *((void**)0x7fffec472b90)
> $1 = (void *) 0x0
> (rr) watch *((void**)0x7fffec472b90)
> Hardware watchpoint 1: *((void**)0x7fffec472b90)
> (rr) reverse-cont
> Continuing.
>
> Thread 1 received signal SIGCONT, Continued.
> [Switching to Thread 27074.27074]
> __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:101
> 101 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Dosiero a? dosierujo ne ekzistas.
> (rr)
> Continuing.
>
> Thread 1 hit Hardware watchpoint 1: *((void**)0x7fffec472b90)
>
> Old value = (void *) 0x0
> New value = (void *) 0x10f
> __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
> 259 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Dosiero a? dosierujo ne ekzistas.
> (rr) bt
> #0 __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
> #1 0x00007ffff7f1d499 in set_vtable_access_fields (vtable=vtable@entry=0x7fffeb48ee80) at struct.c:143
> #2 0x00007ffff7f1dd8d in scm_i_struct_inherit_vtable_magic (vtable=vtable@entry=0x7ffff4e32fa0,
> obj=obj@entry=0x7fffeb48ee80) at struct.c:215
> #3 0x00007ffff7f1dfea in scm_c_make_structv (vtable=0x7ffff4e32fa0, n_tail=<optimized out>, n_init=8,
> init=0x7fffffff50d0) at struct.c:364
> #4 0x00007ffff7f1e0b9 in scm_make_struct_no_tail (vtable=0x7ffff4e32fa0, init=0x304) at struct.c:491
> --8<---------------cut here---------------end--------------->8---
>
> Bingo! There’s a mismatch in struct.c:
>
> --8<---------------cut here---------------start------------->8---
> bitmask_size = (nfields + 31U) / 32U;
> unboxed_fields = scm_gc_malloc_pointerless (bitmask_size, "unboxed fields");
> memset (unboxed_fields, 0, bitmask_size * sizeof(*unboxed_fields));
> --8<---------------cut here---------------end--------------->8---

Oh wow, scary! That was some nice debugging, these types of bugs can be
really hard to get to the bottom of.

Toggle quote (8 lines)
>
> Pushed a fix as 7c17655cd3d859bf0c5a86d9782a7788205fc05a.
>
> Thanks, rr! You made my day! :-)
>
> Now testing Guix builds on x86_64, i686, ARMv7, and AArch64 to see if
> that addresses seemingly related issues.

I've tested it on AArch64 and it's looking good, I'm running Guile 3
finally! I've tested by running 'guix pull --branch=wip-guile-3.0.1' on
a rockpro64 running the Guix system, I've then reconfigured and rebooted
and it's all good.

Thanks so much for the fix! Hopefully it'll work on every platform and
that can be the end of it :-).

Pierre
L
L
Ludovic Courtès wrote on 10 Mar 2020 18:23
control message for bug #39266
(address . control@debbugs.gnu.org)
87k13st98e.fsf@gnu.org
merge 39266 39988
quit
L
L
Ludovic Courtès wrote on 10 Mar 2020 18:25
Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
(name . Pierre Langlois)(address . pierre.langlois@gmx.com)(address . 39266-done@debbugs.gnu.org)
87d09kt93w.fsf@gnu.org
Hi Pierre,

Pierre Langlois <pierre.langlois@gmx.com> skribis:

Toggle quote (5 lines)
> I've tested it on AArch64 and it's looking good, I'm running Guile 3
> finally! I've tested by running 'guix pull --branch=wip-guile-3.0.1' on
> a rockpro64 running the Guix system, I've then reconfigured and rebooted
> and it's all good.

Thanks for testing!

Toggle quote (3 lines)
> Thanks so much for the fix! Hopefully it'll work on every platform and
> that can be the end of it :-).

Yup, I’ve tested ‘guix pull --branch=wip-guile-3.0.1’ and ‘guix build
guile3.0-guix’ on all 4 architectures that Guix supports, and everything
is fine.

I’ve now pushed the upgrade to 3.0.1 + patch to Guix.

Closing! \o/

The bug appears to be rare for Guile workloads not as intensive as a
Guix build (never reported, never seen), but we should still probably do
a bug-fix 3.0.2 release in the coming weeks, I guess.

Ludo’.
Closed
L
L
Ludovic Courtès wrote on 11 Mar 2020 21:21
control message for bug #39266
(address . control@debbugs.gnu.org)
87v9nafxr4.fsf@gnu.org
retitle 39266 Heap corruption leads to random crashes
quit
L
L
Ludovic Courtès wrote on 12 Mar 2020 16:59
(address . control@debbugs.gnu.org)
87eetxsgwj.fsf@gnu.org
merge 39266 36811
quit
L
L
Ludovic Courtès wrote on 12 Mar 2020 17:01
(address . control@debbugs.gnu.org)
878sk5sgt1.fsf@gnu.org
merge 39266 39208
quit
?
Your comment

This issue is archived.

To comment on this conversation send an email to 39266@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 39266
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch