Finalization thread hits wrong-type-arg on weak vector (AArch64)
(address . bug-Guile@gnu.org)
Hello!
While building the “guix-system.drv” derivation on AArch64, I got this
crash (not fully deterministic but quite frequent). Here the
finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
accessing a one-element weak vector):
Toggle snippet (92 lines)
$ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
[…]
loading 'gnu/services/cups.scm'...
Backtrace:
[Switching to Thread 0xffffbebec1d0 (LWP 22464)]
Thread 2 "guile" hit Breakpoint 3, scm_display_backtrace_with_highlights (
stack=stack@entry="#<struct stack>" = {...}, port=port@entry=#<port #<port-type file 4c3b40> 510040>,
first=first@entry=#f, depth=depth@entry=#f, highlights=highlights@entry=()) at backtrace.c:269
269 {
(gdb) bt
#0 scm_display_backtrace_with_highlights (stack=stack@entry="#<struct stack>" = {...},
port=port@entry=#<port #<port-type file 4c3b40> 510040>, first=first@entry=#f, depth=depth@entry=#f,
highlights=highlights@entry=()) at backtrace.c:269
#1 0x0000ffffbf5ef8c4 in print_exception_and_backtrace (
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60, tag=wrong-type-arg,
port=#<port #<port-type file 4c3b40> 510040>) at continuations.c:409
#2 pre_unwind_handler (error_port=0x510040, tag=wrong-type-arg,
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60) at continuations.c:453
#3 0x0000ffffbf672588 in catch_pre_unwind_handler (data=0xffffbebeb850,
exn=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70ced40) at throw.c:135
#4 0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#5 0x0000ffffbf67d10c in scm_call_n (proc=proc@entry=#<unmatched-tag 10045>, argv=<optimized out>, nargs=5)
at vm.c:1589
#6 0x0000ffffbf5f3c10 in scm_apply_0 (proc=#<unmatched-tag 10045>, args=()) at eval.c:603
#7 0x0000ffffbf5f4654 in scm_apply_1 (proc=<optimized out>, arg1=arg1@entry=wrong-type-arg,
args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at eval.c:609
#8 0x0000ffffbf6729e0 in scm_throw (key=key@entry=wrong-type-arg,
args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at throw.c:262
#9 0x0000ffffbf672b44 in scm_ithrow (key=key@entry=wrong-type-arg, args=<optimized out>, no_return=no_return@entry=1)
at throw.c:457
#10 0x0000ffffbf5f1dec in scm_error_scm (key=key@entry=wrong-type-arg, subr=subr@entry="weak-vector-ref",
message=<optimized out>,
args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0,
data=data@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:90
#11 0x0000ffffbf5f1ea0 in scm_error (key=key@entry=wrong-type-arg,
subr=subr@entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref",
message=message@entry=0xffffbf696e98 "Wrong type argument in position ~A (expecting ~A): ~S",
args=args@entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0,
rest=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:62
#12 0x0000ffffbf5f22cc in scm_wrong_type_arg_msg (
subr=subr@entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos@entry=1,
bad_value=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30ff880,
szMessage=szMessage@entry=0xffffbf6a5300 "weak vector") at error.c:282
#13 0x0000ffffbf680050 in scm_c_weak_vector_ref (wv=<optimized out>, k=k@entry=0) at weak-vector.c:193
#14 0x0000ffffbf67eff4 in scm_i_weak_car (
pair=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830) at weak-list.h:39
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
#16 vacuum_all_weak_tables () at weak-table.c:494
#17 0x0000ffffbf5fda44 in async_gc_finalizer (ptr=0x494ec0, data=0x0) at finalizers.c:316
#18 0x0000ffffbf549f74 in GC_invoke_finalizers ()
from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#19 0x0000ffffbf5fdf64 in scm_run_finalizers () at finalizers.c:398
#20 0x0000ffffbf5fdff4 in finalization_thread_proc (unused=<optimized out>) at finalizers.c:233
#21 0x0000ffffbf5ef6e0 in c_body (d=0xffffbebeb918) at continuations.c:430
#22 0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#23 0x0000ffffbf67d10c in scm_call_n (proc=#<unmatched-tag 10045>, argv=argv@entry=0xffffbebeb660,
nargs=nargs@entry=2) at vm.c:1589
#24 0x0000ffffbf5f3930 in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at eval.c:503
#25 0x0000ffffbf5f4f38 in scm_c_with_exception_handler (type=type@entry=#t, handler=0xffffbebeb670,
handler@entry=0xffffbf6724b0 <catch_post_unwind_handler>, handler_data=0x510040,
handler_data@entry=0xffffbebeb850, thunk=0x0, thunk@entry=0xffffbf6725f8 <catch_body>,
thunk_data=0x1dce42683dff4d67, thunk_data@entry=0xffffbebeb850) at exceptions.c:170
#26 0x0000ffffbf672850 in scm_c_catch (tag=tag@entry=#t, body=body@entry=0xffffbf5ef6c8 <c_body>,
body_data=body_data@entry=0xffffbebeb918, handler=handler@entry=0xffffbf5ef970 <c_handler>,
handler_data=handler_data@entry=0xffffbebeb918,
pre_unwind_handler=pre_unwind_handler@entry=0xffffbf5ef7b8 <pre_unwind_handler>,
pre_unwind_handler_data=pre_unwind_handler_data@entry=0x510040) at throw.c:168
#27 0x0000ffffbf5efbf4 in scm_i_with_continuation_barrier (body=body@entry=0xffffbf5ef6c8 <c_body>,
body_data=body_data@entry=0xffffbebeb918, handler=handler@entry=0xffffbf5ef970 <c_handler>,
handler_data=handler_data@entry=0xffffbebeb918,
pre_unwind_handler=pre_unwind_handler@entry=0xffffbf5ef7b8 <pre_unwind_handler>, pre_unwind_handler_data=0x510040)
at continuations.c:368
#28 0x0000ffffbf5efca0 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
at continuations.c:464
#29 0x0000ffffbf671148 in with_guile (base=0xffffbebeb988, data=0xffffbebeb9a8) at threads.c:645
#30 0x0000ffffbf551618 in GC_call_with_stack_base ()
from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#31 0x0000ffffbf6714a8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>, func=<optimized out>)
at threads.c:688
#32 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
#33 0x0000ffffbf50e7f4 in start_thread ()
from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libpthread.so.0
#34 0x0000ffffbf136edc in thread_start () from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libc.so.6
(gdb) info threads
Id Target Id Frame
1 Thread 0xffffbf6f4010 (LWP 22463) "guile" resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
* 2 Thread 0xffffbebec1d0 (LWP 22464) "guile" scm_display_backtrace_with_highlights (
stack=stack@entry="#<struct stack>" = {...}, port=port@entry=#<port #<port-type file 4c3b40> 510040>,
first=first@entry=#f, depth=depth@entry=#f, highlights=highlights@entry=()) at backtrace.c:269
The problem appears to be that the type tag of the weak-vector got
zeroed:
Toggle snippet (13 lines)
(gdb) frame 15
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
49 SCM car = scm_i_weak_car (in);
(gdb) p in
$48 = <error reading variable: ERROR: Cannot access memory at address 0x0>(SCM) <error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830
(gdb) p *(void**)in
$49 = (void *) 0x30ff880
(gdb) p ((void**)$49)[0]@2
$50 = {0x0, 0x30f8840}
(gdb) p (void**)in
$51 = (void **) 0x30f8830
There’s normally no disappearing link registered on the first element of
the weak vector (type tag + length) so I don’t know how this can happen.
Here’s the other thread (with surprisingly broken stack frames):
Toggle snippet (15 lines)
(gdb) thread 1
[Switching to thread 1 (Thread 0xffffbf6f4010 (LWP 22463))]
#0 resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
272 scm_t_weak_entry *next = entry->next;
(gdb) bt
#0 resize_table (table=table@entry=0x4a2cb0) at weak-table.c:272
#1 0x0000ffffbf67ef00 in vacuum_weak_table (table=0x4a2cb0) at weak-table.c:318
#2 0x0000ffffbf67f374 in scm_c_weak_table_ref (table=<optimized out>, raw_hash=2016212919028049524,
pred=0xffffbf67eb10 <assq_predicate>, closure=0x72b6a80, dflt=()) at weak-table.c:533
#3 0x0000ffffbf653824 in scm_source_properties (obj=<optimized out>) at srcprop.c:195
#4 0x0000ffffbeebcf14 in ?? ()
#5 0x0000ffffbf03c130 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Thoughts?
This code is the same as in 2.2.
Ludo’.