Cuirass & pointer finalization.

OpenSubmitted by Mathieu Othacehe.
Details
3 participants
  • Ludovic Courtès
  • Mathieu Othacehe
  • zimoun
Owner
unassigned
Severity
important
M
M
Mathieu Othacehe wrote on 26 Feb 15:14 +0100
(address . bug-guix@gnu.org)
8735xihq60.fsf@gnu.org
Hello,
I'm trying to fix a memory corruption in the remote-server process ofCuirass since a few days. Even though I don't have a usable core dumpfile yet, I'm pretty sure the error comes from the "zmq-msg-init"procedure of Guile-Simple-ZMQ.
This procedure creates a bytevector, call the C function zmq_msg_init toinitialize it, adds zmq_msg_close as pointer finalizer and returns awrapped pointer.
My understanding is that the wrapped pointer that is passed around inCuirass ensures that the underlying bytevector is not garbage collecteduntil the pointer goes out of scope. However, some assertions failuressuch as this one:
Toggle snippet (3 lines)Assertion failed: check () (src/msg.cpp:394)
let me think that the bytevector is garbage collected, while ZMQ isstill using it. Some help would be much appreciated here :).
Thanks,
Mathieu
Z
Z
zimoun wrote on 26 Feb 21:12 +0100
86im6e1tbr.fsf@gmail.com
Hi Mathieu,
I know nothing about the topic and I probably out-of-scope.
On Fri, 26 Feb 2021 at 15:14, Mathieu Othacehe <othacehe@gnu.org> wrote:
Toggle quote (21 lines)> I'm trying to fix a memory corruption in the remote-server process of> Cuirass since a few days. Even though I don't have a usable core dump> file yet, I'm pretty sure the error comes from the "zmq-msg-init"> procedure of Guile-Simple-ZMQ.>> This procedure creates a bytevector, call the C function zmq_msg_init to> initialize it, adds zmq_msg_close as pointer finalizer and returns a> wrapped pointer.>> My understanding is that the wrapped pointer that is passed around in> Cuirass ensures that the underlying bytevector is not garbage collected> until the pointer goes out of scope. However, some assertions failures> such as this one:>> --8<---------------cut here---------------start------------->8---> Assertion failed: check () (src/msg.cpp:394)> --8<---------------cut here---------------end--------------->8--->> let me think that the bytevector is garbage collected, while ZMQ is> still using it. Some help would be much appreciated here :).
From ’zmq-msg-init’ defined here:
https://github.com/jerry40/guile-simple-zmq/blob/master/simple-zmq.scm.in#L543
and why is ’zmq-message-content’ used for? Since ’message’ isinitialized with zero, I guess. Well, I am confused by:
Toggle snippet (13 lines) (let ((content-ptr (zmq_msg_data (message->pointer message)))[...] (pointer->bytevector content-ptr size))))

(let ((msg (pointer->message! msg-pointer))) (when content-bv (let ((target (zmq-message-content msg))) (bytevector-copy! content-bv 0 target 0 len))) msg))))
Is ’target’ at the same address than ’msg’? Maybe ’target’ createssomehow a dangling pointer.

Cheers,simon
M
M
Mathieu Othacehe wrote on 27 Feb 13:50 +0100
(address . 46796@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
87ft1hvfm4.fsf@gnu.org
Hey,
Here's a Valgrind backtrace:
Toggle snippet (17 lines)==97844== Thread 17:==97844== Invalid read of size 4==97844== at 0x114465B9: zmq::msg_t::close() (in /gnu/store/zd9lbfqa3170nsfd4177dnr38k1sjbnc-zeromq-4.3.4/lib/libzmq.so.5.2.4)==97844== by 0x3A58E98F: ??? ==97844== by 0x489AC78: chained_finalizer (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== by 0x49A16EE: GC_invoke_finalizers (in /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1.3.6)==97844== by 0x489AF08: scm_run_finalizers (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== by 0x489AF8C: finalization_thread_proc (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== by 0x488BB09: c_body (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== by 0x4913148: vm_regular_engine (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== by 0x49145B4: scm_call_n (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== by 0x4890BB9: scm_call_2 (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== by 0x48923B9: scm_c_with_exception_handler (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== by 0x4909C3C: scm_c_catch (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)==97844== Address 0x7373313569316263 is not stack'd, malloc'd or (recently) free'd
It looks like the finalizer is operating on a memory region that hasalready been free'd. The documentation associated with thefinalization functions in <gc.h> says:
Toggle snippet (7 lines) /* When obj is no longer accessible, invoke */ /* (*fn)(obj, cd). If a and b are inaccessible, and */ /* a points to b (after disappearing links have been */ /* made to disappear), then only a will be */

As far as I understand, OBJ is the wrapped pointer to the bytevectorcreated in "zmq-msg-init". There's a weak reference between the pointerand the bytevector that is introduced by "register_weak_reference" in"bytevector->pointer".
My interrogation is: do I have the guarantee that the pointer and itsreferences are still readable from within the finalizer? The abovesnippet says that FN is invoked when OBJ is unaccessible, but does thismean its content may have already been collected?
Cc'ing Ludo :)
Thanks,
Mathieu
M
M
Mathieu Othacehe wrote on 27 Feb 13:59 +0100
(name . zimoun)(address . zimon.toutoune@gmail.com)(address . 46796@debbugs.gnu.org)
87blc5vf7d.fsf@gnu.org
Hello zimoun,
Toggle quote (18 lines)> and why is ’zmq-message-content’ used for? Since ’message’ is> initialized with zero, I guess. Well, I am confused by:>> (let ((content-ptr (zmq_msg_data (message->pointer message)))> [...]> (pointer->bytevector content-ptr size))))>> …>> (let ((msg (pointer->message! msg-pointer)))> (when content-bv> (let ((target (zmq-message-content msg)))> (bytevector-copy! content-bv 0 target 0 len)))> msg))))>> Is ’target’ at the same address than ’msg’? Maybe ’target’ creates> somehow a dangling pointer.
No 'target' is not at the same address than 'msg', it's just a field of'msg' that is allocated internally when "zmq_msg_init_size" is called.
Allocating a message with "zmq_msg_init_size" and filling its content bymemcpy'ing data to the memory region pointed by "zmq_msg_data" is theexample given in "Man 3 zmq_msg_send", to I hope this is a validuse-case :).
Thanks,
Mathieu
L
L
Ludovic Courtès wrote on 1 Mar 10:37 +0100
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 46796@debbugs.gnu.org)
87k0qrusde.fsf@gnu.org
Hi!
Mathieu Othacehe <othacehe@gnu.org> skribis:
Toggle quote (6 lines)> Here's a Valgrind backtrace:>> ==97844== Thread 17:> ==97844== Invalid read of size 4> ==97844== at 0x114465B9: zmq::msg_t::close() (in /gnu/store/zd9lbfqa3170nsfd4177dnr38k1sjbnc-zeromq-4.3.4/lib/libzmq.so.5.2.4)
First, is this function idempotent? (Is it OK to close an msg_tmultiple times.)
Second, remember that finalizers can run in a separate thread. Thus,you must make sure there are no other threads, such as ZMQ’s internalthreads, still operating on the object when it is freed.
Toggle quote (29 lines)> ==97844== by 0x3A58E98F: ??? > ==97844== by 0x489AC78: chained_finalizer (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== by 0x49A16EE: GC_invoke_finalizers (in /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1.3.6)> ==97844== by 0x489AF08: scm_run_finalizers (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== by 0x489AF8C: finalization_thread_proc (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== by 0x488BB09: c_body (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== by 0x4913148: vm_regular_engine (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== by 0x49145B4: scm_call_n (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== by 0x4890BB9: scm_call_2 (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== by 0x48923B9: scm_c_with_exception_handler (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== by 0x4909C3C: scm_c_catch (in /gnu/store/q8brh7j5mwy0hbrly6hjb1m3wwndxqc8-guile-3.0.5/lib/libguile-3.0.so.1.3.0)> ==97844== Address 0x7373313569316263 is not stack'd, malloc'd or (recently) free'd>>> It looks like the finalizer is operating on a memory region that has> already been free'd. The documentation associated with the> finalization functions in <gc.h> says:>> /* When obj is no longer accessible, invoke */> /* (*fn)(obj, cd). If a and b are inaccessible, and */> /* a points to b (after disappearing links have been */> /* made to disappear), then only a will be */>>> As far as I understand, OBJ is the wrapped pointer to the bytevector> created in "zmq-msg-init". There's a weak reference between the pointer> and the bytevector that is introduced by "register_weak_reference" in> "bytevector->pointer".
There are (roughly) three objects here: the “msg”, the pointer object,and the bytevector that pointer refers to.
The bytevector may be freed when the pointer object becomes unreachable.
But you probably also need a weak link from the “msg” object to thepointer object to ensure that the pointer object outlives the msgobject.
You also need to check the zmq::msg_t::close memory semantics: does itfree the data associated with the message? If so, that’s redundant withthe pointer finalizer.
Toggle quote (5 lines)> My interrogation is: do I have the guarantee that the pointer and its> references are still readable from within the finalizer? The above> snippet says that FN is invoked when OBJ is unaccessible, but does this> mean its content may have already been collected?
Not sure, but most likely the problem is at a higher layer. :-)
If you can get a reduced test case to run under ‘rr’, that should allowyou to see where the message was first freed.
This is all pretty vague and general, but I HTH!
Ludo’.
L
L
Ludovic Courtès wrote on 1 Mar 15:14 +0100
control message for bug #46796
(address . control@debbugs.gnu.org)
87ft1frmel.fsf@gnu.org
severity 46796 importantquit
M
M
Mathieu Othacehe wrote on 2 Mar 09:08 +0100
Re: bug#46796: Cuirass & pointer finalization.
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 46796@debbugs.gnu.org)
87mtvmnfjb.fsf@gnu.org
Hey Ludo,
Many thanks for your help here, it feels great to have your support asalways.
Toggle quote (3 lines)> First, is this function idempotent? (Is it OK to close an msg_t> multiple times.)
The zmq_msg_close function is mostly responsible of freeing the memoryallocated by ZMQ to store the data associated with the message. I thinkit is safe to use it multiple times and from different threads.
It is not responsible of freeing the zmq_msg_t structure itself which isallocated by the user, usually on the stack in C programs.
I have written a small reproducer of the situation:
Toggle snippet (13 lines)(use-modules (system foreign) (rnrs bytevectors))
(define close (dynamic-func "test_close" (dynamic-link "/home/mathieu/tmp/libtest")))
(let loop () (let* ((bv (make-bytevector 64)) (ptr (bytevector->pointer bv))) (set-pointer-finalizer! ptr close) (loop)))
this program creates a bytevector of 64 bytes and attaches the Cfunction "test_close" as a pointer finalizer to the bytevector pointer.
This function looks like:
Toggle snippet (14 lines)inttest_close (void *x){ int i; char *v = x;
for (i = 0; i < 64; i++) { v[i] = '1'; }
return 0;}
It overrides the bytevector content, that should cause a segmentationerror if the bytevector is already freed.
And it does indeed, which makes me think that despite the weak referencebetween the pointer object and the bytevector, the bytevector is alreadyGC'd when the finalizer is called.
I'm now using guardians in Guile-Simple-ZMQ instead of pointerfinalizers, and do not experience crashes anymore, but I would reallylike to understand what's happening here.
Any clues :)?
Thanks,
Mathieu
L
L
Ludovic Courtès wrote on 2 Mar 14:50 +0100
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 46796@debbugs.gnu.org)
87blc1oeba.fsf@gnu.org
Hi!
Mathieu Othacehe <othacehe@gnu.org> skribis:
Toggle quote (40 lines)> I have written a small reproducer of the situation:>> (use-modules (system foreign)> (rnrs bytevectors))>> (define close> (dynamic-func "test_close" (dynamic-link "/home/mathieu/tmp/libtest")))>> (let loop ()> (let* ((bv (make-bytevector 64))> (ptr (bytevector->pointer bv)))> (set-pointer-finalizer! ptr close)> (loop)))>>> this program creates a bytevector of 64 bytes and attaches the C> function "test_close" as a pointer finalizer to the bytevector pointer.>> This function looks like:>> int> test_close (void *x)> {> int i;> char *v = x;>> for (i = 0; i < 64; i++) {> v[i] = '1';> }>> return 0;> }>> It overrides the bytevector content, that should cause a segmentation> error if the bytevector is already freed.>> And it does indeed, which makes me think that despite the weak reference> between the pointer object and the bytevector, the bytevector is already> GC'd when the finalizer is called.
Hmm I think the bytevector and the pointer object can be finalized inthe same GC cycle; when that happens, you have no guarantee as to theorder in which they are finalized.
IOW, I think you cannot assume, from the pointer finalizer, that thebytevector is still reachable.
But… is it really similar to your ZMQ issue? There you had messageobject wrappers (as per ‘define-wrapped-pointer-type’) and a pointerobject to the underlying C object, right?
Toggle quote (4 lines)> I'm now using guardians in Guile-Simple-ZMQ instead of pointer> finalizers, and do not experience crashes anymore, but I would really> like to understand what's happening here.
Guardians are finalizers; it’s just a different interface.
Ludo’.
M
M
Mathieu Othacehe wrote on 2 Mar 18:02 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 46796@debbugs.gnu.org)
87im69bia7.fsf@gnu.org
Hey,
Toggle quote (4 lines)> Hmm I think the bytevector and the pointer object can be finalized in> the same GC cycle; when that happens, you have no guarantee as to the> order in which they are finalized.
That would explain the crashes indeed.
Toggle quote (4 lines)> But… is it really similar to your ZMQ issue? There you had message> object wrappers (as per ‘define-wrapped-pointer-type’) and a pointer> object to the underlying C object, right?
I think the only difference is that the reproducer doesn't introduce thewrapped pointer object. Using ZMQ, the message creation looks like:
zmq-msg-init Bytevector creation with make-bytevector at address P Bytevector initialization with zmq_msg_init(P) Install zmq_msg_close as finalizer on P Message wrapping using (pointer->message P) Return the wrapped message
The user can then operate on the wrapped message by passing it to othermessage API procedures such as zmq-message-size. Those procedures willcall ZMQ using the underlying pointer.
The bytevector/pointer object undetermined GC order is reallyproblematic then. I'm not sure why I'm not experiencing this crash usingGuardians since they are also using finalizers.
The ultimate work around would be to leave the message closingresponsibility to the user but that would be sad. Do you know if there'sanother to prevent the bytevector from being collected before thepointer object?
Thanks,
Mathieu
L
L
Ludovic Courtès wrote on 8 Mar 14:45 +0100
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 46796@debbugs.gnu.org)
87v9a1ep3p.fsf@gnu.org
Hi,
Mathieu Othacehe <othacehe@gnu.org> skribis:
Toggle quote (20 lines)>> Hmm I think the bytevector and the pointer object can be finalized in>> the same GC cycle; when that happens, you have no guarantee as to the>> order in which they are finalized.>> That would explain the crashes indeed.>>> But… is it really similar to your ZMQ issue? There you had message>> object wrappers (as per ‘define-wrapped-pointer-type’) and a pointer>> object to the underlying C object, right?>> I think the only difference is that the reproducer doesn't introduce the> wrapped pointer object. Using ZMQ, the message creation looks like:>> zmq-msg-init> Bytevector creation with make-bytevector at address P> Bytevector initialization with zmq_msg_init(P)> Install zmq_msg_close as finalizer on P> Message wrapping using (pointer->message P)> Return the wrapped message
Shouldn’t the finalizer be on <message>, then?
Toggle quote (8 lines)> The user can then operate on the wrapped message by passing it to other> message API procedures such as zmq-message-size. Those procedures will> call ZMQ using the underlying pointer.>> The bytevector/pointer object undetermined GC order is really> problematic then. I'm not sure why I'm not experiencing this crash using> Guardians since they are also using finalizers.
Guardians “revive” objects: when you call the guardian, it returns theobject that _would have_ been GC’d. IOW, guardians delay “actual”finalization. That may be the explanation.
Toggle quote (3 lines)> The ultimate work around would be to leave the message closing> responsibility to the user but that would be sad.
Yeah, don’t do that. :-)
Toggle quote (3 lines)> Do you know if there's another to prevent the bytevector from being> collected before the pointer object?
I’d really need to dive into the code but I’m confident there’s nothingspecial about this scenario; we’re probably just overlooking somepointer ownership rule.
I see something risky: AIUI, ‘zmq-message-content’ returns a bytevectorthat aliases a message’s buffer. The problem is that the bytevector maystill be used from Scheme after the message is destroyed, and then badthings can happen.
Also, regarding the message API, my goal back then (but I never gotaround to it) was to not expose the msg API as such, and instead tohave ‘zmq-send’, ‘zmq-receive’ etc. transparently create msg_t objects.That simplifies things for users and perhaps also for theimplementation.
HTH,Ludo’.
?