From: Ludovic Courtès
To: Artyom Poptsov
Cc: Maxim Cournoyer, 42740@debbugs.gnu.org
Subject: Re: bug#42740: Segfault in libssh during ‘guix copy’
Date: Sat, 29 Aug 2020 15:31:30 +0200
Message-ID: <874kollgst.fsf@gnu.org>
References: <871rkin6zi.fsf@inria.fr>
In-Reply-To: (Artyom Poptsov's message of "Sun, 9 Aug 2020 11:48:29 +0300")

Hi Artyom!

Artyom Poptsov writes:

> please check if this branch will work without segfaults in Guix:
>   https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-segfaults-on-gc
>
> Key changes:
>
> - Channels are now protecting the parent session from GC'ing.
>
> - Every channel procedure now ensures that the parent session is
>   connected before calling any libssh procedures upon a channel
>   instance.  The idea is that a channel cannot be created when a session
>   is disconnected and when channel is present and the session is closed,
>   it means that the session is disconnected and freed.

Looks like the problem is still there, after all:

--8<---------------cut here---------------start------------->8---
$ guix describe
Generacio 154  Aug 29 2020 14:49:14  (nuna)
  guix 0ec6b8a
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 0ec6b8afd7e7a6c288fbf48c5779f2e0bdaffb55
$ guix copy --to=olimex coreutils-minimal
sending 3 store items (86 MiB) to 'A20-OLinuXino.local'...
Adres-eraro(nekropsio elŝutita)
$ gdb $(type -P guile) core
[...]
Core was generated by `/gnu/store/0w76khfspfy8qmcpjya41chj3bgfcy0k-guile-3.0.4/bin/guile --no-auto-com'.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Unexpected size of section `.reg-xstate/25533' in core file.
#0  0x00007f1ba90e4185 in deflate_fast ()
   from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
[Current thread is 1 (Thread 0x7f1baefb9b80 (LWP 25533))]
(gdb) bt
#0  0x00007f1ba90e4185 in deflate_fast ()
   from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#1  0x00007f1ba90e653d in deflate () from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#2  0x00007f1ba89b1b4a in gzip_compress (session=session@entry=0x12a4b20, source=source@entry=0x12a5580,
    level=) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:91
#3  0x00007f1ba89b1e83 in compress_buffer (session=session@entry=0x12a4b20, buf=0x12a5580)
    at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
#4  0x00007f1ba898eb5f in packet_send2 (session=session@entry=0x12a4b20)
    at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1632
#5  0x00007f1ba898ec32 in ssh_packet_send (session=session@entry=0x12a4b20)
    at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1810
#6  0x00007f1ba8978639 in channel_write_common (channel=0x12b0e90, data=0x7f1b9dba7020, len=65536, is_stderr=0)
    at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1488
#7  0x00007f1ba89fce7a in write_to_channel_port ()
   from /gnu/store/hw2wb78q8zxza1p1kdi8bffdbi1hb19n-guile-ssh-0.13.1/lib/libguile-ssh.so.13
#8  0x00007f1baf67eedc in scm_i_write_bytes (port=# 7f1ba7f25300>,
    src="#" = {...}, start=0, count=65536) at ports.c:2865
#9  0x00007f1baf68686f in scm_put_bytevector (port=# 7f1ba7f25300>,
    bv="#" = {...}, start=, count=) at r6rs-ports.c:676
[...]
(gdb) info threads
  Id   Target Id                         Frame
* 1    Thread 0x7f1baefb9b80 (LWP 25533) 0x00007f1ba90e4185 in deflate_fast ()
   from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
  2    Thread 0x7f1baec93700 (LWP 25534) warning: Unexpected size of section `.reg-xstate/25534' in core file.
0x00007f1baf56094c in futex_wait_cancelable (private=,
    expected=0, futex_word=0x7f1baf5b86e8 ) at ../sysdeps/nptl/futex-internal.h:183
  3    Thread 0x7f1bac9d0700 (LWP 25537) warning: Unexpected size of section `.reg-xstate/25537' in core file.
0x00007f1ba90e479f in deflate_fast ()
   from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
  4    Thread 0x7f1bae302700 (LWP 25535) warning: Unexpected size of section `.reg-xstate/25535' in core file.
0x00007f1baf56094c in futex_wait_cancelable (private=,
    expected=0, futex_word=0x7f1baf5b86e8 ) at ../sysdeps/nptl/futex-internal.h:183
  5    Thread 0x7f1baa6f9700 (LWP 25538) warning: Unexpected size of section `.reg-xstate/25538' in core file.
0x00007f1baf5640a4 in __libc_read (fd=10, buf=buf@entry=0x7f1baa6f8660,
    nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:26
  6    Thread 0x7f1bad971700 (LWP 25536) warning: Unexpected size of section `.reg-xstate/25536' in core file.
0x00007f1baf56094c in futex_wait_cancelable (private=,
    expected=0, futex_word=0x7f1baf5b86e8 ) at ../sysdeps/nptl/futex-internal.h:183
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f1bac9d0700 (LWP 25537))]
#0  0x00007f1ba90e479f in deflate_fast ()
   from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
(gdb) bt
#0  0x00007f1ba90e479f in deflate_fast ()
   from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#1  0x00007f1ba90e653d in deflate () from /gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1
#2  0x00007f1ba89b1b4a in gzip_compress (session=session@entry=0x12a4b20, source=source@entry=0x12a5580,
    level=) at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:91
#3  0x00007f1ba89b1e83 in compress_buffer (session=session@entry=0x12a4b20, buf=0x12a5580)
    at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/gzip.c:112
#4  0x00007f1ba898eb5f in packet_send2 (session=session@entry=0x12a4b20) at
    /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1632
#5  0x00007f1ba898ec32 in ssh_packet_send (session=session@entry=0x12a4b20)
    at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/packet.c:1810
#6  0x00007f1ba897a178 in ssh_channel_send_eof (channel=channel@entry=0x12b0930)
    at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1250
#7  0x00007f1ba897a23b in ssh_channel_close (channel=0x12b0930)
    at /tmp/guix-build-libssh-0.9.4.drv-0/source/src/channels.c:1301
#8  0x00007f1ba89fcc36 in ptob_close ()
   from /gnu/store/hw2wb78q8zxza1p1kdi8bffdbi1hb19n-guile-ssh-0.13.1/lib/libguile-ssh.so.13
#9  0x00007f1baf67c153 in release_port (port=# 7f1ba8e73400>) at ports.c:165
#10 0x00007f1baf67f19b in close_port (port=# 7f1ba8e73400>,
    explicit=) at ports.c:893
#11 0x00007f1baf63632a in scm_c_with_exception_handler (type=type@entry=#t,
    handler=handler@entry=0x7f1baf6ad7e0 ,
    handler_data=handler_data@entry=0x7f1bac9cf970, thunk=thunk@entry=0x7f1baf6ad920 ,
    thunk_data=thunk_data@entry=0x7f1bac9cf970) at exceptions.c:170
#12 0x00007f1baf6adb1d in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f1baf67f200 ,
    body_data=, handler=, handler_data=handler_data@entry=0x0,
    pre_unwind_handler=pre_unwind_handler@entry=0x0, pre_unwind_handler_data=0x0) at throw.c:168
#13 0x00007f1baf6adb3e in scm_internal_catch (tag=tag@entry=#t, body=body@entry=0x7f1baf67f200 ,
    body_data=, handler=, handler_data=handler_data@entry=0x0) at throw.c:177
#14 0x00007f1baf67ad84 in finalize_port (ptr=, data=) at ports.c:710
#15 0x00007f1baf58a6ef in GC_invoke_finalizers ()
   from /gnu/store/iycnpxxrg8m9wf9w58d6zvp9sdby6m9d-libgc-7.6.12/lib/libgc.so.1
#16 0x00007f1baf63ee79 in scm_run_finalizers () at finalizers.c:399
#17 0x00007f1baf63eefd in finalization_thread_proc (unused=) at finalizers.c:234
--8<---------------cut here---------------end--------------->8---

So we have the finalization
thread closing a channel of session 0x12a4b20 (which causes a write on
the channel), and the main thread writing to a channel of that same
session.  This is exactly what I described at :

  AIUI, that means there’s one output compression buffer per session,
  and it’s not thread-safe (in Guile 2.2 finalizers are called from a
  separate thread.)

I think the fix, in Guile-SSH, is to associate each libssh object
(session, channel, etc.) with a mutex, and to protect all uses of the
libssh object by that mutex.

Artyom, WDYT?  Do you think you could look into that?

In the meantime, I’ll look for the origin of the channel port that’s not
explicitly closed and see if we can work around it.

Ludo’.
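
The locking idea above could be sketched in C roughly as follows.  This
is only an illustration under stated assumptions, not Guile-SSH or
libssh code: `demo_session', `demo_channel' and every `demo_*' function
are hypothetical stand-ins.  Since the state shared in the two
backtraces (the output compression buffer) belongs to the session, the
sketch makes every channel operation take the parent session's mutex;
whether channels would also need a mutex of their own is left open.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a libssh session: one output buffer shared
   by all of its channels, like the per-session compression state.  */
struct demo_session {
  pthread_mutex_t lock;        /* guards every field below */
  char out_buf[64];
  size_t out_len;
  unsigned long packets_sent;
};

/* Hypothetical stand-in for a channel; it only refers to its parent.  */
struct demo_channel {
  struct demo_session *session;
};

static void demo_session_init(struct demo_session *s) {
  pthread_mutex_init(&s->lock, NULL);
  s->out_len = 0;
  s->packets_sent = 0;
}

/* Touches the shared buffer; the caller must hold s->lock.  */
static void demo_send_locked(struct demo_session *s, const char *data, size_t len) {
  if (len > sizeof s->out_buf)
    len = sizeof s->out_buf;
  memcpy(s->out_buf, data, len);
  s->out_len = len;
  s->packets_sent++;
}

/* What the main thread does: write to a channel.  */
void demo_channel_write(struct demo_channel *c, const char *data, size_t len) {
  pthread_mutex_lock(&c->session->lock);
  demo_send_locked(c->session, data, len);
  pthread_mutex_unlock(&c->session->lock);
}

/* What a finalizer does: closing sends EOF through the same session.  */
void demo_channel_close(struct demo_channel *c) {
  pthread_mutex_lock(&c->session->lock);
  demo_send_locked(c->session, "EOF", 3);
  pthread_mutex_unlock(&c->session->lock);
}

static void *demo_finalizer_thread(void *arg) {
  struct demo_channel *c = arg;
  for (int i = 0; i < 1000; i++)
    demo_channel_close(c);
  return NULL;
}

/* Two channels of one session, used from two threads at once.
   Returns the number of packets sent through the session.  */
unsigned long demo_run(void) {
  struct demo_session s;
  demo_session_init(&s);
  struct demo_channel a = { &s }, b = { &s };

  pthread_t t;
  int rc = pthread_create(&t, NULL, demo_finalizer_thread, &b);
  assert(rc == 0);
  for (int i = 0; i < 1000; i++)
    demo_channel_write(&a, "data", 4);
  pthread_join(t, NULL);

  pthread_mutex_destroy(&s.lock);
  return s.packets_sent;
}
```

With the lock held around each send, the two threads never touch the
shared buffer at the same time, which is the property missing in the
backtraces above.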