3.0.0 JIT segfaults on 64-bit Cygwin

  • Done
Details
7 participants
  • Charles Stanhope
  • John Cowan
  • dsmich
  • Ludovic Courtès
  • Mike Gran
  • szgyg
  • Andy Wingo
Owner
unassigned
Submitted by
John Cowan
Severity
normal
John Cowan wrote on 13 Jan 2020 18:26
Re: GNU Guile 2.9.9 Released [beta]
(name . Andy Wingo)(address . wingo@pobox.com)
CAD2gp_TRK0s7WthpQsh2SnKStUFOFrJZ5wc4TD_mFn1OUotWJg@mail.gmail.com
Guile 2.9.9, like .8 and .7, does not build on Cygwin (64 bit). Configure
runs without error, but make crashes with this (truncated to just the tail):

Making all in bootstrap
make[2]: Entering directory
'/cygdrive/c/Users/rr828893/Downloads/guile-2.9.9/bootstrap'
BOOTSTRAP GUILEC ice-9/eval.go
BOOTSTRAP GUILEC ice-9/psyntax-pp.go
BOOTSTRAP GUILEC language/cps/intmap.go
BOOTSTRAP GUILEC language/cps/intset.go
BOOTSTRAP GUILEC language/cps/graphs.go
BOOTSTRAP GUILEC ice-9/vlist.go
BOOTSTRAP GUILEC srfi/srfi-1.go
/bin/sh: line 6: 4294 Segmentation fault (core dumped)
GUILE_AUTO_COMPILE=0 ../meta/build-env guild compile
--target="x86_64-unknown-cygwin" -O1 -Oresolve-primitives -L
"/home/rr828893/Downloads/guile-2.9.9/module" -L
"/home/rr828893/Downloads/guile-2.9.9/guile-readline" -o "srfi/srfi-1.go"
"../module/srfi/srfi-1.scm"
make[2]: *** [Makefile:1930: srfi/srfi-1.go] Error 139
make[2]: Leaving directory
'/cygdrive/c/Users/rr828893/Downloads/guile-2.9.9/bootstrap'
make[1]: *** [Makefile:1849: all-recursive] Error 1
make[1]: Leaving directory
'/cygdrive/c/Users/rr828893/Downloads/guile-2.9.9'
make: *** [Makefile:1735: all] Error 2

All previous problems (which were easy to work around) have gone away in
this release, which is progress, but it doesn't get me past Guile 2.2.



John Cowan http://vrici.lojban.org/~cowan cowan@ccil.org
Your worships will perhaps be thinking that it is an easy thing
to blow up a dog? [Or] to write a book?
--Don Quixote, Introduction
John Cowan wrote on 14 Jan 2020 00:09
Re: bug#39118: GNU Guile 2.9.9 Released [beta]
(name . Andy Wingo)(address . wingo@pobox.com)
CAD2gp_SXA1U-FyY3stDKvrHRq=nSHVsqOrAT24KLhnecNgMWrw@mail.gmail.com
Okay, with GUILE_JIT_THRESHOLD set to -1 in the environment, I can build
Guile under Cygwin. There are two test failures which probably reflect
differences between newlib and glibc:

ERROR: time.test: strptime: GNU %s format: strftime fr_FR.utf8 - arguments:
((system-error "strptime" "~A" ("Invalid argument") (22)))
ERROR: time.test: strptime: GNU %s format: strftime fr_FR.iso88591 -
arguments: ((system-error "strptime" "~A" ("Invalid argument") (22)))

And that's that: Cygwin can support Guile 3.0 without JIT. It might be a
good idea to force this variable on in "configure" when building under
Cygwin.
Ludovic Courtès wrote on 20 Jan 2020 10:21
control message for bug #39118
(address . control@debbugs.gnu.org)
878sm2iiun.fsf@gnu.org
retitle 39118 3.0.0 JIT segfaults on 64-bit Cygwin
quit
Ludovic Courtès wrote on 20 Jan 2020 17:35
Segfault while building on 64-bit Cygwin
(name . John Cowan)(address . cowan@ccil.org)
875zh6gk72.fsf_-_@gnu.org
Hi John,

John Cowan <cowan@ccil.org> skribis:

> Guile 2.9.9, like .8 and .7, does not build on Cygwin (64 bit). Configure
> runs without error, but make crashes with this (truncated to just the tail):
>
> Making all in bootstrap
> make[2]: Entering directory
> '/cygdrive/c/Users/rr828893/Downloads/guile-2.9.9/bootstrap'
> BOOTSTRAP GUILEC ice-9/eval.go
> BOOTSTRAP GUILEC ice-9/psyntax-pp.go
> BOOTSTRAP GUILEC language/cps/intmap.go
> BOOTSTRAP GUILEC language/cps/intset.go
> BOOTSTRAP GUILEC language/cps/graphs.go
> BOOTSTRAP GUILEC ice-9/vlist.go
> BOOTSTRAP GUILEC srfi/srfi-1.go
> /bin/sh: line 6: 4294 Segmentation fault (core dumped)
> GUILE_AUTO_COMPILE=0 ../meta/build-env guild compile
> --target="x86_64-unknown-cygwin" -O1 -Oresolve-primitives -L
> "/home/rr828893/Downloads/guile-2.9.9/module" -L
> "/home/rr828893/Downloads/guile-2.9.9/guile-readline" -o "srfi/srfi-1.go"
> "../module/srfi/srfi-1.scm"
> make[2]: *** [Makefile:1930: srfi/srfi-1.go] Error 139
> make[2]: Leaving directory
> '/cygdrive/c/Users/rr828893/Downloads/guile-2.9.9/bootstrap'
> make[1]: *** [Makefile:1849: all-recursive] Error 1
> make[1]: Leaving directory
> '/cygdrive/c/Users/rr828893/Downloads/guile-2.9.9'
> make: *** [Makefile:1735: all] Error 2

Could you try building 3.0.0 with JIT enabled and grab a backtrace?

Thanks in advance!

Ludo’.
John Cowan wrote on 20 Jan 2020 17:38
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAD2gp_ReS1DCZmsakcNK-FegZB_fhzOknDf_d0QdyDJVJS6X_A@mail.gmail.com
Yes, gladly, but I don't know how to get one in this context. Do I need to
add some flags to the Makefile, and if so, where? (It's a twisty maze of
passages, all different.) Note that this *is* a build with JIT enabled;
when I disable it using the env variable, there are no errors and 3.0.0
works fine.

Also, it may take some time, as I have to rebuild my Windows system.
Mike Gran wrote on 20 Jan 2020 18:22
(name . John Cowan)(address . cowan@ccil.org)
20200120172253.GA1112065@spikycactus.com
On Mon, Jan 20, 2020 at 11:38:35AM -0500, John Cowan wrote:
> Yes, gladly, but I don't know how to get one in this context. Do I need to
> add some flags to the Makefile, and if so, where? (It's a twisty maze of
> passages, all different.) Note that this *is* a build with JIT enabled;
> when I disable it using the env variable, there are no errors and 3.0.0
> works fine.
>
> Also, it may take some time, as I have to rebuild my Windows system.

I also tried building Guile 3.0.0 on Cygwin 3.1.x. The failure comes from
trying to parse compiled .go files.

The last time that I had this sort of problem, it was because the
O_BINARY flag was dropped or missing when writing .go files, leading
to CR+LF characters in the compiled files. And I diagnosed it by
byte-comparing Linux-compiled .go files with Cygwin-compiled .go
files, and by looking for CR+LF combinations in the compiled .go
files.

I don't know if that is what is happening here, but, I'll check that
next time I have a chance.

Thanks,
Michael
Ludovic Courtès wrote on 21 Jan 2020 10:01
(name . John Cowan)(address . cowan@ccil.org)
87sgk9faih.fsf@gnu.org
Hello,

John Cowan <cowan@ccil.org> skribis:

> Yes, gladly, but I don't know how to get one in this context.

You would unpack, configure, and build like you did before (with JIT
enabled, so as to reproduce the crash), but before that you’d run
“ulimit -c unlimited” in that shell to make sure there’s a core dumped
when it crashes.

Once it has crashed, locate the ‘core’ file (or ‘core.*’), and run, say:

gdb libguile/.libs/guile bootstrap/core

Then from the GDB prompt:

thread apply all bt

TIA,
Ludo’.
szgyg wrote on 21 Jan 2020
Re: bug#39118: Segfault while building on 64-bit Cygwin
20200121184011.GA1659@dtk
On Tue, Jan 21, 2020 at 10:01:58AM +0100, Ludovic Courtès wrote:
> but before that you’d run
> “ulimit -c unlimited” in that shell to make sure there’s a core dumped
> when it crashes.

This won't work on cygwin. If you want a core dump, you should use the
dumper tool, as described here:
https://cygwin.com/cygwin-ug-net/dumper.html
Or you can set error_start to gdb to get an interactive gdb session on error.

s
John Cowan wrote on 21 Jan 2020 22:37
Re: Segfault while building on 64-bit Cygwin
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAD2gp_Ts8VfLUaQ+kC=g+f_5mv0jzLZpN_-U9dvi6Y4jy0-cLw@mail.gmail.com
Thanks. Unfortunately, the standard recipe for making core dumps on Mac
(put "limit core unlimited" into /etc/launchd.conf and reboot, make sure
/cores is writable, set ulimit -c unlimited) does not seem to actually enable them
on MacOS Catalina (10.15.2). I have tested with SIGQUIT and SIGSEGV on
running processes and no dumps appear in /cores.

On Tue, Jan 21, 2020 at 4:02 AM Ludovic Courtès <ludo@gnu.org> wrote:

> Hello,
>
> John Cowan <cowan@ccil.org> skribis:
>
> > Yes, gladly, but I don't know how to get one in this context.
>
> You would unpack, configure, and build like you did before (with JIT
> enabled, so as to reproduce the crash), but before that you’d run
> “ulimit -c unlimited” in that shell to make sure there’s a core dumped
> when it crashes.
>
> Once it has crashed, locate the ‘core’ file (or ‘core.*’), and run, say:
>
> gdb libguile/.libs/guile bootstrap/core
>
> Then from the GDB prompt:
>
> thread apply all bt
>
> TIA,
> Ludo’.
>
John Cowan wrote on 21 Jan 2020 22:53
Re: bug#39118: Segfault while building on 64-bit Cygwin
(name . szgyg)(address . szgyg@ludens.elte.hu)
CAD2gp_RYuBNyrvk2SsWg3egtaw6o+MYh1PAJx2Ki60yxURSUJw@mail.gmail.com
I'm no longer talking about Cygwin (which builds fine without JIT). I'm
now talking about MacOS Catalina, which needs a core dump to debug, but on
which nobody seems to know how to enable core dumps.

On Tue, Jan 21, 2020 at 1:41 PM szgyg <szgyg@ludens.elte.hu> wrote:

> On Tue, Jan 21, 2020 at 10:01:58AM +0100, Ludovic Courtès wrote:
> > but before that you’d run
> > “ulimit -c unlimited” in that shell to make sure there’s a core dumped
> > when it crashes.
>
> This won't work on cygwin. If you want a core dump, you should use the
> dumper tool, as described here
> https://cygwin.com/cygwin-ug-net/dumper.html
> Or you can set error_start to gdb to get an interactive gdb session on
> error.
>
> s
>
>
>
>
Ludovic Courtès wrote on 23 Jan 2020 21:35
Re: Segfault while building on 64-bit Cygwin
(name . John Cowan)(address . cowan@ccil.org)
871rrpoqql.fsf@gnu.org
Hi,

John Cowan <cowan@ccil.org> skribis:

> Thanks. Unfortunately, the standard recipe for making core dumps on Mac

This bug report is about Cygwin, not macOS, right? :-)

Ludo’.
John Cowan wrote on 24 Jan 2020 15:36
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAD2gp_RkOYj6E6b9PjHerctAJN6NPYznQ4qi8NSXL0edKEM9dw@mail.gmail.com
Both Cygwin and MacOS crash in pretty much the same way. By disabling the
JIT, I was able to get the Cygwin build to run to completion. On MacOS
with --disable-jit, however, I am now getting an entirely new failure:

CC readline.lo
readline.c:432:7: warning: implicitly declaring library function 'strncmp'
with type 'int (const char *, const char *,
unsigned long)' [-Wimplicit-function-declaration]
if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
^
readline.c:432:7: note: include the header <string.h> or explicitly provide
a declaration for 'strncmp'
readline.c:432:16: warning: implicit declaration of function
'rl_get_keymap_name' is invalid in C99
[-Wimplicit-function-declaration]
if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
^
readline.c:432:16: warning: incompatible integer to pointer conversion
passing 'int' to parameter of type 'const char *'
[-Wint-conversion]
if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
CCLD guile-readline.la
Undefined symbols for architecture x86_64:
"_rl_get_keymap_name", referenced from:
_scm_init_readline in readline.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see
invocation)

On Thu, Jan 23, 2020 at 3:35 PM Ludovic Courtès <ludo@gnu.org> wrote:

> Hi,
>
> John Cowan <cowan@ccil.org> skribis:
>
> > Thanks. Unfortunately, the standard recipe for making core dumps on Mac
>
> This bug report is about Cygwin, not macOS, right? :-)
>
> Ludo’.
>
dsmich wrote on 24 Jan 2020 16:26
RE: bug#39118: Segfault while building on 64-bit Cygwin
(name . 'John Cowan')(address . cowan@ccil.org)
99218b8d8f572c8748963924e82d265652487a09@webmail
Pretty sure that the missing readline symbol is because the macos
readline is being used/found instead of GNU readline.

-Dale

-----------------------------------------
From: "John Cowan"
To: "Ludovic Courtès"
Cc: 39118@debbugs.gnu.org, guile-devel@gnu.org
Sent: Friday January 24 2020 9:36:59AM
Subject: bug#39118: Segfault while building on 64-bit Cygwin

Both Cygwin and MacOS crash in pretty much the same way. By disabling
the JIT, I was able to get the Cygwin build to run to completion. On
MacOS with --disable-jit, however, I am now getting an entirely new
failure:
CC readline.lo
readline.c:432:7: warning: implicitly declaring library function
'strncmp' with type 'int (const char *, const char *,
unsigned long)' [-Wimplicit-function-declaration]
if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
^
readline.c:432:7: note: include the header <string.h> or explicitly provide a
declaration for 'strncmp'
readline.c:432:16: warning: implicit declaration of function
'rl_get_keymap_name' is invalid in C99
[-Wimplicit-function-declaration]
if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
^
readline.c:432:16: warning: incompatible integer to pointer conversion
passing 'int' to parameter of type 'const char *'
[-Wint-conversion]
if (strncmp (rl_get_keymap_name (rl_get_keymap ()), "vi", 2))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
CCLD guile-readline.la
Undefined symbols for architecture x86_64:
"_rl_get_keymap_name", referenced from:
_scm_init_readline in readline.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see
invocation)

On Thu, Jan 23, 2020 at 3:35 PM Ludovic Courtès wrote:
Hi,

John Cowan skribis:

> Thanks. Unfortunately, the standard recipe for making core dumps on
Mac

This bug report is about Cygwin, not macOS, right? :-)

Ludo’.

Ludovic Courtès wrote on 25 Jan 2020 14:51
Re: Segfault while building on 64-bit Cygwin
(name . John Cowan)(address . cowan@ccil.org)
87wo9fbq60.fsf@gnu.org
John Cowan <cowan@ccil.org> skribis:

> Both Cygwin and MacOS crash in pretty much the same way. By disabling the
> JIT, I was able to get the Cygwin build to run to completion.

That I understand. However, I was asking for the backtrace of the crash
on Cygwin when JIT is enabled. Could you grab it?

Thanks in advance,
Ludo’.
John Cowan wrote on 25 Jan 2020 16:54
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAD2gp_Q3Ua+kRW5OV1jXM1D-H7UhKQp-TSd0RQjue7U=1ua62Q@mail.gmail.com
On Sat, Jan 25, 2020 at 8:51 AM Ludovic Courtès <ludo@gnu.org> wrote:


> That I understand. However, I was asking for the backtrace of the crash
> on Cygwin when JIT is enabled. Could you grab it?
>

1. The wisdom of the Internet has not been able to figure out how to
generate a core dump on MacOS 10.15.2 (Catalina). The usual set of
enabling steps can be performed without error, but still no core dump.

2. Until today I believed that there was no way to generate a Cygwin core
dump. I know now that there is, but I may not be able to test it until
Monday. I'll let you know, and hopefully that will provide insight into
the MacOS problem as well.

3. I will try to work further on the MacOS libffi problem (which surfaces
when you do --disable-jit to bypass the above problem) to convince MacOS to
use GNU libffi rather than the native one. It probably has to do with
pkg-config, which I barely understand.

"All problems are config problems."



John Cowan http://vrici.lojban.org/~cowan cowan@ccil.org
We are lost, lost. No name, no business, no Precious, nothing. Only empty.
Only hungry: yes, we are hungry. A few little fishes, nassty bony little
fishes, for a poor creature, and they say death. So wise they are; so just,
so very just. --Gollum
John Cowan wrote on 31 Jan 2020 15:23
(name . Ludovic Courtès)(address . ludo@gnu.org)
CAD2gp_SnY-vP03psw4wNDyH-CrbpSZy7ttb-Z0koYOmu5r2VJg@mail.gmail.com
Aaaand... Cygwin doesn't do core dumps. Under the skin it's Windows, after
all. This is what I get when I specify ulimit -c unlimited and rebuild:

Exception: STATUS_ACCESS_VIOLATION at rip=0055A8B1B25
rax=0000000000000000 rbx=FFFFFFFFFFFFFF90 rcx=FFFFFFFFFFFFFF90
rdx=000000000034964A rsi=000007000084ECC0 rdi=FFFFFFFFFFFFFF90
r8 =000007000084ECC0 r9 =0000000000000002 r10=0000000100000000
r11=000000055A86B190 r12=0000000000000002 r13=000000055A931EA0
r14=000006FFFFFEF840 r15=0000000000000000
rbp=000000000034964A rsp=00000000FFFFBDA0
program=C:\Users\rr828893\Downloads\guile-3.0.0\libguile\.libs\guile.exe,
pid 62833, thread main
cs=0033 ds=002B es=002B fs=0053 gs=002B ss=002B

I can't imagine what you can make of that.

On Sat, Jan 25, 2020 at 10:54 AM John Cowan <cowan@ccil.org> wrote:

>
>
> On Sat, Jan 25, 2020 at 8:51 AM Ludovic Courtès <ludo@gnu.org> wrote:
>
>
>> That I understand. However, I was asking for the backtrace of the crash
>> on Cygwin when JIT is enabled. Could you grab it?
>>
>
> 1. The wisdom of the Internet has not been able to figure out how to
> generate a core dump on MacOS 10.15.2 (Catalina). The usual set of
> enabling steps can be performed without error, but still no core dump.
>
> 2. Until today I believed that there was no way to generate a Cygwin core
> dump. I know now that there is, but I may not be able to test it until
> Monday. I'll let you know, and hopefully that will provide insight into
> the MacOS problem as well.
>
> 3. I will try to work further on the MacOS libffi problem (which surfaces
> when you do --disable-jit to bypass the above problem) to convince MacOS to
> use GNU libffi rather than the native one. It probably has to do with
> pkg-config, which I barely understand.
>
> "All problems are config problems."
>
>
>
> John Cowan http://vrici.lojban.org/~cowan cowan@ccil.org
> We are lost, lost. No name, no business, no Precious, nothing. Only
> empty.
> Only hungry: yes, we are hungry. A few little fishes, nassty bony little
> fishes, for a poor creature, and they say death. So wise they are; so
> just,
> so very just. --Gollum
>
szgyg wrote on 3 Feb 2020 23:11
Re: bug#39118: Segfault while building on 64-bit Cygwin
(name . John Cowan)(address . cowan@ccil.org)
20200203221137.GB1659@dtk
On Fri, Jan 31, 2020 at 09:23:19AM -0500, John Cowan wrote:
> Aaaand... Cygwin doesn't do core dumps. Under the skin it's Windows, after
> all. This is what I get when I specify ulimit -c unlimited and rebuild:
> [...]

Please see my previous mail on how to get a real core dump on cygwin:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=39118#28


>> On Sat, Jan 25, 2020 at 8:51 AM Ludovic Courtès <ludo@gnu.org> wrote:
>>
>>> That I understand. However, I was asking for the backtrace of the crash
>>> on Cygwin when JIT is enabled. Could you grab it?

#v+
BOOTSTRAP GUILEC ice-9/eval.go
BOOTSTRAP GUILEC ice-9/psyntax-pp.go
BOOTSTRAP GUILEC language/cps/intmap.go
BOOTSTRAP GUILEC language/cps/intset.go
BOOTSTRAP GUILEC language/cps/graphs.go
BOOTSTRAP GUILEC ice-9/vlist.go
BOOTSTRAP GUILEC srfi/srfi-1.go

Thread 1 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 7444.0x2640]
0x000000055a8b1b25 in scm_to_uint64 (val=val@entry=0xffffffffffffff90) at ../../guile-3.0.0/libguile/conv-uinteger.i.c:44
44 else if (SCM_BIGP (val))
(gdb) bt
#0 0x000000055a8b1b25 in scm_to_uint64 (val=val@entry=0xffffffffffffff90) at ../../guile-3.0.0/libguile/conv-uinteger.i.c:44
#1 0x000000055a86b1ea in scm_bytevector_copy_x (source=0x700000948620, source_start=0x34964a, target=0x700000907600, target_start=0x2, len=0xffffffffffffff90)
at ../../guile-3.0.0/libguile/bytevectors.c:604
#2 0x00006ffffe743866 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

BOOTSTRAP GUILEC language/tree-il.go

(gdb) bt
#0 0x000000055a8b1b25 in scm_to_uint64 (val=val@entry=0xffffffffffffff90)
at ../../guile-3.0.0/libguile/conv-uinteger.i.c:44
#1 0x000000055a86b1ea in scm_bytevector_copy_x (source=0x70000055f160, source_start=0x34964a, target=0x700000808c90,
target_start=0x2, len=0xffffffffffffff90) at ../../guile-3.0.0/libguile/bytevectors.c:604
#2 0x00006ffffe73f936 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

BOOTSTRAP GUILEC language/tree-il/analyze.go

(gdb) bt
#0 0x000000055a8b1b25 in scm_to_uint64 (val=val@entry=0xffffffffffffff90)
at ../../guile-3.0.0/libguile/conv-uinteger.i.c:44
#1 0x000000055a86b1ea in scm_bytevector_copy_x (source=0x700000645160, source_start=0x34964a, target=0x7000008012d0,
target_start=0x2, len=0xffffffffffffff90) at ../../guile-3.0.0/libguile/bytevectors.c:604
#2 0x00006ffffe753fc6 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

^C
#v-

s
John Cowan wrote on 5 Feb 2020 22:11
(name . szgyg)(address . szgyg@ludens.elte.hu)
CAD2gp_Rcv81x4uKKTuokNbAWOkz-j77Hjet-oOzci1UFcgVDWg@mail.gmail.com
On Mon, Feb 3, 2020 at 5:11 PM szgyg <szgyg@ludens.elte.hu> wrote:

> On Fri, Jan 31, 2020 at 09:23:19AM -0500, John Cowan wrote:
> > Aaaand... Cygwin doesn't do core dumps. Under the skin it's Windows,
> after
> > all. This is what I get when I specify ulimit -c unlimited and rebuild:
> > [...]
>
> Please see my previous mail on how to get a real core dump on cygwin
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=39118#28


Okay, I looked at that page. However, Cygwin's dumper requires you to know
the Windows PID of the process to dump. Clearly it is intended for a
long-running process such as a server process, which you can force to core
dump, as if by "/bin/kill -SIGSEGV pid"; it is not suitable for a process
that gets a segmentation violation for internal reasons. In any case, when
building, I have no idea of the pid of the process which is dumping; it
starts up and then dumps immediately.



John Cowan http://vrici.lojban.org/~cowan cowan@ccil.org
The Penguin shall hunt and devour all that is crufty, gnarly and
bogacious; all code which wriggles like spaghetti, or is infested with
blighting creatures, or is bound by grave and perilous Licences shall it
capture. And in capturing shall it replicate, and in replicating shall
it document, and in documentation shall it bring freedom, serenity and
most cool froodiness to the earth and all who code therein. --Gospel of Tux
szgyg wrote on 5 Feb 2020 23:42
(name . John Cowan)(address . cowan@ccil.org)
20200205224249.GC1659@dtk
On Wed, Feb 05, 2020 at 04:11:04PM -0500, John Cowan wrote:
> On Mon, Feb 3, 2020 at 5:11 PM szgyg <szgyg@ludens.elte.hu> wrote:
>
> > On Fri, Jan 31, 2020 at 09:23:19AM -0500, John Cowan wrote:
> > > Aaaand... Cygwin doesn't do core dumps. Under the skin it's Windows,
> > after
> > > all. This is what I get when I specify ulimit -c unlimited and rebuild:
> > > [...]
> >
> > Please see my previous mail on how to get a real core dump on cygwin
> > https://debbugs.gnu.org/cgi/bugreport.cgi?bug=39118#28
>
>
> Okay, I looked at that page. However, Cygwin's dumper requires you to know
> the Windows PID of the process to dump. Clearly it is intended for a
> long-running process such as a server process, which you can force to core
> dump, as if by "/bin/kill -SIGSEGV pid"; it is not suitable for a process
> that gets a segmentation violation for internal reasons. In any case, when
> building, I have no idea of the pid of the process which is dumping; it
> starts up and then dumps immediately.


| One common way to use dumper is to plug it into cygwin's Just-In-Time
| debugging facility by adding
| error_start=x:\path\to\dumper.exe
| to the CYGWIN environment variable. Please note that x:\path\to\dumper.exe
| is Windows-style and not cygwin path. If error_start is set this way, then
| dumper will be started whenever some program encounters a fatal error.


s
Andy Wingo wrote on 6 Feb 2020 11:53
Re: Segfault while building on 64-bit Cygwin
(name . Mike Gran)(address . spk121@yahoo.com)
87wo90kmw9.fsf@pobox.com
On Mon 20 Jan 2020 18:22, Mike Gran <spk121@yahoo.com> writes:

> On Mon, Jan 20, 2020 at 11:38:35AM -0500, John Cowan wrote:
>> Yes, gladly, but I don't know how to get one in this context. Do I need to
>> add some flags to the Makefile, and if so, where? (It's a twisty maze of
>> passages, all different.) Note that this *is* a build with JIT enabled;
>> when I disable it using the env variable, there are no errors and 3.0.0
>> works fine.
>>
>> Also, it may take some time, as I have to rebuild my Windows system.
>
> I also tried building Guile 3.0.0 on Cygwin 3.1.x. The failure comes from
> trying to parse compiled .go files.
>
> The last time that I had this sort of problem, it was because the
> O_BINARY flag was dropped or missing when writing .go files, leading
> to CR+LF characters in the compiled files. And I diagnosed it by
> byte-comparing Linux-compiled .go files with Cygwin-compiled .go
> files, and by looking for CR+LF combinations in the compiled .go
> files.
>
> I don't know if that is what is happening here, but, I'll check that
> next time I have a chance.

Given that John said that compilation went fine with
GUILE_JIT_THRESHOLD=-1, I think perhaps this problem may have been fixed
in the past. My suspicions are that this issue is an ABI issue with
lightening that could perhaps be reproduced by:

git clone https://gitlab.com/wingo/lightening
cd lightening
make -C tests test-native

Of course any additional confirmation is useful and welcome!

Cheers,

Andy
Charles Stanhope wrote on 7 Feb 2020 05:56
(name . Andy Wingo)(address . wingo@pobox.com)
CAPydmiP42upuz1S=aP+hZk+tD5EJm00b4Gox1+LzEoJXVmRO=w@mail.gmail.com
On 2/6/20, Andy Wingo <wingo@pobox.com> wrote:

> Given that John said that compilation went fine with
> GUILE_JIT_THRESHOLD=-1, I think perhaps this problem may have been fixed
> in the past. My suspicions are that this issue is an ABI issue with
> lightening that could perhaps be reproduced by:
>
> git clone https://gitlab.com/wingo/lightening
> cd lightening
> make -C tests test-native
>
> Of course any additional confirmation is useful and welcome!

I haven't been able to get guile to compile under Cygwin (just a
compilation error I haven't had time to track down), but I was able to
quickly try the above. I get:

Testing: test-native-call_10
call_10.c:9: assertion failed: e == 4
/bin/sh: line 1: 7063 Aborted (core dumped) ./$test
make: *** [Makefile:31: test-native] Error 134

Despite what it says about a core dump, I find no such thing. Just a
file with the same name as the executable suffixed with ".stackdump".
(I did attempt to configure the Cygwin dumper before running the
tests.) Unless somebody suggests otherwise, I think the error message
is more useful.

--
Charles
Charles Stanhope wrote on 14 Feb 2020 18:46
(name . Andy Wingo)(address . wingo@pobox.com)
CAPydmiN7eFD5r-v44hwdFM=1J24okQO8XjHqJ3bXQ2N1OkRVhA@mail.gmail.com
On 2/6/20, Charles Stanhope <charles@stanho.pe> wrote:
> On 2/6/20, Andy Wingo <wingo@pobox.com> wrote:
>
>> Given that John said that compilation went fine with
>> GUILE_JIT_THRESHOLD=-1, I think perhaps this problem may have been fixed
>> in the past. My suspicions are that this issue is an ABI issue with
>> lightening that could perhaps be reproduced by:
>>
>> git clone https://gitlab.com/wingo/lightening
>> cd lightening
>> make -C tests test-native
>>
>> Of course any additional confirmation is useful and welcome!
>
> I haven't been able to get guile to compile under Cygwin (just a
> compilation error I haven't had time to track down), but I was able to
> quickly try the above. I get:
>
> Testing: test-native-call_10
> call_10.c:9: assertion failed: e == 4
> /bin/sh: line 1: 7063 Aborted (core dumped) ./$test
> make: *** [Makefile:31: test-native] Error 134
>

Andy, I don't know if you'd want to continue this here or on
lightening's gitlab page, but I looked into this a little bit a few
minutes here and there this past week. The x86 "fast-call" calling
convention used on Windows x64[0] and shared by Cygwin[1] requires
that the caller reserve 32 bytes of memory on the stack for the callee
to spill the register parameters (even if the callee takes fewer than
four parameters). I think lightening is currently missing that for the
x64 case for Cygwin.

To test the idea, I made a small modification (patch attached) that is
*not* intended as a solution as it doesn't work for the general case,
but it does allow the tests to pass on Cygwin 64.


--
Charles
diff --git a/lightening/x86.c b/lightening/x86.c
index 965191a..91b3a94 100644
--- a/lightening/x86.c
+++ b/lightening/x86.c
@@ -338,11 +338,13 @@ next_abi_arg(struct abi_arg_iterator *iter, jit_operand_t *arg)
if (is_gpr_arg(abi) && iter->gpr_idx < abi_gpr_arg_count) {
*arg = jit_operand_gpr (abi, abi_gpr_args[iter->gpr_idx++]);
#ifdef __CYGWIN__
+ iter->stack_size += 8;
iter->fpr_idx++;
#endif
} else if (is_fpr_arg(abi) && iter->fpr_idx < abi_fpr_arg_count) {
*arg = jit_operand_fpr (abi, abi_fpr_args[iter->fpr_idx++]);
#ifdef __CYGWIN__
+ iter->stack_size += 8;
iter->gpr_idx++;
#endif
} else {
Mike Gran wrote on 17 Feb 2020 00:23
(name . Charles Stanhope)(address . charles@stanho.pe)
20200216232334.GA2448000@spikycactus.com
On Fri, Feb 14, 2020 at 09:46:04AM -0800, Charles Stanhope wrote:
> Andy, I don't know if you'd want to continue this here or on
> lightening's gitlab page, but I looked into this a little bit a few
> minutes here and there this past week. The x86 "fast-call" calling
> convention used on Windows x64[0] and shared by Cygwin[1] requires
> that the caller reserve 32 bytes of memory on the stack for the callee
> to spill the register parameters (even if the callee takes fewer than
> four parameters). I think lightening is currently missing that for the
> x64 case for Cygwin.
>
> To test the idea, I made a small modification (patch attached) that is
> *not* intended as a solution as it doesn't work for the general case,
> but it does allow the tests to pass on Cygwin 64.

I can confirm that Charles's patch, plus another one line patch
to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
on my box. All tests pass except strptime in French, and the absence
of crypt. This is a 64-bit build.

-Mike Gran
John Cowan wrote on 17 Feb 2020 00:24
(name . Mike Gran)(address . spk121@yahoo.com)
CAD2gp_TpVhEVDjPbwY01C66vH+dPgjUut0smG=oHX4pWUTEB2g@mail.gmail.com
Excellent, and thank you all! I've been Windowsless for a few weeks, but
that should change again soon.

On Sun, Feb 16, 2020 at 6:23 PM Mike Gran <spk121@yahoo.com> wrote:

> On Fri, Feb 14, 2020 at 09:46:04AM -0800, Charles Stanhope wrote:
> > Andy, I don't know if you'd want to continue this here or on
> > lightening's gitlab page, but I looked into this a little bit a few
> > minutes here and there this past weeek. The x86 "fast-call" calling
> > convention used on Windows x64[0] and shared by Cygwin[1] requires
> > that the caller reserve 32 bytes of memory on the stack for the callee
> > to spill the register parameters (even if the callee takes fewer than
> > four parameters). I think lightening is currently missing that for the
> > x64 case for Cygwin.
> >
> > To test the idea, I made a small modification (patch attached) that is
> > *not* intended as a solution as it doesn't work for the general case,
> > but it does allow the tests to pass on Cygwin 64.
>
> I can confirm that Charles's patch, plus another one line patch
> to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
> on my box. All tests pass except strptime in French, and the absence
> of crypt. This is a 64-bit build.
>
> -Mike Gran
>
Charles Stanhope wrote on 17 Feb 2020 02:08
(name . Mike Gran)(address . spk121@yahoo.com)
CAPydmiNnZ7qBbUgNJ_aKhfDORSBcHn8PbQABikL=sbP357tD=Q@mail.gmail.com
On 2/16/20, Mike Gran <spk121@yahoo.com> wrote:
> On Fri, Feb 14, 2020 at 09:46:04AM -0800, Charles Stanhope wrote:
>> Andy, I don't know if you'd want to continue this here or on
>> lightening's gitlab page, but I looked into this a little bit a few
>> minutes here and there this past week. The x86 "fast-call" calling
>> convention used on Windows x64[0] and shared by Cygwin[1] requires
>> that the caller reserve 32 bytes of memory on the stack for the callee
>> to spill the register parameters (even if the callee takes fewer than
>> four parameters). I think lightening is currently missing that for the
>> x64 case for Cygwin.
>>
>> To test the idea, I made a small modification (patch attached) that is
>> *not* intended as a solution as it doesn't work for the general case,
>> but it does allow the tests to pass on Cygwin 64.
>
> I can confirm that Charles's patch, plus another one line patch
> to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
> on my box. All tests pass except strptime in French, and the absence
> of crypt. This is a 64-bit build.

Mike, thanks for going further with the Guile build. The CPU_SETSIZE
issue was what was hanging me up from compiling before Andy's comment
got me to look at lightening. I assumed I had some configuration,
package, or compiler issue. Good to know there's a simple fix.

Just a further warning to anyone watching: that patch I posted is a
real hack job just to test my theory of the cause of the segfault. I
would expect it to fail when you have fewer than four arguments in a
JITed function call. I wouldn't try doing much else with that Guile
build besides run the tests. :)

--
Charles
Charles Stanhope wrote on 17 Feb 2020 20:27
(name . Mike Gran)(address . spk121@yahoo.com)
CAPydmiP44TawQf4SWLp1j4OsN-3e_VdDDx4i_R1w83hYQVGhyw@mail.gmail.com
On 2/16/20, Charles Stanhope <charles@stanho.pe> wrote:
> On 2/16/20, Mike Gran <spk121@yahoo.com> wrote:
>>
>> I can confirm that Charles's patch, plus another one line patch
>> to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
>> on my box. All tests pass except strptime in French, and the absence
>> of crypt. This is a 64-bit build.
>
> Mike, thanks for going further with the Guile build. The CPU_SETSIZE
> issue was what was hanging me up from compiling before Andy's comment
> got me to look at lightening. I assumed I had some configuration,
> package, or compiler issue. Good to know there's a simple fix.
>
> Just a further warning to anyone watching: that patch I posted is a
> real hack job just to test my theory of the cause of the segfault. I
> would expect it to fail when you have fewer than four arguments in a
> JITed function call. I wouldn't try doing much else with that Guile
> build besides run the tests. :)

I had a little bit more time to look into the lightening
implementation last night. I've attached a patch that is less horrible
and more correct than my previous one. It reserves the stack space
regardless of the number of parameters and appears to work. But I'm
new to the lightening code base, so I'm not convinced it is the
correct solution. It's just the solution I was left with after my time
ran out. I wanted to post this patch as a replacement for the prior one
in case people did want to do more testing with Guile 3.0 on Cygwin
x64.

With that, I will let more experienced people come up with the
appropriate solution. Happy hacking, everybody!

--
Charles
diff --git a/lightening/x86.c b/lightening/x86.c
index 965191a..bdd26e1 100644
--- a/lightening/x86.c
+++ b/lightening/x86.c
@@ -328,6 +328,10 @@ reset_abi_arg_iterator(struct abi_arg_iterator *iter, size_t argc,
memset(iter, 0, sizeof *iter);
iter->argc = argc;
iter->args = args;
+#if __CYGWIN__ && __X64
+ // Reserve slots on the stack for 4 register parameters (8 bytes each).
+ iter->stack_size = 32;
+#endif
}
static void
Andy Wingo wrote on 17 Feb 2020 22:05
(name . Charles Stanhope)(address . charles@stanho.pe)
87blpwj55i.fsf@pobox.com
Aah, you all are amazing -- thank you!! Applied and merged.

Cheers,

Andy

On Mon 17 Feb 2020 20:27, Charles Stanhope <charles@stanho.pe> writes:

> On 2/16/20, Charles Stanhope <charles@stanho.pe> wrote:
>> On 2/16/20, Mike Gran <spk121@yahoo.com> wrote:
>>>
>>> I can confirm that Charles's patch, plus another one line patch
>>> to define CPU_SETSIZE, is enough to get Guile 3.0.x to build and run
>>> on my box. All tests pass except strptime in French, and the absence
>>> of crypt. This is a 64-bit build.
>>
>> Mike, thanks for going further with the Guile build. The CPU_SETSIZE
>> issue was what was hanging me up from compiling before Andy's comment
>> got me to look at lightening. I assumed I had some configuration,
>> package, or compiler issue. Good to know there's a simple fix.
>>
>> Just a further warning to anyone watching: that patch I posted is a
>> real hack job just to test my theory of the cause of the segfault. I
>> would expect it to fail when you have fewer than four arguments in a
>> JITed function call. I wouldn't try doing much else with that Guile
>> build besides run the tests. :)
>
> I had a little bit more time to look into the lightening
> implementation last night. I've attached a patch that is less horrible
> and more correct than my previous one. It reserves the stack space
> regardless of the number of parameters and appears to work. But I'm
> new to the lightening code base, so I'm not convinced it is the
> correct solution. It's just the solution I was left with after my time
> ran out. I wanted to post this patch as a replacement for the prior one
> in case people did want to do more testing with Guile 3.0 on Cygwin
> x64.
>
> With that, I will let more experienced people come up with the
> appropriate solution. Happy hacking, everybody!
>
> --
> Charles
>
> diff --git a/lightening/x86.c b/lightening/x86.c
> index 965191a..bdd26e1 100644
> --- a/lightening/x86.c
> +++ b/lightening/x86.c
> @@ -328,6 +328,10 @@ reset_abi_arg_iterator(struct abi_arg_iterator *iter, size_t argc,
> memset(iter, 0, sizeof *iter);
> iter->argc = argc;
> iter->args = args;
> +#if __CYGWIN__ && __X64
> + // Reserve slots on the stack for 4 register parameters (8 bytes each).
> + iter->stack_size = 32;
> +#endif
> }
>
> static void
Closed