Guile-Git cross-compiled to i586-pc-gnu gets bytestructures wrong

  • Open
  • quality assurance status badge
Details
4 participants
  • Jan Nieuwenhuizen
  • Ludovic Courtès
  • Mathieu Othacehe
  • Taylan Kammer
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
L
L
Ludovic Courtès wrote on 15 Oct 2020 00:25
(address . bug-guix@gnu.org)
87v9fccuml.fsf@inria.fr
Hello!

You might have seen that ‘guix pull’ doesn’t work in your childhurd:

Toggle snippet (4 lines)
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
guix pull: error: Git error: invalid version 0 on git_proxy_options

This is due to an ABI breakage whereby the Guile-Git code
(cross-compiled from x86_64-linux) fills in the ‘fetch_opts’ member of
‘git_clone options’ offset by one word.

On closer inspection, the problem is:

Toggle snippet (6 lines)
scheme@(git structs)> (bytestructure-descriptor-size (bs:struct `(("x" ,(bs:pointer uint8)) ("y" ,size_t))))
$20 = 12
scheme@(git structs)> %host-type
$21 = "i586-pc-gnu"

Compare with the correct answer:

Toggle snippet (11 lines)
$ guix environment --ad-hoc -C -s i686-linux guile guile-bytestructures -- guile

[...]

scheme@(guile-user)> ,use(bytestructures guile)
scheme@(guile-user)> %host-type
$1 = "i686-unknown-linux-gnu"
scheme@(guile-user)> (bytestructure-descriptor-size (bs:struct `(("x" ,(bs:pointer uint8))("y" ,size_t))))
$2 = 8

More specifically, the size of ‘size_t’ is wrong, but pointer size is
right:

Toggle snippet (8 lines)
scheme@(git structs)> (bytestructure-descriptor-size size_t)
$27 = 8
scheme@(git structs)> (bytestructure-descriptor-size uintptr_t )
$28 = 8
scheme@(git structs)> (bytestructure-descriptor-size (bs:pointer uint8))
$29 = 4

‘numeric.scm’ in bytestructures reads:

Toggle snippet (14 lines)
(define arch32bit? (cond-expand
(lp32 #t)
(ilp32 #t)
(else #f)))

;; …

(define uintptr_t (if arch32bit?
uint32
uint64))

(define size_t uintptr_t)

But (bytestructures guile numeric-data-model) has this:

Toggle snippet (15 lines)
(define data-model
(if (= 4 (sizeof '*))
(if (= 2 (sizeof int))
'lp32
'ilp32)
(cond
((= 8 (sizeof int)) 'ilp64)
((= 4 (sizeof long)) 'llp64)
(else 'lp64))))

(cond-expand-provide
(current-module)
(list architecture data-model))

The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a
expansion-time thing when (cross-)building Bytestructures itself, so
it’s incorrect from cross-building from 64-bit to 32-bit.

I believe that changing it to:

(define arch32bit? (memq data-model '(ilp32 lp32)))

would fix it because then the test would happen at run time.

(Note that Guile-Git uses only the procedural layer of Bytestructures.
The syntactic layer is likely not cross-compilation-capable for similar
reasons.)

To be continued…

Thanks,
Ludo’.
L
L
Ludovic Courtès wrote on 15 Oct 2020 09:42
(address . 44000@debbugs.gnu.org)
871ri0c4u8.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (10 lines)
> The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a
> expansion-time thing when (cross-)building Bytestructures itself, so
> it’s incorrect from cross-building from 64-bit to 32-bit.
>
> I believe that changing it to:
>
> (define arch32bit? (memq data-model '(ilp32 lp32)))
>
> would fix it because then the test would happen at run time.

I confirm that the attached patch for Bytestructures solves the problem
(running ‘guix pull’ in my childhurd :-)).

Let’s see whether it needs to be adapted for inclusion upstream.

Ludo’.
From a44d04ed3f7031b4ab95b2f0da81c7ab020304d1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Thu, 15 Oct 2020 08:47:06 +0200
Subject: [PATCH] Correctly define 'intptr_t' & co. when cross-compiling.


* bytestructures/guile/numeric-data-model.scm (data-model): Export.
* bytestructures/body/numeric.scm (arch32bit?): Check the value of
DATA-MODEL rather than use 'cond-expand'.
---
bytestructures/body/numeric.scm | 8 ++++----
bytestructures/guile/numeric-data-model.scm | 3 ++-
2 files changed, 6 insertions(+), 5 deletions(-)

Toggle diff (32 lines)
diff --git a/bytestructures/body/numeric.scm b/bytestructures/body/numeric.scm
index 842575d..e23cc24 100644
--- a/bytestructures/body/numeric.scm
+++ b/bytestructures/body/numeric.scm
@@ -282,10 +282,10 @@
(define long-long int64)
(define unsigned-long-long uint64)
-(define arch32bit? (cond-expand
- (lp32 #t)
- (ilp32 #t)
- (else #f)))
+(define arch32bit? (case data-model
+ ((lp32) #t)
+ ((ilp32) #t)
+ (else #f)))
(define intptr_t (if arch32bit?
int32
diff --git a/bytestructures/guile/numeric-data-model.scm b/bytestructures/guile/numeric-data-model.scm
index 773b6cd..bd40c69 100644
--- a/bytestructures/guile/numeric-data-model.scm
+++ b/bytestructures/guile/numeric-data-model.scm
@@ -1,4 +1,5 @@
-(define-module (bytestructures guile numeric-data-model))
+(define-module (bytestructures guile numeric-data-model)
+ #:export (data-model))
(import (system foreign))
(import (system base target))
--
2.28.0
L
M
M
Mathieu Othacehe wrote on 15 Oct 2020 10:48
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
87a6wnq3h4.fsf@gnu.org
Hey Ludo,

Toggle quote (3 lines)
> I confirm that the attached patch for Bytestructures solves the problem
> (running ‘guix pull’ in my childhurd :-)).

Nice catch! I guess we had the same issue when cross-compiling for armhf
from x86_64.

Thanks,

Mathieu
J
J
Jan Nieuwenhuizen wrote on 15 Oct 2020 12:06
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
87imbbls67.fsf@gnu.org
Ludovic Courtès writes:

Hi,

Toggle quote (15 lines)
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a
>> expansion-time thing when (cross-)building Bytestructures itself, so
>> it’s incorrect from cross-building from 64-bit to 32-bit.
>>
>> I believe that changing it to:
>>
>> (define arch32bit? (memq data-model '(ilp32 lp32)))
>>
>> would fix it because then the test would happen at run time.
>
> I confirm that the attached patch for Bytestructures solves the problem
> (running ‘guix pull’ in my childhurd :-)).

Wow, beautiful!

Toggle quote (2 lines)
> Let’s see whether it needs to be adapted for inclusion upstream.

Yes, sure.

Greetings,
Janneke

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
T
T
Taylan Kammer wrote on 16 Oct 2020 17:58
1325d1c4-e047-76e5-2180-672fef7238b9@gmail.com
On 15.10.2020 09:42, Ludovic Courtès wrote:
Toggle quote (17 lines)
> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a
>> expansion-time thing when (cross-)building Bytestructures itself, so
>> it’s incorrect from cross-building from 64-bit to 32-bit.
>>
>> I believe that changing it to:
>>
>> (define arch32bit? (memq data-model '(ilp32 lp32)))
>>
>> would fix it because then the test would happen at run time.
>
> I confirm that the attached patch for Bytestructures solves the problem
> (running ‘guix pull’ in my childhurd :-)).
>
> Let’s see whether it needs to be adapted for inclusion upstream.

Hi Ludo,

Thanks for finding finding this bug and for the patch.

I'll try to include a portable version of the fix ASAP. I should have
time for it tomorrow.

Toggle quote (5 lines)
>
> Ludo’.
>


- Taylan
T
T
Taylan Kammer wrote on 17 Oct 2020 19:44
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
87mu0kn3wh.fsf@gmail.com
Taylan Kammer <taylan.kammer@gmail.com> writes:

Toggle quote (25 lines)
> On 15.10.2020 09:42, Ludovic Courtès wrote:
>> Ludovic Courtès <ludo@gnu.org> skribis:
>>
>>> The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a
>>> expansion-time thing when (cross-)building Bytestructures itself, so
>>> it’s incorrect from cross-building from 64-bit to 32-bit.
>>>
>>> I believe that changing it to:
>>>
>>> (define arch32bit? (memq data-model '(ilp32 lp32)))
>>>
>>> would fix it because then the test would happen at run time.
>>
>> I confirm that the attached patch for Bytestructures solves the problem
>> (running ‘guix pull’ in my childhurd :-)).
>>
>> Let’s see whether it needs to be adapted for inclusion upstream.
>
> Hi Ludo,
>
> Thanks for finding finding this bug and for the patch.
>
> I'll try to include a portable version of the fix ASAP. I should have
> time for it tomorrow.

Hi again,

Could you please test whether bytestructures 1.0.8 fixes the issue?

Thank you!


- Taylan
L
L
Ludovic Courtès wrote on 19 Oct 2020 10:23
(name . Taylan Kammer)(address . taylan.kammer@gmail.com)(address . 44000@debbugs.gnu.org)
87blgy39q7.fsf@gnu.org
Hi,

Taylan Kammer <taylan.kammer@gmail.com> skribis:

Toggle quote (2 lines)
> Could you please test whether bytestructures 1.0.8 fixes the issue?

Thanks for the prompt reply! I tested 1.0.8 and it does not fix the
problem.

I think the problem might be that the ‘cond-expand-provide’ call might
affect a module that’s not the one tested in (eval '(cond-expand …) …).

Does that make sense?

Ludo’.
T
T
Taylan Kammer wrote on 22 Oct 2020 20:47
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
87ft66umhj.fsf@gmail.com
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (14 lines)
> Hi,
>
> Taylan Kammer <taylan.kammer@gmail.com> skribis:
>
>> Could you please test whether bytestructures 1.0.8 fixes the issue?
>
> Thanks for the prompt reply! I tested 1.0.8 and it does not fix the
> problem.
>
> I think the problem might be that the ‘cond-expand-provide’ call might
> affect a module that’s not the one tested in (eval '(cond-expand …) …).
>
> Does that make sense?

Yes, you're right. I've made another release (1.0.9), where I use

(environment '(guile) '(bytestructures guile numeric-data-model))

for the 'base-environment' binding in case we're running on Guile.

It now gives me correct results locally (woe on me for not having
properly tested the previous one) so I think it should definitely work
when cross-compiling too, since the 'eval' is sure to be executed at
run-time and not compile-time...

Fingers crossed, I hope I don't waste your time this time!

- Taylan
L
L
Ludovic Courtès wrote on 16 Nov 2020 17:10
(name . Taylan Kammer)(address . taylan.kammer@gmail.com)(address . 44000@debbugs.gnu.org)
87o8jxuvsy.fsf@gnu.org
Hi Taylan,

Apologies for the delay, it seems I hadn’t noticed your reply.

Taylan Kammer <taylan.kammer@gmail.com> skribis:

Toggle quote (11 lines)
> Yes, you're right. I've made another release (1.0.9), where I use
>
> (environment '(guile) '(bytestructures guile numeric-data-model))
>
> for the 'base-environment' binding in case we're running on Guile.
>
> It now gives me correct results locally (woe on me for not having
> properly tested the previous one) so I think it should definitely work
> when cross-compiling too, since the 'eval' is sure to be executed at
> run-time and not compile-time...

1.0.9 seems to help my rather involved use case (Guix cross-compiled to
GNU/Hurd from x86_64-linux, then running ‘guix pull’, which depends on
Guile-Git, which uses Bytestructures) but it still eventually crashes:

Toggle snippet (5 lines)
ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix pull
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
receiving objects 25% [############# ]Illegal instruction

The problem may well be elsewhere though.

However, at the REPL I can no longer access the ‘numeric’ module:

Toggle snippet (17 lines)
ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix repl
GNU Guile 3.0.4
Copyright (C) 1995-2020 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> ,m(bytestructures body numeric)
While executing meta-command:
error: environment: unbound variable
scheme@(guix-user)> (@@ (bytestructures body numeric) arch-32bit?)
While compiling expression:
error: environment: unbound variable

Thoughts?

Thanks,
Ludo’.
T
T
Taylan Kammer wrote on 16 Nov 2020 18:07
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
0d2d91d0-c335-ad6a-6249-8090837f16ff@gmail.com
On 16.11.2020 17:10, Ludovic Courtès wrote:
Toggle quote (13 lines)
>
> 1.0.9 seems to help my rather involved use case (Guix cross-compiled to
> GNU/Hurd from x86_64-linux, then running ‘guix pull’, which depends on
> Guile-Git, which uses Bytestructures) but it still eventually crashes:
>
> --8<---------------cut here---------------start------------->8---
> ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix pull
> Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
> receiving objects 25% [############# ]Illegal instruction
> --8<---------------cut here---------------end--------------->8---
>
> The problem may well be elsewhere though.

Had you tested the same with your patch? If it worked with your patch
but doesn't work with 1.0.9 then I'll have to take another look I guess.

Toggle quote (20 lines)
> However, at the REPL I can no longer access the ‘numeric’ module:
>
> --8<---------------cut here---------------start------------->8---
> ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix repl
> GNU Guile 3.0.4
> Copyright (C) 1995-2020 Free Software Foundation, Inc.
>
> Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
> This program is free software, and you are welcome to redistribute it
> under certain conditions; type `,show c' for details.
>
> Enter `,help' for help.
> scheme@(guix-user)> ,m(bytestructures body numeric)
> While executing meta-command:
> error: environment: unbound variable
> scheme@(guix-user)> (@@ (bytestructures body numeric) arch-32bit?)
> While compiling expression:
> error: environment: unbound variable
> --8<---------------cut here---------------end--------------->8---

One shouldn't try to use (bytestructures body ...) as modules. The
files in that directory contain no module-related boilerplate at all,
including imports, which is why in this case it complains about the lack
of the binding 'environment'.

The module (bytestructures guile numeric) ought to work, although it
doesn't contain the binding 'arch32bit?'. Do you actually need that
predicate, or was that just for demonstration? The "right" way to test
whether the predicate works correctly would be to check whether e.g.
intptr_t equals int32 or int64 as per 'eq?'.

Toggle quote (4 lines)
> Thanks,
> Ludo’.
>

Hope I was able to help!

Taylan
L
L
Ludovic Courtès wrote on 16 Nov 2020 18:37
(name . Taylan Kammer)(address . taylan.kammer@gmail.com)(address . 44000@debbugs.gnu.org)
87blfxurrf.fsf@gnu.org
Hi,

Taylan Kammer <taylan.kammer@gmail.com> skribis:

Toggle quote (6 lines)
> The module (bytestructures guile numeric) ought to work, although it
> doesn't contain the binding 'arch32bit?'. Do you actually need that
> predicate, or was that just for demonstration? The "right" way to
> test whether the predicate works correctly would be to check whether
> e.g. intptr_t equals int32 or int64 as per 'eq?'.

Oh right, that makes sense. Apparently, all is good:

Toggle snippet (9 lines)
scheme@(guix-user)> ,m (bytestructures guile numeric)
scheme@(bytestructures guile numeric)> intptr_t
$2 = #<bytestructure-descriptor 0x5600e60>
scheme@(bytestructures guile numeric)> (eq? intptr_t int32)
$3 = #t
scheme@(bytestructures guile numeric)> (eq? intptr_t int64)
$4 = #f

I’ll get more debugging info about the crash and will let you know if
Bytestructures is the culprit. ;-)

Ludo’.
L
L
Ludovic Courtès wrote on 16 Nov 2020 23:01
(address . 44000@debbugs.gnu.org)
875z65ufkl.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

Toggle quote (8 lines)
> 1.0.9 seems to help my rather involved use case (Guix cross-compiled to
> GNU/Hurd from x86_64-linux, then running ‘guix pull’, which depends on
> Guile-Git, which uses Bytestructures) but it still eventually crashes:
>
> ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix pull
> Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
> receiving objects 25% [############# ]Illegal instruction

Most likely, this crash is due to the use of new progress report feature
in (guix git). If I arrange to not use it, clone the whole repo
proceeds without crashing.

Unfortunately I’m unable to get a usable backtrace once SIGILL has been
raised, be it by attaching to it (using
CRASHSERVER=/servers/crash-suspend), spawning it and crashing it from
gdb, or inspecting its core file. At best I can see the backtrace of
the msg thread, nothing more:

Toggle snippet (29 lines)
(gdb) thread 1
[Switching to thread 1 (process 310)]
#0 0x080f08c0 in ?? ()
(gdb) bt
#0 0x080f08c0 in ?? ()
#1 0x00000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (process 1)]
#0 0x0159282c in mach_msg_trap () at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach_msg_trap.S:2
2 /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach_msg_trap.S: No such file or directory.
(gdb) bt
#0 0x0159282c in mach_msg_trap () at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach_msg_trap.S:2
#1 0x01592f2a in __GI___mach_msg (msg=0x2802aa0, option=3, send_size=96, rcv_size=32, rcv_name=109, timeout=0, notify=0) at msg.c:111
#2 0x017dc8ab in __crash_dump_task (crashserver=132, task=1, file=133, signo=11, sigcode=2, sigerror=2, exc=1, code=2, subcode=210986494, cttyid_port=102, cttyid_portPoly=19)
at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/hurd/RPC_crash_dump_task.c:254
#3 0x015b248c in write_corefile (detail=<optimized out>, signo=<optimized out>) at hurdsig.c:296
#4 post_signal (untraced=<optimized out>) at hurdsig.c:947
#5 0x015b274b in _hurd_internal_post_signal (ss=0x1800808, signo=11, detail=0x2802e5c, reply_port=0, reply_port_type=17, untraced=0) at hurdsig.c:1235
#6 0x015b3fc1 in _S_catch_exception_raise (port=96, thread=39, task=1, exception=1, code=2, subcode=210986494) at catch-exc.c:88
#7 0x017c09b4 in _Xexception_raise (InHeadP=0x2802f20, OutHeadP=0x2803f30) at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach/exc_server.c:155
#8 0x017c0a52 in _S_exc_server (InHeadP=0x2802f20, OutHeadP=0x2803f30) at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach/exc_server.c:208
#9 0x015a7a09 in msgport_server (inp=0x2802f20, outp=0x2803f30) at msgportdemux.c:49
#10 0x015934c3 in __mach_msg_server_timeout (demux=0x15a79b0 <msgport_server>, max_size=4096, rcv_name=96, option=0, timeout=0) at msgserver.c:108
#11 0x01593607 in __mach_msg_server (demux=0x15a79b0 <msgport_server>, max_size=4096, rcv_name=96) at msgserver.c:195
#12 0x015a7a86 in _hurd_msgport_receive () at msgportdemux.c:67
#13 0x011eda50 in entry_point (self=0x804ac20, start_routine=0x15a7a30 <_hurd_msgport_receive>, arg=0x0) at pt-create.c:62
#14 0x00000000 in ?? ()

(Would be nice to fix this debugging issue. It looks like GDB fails to
interpret the thread state of suspended threads or something.)

Anyway, this crash could be due to a GC issue, in which case it would
not be Hurd-specific. I could not reproduce it with i686-linux ‘guix’
though, and obviously not with x86_64 binaries either.

Ludo’.
?