Guile-Git cross-compiled to i586-pc-gnu gets bytestructures wrong

OpenSubmitted by Ludovic Courtès.
Details
4 participants
  • Jan Nieuwenhuizen
  • Ludovic Courtès
  • Mathieu Othacehe
  • Taylan Kammer
Owner
unassigned
Severity
normal
L
L
Ludovic Courtès wrote on 15 Oct 00:25 +0200
(address . bug-guix@gnu.org)
87v9fccuml.fsf@inria.fr
Hello!
You might have seen that ‘guix pull’ doesn’t work in your childhurd:
Toggle snippet (4 lines)Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...guix pull: error: Git error: invalid version 0 on git_proxy_options
This is due to an ABI breakage whereby the Guile-Git code(cross-compiled from x86_64-linux) fills in the ‘fetch_opts’ member of‘git_clone¬†options’ offset by one word.
On closer inspection, the problem is:
Toggle snippet (6 lines)scheme@(git structs)> (bytestructure-descriptor-size (bs:struct `(("x" ,(bs:pointer uint8)) ("y" ,size_t))))$20 = 12scheme@(git structs)> %host-type$21 = "i586-pc-gnu"
Compare with the correct answer:
Toggle snippet (11 lines)$ guix environment --ad-hoc -C -s i686-linux guile guile-bytestructures -- guile
[...]
scheme@(guile-user)> ,use(bytestructures guile)scheme@(guile-user)> %host-type$1 = "i686-unknown-linux-gnu"scheme@(guile-user)> (bytestructure-descriptor-size (bs:struct `(("x" ,(bs:pointer uint8))("y" ,size_t))))$2 = 8
More specifically, the size of ‘size_t’ is wrong, but pointer size isright:
Toggle snippet (8 lines)scheme@(git structs)> (bytestructure-descriptor-size size_t)$27 = 8scheme@(git structs)> (bytestructure-descriptor-size uintptr_t )$28 = 8scheme@(git structs)> (bytestructure-descriptor-size (bs:pointer uint8))$29 = 4
‘numeric.scm’ in bytestructures reads:
Toggle snippet (14 lines)(define arch32bit? (cond-expand (lp32 #t) (ilp32 #t) (else #f)))
;; …
(define uintptr_t (if arch32bit? uint32 uint64))
(define size_t uintptr_t)
But (bytestructures guile numeric-data-model) has this:
Toggle snippet (15 lines)(define data-model (if (= 4 (sizeof '*)) (if (= 2 (sizeof int)) 'lp32 'ilp32) (cond ((= 8 (sizeof int)) 'ilp64) ((= 4 (sizeof long)) 'llp64) (else 'lp64))))
(cond-expand-provide (current-module) (list architecture data-model))
The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is aexpansion-time thing when (cross-)building Bytestructures itself, soit’s incorrect from cross-building from 64-bit to 32-bit.
I believe that changing it to:
(define arch32bit? (memq data-model '(ilp32 lp32)))
would fix it because then the test would happen at run time.
(Note that Guile-Git uses only the procedural layer of Bytestructures.The syntactic layer is likely not cross-compilation-capable for similarreasons.)
To be continued…
Thanks,Ludo’.
L
L
Ludovic Courtès wrote on 15 Oct 09:42 +0200
(address . 44000@debbugs.gnu.org)
871ri0c4u8.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (10 lines)> The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a> expansion-time thing when (cross-)building Bytestructures itself, so> it’s incorrect from cross-building from 64-bit to 32-bit.>> I believe that changing it to:>> (define arch32bit? (memq data-model '(ilp32 lp32)))>> would fix it because then the test would happen at run time.
I confirm that the attached patch for Bytestructures solves the problem(running ‘guix pull’ in my childhurd :-)).
Let’s see whether it needs to be adapted for inclusion upstream.
Ludo’.
From a44d04ed3f7031b4ab95b2f0da81c7ab020304d1 Mon Sep 17 00:00:00 2001From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>Date: Thu, 15 Oct 2020 08:47:06 +0200Subject: [PATCH] Correctly define 'intptr_t' & co. when cross-compiling.
Fixes https://issues.guix.gnu.org/44000.
* bytestructures/guile/numeric-data-model.scm (data-model): Export.* bytestructures/body/numeric.scm (arch32bit?): Check the value ofDATA-MODEL rather than use 'cond-expand'.--- bytestructures/body/numeric.scm | 8 ++++---- bytestructures/guile/numeric-data-model.scm | 3 ++- 2 files changed, 6 insertions(+), 5 deletions(-)
Toggle diff (32 lines)diff --git a/bytestructures/body/numeric.scm b/bytestructures/body/numeric.scmindex 842575d..e23cc24 100644--- a/bytestructures/body/numeric.scm+++ b/bytestructures/body/numeric.scm@@ -282,10 +282,10 @@ (define long-long int64) (define unsigned-long-long uint64) -(define arch32bit? (cond-expand- (lp32 #t)- (ilp32 #t)- (else #f)))+(define arch32bit? (case data-model+ ((lp32) #t)+ ((ilp32) #t)+ (else #f))) (define intptr_t (if arch32bit? int32diff --git a/bytestructures/guile/numeric-data-model.scm b/bytestructures/guile/numeric-data-model.scmindex 773b6cd..bd40c69 100644--- a/bytestructures/guile/numeric-data-model.scm+++ b/bytestructures/guile/numeric-data-model.scm@@ -1,4 +1,5 @@-(define-module (bytestructures guile numeric-data-model))+(define-module (bytestructures guile numeric-data-model)+ #:export (data-model)) (import (system foreign)) (import (system base target))-- 2.28.0
L
L
Ludovic Courtès wrote on 15 Oct 10:09 +0200
(address . 44000@debbugs.gnu.org)
87lfg7c3ll.fsf@gnu.org
M
M
Mathieu Othacehe wrote on 15 Oct 10:48 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
87a6wnq3h4.fsf@gnu.org
Hey Ludo,
Toggle quote (3 lines)> I confirm that the attached patch for Bytestructures solves the problem> (running ‘guix pull’ in my childhurd :-)).
Nice catch! I guess we had the same issue when cross-compiling for armhffrom x86_64.
Thanks,
Mathieu
J
J
Jan Nieuwenhuizen wrote on 15 Oct 12:06 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
87imbbls67.fsf@gnu.org
Ludovic Courtès writes:
Hi,
Toggle quote (15 lines)> Ludovic Courtès <ludo@gnu.org> skribis:>>> The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a>> expansion-time thing when (cross-)building Bytestructures itself, so>> it’s incorrect from cross-building from 64-bit to 32-bit.>>>> I believe that changing it to:>>>> (define arch32bit? (memq data-model '(ilp32 lp32)))>>>> would fix it because then the test would happen at run time.>> I confirm that the attached patch for Bytestructures solves the problem> (running ‘guix pull’ in my childhurd :-)).
Wow, beautiful!
Toggle quote (2 lines)> Let’s see whether it needs to be adapted for inclusion upstream.
Yes, sure.
Greetings,Janneke
-- Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.orgFreelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
T
T
Taylan Kammer wrote on 16 Oct 17:58 +0200
1325d1c4-e047-76e5-2180-672fef7238b9@gmail.com
On 15.10.2020 09:42, Ludovic Courtès wrote:
Toggle quote (17 lines)> Ludovic Courtès <ludo@gnu.org> skribis:> >> The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a>> expansion-time thing when (cross-)building Bytestructures itself, so>> it’s incorrect from cross-building from 64-bit to 32-bit.>>>> I believe that changing it to:>>>> (define arch32bit? (memq data-model '(ilp32 lp32)))>>>> would fix it because then the test would happen at run time.> > I confirm that the attached patch for Bytestructures solves the problem> (running ‘guix pull’ in my childhurd :-)).> > Let’s see whether it needs to be adapted for inclusion upstream.
Hi Ludo,
Thanks for finding finding this bug and for the patch.
I'll try to include a portable version of the fix ASAP. I should havetime for it tomorrow.
Toggle quote (5 lines)> > Ludo’.>

- Taylan
T
T
Taylan Kammer wrote on 17 Oct 19:44 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
87mu0kn3wh.fsf@gmail.com
Taylan Kammer <taylan.kammer@gmail.com> writes:
Toggle quote (25 lines)> On 15.10.2020 09:42, Ludovic Courtès wrote:>> Ludovic Courtès <ludo@gnu.org> skribis:>> >>> The problem is that the ‘cond-expand’ used to define ‘arch32bit?’ is a>>> expansion-time thing when (cross-)building Bytestructures itself, so>>> it’s incorrect from cross-building from 64-bit to 32-bit.>>>>>> I believe that changing it to:>>>>>> (define arch32bit? (memq data-model '(ilp32 lp32)))>>>>>> would fix it because then the test would happen at run time.>> >> I confirm that the attached patch for Bytestructures solves the problem>> (running ‘guix pull’ in my childhurd :-)).>> >> Let’s see whether it needs to be adapted for inclusion upstream.>> Hi Ludo,>> Thanks for finding finding this bug and for the patch.>> I'll try to include a portable version of the fix ASAP. I should have> time for it tomorrow.
Hi again,
Could you please test whether bytestructures 1.0.8 fixes the issue?
Thank you!

- Taylan
L
L
Ludovic Courtès wrote on 19 Oct 10:23 +0200
(name . Taylan Kammer)(address . taylan.kammer@gmail.com)(address . 44000@debbugs.gnu.org)
87blgy39q7.fsf@gnu.org
Hi,
Taylan Kammer <taylan.kammer@gmail.com> skribis:
Toggle quote (2 lines)> Could you please test whether bytestructures 1.0.8 fixes the issue?
Thanks for the prompt reply! I tested 1.0.8 and it does not fix theproblem.
I think the problem might be that the ‘cond-expand-provide’ call mightaffect a module that’s not the one tested in (eval '(cond-expand …) …).
Does that make sense?
Ludo’.
T
T
Taylan Kammer wrote on 22 Oct 20:47 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
87ft66umhj.fsf@gmail.com
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (14 lines)> Hi,>> Taylan Kammer <taylan.kammer@gmail.com> skribis:>>> Could you please test whether bytestructures 1.0.8 fixes the issue?>> Thanks for the prompt reply! I tested 1.0.8 and it does not fix the> problem.>> I think the problem might be that the ‘cond-expand-provide’ call might> affect a module that’s not the one tested in (eval '(cond-expand …) …).>> Does that make sense?
Yes, you're right. I've made another release (1.0.9), where I use
(environment '(guile) '(bytestructures guile numeric-data-model))
for the 'base-environment' binding in case we're running on Guile.
It now gives me correct results locally (woe on me for not havingproperly tested the previous one) so I think it should definitely workwhen cross-compiling too, since the 'eval' is sure to be executed atrun-time and not compile-time...
Fingers crossed, I hope I don't waste your time this time!
- Taylan
L
L
Ludovic Courtès wrote on 16 Nov 17:10 +0100
(name . Taylan Kammer)(address . taylan.kammer@gmail.com)(address . 44000@debbugs.gnu.org)
87o8jxuvsy.fsf@gnu.org
Hi Taylan,
Apologies for the delay, it seems I hadn’t noticed your reply.
Taylan Kammer <taylan.kammer@gmail.com> skribis:
Toggle quote (11 lines)> Yes, you're right. I've made another release (1.0.9), where I use>> (environment '(guile) '(bytestructures guile numeric-data-model))>> for the 'base-environment' binding in case we're running on Guile.>> It now gives me correct results locally (woe on me for not having> properly tested the previous one) so I think it should definitely work> when cross-compiling too, since the 'eval' is sure to be executed at> run-time and not compile-time...
1.0.9 seems to help my rather involved use case (Guix cross-compiled toGNU/Hurd from x86_64-linux, then running ‘guix pull’, which depends onGuile-Git, which uses Bytestructures) but it still eventually crashes:
Toggle snippet (5 lines)ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix pullUpdating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...receiving objects 25% [############# ]Illegal instruction
The problem may well be elsewhere though.
However, at the REPL I can no longer access the ‘numeric’ module:
Toggle snippet (17 lines)ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix replGNU Guile 3.0.4Copyright (C) 1995-2020 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.This program is free software, and you are welcome to redistribute itunder certain conditions; type `,show c' for details.
Enter `,help' for help.scheme@(guix-user)> ,m(bytestructures body numeric)While executing meta-command:error: environment: unbound variablescheme@(guix-user)> (@@ (bytestructures body numeric) arch-32bit?)While compiling expression:error: environment: unbound variable
Thoughts?
Thanks,Ludo’.
T
T
Taylan Kammer wrote on 16 Nov 18:07 +0100
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44000@debbugs.gnu.org)
0d2d91d0-c335-ad6a-6249-8090837f16ff@gmail.com
On 16.11.2020 17:10, Ludovic Courtès wrote:
Toggle quote (13 lines)>> 1.0.9 seems to help my rather involved use case (Guix cross-compiled to> GNU/Hurd from x86_64-linux, then running ‘guix pull’, which depends on> Guile-Git, which uses Bytestructures) but it still eventually crashes:> > --8<---------------cut here---------------start------------->8---> ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix pull> Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...> receiving objects 25% [############# ]Illegal instruction> --8<---------------cut here---------------end--------------->8---> > The problem may well be elsewhere though.
Had you tested the same with your patch? If it worked with your patch but doesn't work with 1.0.9 then I'll have to take another look I guess.
Toggle quote (20 lines)> However, at the REPL I can no longer access the ‘numeric’ module:> > --8<---------------cut here---------------start------------->8---> ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix repl> GNU Guile 3.0.4> Copyright (C) 1995-2020 Free Software Foundation, Inc.> > Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.> This program is free software, and you are welcome to redistribute it> under certain conditions; type `,show c' for details.> > Enter `,help' for help.> scheme@(guix-user)> ,m(bytestructures body numeric)> While executing meta-command:> error: environment: unbound variable> scheme@(guix-user)> (@@ (bytestructures body numeric) arch-32bit?)> While compiling expression:> error: environment: unbound variable> --8<---------------cut here---------------end--------------->8---
One shouldn't try to use (bytestructures body ...) as modules. The files in that directory contain no module-related boilerplate at all, including imports, which is why in this case it complains about the lack of the binding 'environment'.
The module (bytestructures guile numeric) ought to work, although it doesn't contain the binding 'arch32bit?'. Do you actually need that predicate, or was that just for demonstration? The "right" way to test whether the predicate works correctly would be to check whether e.g. intptr_t equals int32 or int64 as per 'eq?'.
Toggle quote (4 lines)> Thanks,> Ludo’.>
Hope I was able to help!
Taylan
L
L
Ludovic Courtès wrote on 16 Nov 18:37 +0100
(name . Taylan Kammer)(address . taylan.kammer@gmail.com)(address . 44000@debbugs.gnu.org)
87blfxurrf.fsf@gnu.org
Hi,
Taylan Kammer <taylan.kammer@gmail.com> skribis:
Toggle quote (6 lines)> The module (bytestructures guile numeric) ought to work, although it> doesn't contain the binding 'arch32bit?'. Do you actually need that > predicate, or was that just for demonstration? The "right" way to> test whether the predicate works correctly would be to check whether> e.g. intptr_t equals int32 or int64 as per 'eq?'.
Oh right, that makes sense. Apparently, all is good:
Toggle snippet (9 lines)scheme@(guix-user)> ,m (bytestructures guile numeric) scheme@(bytestructures guile numeric)> intptr_t$2 = #<bytestructure-descriptor 0x5600e60>scheme@(bytestructures guile numeric)> (eq? intptr_t int32)$3 = #tscheme@(bytestructures guile numeric)> (eq? intptr_t int64)$4 = #f
I’ll get more debugging info about the crash and will let you know ifBytestructures is the culprit. ;-)
Ludo’.
L
L
Ludovic Courtès wrote on 16 Nov 23:01 +0100
(address . 44000@debbugs.gnu.org)
875z65ufkl.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (8 lines)> 1.0.9 seems to help my rather involved use case (Guix cross-compiled to> GNU/Hurd from x86_64-linux, then running ‘guix pull’, which depends on> Guile-Git, which uses Bytestructures) but it still eventually crashes:>> ludo@childhurd ~$ /gnu/store/mxi1za8gdq77438ywgzdzy2zywb9nk76-guix-1.2.0rc1-1.3ba6ffd/bin/guix pull> Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...> receiving objects 25% [############# ]Illegal instruction
Most likely, this crash is due to the use of new progress report featurein (guix git). If I arrange to not use it, clone the whole repoproceeds without crashing.
Unfortunately I’m unable to get a usable backtrace once SIGILL has beenraised, be it by attaching to it (usingCRASHSERVER=/servers/crash-suspend), spawning it and crashing it fromgdb, or inspecting its core file. At best I can see the backtrace ofthe msg thread, nothing more:
Toggle snippet (29 lines)(gdb) thread 1[Switching to thread 1 (process 310)]#0 0x080f08c0 in ?? ()(gdb) bt#0 0x080f08c0 in ?? ()#1 0x00000000 in ?? ()(gdb) thread 2[Switching to thread 2 (process 1)]#0 0x0159282c in mach_msg_trap () at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach_msg_trap.S:22 /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach_msg_trap.S: No such file or directory.(gdb) bt#0 0x0159282c in mach_msg_trap () at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach_msg_trap.S:2#1 0x01592f2a in __GI___mach_msg (msg=0x2802aa0, option=3, send_size=96, rcv_size=32, rcv_name=109, timeout=0, notify=0) at msg.c:111#2 0x017dc8ab in __crash_dump_task (crashserver=132, task=1, file=133, signo=11, sigcode=2, sigerror=2, exc=1, code=2, subcode=210986494, cttyid_port=102, cttyid_portPoly=19) at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/hurd/RPC_crash_dump_task.c:254#3 0x015b248c in write_corefile (detail=<optimized out>, signo=<optimized out>) at hurdsig.c:296#4 post_signal (untraced=<optimized out>) at hurdsig.c:947#5 0x015b274b in _hurd_internal_post_signal (ss=0x1800808, signo=11, detail=0x2802e5c, reply_port=0, reply_port_type=17, untraced=0) at hurdsig.c:1235#6 0x015b3fc1 in _S_catch_exception_raise (port=96, thread=39, task=1, exception=1, code=2, subcode=210986494) at catch-exc.c:88#7 0x017c09b4 in _Xexception_raise (InHeadP=0x2802f20, OutHeadP=0x2803f30) at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach/exc_server.c:155#8 0x017c0a52 in _S_exc_server (InHeadP=0x2802f20, OutHeadP=0x2803f30) at /tmp/guix-build-glibc-cross-i586-pc-gnu-2.31.drv-0/build/mach/mach/exc_server.c:208#9 0x015a7a09 in msgport_server (inp=0x2802f20, outp=0x2803f30) at msgportdemux.c:49#10 0x015934c3 in __mach_msg_server_timeout (demux=0x15a79b0 <msgport_server>, max_size=4096, rcv_name=96, option=0, timeout=0) at msgserver.c:108#11 0x01593607 in __mach_msg_server (demux=0x15a79b0 <msgport_server>, max_size=4096, rcv_name=96) at msgserver.c:195#12 0x015a7a86 in _hurd_msgport_receive () at msgportdemux.c:67#13 0x011eda50 in entry_point (self=0x804ac20, start_routine=0x15a7a30 <_hurd_msgport_receive>, arg=0x0) at pt-create.c:62#14 0x00000000 in ?? ()
(Would be nice to fix this debugging issue. It looks like GDB fails tointerpret the thread state of suspended threads or something.)
Anyway, this crash could be due to a GC issue, in which case it wouldnot be Hurd-specific. I could not reproduce it with i686-linux ‘guix’though, and obviously not with x86_64 binaries either.
Ludo’.
?