Daemon tries to build GNU/Hurd derivations on GNU/Linux

  • Done
Details
2 participants
  • Jan Nieuwenhuizen
  • Ludovic Courtès
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
normal
Ludovic Courtès wrote on 28 Sep 2020 10:43
(address . bug-guix@gnu.org)
87a6xacmw5.fsf@inria.fr
Here’s the problem:

$ guix build guile-bootstrap -s i586-gnu
The following derivation will be built:
/gnu/store/xi7r3bryibnyvjrs6avv8qp676mja4w2-guile-bootstrap-2.0.drv
building /gnu/store/xi7r3bryibnyvjrs6avv8qp676mja4w2-guile-bootstrap-2.0.drv...
builder for `/gnu/store/xi7r3bryibnyvjrs6avv8qp676mja4w2-guile-bootstrap-2.0.drv' failed due to signal 11 (Segmentation fault)
build of /gnu/store/xi7r3bryibnyvjrs6avv8qp676mja4w2-guile-bootstrap-2.0.drv failed
View build log at '/var/log/guix/drvs/xi/7r3bryibnyvjrs6avv8qp676mja4w2-guile-bootstrap-2.0.drv.bz2'.
guix build: error: build of `/gnu/store/xi7r3bryibnyvjrs6avv8qp676mja4w2-guile-bootstrap-2.0.drv' failed
$ uname -o
GNU/Linux
$ guix describe
Generation 160 Sep 25 2020 23:40:20 (current)
guix a0d4aa2
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: a0d4aa2457d7e36012143ffe3fbc9dd4bc20ca4f

It’s no wonder that the GNU/Hurd executable fails to run on GNU/Linux.
The reason the daemon tries to run it anyway is because of the hack
introduced in 7bf2a70a4ffd976d50638d3b9f2ec409763157df, in support of
transparent emulation via binfmt_misc.

Ludo’.
Ludovic Courtès wrote on 28 Sep 2020 12:31
(address . 43668@debbugs.gnu.org)
875z7ychvp.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> writes:

> It’s no wonder that the GNU/Hurd executable fails to run on GNU/Linux.
> The reason the daemon tries to run it anyway is because of the hack
> introduced in 7bf2a70a4ffd976d50638d3b9f2ec409763157df, in support of
> transparent emulation via binfmt_misc.

The thing is that x86 GNU/Hurd and GNU/Linux ELF binaries are
indistinguishable AFAICS since they both use ELFOSABI_NONE:

scheme@(guile-user)> ,use(guix elf)
scheme@(guile-user)> ,use(rnrs io ports)
scheme@(guile-user)> (define e (parse-elf (call-with-input-file "/gnu/store/vq7zyb4hmlrafflmrcjbqccxp4dsx0s3-bash" get-bytevector-all)))
scheme@(guile-user)> (elf-abi e)
$6 = 0
scheme@(guile-user)> ELFOSABI_GNU
$7 = 3
scheme@(guile-user)> (define e2 (parse-elf (call-with-input-file "/bin/sh" get-bytevector-all)))
scheme@(guile-user)> (elf-abi e2)
$8 = 0
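
The same observation can be sketched in C++, the daemon's language. This is only an illustration and not code from Guix: `elfOsAbi` and `fakeElfIdent` are hypothetical names, and the sketch looks at nothing but the 16-byte `e_ident` array at the start of an ELF file, where the OS/ABI byte sits at index 7 (`EI_OSABI`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Constants from the ELF specification.
constexpr std::size_t kEiNident = 16;       // size of e_ident
constexpr std::size_t kEiOsabi = 7;         // index of EI_OSABI in e_ident
constexpr int kElfOsabiNone = 0;            // ELFOSABI_NONE (a.k.a. SYSV)
constexpr int kElfOsabiGnu = 3;             // ELFOSABI_GNU

// Hypothetical helper: return the EI_OSABI byte of an ELF image, or -1 if
// BYTES is too short or does not start with the ELF magic number.
static int elfOsAbi(const std::vector<std::uint8_t>& bytes)
{
    static const std::uint8_t magic[4] = {0x7f, 'E', 'L', 'F'};
    if (bytes.size() < kEiNident)
        return -1;
    for (std::size_t i = 0; i < 4; ++i)
        if (bytes[i] != magic[i])
            return -1;
    return bytes[kEiOsabi];
}

// Hypothetical helper for experiments: build a fake e_ident carrying OSABI.
static std::vector<std::uint8_t> fakeElfIdent(std::uint8_t osabi)
{
    std::vector<std::uint8_t> ident(kEiNident, 0);
    ident[0] = 0x7f;
    ident[1] = 'E';
    ident[2] = 'L';
    ident[3] = 'F';
    ident[kEiOsabi] = osabi;
    return ident;
}
```

Since both binaries in the session above report 0, a check of this kind returns `kElfOsabiNone` for either of them and cannot tell them apart.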

(The ‘file’ command does manage to recognize GNU/Hurd binaries, but I
don’t know how it does it.)

So I think we can’t count on an ‘execve’ error and thus have to treat
this case (same architecture but different OS kernel) specially, as
shown below.

Thoughts?

Ludo’.
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index 88f8d11103..ccec513d8d 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -1946,6 +1946,15 @@ void DerivationGoal::startBuilder()
 }
+/* Return true if the operating system kernel part of SYSTEM1 and SYSTEM2 (the
+   bit that comes after the hyphen in system types such as "i686-linux") is
+   the same. */
+static bool sameOperatingSystemKernel(const std::string& system1, const std::string& system2)
+{
+    auto os1 = system1.substr(system1.find("-"));
+    auto os2 = system2.substr(system2.find("-"));
+    return os1 == os2;
+}
 void DerivationGoal::runChild()
 {
@@ -2208,9 +2217,20 @@ void DerivationGoal::runChild()
         foreach (Strings::iterator, i, drv.args)
             args.push_back(rewriteHashes(*i, rewritesToTmp));
-        execve(drv.builder.c_str(), stringsToCharPtrs(args).data(), stringsToCharPtrs(envStrs).data());
-
-        int error = errno;
+        /* If DRV targets the same operating system kernel, try to execute it:
+           there might be binfmt_misc set up for user-land emulation of other
+           architectures. However, if it targets a different operating
+           system--e.g., "i586-gnu" vs. "x86_64-linux"--do not try executing
+           it: the ELF file for that OS is likely indistinguishable from a
+           native ELF binary and it would just crash at run time. */
+        int error;
+        if (sameOperatingSystemKernel(drv.platform, settings.thisSystem)) {
+            execve(drv.builder.c_str(), stringsToCharPtrs(args).data(),
+                   stringsToCharPtrs(envStrs).data());
+            error = errno;
+        } else {
+            error = ENOEXEC;
+        }
         /* Right platform? Check this after we've tried 'execve' to allow for
            transparent emulation of different platforms with binfmt_misc
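
The decision made in the patch can be tried outside the daemon. The following is a minimal sketch and not the committed code: `kernelPart` and `errorWithoutExec` are hypothetical helpers, and unlike the patch the sketch treats a system string without a hyphen as having an empty kernel part (in that case `find` returns `npos` and the `substr` call in the patch would throw).

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper: return what follows the first hyphen of a system
// type, e.g. "linux" for "x86_64-linux"; empty if there is no hyphen.
static std::string kernelPart(const std::string& system)
{
    auto pos = system.find('-');
    return pos == std::string::npos ? std::string() : system.substr(pos + 1);
}

// Same comparison as in the patch, restricted to the kernel part.
static bool sameOperatingSystemKernel(const std::string& system1,
                                      const std::string& system2)
{
    return kernelPart(system1) == kernelPart(system2);
}

// Hypothetical helper: return 0 when 'execve' should be attempted (same
// kernel, so binfmt_misc emulation may apply), or ENOEXEC when the builder
// should not be executed at all.
static int errorWithoutExec(const std::string& drvPlatform,
                            const std::string& thisSystem)
{
    return sameOperatingSystemKernel(drvPlatform, thisSystem) ? 0 : ENOEXEC;
}
```

With these definitions, `errorWithoutExec("i586-gnu", "x86_64-linux")` yields `ENOEXEC`, while `errorWithoutExec("aarch64-linux", "x86_64-linux")` yields 0, so the emulation path through `execve` is still taken for a foreign architecture on the same kernel.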
Jan Nieuwenhuizen wrote on 28 Sep 2020 13:11
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 43668@debbugs.gnu.org)
87lfgurwaa.fsf@gnu.org
Ludovic Courtès writes:

Hi!

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> It’s no wonder that the GNU/Hurd executable fails to run on GNU/Linux.
>> The reason the daemon tries to run it anyway is because of the hack
>> introduced in 7bf2a70a4ffd976d50638d3b9f2ec409763157df, in support of
>> transparent emulation via binfmt_misc.
>
> The thing is that x86 GNU/Hurd and GNU/Linux ELF binaries are
> indistinguishable AFAICS since they both use ELFOSABI_NONE:
>
> scheme@(guile-user)> ,use(guix elf)
> scheme@(guile-user)> ,use(rnrs io ports)
> scheme@(guile-user)> (define e (parse-elf (call-with-input-file "/gnu/store/vq7zyb4hmlrafflmrcjbqccxp4dsx0s3-bash" get-bytevector-all)))
> scheme@(guile-user)> (elf-abi e)
> $6 = 0
> scheme@(guile-user)> ELFOSABI_GNU
> $7 = 3
> scheme@(guile-user)> (define e2 (parse-elf (call-with-input-file "/bin/sh" get-bytevector-all)))
> scheme@(guile-user)> (elf-abi e2)
> $8 = 0
>
> (The ‘file’ command does manage to recognize GNU/Hurd binaries, but I
> don’t know how it does it.)

Looking at the file sources, it uses do_os_note, look:

$ readelf -a $(guix build --target=i586-pc-gnu hello)/bin/hello

Displaying notes found in: .note.ABI-tag
  Owner                Data size        Description
  GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
    OS: Hurd, ABI: 0.0.0
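
For illustration, here is a rough C++ sketch of decoding that note. It is not taken from 'file' or from Guix: `abiTagOs` and `sampleAbiTag` are hypothetical names, and the sketch assumes it is handed the raw little-endian contents of the `.note.ABI-tag` section (as on x86), holding a single note laid out as three 32-bit header words, the name "GNU", and a 16-byte descriptor whose first word is the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Values as defined in glibc's <elf.h>.
constexpr std::uint32_t kNtGnuAbiTag = 1;  // NT_GNU_ABI_TAG
constexpr int kNoteOsLinux = 0;            // ELF_NOTE_OS_LINUX
constexpr int kNoteOsGnu = 1;              // ELF_NOTE_OS_GNU ("Hurd" in readelf)

// Read a little-endian 32-bit word at offset OFF.
static std::uint32_t readLe32(const std::vector<std::uint8_t>& b, std::size_t off)
{
    return std::uint32_t(b[off]) | (std::uint32_t(b[off + 1]) << 8)
         | (std::uint32_t(b[off + 2]) << 16) | (std::uint32_t(b[off + 3]) << 24);
}

// Hypothetical helper: return the OS word of a GNU ABI tag note, or -1 if
// NOTE is not a well-formed NT_GNU_ABI_TAG note.
static int abiTagOs(const std::vector<std::uint8_t>& note)
{
    // 12-byte header + 4-byte name "GNU\0" + 16-byte descriptor.
    if (note.size() < 32)
        return -1;
    if (readLe32(note, 0) != 4 || readLe32(note, 4) != 16
        || readLe32(note, 8) != kNtGnuAbiTag)
        return -1;
    if (std::memcmp(note.data() + 12, "GNU", 4) != 0)
        return -1;
    return static_cast<int>(readLe32(note, 16));
}

// Hypothetical helper for experiments: build a note with ABI version 0.0.0.
static std::vector<std::uint8_t> sampleAbiTag(std::uint32_t os)
{
    std::vector<std::uint8_t> n;
    auto put = [&n](std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            n.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    };
    put(4); put(16); put(kNtGnuAbiTag);
    const char name[4] = {'G', 'N', 'U', '\0'};
    for (char c : name)
        n.push_back(static_cast<std::uint8_t>(c));
    put(os); put(0); put(0); put(0);
    return n;
}
```

A note like the one printed above would decode to `kNoteOsGnu`, whereas a GNU/Linux binary carries `kNoteOsLinux` in the same place. Locating the section inside a real ELF file is left out of the sketch.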

> So I think we can’t count on an ‘execve’ error and thus have to treat
> this case (same architecture but different OS kernel) specially, as
> shown below.
>
> Thoughts?

If that really doesn't work...then yeah (yuck ;-)
Greetings,
Janneke

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
Ludovic Courtès wrote on 28 Sep 2020 22:45
(name . Jan Nieuwenhuizen)(address . janneke@gnu.org)(address . 43668@debbugs.gnu.org)
87r1qlk4v6.fsf@gnu.org
Hi!

Jan Nieuwenhuizen <janneke@gnu.org> writes:

>> Ludovic Courtès <ludo@gnu.org> writes:
>>
>>> It’s no wonder that the GNU/Hurd executable fails to run on GNU/Linux.
>>> The reason the daemon tries to run it anyway is because of the hack
>>> introduced in 7bf2a70a4ffd976d50638d3b9f2ec409763157df, in support of
>>> transparent emulation via binfmt_misc.
>>
>> The thing is that x86 GNU/Hurd and GNU/Linux ELF binaries are
>> indistinguishable AFAICS since they both use ELFOSABI_NONE:

[...]

> Looking at the file sources, it uses do_os_note, look:
>
> $ readelf -a $(guix build --target=i586-pc-gnu hello)/bin/hello
>
> Displaying notes found in: .note.ABI-tag
> Owner Data size Description
> GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
> OS: Hurd, ABI: 0.0.0

Oh, well done, I browsed ‘file’ but didn’t find it.

>> So I think we can’t count on an ‘execve’ error and thus have to treat
>> this case (same architecture but different OS kernel) specially, as
>> shown below.
>>
>> Thoughts?
>
> If that really doesn't work...then yeah (yuck ;-)

Yeah, I think we’ll have to do this hack (we’re not going to parse ELF
files and all to determine whether to call ‘execve’.)

(Besides, it would be interesting to understand how the libc/Hurd
startup code ends up segfaulting on GNU/Linux.)

Ludo’.
Jan Nieuwenhuizen wrote on 29 Sep 2020 13:55
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 43668@debbugs.gnu.org)
877dsc3igp.fsf@gnu.org
Ludovic Courtès writes:

Hi!

> Jan Nieuwenhuizen <janneke@gnu.org> writes:
>
>>> Ludovic Courtès <ludo@gnu.org> writes:
>>>
>> Displaying notes found in: .note.ABI-tag
>> Owner Data size Description
>> GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
>> OS: Hurd, ABI: 0.0.0
>
> Oh, well done, I browsed ‘file’ but didn’t find it.

:-)

>>> So I think we can’t count on an ‘execve’ error and thus have to treat
>>> this case (same architecture but different OS kernel) specially, as
>>> shown below.
>>>
>>> Thoughts?
>>
>> If that really doesn't work...then yeah (yuck ;-)
>
> Yeah, I think we’ll have to do this hack (we’re not going to parse ELF
> files and all to determine whether to call ‘execve’.)

Ah, we're C++; I was thinking Guile and "we surely have" an ELF library.
That's all right then. Let's have this workaround.

> (Besides, it would be interesting to understand how the libc/Hurd
> startup code ends up segfaulting on GNU/Linux.)

Hmm...are you saying something like "it could run until it wants to RPC
Mach or Hurd?" Might it "just load" shared libraries...

Greetings,
Janneke

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com| Avatar® http://AvatarAcademy.com
Ludovic Courtès wrote on 1 Oct 2020 12:51
(name . Jan Nieuwenhuizen)(address . janneke@gnu.org)(address . 43668-done@debbugs.gnu.org)
87sgayte1k.fsf@gnu.org
Hi,

Jan Nieuwenhuizen <janneke@gnu.org> writes:

>> Yeah, I think we’ll have to do this hack (we’re not going to parse ELF
>> files and all to determine whether to call ‘execve’.)
>
> Ah, we're C++; I was thinking Guile and "we surely have" an ELF library.
> That's all right then. Let's have this workaround.

Yeah. Even with a library, it doesn’t sound right to parse files
beforehand. I think it’s up to the kernel to make the relevant check,
but perhaps there are good reasons why this isn’t happening here.

Anyway, pushed as 9556ac498fd648147ad7d3b52ec86202d0a8e171!

>> (Besides, it would be interesting to understand how the libc/Hurd
>> startup code ends up segfaulting on GNU/Linux.)
>
> Hmm...are you saying something like "it could run until it wants to RPC
> Mach or Hurd?" Might it "just load" shared libraries...

Yeah I would expect it to run code up to the first Mach syscall but here
it segfaults so maybe it crashes earlier.

Ludo’.
Closed