[PATCH core-updates] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir.

OpenSubmitted by Danny Milosavljevic.
Details
5 participants
  • Andreas Enge
  • Danny Milosavljevic
  • Efraim Flashner
  • Ludovic Courtès
  • Marius Bakke
Owner
unassigned
Severity
normal
D
D
Danny Milosavljevic wrote on 24 Sep 2020 16:12
(address . guix-patches@gnu.org)(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200924141211.21649-1-dannym@scratchpost.org
* gnu/packages/commencement.scm (glibc-final): Catch all cases of a glibc user not
requesting 64-bit offsets and then using readdir.
---
gnu/packages/commencement.scm | 52 ++++++++++++++++++++++++++++++++++-
1 file changed, 51 insertions(+), 1 deletion(-)

Toggle diff (63 lines)
diff --git a/gnu/packages/commencement.scm b/gnu/packages/commencement.scm
index e5a4caa95c..5a62c9df3a 100644
--- a/gnu/packages/commencement.scm
+++ b/gnu/packages/commencement.scm
@@ -3462,7 +3462,57 @@ exec ~a/bin/~a-~a -B~a/lib -Wl,-dynamic-linker -Wl,~a/~a \"$@\"~%"
                ,hurd-headers-boot0)
              '())
        ,@(package-outputs glibc-final-with-bootstrap-bash))
-      ,@(package-arguments glibc-final-with-bootstrap-bash)))))
+      ,@(substitute-keyword-arguments
+         (package-arguments glibc-final-with-bootstrap-bash)
+         ((#:phases phases)
+          `(modify-phases ,phases
+             (add-after 'unpack 'patch-dirent
+               (lambda* (#:key outputs #:allow-other-keys)
+                 ;; QEMU transparent emulation is in somewhat of a pickle sometimes.
+                 ;; There is no support in the kernel syscalls of specifying what
+                 ;; kind of userspace you are emulating.  Some parts of the
+                 ;; structures passed back-and-forth between kernel and guest
+                 ;; userspace can change size (including size of individual fields).
+                 ;;
+                 ;; One of the affected structures is "struct dirent".  The ext4
+                 ;; file system puts a 64 bit hash into "d_off" on the kernel side.
+                 ;; If the guest system's glibc is 32 bit it is going to be very
+                 ;; confused (it does check whether d_off fits into the structure
+                 ;; it gives back to the user--and it doesn't fit.  Hence readdir
+                 ;; fails).
+                 ;; This manifests itself in simple directory reads not working
+                 ;; anymore in parts of cmake, for example.
+                 ;;
+                 ;; There is a very simple and complete way to avoid this problem:
+                 ;; Just always use 64 bit offsets in user space programs (also
+                 ;; on 32 bit machines).
+                 ;;
+                 ;; Note: We might want to avoid using 64 bit when bootstrapping
+                 ;; using mescc (since mescc doesn't directly support 64 bit
+                 ;; values)--but then bootstrapping has to be done on a
+                 ;; file system other than ext4, or on ext4 with the feature
+                 ;; "dir_index" disabled.
+                 ;;
+                 ;; The change below does not affect 64 bit users.
+                 ;;
+                 ;; See <https://issues.guix.gnu.org/43513>.
+                 (let ((port (open-file "include/dirent.h" "a")))
+                   (display "
+#if __SIZEOF_LONG__ < 8
+#ifndef __USE_FILE_OFFSET64
+#undef readdir
+#define readdir @READDIR_WITHOUT_FILE_OFFSET64_IS_A_REALLY_BAD_IDEA@
+#endif
+#endif
+" port)
+                   (close-port port))
+                 ;; This file includes <dirent.h> and thus checks sanity already.
+                 ;; TODO: Check dirent/scandir-tail.c, dirent/scandir64-tail.c.
+                 (substitute* "posix/glob.c"
+                  (("(#[ ]*define[ ][ ]*readdir)") "
+#undef readdir
+#define readdir"))
+                 #t)))))))))
 
 (define/system-dependent gcc-boot0-wrapped
   ;; Make the cross-tools GCC-BOOT0 and BINUTILS-BOOT0 available under the
D
D
Danny Milosavljevic wrote on 24 Sep 2020 16:16
(address . 43591@debbugs.gnu.org)
20200924161607.0c196f98@scratchpost.org
On Thu, 24 Sep 2020 16:12:11 +0200
Danny Milosavljevic <dannym@scratchpost.org> wrote:
Toggle quote (6 lines)
> + ;; Note: We might want to avoid using 64 bit when bootstrapping
> + ;; using mescc (since mescc doesn't directly support 64 bit
> + ;; values)--but then bootstrapping has to be done on a
> + ;; file system other than ext4, or on ext4 with the feature
> + ;; "dir_index" disabled.

Or on a real 32 bit machine, where there is no problem with mismatching struct
sizes between emulated guest glibc and host system kernel, because there is no
emulation going on.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9sqicACgkQ5xo1VCww
uqXO7wf+MGIbf5TXP6EEZ+ay4VWLji+EfEdvvv37LSSvlGBtAIyDY2spjc2dcvcd
yBA8o04jMes+JyCFWTpkZxbk5uwpEQxlSbmxtBtnjLnK7+1txVzweONiP590AMa2
9SApQbqpTr6mJrdLuaX+ATU54XXqAQ/Ah+wHWwoFeSk7catsAmhIEIOcbWB1zOMO
G+Mf+xR1l3GInukjUm3FvtJfHFd5B9nzgLeDKXeboXKdSvL90Hm5LYgIDxQkQ8J+
Iua9O5XPR200M4UMfkJVuD+clLP3bt63qmFUE/Z557woIqeA7l6lY4ee8kf8ylR1
Y8piTqrroEewYjqC1/xqCtXOB8Gliw==
=6+cv
-----END PGP SIGNATURE-----


M
M
Marius Bakke wrote on 24 Sep 2020 20:17
Re: [bug#43591] [PATCH core-updates] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir.
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
87363759at.fsf@gnu.org
Danny Milosavljevic <dannym@scratchpost.org> writes:

Toggle quote (15 lines)
> + ;; QEMU transparent emulation is in somewhat of a pickle sometimes.
> + ;; There is no support in the kernel syscalls of specifying what
> + ;; kind of userspace you are emulating. Some parts of the
> + ;; structures passed back-and-forth between kernel and guest
> + ;; userspace can change size (including size of individual fields).
> + ;;
> + ;; One of the affected structures is "struct dirent". The ext4
> + ;; file system puts a 64 bit hash into "d_off" on the kernel side.
> + ;; If the guest system's glibc is 32 bit it is going to be very
> + ;; confused (it does check whether d_off fits into the structure
> + ;; it gives back to the user--and it doesn't fit. Hence readdir
> + ;; fails).
> + ;; This manifests itself in simple directory reads not working
> + ;; anymore in parts of cmake, for example.

Note that for CMake in particular, this problem will be fixed in 3.19:


As mentioned in that issue, and which this patch states on no uncertain
terms, a workaround is to use -D_FILE_OFFSET_BITS=64 on 32-bit platforms.

Toggle quote (21 lines)
> + ;;
> + ;; There is a very simple and complete way to avoid this problem:
> + ;; Just always use 64 bit offsets in user space programs (also
> + ;; on 32 bit machines).
> + ;;
> + ;; Note: We might want to avoid using 64 bit when bootstrapping
> + ;; using mescc (since mescc doesn't directly support 64 bit
> + ;; values)--but then bootstrapping has to be done on a
> + ;; file system other than ext4, or on ext4 with the feature
> + ;; "dir_index" disabled.
> + ;;
> + ;; The change below does not affect 64 bit users.
> + ;;
> + ;; See <https://issues.guix.gnu.org/43513>.
> + (let ((port (open-file "include/dirent.h" "a")))
> + (display "
> +#if __SIZEOF_LONG__ < 8
> +#ifndef __USE_FILE_OFFSET64
> +#undef readdir
> +#define readdir @READDIR_WITHOUT_FILE_OFFSET64_IS_A_REALLY_BAD_IDEA@

Won't this break _everything_ that uses readdir() without 64-bit
offsets? Or does that @@ string get substituted by the glibc build
system somehow.

Toggle quote (11 lines)
> +#endif
> +#endif
> +" port)
> + (close-port port))
> + ;; This file includes <dirent.h> and thus checks sanity already.
> + ;; TODO: Check dirent/scandir-tail.c, dirent/scandir64-tail.c.
> + (substitute* "posix/glob.c"
> + (("(#[ ]*define[ ][ ]*readdir)") "
> +#undef readdir
> +#define readdir"))

Can you file a bug report upstream about the duplicate definition(s)?

Enforcing this restriction in glibc feels rather sledgehammer-y. Would
it make sense to introduce a GCC warning instead? I'm sure there are
legitimate uses of smaller file offsets (i.e. embedded). A GCC warning
will still break -Werror, but that's a lot more manageable than breaking
almost every use of readdir() on 32-bit platforms.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEu7At3yzq9qgNHeZDoqBt8qM6VPoFAl9s4qoACgkQoqBt8qM6
VPov8QgA0GUVhmZGcGVhQFYtqMUxfLovkbh6OJXKx8eoByy+lNBK1x3cxsukQoob
wRA012R82QMpfOpDTSuKh3WSJDnj2c8x1ZRMKNNjQ3uVd1wvahgeuySdU9zSJTm8
WkM9TVVnhEj80OL94Zqzx7R5guXW/GjMZCmMuS3WiflEwsRf25mtifTIP0QcXoKf
/6NkM2oHQSBmMMimRFmRBcRhW8r+sVK9Mkrbz4OWxiiwRKe7LZoFLRszRkRGtCux
6hWkhxQeqizCuUoXdRDHTXT7kp6xSOEYBK1trMNGLNf+92UUjrhntgyNE573cmEy
nKWxW7qJJkx8AOjqubL5NAqf60TW7w==
=Joa2
-----END PGP SIGNATURE-----

D
D
Danny Milosavljevic wrote on 24 Sep 2020 22:27
(name . Marius Bakke)(address . marius@gnu.org)(address . 43591@debbugs.gnu.org)
20200924222711.2f22281a@scratchpost.org
Hi Marius,

On Thu, 24 Sep 2020 20:17:14 +0200
Marius Bakke <marius@gnu.org> wrote:

Toggle quote (3 lines)
> As mentioned in that issue, and which this patch states on no uncertain
> terms, a workaround is to use -D_FILE_OFFSET_BITS=64 on 32-bit platforms.

Yeah. The problem is how to find all those places where we need to add it,
and how to keep finding them as we maintain stuff (read: as upstream changes
stuff).

Toggle quote (3 lines)
> Won't this break _everything_ that uses readdir() without 64-bit
> offsets?

That's the goal of that patch. Because it won't work at runtime anyway, when
run on a guix x86_64 build machine, building for ARM (which is the usual way
we build stuff for ARM).

Maybe we can find out whether dirent.h is #included for compiling for one of
the main Guix platforms (I prefer this latter solution, if possible).

It's either that or we stop qemu transparent emulation on the build farm,
because the result isn't reliable. This time it was "caught" because of
an overabundance of caution on behalf of the glibc people. We won't be
so lucky next time.

Toggle quote (3 lines)
> Or does that @@ string get substituted by the glibc build
> system somehow.

No. It's specifically made so downstream users of glibc on guix can't use
readdir() on 32-bit systems without enabling large file support.

It totally could use some more #if about whether it's building stuff for
an actual guix main platform (i.e. not on embedded).

Toggle quote (2 lines)
> Can you file a bug report upstream about the duplicate definition(s)?

It's not really a problem for them. But I can try anyway.

Toggle quote (6 lines)
> Enforcing this restriction in glibc feels rather sledgehammer-y. Would
> it make sense to introduce a GCC warning instead? I'm sure there are
> legitimate uses of smaller file offsets (i.e. embedded). A GCC warning
> will still break -Werror, but that's a lot more manageable than breaking
> almost every use of readdir() on 32-bit platforms.

I thought about a gcc #warning. But I don't want it to warn when readdir
isn't used in the first place but dirent.h ended up being included by some
dependency. I guess we could use a deprecation warning on readdir() instead.
Can this warning print a custom message?

What do you think?

In any case, I'd like to have some overview of how bad the problem is and
thus I'd like to have a wip branch with this patch being built entirely
for 32 bits via x86_64 (not sure whether that includes i686 btw).
Is it possible to force using these x86_64 machines in the build farm?

Toggle quote (2 lines)
>I'm sure there are legitimate uses of smaller file offsets (i.e. embedded).

On guix? With our glibc? Do we support that?

Toggle quote (3 lines)
>A GCC warning will still break -Werror, but that's a lot more manageable
>than breaking almost every use of readdir() on 32-bit platforms.

These uses of readdir() are not reliable the way we are building guix on
cuirass.

But I'm all for refining this patch, if there are suggestions on how.

Actually, I think a warning is not strong enough. This bug made it all the
way down to json-c (!) before finally being detected. That should not happen.

And if the build farm would have selected an actual ARM machine, the build
would have succeeded and no one on x86_64 could have reproduced the result.
That would have been real bad.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9tAR8ACgkQ5xo1VCww
uqUQqAgAhPm5IvAyqQ+RtKLzcd/JTSlzR8/lbc6RVhuea0/IiYZJHjFKcY9YTbWs
3c9Rvk0zI8XvzXs8ViLqVRwMDf+fFW/KBFKWgkyfpF784T12m/4epJE/3s4v1ihy
UC1WGxS5xTs2PkXcL9aG+FpwV+5zMROV7ge4he0wGJUnB1BtkvYh3PWzVoqeM1Wt
+M8I0anOi7NLc3EHWJDCv+RjcRrI1f2HBXlWprc6rxBGnVpRJrtHZSQHrUrTOQF1
IB8YK12qB6FbDy/DZDaMwLiY4x+sonKQ1+CTJkkG6irJ+Xh/MtHgNj9rWVOXzsCE
3RG3MNjdAx3o/08r262LrNxRFITu1w==
=9PzS
-----END PGP SIGNATURE-----


M
M
Marius Bakke wrote on 25 Sep 2020 01:11
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)(address . 43591@debbugs.gnu.org)
87tuvm4vop.fsf@gnu.org
Danny Milosavljevic <dannym@scratchpost.org> writes:

Toggle quote (19 lines)
> Hi Marius,
>
> On Thu, 24 Sep 2020 20:17:14 +0200
> Marius Bakke <marius@gnu.org> wrote:
>
>> As mentioned in that issue, and which this patch states on no uncertain
>> terms, a workaround is to use -D_FILE_OFFSET_BITS=64 on 32-bit platforms.
>
> Yeah. The problem is how to find all those places where we need to add it,
> and how to keep finding them as we maintain stuff (read: as upstream changes
> stuff).
>
>> Won't this break _everything_ that uses readdir() without 64-bit
>> offsets?
>
> That's the goal of that patch. Because it won't work at runtime anyway, when
> run on a guix x86_64 build machine, building for ARM (which is the usual way
> we build stuff for ARM).

Why is this not an issue with i686 on x86_64 kernels? Or armhf on
AArch64? This problem only manifests when using QEMUs syscall
emulation, right?

Toggle quote (8 lines)
> Maybe we can find out whether dirent.h is #included for compiling for one of
> the main Guix platforms (I prefer this latter solution, if possible).
>
> It's either that or we stop qemu transparent emulation on the build farm,
> because the result isn't reliable. This time it was "caught" because of
> an overabundance of caution on behalf of the glibc people. We won't be
> so lucky next time.

Arguably running code for foreign architectures through QEMU binfmt is
something of a hack. Mandating that every package *must* be patched to
support it seems user-hostile. I'm more in favor of dropping it on the
build farm, or just keep fixing things on a per-package basis.

A less user-hostile solution could perhaps be to (setenv "CFLAGS"
"-D_FILE_OFFSET_BITS=64") on 32-bit architectures in gnu-build-system.
Not sure whether that could cause any adverse effects. But again, I
don't like the idea of optimizing for QEMUs user-mode emulation.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEu7At3yzq9qgNHeZDoqBt8qM6VPoFAl9tJ5YACgkQoqBt8qM6
VPrAYQgAnI3/88H0lcjkCiUbehr3AmthzPh1jGgIFAxv0XL1+YoCqtsHWCScxMzF
IsNCSMo2cYGplTDbQNyb+vNy1nQrIZecGm2mE8G5kPKir/ofSb05pQckCy7xDONt
/KMOAByMtQEueIbemyr+PdXi5jpTR+pG/Ef2ZjvBDLO7Fl9ZlYgiQx9V6vIjaikI
OWYRzanDMZkLB97DoM6lAAdtSMeSxbDj2n64rbxTdlVBnnrdagereSwWVpsNCeF4
k/z1isflAZxAiPWatsHoOmNPIxqGuPuytrNleHha4xVBEZn7ombTHeRI3qyYScWD
K0owVdK6pO+PUxiG2hzLq5dPmRI2LQ==
=aEhD
-----END PGP SIGNATURE-----

D
D
Danny Milosavljevic wrote on 25 Sep 2020 12:20
(name . Marius Bakke)(address . marius@gnu.org)(address . 43591@debbugs.gnu.org)
20200925122004.38275411@scratchpost.org
Hi Marius,

On Fri, 25 Sep 2020 01:11:18 +0200
Marius Bakke <marius@gnu.org> wrote:

Toggle quote (3 lines)
> Arguably running code for foreign architectures through QEMU binfmt is
> something of a hack.

The problem here is not qemu. It's not even the size of the whole struct
dirent that's the problem, or endianness, or alignment, or any of that stuff.

The problem is that if you have a 64-bit hash value then you can't fit it into
a 32-bit field.

You have to make the field bigger.

It's the only non-insane way to fix this and still be able to use the host
filesystem from the guest.

This bug is making readdir randomly and silently succeed or fail.

Note: The bug I'm referring to is caused by having 32 bit offsets in user space
in the first place--nothing else. That is the root of the problem, not qemu,
not ext4, not the kernel. Having 32 bit offsets in user space when kernel space
uses 64 bit offsets is just weird (note: the kernel always uses 64 bit
offsets--also on 32 bit architectures!).

Note: This also affects 9p, which has been recently "fixed" in the Linux kernel.

Toggle quote (2 lines)
>Why is this not an issue with i686 on x86_64 kernels?

I'm not sure. I'll check.

Toggle quote (2 lines)
>Or armhf on AArch64?

I cannot check that, but it totally should be a problem there.

It has nothing to do with qemu specifically, and IMO qemu is right in not
"fixing" it (because it's NOT a problem in qemu), and in expecting
32 bit guests not to use the getdents64 system call :P

But even using getdents (no 64) as a flag what size d_off the caller
expects is an ugly workaround.
Not because of the flag, but because of what you have to do if it's set.

The right fix is to switch to LFS (and thus also 64 bit d_off) on 32 bit guests.
It's 2020; normal systems have drives a lot bigger than 4 GiB. Even Raspberrys
have bigger drives.

Also, just for the record, glibc always calls getdents64, never getdents.
Using getdents64 and then being surprised when a 64 bit result comes back is
seriously weird. And they are still defending it!

If they didn't do that, at least the kernel would know something is up,
and qemu would still pass through getdents as getdents to the host kernel and
getdents64 as getdents64 to the host kernel. Also, qemu could warn on calls
to getdents (and not on calls to getdents64).

There are no downsides to knowing what the user expects.

But even then, user space probably would have LFS disabled. That means
as soon as user space mounts something from NFS or 9p or whatever on their
(let's say embedded) platform, they would have this problem again,
and it would be very difficult to find what happened. Why keep this problem
alive?

Toggle quote (2 lines)
>This problem only manifests when using QEMUs syscall emulation, right?

No.

It's a fundamental problem with the slot you want to store something in
being too small for it.

We just didn't see it yet, because usually the system *silently returns* halfway
through the readdir loop and does not tell us about any problems once it
encounters any d_off >= 2**32 along the way.

In the "json-c" bug, luckily json-c or cmake actually NEEDED a file that had
not been returned because of that. That is not a given.

Toggle quote (2 lines)
>Mandating that every package *must* be patched to support it seems user-hostile.

Ok, then how about a flag that users can set in order to let it compile
regardless? But then they shouldn't use that flag in any (non-cross-compiled)
guix packages.
We can mention it in the news so people are not surprised.
(Thinking about it, the flag could just be "-D_FILE_OFFSET_BITS=32". Because
when you are setting it explicitly, you probably know what you are doing)

It really depends on whether we officially support running 32 bit guests
on 64 bit hosts. Qemu has nothing to do with it.

Since the majority of our build farm runs qemu transparent emulation, and
since a majority of our ARM packages are built using it, it seems that we
decided to support 32 bit guests on 64 bit hosts. Well, then something like
this patch has to be done, otherwise we have Schrödinger's build system--and
no one likes that.

(I want to stress that the main problem is the case where readdir SUCCEEDS
even though there is a problem;
if readdir would always fail in the situation with a 32 bit guest on
64 bit host, that would be much less bad. But it doesn't always fail!)

I'd be really uneasy if the sanity check this patch introduces were only
enabled when explicitly setting a flag--for something as bad as this.

Especially for an error that silently makes it into base derivations and then
distributes all the way to who-knows-what normal client derivations.

Keeping the user from shooting himself in the foot by default if he doesn't
say "yes, please shoot me in the foot" beforehand is not "user-hostile".

We could also do a deprecation warning instead:

struct dirent* readdir(DIR* d) __attribute__((deprecated("Unsafe if not using large file support")));

f1.c:5:2: warning: 'readdir' is deprecated: Unsafe if not using large file support [-Wdeprecated-declarations]

But then the warning would not be emitted in the, for example, json-c package,
but in cmake. I would never have found the bug that way. I guess now we
know, but still... only emitting a warning seems weak for something this bad.

So I'm against it.

Toggle quote (5 lines)
> I'm more in favor of dropping it on the
> build farm,

>or just keep fixing things on a per-package basis.

That means you have to *find* the things first. The only reason glibc found
the problem this one time is because the d_off telldir pointer just happened
to be >= 2**32. If it's not bigger one time, and is bigger another time in
the same package build run, then you get weird unreproducible results.

That is horrible. A copy-recursively could leave half the files off and not
tell anyone. PLEASE let's do something about it. Also, let's delete all
our ARM substitutes we have so far.

Toggle quote (3 lines)
> A less user-hostile solution could perhaps be to (setenv "CFLAGS"
> "-D_FILE_OFFSET_BITS=64") on 32-bit architectures in gnu-build-system.

Sure, we can definitely do that in addition. It's a good idea. There can
be no ABI compatibility problem inside Guix if *all* Guix packages use that
flag.

While we are at it, we can do it on ALL architectures. Because that's what
we actually want. Should there be a 128 bit CPU, we still want all CPUs
(no matter how many bit registers the CPU has) use the same file offset
bit size.

But will that be picked up everywhere? cmake? Rust C extensions? Python
extensions?

Toggle quote (2 lines)
> Not sure whether that could cause any adverse effects.

It will. And if we want to find out where those effects would be, this
patch--or one with warnings (that one is only useful if there is a way to
automatically collect AND KEEP all warnings into one place)--is how to find
out.

Toggle quote (2 lines)
> But again, I don't like the idea of optimizing for QEMUs user-mode emulation.

It's not optimizing for qemu specifically. Qemu is not at fault here.
(guest) glibc is at fault. Because of the way we build it.

I've thought through a few possible fixes for this problem:

(1) Wait until the kernel adds a "I-have-space-for-x-bits" field to the
getdents64 system call and for qemu to actually use it. But how would qemu
know how much space the guest glibc has in its field?
But fine, let's say they have a crystal ball. Even then, what is ext4 going
to do with this request? Just cut the cookie somewhere and throw away half?
That's not safe. So this solution is right out.

(2) Have glibc use getdents on 32 bit and getdents64 on 64 bit. See above--only
now qemu can tell know much space the guest glibc has, and so can the host kernel
(which is much better).
Cutting the cookie is not safe. So this solution is still out, barely.

(3) Have qemu maintain a mapping table per file of which 64-bit d_off is
associated with which weird cookie qemu invents and remembers.

That doesn't help us in the case where you DO NOT USE qemu but still have
a 32 bit executable running on a 64 bit host (natively).

Therefore, this solution is out.

(4) Build user space with large file support. There are no downsides to this,
and as a distribution we can indeed decide only to support large file guix
packages. I argue that this is the right solution for all "Guix system" hosts,
because of the "SD card" argument above. Also, directories with Guix on it
aren't exactly small either. So you need it anyway. If there are embedded
platforms with disks smaller than 4 GiB used in guix, we can given those
glibcs an original dirent.h (they will fail loudly--so we will find them easily).

I prefer this solution.

(5) Build the qemu that emulates 32 bit ARM for 32 bit i686 (this is not a typo).
In that case the kernel would enable compatibility mode (X86_32 personality) and
it would work. I want to stress that this compatibility mode calculates
different hashes depending on the program that is requesting it, on the
same host, directory and entry. This is not something one would want,
especially when building things--the kernel is basically lying to you.
See kernel option CONFIG_X86_32--which we enable.

It would work okay in practice, because user space programs usually don't store
d_off. But (4) is just better.

P.S. by now, my local build found a binutils 2.35.1 bfd plugin, and
(libstdc++-v3/src/filesystem/dir.cc in) libstdc++ 7.5.0, that also don't use
large files and thus would also only work sporadically in random ways.

How many hundreds of packages does this affect that we don't know yet?

PPS. I get a lot of "grep: invalid option -o" messages when building
glibc-2.32--even without the patch. That's probably not good either.

PPPS. new patch will come soon.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9txFQACgkQ5xo1VCww
uqWCSwf/WwR2WWHPz4kO0hVdcEjmjMTwarKs1VF5DNs+pA6WwVN5nB2Nyp5El4mV
wwo4ik3BAzOicLBHwNYk6H+/RINRqFrtDqHUkC6dRo3zXzhmtgXSs2PajYie2ljr
eviZNBmZcwGdEitvv6E/9+K4fbDIIG6rhygWiY/x9YD+3/mdw8eTJ5Km1tnjCSZ4
uPT7CiWB43LuValm7R9k034N/dgix8bT8QTgLPQ25X7bhII1sX6Xs33d20eLCDen
cD1pKmcCvbVgh5ZGsG8BI7Lcp86AfUvfOVlZZlwNIFvSVS9DUMJ23sCiL48GGFV8
FYUb2ZvPyFdP60JzPykH/PlpTXg8nA==
=eqNk
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 25 Sep 2020 12:24
(name . Marius Bakke)(address . marius@gnu.org)(address . 43591@debbugs.gnu.org)
20200925122450.61df97da@scratchpost.org
On Thu, 24 Sep 2020 20:17:14 +0200
Marius Bakke <marius@gnu.org> wrote:

Toggle quote (3 lines)
> As mentioned in that issue, and which this patch states on no uncertain
> terms, a workaround is to use -D_FILE_OFFSET_BITS=64 on 32-bit platforms.

That is not a workaround. That is the correct fix.

The kernel uses 64 bit offsets on 32-bit platforms anyway.

So either they don't ask the kernel questions they don't want to know the
answer to, or they use 64 bit offsets, too.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9txXIACgkQ5xo1VCww
uqWFxgf/SFs4Qr2DuOGjlo/8lwAeGFPdnAWlun5d8yEeBcQCOnxPIi/XjeJNryM2
+T/81IBsByMIZlQWPPssYsFgqAqpPh4jG0ymYc77f3mJ4a4vqHzvbM7CWYRpxFO8
WBv42TsOl1szwOrwknF/Qqx7ERv2hNidkzCmqrAaOtnu5aLCrmdWX/UmeZkEZYJG
gTQUKOdhgUQTfJgzW9RbdeN/TmJRsfsQZGzq+65BFDg9HERU/LhtR5E704HmRiZx
lmmHR1ArysoEvhioX6aqaQxkFFgCeTn0IXtmSpaHZSsFHrsl7LAWx4V8DcOaAF8E
upmivdcdHAKPP6ULiRKpsYzceAwHPw==
=YvvO
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 25 Sep 2020 12:42
[PATCH v2 core-updates] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir regardless.
(address . 43591@debbugs.gnu.org)(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200925104243.2298-1-dannym@scratchpost.org
* gnu/packages/commencement.scm (glibc-final): Catch all cases of a glibc user not
requesting 64-bit offsets and then using readdir.
---
gnu/packages/commencement.scm | 68 ++++++++++++++++++++++++++++++++++-
1 file changed, 67 insertions(+), 1 deletion(-)

Toggle diff (79 lines)
diff --git a/gnu/packages/commencement.scm b/gnu/packages/commencement.scm
index e5a4caa95c..284fa65d20 100644
--- a/gnu/packages/commencement.scm
+++ b/gnu/packages/commencement.scm
@@ -3462,7 +3462,73 @@ exec ~a/bin/~a-~a -B~a/lib -Wl,-dynamic-linker -Wl,~a/~a \"$@\"~%"
                ,hurd-headers-boot0)
              '())
        ,@(package-outputs glibc-final-with-bootstrap-bash))
-      ,@(package-arguments glibc-final-with-bootstrap-bash)))))
+      ,@(substitute-keyword-arguments
+         (package-arguments glibc-final-with-bootstrap-bash)
+         ((#:phases phases)
+          `(modify-phases ,phases
+             (add-after 'unpack 'patch-dirent
+               (lambda* (#:key outputs #:allow-other-keys)
+                 ;; Linux kernel file offsets are always 64 bits.
+                 ;; But userspace can be built to use 32 bit offsets.
+                 ;;
+                 ;; "struct dirent", returned by readdir, uses d_off to store
+                 ;; such an "offset" that it got from the Linux kernel.
+                 ;; In the case of ext4 that "offset" is actually a 64 bit
+                 ;; cookie which includes a hash value.
+                 ;;
+                 ;; Therefore, there are cases where such an offset that it got
+                 ;; from the Linux kernel does not fit in the "struct dirent"
+                 ;; field "d_off".
+                 ;;
+                 ;; If the guest system's glibc is 32 bit AND uses 32 bit
+                 ;; file offsets it is going to be very confused.
+                 ;; It does check whether d_off fits into the structure
+                 ;; it gives back to the user--and it doesn't fit.  Hence readdir
+                 ;; fails, with errno == EOVERFLOW (which is undocumented and thus
+                 ;; an API error).
+                 ;; This manifests itself in simple directory reads not working
+                 ;; anymore in parts of cmake, for example.
+                 ;;
+                 ;; This manifested in Guix when building stuff for
+                 ;; ARMHF on a x86_64 build host using QEMU transparent emulation.
+                 ;;
+                 ;; There is a very simple and complete way to avoid this problem:
+                 ;; Just always use 64 bit offsets in user space programs (also
+                 ;; on 32 bit machines).  The Linux kernel does that already
+                 ;; anyway.
+                 ;;
+                 ;; Note: We might want to avoid using 64 bit when bootstrapping
+                 ;; using mescc (since mescc doesn't directly support 64 bit
+                 ;; values)--but then bootstrapping has to be done on a
+                 ;; file system other than ext4, or on ext4 with the feature
+                 ;; "dir_index" disabled.
+                 ;;
+                 ;; The change below does not affect 64 bit users.
+                 ;;
+                 ;; See <https://issues.guix.gnu.org/43513>.
+                 (let ((port (open-file "dirent/dirent.h" "a")))
+                   (display "
+#ifndef _LIBC
+#if __SIZEOF_LONG__ < 8
+#ifndef __USE_FILE_OFFSET64
+#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 32
+#warning \"Using -D_FILE_OFFSET_BITS=32 and using readdir is a bad idea, see <https://bugzilla.kernel.org/show_bug.cgi?id=205957>\"
+#else
+#undef readdir
+#define readdir @READDIR_WITHOUT_FILE_OFFSET64_IS_A_REALLY_BAD_IDEA@
+#endif
+#endif
+#endif
+#endif
+" port)
+                   (close-port port))
+                 ;; This file includes <dirent.h> and thus checks sanity already.
+                 ;; TODO: Check dirent/scandir-tail.c, dirent/scandir64-tail.c.
+                 (substitute* "posix/glob.c"
+                  (("(#[ ]*define[ ][ ]*readdir)") "
+#undef readdir
+#define readdir"))
+                 #t)))))))))
 
 (define/system-dependent gcc-boot0-wrapped
   ;; Make the cross-tools GCC-BOOT0 and BINUTILS-BOOT0 available under the
D
D
Danny Milosavljevic wrote on 25 Sep 2020 15:36
Re: [bug#43591] [PATCH core-updates] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir.
(name . Marius Bakke)(address . marius@gnu.org)(address . 43591@debbugs.gnu.org)
20200925153646.6ef95908@scratchpost.org
Toggle quote (4 lines)
> >Why is this not an issue with i686 on x86_64 kernels?
>
> I'm not sure. I'll check.

$ cat a00.c
#include <stdio.h>
#if defined( __ILP32__ )
#warning ILP32
#endif
int main() {
return sizeof(off_t);
}
$ LC_ALL=C guix environment -s i686-linux gcc-toolchain -- gcc -o a00 a00.c
a00.c:3:2: warning: #warning ILP32 [-Wcpp]
3 | #warning ILP32
| ^~~~~~~
$ ./a00
$ echo $?
4

That means they are using the Linux kernel's X86_32 ABI.
It has its own getdents64 system call that returns another value for d_off.

$ LC_ALL=C guix environment -s i686-linux gcc-toolchain -- gcc -o a00 -D_FILE_OFFSET_BITS=64 a00.c
a00.c:3:2: warning: #warning ILP32 [-Wcpp]
3 | #warning ILP32
| ^~~~~~~
$ ./a00
$ echo $?
8

That is why __i686__ is not affected--at the cost of the kernel lying to us
about the file system.

Note that I also tried printing the actual d_off values[1]--on ILP32, even with
_FILE_OFFSET_BITS=64, the VALUES are still 32 bits, and the same values as
without _FILE_OFFSET_BITS. The d_off SLOT gets bigger on _FILE_OFFSET_BITS=64.

(I also tried printing the actual d_off values[1] on x86_64 without any guix
environment -s, I get entirely different d_off values!!)

I also tried the former on native ARMHF--you get 32 bits d_off values. And d_off
is always the same size as off_t.

off_t size changes depending on _FILE_OFFSET_BITS.

I do not have access to a real aarch64 machine--so no idea how it is there.
That would be the most interesting case, because those don't have X86_32,
so ILP32 is either not present or implemented differently.

ppc64 would also be interesting...

Test result of [1]:

system _FILE_OFFSET_BITS off_t d_off-sizeof d_off-values
-------------------------------------------------------------
x86_64 - 8 Byte 8 Byte 8 Byte
i686 - 4 Byte 4 Byte 4 Byte
i686 64 8 Byte 8 Byte 4 Byte
i686 32 4 Byte 4 Byte 4 Byte
i686 7 4 Byte 4 Byte 4 Byte
armhf - 4 Byte 4 Byte 4 Byte
armhf 64 8 Byte 8 Byte 4 Byte
armhf 32 4 Byte 4 Byte 4 Byte
armhf 7 4 Byte 4 Byte 4 Byte

This is all without qemu--in order to simplify the test case.

So I take it ext4 has some special compilation mode for 32 bits...

Could someone please test [1] on (real!) aarch64 and ppc64 and ppc32
machines?

[1] $ cat a00.c
#include <stdio.h>
#include <errno.h>
#include <assert.h>
#include <dirent.h>
#if defined( __ILP32__ )
#warning ILP32
#endif

int main() {
DIR* d;
struct dirent* ent;
d = opendir("/tmp");
errno = 0;
assert(sizeof(ent->d_off) == sizeof(off_t));
while ((ent = readdir(d)) != NULL) {
printf("%llu\n", (unsigned long long) ent->d_off);
}
if (errno)
perror("readdir");
return sizeof(off_t);
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9t8m4ACgkQ5xo1VCww
uqWmVQf9Gy9Y5HXJyFlOOnywiaIBQuhAmEry1kSYKv0CEwwlZDsMmNyYbKZeMtMS
AY5mw0zxZ215+47xTDqys5GOKKn6Iq6DgVw96Ts+sig945Crw52Aey/XIzrGrEaM
9sJ7xIIcLZuBUlOye9QtctB7tgDhCw9afjKfkB4nwnvPP3H6c+sOIcZiEH4OY3VQ
3xN40ErTBY+PovHojfvdiKy/7spTibocYxC5xWOaAXBIkaP6Xj1m7ySk96A4WF/8
WycJHuSsrVn/QEbens03akkCeME0ZDVZkQd0O12FZ3zL01aNB7EBZa3vjGxkcWSM
H4oBddMV8VfQ2CzJeK4nsNz8FmFU2Q==
=14ua
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 25 Sep 2020 17:33
(name . Marius Bakke)(address . marius@gnu.org)(address . 43591@debbugs.gnu.org)
20200925173320.593e9179@scratchpost.org
Hi,

I wrote a FUSE filesystem to test what happens with big d_off (I just
hard-or-ed a bitmask) and ran it on a real ARMHF machine, then made the program
from before([1] from before) look into that directory.

Result (on ARMHF, so real 32 bit machine!):

$ gcc --version
gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
$ gcc a00.c
$ ./a00
1737031971 .
1737032035 ..
1737032035 hello
$ gcc -D_FILE_OFFSET_BITS=64 a00.c
$ ./a.out
320255973458211 .
320255973458275 ..
320255973458275 hello

(Note: Guix gcc-toolchain 10 on ARMHF is still building from source--and
will continue to do so for some hours I guess)

I only had to patch fuse 2.9.4 (lib/fuse_lowlevel.c) to do this:

char *fuse_add_dirent(char *buf, const char *name, const struct stat *stbuf,
off_t off)
{
unsigned namelen = strlen(name);
unsigned entlen = FUSE_NAME_OFFSET + namelen;
unsigned entsize = fuse_dirent_size(namelen);
unsigned padlen = entsize - entlen;
struct fuse_dirent *dirent = (struct fuse_dirent *) buf;

dirent->ino = stbuf->st_ino;
dirent->off = off | 0x1234567890123; // !!!!
dirent->namelen = namelen;
dirent->type = (stbuf->st_mode & 0170000) >> 12;
strncpy(dirent->name, name, namelen);
if (padlen)
memset(buf + entlen, 0, padlen);

return buf + entsize;
}

(I DID NOT have to patch the kernel or even have root)

So it can happen that you get 64 bit d_off even on real 32 bit machines!

That's what I thought--but I still wanted to make sure.

And the same on Guix i686 (a00 is [1] from my previous e-mail):

$ ./a00-i686
readdir: Value too large for defined data type
$ ./a00-i686_flag_32
readdir: Value too large for defined data type
$ ./a00-i686_flag_64
320255973458211
320255973458275
320255973458275

So there you have it, even on i686--without emulating anything--you can get a
64 bit d_off value.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9uDcAACgkQ5xo1VCww
uqXHrAf7Bkl5DifcLI6I7rlRy3s0K7YaZv3P+59pkKv5rf/AhRFdOTfn6QHQ/meq
ml+xqF29R85vPLFSmjSsx2psqWPA1H5ujj0UrBOJ+5PqjIWbjzz77TnCgBNrnglP
kMsvxBUUe477SNa+uQ7TxzycTE1xV2jJ5Pyh5L+kx3JjSSYChVtrn0JvwWfhQSaD
x9A9zTzD3GGMTI1PE8uvhKC80PZi4gc8Ov1Qu6gcwBbsTTe82JZsSpd5xezdm4AO
6nvmuHVOaRZXOGNdvAI+Y7l6+EDisMqDNtzpFIEfPeAds0dcNkC8h2DaH547mN+T
ULenCL7NaTlOzXd4ST82iiVy6UQpMA==
=DzDT
-----END PGP SIGNATURE-----


A
A
Andreas Enge wrote on 25 Sep 2020 22:03
(name . Marius Bakke)(address . marius@gnu.org)
20200925200323.GB5828@jurong
On Fri, Sep 25, 2020 at 01:11:18AM +0200, Marius Bakke wrote:
Toggle quote (5 lines)
> Arguably running code for foreign architectures through QEMU binfmt is
> something of a hack. Mandating that every package *must* be patched to
> support it seems user-hostile. I'm more in favor of dropping it on the
> build farm

Indeed it is weird we do not only compile packages natively on the build
farm. If we do not have enough aarch64 machines, would it make sense to buy
some more? How many would we need? Which are our options of machines that
can run the GNU system (as opposed to Guix on a foreign distro, and possibly
non-free firmware)?

Andreas
D
D
Danny Milosavljevic wrote on 26 Sep 2020 03:42
(name . Marius Bakke)(address . marius@gnu.org)(address . 43591@debbugs.gnu.org)
20200926034244.1aa3c4f6@scratchpost.org
Hi,

by now I also tested it on overdrive1, an aarch64 machine.

Toggle quote (4 lines)
> I do not have access to a real aarch64 machine--so no idea how it is there.
> That would be the most interesting case, because those don't have X86_32,
> so ILP32 is either not present or implemented differently.

I added it in the table below under "a64armhf".

ILP32 is not present on aarch64.

Test result of [1] updated:

system _FILE_OFFSET_BITS off_t d_off-sizeof d_off-values
---------------------------------------------------------------
x86_64 - 8 Byte 8 Byte 8 Byte
i686 - 4 Byte 4 Byte 4 Byte
i686 64 8 Byte 8 Byte 4 Byte
i686 32 4 Byte 4 Byte 4 Byte
i686 7 4 Byte 4 Byte 4 Byte
armhf - 4 Byte 4 Byte 4 Byte
armhf 64 8 Byte 8 Byte 4 Byte
armhf 32 4 Byte 4 Byte 4 Byte
armhf 7 4 Byte 4 Byte 4 Byte
a64armhf - 4 Byte 4 Byte FAIL
a64armhf 64 8 Byte 8 Byte 8 Byte
a64armhf 32 4 Byte 4 Byte FAIL
a64armhf 7 4 Byte 4 Byte FAIL

For a64armhf, the version with -D_FILE_OFFSET_BITS=64 is the only one where
readdir succeeds on my specially-prepared directory.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9unJQACgkQ5xo1VCww
uqXawgf9H6G51A5PzxdtOgSuUilzwEULll2McSYirbJ5kKgXMNZjWESrpEfsCQsX
p68j23tSxTaInQi1uEBd+ye9Qrt5oQl0xCN14y3KIygb+m8SgC04yMS7Q2VgbhF9
7CSYqjXRb2zkB4OTMVPtFl8p04EcHZvBuFx0p2bCNXfFupPJ2ELRqnNIAht4fInW
JZnD7SYfrh39rBjLFeHIswyeH25LeFysvkGlpPbbL0ZX6Ku1RU/h+B6AXMWIriv4
04/10Qya1dlGVNY1qxShgWrvVTEJuAlZ+L6xg0DjkGbbedjU0cywwxIJybmgoWFY
K7nIC7zvm1DvHjEIzLm/0spCmCCf2A==
=BlIh
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 26 Sep 2020 03:49
(name . Marius Bakke)(address . marius@gnu.org)(address . 43591@debbugs.gnu.org)
20200926034912.6ad44194@scratchpost.org
On Sat, 26 Sep 2020 03:42:44 +0200
Danny Milosavljevic <dannym@scratchpost.org> wrote:

Toggle quote (30 lines)
> Hi,
>
> by now I also tested it on overdrive1, an aarch64 machine.
>
> > I do not have access to a real aarch64 machine--so no idea how it is there.
> > That would be the most interesting case, because those don't have X86_32,
> > so ILP32 is either not present or implemented differently.
>
> I added it in the table below under "a64armhf".
>
> ILP32 is not present on aarch64.
>
> Test result of [1] updated:
>
> system _FILE_OFFSET_BITS off_t d_off-sizeof d_off-values
> ---------------------------------------------------------------
> x86_64 - 8 Byte 8 Byte 8 Byte
> i686 - 4 Byte 4 Byte 4 Byte
> i686 64 8 Byte 8 Byte 4 Byte
> i686 32 4 Byte 4 Byte 4 Byte
> i686 7 4 Byte 4 Byte 4 Byte
> armhf - 4 Byte 4 Byte 4 Byte
> armhf 64 8 Byte 8 Byte 4 Byte
> armhf 32 4 Byte 4 Byte 4 Byte
> armhf 7 4 Byte 4 Byte 4 Byte
> a64armhf - 4 Byte 4 Byte FAIL
> a64armhf 64 8 Byte 8 Byte 8 Byte
> a64armhf 32 4 Byte 4 Byte FAIL
> a64armhf 7 4 Byte 4 Byte FAIL

aarch64 - 8 Byte 8 Byte 8 Byte

Toggle quote (2 lines)
> For a64armhf, the version with -D_FILE_OFFSET_BITS=64 is the only one where
> readdir succeeds on my specially-prepared directory.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9unhgACgkQ5xo1VCww
uqX0kAf/aQdWX0rI6ewzOIgGflwtqpn+EX4XCuan4RCip2QINbbjNtkPyizeteKN
0cNwZryEmqfrzCoBiO9UpXIb+n4/pUQn2o8jW/s1x6UJJR1SmWiX4bE28SffOqtU
HOudY9tEgildMf2GJG5XJzJy3XVzvPig93ttsGwOL/CSmVm6/WiEw1n16SJW/O4H
jla3/solrbfLhppngE61fHciv4UhDsaoMi9b6GvXaz7frlBnLZ0eaTAYTf6cfe3f
LoxKnx0GpRwYLIIGpQHIBgDlqC9k0ZCo2fC86oMb+ZQg8RFe8oolhmzU4fZO1fZ6
Puyn6V1vMF9+l91oXoyQiSA9+POL6w==
=CB0G
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 26 Sep 2020 12:50
(name . Andreas Enge)(address . andreas@enge.fr)
20200926124758.28866f16@scratchpost.org
Hi Andreas,

On Fri, 25 Sep 2020 22:03:23 +0200
Andreas Enge <andreas@enge.fr> wrote:

Toggle quote (9 lines)
> On Fri, Sep 25, 2020 at 01:11:18AM +0200, Marius Bakke wrote:
> > Arguably running code for foreign architectures through QEMU binfmt is
> > something of a hack. Mandating that every package *must* be patched to
> > support it seems user-hostile. I'm more in favor of dropping it on the
> > build farm
>
> Indeed it is weird we do not only compile packages natively on the build
> farm.

I'm sorry that my earlier analysis mentioned qemu at all. qemu is not at fault
at all. The same happens if you run 32 bit code on 64 bit hosts without using
qemu! Except for one case: i686 on x86_64--where they have a compatibility
layer in the kernel that works around this problem (which I don't like
either--it obfuscates the problem).

For example a problem appears on:

- armhf on x86_64 host; fault not because of qemu
- armhf on aarch64 host; not using qemu in the first place

And it should appear also on (but I didn't test):

- i686 on aarch64 host; not using qemu in the first place

The problem always appears if the host is 64 bits and the guest is 32 bits,
no matter what cpus both are, except for the case "i686 on x86_64".
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9vHPQACgkQ5xo1VCww
uqWN+gf6At1d5aYujggHx0tZ53zlU+hUN7ok+tj8Ea51tkjINPKEwPjcmbdGjAs5
5OyXLLBQe+v1nbltp7nTMsryQelqWnYpzEerGw4wuFI2q/SQ3N9ARy1YXmBitTlc
+7nlCkx798uIIEBF9VYJGcTCXd+WPh9O7lqYnhwIMxzPqXvxfnBWmvnBWloMIFx7
Uw+wm/j4jTaI/RFI8/AtotPRfVubqXQhAMagW1vfB/1q0qyy9+v/VvsMO3u7iaKQ
75QdHmGZ1bMKEi4AIpP046Hn6Z0R5rALPk+EG+nOXqXtyq0/N386gF9b+OrCoDjK
vtASqeg4hV5Ot9ApjfbQmHw6XEQUXA==
=Btl3
-----END PGP SIGNATURE-----


E
E
Efraim Flashner wrote on 27 Sep 2020 08:43
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200927064313.GB1386@E5400
On Fri, Sep 25, 2020 at 03:36:46PM +0200, Danny Milosavljevic wrote:
Toggle quote (94 lines)
> > >Why is this not an issue with i686 on x86_64 kernels?
> >
> > I'm not sure. I'll check.
>
> $ cat a00.c
> #include <stdio.h>
> #if defined( __ILP32__ )
> #warning ILP32
> #endif
> int main() {
> return sizeof(off_t);
> }
> $ LC_ALL=C guix environment -s i686-linux gcc-toolchain -- gcc -o a00 a00.c
> a00.c:3:2: warning: #warning ILP32 [-Wcpp]
> 3 | #warning ILP32
> | ^~~~~~~
> $ ./a00
> $ echo $?
> 4
>
> That means they are using the Linux kernel's X86_32 ABI.
> It has its own getdents64 system call that returns another value for d_off.
>
> $ LC_ALL=C guix environment -s i686-linux gcc-toolchain -- gcc -o a00 -D_FILE_OFFSET_BITS=64 a00.c
> a00.c:3:2: warning: #warning ILP32 [-Wcpp]
> 3 | #warning ILP32
> | ^~~~~~~
> $ ./a00
> $ echo $?
> 8
>
> That is why __i686__ is not affected--at the cost of the kernel lying to us
> about the file system.
>
> Note that I also tried printing the actual d_off values[1]--on ILP32, even with
> _FILE_OFFSET_BITS=64, the VALUES are still 32 bits, and the same values as
> without _FILE_OFFSET_BITS. The d_off SLOT gets bigger on _FILE_OFFSET_BITS=64.
>
> (I also tried printing the actual d_off values[1] on x86_64 without any guix
> environment -s, I get entirely different d_off values!!)
>
> I also tried the former on native ARMHF--you get 32 bits d_off values. And d_off
> is always the same size as off_t.
>
> off_t size changes depending on _FILE_OFFSET_BITS.
>
> I do not have access to a real aarch64 machine--so no idea how it is there.
> That would be the most interesting case, because those don't have X86_32,
> so ILP32 is either not present or implemented differently.
>
> ppc64 would also be interesting...
>
> Test result of [1]:
>
> system _FILE_OFFSET_BITS off_t d_off-sizeof d_off-values
> -------------------------------------------------------------
> x86_64 - 8 Byte 8 Byte 8 Byte
> i686 - 4 Byte 4 Byte 4 Byte
> i686 64 8 Byte 8 Byte 4 Byte
> i686 32 4 Byte 4 Byte 4 Byte
> i686 7 4 Byte 4 Byte 4 Byte
> armhf - 4 Byte 4 Byte 4 Byte
> armhf 64 8 Byte 8 Byte 4 Byte
> armhf 32 4 Byte 4 Byte 4 Byte
> armhf 7 4 Byte 4 Byte 4 Byte
>
> This is all without qemu--in order to simplify the test case.
>
> So I take it ext4 has some special compilation mode for 32 bits...
>
> Could someone please test [1] on (real!) aarch64 and ppc64 and ppc32
> machines?
>
> [1] $ cat a00.c
> #include <stdio.h>
> #include <errno.h>
> #include <assert.h>
> #include <dirent.h>
> #if defined( __ILP32__ )
> #warning ILP32
> #endif
>
> int main() {
> DIR* d;
> struct dirent* ent;
> d = opendir("/tmp");
> errno = 0;
> assert(sizeof(ent->d_off) == sizeof(off_t));
> while ((ent = readdir(d)) != NULL) {
> printf("%llu\n", (unsigned long long) ent->d_off);
> }
> if (errno)
> perror("readdir");
> return sizeof(off_t);
missing } at the end

tested on my ibook G4:
(I believe for powerpc the 32-bit is defined as __powerpc__)

(ins)efraim@g4:~$ gcc -o a00 a00.c
(ins)efraim@g4:~$ ./a00
404218588
473424804
681064434
708508942
805784207
980330722
1080794671
1734802884
1909818320
1923689764
2019671154
2125602108
2147483647
(ins)efraim@g4:~$ echo $?
4
(ins)efraim@g4:~$ rm a00
(ins)efraim@g4:~$ gcc -o a00 -D_FILE_OFFSET_BITS=64 a00.c
(ins)efraim@g4:~$ ./a00
404218588
473424804
681064434
708508942
805784207
980330722
1080794671
1734802884
1909818320
1923689764
2019671154
2125602108
2147483647
(ins)efraim@g4:~$ echo $?
8



--
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEoov0DD5VE3JmLRT3Qarn3Mo9g1EFAl9wNH4ACgkQQarn3Mo9
g1EyqBAAmV7HRvFIVUEaAl6hqRFPSLSE10DYglJCDfKpbAPw7BxomArRHe6YrNcR
SFukIry5xSm3/+S5BcpgviT9hmzbMnqRn9blFK20iRzalSo98PjnlZKHoNjs/lwe
viVY8pfyNQXGyZFkb6b8MhZw9mvXcX6Zvu7lm5Z5ah8brcxBy4GKsgcKnTyGG2ul
mWJHbUCbcjT/HUF5f19FYlglMA6qcCATfhKy7BjpIoD+dWU5Cu35U/8W4CBrKDw/
OTA+HmcVM/SYlxYHnWdSEtcvbF/IBN2YoXY8buxMNs4XKomnDBRvH4y48IJMh9yK
giU45l5bTMKXk28G5xs125N7QXejLCuD3P82SsnX5ztKGTxoK8a/ns1B/Il5RKk5
CA/5XwAk3lwa2MYmjNZNzm6hI7qAzCRoWJDrxNFa0mpp305QAACp98+HBnsWmX+e
ckrzfPnQVBLfmKxTyv+154GOKDq4/w//dkgBQUYh8cSXOK/ywG/s3xgnNbn5XloV
fokxAzN271k7RFukUhWR6vLDNHu2VfvnsfnMQtm7bw2oO+iM5zXVle65PP0JWgDc
I25kpqDXtFrCedwxAqV+Ox8zeAR1cxS3mvnC5giFLESS1ROXk3ZteNERsSuKczwW
7Hiprbj11f/7t1JPlgOQP3uz1IItreN8rDE02oh/dL95WaNzonY=
=FxbE
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 29 Sep 2020 16:51
(address . 43591@debbugs.gnu.org)(name . Marius Bakke)(address . marius@gnu.org)
20200929165145.5d6b4434@scratchpost.org
(Note: ILP32 is not present on aarch64)

system _FILE_OFFSET_BITS off_t d_off-sizeof d_off-values
---------------------------------------------------------------
x86_64 - 8 Byte 8 Byte 8 Byte
i686 - 4 Byte 4 Byte 4 Byte
i686 64 8 Byte 8 Byte FAIL*
i686 32 4 Byte 4 Byte FAIL*
i686 7 4 Byte 4 Byte 4 Byte
armhf - 4 Byte 4 Byte 4 Byte
armhf 64 8 Byte 8 Byte 4 Byte
armhf 32 4 Byte 4 Byte 4 Byte
armhf 7 4 Byte 4 Byte 4 Byte
a64armhf - 4 Byte 4 Byte FAIL*
a64armhf 64 8 Byte 8 Byte 8 Byte
a64armhf 32 4 Byte 4 Byte FAIL*
a64armhf 7 4 Byte 4 Byte FAIL*
aarch64 - 8 Byte 8 Byte 8 Byte

*: Using FUSE filesystem with big d_off value.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9zSgEACgkQ5xo1VCww
uqWFbAgAnExo/8+26P+b98euZLJ++EQl74LB++acJmgQl5Qq+upiUQz+QyTp9fNp
WZHfv7mP9z9pHcOsYsDnFWTdwPiSkYXkDAlqxHedqptA/d8ZjHgUg32JQ4Q7ZXxr
aCyhAk7ej0QYxfDXaduOAJ1rT+7wtBoRDO5XMSz6kKdL66c1UCHv/RhmhV0eq2FQ
kg23P0T+VzWxuII1O/uFgrPdawkKletbL1eGy0u9KncIQMEp2o9iPX0b0rmLRVfP
szOgrhY9lD/+VNnl61aB98OQLoWupXadKlI0sthXvjJYFuY/tXygfGlreqDSqKJr
HgX022VdkFn3OX9qW2sZHCvU89E0jg==
=Kiu1
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 29 Sep 2020 22:52
(name . Marius Bakke)(address . marius@gnu.org)
87h7rg4879.fsf@gnu.org
Hi,

Marius Bakke <marius@gnu.org> skribis:

Toggle quote (5 lines)
> Arguably running code for foreign architectures through QEMU binfmt is
> something of a hack. Mandating that every package *must* be patched to
> support it seems user-hostile. I'm more in favor of dropping it on the
> build farm, or just keep fixing things on a per-package basis.

I’m fine with dropping things on the build farm; it’s just about
modifying machines-for-berlin.scm in maintenance.git. Any takers? :-)

Toggle quote (5 lines)
> A less user-hostile solution could perhaps be to (setenv "CFLAGS"
> "-D_FILE_OFFSET_BITS=64") on 32-bit architectures in gnu-build-system.
> Not sure whether that could cause any adverse effects. But again, I
> don't like the idea of optimizing for QEMUs user-mode emulation.

The above would override the default CFLAGS in Autoconf-generated
configure scripts (which is “-O2 -g”). So we’d have to be cautious.
But I think a global solution is preferable to adding
-D_FILE_OFFSET_BITS=64 to tens of packages.

WDYT?

Ludo’.
D
D
Danny Milosavljevic wrote on 30 Sep 2020 00:09
(name . Ludovic Courtès)(address . ludo@gnu.org)
20200930000934.6812b7c8@scratchpost.org
Hi Ludo,

On Tue, 29 Sep 2020 22:52:10 +0200
Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (10 lines)
> Marius Bakke <marius@gnu.org> skribis:
>
> > Arguably running code for foreign architectures through QEMU binfmt is
> > something of a hack. Mandating that every package *must* be patched to
> > support it seems user-hostile. I'm more in favor of dropping it on the
> > build farm, or just keep fixing things on a per-package basis.
>
> I’m fine with dropping things on the build farm; it’s just about
> modifying machines-for-berlin.scm in maintenance.git. Any takers? :-)

I don't know what "dropping things on the build farm" means in this context.
Dropping what exactly?

Toggle quote (3 lines)
> The above would override the default CFLAGS in Autoconf-generated
> configure scripts (which is “-O2 -g”).

That is correct. I'm currently working on v2 (testing a patchset already)
and I totally forgot to add "-g -O2" the first time around. Also, glibc
itself must NOT have -D_FILE_OFFSET_BITS=64 (it makes sense not to, too).

Toggle quote (4 lines)
>So we’d have to be cautious.
> But I think a global solution is preferable to adding
> -D_FILE_OFFSET_BITS=64 to tens of packages.

I agree.

I still would like to see what actually changes--and I think with
guix-data-services it should actually be possible to compare derivations
before-and-after and find out which derivations of which packages changed
at all because of the global -D_FILE_OFFSET_BITS=64. I'd like some help
using guix-data-services to find that out. Otherwise a risk estimation
cannot be done.

Technically, if a package used direct assembly offsets (for some unfathomable
reason), it could have an undetectable problem with the size change of off_t
(and also struct dirent). So examining the source code of the most essential
packages manually is still good. That's what I did in
branch wip-file-offsets-64.

I'm in the process of testing a patchset that globally sets

CFLAGS="-D_FILE_OFFSET_BITS=64 -g -O2"

instead.

That alone is not enough since there are a lot of non-autotools projects that
just ignore the environment variable entirely--not to mention languages other
than C.

I have access to bayfront--but I don't remember how to evaluate a new branch
there (the new branch is "wip-file-offset-bits-64-sledgehammer"). How does
it work?
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9zsJ4ACgkQ5xo1VCww
uqX2cQgAmvzuWUzAWHm6T9M58P3PhM7dlK5gtkowB3hgxzYk/KkpluX56HXE2P54
q+m78K5rumyDkoVVtMtsAQbrJLnd+HuHOrlwBipcTuBsCPJ6r7Fc4BMVbvfm/kvd
y7fxqsmDybtrynFpUNiQadQZNT//NaSRLqa7XKv7HWblmGYVwuQ0+fdj+IEqUv5X
3l8h/ziDg2vB10rmtmFzh9kjN/lGsYYfHlnY87iNqoYbtY4Klkg9xQiGpKULWdA4
tUATpX9JsHpAmMYN/Rja3Yk1inpMZ4aWpSdYETZGi90WvItyrp8tSlsK0uqh7/Ak
8u39izKY7Yx4fNKDJqtUiy2DHHQW9g==
=F4u0
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 30 Sep 2020 10:45
[PATCH core-updates v2 3/5] gnu: glibc: Do not explicitly set _FILE_OFFSET_BITS.
(address . 43591@debbugs.gnu.org)(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200930084512.31738-4-dannym@scratchpost.org
* gnu/packages/base.scm (glibc)[arguments]<#:phases>[set-FILE-OFFSET-BITS]:
Delete phase.
---
gnu/packages/base.scm | 2 ++
1 file changed, 2 insertions(+)

Toggle diff (13 lines)
diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index 41976c5ab0..6f3c717b50 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -782,6 +782,8 @@ the store.")
 
       #:tests? #f                                 ; XXX
       #:phases (modify-phases %standard-phases
+                 ;; glibc itself should support both--so don't choose here.
+                 (delete 'set-FILE-OFFSET-BITS)
                  (add-before
                   'configure 'pre-configure
                   (lambda* (#:key inputs native-inputs outputs
D
D
Danny Milosavljevic wrote on 30 Sep 2020 10:45
[PATCH core-updates v2 0/5] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir.
(address . 43591@debbugs.gnu.org)(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200930084512.31738-1-dannym@scratchpost.org
Linux kernel file offsets are always 64 bits.
But userspace can be built to use 32 bit offsets.

"struct dirent", returned by readdir, uses d_off to store
such an "offset" that it got from the Linux kernel.
In the case of ext4 that "offset" is actually a 64 bit
hash value.

Therefore, there are cases where such an offset that it got
from the Linux kernel does not fit in the "struct dirent"
field "d_off".

If the guest system's glibc is 32 bit AND uses 32 bit
file offsets it is going to be very confused.
It does check whether d_off fits into the structure
it gives back to the user--and it doesn't fit. Hence readdir
fails, with errno == EOVERFLOW (which is undocumented and thus
an API error).
This manifests itself in simple directory reads not working
anymore in parts of cmake, for example.

This happened in Guix when building stuff for
ARMHF on a x86_64 build host using QEMU transparent emulation.

There is a very simple and complete way to avoid this problem:
Just always use 64 bit offsets in user space programs (also
on 32 bit machines).

Danny Milosavljevic (5):
gnu: glibc-final: Catch all cases of a glibc user not requesting
64-bit offsets and then using readdir regardless.
build-system/gnu: Explicity declare the _FILE_OFFSET_BITS we want.
gnu: glibc: Do not explicitly set _FILE_OFFSET_BITS.
gnu: glibc-mesboot0: Do not explicitly set _FILE_OFFSET_BITS.
gnu: rhash: Explicity declare the _FILE_OFFSET_BITS we want.

gnu/packages/base.scm | 2 +
gnu/packages/commencement.scm | 70 ++++++++++++++++++++++++++++++++-
gnu/packages/crypto.scm | 9 ++++-
guix/build/gnu-build-system.scm | 13 +++++-
4 files changed, 90 insertions(+), 4 deletions(-)
D
D
Danny Milosavljevic wrote on 30 Sep 2020 10:45
[PATCH core-updates v2 4/5] gnu: glibc-mesboot0: Do not explicitly set _FILE_OFFSET_BITS.
(address . 43591@debbugs.gnu.org)(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200930084512.31738-5-dannym@scratchpost.org
* gnu/packages/commencement.scm (glibc-mesboot0)[arguments]<#:phases>[set-FILE-OFFSET-BITS]:
Delete phase.
---
gnu/packages/commencement.scm | 2 ++
1 file changed, 2 insertions(+)

Toggle diff (13 lines)
diff --git a/gnu/packages/commencement.scm b/gnu/packages/commencement.scm
index 284fa65d20..ad6a27ba6b 100644
--- a/gnu/packages/commencement.scm
+++ b/gnu/packages/commencement.scm
@@ -1398,6 +1398,8 @@ ac_cv_c_float_format='IEEE (little-endian)'
            ,(string-append "--prefix=" out)))
        #:phases
        (modify-phases %standard-phases
+         ;; glibc itself should support both--so don't choose here.
+         (delete 'set-FILE-OFFSET-BITS)
          (add-after 'unpack 'apply-boot-patch
            (lambda* (#:key inputs #:allow-other-keys)
              (and (let ((patch (assoc-ref inputs "boot-patch")))
D
D
Danny Milosavljevic wrote on 30 Sep 2020 10:45
[PATCH core-updates v2 2/5] build-system/gnu: Explicity declare the _FILE_OFFSET_BITS we want.
(address . 43591@debbugs.gnu.org)(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200930084512.31738-3-dannym@scratchpost.org
* guix/build/gnu-build-system.scm (set-FILE-OFFSET-BITS): New procedure.
(%standard-phases): Add it.
---
guix/build/gnu-build-system.scm | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

Toggle diff (31 lines)
diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index d3347c9518..1a1c4627ab 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -60,6 +60,17 @@ See https://reproducible-builds.org/specs/source-date-epoch/."
   (setenv "SOURCE_DATE_EPOCH" "1")
   #t)
 
+(define* (set-FILE-OFFSET-BITS #:rest _)
+  "Define '_FILE_OFFSET_BITS' using the C preprocessor.
+This ensures that 32 bit and 64 bit user space both can handle results
+returned from a 64 bit kernel.
+See https://bugzilla.kernel.org/show_bug.cgi?id=205957."
+  ;; Setting CFLAGS here makes packages not default CFLAGS.
+  ;; Since glibc (and probably other packages) don't like being built without optimization, enable optimization here.
+  (setenv "CFLAGS" "-D_FILE_OFFSET_BITS=64 -g -O2")
+  (setenv "CXXFLAGS" "-D_FILE_OFFSET_BITS=64 -g -O2")
+  #t)
+
 (define (first-subdirectory directory)
   "Return the file name of the first sub-directory of DIRECTORY."
   (match (scandir directory
@@ -803,7 +814,7 @@ which cannot be found~%"
   ;; Standard build phases, as a list of symbol/procedure pairs.
   (let-syntax ((phases (syntax-rules ()
                          ((_ p ...) `((p . ,p) ...)))))
-    (phases set-SOURCE-DATE-EPOCH set-paths install-locale unpack
+    (phases set-SOURCE-DATE-EPOCH set-FILE-OFFSET-BITS set-paths install-locale unpack
             bootstrap
             patch-usr-bin-file
             patch-source-shebangs configure patch-generated-file-shebangs
D
D
Danny Milosavljevic wrote on 30 Sep 2020 10:45
[PATCH core-updates v2 1/5] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir regardless.
(address . 43591@debbugs.gnu.org)(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200930084512.31738-2-dannym@scratchpost.org
* gnu/packages/commencement.scm (glibc-final): Catch all cases of a glibc user not
requesting 64-bit offsets and then using readdir.
---
gnu/packages/commencement.scm | 68 ++++++++++++++++++++++++++++++++++-
1 file changed, 67 insertions(+), 1 deletion(-)

Toggle diff (79 lines)
diff --git a/gnu/packages/commencement.scm b/gnu/packages/commencement.scm
index e5a4caa95c..284fa65d20 100644
--- a/gnu/packages/commencement.scm
+++ b/gnu/packages/commencement.scm
@@ -3462,7 +3462,73 @@ exec ~a/bin/~a-~a -B~a/lib -Wl,-dynamic-linker -Wl,~a/~a \"$@\"~%"
                ,hurd-headers-boot0)
              '())
        ,@(package-outputs glibc-final-with-bootstrap-bash))
-      ,@(package-arguments glibc-final-with-bootstrap-bash)))))
+      ,@(substitute-keyword-arguments
+         (package-arguments glibc-final-with-bootstrap-bash)
+         ((#:phases phases)
+          `(modify-phases ,phases
+             (add-after 'unpack 'patch-dirent
+               (lambda* (#:key outputs #:allow-other-keys)
+                 ;; Linux kernel file offsets are always 64 bits.
+                 ;; But userspace can be built to use 32 bit offsets.
+                 ;;
+                 ;; "struct dirent", returned by readdir, uses d_off to store
+                 ;; such an "offset" that it got from the Linux kernel.
+                 ;; In the case of ext4 that "offset" is actually a 64 bit
+                 ;; hash value.
+                 ;;
+                 ;; Therefore, there are cases where such an offset that it got
+                 ;; from the Linux kernel does not fit in the "struct dirent"
+                 ;; field "d_off".
+                 ;;
+                 ;; If the guest system's glibc is 32 bit AND uses 32 bit
+                 ;; file offsets it is going to be very confused.
+                 ;; It does check whether d_off fits into the structure
+                 ;; it gives back to the user--and it doesn't fit.  Hence readdir
+                 ;; fails, with errno == EOVERFLOW (which is undocumented and thus
+                 ;; an API error).
+                 ;; This manifests itself in simple directory reads not working
+                 ;; anymore in parts of cmake, for example.
+                 ;;
+                 ;; This manifested in Guix when building stuff for
+                 ;; ARMHF on a x86_64 build host using QEMU transparent emulation.
+                 ;;
+                 ;; There is a very simple and complete way to avoid this problem:
+                 ;; Just always use 64 bit offsets in user space programs (also
+                 ;; on 32 bit machines).  The Linux kernel does that already
+                 ;; anyway.
+                 ;;
+                 ;; Note: We might want to avoid using 64 bit when bootstrapping
+                 ;; using mescc (since mescc doesn't directly support 64 bit
+                 ;; values)--but then bootstrapping has to be done on a
+                 ;; file system other than ext4, or on ext4 with the feature
+                 ;; "dir_index" disabled.
+                 ;;
+                 ;; The change below does not affect 64 bit users.
+                 ;;
+                 ;; See <https://issues.guix.gnu.org/43513>.
+                 (let ((port (open-file "dirent/dirent.h" "a")))
+                   (display "
+#ifndef _LIBC
+#if __SIZEOF_LONG__ < 8
+#ifndef __USE_FILE_OFFSET64
+#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 32
+#warning \"Using -D_FILE_OFFSET_BITS=32 and using readdir is a bad idea, see <https://bugzilla.kernel.org/show_bug.cgi?id=205957>\"
+#else
+#undef readdir
+#define readdir @READDIR_WITHOUT_FILE_OFFSET64_IS_A_REALLY_BAD_IDEA@
+#endif
+#endif
+#endif
+#endif
+" port)
+                   (close-port port))
+                 ;; This file includes <dirent.h> and thus checks sanity already.
+                 ;; TODO: Check dirent/scandir-tail.c, dirent/scandir64-tail.c.
+                 (substitute* "posix/glob.c"
+                  (("(#[ ]*define[ ][ ]*readdir)") "
+#undef readdir
+#define readdir"))
+                 #t)))))))))
 
 (define/system-dependent gcc-boot0-wrapped
   ;; Make the cross-tools GCC-BOOT0 and BINUTILS-BOOT0 available under the
D
D
Danny Milosavljevic wrote on 30 Sep 2020 10:45
[PATCH core-updates v2 5/5] gnu: rhash: Explicity declare the _FILE_OFFSET_BITS we want.
(address . 43591@debbugs.gnu.org)(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200930084512.31738-6-dannym@scratchpost.org
* gnu/packages/crypto.scm (rhash)[arguments]<#:make-flags>:
Explicity declare the _FILE_OFFSET_BITS we want.
---
gnu/packages/crypto.scm | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

Toggle diff (20 lines)
diff --git a/gnu/packages/crypto.scm b/gnu/packages/crypto.scm
index 028c140185..a10dd62e8b 100644
--- a/gnu/packages/crypto.scm
+++ b/gnu/packages/crypto.scm
@@ -845,8 +845,13 @@ BLAKE.")
                                       "/bin/" ,target "-gcc"))
                      '())))
        #:make-flags
-       ;; The binaries in /bin need some help finding librhash.so.0.
-       (list (string-append "LDFLAGS=-Wl,-rpath=" %output "/lib"))
+       (list ;; This package uses a configure script that is not from GNU
+             ;; autotools; it doesn't handle the environment variable
+             ;; CFLAGS (or for that matter the configure option).
+             ;; Therefore, directly pass it to make.
+             "CFLAGS=-D_FILE_OFFSET_BITS=64"
+             ;; The binaries in /bin need some help finding librhash.so.0.
+             (string-append "LDFLAGS=-Wl,-rpath=" %output "/lib"))
        #:test-target "test"             ; ‘make check’ just checks the sources
        #:phases
        (modify-phases %standard-phases
L
L
Ludovic Courtès wrote on 30 Sep 2020 11:32
Re: [bug#43591] [PATCH core-updates] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir.
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
87tuvf1uet.fsf@gnu.org
Hi,

Danny Milosavljevic <dannym@scratchpost.org> skribis:

Toggle quote (16 lines)
> On Tue, 29 Sep 2020 22:52:10 +0200
> Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Marius Bakke <marius@gnu.org> skribis:
>>
>> > Arguably running code for foreign architectures through QEMU binfmt is
>> > something of a hack. Mandating that every package *must* be patched to
>> > support it seems user-hostile. I'm more in favor of dropping it on the
>> > build farm, or just keep fixing things on a per-package basis.
>>
>> I’m fine with dropping things on the build farm; it’s just about
>> modifying machines-for-berlin.scm in maintenance.git. Any takers? :-)
>
> I don't know what "dropping things on the build farm" means in this context.
> Dropping what exactly?

Dropping emulated builds, or at least 32-bit emulated builds. We just
need to remove build machines from the file above.

Toggle quote (20 lines)
>> The above would override the default CFLAGS in Autoconf-generated
>> configure scripts (which is “-O2 -g”).
>
> That is correct. I'm currently working on v2 (testing a patchset already)
> and I totally forgot to add "-g -O2" the first time around. Also, glibc
> itself must NOT have -D_FILE_OFFSET_BITS=64 (it makes sense not to, too).
>
>>So we’d have to be cautious.
>> But I think a global solution is preferable to adding
>> -D_FILE_OFFSET_BITS=64 to tens of packages.
>
> I agree.
>
> I still would like to see what actually changes--and I think with
> guix-data-services it should actually be possible to compare derivations
> before-and-after and find out which derivations of which packages changed
> at all because of the global -D_FILE_OFFSET_BITS=64. I'd like some help
> using guix-data-services to find that out. Otherwise a risk estimation
> cannot be done.

A change in gnu-build-system would change all the derivations. I don’t
think the Data Service can help us here.

Toggle quote (6 lines)
> Technically, if a package used direct assembly offsets (for some unfathomable
> reason), it could have an undetectable problem with the size change of off_t
> (and also struct dirent). So examining the source code of the most essential
> packages manually is still good. That's what I did in
> branch wip-file-offsets-64.

Yeah.

Toggle quote (6 lines)
> I'm in the process of testing a patchset that globally sets
>
> CFLAGS="-D_FILE_OFFSET_BITS=64 -g -O2"
>
> instead.

OK.

Toggle quote (4 lines)
> That alone is not enough since there are a lot of non-autotools projects that
> just ignore the environment variable entirely--not to mention languages other
> than C.

Yeah…

I have mixed feelings: fixing packages one by one doesn’t sound great,
but OTOH setting the ‘CFLAGS’ environment variable globally can have
unexpected side effects in some cases (overriding package-specific
CFLAGS) and zero effects in other cases (for non-Autoconf packages or
badly-written ‘configure.ac’ files), both of which would be hard to
detect.

~~~

If we take a step back: what’s the problem? We have a problem with
emulated 32-bit architectures on 64-bit architectures. But that’s OK,
we can stop those emulations for now. Then we have packages that do not
support large files; it’s not great but evidently we can live with it.
:-) Ideally, we’d report it upstream when we encounter it.

So to me that hints at targeted fixes: fixing key packages like CMake
(roughly what you already did) where lack of large file support can be
problematic.

Thoughts?

Ludo’.
D
D
Danny Milosavljevic wrote on 30 Sep 2020 12:28
(name . Ludovic Courtès)(address . ludo@gnu.org)
20200930122821.1471d155@scratchpost.org
Hi Ludo,

On Wed, 30 Sep 2020 11:32:58 +0200
Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (3 lines)
> Dropping emulated builds, or at least 32-bit emulated builds. We just
> need to remove build machines from the file above.

Oh.

Do we have real armhf machines? (as in not aarch64)

*Looks at guix-maintenance* We do. Awesome. Then sure.

But just to be clear, WE MUST NOT USE aarch64 to build armhf as long as this
problem isn't fixed.

This problem has nothing to do with emulation.

Toggle quote (3 lines)
> A change in gnu-build-system would change all the derivations. I don’t
> think the Data Service can help us here.

I want to know what actually changes in the final binaries. Surely that works
somehow--guix data services or not.

Basically, for each package in Guix,

diff -r `~/broken-guix/pre-inst-env guix build $package` `~/fixed-guix/pre-inst-env guix build $package` || echo "affected: $package"

but after replacing references by deduplicated content addressed references
(for example if derivation A refers to files in derivation B, but derivation B
only changed the directory name it's in and not the content of the derivation,
then that should not count as a diff in A. That should happen recursively).

Basically ignore the directory names in /gnu/store and make new directory
names that are the hash of each directory's (recursive) CONTENTS (after
fixing references in that content, too, obviously)--as opposed to sources.
Then diff those.

If necessary, I can run that on my laptop--it will just take several weeks
and miss derivations I don't have in the first place.

Toggle quote (7 lines)
> I have mixed feelings: fixing packages one by one doesn’t sound great,
> but OTOH setting the ‘CFLAGS’ environment variable globally can have
> unexpected side effects in some cases (overriding package-specific
> CFLAGS) and zero effects in other cases (for non-Autoconf packages or
> badly-written ‘configure.ac’ files), both of which would be hard to
> detect.

The latter is easy to detect since I patched dirent.h in glibc exactly for that
reason. That way, glibc WON'T let you use it wrong (except if you explicitly
ask for it). On Guix systems, there is no legitimate reason to use it wrong
in the first place.

In my opinion, not having an automated way to tell us when a package is using
it wrong would be not doing our due diligence--how would you know we had
actually fixed the problem for good? You wouldn't know.
And you wouldn't have fixed the problem for good--I can tell you that much now.

There already was an essential package, rhash, which didn't pick up the
CFLAGS--and that's how I found it. It's easy.

About the unexpected side effects--yes, that's right. That's why we should get
a list of diff results (see above for the command) and then manually look at
the source code of those packages and their dependencies.

Toggle quote (2 lines)
> If we take a step back: what’s the problem?

It means we have no trustworthy i686 packages, which means we do not have a
trustworthy full source bootstrap using Mes (since that uses i686 executable
to bootstrap).

In practice, this problem is not so bad since the kernel on i686 has a compat
layer that hasn't been telling us the truth for d_off, so we should be "good".
But philosophically, we are doing it dead wrong.

Also, this won't work on armhf or any other 32 bit architecture--so there,
we would be both philosophically and practically wrong.

Also, the "not telling us the truth for d_off on i686" is a leaky compat layer.
It totally DOES wind up telling us the truth sometimes (see my earlier test
table)--and then we have a problem.

Toggle quote (4 lines)
> Then we have packages that do not
> support large files; it’s not great but evidently we can live with it.
> :-) Ideally, we’d report it upstream when we encounter it.

I really don't care about large file support. That's mostly a bonus we get
while fixing this whole ordeal the right way. That said, maybe users
care they actually can store a 5 GiB file on their 4000 GiB drive on armhf :P

Toggle quote (4 lines)
> So to me that hints at targeted fixes: fixing key packages like CMake
> (roughly what you already did) where lack of large file support can be
> problematic.

As long as we patch glibc's dirent.h so it tells us when we are doing stupid
stuff, we can't go wrong much. So sure.

As I said, the main problem is FINDING the affected packages. It's not like
they'll fail building (in general). They'll just do weird stuff at runtime
instead. Case in point: cmake DID NOT fail building. And it totally is
broken.

Everything after that is easy.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl90XcUACgkQ5xo1VCww
uqXKxgf/foOlL/5z3NtiynHPzBn2e/EqziNnVXtSjZaPMdvmK4N09rvWBFLDZ3Rl
dOBfMTuwhiY63BbhmmSXngYdO8USLxRoEn1GCgcYrjLt2YzfxGPaQ5xyrXV0FhDR
l5UNQPNcqwk2qfK6kmmzg2qCoRG0C8+/MSt6QPGQLZ4U1JNfMuKqDFVoGR7sJW15
sLOH62x+WvIpOq8sXmHzeeguN1Aj01VaNvWWg5g0inBkqM5FcB1Q3JOxIAFnFsuf
hH2YGqOfqlz/o9hIYHYOSQt9FcwtuBVurz4fac0QmKnNlVQ8Qj2UKuD26LO5BBT3
YR3MH6yIPKc8tMSU2XnnmhnE2hPS6Q==
=/PXl
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 30 Sep 2020 18:55
Re: [PATCH core-updates v2 1/5] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir regardless.
(address . 43591@debbugs.gnu.org)
20200930185521.70952954@scratchpost.org
Still broken: boost.

In file included from ./boost/interprocess/detail/os_file_functions.hpp:40:0,
from ./boost/interprocess/mapped_region.hpp:29,
from libs/log/src/posix/ipc_reliable_message_queue.cpp:51:
./boost/interprocess/detail/os_file_functions.hpp:636:16: error: stray ‘@’ in program
while((de=::readdir(d))) {
[...]
build of /gnu/store/j29v8ph3asv4sywi0437hs6wwd9knvy2-boost-1.74.0.drv failed

That means that boost, or that source file, is not being compiled with
_FILE_OFFSET_BITS=64 .

Nowhere does the header file boost/interprocess/detail/os_file_functions.hpp
even check the value of _FILE_OFFSET_BITS.

./libs/filesystem/src/platform_config.hpp is responsible for setting up
_FILE_OFFSET_BITS in general.

That means that platform_config.hpp is not being #included in the former.

boost/interprocess/errors.hpp:// Parts of this code are taken from boost::filesystem library
boost/interprocess/errors.hpp:// See library home page at http://www.boost.org/libs/filesystem

But that means that this does not USE the "filesystem" library and neither does
it set _FILE_OFFSET_BITS like the "filesystem" library would.

All this happens even after I added a phase to gnu-build-system to set
_FILE_OFFSET_BITS unconditionally, so boost ignores that, too.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl90uHkACgkQ5xo1VCww
uqWXIQgAoDXHn/Nj7x2sGOXLLOAqD6Tbzk1ti0UhU15/ah9ygp/2aN6oVZOVclnz
7j7ppgdUPT78BaPrQ+LaPvaHmrZuHww7oNGQnIi6LmT47ExOZQwr59wHg/Q5lgx4
XVmWuWQ7tqE+WifAEffRR/TLGEJZwAe54LnwaS/QRtnW7T5gRUvJdgUeV8p0MgPa
f58AVS/zu3R3FSJuI4R0N1tlYZBZqNnykBM7n8gWhsacQKZjlrTNevEx6lS/scDd
yiIH9Po0Xej/B0CAa2fIhx56nBJFq5zdsXBBOV+gZIAvbCpxsOeF5mtOhBDijgKx
QemN1/Vzw56tC9soRwuKLJpvlHtnyg==
=xmR4
-----END PGP SIGNATURE-----


L
L
Ludovic Courtès wrote on 1 Oct 2020 09:14
Re: [bug#43591] [PATCH core-updates] gnu: glibc-final: Catch all cases of a glibc user not requesting 64-bit offsets and then using readdir.
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
87o8lmv2nx.fsf@gnu.org
Hi,

Danny Milosavljevic <dannym@scratchpost.org> skribis:

Toggle quote (17 lines)
> On Wed, 30 Sep 2020 11:32:58 +0200
> Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Dropping emulated builds, or at least 32-bit emulated builds. We just
>> need to remove build machines from the file above.
>
> Oh.
>
> Do we have real armhf machines? (as in not aarch64)
>
> *Looks at guix-maintenance* We do. Awesome. Then sure.
>
> But just to be clear, WE MUST NOT USE aarch64 to build armhf as long as this
> problem isn't fixed.
>
> This problem has nothing to do with emulation.

Now I’m lost; I thought this had to do with qemu-user.

Could you propose a patch for maintenance.git?

Toggle quote (12 lines)
> I want to know what actually changes in the final binaries. Surely that works
> somehow--guix data services or not.
>
> Basically, for each package in Guix,
>
> diff -r `~/broken-guix/pre-inst-env guix build $package` `~/fixed-guix/pre-inst-env guix build $package` || echo "affected: $package"
>
> but after replacing references by deduplicated content addressed references
> (for example if derivation A refers to files in derivation B, but derivation B
> only changed the directory name it's in and not the content of the derivation,
> then that should not count as a diff in A. That should happen recursively).

Yeah, you “just” need to compare modulo store file names…

Toggle quote (12 lines)
>> I have mixed feelings: fixing packages one by one doesn’t sound great,
>> but OTOH setting the ‘CFLAGS’ environment variable globally can have
>> unexpected side effects in some cases (overriding package-specific
>> CFLAGS) and zero effects in other cases (for non-Autoconf packages or
>> badly-written ‘configure.ac’ files), both of which would be hard to
>> detect.
>
> The latter is easy to detect since I patched dirent.h in glibc exactly for that
> reason. That way, glibc WON'T let you use it wrong (except if you explicitly
> ask for it). On Guix systems, there is no legitimate reason to use it wrong
> in the first place.

I’m very reluctant to patching public libc headers. Also, it’s not just
“our” problem, we should definitely discuss it with upstream and perhaps
propose your dirent.h patch.

I’m also not sure what you mean by “using it wrong”, what is “it”?

Building without _FILE_OFFSET_BITS=64 is still a valid option, albeit
one that is not recommended.

Toggle quote (4 lines)
> About the unexpected side effects--yes, that's right. That's why we should get
> a list of diff results (see above for the command) and then manually look at
> the source code of those packages and their dependencies.

A diff at one point in time (if we ever managed to get a usable diff) is
not enough: problems could pop up anytime. Setting ‘CFLAGS’ globally as
an environment variable seems risky.

Toggle quote (17 lines)
>> If we take a step back: what’s the problem?
>
> It means we have no trustworthy i686 packages, which means we do not have a
> trustworthy full source bootstrap using Mes (since that uses i686 executable
> to bootstrap).
>
> In practice, this problem is not so bad since the kernel on i686 has a compat
> layer that hasn't been telling us the truth for d_off, so we should be "good".
> But philosophically, we are doing it dead wrong.
>
> Also, this won't work on armhf or any other 32 bit architecture--so there,
> we would be both philosophically and practically wrong.
>
> Also, the "not telling us the truth for d_off on i686" is a leaky compat layer.
> It totally DOES wind up telling us the truth sometimes (see my earlier test
> table)--and then we have a problem.

Hmm I guess I need to re-read all that, I’m overwhelmed.

Thanks,
Ludo’.
D
D
Danny Milosavljevic wrote on 2 Oct 2020 09:18
(name . Ludovic Courtès)(address . ludo@gnu.org)
20201001120944.5059c650@scratchpost.org
Hi Ludo,

On Thu, 01 Oct 2020 09:14:10 +0200
Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (4 lines)
> > This problem has nothing to do with emulation.
>
> Now I’m lost; I thought this had to do with qemu-user.

I had thought so, too, a few weeks ago. But that's not the case.

It's not at all related to qemu.

The problem is a fundamental problem: a 64 bit value does NOT fit into a 32 bit
slot.

glibc uses getdents64 to get 64 bit dents and then acts all surprised and
errory when it gets 64 bit dents. (also on 32 bit glibc)

The same happens natively when using armhf on aarch64, without qemu-user.
That's what the table I sent was all about.

Calling getdents64 is not the problem--glibc has to do that, otherwise a 32 bit
glibc won't work RELIABLY on a 64 bit kernel anyway.

And emitting an error it does because we do not enable large file support.

This table (updated after I finally compiled a guix gcc-toolchain-10 with the
(unpatched) glibc in question on armhf):

system _FILE_OFFSET_BITS off_t d_off-sizeof d_off-values
---------------------------------------------------------------
x86_64 - 8 Byte 8 Byte 8 Byte
i686 - 4 Byte 4 Byte 4 Byte
i686 64 8 Byte 8 Byte FAIL*
i686 32 4 Byte 4 Byte FAIL*
i686 7 4 Byte 4 Byte 4 Byte
armhf - 4 Byte 4 Byte FAIL*
armhf 64 8 Byte 8 Byte 8 Byte
armhf 32 4 Byte 4 Byte FAIL*
armhf 7 4 Byte 4 Byte FAIL*
a64armhf - 4 Byte 4 Byte FAIL*
a64armhf 64 8 Byte 8 Byte 8 Byte
a64armhf 32 4 Byte 4 Byte FAIL*
a64armhf 7 4 Byte 4 Byte FAIL*
aarch64 - 8 Byte 8 Byte 8 Byte

*: Using FUSE filesystem with big d_off value.

None of those tests were done with qemu. They were all native.

That's why I wanted access to real aarch64 machines--otherwise I could have
done it with qemu on my x86_64 computer :P

Toggle quote (2 lines)
> I’m very reluctant to patching public libc headers.

Well, I don't like it either--that's why it's very very careful. My patch
doesn't change anything that users experience at runtime and basically just
prevents developers from compiling something that is using readdir without
thinking about large files first (because they HAVE TO if their programs run
on a host kernel that has bigger d_off--there's no sane way around it).

If they absolutely want to, they can set _FILE_OFFSET_BITS=32 and it will
let them do it (the argument by Marius is that people might want to do that
on embedded. But that means they'll sometimes have readdir fail--depending
on their setup (also on 32 bit kernels). Embedded is not specially exempt
from ths bug ;) ).

I think that this patch is guix-specific in the sense that it happens
pretty often that we do "-s i686-linux" on x86_64, "-s armhf-linux" on
aarch64 and expect that to work. And there's no qemu we could even patch in
those cases, because no qemu is used in the first place.

Toggle quote (4 lines)
> Also, it’s not just
> “our” problem, we should definitely discuss it with upstream and perhaps
> propose your dirent.h patch.

Sure. I think 15 years of migration path to 64 bit off_t was more than enough.

Now, I'd prefer if glibc made people choose _FILE_OFFSET_BITS explicitly on
32 bit. Everything else is a hack that WILL break unexpectedly. Users still
can choose _FILE_OFFSET_BITS=32, if they want.

Toggle quote (2 lines)
> I’m also not sure what you mean by “using it wrong”, what is “it”?

"it" is users calling readdir() without defining _FILE_OFFSET_BITS=64 in their
source file / Makefile.
This causes glibc to call getdents64 and then act all surprised when it gets
a 64 bit result back.

Toggle quote (9 lines)
> > Also, this won't work on armhf or any other 32 bit architecture--so there,
> > we would be both philosophically and practically wrong.
> >
> > Also, the "not telling us the truth for d_off on i686" is a leaky compat layer.
> > It totally DOES wind up telling us the truth sometimes (see my earlier test
> > table)--and then we have a problem.
>
> Hmm I guess I need to re-read all that, I’m overwhelmed.

Yeah--it's understandable. I'm working on understanding and fixing this problem
for a hundred hours now--it took forever for me to get to the bottom of this,
too.

And in the beginning I, too, suspected qemu. But it's totally blameless.
Nothing should be changed in qemu-user or in our qemu binfmt service.

The fundamental problem is that POSIX specifies that telldir and seekdir must
exist, and return and take a LONG, respectively.

That means that glibc has to preserve d_off it got from getdents64 (size is
64 bits), otherwise how would seekdir work?

But the offset parameter of seekdir is standardized as LONG, which means that it
won't work in the first place on 32 bit when there is either a 64 bit kernel or
a filesystem that just happens to store bigger stuff.

So glibc chose to check whether the getdents64 d_off just happens to fit into
the LONG this time around it was called. I argue that that is insane. It
would be better to always fail, or never fail--not only fail on the first d_off
that is > 2**32. When that happens is a filesystem implementation detail :P

I think the assumption was that the kernel would store an actual offset into
d_off. But it doesn't--it stores a hash in the case of ext4 (and probably
in other cases).

And in any case, even if it was an offset, that is still an unsafe way to fix
the problem.

First, someone needs to fix the POSIX standard to say "off_t", not "long".

Then, distributions who want to use 32 bit userland on 64 bit kernel need
to enable large files globally. That is a choice a distribution has to make.

Not making a choice is a choice too--the behavior will be random, and if
my research in wip-file-offset-bits-64 is any indication then very
fundamental things will be broken at unexpected places, and usually it DOES
NOT result in a build failure (without my glibc patch). That basically
means that using 32 bit native on 64 bit kernel cannot be supported in Guix
if no choice is made.

If choice "yes" is made, one needs to have a way to find these
non-build-breaking using-readdir-wrong packages. How would an alternative
way to do this look?
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl921C0ACgkQ5xo1VCww
uqVaPgf/fIxl8BxRKuGGhS5XYcC2VjATPoqeCdJwh+ewtIhNUd//GAPVz8GLvLlW
kdyReYbVfqYTeQMxvRfurh7ASV3O64fiDMrAMkv1aYmjQXLZrviOI1NgTxDs4GcL
XgSv1sKFLJwuLYqXAlpE9msOkrFX6IPhnprqFfr7OIDQtM6OgBOkWoJ7yW+fU3b0
G+2P52cknDtqLBMFMDH0bGCbILNb+mrQPxYXkJzKYEydWKuErlKnclvLUI88siWX
BAg1LQ88jUjc86PmaJnwHU8nBPj2HrsfabcRNOIOMniSGRRYbFqr4slodVQ7nTzk
tCGBCx9D/Iuvdvll6Lmq7lFuL9c1rQ==
=559R
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 2 Oct 2020 10:12
(name . Ludovic Courtès)(address . ludo@gnu.org)
20201002101237.3fcf2ef9@scratchpost.org
Toggle quote (6 lines)
> My patch
> doesn't change anything that users experience at runtime and basically just
> prevents developers from compiling something that is using readdir without
> thinking about large files first (because they HAVE TO if their programs run
> on a host kernel that has bigger d_off--there's no sane way around it).

And just to be clear, they still SHOULD choose whether or not to support
large files even if the host kernel is 32 bit (!!!!)--see table. It's
just less common there.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl924PUACgkQ5xo1VCww
uqX8ygf+ID5inG0pcZIPcfNYEiGq4/eQJoWpzzxGfmslDCVcDNoq13ey/z9t+xVW
5G4tBNh3ouz3G7rVwTA7FF3SOFvTGYeCPkF605kDDJgvzFSsC0TB3W4tCWZBC0b9
deOPmnfDHTVt+JCKl1YTgCYiMJO8M03zSyT3VWOkoQmjvSwO0poxCqLjWY50ZG6u
f6foIkLloMCZ27S7fBuieTkGoeDtS03PhEAvnMoMrYKMFLllmENAAgXR1n5/jDB7
esjyfiBjOxGZBWmj6a2acr0LKjIAn8yUCuhY++5AfHTsJatw+y2pqpvZwn+rLIYd
0/FGxwjKY3GFQi2oFi7VzVhf7tRvzw==
=z5Zb
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 2 Oct 2020 11:32
(name . Ludovic Courtès)(address . ludo@gnu.org)
20201002113238.33cdf7d2@scratchpost.org
Updated table after more tests:

system _FILE_OFFSET_BITS off_t d_off-sizeof d_off-values
---------------------------------------------------------------
x86_64 - 8 Byte 8 Byte 8 Byte
i686 - 4 Byte 4 Byte 4 Byte
i686 64 8 Byte 8 Byte FAIL*
i686 32 4 Byte 4 Byte FAIL*
i686 7 4 Byte 4 Byte FAIL*
armhf - 4 Byte 4 Byte FAIL*
armhf 64 8 Byte 8 Byte 8 Byte
armhf 32 4 Byte 4 Byte FAIL*
armhf 7 4 Byte 4 Byte FAIL*
a64armhf - 4 Byte 4 Byte FAIL*
a64armhf 64 8 Byte 8 Byte 8 Byte
a64armhf 32 4 Byte 4 Byte FAIL*
a64armhf 7 4 Byte 4 Byte FAIL*
aarch64 - 8 Byte 8 Byte 8 Byte

*: Using FUSE filesystem with big d_off value.

None of those tests were done with qemu. They were all native.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9287cACgkQ5xo1VCww
uqUAdwgAkp3NKYy6+mUkcQNej0JlQ2sS1GAxbuN6joI/c9eeS9lI8AJYYDd0GnR6
usz1zUJjNToZgngCPyNwz5m+z5ZfFQaLrcqAzNQZUIstYlSSt+rls9kVea3WboZz
rFffZgbOUiEXpHB3udUT+meWqPoWQTVacZ9tELqoBbOfEmYGwKon9qeNuhoxqMqB
xjc35UGJIm+bWskBkpjGOS3xQ2TUSMtUI81pyaOb5BETTymCenjUuxszer5hEuLX
DNZIMsI2UH2Y9GShNod+8KoEyOBSbtLYidoseC7oNYfueHc1y+0MJ/mLp50/KpId
qaT4hf5M1JaxjZgKqZ+lA3oPA2hkKg==
=LHw0
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 2 Oct 2020 11:47
20201002114747.4da6c679@scratchpost.org
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl9290MACgkQ5xo1VCww
uqW+/Af9EJhkqXzaTnRfRYtCfIPjCW9i83WYTh75MFwwRSJAzVtrbSObZsvFYzOl
dS1byAbJIHPatHELOD2Fgko5fhX3snl0gmpfTRowjVJGa/ZwLBo8yxx8LVxhMP8U
hCOB+BNjFM7uXkJqvvy6N9mWdS3W9DBy3znZvQnJMYX8z9q7/UVzoSADbWKcZtFs
pbQEhsZvXhbCRZdRjXLun+VHaXKAMHcNJ137HNmdlk2LoaAuiBGsOf2cPyLnXMJA
WzlxqvK6TxiUIgG3FahiyheYzl73mQSjs4QQ2KUfq4faITZVlcYVrrv4dW7KyVKA
ghNyZ2S8/tAMyYttmuS13b/CzvLmDg==
=QFV7
-----END PGP SIGNATURE-----


D
D
Danny Milosavljevic wrote on 6 Oct 2020 17:39
(name . Ludovic Courtès)(address . ludo@gnu.org)
20201006173932.08003c1d@scratchpost.org
Hi Ludo,

On Thu, 01 Oct 2020 09:14:10 +0200
Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (2 lines)
> Could you propose a patch for maintenance.git?

OK--patch #43829.

Toggle quote (5 lines)
> >> I have mixed feelings: fixing packages one by one doesn’t sound great,
> >> but OTOH setting the ‘CFLAGS’ environment variable globally can have
> >> unexpected side effects in some cases (overriding package-specific
> >> CFLAGS)

Doing (setenv CFLAGS "-g -O2 -D_FILE_OFFSET_BITS=64") right after
phase 'set-SOURCE-DATE-EPOCH doesn't override package-specific CFLAGS.

Quite the opposite can happen, though. But:

Toggle quote (9 lines)
> > The latter is easy to detect since I patched dirent.h in glibc exactly for that
> > reason. That way, glibc WON'T let you use it wrong (except if you explicitly
> > ask for it). On Guix systems, there is no legitimate reason to use it wrong
> > in the first place.
>
> I’m very reluctant to patching public libc headers. Also, it’s not just
> “our” problem, we should definitely discuss it with upstream and perhaps
> propose your dirent.h patch.

I've reported it upstream.

However, GNU gcc and glibc support a lot of weird architectures--but Guix
system really doesn't. So it's much easier for us to get a good patch than
it is for them.

Toggle quote (4 lines)
> A diff at one point in time (if we ever managed to get a usable diff) is
> not enough: problems could pop up anytime. Setting ‘CFLAGS’ globally as
> an environment variable seems risky.

We are about 15 years late--so all other distributions already triggered
most of the bugs in that time. I don't think it's that bad anymore...

That's why I would prefer setting CFLAGS globally anyway.

And I don't have the energy to manually FIND AND fix however many packages
are affected otherwise.

Having this problem in 2020 is ridiculous--it's like someone accidentially
enabled a time machine...

The only reason this didn't fall on our head on x86_64 is because on 64 bit
systems something like it is the default anyway.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCgAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAl98j7QACgkQ5xo1VCww
uqUvrgf+IotLG5H1B4fcZhvssJhkOrqpzLJmq4n7YnaYiMuUmxAV94jLvyjHWz66
Ab9zW0EFg8eKpR7GpvXI3J35aPUk+aHUb6frhkbp7ky6ItN6IfJYpyNyHJZv3eb4
bmdRAe6uW2a/09oH/Zhtji1cI9CH2tMudfWGfhNUZJwXeM2ZE99HDr4vsSeu0gYu
wEwJMxBdRwnITYofQBbPxnbYGipzLObAErew5zMSpoC4RXPHWbUsNGgkuSlnpSLu
/nUeWfJNkqOLHUj8vsemTyHw+kZ5qVsaKqOpEfjLuqK5ebx65XoDmAezcHwV05d4
4L/P5clGSvZ2ZfD+V1SPGD/b2af7Dg==
=gZXy
-----END PGP SIGNATURE-----


?