Hi Marius, On Fri, 25 Sep 2020 01:11:18 +0200 Marius Bakke wrote: > Arguably running code for foreign architectures through QEMU binfmt is > something of a hack. The problem here is not qemu. It's not even the size of the whole struct dirent that's the problem, or endianness, or alignment, or any of that stuff. The problem is that if you have a 64-bit hash value then you can't fit it into a 32-bit field. You have to make the field bigger. It's the only non-insane way to fix this and still be able to use the host filesystem from the guest. This bug is making readdir randomly and silently succeed or fail. Note: The bug I'm referring to is caused by having 32 bit offsets in user space in the first place--nothing else. That is the root of the problem, not qemu, not ext4, not the kernel. Having 32 bit offsets in user space when kernel space uses 64 bit offsets is just weird (note: the kernel always uses 64 bit offsets--also on 32 bit architectures!). Note: This also affects 9p, which has been recently "fixed" in the Linux kernel. >Why is this not an issue with i686 on x86_64 kernels? I'm not sure. I'll check. >Or armhf on AArch64? I cannot check that, but it totally should be a problem there. It has nothing to do with qemu specifically, and IMO qemu is right in not "fixing" it (because it's NOT a problem in qemu), and in expecting 32 bit guests not to use the getdents64 system call :P But even using getdents (no 64) as a flag what size d_off the caller expects is an ugly workaround. Not because of the flag, but because of what you have to do if it's set. The right fix is to switch to LFS (and thus also 64 bit d_off) on 32 bit guests. It's 2020; normal systems have drives a lot bigger than 4 GiB. Even Raspberrys have bigger drives. Also, just for the record, glibc always calls getdents64, never getdents. Using getdents64 and then being surprised when a 64 bit result comes back is seriously weird. And they are still defending it! If they didn't do that, at least the kernel would know something is up, and qemu would still pass through getdents as getdents to the host kernel and getdents64 as getdents64 to the host kernel. Also, qemu could warn on calls to getdents (and not on calls to getdents64). There are no downsides to knowing what the user expects. But even then, user space probably would have LFS disabled. That means as soon as user space mounts something from NFS or 9p or whatever on their (let's say embedded) platform, they would have this problem again, and it would be very difficult to find what happened. Why keep this problem alive? >This problem only manifests when using QEMUs syscall emulation, right? No. It's a fundamental problem with the slot you want to store something in being too small for it. We just didn't see it yet, because usually the system *silently returns* halfway through the readdir loop and does not tell us about any problems once it encounters any d_off >= 2**32 along the way. In the "json-c" bug, luckily json-c or cmake actually NEEDED a file that had not been returned because of that. That is not a given. >Mandating that every package *must* be patched to support it seems user-hostile. Ok, then how about a flag that users can set in order to let it compile regardless? But then they shouldn't use that flag in any (non-cross-compiled) guix packages. We can mention it in the news so people are not surprised. (Thinking about it, the flag could just be "-D_FILE_OFFSET_BITS=32". Because when you are setting it explicitly, you probably know what you are doing) It really depends on whether we officially support running 32 bit guests on 64 bit hosts. Qemu has nothing to do with it. Since the majority of our build farm runs qemu transparent emulation, and since a majority of our ARM packages are built using it, it seems that we decided to support 32 bit guests on 64 bit hosts. Well, then something like this patch has to be done, otherwise we have Schrödinger's build system--and no one likes that. (I want to stress that the main problem is the case where readdir SUCCEEDS even though there is a problem; if readdir would always fail in the situation with a 32 bit guest on 64 bit host, that would be much less bad. But it doesn't always fail!) I'd be really uneasy if the sanity check this patch introduces were only enabled when explicitly setting a flag--for something as bad as this. Especially for an error that silently makes it into base derivations and then distributes all the way to who-knows-what normal client derivations. Keeping the user from shooting himself in the foot by default if he doesn't say "yes, please shoot me in the foot" beforehand is not "user-hostile". We could also do a deprecation warning instead: struct dirent* readdir(DIR* d) __attribute__((deprecated("Unsafe if not using large file support"))); f1.c:5:2: warning: 'readdir' is deprecated: Unsafe if not using large file support [-Wdeprecated-declarations] But then the warning would not be emitted in the, for example, json-c package, but in cmake. I would never have found the bug that way. I guess now we know, but still... only emitting a warning seems weak for something this bad. So I'm against it. > I'm more in favor of dropping it on the > build farm, >or just keep fixing things on a per-package basis. That means you have to *find* the things first. The only reason glibc found the problem this one time is because the d_off telldir pointer just happened to be >= 2**32. If it's not bigger one time, and is bigger another time in the same package build run, then you get weird unreproducible results. That is horrible. A copy-recursively could leave half the files off and not tell anyone. PLEASE let's do something about it. Also, let's delete all our ARM substitutes we have so far. > A less user-hostile solution could perhaps be to (setenv "CFLAGS" > "-D_FILE_OFFSET_BITS=64") on 32-bit architectures in gnu-build-system. Sure, we can definitely do that in addition. It's a good idea. There can be no ABI compatibility problem inside Guix if *all* Guix packages use that flag. While we are at it, we can do it on ALL architectures. Because that's what we actually want. Should there be a 128 bit CPU, we still want all CPUs (no matter how many bit registers the CPU has) use the same file offset bit size. But will that be picked up everywhere? cmake? Rust C extensions? Python extensions? > Not sure whether that could cause any adverse effects. It will. And if we want to find out where those effects would be, this patch--or one with warnings (that one is only useful if there is a way to automatically collect AND KEEP all warnings into one place)--is how to find out. > But again, I don't like the idea of optimizing for QEMUs user-mode emulation. It's not optimizing for qemu specifically. Qemu is not at fault here. (guest) glibc is at fault. Because of the way we build it. I've thought through a few possible fixes for this problem: (1) Wait until the kernel adds a "I-have-space-for-x-bits" field to the getdents64 system call and for qemu to actually use it. But how would qemu know how much space the guest glibc has in its field? But fine, let's say they have a crystal ball. Even then, what is ext4 going to do with this request? Just cut the cookie somewhere and throw away half? That's not safe. So this solution is right out. (2) Have glibc use getdents on 32 bit and getdents64 on 64 bit. See above--only now qemu can tell know much space the guest glibc has, and so can the host kernel (which is much better). Cutting the cookie is not safe. So this solution is still out, barely. (3) Have qemu maintain a mapping table per file of which 64-bit d_off is associated with which weird cookie qemu invents and remembers. That doesn't help us in the case where you DO NOT USE qemu but still have a 32 bit executable running on a 64 bit host (natively). Therefore, this solution is out. (4) Build user space with large file support. There are no downsides to this, and as a distribution we can indeed decide only to support large file guix packages. I argue that this is the right solution for all "Guix system" hosts, because of the "SD card" argument above. Also, directories with Guix on it aren't exactly small either. So you need it anyway. If there are embedded platforms with disks smaller than 4 GiB used in guix, we can given those glibcs an original dirent.h (they will fail loudly--so we will find them easily). I prefer this solution. (5) Build the qemu that emulates 32 bit ARM for 32 bit i686 (this is not a typo). In that case the kernel would enable compatibility mode (X86_32 personality) and it would work. I want to stress that this compatibility mode calculates different hashes depending on the program that is requesting it, on the same host, directory and entry. This is not something one would want, especially when building things--the kernel is basically lying to you. See kernel option CONFIG_X86_32--which we enable. It would work okay in practice, because user space programs usually don't store d_off. But (4) is just better. P.S. by now, my local build found a binutils 2.35.1 bfd plugin, and (libstdc++-v3/src/filesystem/dir.cc in) libstdc++ 7.5.0, that also don't use large files and thus would also only work sporadically in random ways. How many hundreds of packages does this affect that we don't know yet? PPS. I get a lot of "grep: invalid option -o" messages when building glibc-2.32--even without the patch. That's probably not good either. PPPS. new patch will come soon.