System test partition.img differs in size across hosts(?)

Status: Done
Participants: david larsson, Leo Famulari, Maxim Cournoyer, Tobias Geerinckx-Rice, Mathieu Othacehe
Owner: unassigned
Submitted by: Tobias Geerinckx-Rice
Severity: normal
Tobias Geerinckx-Rice wrote on 11 Jan 2022 20:31
(name . Bug reports for GNU Guix)(address . bug-guix@gnu.org)
874k6akqlx.fsf@nckx
Guix,

This is weird. On berlin:

$ guix build /gnu/store/91wjmydy556ibl38xydpb8yisp3gvx8w-partition.img.drv
[…]
Creating filesystem with 351 1k blocks and 40 inodes
[…]
/gnu/store/q18ca3ilma0h5hpn4s39xhzn0kc7jm5x-partition.img

On my laptop:

$ guix build /gnu/store/91wjmydy556ibl38xydpb8yisp3gvx8w-partition.img.drv
[…]
Creating filesystem with 242 1k blocks and 32 inodes
[…]
Copying files into the device: ext2fs_symlink: Could not allocate inode in ext2 filesystem while creating symlink "system"
__populate_fs: Could not allocate inode in ext2 filesystem while writing symlink"system"
mke2fs: Could not allocate inode in ext2 filesystem while populating file system

This happens with both a tmpfs and a bcachefs /tmp.

The same make check-system TESTS="openvswitch" fails for Marius as
well, although I don't know the exact output. They tested btrfs
and tmpfs, and suggested a kernel regression.

I don't understand how that would cause this, but I'm forced to
agree: something spooky is going on in the chroot and the kernel
is a big variable.

The attached patch was written before I was aware of the above
weirdness and only works around the issue.

Kind regards,

T G-R
From 18f288d4b69faa73ffb75488dbc924640441d7ee Mon Sep 17 00:00:00 2001
From: Tobias Geerinckx-Rice <me@tobias.gr>
Date: Tue, 11 Jan 2022 19:56:53 +0100
Subject: [PATCH] build: image: Account for fixed-size file system structures.

* gnu/build/image.scm (estimate-partition-size): Enforce a 1-MiB minimum.
---
gnu/build/image.scm | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/gnu/build/image.scm b/gnu/build/image.scm
index bdd5ec25a9..81caa424f8 100644
--- a/gnu/build/image.scm
+++ b/gnu/build/image.scm
@@ -3,7 +3,7 @@
 ;;; Copyright © 2016 Christine Lemmer-Webber <cwebber@dustycloud.org>
 ;;; Copyright © 2016, 2017 Leo Famulari <leo@famulari.name>
 ;;; Copyright © 2017 Marius Bakke <mbakke@fastmail.com>
-;;; Copyright © 2020 Tobias Geerinckx-Rice <me@tobias.gr>
+;;; Copyright © 2020, 2022 Tobias Geerinckx-Rice <me@tobias.gr>
 ;;; Copyright © 2020 Mathieu Othacehe <m.othacehe@gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
@@ -62,8 +62,10 @@ (define (size-in-kib size)
 
 (define (estimate-partition-size root)
   "Given the ROOT directory, evaluate and return its size. As this doesn't
-take the partition metadata size into account, take a 25% margin."
-  (* 1.25 (file-size root)))
+take the partition metadata size into account, take a 25% margin. As this in
+turn doesn't take any constant overhead into account, force a 1-MiB minimum."
+  (max (ash 1 20)
+       (* 1.25 (file-size root))))
 
 (define* (make-ext-image partition target root
                          #:key
--
2.34.0
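
The effect of the patch can be checked with a little arithmetic. The sketch below is a Python rendering of the patched estimate-partition-size formula, not the Guix code itself; the function name, the ONE_MIB constant and the sample byte counts are chosen for illustration only.

```python
ONE_MIB = 1 << 20  # same value as (ash 1 20) in the patch


def estimate_partition_size(root_bytes):
    """Return 1.25 * root_bytes, but never less than 1 MiB."""
    return max(ONE_MIB, 1.25 * root_bytes)


# A tiny root directory (hypothetical 200 KiB) is raised to the 1-MiB floor.
small = estimate_partition_size(200 * 1024)

# A larger one (hypothetical 4 MiB) keeps the plain 25% margin.
large = estimate_partition_size(4 * ONE_MIB)
```

Both file systems in the logs above (242 and 351 1k blocks) are well under 1 MiB, so with the floor in place both hosts would presumably end up with the same minimum partition size, whatever FILE-SIZE reports for such a small tree.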

Tobias Geerinckx-Rice wrote on 11 Jan 2022 20:44
87zgo2jbr1.fsf@nckx
The most likely culprit is a change or difference in how the
kernel answers FILE-SIZE's ‘how much disc space does FILE
consume?’ — rounding it to N blocks or bytes, including or
excluding directory sizes, differing reported directory sizes,
etc.

I'll do more testing.

Kind regards,

T G-R
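
One way to probe this hypothesis is to sum st_size over a tree with and without the directory entries themselves. The Python sketch below is a stand-in for FILE-SIZE, not a port of it: tree_size and its include_dirs flag are hypothetical names, and whether the Guix procedure counts directories is not established in this thread. The st_size reported for a directory is file-system specific, so the two totals can differ by different amounts on different hosts.

```python
import os


def tree_size(root, include_dirs=True):
    """Sum lstat().st_size over ROOT and everything below it.

    With include_dirs=True the directory entries themselves are counted,
    and their st_size depends on the file system that holds them.
    """
    # A plain file or a symlink is simply its own size.
    if os.path.islink(root) or not os.path.isdir(root):
        return os.lstat(root).st_size

    total = os.lstat(root).st_size if include_dirs else 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Symlinks to directories show up in dirnames but are not
            # descended into; count the link itself like a file, and
            # real directories only when asked to.
            if os.path.islink(path) or include_dirs:
                total += os.lstat(path).st_size
    return total
```

Comparing tree_size(path) with tree_size(path, include_dirs=False) for the same tree on an ext4 host and on a btrfs or tmpfs host would show whether directory entries account for the difference.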

Maxim Cournoyer wrote on 25 Jan 2022 18:54
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 53194@debbugs.gnu.org)
87r18v4s5x.fsf@gmail.com
Hi Tobias,

[...]

> diff --git a/gnu/build/image.scm b/gnu/build/image.scm
> index bdd5ec25a9..81caa424f8 100644
> --- a/gnu/build/image.scm
> +++ b/gnu/build/image.scm
> @@ -3,7 +3,7 @@
> ;;; Copyright © 2016 Christine Lemmer-Webber <cwebber@dustycloud.org>
> ;;; Copyright © 2016, 2017 Leo Famulari <leo@famulari.name>
> ;;; Copyright © 2017 Marius Bakke <mbakke@fastmail.com>
> -;;; Copyright © 2020 Tobias Geerinckx-Rice <me@tobias.gr>
> +;;; Copyright © 2020, 2022 Tobias Geerinckx-Rice <me@tobias.gr>
> ;;; Copyright © 2020 Mathieu Othacehe <m.othacehe@gmail.com>
> ;;;
> ;;; This file is part of GNU Guix.
> @@ -62,8 +62,10 @@ (define (size-in-kib size)
>
> (define (estimate-partition-size root)
> "Given the ROOT directory, evaluate and return its size. As this doesn't
> -take the partition metadata size into account, take a 25% margin."
> - (* 1.25 (file-size root)))
> +take the partition metadata size into account, take a 25% margin. As this in
> +turn doesn't take any constant overhead into account, force a 1-MiB minimum."
> + (max (ash 1 20)
> + (* 1.25 (file-size root))))
>
> (define* (make-ext-image partition target root
> #:key

Looks reasonable to me (although it is interesting that the behavior is
not the same across machines...).

While at it, you may want to fix this docstring:

(define (file-size file)
- "Return the size of bytes of FILE, entering it if FILE is a directory."
+ "Return the size in bytes of FILE, entering it if FILE is a directory."
(file-system-fold (const #t)
(lambda (file stat result) ;leaf
(+ (stat:size stat) result))

in guix/build/store-copy.scm.

Thanks!

Maxim
Leo Famulari wrote on 4 Feb 2022 05:43
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 53194@debbugs.gnu.org)
Yfyu/uHQun5pevhM@jasmine.lan
On Tue, Jan 11, 2022 at 08:31:27PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> On my laptop:
>
> --8<---------------cut here---------------start------------->8---
> $ guix build /gnu/store/91wjmydy556ibl38xydpb8yisp3gvx8w-partition.img.drv
> […]
> Creating filesystem with 242 1k blocks and 32 inodes
> […]
> Copying files into the device: ext2fs_symlink: Could not allocate inode in
> ext2 filesystem while creating symlink "system"
> __populate_fs: Could not allocate inode in ext2 filesystem while writing
> symlink"system"
> mke2fs: Could not allocate inode in ext2 filesystem while populating file
> system
> --8<---------------cut here---------------end--------------->8---

Same here.

> This happens with both a tmpfs and a bcachefs /tmp.

And also on btrfs.
Leo Famulari wrote on 4 Feb 2022 06:17
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 53194@debbugs.gnu.org)
Yfy21H105IgGDlbk@jasmine.lan
On Tue, Jan 11, 2022 at 08:31:27PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> This is weird. On berlin:

Berlin is using ext4, right?

> On my laptop:
> --8<---------------cut here---------------start------------->8---
[...]
> mke2fs: Could not allocate inode in ext2 filesystem while populating file
> system
> --8<---------------cut here---------------end--------------->8---
>
> This happens with both a tmpfs and a bcachefs /tmp.

And it fails for me on btrfs, but not on ext4.

I tested with Guix kernels 5.16.5, 5.15.17, and 5.15.15, as well as
Debian's 5.10.0-11-amd64.
Leo Famulari wrote on 4 Feb 2022 06:23
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 53194@debbugs.gnu.org)
Yfy4UlFIFVe/eFuK@jasmine.lan
On Tue, Jan 11, 2022 at 08:44:11PM +0100, Tobias Geerinckx-Rice via Bug reports for GNU Guix wrote:
> The most likely culprit is a change or difference in how the kernel answers
> FILE-SIZE's ‘how much disc space does FILE consume?’ — rounding it to N
> blocks or bytes, including or excluding directory sizes, differing reported
> directory sizes, etc.

I'm going to build the version of the kernel used on berlin and test
with that.

I do find myself wondering if something in Guix is measuring the wrong
thing: maybe we are measuring the size of files compressed in transit,
rather than their uncompressed size on disk. Or something like that.
Leo Famulari wrote on 4 Feb 2022 06:32
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)(address . 53194@debbugs.gnu.org)
Yfy6VPTWFWaDVMqd@jasmine.lan
On Fri, Feb 04, 2022 at 12:23:30AM -0500, Leo Famulari wrote:
> I'm going to build the version of the kernel used on berlin and test
> with that.

Actually, I already had it built. This bug still manifests on that version
of the kernel. So...

> I do find myself wondering if something in Guix is measuring the wrong
> thing: maybe we are measuring the size of files compressed in transit,
> rather than their uncompressed size on disk. Or something like that.

I'm still leaning towards something besides a change in the kernel.
Leo Famulari wrote on 4 Feb 2022 17:55
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)
Yf1agsMoPFgOzUxZ@jasmine.lan
On Fri, Feb 04, 2022 at 12:32:04AM -0500, Leo Famulari wrote:
> I'm still leaning towards something besides a change in the kernel.

Bisecting the Guix Git repo suggests that the problem was
introduced in commit 2d12ec724ea2, "scripts: system: Rationalize
persistency."
Leo Famulari wrote on 4 Feb 2022 18:04
(no subject)
(name . GNU bug tracker automated control server)(address . control@debbugs.gnu.org)
Yf1crbaC6auUKuqy@jasmine.lan
block 53214 with 53194
Maxim Cournoyer wrote on 6 Feb 2022 05:42
Re: bug#53194: System test partition.img differs in size across hosts(?)
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 53194@debbugs.gnu.org)
87fsow38sh.fsf@gmail.com
Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Tobias,
>
> [...]
>
>> diff --git a/gnu/build/image.scm b/gnu/build/image.scm
>> index bdd5ec25a9..81caa424f8 100644
>> --- a/gnu/build/image.scm
>> +++ b/gnu/build/image.scm
>> @@ -3,7 +3,7 @@
>> ;;; Copyright © 2016 Christine Lemmer-Webber <cwebber@dustycloud.org>
>> ;;; Copyright © 2016, 2017 Leo Famulari <leo@famulari.name>
>> ;;; Copyright © 2017 Marius Bakke <mbakke@fastmail.com>
>> -;;; Copyright © 2020 Tobias Geerinckx-Rice <me@tobias.gr>
>> +;;; Copyright © 2020, 2022 Tobias Geerinckx-Rice <me@tobias.gr>
>> ;;; Copyright © 2020 Mathieu Othacehe <m.othacehe@gmail.com>
>> ;;;
>> ;;; This file is part of GNU Guix.
>> @@ -62,8 +62,10 @@ (define (size-in-kib size)
>>
>> (define (estimate-partition-size root)
>> "Given the ROOT directory, evaluate and return its size. As this doesn't
>> -take the partition metadata size into account, take a 25% margin."
>> - (* 1.25 (file-size root)))
>> +take the partition metadata size into account, take a 25% margin. As this in
>> +turn doesn't take any constant overhead into account, force a 1-MiB minimum."
>> + (max (ash 1 20)
>> + (* 1.25 (file-size root))))
>>
>> (define* (make-ext-image partition target root
>> #:key
>
> Looks reasonable to me (although it is interesting that the behavior is
> not the same across machines...).
>
> While at it, you may want to fix this docstring:
>
> (define (file-size file)
> - "Return the size of bytes of FILE, entering it if FILE is a directory."
> + "Return the size in bytes of FILE, entering it if FILE is a directory."
> (file-system-fold (const #t)
> (lambda (file stat result) ;leaf
> (+ (stat:size stat) result))
>
> in guix/build/store-copy.scm.

FYI, I pushed this workaround in
3c3c9d259f87fbc8c1d9551af32e79f9f168f596.

Thanks,

Maxim
Leo Famulari wrote on 6 Feb 2022 18:41
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
YgAIWLZD4T85IDo/@jasmine.lan
On Sat, Feb 05, 2022 at 11:42:38PM -0500, Maxim Cournoyer wrote:
> FYI, I pushed this workaround in
> 3c3c9d259f87fbc8c1d9551af32e79f9f168f596.

I don't see this commit in the repo.
Maxim Cournoyer wrote on 7 Feb 2022 22:29
(name . Leo Famulari)(address . leo@famulari.name)
87zgn2z7p1.fsf@gmail.com
Hi Leo!

Leo Famulari <leo@famulari.name> writes:

> On Sat, Feb 05, 2022 at 11:42:38PM -0500, Maxim Cournoyer wrote:
>> FYI, I pushed this workaround in
>> 3c3c9d259f87fbc8c1d9551af32e79f9f168f596.
>
> I don't see this commit in the repo.

Thank you for letting me know. I hate when this happens; usually it is
because my Emacs environment doesn't run inside 'guix shell -D guix', so
'make authenticate' fails due to a missing dependency, which in turn
fails the git push.

Anyway, I have now pushed the linux-libre series for real (which
included this), as e5c06dce93.

Thanks!

Maxim
david larsson wrote on 17 Feb 2022 17:37
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
686be3f2c1cf4f7251533c521fc7bfa4@selfhosted.xyz
On 2022-01-11 20:31, Tobias Geerinckx-Rice via Bug reports for GNU Guix
wrote:
> Guix,
>
> This is weird. On berlin:
>
> --8<---------------cut here---------------start------------->8---
> $ guix build
> /gnu/store/91wjmydy556ibl38xydpb8yisp3gvx8w-partition.img.drv
> […]
> Creating filesystem with 351 1k blocks and 40 inodes
> […]
> /gnu/store/q18ca3ilma0h5hpn4s39xhzn0kc7jm5x-partition.img
> --8<---------------cut here---------------end--------------->8---
>
> On my laptop:
>
> --8<---------------cut here---------------start------------->8---
> $ guix build
> /gnu/store/91wjmydy556ibl38xydpb8yisp3gvx8w-partition.img.drv
> […]
> Creating filesystem with 242 1k blocks and 32 inodes
> […]
> Copying files into the device: ext2fs_symlink: Could not allocate
> inode in ext2 filesystem while creating symlink "system"
> __populate_fs: Could not allocate inode in ext2 filesystem while
> writing symlink"system"
> mke2fs: Could not allocate inode in ext2 filesystem while populating
> file system
> --8<---------------cut here---------------end--------------->8---
>
> This happens with both a tmpfs and a bcachefs /tmp.
>
> The same make check-system TESTS="openvswitch" fails for Marius as
> well, although I don't know the exact output. They tested btrfs and
> tmpfs, and suggested a kernel regression.
>
> I don't understand how that would cause this, but I'm forced to agree:
> something spooky is going on in the chroot and the kernel is a big
> variable.
>
> The attached patch was written before I was aware of above weirdness
> and only works around the issue.
>
> Kind regards,
>
> T G-R

I hope I'm not totally off here, and that this is worth mentioning:
Are the hosts using the same version? It might produce different sizes
if the hosts are on different guix commits - or is this not a
possibility at all if the derivations have the same hashes?

...because I just happened to notice that the guix system image command
now produces images that are larger than the size specified with the
--image-size option by exactly the root offset plus the esp-partition
size. I think this has changed in the last 1-2 years (since Marius B.'s
blog post regarding Ganeti). When I set up Ganeti according to that blog
post I could (IIRC) create guix instances with the ganeti-instance-guix
create script without problems, and that script produces images with the
guix system image --image-size=X command. When I did so again 1-2 weeks
ago, it failed with the error that the Ganeti disks were too small. I
could resolve the size issue by subtracting, in the instance create
script, the exact number of bytes corresponding to the root offset and
the esp-partition sizes, as defined in (gnu system image), from the
value passed to --image-size=X.

Maybe some commit has changed the size output of guix system image?


Best regards,
David
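
The workaround David describes boils down to a subtraction, assuming his observation holds that the finished image is larger than --image-size by exactly the root offset plus the ESP size. A minimal Python sketch follows; image_size_option is a hypothetical helper and the byte values in the usage line are made up, since the actual root offset and ESP size from (gnu system image) are not quoted in this thread.

```python
def image_size_option(disk_bytes, root_offset_bytes, esp_bytes):
    """Return the value to pass to --image-size so that the finished image,
    which also contains the root offset and the ESP, fits in disk_bytes."""
    value = disk_bytes - root_offset_bytes - esp_bytes
    if value <= 0:
        raise ValueError("disk is too small for the fixed overhead")
    return value


# Hypothetical numbers: a 10 GiB disk, a 1 MiB root offset, a 40 MiB ESP.
size = image_size_option(10 * 2**30, 1 * 2**20, 40 * 2**20)
```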
Mathieu Othacehe wrote on 31 Oct 2022 09:56
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87leowgyn1.fsf@gnu.org
Hello,

> FYI, I pushed this workaround in
> 3c3c9d259f87fbc8c1d9551af32e79f9f168f596.

I'm not able to reproduce this issue with or without the workaround, by
running the openvswitch test on Berlin and on my laptop. I think we can
close it for now and re-open it if someone finds a more reliable
reproducer.

Thanks,

Mathieu
Closed

This issue is archived.
