Store references inside compressed data

  • Open
Details
5 participants
  • Leo Prikler
  • Leo Famulari
  • Ludovic Courtès
  • Tobias Geerinckx-Rice
  • Miguel Ángel Arruga Vivas
Owner
unassigned
Submitted by
Miguel Ángel Arruga Vivas
Severity
wishlist
Miguel Ángel Arruga Vivas wrote on 5 Jan 2021 15:36
(address . bug-guix@gnu.org)
8735zf30yw.fsf@gmail.com
There are several binary formats that allow compression of the
executable image, or some of its data, which is decompressed at runtime:

- Kernel images.
- Compressed libraries: e.g. Smalltalk modules.
- Compressed executable or data files: e.g. library.el.gz.

These aren't taken into account by the grafting process, which may lead
to issues when store paths are located inside that kind of file.
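
As a minimal sketch of why a byte-level scanner misses such references, the Python snippet below looks for a store hash in a small Lisp file before and after gzip compression. The hash, the package name, and the `references` helper are hypothetical; they only stand in for the real reference scanner, which this thread does not show.

```python
import gzip

# Hypothetical store file name; the 32-character hash is made up.
HASH = "0123456789abcdfghijklmnpqrsvwxyz"
STORE_FILE = f"/gnu/store/{HASH}-mpg321-0.3.2/bin/mpg321"


def references(data: bytes, candidates: list[str]) -> set[str]:
    """Return the candidate hashes that occur verbatim in DATA."""
    return {h for h in candidates if h.encode() in data}


# A small Lisp file that embeds the store file name.
source = (";; player configuration\n" * 20
          + f'(defvar my-player "{STORE_FILE}")\n').encode()
compressed = gzip.compress(source)

plain_refs = references(source, [HASH])    # scanning the plain file
gz_refs = references(compressed, [HASH])   # scanning the .gz bytes

print("plain file:", plain_refs)
print("gzip file: ", gz_refs)
```

The hash is found in the plain bytes only. After DEFLATE coding it no longer appears as a contiguous byte string, so neither the garbage collector nor the grafting code can see it.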
Leo Famulari wrote on 5 Jan 2021 21:22
(name . Miguel Ángel Arruga Vivas)(address . rosen644835@gmail.com)(address . 45676@debbugs.gnu.org)
X/TKctxY2x8DIw8A@jasmine.lan
On Tue, Jan 05, 2021 at 03:36:07PM +0100, Miguel Ángel Arruga Vivas wrote:
> There are several binary formats that allow compression of the
> executable image, or some of its data, which is decompressed at runtime:
>
> - Kernel images.
> - Compressed libraries: e.g. Smalltalk modules.
> - Compressed executable or data files: e.g. library.el.gz.
>
> These aren't taken into account by the grafting process, which may lead
> to issues when store paths are located inside that kind of file.

It's a serious problem, and not just because of grafting. These obscured
references can cause things to be garbage collected inappropriately.

Here is an older case of the same problem:


It was resolved by patching GCC.
Leo Famulari wrote on 5 Jan 2021 21:22
(name . Miguel Ángel Arruga Vivas)(address . rosen644835@gmail.com)(address . 45676@debbugs.gnu.org)
X/TKic4MWamkuxHa@jasmine.lan
On Tue, Jan 05, 2021 at 03:36:07PM +0100, Miguel Ángel Arruga Vivas wrote:
> There are several binary formats that allow compression of the
> executable image, or some of its data, which is decompressed at runtime:
>
> - Kernel images.
> - Compressed libraries: e.g. Smalltalk modules.
> - Compressed executable or data files: e.g. library.el.gz.
>
> These aren't taken into account by the grafting process, which may lead
> to issues when store paths are located inside that kind of file.

If you have specific instances of this type of bug, please report them.
Tobias Geerinckx-Rice wrote on 5 Jan 2021 23:33
(name . Miguel Ángel Arruga Vivas)(address . rosen644835@gmail.com)
871rezf1yg.fsf@nckx
Hi!

Miguel Ángel Arruga Vivas wrote:
> These aren't taken into account by the grafting process, which may lead
> to issues when store paths are located inside that kind of file.

It's true. It's a known trade-off of an otherwise
almost-zero-effort yet fast reference scanner. I don't think it's
a bug per se, but it is something of which to be aware. I also
think this trade-off is worth it.

Luckily, this case is easier to fix than the infamous
http://issues.guix.gnu.org/24703, because the right solution is
simple:

> - Compressed libraries: e.g. Smalltalk modules.
> - Compressed executable or data files: e.g. library.el.gz.

Let's stop installing compressed executables & data files. We
already avoid compressed .jars and other renamed zip files. It
ain't right.

It's not 1998, my hard drive isn't 1.1GB, and I didn't just
reinstall Slackware because I ‘accidentally’ gzexe'd gzip.

Gzipping a tiny handful of Lisp or Smalltalk files is pointless
when zstd {,de}compresses my entire 500GB SSD better and faster,
at the file system level where it now squarely belongs. Without
breaking Guix.

Kind regards,

T G-R
Leo Prikler wrote on 6 Jan 2021 09:54
(address . 45676@debbugs.gnu.org)
06ba5c0f24bdcdb706990c9169093aba72463302.camel@student.tugraz.at
Hi!
Am Dienstag, den 05.01.2021, 23:33 +0100 schrieb Tobias Geerinckx-Rice:
> Let's stop installing compressed executables & data files. We
> already avoid compressed .jars and other renamed zip files. It
> ain't right.
Would this be strictly necessary even if the same references are kept
through other files, e.g. uncompressed binaries?
I'll attach a patch that fixes Emacs, just in case.

Regards, Leo
From 57c23bf6ecac79c397cb49ff251176ec3a7b1cf5 Mon Sep 17 00:00:00 2001
From: Leo Prikler <leo.prikler@student.tugraz.at>
Date: Wed, 6 Jan 2021 09:24:07 +0100
Subject: [PATCH] gnu: emacs: Don't install compressed archives.

See <http://issues.guix.gnu.org/45676#3>.

* gnu/packages/emacs.scm (emacs)[#:configure-flags]:
Add --without-compress-install.
(emacs-minimal)[#:configure-flags]: Likewise.
---
gnu/packages/emacs.scm | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/emacs.scm b/gnu/packages/emacs.scm
index ca14584ada..aa636b8c9b 100644
--- a/gnu/packages/emacs.scm
+++ b/gnu/packages/emacs.scm
@@ -124,6 +124,7 @@
`(#:tests? #f ; no check target
#:configure-flags (list "--with-modules"
"--with-cairo"
+ "--without-compress-install"
"--disable-build-details")
#:phases
(modify-phases %standard-phases
@@ -355,7 +356,8 @@ also enabled and works without glitches even on X server."))))
(arguments
(substitute-keyword-arguments (package-arguments emacs)
((#:configure-flags flags ''())
- `(list "--with-gnutls=no" "--disable-build-details"))
+ `(list "--with-gnutls=no" "--disable-build-details"
+ "--without-compress-install"))
((#:phases phases)
`(modify-phases ,phases
(delete 'restore-emacs-pdmp)
--
2.30.0
Ludovic Courtès wrote on 6 Jan 2021 12:35
(name . Leo Famulari)(address . leo@famulari.name)
87eeiymh6h.fsf@gnu.org
Hi,

Leo Famulari <leo@famulari.name> skribis:

> On Tue, Jan 05, 2021 at 03:36:07PM +0100, Miguel Ángel Arruga Vivas wrote:
>> There are several binary formats that allow compression of the
>> executable image, or some of its data, which is decompressed at runtime:
>>
>> - Kernel images.
>> - Compressed libraries: e.g. Smalltalk modules.
>> - Compressed executable or data files: e.g. library.el.gz.
>>
>> These aren't taken into account by the grafting process, which may lead
>> to issues when store paths are located inside that kind of file.
>
> If you have specific instances of this type of bug, please report them.

Agreed. The general issue is “well known” as we say, but what I think
we need to do is look for specific instances and address them.

Ludo’.
Miguel Ángel Arruga Vivas wrote on 6 Jan 2021 16:03
control message for bug #45676
(address . control@debbugs.gnu.org)
87y2h6151r.fsf@gmail.com
severity 45676 wishlist
quit
Miguel Ángel Arruga Vivas wrote on 6 Jan 2021 17:57
Re: bug#45676: Store references inside compressed data
(name . Ludovic Courtès)(address . ludo@gnu.org)
87mtxm0zqw.fsf@gmail.com
Hi Ludo and Leo,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Leo Famulari <leo@famulari.name> skribis:
>
>> On Tue, Jan 05, 2021 at 03:36:07PM +0100, Miguel Ángel Arruga Vivas wrote:
>>> There are several binary formats that allow compression of the
>>> executable image, or some of its data, which is decompressed at runtime:
>>>
>>> - Kernel images.
>>> - Compressed libraries: e.g. Smalltalk modules.
>>> - Compressed executable or data files: e.g. library.el.gz.
>>>
>>> These aren't taken into account by the grafting process, which may lead
>>> to issues when store paths are located inside that kind of file.
>>
>> If you have specific instances of this type of bug, please report them.
>
> Agreed. The general issue is “well known” as we say, but what I think
> we need to do is look for specific instances and address them.

It can be tagged notabug if you think so. I've tagged it as
wishlist (I should have done it before) for that reason (it's "well
known"), but I haven't found any specific instance yet. OTOH, I think
it might be closely related to #33848, as both issues could be
solved by extending the dumpPath code path---or the Scheme
implementation equivalent, as pointed out there.

Happy hacking!
Miguel
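
The idea of extending the scanner could look roughly like the sketch below. It is written in Python for illustration and is not the daemon's dumpPath code or its Scheme equivalent; `scan` and `scan_with_gzip` are hypothetical names, and only whole-file gzip is handled, which is an assumption that would not cover kernel images or other embedded payloads.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def scan(data: bytes, candidates: set[str]) -> set[str]:
    """Plain scan: candidate hashes that appear verbatim in DATA."""
    return {h for h in candidates if h.encode() in data}


def scan_with_gzip(data: bytes, candidates: set[str]) -> set[str]:
    """Scan DATA, and also its decompressed form if it looks like gzip."""
    found = scan(data, candidates)
    if data.startswith(GZIP_MAGIC):
        try:
            found |= scan(gzip.decompress(data), candidates)
        except (OSError, EOFError, zlib.error):
            pass  # not valid gzip after all; keep the raw-scan result
    return found
```

The cost is visible in the sketch: every file that starts with the magic bytes is decompressed in full, which gives up part of the speed that makes the current scanner attractive.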
Miguel Ángel Arruga Vivas wrote on 6 Jan 2021 19:40
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 45676@debbugs.gnu.org)
87h7nt29jc.fsf@gmail.com
Hi!

Tobias Geerinckx-Rice <me@tobias.gr> writes:

> It's true. It's a known trade-off of an otherwise almost-zero-effort
> yet fast reference scanner. I don't think it's a bug per se, but it
> is something of which to be aware.
>
> Let's stop installing compressed executables & data files. We already
> avoid compressed .jars and other renamed zip files.

This is the current trade-off between build time and closure size for
executable code, but it isn't the current status regarding data files.

> Gzipping a tiny handful of Lisp or Smalltalk files is pointless when
> zstd {,de}compresses my entire 500GB SSD better and faster, at the
> file system level where it now squarely belongs.

Not every system has a file system with compression, nor do most of us
mortals have an SSD to test that. ;-)

> Without breaking Guix.

Software bugs are related to the number of lines, and this probably
would end up adding more, so I get that idea, hehe. :-P

With your proposal, closures wouldn't benefit from the "standard tricks"
used by package maintainers to reduce their footprint on uncompressed
file systems. Having an option to remove that compression seems best
for handling it at the file system level---perhaps a few wrappers that
make the compression tools always use -0 could do most of the
trick---but I'd still like to have the option of paying for the storage
savings at build/graft time. Of course, this is still only a wish.

Happy hacking!
Miguel
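
Paying at graft time could be sketched as follows, in Python for illustration only. This is not how Guix grafts today: `graft_bytes` and `graft_gzip` are hypothetical helpers, and the sketch assumes that a replacement has the same length as the string it replaces, so that the uncompressed layout does not change.

```python
import gzip


def graft_bytes(data: bytes, mapping: dict[bytes, bytes]) -> bytes:
    """Replace each old store string in DATA with its same-length substitute."""
    for old, new in mapping.items():
        if len(old) != len(new):
            raise ValueError("replacement must have the same length")
        data = data.replace(old, new)
    return data


def graft_gzip(blob: bytes, mapping: dict[bytes, bytes]) -> bytes:
    """Decompress BLOB, graft the plain contents, and compress them again."""
    plain = gzip.decompress(blob)
    # mtime=0 keeps the gzip header free of a build-time timestamp.
    return gzip.compress(graft_bytes(plain, mapping), mtime=0)
```

The compressed file may change size after the rewrite; only the decompressed contents keep their length.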
Ludovic Courtès wrote on 7 Jan 2021 12:05
(name . Miguel Ángel Arruga Vivas)(address . rosen644835@gmail.com)
871rexhurp.fsf@gnu.org
Howdy,

Miguel Ángel Arruga Vivas <rosen644835@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hi,
>>
>> Leo Famulari <leo@famulari.name> skribis:
>>
>>> On Tue, Jan 05, 2021 at 03:36:07PM +0100, Miguel Ángel Arruga Vivas wrote:
>>>> There are several binary formats that allow compression of the
>>>> executable image, or some of its data, which is decompressed at runtime:
>>>>
>>>> - Kernel images.
>>>> - Compressed libraries: e.g. Smalltalk modules.
>>>> - Compressed executable or data files: e.g. library.el.gz.
>>>>
>>>> These aren't taken into account by the grafting process, which may lead
>>>> to issues when store paths are located inside that kind of file.
>>>
>>> If you have specific instances of this type of bug, please report them.
>>
>> Agreed. The general issue is “well known” as we say, but what I think
>> we need to do is look for specific instances and address them.
>
> It can be tagged notabug if you think so. I've tagged it as
> wishlist (I should have done it before) for that reason (it's "well
> known"), but I haven't found any specific instance yet. OTOH, I think
> it might be closely related to #33848, as both issues could be
> solved by extending the dumpPath code path---or the Scheme
> implementation equivalent, as pointed out there.

Yes, though I’d prefer simple workarounds if possible—after all, we’ve
lived with it since the beginning and there’s only ever been a handful
of instances of that problem (one of them was really tricky, see
‘gcc-strmov-store-file-names.patch’…).

Ludo’.
Ludovic Courtès wrote on 14 Jan 2021 22:31
(name . Leo Prikler)(address . leo.prikler@student.tugraz.at)
87turj19zh.fsf@gnu.org
Hi Leo,

Leo Prikler <leo.prikler@student.tugraz.at> skribis:

> From 57c23bf6ecac79c397cb49ff251176ec3a7b1cf5 Mon Sep 17 00:00:00 2001
> From: Leo Prikler <leo.prikler@student.tugraz.at>
> Date: Wed, 6 Jan 2021 09:24:07 +0100
> Subject: [PATCH] gnu: emacs: Don't install compressed archives.
>
> See <http://issues.guix.gnu.org/45676#3>.

Perhaps make it a comment next to the option.

> * gnu/packages/emacs.scm (emacs)[#:configure-flags]:
> Add --without-compress-install.
> (emacs-minimal)[#:configure-flags]: Likewise.

[...]

> + "--without-compress-install"

Does that disable .el file compression altogether for Emacs’ own files?

If so, isn’t it too much? Do these files currently contain store file
names?

(I know EMMS .el files for instance are full of store file names, so
that one should definitely not be gzipped, but Emacs itself may be
fine?)

Ludo’.
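
One way to answer that question for a given output is to look inside its gzip files directly. The sketch below is in Python and makes several assumptions: the store lives at the default /gnu/store, hashes are 32 characters from the nix-base32 alphabet, and `compressed_store_references` is a hypothetical helper rather than an existing Guix command.

```python
import gzip
import os
import re
import zlib

# Assumed shape of a store file name: /gnu/store/<32-char hash>-<name>.
STORE_RE = re.compile(rb"/gnu/store/[0-9a-df-np-sv-z]{32}-[-+._?=a-zA-Z0-9]+")


def compressed_store_references(root: str) -> dict[str, set[bytes]]:
    """Map each gzip file under ROOT to the store file names found inside it."""
    hits = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                if f.read(2) != b"\x1f\x8b":  # gzip magic bytes
                    continue
            try:
                with gzip.open(path, "rb") as f:
                    data = f.read()
            except (OSError, EOFError, zlib.error):
                continue  # looked like gzip but could not be read
            found = set(STORE_RE.findall(data))
            if found:
                hits[path] = found
    return hits
```

Calling it on an unpacked package output returns an empty dictionary when no gzip file hides a store file name; any entry is a candidate for the kind of specific instance asked for earlier in the thread.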
Leo Prikler wrote on 14 Jan 2021 23:24
(name . Ludovic Courtès)(address . ludo@gnu.org)
98965ea00ce05ba435535e4347dfdfc409c10ec2.camel@student.tugraz.at
Hi Ludo,

Am Donnerstag, den 14.01.2021, 22:31 +0100 schrieb Ludovic Courtès:
> Hi Leo,
>
> Leo Prikler <leo.prikler@student.tugraz.at> skribis:
>
> > From 57c23bf6ecac79c397cb49ff251176ec3a7b1cf5 Mon Sep 17 00:00:00
> > 2001
> > From: Leo Prikler <leo.prikler@student.tugraz.at>
> > Date: Wed, 6 Jan 2021 09:24:07 +0100
> > Subject: [PATCH] gnu: emacs: Don't install compressed archives.
> >
> > See <http://issues.guix.gnu.org/45676#3>.
>
> Perhaps make it a comment next to the option.
I'll keep that in mind, but I wasn't going to commit this unless it is
absolutely needed.

> > * gnu/packages/emacs.scm (emacs)[#:configure-flags]:
> > Add --without-compress-install.
> > (emacs-minimal)[#:configure-flags]: Likewise.
>
> [...]
>
> > + "--without-compress-install"
>
> Does that disable .el file compression altogether for Emacs’ own
> files?
>
> If so, isn’t it too much? Do these files currently contain store file
> names?
>
> (I know EMMS .el files for instance are full of store file names, so
> that one should definitely not be gzipped, but Emacs itself may be
> fine?)
As far as I know, this is an all-or-nothing deal. If I'm not mistaken,
however, all those references should still exist in the compiled (and
not compressed) .elc files, so it makes little difference.
Perhaps time stamps could be added during compression, but I think our
Emacs reproducibility issues lie elsewhere as well.

All in all, I don't think there's a technical reason to do this (yet),
merely the somewhat purist stance of "no compressed source files".

Regards,
Leo