Hi Leo, Stephen, and others,
I originally wrote a very detailed email describing my investigation of
this issue and the results. However, I accidentally deleted it, so
please bear with me as I write a rough summary instead.
Leo Famulari <leo@famulari.name> writes:
> From the discussion "Guix Docker image inflation" [0] on help-guix:
>
> On Fri, May 29, 2020 at 02:29:53PM -0400, Stephen Scheck wrote:
>> # Now try to delete it...
>> root@localhost /gnu/store# guix gc --delete
>> /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
>> finding garbage collector roots...
>> [0 MiB] deleting
>> '/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
>> deleting `/gnu/store/trash'
>> deleting unused links...
>> note: currently hard linking saves 1181.36 MiB
>>
>> # Still there...
>> root@localhost /gnu/store# du -hs
>> /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
>> 210M /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
>
> Okay, something is definitely not right.
>
> [0] https://lists.gnu.org/archive/html/help-guix/2020-05/msg00235.html
There are two problems. One is that Stephen's images are getting larger
every day. The other is that Guix is failing to GC dead store items in
the Docker container. The latter does not cause the former.
Guix is failing to GC dead items in the Docker container because those
dead items are not on the "top layer", so Docker returns an EXDEV
error:
"Renaming directories: Calling rename(2) for a directory is allowed only
when both the source and the destination path are on the top
layer. Otherwise, it returns EXDEV error ('cross-device link not
permitted'). Your application needs to be designed to handle EXDEV and
fall back to a 'copy and unlink' strategy."
You can observe this by running guix-daemon with strace in the
container, and watching what happens when you try to delete one of the
offending store items (make sure it is a directory). For example:
685 rename("/gnu/store/xib50iqk3w1gw9l770mad59m9bi3bcpc-manual-database", "/gnu/store/trash/xib50iqk3w1gw9l770mad59m9bi3bcpc-manual-database") = -1 EXDEV (Invalid cross-device link)
In most cases, when guix-daemon GCs a dead directory, it does this
(see nix/libstore/gc.cc):
- Create a trash directory (usually /gnu/store/trash).
- Move dead directories into the trash directory.
- Delete the trash directory.
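The steps above can be sketched roughly as follows. This is a Python
illustration only, not the daemon's C++ code; the function name
collect_dead_directories and its parameters are made up for the example.

```python
import os
import shutil


def collect_dead_directories(store, dead_names, trash_name="trash"):
    """Illustrative only: mimic the trash-directory approach listed above.

    `store` stands in for /gnu/store and `dead_names` are names of dead
    directories directly under it. Both parameters are hypothetical.
    """
    trash = os.path.join(store, trash_name)

    # Step 1: create the trash directory inside the store.
    os.makedirs(trash, exist_ok=True)

    moved = []
    for name in dead_names:
        # Step 2: move each dead directory into the trash with rename(2).
        # On overlayfs this is the call that can fail with EXDEV when the
        # source directory comes from a lower layer.
        os.rename(os.path.join(store, name), os.path.join(trash, name))
        moved.append(name)

    # Step 3: delete the trash directory and everything in it.
    shutil.rmtree(trash)
    return moved
```

Note that this sketch lets any rename error propagate, so an EXDEV
failure would be visible to the caller.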
The trash directory is on the "top layer" because it gets created in the
running container. However, in practice many store items from lower
layers are made dead when Stephen's script runs "guix pull" and deletes
the old profiles. If any of those store items are directories,
guix-daemon fails to GC them because of an EXDEV error. If this is
confusing to you, I suggest you experiment with Docker a little bit, and
look closely at the steps that Stephen's script is running. I outlined
this in the email I accidentally deleted, but I'm a little too tired to
reproduce it all a second time. I hope you'll understand.
Should Guix do anything about this? We could change guix-daemon to
handle an EXDEV error correctly. We could also improve the logging,
since currently it silently swallows the EXDEV error.
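To make the idea concrete, here is a minimal sketch of a "copy and
unlink" fallback that also logs the error. It is written in Python for
brevity and is not a patch for guix-daemon; the names move_to_trash and
gc-sketch, and the injectable rename argument, are assumptions made for
the example.

```python
import errno
import logging
import os
import shutil

log = logging.getLogger("gc-sketch")  # hypothetical logger name


def move_to_trash(src, dst, rename=os.rename):
    """Move directory `src` to `dst`, copying instead if rename hits EXDEV.

    `rename` is injectable only so the fallback can be exercised without
    a real overlayfs mount.
    """
    try:
        rename(src, dst)
        return "renamed"
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        # Report the failure instead of silently swallowing it.
        log.warning("rename(%s, %s) failed with EXDEV; copying instead",
                    src, dst)

    # Fallback: copy the tree (symlinks stay symlinks), then remove the
    # original. This sketch ignores the read-only permissions that real
    # store items carry.
    shutil.copytree(src, dst, symlinks=True)
    shutil.rmtree(src)
    return "copied"
```

Since the goal here is deletion, another option would be to delete the
directory in place when the rename fails rather than copying it first.
Which of the two suits guix-daemon better is not something this sketch
settles.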
However, even if we make those changes and Guix is able to GC the dead
store items, it would not prevent Stephen's images from growing in size
without bound. There would still be many store items that came from a
prior image (i.e., a lower layer), became dead after running "guix pull"
and deleting old profile generations, and still exist in the prior
layer, even though they are not visible in the running container. This
is due to Docker's design, in which the visible file system is the
result of stitching together all the layers with overlayfs.
To work around the issue, Stephen can build the images from the same
base image, rather than daisy-chaining new images from old ones. That
way, they would not accumulate layers without bound.
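As a toy model of why daisy-chaining grows without bound while building
from the same base does not, consider the following sketch. It is not
how Docker is implemented; the layer representation, the file names, and
the sizes (1000 and 200 units) are all invented for illustration.

```python
def image_size(layers):
    """Units stored for an image: every layer counts, hidden or not."""
    return sum(sum(layer["added"].values()) for layer in layers)


def visible_files(layers):
    """File names visible in a container after stacking layers in order."""
    seen = {}
    for layer in layers:
        for name in layer["deleted"]:
            seen.pop(name, None)  # a deletion only hides the lower copy
        seen.update(layer["added"])
    return set(seen)


def daily_layer(day):
    """Hypothetical daily update: drop yesterday's guix, add today's."""
    deleted = {f"guix-{day - 1}"} if day > 1 else set()
    return {"added": {f"guix-{day}": 200}, "deleted": deleted}


base = [{"added": {"base-system": 1000}, "deleted": set()}]

# Daisy-chaining: each day's image is built on the previous day's image.
chained = list(base)
for day in range(1, 6):
    chained = chained + [daily_layer(day)]

# Same base: day 5's image is the base plus a single fresh layer.
rebuilt = base + [{"added": {"guix-5": 200}, "deleted": set()}]

print(visible_files(chained) == visible_files(rebuilt))  # True
print(image_size(chained), image_size(rebuilt))          # 2000 1200
```

Both images show the same files in a running container, but the chained
one still stores every earlier day's layer.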
--
Chris