Deleted store items are not actually deleted

  • Done
  • quality assurance status badge
Details
6 participants
  • Chris Marusich
  • Leo Famulari
  • Ludovic Courtès
  • Stephen Scheck
  • Vincent Legoll
  • zimoun
Owner
unassigned
Submitted by
Leo Famulari
Severity
normal
L
L
Leo Famulari wrote on 29 May 2020 21:09
(name . Stephen Scheck)(address . singularsyntax@gmail.com)(address . bug-guix@gnu.org)
20200529190942.GA8440@jasmine.lan
From the discussion "Guix Docker image inflation" [0] on help-guix:

On Fri, May 29, 2020 at 02:29:53PM -0400, Stephen Scheck wrote:
Toggle quote (15 lines)
> # Now try to delete it...
> root@localhost /gnu/store# guix gc --delete
> /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
> finding garbage collector roots...
> [0 MiB] deleting
> '/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
> deleting `/gnu/store/trash'
> deleting unused links...
> note: currently hard linking saves 1181.36 MiB
>
> # Still there...
> root@localhost /gnu/store# du -hs
> /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
> 210M /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules

Okay, something is definitely not right.

Z
Z
zimoun wrote on 30 May 2020 01:36
CAJ3okZ2Am=vkK2QJi5o5Z2LHkgDyaR9uC3tFWzMq=iyLOY-5zg@mail.gmail.com
-help-guix

On Sat, 30 May 2020 at 00:11, Stephen Scheck <singularsyntax@gmail.com> wrote:
Toggle quote (6 lines)
>> Do you have '/var/' in your Docker image? Because it looks like the same than:

> If you'd like, you can fetch the exact same image and look around yourself:
>
> docker pull singularsyntax/guix:master-a5374cd # same as singularsyntax/guix:latest

Thanks!
I have already did with singularsyntax/guix-bootstrap-alpine but then
I hit some issues... Well another story. :-)

Why there is (in both images) 13 "layers"?


Toggle quote (3 lines)
> CONTAINER=`docker run --detach --tty --privileged singularsyntax/guix:master-a5374cd`
> docker exec --interactive --tty $CONTAINER /run/current-system/profile/bin/bash --login

Toggle snippet (22 lines)
for f in $(guix gc --list-dead); do du -sh $f; done | sort -h
[...]
103M /gnu/store/flcfw8pcs2s0wnm78lp1jwid5lglky3j-guix-packages-base
103M /gnu/store/hl8h9krwiqwn51gydc13jz4846gq55pr-guix-packages-base
106M /gnu/store/mnvkhj7crqcd0d1xa0zqa3hd1f5hmv83-guix-packages-base
107M /gnu/store/6sggbpgg0zkbgxwf3wa2j15dis8z7cr1-guix-packages-base
107M /gnu/store/m0fv2xmfif5pxnfb1bscfvgyfx0x6xdc-guix-packages-base
107M /gnu/store/n339sr8c63f0nzja6yl8zfwy1jklj19j-guix-packages-base
107M /gnu/store/plaay02w581vx9ilyiv93sl1lw54n7h5-guix-packages-base
205M /gnu/store/5mhn1ynxvy7jihsknsnv3yspkkvc0r5s-guix-2e59ae238-modules
205M /gnu/store/8z9qc2bvq8azc08p4miq77yf2agk07aq-guix-843e77205-modules
205M /gnu/store/brbwlbnx56ms50kklyqk9fsf0xkwjjf9-guix-498e2e669-modules
205M /gnu/store/d3h4b7nvnms8d03ddi9b481dlxpykl7l-guix-5e3d16994-modules
205M /gnu/store/ibgjq1ampj8bldrabbsnwik2sr0gg3as-guix-a43fe7acd-modules
205M /gnu/store/l3amdz5xyhflg5wdzlxr2685dq5glic2-guix-527ab3125-modules
210M /gnu/store/0vwg9aqzs5xrk10vcs4dl105s3f42ilf-guix-b1affd477-modules
210M /gnu/store/47aack48aczpzm635axsy4jf2pvmwrv0-guix-ef1d475b0-modules
210M /gnu/store/hz2rn2l0jixg91q4rsdcwc489y71ll29-guix-05e1edf22-modules
210M /gnu/store/mych9fchln22pbhpc5syxyymx4hz496y-guix-8bd0b533b-modules
210M /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules

Well, there is 11 generations which corresponds to

Toggle snippet (9 lines)
guix describe
Generation 12 May 29 2020 22:51:28 (current)
guix a5374cd
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: a5374cde918cfeae5c16b43b9f2dd2b24bc3564d


But "docker pull" downloaded 13 layers, well because these numbers are
closed, is it meaningful? Really naive question.
Other said, is it not something from the Docker side?

Well, it is like Guix deletes something but this something is empty
(from the Guix side):

Toggle snippet (12 lines)
guix gc -D /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
finding garbage collector roots...
[0 MiB] deleting
'/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
deleting `/gnu/store/trash'
deleting unused links...
note: currently hard linking saves 1181.36 MiB

du -sh /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
210M /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules

abd the "real" target is still something on the filesytem side (du -sh).


Bah, I am not enough familiar with Docker...


Good night,
simon
C
C
Chris Marusich wrote on 31 May 2020 06:56
(name . Leo Famulari)(address . leo@famulari.name)
87r1v0k8hl.fsf@gmail.com
Hi Leo, Stephen, and others,

I originally wrote a very detailed email describing my investigation of
this issue and the results. However, I accidentally deleted it, so
please bear with me as I write a rough summary instead.

Leo Famulari <leo@famulari.name> writes:

Toggle quote (22 lines)
>>From the discussion "Guix Docker image inflation" [0] on help-guix:
>
> On Fri, May 29, 2020 at 02:29:53PM -0400, Stephen Scheck wrote:
>> # Now try to delete it...
>> root@localhost /gnu/store# guix gc --delete
>> /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
>> finding garbage collector roots...
>> [0 MiB] deleting
>> '/gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules'
>> deleting `/gnu/store/trash'
>> deleting unused links...
>> note: currently hard linking saves 1181.36 MiB
>>
>> # Still there...
>> root@localhost /gnu/store# du -hs
>> /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
>> 210M /gnu/store/x7ns2xcp8lfg24zq7gr3y8ffczn1nsxp-guix-d79c917f2-modules
>
> Okay, something is definitely not right.
>
> [0] https://lists.gnu.org/archive/html/help-guix/2020-05/msg00235.html

There are two problems. One is that Stephen's images are getting larger
every day. The other is that Guix is failing to GC dead store items in
the Docker container. The latter does not cause the former.

The reason Guix is failing to GC dead items in the Docker container is
because those dead items are not on the "top layer", so Docker returns
an EXDEV error:


"Renaming directories: Calling rename(2) for a directory is allowed only
when both the source and the destination path are on the top
layer. Otherwise, it returns EXDEV error ('cross-device link not
permitted'). Your application needs to be designed to handle EXDEV and
fall back to a 'copy and unlink' strategy."

You can observe this by running guix-daemon with strace in the
container, and watching what happens when you try to delete one of the
offending store items (make sure it is a directory). For example:

Toggle snippet (3 lines)
685 rename("/gnu/store/xib50iqk3w1gw9l770mad59m9bi3bcpc-manual-database", "/gnu/store/trash/xib50iqk3w1gw9l770mad59m9bi3bcpc-manual-database") = -1 EXDEV (Invalid cross-device link)

In most cases, when guix-daemon GC's a dead directory, it does this
(see: nix/libstore/gc.cc):

- Create a trash directory (usually /gnu/store/trash)
- Move dead directories into the trash directory.
- Delete the trash directory.

The trash directory is on the "top layer" because it gets created in the
running container. However, in practice many store items from lower
layers are made dead when Stephen's script runs "guix pull" and deletes
the old profiles. If any of those store items were directories,
guix-daemon will fail to GC them because of an XDEV error. If this is
confusing to you, I suggest you experiment with Docker a little bit, and
look closely at the steps that Stephen's script is running. I outlined
this in the email I accidentally deleted, but I'm a little too tired to
reproduce it all a second time. I hope you'll understand.

Should Guix do anything about this? We could change guix-daemon to take
correct action in the face of an XDEV error. We could also improve the
logging, since currently it silently swallows the XDEV error.

However, even if we make those changes and Guix is able to GC the dead
store items, it would not prevent Stephen's images from growing in size
without bound. There would still be many store items that came from a
prior image (i.e., a lower layer), became dead after running "guix pull"
and deleting old profile generations, and still exist in the prior
layer, even though they are not visible in the running container. This
is due to Docker's design, in which the visible file system is the
result of stitching together all the layers with overlayfs.

To work around the issue, Stephen can build the images from the same
base image, rather than daisy-chaining new images from old ones. That
way, they would not accumulate layers without bound.

--
Chris
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEy/WXVcvn5+/vGD+x3UCaFdgiRp0FAl7TOQYACgkQ3UCaFdgi
Rp33Ug/9ECuQ7z0lIJi4RJrDuJynnVbEEXGRN0FLUP55HoWULXGvd/cWxKC1Cr5F
NRd598QP+40EYy5SVzNrxXUXTRqmQDSUYbIobUINSwcn+q/HjxwLbTnp6v2fijtM
aaSc2YoCFzZaPT6MUjb8CB2Spgf/DF9XbwBmpJZdTK84w0wyDb2KEWRrO1bxq7c2
9tdDv3jylah3+Z7k/VnzDyGvJxqrqSsUWdRvJy4QL9THXaU5EP4zRCt8eHGGgaRs
SeyjhbYGGCqh2emQOQptEw3AlFnginulO2P0g4dmRx4SaACMnfVfcopB1jGxPNHv
YQXEn2o1/AnFxYhUMzycrXhT2pgjHMVrLBWihaCdZxYP7SSyS1UgHZEUfPknYA7H
xhoybpHzVDgECiN3MID/s2mTeFZEofianvDy+Wt1/mSkWWwiRS9Eg75UWvWvrcAi
5VD/WhaOpPtdS+UuqM6geLmRRaA2G+aAU9vxtzdZ9uB12iM4Jio/vVWIbfiHofFg
acYkYIwibQAKj3K2mfbl/MGyTLL9zc0TIlmLVDOGx4wpOdNeczSkorf3n0Un61mS
1iuA2aThBVn/bdTQRzNEVy9nqN2nbe5oCXesv1NG76lANNC30c4iFLeqsPRipxAM
z+nFAWDAWJzviGglcQnyIy4S97T8n4Kq8RYY9AJrQEZhmT7pY24=
=dn7X
-----END PGP SIGNATURE-----

Z
Z
zimoun wrote on 31 May 2020 10:24
Re: Guix Docker image inflation
CAJ3okZ2PjxUPjdt4tChB=MRT2EvPb69QBOEXut1RSn3SGhU_3A@mail.gmail.com
Dear Stephen,

Follow ups of


On Sat, 30 May 2020 at 19:02, Stephen Scheck <singularsyntax@gmail.com> wrote:

Toggle quote (9 lines)
> You can convince yourself of this by doing something like the following:
>
> docker run <some-linux-image>
> docker exec <container-id> dd if=/dev/urandom of=/RANDOM-DATA
> bs=1048576 count=1024
> docker commit <container-id>
> docker exec <container-id> rm /RANDOM-DATA
> docker commit <container-id>

It does not convince myself and maybe I am doing wrongly but it is not
what I am observing for an example with more than 2 'commits'. Here
my session, based on your images rename "fresh" because it will happen
on any image.

Toggle snippet (75 lines)
$ docker images
REPOSITORY TAG IMAGE ID
CREATED SIZE
fresh latest c36cef8306d5
3 weeks ago 1.06GB
singularsyntax/guix-bootstrap 1.1.0-alpine-3.11 c36cef8306d5
3 weeks ago 1.06GB

$ CONTAINER=`docker run --detach --tty --privileged fresh`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # dd if=/dev/urandom of=/DATA bs=1234567 count=1024
dd if=/dev/urandom of=/DATA bs=1234567 count=1024
1024+0 records in
1024+0 records out
/ # exit
exit

$ HASH=`docker commit $CONTAINER` && docker tag $HASH add-data
$ docker stop $CONTAINER && docker rm $CONTAINER
cb89992b76ace2afe5dc6e082c8de121c483dfeeb688d89849713e2cf90b68c7
cb89992b76ace2afe5dc6e082c8de121c483dfeeb688d89849713e2cf90b68c7

$ CONTAINER=`docker run --detach --tty --privileged add-data`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # rm /DATA
rm /DATA
/ # dd if=/dev/urandom of=/OTHER bs=1234567 count=1024
dd if=/dev/urandom of=/OTHER bs=1234567 count=1024
1024+0 records in
1024+0 records out
/ # exit
exit

$ HASH=`docker commit $CONTAINER` && docker tag $HASH rm-data-add-other
$ docker stop $CONTAINER && docker rm $CONTAINER
93e9afe593226ec29669efe8515b47487f455d4ad5e012cc67372c2549ec7256
93e9afe593226ec29669efe8515b47487f455d4ad5e012cc67372c2549ec7256

$ CONTAINER=`docker run --detach --tty --privileged rm-data-add-other`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # rm /OTHER
rm /OTHER
/ # exit
exit

$ HASH=`docker commit $CONTAINER` && docker tag $HASH rm-other

$ docker stop $CONTAINER && docker rm $CONTAINER
469b341c2f394ef05f5f43a5d96239853e3552d979028a457a9bdd1096a8fab4
469b341c2f394ef05f5f43a5d96239853e3552d979028a457a9bdd1096a8fab4

$ docker images
REPOSITORY TAG IMAGE ID
CREATED SIZE
rm-other latest b80d300aa755
23 seconds ago 3.59GB
rm-data-add-other latest de551eac1d55
About a minute ago 3.59GB
add-data latest 6a563daccccd
3 minutes ago 2.32GB
fresh latest c36cef8306d5
3 weeks ago 1.06GB
singularsyntax/guix-bootstrap 1.1.0-alpine-3.11 c36cef8306d5
3 weeks ago 1.06GB

$ CONTAINER=`docker run --detach --tty --privileged rm-other`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # ls /
ls /
bin dev etc gnu home lib media mnt opt proc
root run sbin srv sys tmp usr var
/ # exit
exit

Toggle quote (4 lines)
> You'll end up with two new images - the first one should be about 1 GB
> larger than the base image,
> the second one the same size.

As you see, the image 'rm-other' does not container either /DATA or
/OTHER and its size is not the same than the initial one 'fresh'. So
I do not know if the correct Docker terminology is "layer" because the
issue is definitely on the Docker side and not on the Guix side.


Cheers,
simon
Z
Z
zimoun wrote on 31 May 2020 11:27
Re: bug#41607: Deleted store items are not actually deleted
(name . Chris Marusich)(address . cmmarusich@gmail.com)
CAJ3okZ0KjCUmc-C4dSkVJyGs6oyojYZL_aGFhnLsjqR-8X6PaA@mail.gmail.com
Hi Chris,

Thank you for the detailed explanations.

On Sun, 31 May 2020 at 06:57, Chris Marusich <cmmarusich@gmail.com> wrote:

Toggle quote (14 lines)
> The trash directory is on the "top layer" because it gets created in the
> running container. However, in practice many store items from lower
> layers are made dead when Stephen's script runs "guix pull" and deletes
> the old profiles. If any of those store items were directories,
> guix-daemon will fail to GC them because of an XDEV error. If this is
> confusing to you, I suggest you experiment with Docker a little bit, and
> look closely at the steps that Stephen's script is running. I outlined
> this in the email I accidentally deleted, but I'm a little too tired to
> reproduce it all a second time. I hope you'll understand.
>
> Should Guix do anything about this? We could change guix-daemon to take
> correct action in the face of an XDEV error. We could also improve the
> logging, since currently it silently swallows the XDEV error.

What is the correct action? Because somehow these items cannot be
removed of the Docker chained images.
However, the logging could report the ``error", e.g., "guix gc" lists
all the items and their size and currently it says "[0MiB] deleting"
so instead it could say "[XDEV error] not valid" with some words about
that in the manual, section "Invking guix gc". WDYT?



Cheers,
simon
V
V
Vincent Legoll wrote on 31 May 2020 12:50
Re: Guix Docker image inflation
(name . help-guix)(address . help-guix@gnu.org)
7a07e6d0-e45d-6c5f-f536-34a065100d8f@gmail.com
Hello,

maybe you can try:

docker export <CONTAINER ID> | docker import - img_name

This should flatten the layers back to a single one.

--
Vincent Legoll
Z
Z
zimoun wrote on 31 May 2020 19:58
(name . Vincent Legoll)(address . vincent.legoll@gmail.com)
CAJ3okZ2d-faGxfO5U9==CWaiXhrmtdkx9VPZFgnd8W_Rn_y3oQ@mail.gmail.com
Hi Vincent,

On Sun, 31 May 2020 at 12:50, Vincent Legoll <vincent.legoll@gmail.com> wrote:

Toggle quote (2 lines)
> docker export <CONTAINER ID> | docker import - img_name

I do not know if it really works here. Maybe I am doing incorrectly...

Toggle snippet (10 lines)
$ docker images --format "{{.Size}}\t{{.Repository}}"
959MB 4reexport
960MB 3clean
960MB 2remove-hello
959MB 1install-hello
578MB 0new-fresh
1.06GB fresh
1.06GB singularsyntax/guix-bootstrap

Well, and the interesting part is:

Toggle snippet (23 lines)
$ CONTAINER=`docker run --detach --tty --privileged 4reexport`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # /root/.config/guix/current/bin/guix gc --list-live | grep hello
/root/.config/guix/current/bin/guix gc --list-live | grep hello
finding garbage collector roots...
determining live/dead paths...
/ # /root/.config/guix/current/bin/guix gc --list-dead | grep hello
/root/.config/guix/current/bin/guix gc --list-dead | grep hello
finding garbage collector roots...
determining live/dead paths...
/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10
/ # /root/.config/guix/current/bin/guix gc --references
/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10
/root/.config/guix/current/bin/guix gc --references
/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-he
llo-2.10
guix gc: error: path
`/gnu/store/kg9mirg6xbvzcp0a98v7326n1nvvwgsj-hello-2.10' is not valid
/ # exit



Just for the record, the commands run:

Toggle snippet (35 lines)
$ CONTAINER=`docker run --detach --tty --privileged fresh`
$ CMD='CMD "/root/.config/guix/current/bin/guix-daemon"
"--build-users-group=guixbuild"'
$ docker export $CONTAINER \
| docker import --change $CMD - 0new-fresh

$ CONTAINER=`docker run --detach --tty --privileged 0new-fresh`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # /root/.config/guix/current/bin/guix install hello
/ # exit

$ docker stop $CONTAINER
$ HASH=`docker commit $CONTAINER` && docker tag $HASH 1install-hello

$ CONTAINER=`docker run --detach --tty --privileged 1install-hello`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # /root/.config/guix/current/bin/guix remove hello
/ # exit

$ docker stop $CONTAINER
$ HASH=`docker commit $CONTAINER` && docker tag $HASH 2remove-hello

$ CONTAINER=`docker run --detach --tty --privileged 2remove-hello`
$ docker exec --interactive --tty $CONTAINER /bin/sh
/ # /root/.config/guix/current/bin/guix pull -d
/ # /root/.config/guix/current/bin/guix package -d
/ # /root/.config/guix/current/bin/guix gc
/ # exit
$ docker stop $CONTAINER
$ HASH=`docker commit $CONTAINER` && docker tag $HASH 3clean

$ CONTAINER=`docker run --detach --tty --privileged 3clean`
$ docker export $CONTAINER | docker import --change $CMD - 4reexport

where I cheated with $CMD which does not as is but the full 'CMD...'
has to be typed after '--change'.


All the best,
simon
L
L
Ludovic Courtès wrote on 4 Jun 2020 13:58
Re: bug#41607: Deleted store items are not actually deleted
(name . Chris Marusich)(address . cmmarusich@gmail.com)
87eeqvcaau.fsf@gnu.org
Hi,

Chris Marusich <cmmarusich@gmail.com> skribis:

Toggle quote (35 lines)
> The reason Guix is failing to GC dead items in the Docker container is
> because those dead items are not on the "top layer", so Docker returns
> an EXDEV error:
>
> https://docs.docker.com/storage/storagedriver/overlayfs-driver/
>
> "Renaming directories: Calling rename(2) for a directory is allowed only
> when both the source and the destination path are on the top
> layer. Otherwise, it returns EXDEV error ('cross-device link not
> permitted'). Your application needs to be designed to handle EXDEV and
> fall back to a 'copy and unlink' strategy."
>
> You can observe this by running guix-daemon with strace in the
> container, and watching what happens when you try to delete one of the
> offending store items (make sure it is a directory). For example:
>
> 685 rename("/gnu/store/xib50iqk3w1gw9l770mad59m9bi3bcpc-manual-database", "/gnu/store/trash/xib50iqk3w1gw9l770mad59m9bi3bcpc-manual-database") = -1 EXDEV (Invalid cross-device link)
>
> In most cases, when guix-daemon GC's a dead directory, it does this
> (see: nix/libstore/gc.cc):
>
> - Create a trash directory (usually /gnu/store/trash)
> - Move dead directories into the trash directory.
> - Delete the trash directory.
>
> The trash directory is on the "top layer" because it gets created in the
> running container. However, in practice many store items from lower
> layers are made dead when Stephen's script runs "guix pull" and deletes
> the old profiles. If any of those store items were directories,
> guix-daemon will fail to GC them because of an XDEV error. If this is
> confusing to you, I suggest you experiment with Docker a little bit, and
> look closely at the steps that Stephen's script is running. I outlined
> this in the email I accidentally deleted, but I'm a little too tired to
> reproduce it all a second time. I hope you'll understand.

Interesting, thanks for the analysis!

Toggle quote (4 lines)
> Should Guix do anything about this? We could change guix-daemon to take
> correct action in the face of an XDEV error. We could also improve the
> logging, since currently it silently swallows the XDEV error.

I guess we could delete recursively right away upon EXDEV. It should be
just two lines of code, right?

Toggle quote (4 lines)
> To work around the issue, Stephen can build the images from the same
> base image, rather than daisy-chaining new images from old ones. That
> way, they would not accumulate layers without bound.

Maybe that too.

Thanks,
Ludo’.
C
C
Chris Marusich wrote on 4 Jun 2020 20:50
(name . Ludovic Courtès)(address . ludo@gnu.org)
87r1uu1x9e.fsf@gmail.com
Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (7 lines)
>> Should Guix do anything about this? We could change guix-daemon to take
>> correct action in the face of an XDEV error. We could also improve the
>> logging, since currently it silently swallows the XDEV error.
>
> I guess we could delete recursively right away upon EXDEV. It should be
> just two lines of code, right?

I'll try making the change and report back. Yes, there are other cases
where we immediately delete without moving into the trash directory
(e.g., when the trash directory fails to be created), so it seems OK.

--
Chris
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEy/WXVcvn5+/vGD+x3UCaFdgiRp0FAl7ZQl0ACgkQ3UCaFdgi
Rp2UYRAAyAuxoPIgmAd6xgOHavBlkMdxo7S6RACQk7AKjChInSutq7wIVqQKV2my
2F1tdcLvti7kgO9my7MjTtzaMsIrgWh0yxZCWUtRuqVCRYX6byZb9mxmyYnZJu1f
bigJj/dzfMOC0skLDsN6G2dhy/PfIiT/3uZdsmWeVc4Ka/h/LqtR4AmjDdS3HNIA
pX/2J4e+yq4TeurLyfl0ZA+OqR1o7XoFSRBzjsHtr3pyMtqfvPT5csYgKYifenKy
6BQuNMGD/nZgV/S1AMiq9D9LmF60Eb7OBRfZ8tJp5Ho13OPhFiYXb5u4R3Jcrq6E
VF6R9FHIS3i1KxEYsvTCSoCRQEy8Dk1Z4uw6lTNtcb3+Q/eFUpeZCoxP4h6V+1vQ
7s2b4IVeD8InZNDINmvT0G3MEKWnxo1LQEQ0RcxHvJn95S6BD+g+H5Tz9QkmtmYw
4KsiuLCh7uSTL6zwOdTMFbMe6uz5eXqSawtELLFO9ClGlno15Pm+bJzekhEkpNw+
k9BaidDHaM+hKVncCObCr1yyMfoA6uQbyX0opsyfcTHpdwtB7SaMI9p+h57iTJ8+
j1xKvyudX3WEAWUp1iUb1hJaHDNMooMmOa86TbigC+6LVKFk+jqXXUN7IiRrkBbX
QVInBmJfwuhWyx1ykTSiB4EoXWVuiQDQWTxTv3ixfAh2RU9qK8M=
=PXGV
-----END PGP SIGNATURE-----

C
C
Christopher Marusich wrote on 5 Jun 2020 11:32
(name . Chris Marusich)(address . cmmarusich@gmail.com)
87pnad6eor.fsf@gmail.com
Chris Marusich <cmmarusich@gmail.com> writes:

Toggle quote (13 lines)
> Ludovic Courtès <ludo@gnu.org> writes:
>
>>> Should Guix do anything about this? We could change guix-daemon to take
>>> correct action in the face of an XDEV error. We could also improve the
>>> logging, since currently it silently swallows the XDEV error.
>>
>> I guess we could delete recursively right away upon EXDEV. It should be
>> just two lines of code, right?
>
> I'll try making the change and report back. Yes, there are other cases
> where we immediately delete without moving into the trash directory
> (e.g., when the trash directory fails to be created), so it seems OK.

Here is a patch. Turns out it's was just a one line change! If nobody
has any further feedback on it, I'll go ahead and merge it to the master
branch in the next couple days.

I tested it in one of the Docker containers provided by Stephen which
exhibited the problem. I built the new Guix inside the Docker container
and verified that a path which was previously unable to be GC'd due to
the EXDEV error, was now able to be successfully GC'd.

My understanding is that the only reason why the guix-daemon attempts to
move dead directories to the trash directory is to save time on
deleting, since large directories could take a while to fully delete.
If there is any reason why it might be unsafe to delete the directories
directly in case of EXDEV (I cannot think of any), please let me know.

--
Chris
From 505481a6a22819a42320f693988c3f8e13ded080 Mon Sep 17 00:00:00 2001
From: Chris Marusich <cmmarusich@gmail.com>
Date: Thu, 4 Jun 2020 23:26:19 -0700
Subject: [PATCH] daemon: Handle EXDEV when moving to trash directory.

Reported by Stephen Scheck <singularsyntax@gmail.com>.

* nix/libstore/gc.cc (LocalStore::deletePathRecursive): When we try to
move a dead directory into the trashDir using rename(2) but it returns
an EXDEV error, just delete the directory instead. This can happen in a
Docker container when the directory is not on the "top layer".
---
nix/libstore/gc.cc | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Toggle diff (18 lines)
diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
index 8bc4e01eb0..845fe278c7 100644
--- a/nix/libstore/gc.cc
+++ b/nix/libstore/gc.cc
@@ -455,7 +455,10 @@ void LocalStore::deletePathRecursive(GCState & state, const Path & path)
throw SysError(format("unable to rename `%1%' to `%2%'") % path % tmp);
state.bytesInvalidated += size;
} catch (SysError & e) {
- if (e.errNo == ENOSPC) {
+ // In a Docker container, rename(2) returns EXDEV when the source
+ // and destination are not both on the "top layer". See:
+ // https://bugs.gnu.org/41607
+ if (e.errNo == ENOSPC || e.errNo == EXDEV) {
printMsg(lvlInfo, format("note: can't create move `%1%': %2%") % path % e.msg());
deleteGarbage(state, path);
}
--
2.26.2
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEy/WXVcvn5+/vGD+x3UCaFdgiRp0FAl7aESQACgkQ3UCaFdgi
Rp30yw//Vw6SV8tFac2hCw6CHi8vg8IqjNZdGEq+9sXztSSopCib7KQnoOT9LfqS
uGfwQ7yZEeTcQV6H6dOqHB7PmsyljHwIlMeEXDk4v5hCprxdfEftPO0y71BXJdYW
B4VgDYuqp2GbLzwV9UCSLLpddOamUZK5Rt/fFsFmx55oheBt5s5I+hExGbb/HxT4
bC9XZhWQpMrJn9DtWvWZ2Fujgy9cEBi7gqXbsJ7XrENU4WXwfgrRgLVxKi1iyJWc
LxVFXBY1D1Ob4mvK+mpuWdmvuYMoMHVbxP/so+fbEGcXvvopmYUlarfF+kq8IqLJ
+Il7wuuqYKb3u2oDkLZCJ6yrCHeox6/aaK9xB89Lz1DRIHJ29U7T+JlmNFckJ/0+
Vw5BYvLcX1JbJGhyPSEt16gkSRgjT2KA3vqs9Nd9iFZaec5sN+4DqBUGxuJXr718
UCCCe26dKo6sIzzOzkEoD/BAt5k79LusNzNmh7DSsq7ANY1WkdPNHdhM0iMmqdfL
3WOiOtsDq2Cre6BnFJx3wm1d0LZAB/nvvJRDfJ0oiUYo6EzBVydJd+txdmEkq5VG
+QDY8ktJ+PHdUMRs2DjqlK5oGoHFfXxIVJ6Qi960aToMWJa+SGxEoe7FJWqI4YpJ
mVZGWl46A7eLjjLvTwn8XTK9iFUKjliJJn8ru1C1d8Xftcvq05U=
=Jycu
-----END PGP SIGNATURE-----

Z
Z
zimoun wrote on 5 Jun 2020 11:36
(name . Christopher Marusich)(address . cmmarusich@gmail.com)
CAJ3okZ1xR2xncLmvMG31dJW9BZN53Pm8pxkbXZX0jmXD3a+SFQ@mail.gmail.com
Hi Chris,

On Fri, 5 Jun 2020 at 11:33, Christopher Marusich <cmmarusich@gmail.com> wrote:
Toggle quote (7 lines)
> Chris Marusich <cmmarusich@gmail.com> writes:
> > Ludovic Courtès <ludo@gnu.org> writes:

> Here is a patch. Turns out it's was just a one line change! If nobody
> has any further feedback on it, I'll go ahead and merge it to the master
> branch in the next couple days.

Really cool! How can I test it?


All the best,
simon
L
L
Ludovic Courtès wrote on 5 Jun 2020 18:21
(name . Christopher Marusich)(address . cmmarusich@gmail.com)
877dwl32l7.fsf@gnu.org
Howdy!

Christopher Marusich <cmmarusich@gmail.com> skribis:

Toggle quote (4 lines)
> Here is a patch. Turns out it's was just a one line change! If nobody
> has any further feedback on it, I'll go ahead and merge it to the master
> branch in the next couple days.

Yay!

Toggle quote (33 lines)
> From 505481a6a22819a42320f693988c3f8e13ded080 Mon Sep 17 00:00:00 2001
> From: Chris Marusich <cmmarusich@gmail.com>
> Date: Thu, 4 Jun 2020 23:26:19 -0700
> Subject: [PATCH] daemon: Handle EXDEV when moving to trash directory.
>
> Fixes <https://bugs.gnu.org/41607>.
> Reported by Stephen Scheck <singularsyntax@gmail.com>.
>
> * nix/libstore/gc.cc (LocalStore::deletePathRecursive): When we try to
> move a dead directory into the trashDir using rename(2) but it returns
> an EXDEV error, just delete the directory instead. This can happen in a
> Docker container when the directory is not on the "top layer".
> ---
> nix/libstore/gc.cc | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
> index 8bc4e01eb0..845fe278c7 100644
> --- a/nix/libstore/gc.cc
> +++ b/nix/libstore/gc.cc
> @@ -455,7 +455,10 @@ void LocalStore::deletePathRecursive(GCState & state, const Path & path)
> throw SysError(format("unable to rename `%1%' to `%2%'") % path % tmp);
> state.bytesInvalidated += size;
> } catch (SysError & e) {
> - if (e.errNo == ENOSPC) {
> + // In a Docker container, rename(2) returns EXDEV when the source
> + // and destination are not both on the "top layer". See:
> + // https://bugs.gnu.org/41607
> + if (e.errNo == ENOSPC || e.errNo == EXDEV) {
> printMsg(lvlInfo, format("note: can't create move `%1%': %2%") % path % e.msg());
> deleteGarbage(state, path);
> }

For consistency with (most) of the code, I’d suggest a /* */ comment.

Otherwise LGTM, thank you!

Ludo’.
S
S
Stephen Scheck wrote on 5 Jun 2020 18:05
(name . Christopher Marusich)(address . cmmarusich@gmail.com)
CAKjnHz2Tkc50KV3iBnnQx4KDC0XGMm8UCEK-nOTLU2M2bUvU0Q@mail.gmail.com
Very cool - thanks Chris!

In the meantime, I've updated my build script to externalize the Guix
environment from the Docker container.
So far the daily builds are staying nice and small at ~197MB and one layer.
The images and updated script are
here if anybody is curious:


GitLab allows you to cache files between job stages and even full pipeline
runs. I first tried to cache /var/guix
and /gnu/store and mount them inside a container in which to perform `guix
pull` etc. However, it seems
that handling hard links on externally mounted file systems from within a
container is problematic. I think
passing `--disable-deduplication` to guix-daemon might resolve it, but I
couldn't figure out where/how to
change the Shepherd configuration to do that. So instead, I just copy the
directories into and out of the
container at start and end of the process. It seems to work. Mounting would
probably speed up the process
quite a bit if I could make it work.

Cheers,
-Steve

On Fri, Jun 5, 2020 at 5:32 AM Christopher Marusich <cmmarusich@gmail.com>
wrote:

Toggle quote (34 lines)
> Chris Marusich <cmmarusich@gmail.com> writes:
>
> > Ludovic Courtès <ludo@gnu.org> writes:
> >
> >>> Should Guix do anything about this? We could change guix-daemon to
> take
> >>> correct action in the face of an XDEV error. We could also improve the
> >>> logging, since currently it silently swallows the XDEV error.
> >>
> >> I guess we could delete recursively right away upon EXDEV. It should be
> >> just two lines of code, right?
> >
> > I'll try making the change and report back. Yes, there are other cases
> > where we immediately delete without moving into the trash directory
> > (e.g., when the trash directory fails to be created), so it seems OK.
>
> Here is a patch. Turns out it's was just a one line change! If nobody
> has any further feedback on it, I'll go ahead and merge it to the master
> branch in the next couple days.
>
> I tested it in one of the Docker containers provided by Stephen which
> exhibited the problem. I built the new Guix inside the Docker container
> and verified that a path which was previously unable to be GC'd due to
> the EXDEV error, was now able to be successfully GC'd.
>
> My understanding is that the only reason why the guix-daemon attempts to
> move dead directories to the trash directory is to save time on
> deleting, since large directories could take a while to fully delete.
> If there is any reason why it might be unsafe to delete the directories
> directly in case of EXDEV (I cannot think of any), please let me know.
>
> --
> Chris
>
Attachment: file
C
C
Chris Marusich wrote on 7 Jun 2020 03:31
87r1urmzkz.fsf_-_@gmail.com
Hi everyone,

I have committed the fix in d445c30ea6 and updated the guix package
definition in ecbde6505c to ensure that the next time "guix pull" is
run, the new guix-daemon version will be used.

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (2 lines)
> For consistency with (most) of the code, I’d suggest a /* */ comment.

OK - I did this in d445c30ea6.

Stephen Scheck <singularsyntax@gmail.com> writes:

Toggle quote (8 lines)
> I've updated my build script to externalize the Guix environment from
> the Docker container. So far the daily builds are staying nice and
> small at ~197MB and one layer. The images and updated script are here
> if anybody is curious:
>
> https://hub.docker.com/r/singularsyntax/guix/tags
> https://gitlab.com/singularsyntax-docker-hub/guix/-/blob/master/.gitlab-ci.yml

Neat! I'm glad you were figure out a way to to keep the image from
growing in size. It's good to know of a way to do that.

Thank you for your help and have a nice day! I'm closing this bug
report.

--
Chris
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEy/WXVcvn5+/vGD+x3UCaFdgiRp0FAl7cQ1wACgkQ3UCaFdgi
Rp2uyw/+KFqobULrNAJlv5sKE2H7KpHKyMMx2v3X0UxZO1CAATzRi49wr93KqDIY
1Rmly1/fDmnSAWEM+4VeE4aN86z7ZLLwWhftVmpRQ3260PywPFkCLLxvRBKpj9Je
HgWgdxglzPbSXuePxAr9hreMtRWnjThMlBqwN9m9qP5fj5QyLKAHr2HoVTEakxet
eJzYTcyuHEjK95oUFAsgdZI9OQlGg1DDeCclY/VPJHrLN20DktZkbqOFli7RySam
6cY+nzuqJuc6Clid72i9ZD7oReGxWP1KRMjQQq3T0h+AYBbD5X2XbTxDvSbBqQIm
0YidhUqyEVO8d0NSRLzN7+Sd6qvp7neur6ZQr03Tvl6QU8+aDnsuVDDHXNmLeeIX
eHv0sOB2Duf+FVT1jkZuRFy4oyXSqxf3bA5UHtGa/b8wC232kYiOKWLimm+hDrBT
J9eLL7QAI/EFsHj+W8WoYf3dMs1FreLMQnt5XszRHXXfPf+1ScvmI0rSHB5rWJbp
TC3waFcQMFJlqASlZ0dfv0GyZbeyDWABmIsI2s9c7+nFMDTuVAk1dQ71EFaVqWtL
eWC/5BwosKA+We4+P5bNMGu7iJNcq3RSLp/6aKga4lhUmQpCE9oeTxwhUgKNbCKW
5N9W5gjvW7+zlLqLxcDvJWpsBzLiakI62aJQqj9uUeXH7pq1Qog=
=2mYj
-----END PGP SIGNATURE-----

Z
Z
zimoun wrote on 7 Jun 2020 12:07
(name . Chris Marusich)(address . cmmarusich@gmail.com)
CAJ3okZ090-33=FUzFDAbjX-rX3FgYq_1+wWx+OtMGk+RZKZvBA@mail.gmail.com
Hi Chris,

Thank you.

On Sun, 7 Jun 2020 at 03:31, Chris Marusich <cmmarusich@gmail.com> wrote:

Toggle quote (4 lines)
> I have committed the fix in d445c30ea6 and updated the guix package
> definition in ecbde6505c to ensure that the next time "guix pull" is
> run, the new guix-daemon version will be used.

Sorry to be late, I have things like that:

-8<---------------cut here---------------start------------->8---
[548 MiB] deleting '/gnu/store/dkzivzn17qilmqdfpyps62b395wxhshh-openssl-1.1.1f'
note: can't create move
`/gnu/store/dkzivzn17qilmqdfpyps62b395wxhshh-openssl-1.1 unable to
rename `/gnu/store/dkzivzn17qilmqdfpyps62b395wxhshh-openssl-1.1.1f' to
`/gnu/store/trash/dkzivzn17qilmqdfpyps62b395wxhshh-openssl-1.1.1f':
Invalid cross-device link
[553 MiB] deleting '/gnu/store/yaajjri3cks8rhkb29z08d3q5waww0dx-packages.scm'
Toggle snippet (7 lines)
I should do something wrong...


All the best,
simon
C
C
Chris Marusich wrote on 8 Jun 2020 09:43
(name . zimoun)(address . zimon.toutoune@gmail.com)
874krm9f4b.fsf@gmail.com
Hi zimoun,

zimoun <zimon.toutoune@gmail.com> writes:

Toggle quote (16 lines)
> On Sun, 7 Jun 2020 at 03:31, Chris Marusich <cmmarusich@gmail.com> wrote:
>
>> I have committed the fix in d445c30ea6 and updated the guix package
>> definition in ecbde6505c to ensure that the next time "guix pull" is
>> run, the new guix-daemon version will be used.
>
> Sorry to be late, I have things like that:
>
> [548 MiB] deleting '/gnu/store/dkzivzn17qilmqdfpyps62b395wxhshh-openssl-1.1.1f'
> note: can't create move
> `/gnu/store/dkzivzn17qilmqdfpyps62b395wxhshh-openssl-1.1 unable to
> rename `/gnu/store/dkzivzn17qilmqdfpyps62b395wxhshh-openssl-1.1.1f' to
> `/gnu/store/trash/dkzivzn17qilmqdfpyps62b395wxhshh-openssl-1.1.1f':
> Invalid cross-device link
> [553 MiB] deleting '/gnu/store/yaajjri3cks8rhkb29z08d3q5waww0dx-packages.scm'

That looks normal to me. Previously, the cross-device link error would
be silently ignored and the directory would not be deleted - no message
would be printed about the cross-device link error.

Now, however, the cross-device link error is reported, and the directory
is deleted. If you find that the directory is not actually deleted,
please let me know so I can look into it more. When I tested it on my
end, the directory was in fact deleted.

--
Chris
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEy/WXVcvn5+/vGD+x3UCaFdgiRp0FAl7d7DQACgkQ3UCaFdgi
Rp0OfA//Ug3haMHSVugQMtPIeMvsfiO/uvEAus5ccSbFgoz8MeZI0x6LMxIpbfPs
lf0bv9yNBNBn+Jf/XanwRLjscOCDxJc87eXYYtUHk8B5JrxTQ3ueUK4SspS2GgHA
W9yR+/s2XCiDY3WAPD8Jr3iVTblRESuR7l78K/tCT0dPL0BX3xPtIaBm1PFg/594
LAIqhIfAlmcYtkvsUnp+2NM4t0h58HvD2iMRwdVXKg8A93fx+HBlBqWps1DlT6eH
1kI6BpXYbJGPn6nR4pCTcZHwI6ETXwUtVKOGchntqitsLigfNPVi5UjkKhQu0Y2U
AHXqod284e2kvW73eaOAGoHkJr0rPdyqRjUjO5nOW/uCuN5p/NLl5qE4ar+z61nk
JiDy13Y3yNkUPcpMEN3cmVJUIv7vJ6/w52mQO8z2i3TakbT7696aWuYui15XJlZ+
wv8PQ/g6InTi1093KPL+vmdTzkAlQEwyab+GJKC5WOwOP1x9+STU1qzSyBRbKMfi
K6aNOjL9vf2WjxOt1C4jaAlXxtaTeKuMfkUp41VEIIFaocfw1nq4F2ujVSELvVU4
RQGrgrU3N+bKvHPEY59IFYPuvGOzVULsyy29/N0IOqFTFKSGIskit5J6iWkmvGAz
JlF+P16EzjFhzW1sQEMTblfgJpuicq8ZImU3JJd/RIp+S07fVVA=
=agrs
-----END PGP SIGNATURE-----

?