[PATCH] nix: libstore: Do not remove unused links when deleting specific items.

OpenSubmitted by Maxim Cournoyer.
Details
7 participants
  • Jack Hill
  • Liliana Marie Prikler
  • Ludovic Courtès
  • Maxim Cournoyer
  • Maxime Devos
  • Tobias Geerinckx-Rice
  • zimoun
Owner
unassigned
Severity
normal
M
M
Maxim Cournoyer wrote on 27 Oct 2021 05:49
(address . guix-patches@gnu.org)(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
20211027034918.4591-1-maxim.cournoyer@gmail.com
Deleting unused links can be a very costly operation, especially on rotative
hard drives. As removing single store items is often used for experimentation
rather than for cleaning purposes, this change allows it to run without the
links cleanup.

* nix/libstore/gc.cc (LocalStore::collectGarbage): Do not clean up links when
the specified action is GCOptions::gcDeleteSpecific.
---
nix/libstore/gc.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Toggle diff (15 lines)
diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
index e1d0765154..7d872d8cc1 100644
--- a/nix/libstore/gc.cc
+++ b/nix/libstore/gc.cc
@@ -771,7 +771,7 @@ void LocalStore::collectGarbage(const GCOptions & options, GCResults & results)
     deleteGarbage(state, state.trashDir);
 
     /* Clean up the links directory. */
-    if (options.action == GCOptions::gcDeleteDead || options.action == GCOptions::gcDeleteSpecific) {
+    if (options.action == GCOptions::gcDeleteDead) {
         printMsg(lvlError, format("deleting unused links..."));
         removeUnusedLinks(state);
     }
-- 
2.33.1
L
L
Ludovic Courtès wrote on 28 Oct 2021 16:16
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 51427@debbugs.gnu.org)
87o8795j61.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

Toggle quote (22 lines)
> Deleting unused links can be a very costly operation, especially on rotative
> hard drives. As removing single store items is often used for experimentation
> rather than for cleaning purposes, this change allows it to run without the
> links cleanup.
>
> * nix/libstore/gc.cc (LocalStore::collectGarbage): Do not clean up links when
> the specified action is GCOptions::gcDeleteSpecific.
> ---
> nix/libstore/gc.cc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
> index e1d0765154..7d872d8cc1 100644
> --- a/nix/libstore/gc.cc
> +++ b/nix/libstore/gc.cc
> @@ -771,7 +771,7 @@ void LocalStore::collectGarbage(const GCOptions & options, GCResults & results)
> deleteGarbage(state, state.trashDir);
>
> /* Clean up the links directory. */
> - if (options.action == GCOptions::gcDeleteDead || options.action == GCOptions::gcDeleteSpecific) {
> + if (options.action == GCOptions::gcDeleteDead) {

I believe the effect is that ‘guix gc -D /gnu/store/…-disk-image’ would
remove nothing: /gnu/store/.links would still contain a copy of that big
disk image, so as a result, you’ve freed zero bytes.

Am I right?

Perhaps what we could do is, upon ‘gcDeleteSpecific’, only look at the
relevant entry in .links instead of traversing all of them.

WDYT?

Thanks,
Ludo’.
L
L
Liliana Marie Prikler wrote on 31 Oct 2021 09:50
(address . 51427@debbugs.gnu.org)
5c2dd60acfaa7d74b7554babb3e223bc855bac8a.camel@gmail.com
Hi,

Am Donnerstag, den 28.10.2021, 16:16 +0200 schrieb Ludovic Courtès:
Toggle quote (34 lines)
> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
> > Deleting unused links can be a very costly operation, especially on
> > rotative hard drives. As removing single store items is often used
> > for experimentation rather than for cleaning purposes, this change
> > allows it to run without the links cleanup.
> >
> > * nix/libstore/gc.cc (LocalStore::collectGarbage): Do not clean up
> > links when
> > the specified action is GCOptions::gcDeleteSpecific.
> > ---
> > nix/libstore/gc.cc | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
> > index e1d0765154..7d872d8cc1 100644
> > --- a/nix/libstore/gc.cc
> > +++ b/nix/libstore/gc.cc
> > @@ -771,7 +771,7 @@ void LocalStore::collectGarbage(const GCOptions
> > & options, GCResults & results)
> > deleteGarbage(state, state.trashDir);
> >
> > /* Clean up the links directory. */
> > - if (options.action == GCOptions::gcDeleteDead ||
> > options.action == GCOptions::gcDeleteSpecific) {
> > + if (options.action == GCOptions::gcDeleteDead) {
>
> I believe the effect is that ‘guix gc -D /gnu/store/…-disk-image’
> would remove nothing: /gnu/store/.links would still contain a copy of
> that big disk image, so as a result, you’ve freed zero bytes.
>
> Am I right?
I think that might be the point. As Maxim said, single items are
(likely) not removed for cleaning purposes, so freeing the disk image
has little effect. Plus, you could invoke it like

guix gc -D dead-item dead-item live-item dead-item

It would fail at live-item and then not continue to free the links of
the two dead items prior.

So there's a few things we could do here:

1. simply fail and have the user deal with it (including the option of
doing a normal `guix gc' or `guix gc -C 1')
2. remember which paths were live and dead and always clean up the
links, only reporting errors afterwards
3. add an option to explicitly check the .links directory (which
defaults to true for the current things, but could also be used to
clean links after a liveness check or after a do-nothing `guix gc -F').
4. ...

WDYT?
L
L
Ludovic Courtès wrote on 31 Oct 2021 15:07
(name . Liliana Marie Prikler)(address . liliana.prikler@gmail.com)
87h7cxp9tl.fsf@gnu.org
Hi,

Liliana Marie Prikler <liliana.prikler@gmail.com> skribis:

Toggle quote (39 lines)
> Am Donnerstag, den 28.10.2021, 16:16 +0200 schrieb Ludovic Courtès:
>> Hi,
>>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>
>> > Deleting unused links can be a very costly operation, especially on
>> > rotative hard drives. As removing single store items is often used
>> > for experimentation rather than for cleaning purposes, this change
>> > allows it to run without the links cleanup.
>> >
>> > * nix/libstore/gc.cc (LocalStore::collectGarbage): Do not clean up
>> > links when
>> > the specified action is GCOptions::gcDeleteSpecific.
>> > ---
>> > nix/libstore/gc.cc | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
>> > index e1d0765154..7d872d8cc1 100644
>> > --- a/nix/libstore/gc.cc
>> > +++ b/nix/libstore/gc.cc
>> > @@ -771,7 +771,7 @@ void LocalStore::collectGarbage(const GCOptions
>> > & options, GCResults & results)
>> > deleteGarbage(state, state.trashDir);
>> >
>> > /* Clean up the links directory. */
>> > - if (options.action == GCOptions::gcDeleteDead ||
>> > options.action == GCOptions::gcDeleteSpecific) {
>> > + if (options.action == GCOptions::gcDeleteDead) {
>>
>> I believe the effect is that ‘guix gc -D /gnu/store/…-disk-image’
>> would remove nothing: /gnu/store/.links would still contain a copy of
>> that big disk image, so as a result, you’ve freed zero bytes.
>>
>> Am I right?
> I think that might be the point. As Maxim said, single items are
> (likely) not removed for cleaning purposes, so freeing the disk image
> has little effect.

What do you mean? When doing VM testing, I regularly do ‘guix gc -D
/gnu/store/…-disk-image’ precisely to save space. Fortunately it does
have the intended effect of freeing a bunch of GiBs.

Toggle quote (7 lines)
> Plus, you could invoke it like
>
> guix gc -D dead-item dead-item live-item dead-item
>
> It would fail at live-item and then not continue to free the links of
> the two dead items prior.

Yes, and that’s annoying, but it’s unrelated. :-)

Toggle quote (11 lines)
> So there's a few things we could do here:
>
> 1. simply fail and have the user deal with it (including the option of
> doing a normal `guix gc' or `guix gc -C 1')
> 2. remember which paths were live and dead and always clean up the
> links, only reporting errors afterwards
> 3. add an option to explicitly check the .links directory (which
> defaults to true for the current things, but could also be used to
> clean links after a liveness check or after a do-nothing `guix gc -F').
> 4. ...

You seem to be proposing to remove ‘-D’ altogether. I agree it has the
shortcomings you write, but I think it’s occasionally useful
nonetheless.

My proposal would be either the status quo, or removing just the one
link that matters from /gnu/store/.links upon ‘-D’.

Thoughts?

Ludo’.
L
L
Liliana Marie Prikler wrote on 31 Oct 2021 15:39
(name . Ludovic Courtès)(address . ludo@gnu.org)
9f9a708f8ecbb49464e3d939be89638a53dc0d30.camel@gmail.com
Hi,

Am Sonntag, den 31.10.2021, 15:07 +0100 schrieb Ludovic Courtès:
Toggle quote (3 lines)
> What do you mean? When doing VM testing, I regularly do ‘guix gc -D
> /gnu/store/…-disk-image’ precisely to save space. Fortunately it
> does have the intended effect of freeing a bunch of GiBs.
Fair enough, different strokes and all.

Toggle quote (22 lines)
> > Plus, you could invoke it like
> >
> > guix gc -D dead-item dead-item live-item dead-item
> >
> > It would fail at live-item and then not continue to free the links
> > of the two dead items prior.
>
> Yes, and that’s annoying, but it’s unrelated. :-)
>
> > So there's a few things we could do here:
> >
> > 1. simply fail and have the user deal with it (including the option
> > of doing a normal `guix gc' or `guix gc -C 1')
> > 2. remember which paths were live and dead and always clean up the
> > links, only reporting errors afterwards
> > 3. add an option to explicitly check the .links directory (which
> > defaults to true for the current things, but could also be used to
> > clean links after a liveness check or after a do-nothing `guix gc
> > -F').
> > 4. ...
>
> You seem to be proposing to remove ‘-D’ altogether.
I wrote no such thing. Obviously there needs to be a way of removing
single items from the store, but what else to do is not so clear. It
is only obvious that traversing all of .links is too expensive.

Toggle quote (7 lines)
> I agree it has the shortcomings you write, but I think it’s
> occasionally useful nonetheless.
>
> My proposal would be either the status quo, or removing just the one
> link that matters from /gnu/store/.links upon ‘-D’.
>
> Thoughts?
There isn't "just the one link that matters" when it comes to removing
multiple items -- heck, even if tasked to free up just 5MB rather than
all of the garbage, traversing all .links is probably too expensive.

Accepting that we might want to always delete links at the end, I think
`guix gc' needs a way to record which links had their count go to 1
during garbage deletion, so that when it comes to deleting links only
those are tried (and of course checked again to make sure their count
is indeed still 1). This should be the preferred mode if less than
some arbitrary large number of store items are affected (let's say 1024
or some multiple of it) or a total cleaning of .links has been forced.

Though perhaps there's a way to do this without manual recording.
Let's say we notice a link count going to one as we clear the trash.
We could just add the link behind it to the trash right away to ensure
that it is not reused by something else and then clean the trash a
second time (we would have to check for potential race conditions in
this case).

WDYT?

Liliana
M
M
Maxim Cournoyer wrote on 31 Oct 2021 21:51
(name . Ludovic Courtès)(address . ludo@gnu.org)
87sfwg7w9z.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (74 lines)
> Hi,
>
> Liliana Marie Prikler <liliana.prikler@gmail.com> skribis:
>
>> Am Donnerstag, den 28.10.2021, 16:16 +0200 schrieb Ludovic Courtès:
>>> Hi,
>>>
>>> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>>>
>>> > Deleting unused links can be a very costly operation, especially on
>>> > rotative hard drives. As removing single store items is often used
>>> > for experimentation rather than for cleaning purposes, this change
>>> > allows it to run without the links cleanup.
>>> >
>>> > * nix/libstore/gc.cc (LocalStore::collectGarbage): Do not clean up
>>> > links when
>>> > the specified action is GCOptions::gcDeleteSpecific.
>>> > ---
>>> > nix/libstore/gc.cc | 2 +-
>>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>>> >
>>> > diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
>>> > index e1d0765154..7d872d8cc1 100644
>>> > --- a/nix/libstore/gc.cc
>>> > +++ b/nix/libstore/gc.cc
>>> > @@ -771,7 +771,7 @@ void LocalStore::collectGarbage(const GCOptions
>>> > & options, GCResults & results)
>>> > deleteGarbage(state, state.trashDir);
>>> >
>>> > /* Clean up the links directory. */
>>> > - if (options.action == GCOptions::gcDeleteDead ||
>>> > options.action == GCOptions::gcDeleteSpecific) {
>>> > + if (options.action == GCOptions::gcDeleteDead) {
>>>
>>> I believe the effect is that ‘guix gc -D /gnu/store/…-disk-image’
>>> would remove nothing: /gnu/store/.links would still contain a copy of
>>> that big disk image, so as a result, you’ve freed zero bytes.
>>>
>>> Am I right?
>> I think that might be the point. As Maxim said, single items are
>> (likely) not removed for cleaning purposes, so freeing the disk image
>> has little effect.
>
> What do you mean? When doing VM testing, I regularly do ‘guix gc -D
> /gnu/store/…-disk-image’ precisely to save space. Fortunately it does
> have the intended effect of freeing a bunch of GiBs.
>
>> Plus, you could invoke it like
>>
>> guix gc -D dead-item dead-item live-item dead-item
>>
>> It would fail at live-item and then not continue to free the links of
>> the two dead items prior.
>
> Yes, and that’s annoying, but it’s unrelated. :-)
>
>> So there's a few things we could do here:
>>
>> 1. simply fail and have the user deal with it (including the option of
>> doing a normal `guix gc' or `guix gc -C 1')
>> 2. remember which paths were live and dead and always clean up the
>> links, only reporting errors afterwards
>> 3. add an option to explicitly check the .links directory (which
>> defaults to true for the current things, but could also be used to
>> clean links after a liveness check or after a do-nothing `guix gc -F').
>> 4. ...
>
> You seem to be proposing to remove ‘-D’ altogether. I agree it has the
> shortcomings you write, but I think it’s occasionally useful
> nonetheless.
>
> My proposal would be either the status quo, or removing just the one
> link that matters from /gnu/store/.links upon ‘-D’.

The second proposal makes sense. I didn't care about freeing space, as
my use case was getting around corrupting an item in my store due to
https://issues.guix.gnu.org/51400,which the patch proposed here allowed
me to do without wasting hours of cleaning up links (nearly 1 GiB of
store on spinning drives).

Thanks,

Maxim
Z
Z
zimoun wrote on 3 Nov 2021 11:45
Re: [bug#51427] [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
CAJ3okZ1=0MZpUq7jk3F46mWfikNa1snj+uERXGgVMRWKviSmhA@mail.gmail.com
Hi,

On Sun, 31 Oct 2021 at 21:53, Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:

Toggle quote (9 lines)
> > My proposal would be either the status quo, or removing just the one
> > link that matters from /gnu/store/.links upon ‘-D’.
>
> The second proposal makes sense. I didn't care about freeing space, as
> my use case was getting around corrupting an item in my store due to
> https://issues.guix.gnu.org/51400, which the patch proposed here allowed
> me to do without wasting hours of cleaning up links (nearly 1 GiB of
> store on spinning drives).

I often use "guix gc -D" for being able to launch again a command and
then check against SWH or other upstream. Because the "deleting
links" is too much long, I am doing ^C at this step. Therefore, be
able to skip this step when running "guix gc -D" appears to me the
thing to do. Well, if "guix gc -D" is not doing it, then I am forcing
it by interrupting it; which appears to me worse than status quo.
BTW, if I need space, then I do not use "guix gc -D" on few items but
instead I use the options -F or -d or -C; here I am fine it takes the
time it takes. :-)

Well, «"deleting unused links" GC phase is too slow» is not new
because it is bug#24937 [1]. Therefore, status quo is really annoying
and I would prefer to skip this GC phase when using option -D although
it is not optimal. Or remove just the one link that matters.



Cheers,
simon
L
L
Ludovic Courtès wrote on 6 Nov 2021 17:57
Re: bug#51427: [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87ee7tmdbd.fsf@gnu.org
Hi Maxim and all,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

Toggle quote (2 lines)
> Ludovic Courtès <ludo@gnu.org> writes:

[...]

Toggle quote (9 lines)
>> You seem to be proposing to remove ‘-D’ altogether. I agree it has the
>> shortcomings you write, but I think it’s occasionally useful
>> nonetheless.
>>
>> My proposal would be either the status quo, or removing just the one
>> link that matters from /gnu/store/.links upon ‘-D’.
>
> The second proposal makes sense.

Maybe that proposal is bogus though because you’d need to know the hash
of the files being removed, which means reading them…

Toggle quote (6 lines)
> I didn't care about freeing space, as my use case was getting around
> corrupting an item in my store due to
> https://issues.guix.gnu.org/51400, which the patch proposed here
> allowed me to do without wasting hours of cleaning up links (nearly 1
> GiB of store on spinning drives).

The ideal solution as zimoun writes would be to address

Perhaps that phase needs to be implemented using a different strategy,
say an sqlite database that records the current link count (hoping that
‘SELECT * FROM links WHERE NLINKS = 1’ would be faster than traversing
all of ‘.links’) as well as a mapping from store item to file hashes.

BTW, those using Btrfs can probably use ‘--disable-deduplication’ and be
done with it.

Thanks,
Ludo’.
M
M
Maxim Cournoyer wrote on 9 Nov 2021 05:11
(name . Ludovic Courtès)(address . ludo@gnu.org)
87k0hi0xzp.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

Toggle quote (20 lines)
> Hi Maxim and all,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>
> [...]
>
>>> You seem to be proposing to remove ‘-D’ altogether. I agree it has the
>>> shortcomings you write, but I think it’s occasionally useful
>>> nonetheless.
>>>
>>> My proposal would be either the status quo, or removing just the one
>>> link that matters from /gnu/store/.links upon ‘-D’.
>>
>> The second proposal makes sense.
>
> Maybe that proposal is bogus though because you’d need to know the hash
> of the files being removed, which means reading them…

Oops :-).

Toggle quote (9 lines)
>> I didn't care about freeing space, as my use case was getting around
>> corrupting an item in my store due to
>> https://issues.guix.gnu.org/51400, which the patch proposed here
>> allowed me to do without wasting hours of cleaning up links (nearly 1
>> GiB of store on spinning drives).
>
> The ideal solution as zimoun writes would be to address
> <https://issues.guix.gnu.org/24937>.

Seems there's some improvement ready, but which needs more
testing/measurements? I'd suggest simply invoking GNU sort; if it has
many pages of program for doing what it does, it's probably doing
something fancier/faster than we can (are ready to) emulate -- for free!

Toggle quote (5 lines)
> Perhaps that phase needs to be implemented using a different strategy,
> say an sqlite database that records the current link count (hoping that
> ‘SELECT * FROM links WHERE NLINKS = 1’ would be faster than traversing
> all of ‘.links’) as well as a mapping from store item to file hashes.

Hmm. I'll need to dive in the problem a bit more before I can comment
on this.

Toggle quote (3 lines)
> BTW, those using Btrfs can probably use ‘--disable-deduplication’ and be
> done with it.

I erroneously used to think that Btrfs could do live deduplication, but
it doesn't. There are external tools to do out of band / batch
deduplication though [0]; so if they perform better than the guix daemon's
own dedup, perhaps we could document this way out for our Btrfs users.


Thank you,

Maxim
J
J
Jack Hill wrote on 9 Nov 2021 05:57
Re: [bug#51427] [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
alpine.DEB.2.21.2111082329280.4243@marsh.hcoop.net
On Mon, 8 Nov 2021, Maxim Cournoyer wrote:

Toggle quote (12 lines)
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> BTW, those using Btrfs can probably use ‘--disable-deduplication’ and be
>> done with it.
>
> I erroneously used to think that Btrfs could do live deduplication, but
> it doesn't. There are external tools to do out of band / batch
> deduplication though [0]; so if they perform better than the guix daemon's
> own dedup, perhaps we could document this way out for our Btrfs users.
>
> [0] https://btrfs.wiki.kernel.org/index.php/Deduplication

A little while ago I had hoped to test btrfs with --disable-deduplication
and bees [1] as the deduplication agent, but wasn't able to successfully
run a system with --disable-deduplication because I needed the
deduplication to cover up problem with grafts [2]. Until we resolve the
second issue, I don't think we should recommend folks run the daemon with
--disable-deduplication.

[1] https://issues.guix.gnu.org/47983(still missing a service)

Best,
Jack
L
L
Ludovic Courtès wrote on 9 Nov 2021 13:56
Re: bug#51427: [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
(name . Jack Hill)(address . jackhill@jackhill.us)
87lf1xmqrg.fsf_-_@gnu.org
Hi,

Jack Hill <jackhill@jackhill.us> skribis:

Toggle quote (24 lines)
> On Mon, 8 Nov 2021, Maxim Cournoyer wrote:
>
>> Ludovic Courtès <ludo@gnu.org> writes:
>>
>>> BTW, those using Btrfs can probably use ‘--disable-deduplication’ and be
>>> done with it.
>>
>> I erroneously used to think that Btrfs could do live deduplication, but
>> it doesn't. There are external tools to do out of band / batch
>> deduplication though [0]; so if they perform better than the guix daemon's
>> own dedup, perhaps we could document this way out for our Btrfs users.
>>
>> [0] https://btrfs.wiki.kernel.org/index.php/Deduplication
>
> A little while ago I had hoped to test btrfs with
> --disable-deduplication and bees [1] as the deduplication agent, but
> wasn't able to successfully run a system with --disable-deduplication
> because I needed the deduplication to cover up problem with grafts
> [2]. Until we resolve the second issue, I don't think we should
> recommend folks run the daemon with --disable-deduplication.
>
> [1] https://issues.guix.gnu.org/47983 (still missing a service)
> [2] https://issues.guix.gnu.org/47115

Oh, right. We didn’t quite get to the bottom of #2. Is it still an
issue? Some questions remained opened.

Thanks,
Ludo’.
Z
Z
zimoun wrote on 9 Nov 2021 19:10
Re: [bug#51427] [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
86k0hhnqss.fsf@gmail.com
Hi Ludo,

On Sat, 06 Nov 2021 at 17:57, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (3 lines)
> The ideal solution as zimoun writes would be to address
> <https://issues.guix.gnu.org/24937>.

Cool for your last reply with a plan for mitigating the issue.

Even if the phase is drastically speed up, it would be probably still
too slow when using the option ’-D’ remove only one <item>; or just
some.

I have not checked the code, maybe I should start by that. ;-) Is it not
possible to simply skip the deleting phase when the option ’-D’ is used?


Cheers,
simon
L
L
Ludovic Courtès wrote on 17 Nov 2021 11:02
(name . zimoun)(address . zimon.toutoune@gmail.com)
871r3f2j7y.fsf@gnu.org
Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

Toggle quote (3 lines)
> I have not checked the code, maybe I should start by that. ;-) Is it not
> possible to simply skip the deleting phase when the option ’-D’ is used?

No; like I wrote, it would have the effect of not deleting anything:


Needs more thought…

Ludo’.
Z
Z
zimoun wrote on 17 Nov 2021 12:49
(name . Ludovic Courtès)(address . ludo@gnu.org)
867dd7roi2.fsf@gmail.com
Hi Ludo,

On Wed, 17 Nov 2021 at 11:02, Ludovic Courtès <ludo@gnu.org> wrote:
Toggle quote (7 lines)
> zimoun <zimon.toutoune@gmail.com> skribis:
>
>> I have not checked the code, maybe I should start by that. ;-) Is it not
>> possible to simply skip the deleting phase when the option ’-D’ is used?
>
> No; like I wrote, it would have the effect of not deleting anything:

After giving a look at the code, yeah it is not so simple. :-)


Toggle quote (4 lines)
>
> Needs more thought…

The logic is complicated, thus adding this guard…

Toggle snippet (9 lines)
if (options.maxFreed > 0) {
/* Clean up the links directory. */
if (options.action == GCOptions::gcDeleteDead || options.action == GCOptions::gcDeleteSpecific) {
printMsg(lvlError, format("deleting unused links..."));
removeUnusedLinks(state);
}
}

…is probably dumb. From my understanding, it should bypass the phase
’removeUnusedLinks’ when using “guix gc -D”. Well, I have not tested
it.


Cheers,
simon
L
L
Ludovic Courtès wrote on 18 Jul 15:57 +0200
Re: bug#51427: [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
(name . zimoun)(address . zimon.toutoune@gmail.com)
87o7xmy14l.fsf_-_@gnu.org
Hello,

With commit 472a0e82a52a3d5d841e1dfad6b13e26082a5750 (Nov. 2021),
partially fixing https://issues.guix.gnu.org/24937, there is hopefully
less pressure to skip the remove-unused-links phase.

Should we close this issue?

Ludo’.
L
L
Liliana Marie Prikler wrote on 18 Jul 19:03 +0200
2441c768fea8faee800947a17aef896c35173845.camel@gmail.com
Am Montag, dem 18.07.2022 um 15:57 +0200 schrieb Ludovic Courtès:
Toggle quote (7 lines)
> Hello,
>
> With commit 472a0e82a52a3d5d841e1dfad6b13e26082a5750 (Nov. 2021),
> partially fixing <https://issues.guix.gnu.org/24937>, there is
> hopefully less pressure to skip the remove-unused-links phase.
>
> Should we close this issue?
As a hard disk user, I'm leaning towards "no". In fact, I recently
encountered a case where I think I might want to skip it even if not
deleting "specific items". For context, my machine has troubles with
sudden power outages during builds (courtesy of a certain graphics card
manufacturer), so if one of those happens during `guix package' or
`guix system' invocation, the sanest thing to do is to run `guix gc'
after reboot and retry whatever command I wanted to run. However,
since I'm not really deleting much here, I'd probably be fine with
accumulating trash and collecting it at a later date. Deleting unused
links is also something that can on some machines be postponed to a
time when they'd otherwise be idle, though I don't think it matters too
much in the context of CI since the global lock is no longer held at
that point, while the lack of storage is still blocking builds.

Cheers
L
L
Ludovic Courtès wrote on 19 Jul 10:34 +0200
(name . Liliana Marie Prikler)(address . liliana.prikler@gmail.com)
87sfmxv6ue.fsf@gnu.org
Hi,

Liliana Marie Prikler <liliana.prikler@gmail.com> skribis:

Toggle quote (10 lines)
> Am Montag, dem 18.07.2022 um 15:57 +0200 schrieb Ludovic Courtès:
>> Hello,
>>
>> With commit 472a0e82a52a3d5d841e1dfad6b13e26082a5750 (Nov. 2021),
>> partially fixing <https://issues.guix.gnu.org/24937>, there is
>> hopefully less pressure to skip the remove-unused-links phase.
>>
>> Should we close this issue?
> As a hard disk user, I'm leaning towards "no".

At the REPL, could you do:

,use(ice-9 ftw)
,t (length (scandir "/gnu/store/.links"))

?

On my SSD I get:

$4 = 438356
;; 24.613712s real time, 10.195698s run time. 1.805636s spent in GC.

This ‘scandir’ call is a good approximation of the cost of the
remove-unused-links phase.

Ludo’.
L
L
Liliana Marie Prikler wrote on 19 Jul 20:42 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
8538dc9925eb499ea2b728e349d98296eedeb14d.camel@gmail.com
Am Dienstag, dem 19.07.2022 um 10:34 +0200 schrieb Ludovic Courtès:
Toggle quote (25 lines)
> Hi,
>
> Liliana Marie Prikler <liliana.prikler@gmail.com> skribis:
>
> > Am Montag, dem 18.07.2022 um 15:57 +0200 schrieb Ludovic Courtès:
> > > Hello,
> > >
> > > With commit 472a0e82a52a3d5d841e1dfad6b13e26082a5750 (Nov. 2021),
> > > partially fixing <https://issues.guix.gnu.org/24937>, there is
> > > hopefully less pressure to skip the remove-unused-links phase.
> > >
> > > Should we close this issue?
> > As a hard disk user, I'm leaning towards "no".
>
> At the REPL, could you do:
>
>   ,use(ice-9 ftw)
>   ,t (length (scandir "/gnu/store/.links"))
>
> ?
>
> On my SSD I get:
>
>   $4 = 438356
>   ;; 24.613712s real time, 10.195698s run time.  1.805636s spent in GC.
scheme@(guile-user)> ,use (ice-9 ftw)
scheme@(guile-user)> ,t (length (scandir "/gnu/store/.links"))
$1 = 213027
;; 1417.872747s real time, 28.514293s run time. 1.284866s spent in GC.
scheme@(guile-user)> (/ 1417.872747 60)
$2 = 23.63121245

So yeah, assuming that scandir scales linearly, if my store was as big
as yours, I could eat lunch and GC still wouldn't be finished (for
context, lunch breaks in my country are only 30 minutes).
T
T
Tobias Geerinckx-Rice wrote on 19 Jul 21:25 +0200
Re: [bug#51427] [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
(name . Liliana Marie Prikler)(address . liliana.prikler@gmail.com)
877d48lx41@nckx
Liliana Marie Prikler 写道:
Toggle quote (5 lines)
> scheme@(guile-user)> ,t (length (scandir "/gnu/store/.links"))
> $1 = 213027
> ;; 1417.872747s real time, 28.514293s run time. 1.284866s spent
> in GC.

[…]

Toggle quote (2 lines)
> So yeah, assuming that scandir scales linearly

…your rotational drive is beyond ridiculously slower than mine (an
ST1000DM010-2EP102):

athena.tobias.gr:~ λ echo 3 | sudo tee /proc/sys/vm/drop_caches
3
athena.tobias.gr:~ λ guix repl
[…]
scheme@(guix-user)> ,use (ice-9 ftw)
scheme@(guix-user)> ,t (length (scandir "/gnu/store/.links"))
$1 = 164437
;; 7.081361s real time, 2.569773s run time. 0.199963s spent in
GC.

Kind regards,

T G-R
-----BEGIN PGP SIGNATURE-----

iIMEARYKACsWIQT12iAyS4c9C3o4dnINsP+IT1VteQUCYtcGHg0cbWVAdG9iaWFz
LmdyAAoJEA2w/4hPVW15jLUA/RUtMKSluKICEiuL0dauuBZAjb6INZhfjlDq6UM9
o0tLAQC/+FGYEsWJ01hyRbU2KcFGoUXEmN8aGlu7s7Uo9658Cw==
=pcnk
-----END PGP SIGNATURE-----

L
L
Ludovic Courtès wrote on 21 Jul 11:21 +0200
Re: bug#51427: [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
877d46n7n5.fsf_-_@gnu.org
Hi,

Tobias Geerinckx-Rice <me@tobias.gr> skribis:

Toggle quote (21 lines)
> Liliana Marie Prikler 写道:
>> scheme@(guile-user)> ,t (length (scandir "/gnu/store/.links"))
>> $1 = 213027
>> ;; 1417.872747s real time, 28.514293s run time. 1.284866s spent in
>> GC.
>
> […]
>
>> So yeah, assuming that scandir scales linearly
>
> …your rotational drive is beyond ridiculously slower than mine (an
> ST1000DM010-2EP102):
>
> athena.tobias.gr:~ λ echo 3 | sudo tee /proc/sys/vm/drop_caches 3
> athena.tobias.gr:~ λ guix repl
> […]
> scheme@(guix-user)> ,use (ice-9 ftw)
> scheme@(guix-user)> ,t (length (scandir "/gnu/store/.links"))
> $1 = 164437
> ;; 7.081361s real time, 2.569773s run time. 0.199963s spent in GC.

It’s crazy that there are two orders of magnitude of difference between
these two hard disks.

Liliana, is your hard disk old or low-end?

I agree that we should strive to have good performance on that kind of
hardware too, but I don’t know how to get there.

Ludo’.
L
L
Liliana Marie Prikler wrote on 21 Jul 20:02 +0200
12c768df412e23b4d69e87631cd805f741c38ce3.camel@gmail.com
Am Donnerstag, dem 21.07.2022 um 11:21 +0200 schrieb Ludovic Courtès:
Toggle quote (31 lines)
> Hi,
>
> Tobias Geerinckx-Rice <me@tobias.gr> skribis:
>
> > Liliana Marie Prikler 写道:
> > > scheme@(guile-user)> ,t (length (scandir "/gnu/store/.links"))
> > > $1 = 213027
> > > ;; 1417.872747s real time, 28.514293s run time.  1.284866s spent
> > > in
> > > GC.
> >
> > […]
> >
> > > So yeah, assuming that scandir scales linearly
> >
> > …your rotational drive is beyond ridiculously slower than mine (an
> > ST1000DM010-2EP102):
> >
> > athena.tobias.gr:~ λ echo 3 | sudo tee /proc/sys/vm/drop_caches 3
> > athena.tobias.gr:~ λ guix repl
> > […]
> > scheme@(guix-user)> ,use (ice-9 ftw)
> > scheme@(guix-user)> ,t (length (scandir "/gnu/store/.links"))
> > $1 = 164437
> > ;; 7.081361s real time, 2.569773s run time.  0.199963s spent in   
> > GC.
>
> It’s crazy that there are two orders of magnitude of difference
> between these two hard disks.
>
> Liliana, is your hard disk old or low-end?
I'm not too sure about age, but it's probably low-end in terms of
speed. There's room for 2TB data after all.

Toggle quote (2 lines)
> I agree that we should strive to have good performance on that kind
> of hardware too, but I don’t know how to get there.
I don't think deleting links will ever be fast on that disk. But what
I've been saying the whole time is that I don't always need the links
deleted. I think adding "expert" switches to skip these phases might
actually be enough – after all, if I ever do want to run a full GC, the
information ought to be the same, no?
L
L
Ludovic Courtès wrote on 22 Jul 14:14 +0200
(name . Liliana Marie Prikler)(address . liliana.prikler@gmail.com)
87ilnpibu1.fsf@gnu.org
Hi,

Liliana Marie Prikler <liliana.prikler@gmail.com> skribis:

Toggle quote (6 lines)
> I don't think deleting links will ever be fast on that disk. But what
> I've been saying the whole time is that I don't always need the links
> deleted. I think adding "expert" switches to skip these phases might
> actually be enough – after all, if I ever do want to run a full GC, the
> information ought to be the same, no?

The expert will have to know that skipping that phase will have the
effect of *not* freeing space on the device, so…

Ludo’.
M
M
Maxime Devos wrote on 22 Jul 15:39 +0200
Re: [bug#51427] [PATCH] nix: libstore: Do not remove unused links when deleting specific items.
4b81f06f-3e89-5a33-ae30-1710e447f849@telenet.be
On 22-07-2022 14:14, Ludovic Courtès wrote:
Toggle quote (11 lines)
> Hi,
>
> Liliana Marie Prikler<liliana.prikler@gmail.com> skribis:
>
>> I don't think deleting links will ever be fast on that disk. But what
>> I've been saying the whole time is that I don't always need the links
>> deleted. I think adding "expert" switches to skip these phases might
>> actually be enough – after all, if I ever do want to run a full GC, the
>> information ought to be the same, no?
> The expert will have to know that skipping that phase will have the
> effect of *not* freeing space on the device, so…
I believe the word "expert" implies that the expert knows that,
otherwise they are, by definition, not an expert, so I don't see your
point. So ... what does the ... after the 'so' hide here? I don't
understand what point you are trying to make here.
The idea is to, when deleting specific items, just do that, and not
start iterating over all (*) the other things in the store.
This is important for, say, testing substitution code efficiently (or
SWH code as mentioned previously, etc).
There, the lack of freeing space is not a concern.  This appears, after
reading debbugs, to be already mentioned at
Maybe something that would be acceptable to all parties: When deleting
specific store items, don't remove _all_ the unused links, but only
remove the unused links that correspond to deleted files. Which after
reading 51427 appears to already have been proposed.
Toggle quote (2 lines)
> Maybe that proposal is bogus though because you’d need to know the hash
> of the files being removed, which means reading them…
I don't see the problem -- when deleting a specific store item, read the
files one-by-one, hash them one-by-one, and delete the link if appropriate.
lessening the need
Sure, but as informally mentioned by, say, Liliana, even after that
things remain ~ O(n) (or probably O(n lg n) if the file system uses some
tree structure) where n=size of the store, which in any realistic
situation is going to be way slower than O(m), where m = the number of
individual store items to delete, for reasonable implementations of
"delete individual store item". (*)
The point isn't to work-around slow "deleting unused links"
implementation, but rather to avoid inherit slowness of deleting
everything when deleting a few things suffice.
Summarised, I don't understand the reluctance to merge an implementation
of "delete individual store item" -- yes, the delete link phase is slow
and could possibly be improved, yes when using certain implementations
very little disk is freed, but those aren't the point of the patch
AFAICT, they are orthogonal concerns.
Greetings,
Maxime.
(*) Yes, I'm neglecting the difference between number of store items and
links and size of store items here, but those don't make a difference to
the conclusion here.
Attachment: file
Attachment: OpenPGP_signature
L
L
Ludovic Courtès wrote on 23 Jul 01:07 +0200
(name . Maxime Devos)(address . maximedevos@telenet.be)
874jz8eogx.fsf@gnu.org
Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

Toggle quote (15 lines)
> On 22-07-2022 14:14, Ludovic Courtès wrote:
>> Hi,
>>
>> Liliana Marie Prikler<liliana.prikler@gmail.com> skribis:
>>
>>> I don't think deleting links will ever be fast on that disk. But what
>>> I've been saying the whole time is that I don't always need the links
>>> deleted. I think adding "expert" switches to skip these phases might
>>> actually be enough – after all, if I ever do want to run a full GC, the
>>> information ought to be the same, no?
>> The expert will have to know that skipping that phase will have the
>> effect of *not* freeing space on the device, so…
>
> I believe the word "expert" implies that the expert knows that,

Apologies for being elliptic. My point here, as has been discussed
earlier in this thread, is that we can’t just skip that phase or we’d
simply leave files around without actually deleting them.

Thus, a command-line switch to skip the phase doesn’t seem valuable to
me because it’d let users run the GC in a way that doesn’t actually
collect garbage.

I hope this is clearer!

Thanks,
Ludo’.
L
L
Liliana Marie Prikler wrote on 23 Jul 08:52 +0200
dc0a9584b5248816ccd7d146b9d6b12417a1b688.camel@gmail.com
Am Samstag, dem 23.07.2022 um 01:07 +0200 schrieb Ludovic Courtès:
Toggle quote (29 lines)
> Hi,
>
> Maxime Devos <maximedevos@telenet.be> skribis:
>
> > On 22-07-2022 14:14, Ludovic Courtès wrote:
> > > Hi,
> > >
> > > Liliana Marie Prikler<liliana.prikler@gmail.com>  skribis:
> > >
> > > > I don't think deleting links will ever be fast on that disk. 
> > > > But what I've been saying the whole time is that I don't always
> > > > need the links deleted.  I think adding "expert" switches to
> > > > skip these phases might actually be enough – after all, if I
> > > > ever do want to run a full GC, the information ought to be the
> > > > same, no?
> > > The expert will have to know that skipping that phase will have
> > > the effect of *not* freeing space on the device, so…
> >
> > I believe the word "expert" implies that the expert knows that,
>
> Apologies for being elliptic.  My point here, as has been discussed
> earlier in this thread, is that we can’t just skip that phase or we’d
> simply leave files around without actually deleting them.
>
> Thus, a command-line switch to skip the phase doesn’t seem valuable
> to me because it’d let users run the GC in a way that doesn’t
> actually collect garbage.
>
> I hope this is clearer!
As noted before, I don't always run GC to free up X amount of space.
Even if I did, link deletion is greedy and frees up whatever it can.
So the initial suggestion to only look at what might have been freed in
this gc already makes sense. However, it was ruled complicated because
the GC is implemented in C++.  

My personal motivation to just skip the phase entirely comes from the
hypothesis that the store is in a sane state even if the links are not
deleted. Particularly, if I `guix gc broken-item' and `guix build
broken-item', even without deleting links, the broken-item should now
be fixed. This has practical advantages over `guix build --repair': if
the last `guix package' or `guix system' failed mid-way, any user, not
just root, can simply `guix gc' the broken items.

Now, I understand that as a default, you never want to skip this phase,
because it doesn't actually free up disk space. But if you have a slow
disk with large space, do you really need to free all that much space,
or would it be fine to delay freeing it until a later date?

Cheers
M
M
Maxime Devos wrote on 23 Jul 12:24 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
5fc80bdb-34e2-c3d0-41b1-4192212bd751@telenet.be
On 23-07-2022 01:07, Ludovic Courtès wrote:
Toggle quote (7 lines)
> Apologies for being elliptic. My point here, as has been discussed
> earlier in this thread, is that we can’t just skip that phase or we’d
> simply leave files around without actually deleting them.
>
> Thus, a command-line switch to skip the phase doesn’t seem valuable to
> me because it’d let users run the GC in a way that doesn’t actually
> collect garbage.
It's definitively possible to skip the phase, AFAICT -- there was some
code doing exactly that, and for some use cases the limitations (i.e.,
very limited amount of space actually being freed) were found to be
acceptable, for the user isn't trying to free space in the first place
(doing that would be a nice side-effect, but not what the user was
trying to accomplish), and other people aren't impacted by the
limitations as it's an off-by-default switch.
As noted before, sometimes the point isn't to free space, but only to
collect _some_ (not all, just some, i.e., the store item, but the
individual files in the store item don't matter) garbage. For some users
and use cases, not freeing space is not a problem, as mentioned in the
previous mail:
Toggle quote (6 lines)
> This is important for, say, testing substitution code efficiently (or
> SWH code as mentioned previously, etc).
>
> There, the lack of freeing space is not a concern.  This appears,
> after reading debbugs, to be already mentioned at
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=51427#20.
For these users, skipping that phase (or another solution, whatever
works), is quite valuable and not a problem.
(Remember, people modifying the substitution code or such are users too.)
That said, there were some approaches mentioned that do skip the phase,
but in a manner such that space is still freed (but only the relevant
space, not things from the whole store, so performance wouldn't be
horrible).
Toggle quote (3 lines)
> Thus, a command-line switch to skip the phase doesn’t seem valuable to
> me because it’d let users run the GC in a way that doesn’t actually
> collect garbage.
If the user runs "guix gc" with an off-by-default switch that isn't
recommended for general usage, whose description makes it look like some
arcane thing (if you didn't know the phase already, how would you know
what ‘don't delete unused links’ means?), while they actually just
wanted to free space, then that's their problem; not Guix, I think. 
Furthermore, if they somehow do that by mistake, then can just do a
regular "guix gc" afterwards -- it's a quite recoverable mistake. As
such, I don't see a problem.
Also, it _does_ collect garbage -- it collects the /gnu/store/... item,
it just doesn't collect _all_ the garbage (it doesn't collect the
individual files in the store item or the things in /gnu/store/.links).
If you mean it doesn't fit under "guix gc" because it doesn't free much
space and hence doesn't fit in the concept of "gc'ing", I suppose we
could make a new command "guix $bikeshed" that's like "guix gc" except
sometimes it doesn't free much space, though I don't see the point when
we already have "guix gc" where it's easy to just add a flag.
Alternatively, we could just inflate the concept of "GC" a little such
that it becomes more useful for some people without making it worse for
others instead of defining a new command.
Summarised: gc'ing is not limited to freeing $N MiB, there are other
valid reasons to gc too as mentioned previously (make some slots empty
in the weak hash table that is /gnu/store), why are attempts to
implement some huge optimisations for the latter rejected when they
don't impact the former at all?
Or summarised another way: we aren't trying to remove the GC, rather
previously the GC mostly only supported running a full cycle, with this
patch the GC also has a more incremental mode of operation.
Greetings,
Maxime.
Attachment: OpenPGP_signature
?