On 22-07-2022 14:14, Ludovic Courtès wrote:
> Hi,
>
> Liliana Marie Prikler <liliana.prikler@gmail.com> skribis:
>
>> I don't think deleting links will ever be fast on that disk.  But what
>> I've been saying the whole time is that I don't always need the links
>> deleted.  I think adding "expert" switches to skip these phases might
>> actually be enough – after all, if I ever do want to run a full GC, the
>> information ought to be the same, no?
>
> The expert will have to know that skipping that phase will have the
> effect of *not* freeing space on the device, so…

The word "expert" implies that the expert knows that; otherwise they are, by definition, not an expert. So what does the "…" after the "so" hide? I don't understand what point you are trying to make.

The idea is that, when deleting specific items, we do just that and do not start iterating over all (*) the other things in the store.

This is important for, say, testing substitution code efficiently (or the SWH code mentioned previously, etc.).

In those cases, not freeing space is not a concern.  From reading debbugs, this appears to have been mentioned already at https://debbugs.gnu.org/cgi/bugreport.cgi?bug=51427#20.

Maybe something that would be acceptable to all parties: when deleting specific store items, don't remove _all_ the unused links, but only the unused links that correspond to the deleted files. After reading 51427, this appears to have been proposed already.

> Maybe that proposal is bogus though because you’d need to know the hash
> of the files being removed, which means reading them…

I don't see the problem -- when deleting a specific store item, read the files one-by-one, hash them one-by-one, and delete the link if appropriate.
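
To make the above concrete, here is a minimal sketch in Python. It is not the daemon's code. It assumes a links directory whose entries are hard links named after a hash of the file contents; the plain hex SHA-256 used here is only a stand-in for whatever naming scheme the store really uses, and the names `file_digest` and `delete_item_and_links` are made up for illustration.

```python
import hashlib
import os


def file_digest(path):
    """Hex SHA-256 of a file's contents (assumed stand-in for the link name)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def delete_item_and_links(item_dir, links_dir):
    """Delete one item and only the link entries its own files pointed to.

    Returns the names of the link entries that were removed.
    """
    removed = []
    # Bottom-up walk so that directories are empty by the time we rmdir them.
    for root, dirs, files in os.walk(item_dir, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            link = None
            if not os.path.islink(path):
                # Hash before deleting: this is the "reading them" cost.
                link = os.path.join(links_dir, file_digest(path))
            os.unlink(path)
            # If the link entry is now the last remaining name for the
            # inode, no other item uses it, so it can go too.
            if link and os.path.exists(link) and os.stat(link).st_nlink == 1:
                os.unlink(link)
                removed.append(os.path.basename(link))
        for name in dirs:
            path = os.path.join(root, name)
            if os.path.islink(path):
                os.unlink(path)
            else:
                os.rmdir(path)
    os.rmdir(item_dir)
    return removed
```

The work done is proportional to the size of the items being deleted, not to the number of entries in the links directory.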

> Things about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937 lessening the need

Sure, but as informally mentioned by, say, Liliana, even after that, things remain roughly O(n) (or probably O(n lg n) if the file system uses some tree structure), where n is the size of the store. In any realistic situation that is going to be way slower than O(m), where m is the number of individual store items to delete, for reasonable implementations of "delete individual store item". (*)
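
For contrast, a full "delete unused links" pass has to look at every entry in the links directory, however few items were deleted. A rough sketch of such a pass, assuming (as an illustration only) that an entry is unused exactly when its hard-link count has dropped to 1; `sweep_unused_links` is a made-up name, not an existing function:

```python
import os


def sweep_unused_links(links_dir):
    """Visit every entry in links_dir and remove those with link count 1.

    Returns (visited, removed) to make the cost of the scan visible.
    """
    visited = 0
    removed = 0
    # Listing first keeps the loop independent of deletions during the scan.
    for name in os.listdir(links_dir):
        path = os.path.join(links_dir, name)
        visited += 1
        if os.lstat(path).st_nlink == 1:
            os.unlink(path)
            removed += 1
    return visited, removed
```

Here `visited` always equals the number of entries in the links directory, which is the n above, even when `removed` is tiny.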

The point isn't to work around a slow "deleting unused links" implementation, but rather to avoid the inherent slowness of deleting everything when deleting a few things suffices.

In summary, I don't understand the reluctance to merge an implementation of "delete individual store item". Yes, the delete-links phase is slow and could possibly be improved, and yes, with certain implementations very little disk space is freed, but those aren't the point of the patch AFAICT; they are orthogonal concerns.

Greetings,
Maxime.

(*) Yes, I'm neglecting the difference between the number of store items, the number of links, and the size of store items here, but those don't make a difference to the conclusion.