"deleting unused links" GC phase is too slow

Open. Submitted by Ludovic Courtès.
Details
Participants
  • Ludovic Courtès
  • Mark H Weaver
  • Ricardo Wurmus
Owner: unassigned
Severity: important
Ludovic Courtès wrote on 13 Nov 2016 18:41
(address . bug-guix@gnu.org)
87wpg7ffbm.fsf@gnu.org
‘LocalStore::removeUnusedLinks’ traverses all the entries in /gnu/store/.links, calls lstat(2) on each one of them, and checks ‘st_nlink’ to determine whether they can be deleted.
There are two problems: lstat(2) can be slow on spinning disks as found on hydra.gnu.org, and the algorithm's running time is proportional to the number of entries in /gnu/store/.links, which is a lot on hydra.gnu.org.
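(For reference, the pattern being described boils down to a readdir(3) loop with one lstat(2) per entry, in directory order. The following standalone sketch is illustrative only; the real code is the C++ in nix/libstore/gc.cc, visible as the removed lines of the patch later in this thread.)

/* sketch-naive.c: one lstat(2) per entry of /gnu/store/.links, in
   readdir() order -- the access pattern described above.  Illustrative
   sketch only, not the daemon's code.  */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int
main (void)
{
  const char *links = "/gnu/store/.links";
  DIR *dir = opendir (links);
  if (dir == NULL)
    return 1;

  struct dirent *entry;
  while ((entry = readdir (dir)) != NULL)
    {
      if (strcmp (entry->d_name, ".") == 0
          || strcmp (entry->d_name, "..") == 0)
        continue;

      char path[4096];
      snprintf (path, sizeof path, "%s/%s", links, entry->d_name);

      struct stat st;
      if (lstat (path, &st) == 0 && st.st_nlink == 1)
        /* The daemon unlink()s here; the sketch just prints.  */
        printf ("would delete %s\n", path);
    }

  closedir (dir);
  return 0;
}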
Ludo’.
Ludovic Courtès wrote on 9 Dec 2016 23:43
(address . 24937@debbugs.gnu.org)(name . Mark H Weaver)(address . mhw@netris.org)
87wpf867v6.fsf@gnu.org
ludo@gnu.org (Ludovic Courtès) skribis:
> ‘LocalStore::removeUnusedLinks’ traverses all the entries in
> /gnu/store/.links and calls lstat(2) on each one of them and checks
> ‘st_nlink’ to determine whether they can be deleted.
>
> There are two problems: lstat(2) can be slow on spinning disks as found
> on hydra.gnu.org, and the algorithm is proportional in the number of
> entries in /gnu/store/.links, which is a lot on hydra.gnu.org.
On Dec. 2, on guix-sysadmin@gnu.org, Mark described an approach that noticeably improved performance:
The idea is to read the entire /gnu/store/.links directory, sort the entries by inode number, and then iterate over the entries by inode number, calling 'lstat' on each one and deleting the ones with a link count of 1.
The reason this is so much faster is because the inodes are stored on disk in order of inode number, so this leads to a sequential access pattern on disk instead of a random access pattern.
The difficulty is that the directory is too large to comfortably store all of the entries in virtual memory. Instead, the entries should be written to temporary files on disk, and then sorted using merge sort to ensure sequential access patterns during sorting. Fortunately, this is exactly what 'sort' does from GNU coreutils.
So, for now, I've implemented this as a pair of small C programs that is used in a pipeline with GNU sort. The first program simply reads a directory and writes lines of the form "<inode> <name>" to stdout. (Unfortunately, "ls -i" calls stat on each entry, so it can't be used). This is piped through 'sort -n' and then into another small C program that reads these lines, calls 'lstat' on each one, and deletes the non-directories with link count 1.
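Mark's two helper programs are not attached to this report. As a rough illustration of the first one, assuming a pipeline along the lines of `./list-inodes /gnu/store/.links | sort -n | ./delete-unlinked` (both program names are hypothetical):

/* list-inodes.c: print "<inode> <name>" for every entry of a directory,
   without stat()ing anything -- d_ino comes straight from the directory
   entries.  Hypothetical stand-in for the first program in the pipeline.  */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char *argv[])
{
  if (argc != 2)
    {
      fprintf (stderr, "usage: %s DIRECTORY\n", argv[0]);
      return EXIT_FAILURE;
    }

  DIR *dir = opendir (argv[1]);
  if (dir == NULL)
    {
      perror (argv[1]);
      return EXIT_FAILURE;
    }

  struct dirent *entry;
  while ((entry = readdir (dir)) != NULL)
    printf ("%llu %s\n", (unsigned long long) entry->d_ino, entry->d_name);

  closedir (dir);
  return EXIT_SUCCESS;
}

The second program would then read the sorted lines, lstat() each path, and unlink() the regular files whose link count is 1, as described above.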
Regarding memory usage, I replied:
Really?
For each entry, we have to store roughly 70 bytes for the file name (or 52 if we consider only the basename), plus 8 bytes for the inode number; let’s say 64 bytes.
If we have 10 M entries, that’s 700 MB (or 520 MB), which is a lot, but maybe acceptable?
At worst, we may still see an improvement if we proceed by batches: we read 10000 directory entries (7 MB), sort them, and stat them, then read the next 10000 entries. WDYT?
Ludo’.
Ludovic Courtès wrote on 9 Dec 2016 23:44
control message for bug #24937
(address . control@debbugs.gnu.org)
87vaus67uc.fsf@gnu.org
severity 24937 important
Ludovic Courtès wrote on 11 Dec 2016 14:46
Re: bug#24937: "deleting unused links" GC phase is too slow
(address . 24937@debbugs.gnu.org)(name . Mark H Weaver)(address . mhw@netris.org)
87lgvm4lzu.fsf@gnu.org
Hello!
Here’s a proposed patch that follows your suggestion, Mark, but places an upper bound on the number of directory entries loaded in memory.
On my laptop, which has ~500k entries in /gnu/store/.links, the result is something like this (notice the inode numbers in ‘lstat’ calls):
13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48
13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8
13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, st_size=60977152, st_atime=2016/12/11-12:01:59, st_mtime=2016/12/11-09:39:45, st_ctime=2016/12/11-09:39:45}) = 0
13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."}
[...]
13738 lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9js2fcviv2rnc", {st_dev=makedev(8, 2), st_ino=47, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=88, st_size=41482, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
13738 lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz7n6abbws8px", {st_dev=makedev(8, 2), st_ino=65, st_mode=S_IFREG|0444, st_nlink=9, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2313, st_atime=2016/12/08-21:02:26, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:49}) = 0
13738 lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2alpdyx3yqz2", {st_dev=makedev(8, 2), st_ino=68, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=32, st_size=13561, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
[...]
13738 getdents(8, {{d_ino=4257114, d_off=1734093409898198492,
[...]
13738 lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6v6zgkln7f7m", {st_dev=makedev(8, 2), st_ino=19, st_mode=S_IFREG|0444, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2553, st_atime=2016/12/08-21:02:54, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/07-00:05:19}) = 0
13738 lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqrid2r6kk1d4kb", {st_dev=makedev(8, 2), st_ino=30, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2090, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:21}) = 0
13738 lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xrgn9y91lfvc2", {st_dev=makedev(8, 2), st_ino=33, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=7958, st_atime=2015/11/04-18:55:32, st_mtime=1970/01/01-01:00:01, st_ctime=2016/01/05-11:35:49}) = 0
[...]
13738 getdents(8, {{d_ino=328672,
[...]
13738 lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fdlwajqd73lnz", {st_dev=makedev(8, 2), st_ino=21, st_mode=S_IFREG|0444, st_nlink=31, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=615, st_atime=2016/12/08-21:02:30, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:47}) = 0
13738 lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkcgkx0jirpxsg", {st_dev=makedev(8, 2), st_ino=48, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=360, st_size=176750, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0
13738 lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa01zwp06d3f0", {st_dev=makedev(8, 2), st_ino=61, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1440, st_atime=2016/11/03-20:37:50, st_mtime=1970/01/01-01:00:01, st_ctime=2016/11/07-00:01:57}) = 0
I can’t tell exactly how this affects performance because my machine has an SSD where this operation takes ~3 seconds on a cold cache. I suspect it has performance comparable to that of doing all the ‘readdir’ at once followed by all the ‘lstat’.
Mark, how does that sound?
I’d like to commit it soon if there are no objections.
Thanks,
Ludo’.
diff --git a/nix/libstore/gc.cc b/nix/libstore/gc.cc
index 72eff52..db58603 100644
--- a/nix/libstore/gc.cc
+++ b/nix/libstore/gc.cc
@@ -545,6 +545,9 @@ void LocalStore::tryToDelete(GCState & state, const Path & path)
 }
 
+/* Like 'dirent', but with just what we need. */
+typedef std::pair<Path, ino_t> MiniDirEntry;
+
 /* Unlink all files in /nix/store/.links that have a link count of 1,
    which indicates that there are no other links and so they can be
    safely deleted.  FIXME: race condition with optimisePath(): we
@@ -555,32 +558,57 @@ void LocalStore::removeUnusedLinks(const GCState & state)
     AutoCloseDir dir = opendir(linksDir.c_str());
     if (!dir) throw SysError(format("opening directory `%1%'") % linksDir);
 
+    /* Maximum number of entries stored in memory; each 'MiniDirEntry' takes
+       ~80 bytes. */
+    const size_t maxEntries = 100000;
+
     long long actualSize = 0, unsharedSize = 0;
 
-    struct dirent * dirent;
-    while (errno = 0, dirent = readdir(dir)) {
-        checkInterrupt();
-        string name = dirent->d_name;
-        if (name == "." || name == "..") continue;
-        Path path = linksDir + "/" + name;
-
-        struct stat st;
-        if (lstat(path.c_str(), &st) == -1)
-            throw SysError(format("statting `%1%'") % path);
-
-        if (st.st_nlink != 1) {
-            unsigned long long size = st.st_blocks * 512ULL;
-            actualSize += size;
-            unsharedSize += (st.st_nlink - 1) * size;
-            continue;
-        }
-
-        printMsg(lvlTalkative, format("deleting unused link `%1%'") % path);
-
-        if (unlink(path.c_str()) == -1)
-            throw SysError(format("deleting `%1%'") % path);
-
-        state.results.bytesFreed += st.st_blocks * 512;
+    bool endOfDir = false;
+
+    while (!endOfDir) {
+        /* Read as many entries as possible at once, up to 'maxEntries'. */
+        std::list<MiniDirEntry> entries;
+        struct dirent * dirent;
+        while (errno = 0,
+               (entries.size() < maxEntries) && (dirent = readdir(dir))) {
+            checkInterrupt();
+            string name = dirent->d_name;
+            if (name == "." || name == "..") continue;
+            entries.emplace_back(MiniDirEntry(name, dirent->d_ino));
+        }
+
+        endOfDir = (dirent == NULL);
+
+        /* Sort entries by inode number to minimize disk seeks induced by the
+           following 'lstat' calls. */
+        entries.sort([] (const MiniDirEntry& e1, const MiniDirEntry& e2) {
+            return e1.second < e2.second;
+        });
+
+        for (auto && item: entries) {
+            checkInterrupt();
+
+            Path path = linksDir + "/" + item.first;
+
+            struct stat st;
+            if (lstat(path.c_str(), &st) == -1)
+                throw SysError(format("statting `%1%'") % path);
+
+            if (st.st_nlink != 1) {
+                unsigned long long size = st.st_blocks * 512ULL;
+                actualSize += size;
+                unsharedSize += (st.st_nlink - 1) * size;
+                continue;
+            }
+
+            printMsg(lvlTalkative, format("deleting unused link `%1%'") % path);
+
+            if (unlink(path.c_str()) == -1)
+                throw SysError(format("deleting `%1%'") % path);
+
+            state.results.bytesFreed += st.st_blocks * 512;
+        }
     }
 
     struct stat st;
Mark H Weaver wrote on 11 Dec 2016 15:23
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 24937@debbugs.gnu.org)
87twaaa6j9.fsf@netris.org
ludo@gnu.org (Ludovic Courtès) writes:
> Here’s a proposed patch that follows your suggestion, Mark, but places
> an upper bound on the number of directory entries loaded in memory.
>
> On my laptop, which has ~500k entries in /gnu/store/.links, the result
> is something like this (notice the inode numbers in ‘lstat’ calls):
>
> [...]
>
> I can’t tell exactly how this affects performance because my machine has
> an SSD where this operation takes ~3 seconds on a cold cache.  I suspect
> it has performance comparable to that of doing all the ‘readdir’ at once
> followed by all the ‘lstat’.
>
> Mark, how does that sound?
I think we should sort the entire directory using merge sort backed to disk files. If we load chunks of the directory, sort them and process them individually, I expect that this will increase the amount of I/O required by a non-trivial factor. In each pass, we would load blocks of inodes from disk, almost all of which are likely to be present in the store and thus linked from the directory, but in this scheme we will process only a small number of them and drop the rest on the floor to be read again in the next pass. Given that even my fairly optimal implementation takes about 35 minutes to run on Hydra, I'd prefer to avoid multiplying that by a non-trivial factor.
Why not just use GNU sort? It already exists, and does exactly what we need.
If you object to using an external program for some reason, I would prefer to re-implement a similar algorithm in the daemon.
Thanks, Mark
Ludovic Courtès wrote on 11 Dec 2016 19:02
(name . Mark H Weaver)(address . mhw@netris.org)(address . 24937@debbugs.gnu.org)
87twaa2vjx.fsf@gnu.org
Mark H Weaver <mhw@netris.org> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Here’s a proposed patch that follows your suggestion, Mark, but places
>> an upper bound on the number of directory entries loaded in memory.
>>
>> [...]
>>
>> I can’t tell exactly how this affects performance because my machine has
>> an SSD where this operation takes ~3 seconds on a cold cache.  I suspect
>> it has performance comparable to that of doing all the ‘readdir’ at once
>> followed by all the ‘lstat’.
>>
>> Mark, how does that sound?
>
> I think we should sort the entire directory using merge sort backed to
> disk files.  If we load chunks of the directory, sort them and process
> them individually, I expect that this will increase the amount of I/O
> required by a non-trivial factor.  In each pass, we would load blocks of
> inodes from disk, almost all of which are likely to be present in the
> store and thus linked from the directory, but in this scheme we will
> process only a small number of them and drop the rest on the floor to be
> read again in the next pass.  Given that even my fairly optimal
> implementation takes about 35 minutes to run on Hydra, I'd prefer to
> avoid multiplying that by a non-trivial factor.
Sure, though it’s not obvious to me how much of a difference it makes; my guess is that processing in large chunks is already a win, but we’d have to measure.
> Why not just use GNU sort?  It already exists, and does exactly what we
> need.
Does ‘sort’ manage to avoid reading whole files in memory? If not, I think it wouldn’t buy us anything to use it.
> If you object to using an external program for some reason, I would
> prefer to re-implement a similar algorithm in the daemon.
Yeah, I’d rather avoid serializing the list of file name/inode number pairs just to invoke ‘sort’ on that.
Also, what algorithm are you referring to?
Thanks for the quick feedback!
Ludo’.
Mark H Weaver wrote on 11 Dec 2016 20:27
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 24937@debbugs.gnu.org)
87lgvm9sgq.fsf@netris.org
ludo@gnu.org (Ludovic Courtès) writes:
> Mark H Weaver <mhw@netris.org> skribis:
>
>> I think we should sort the entire directory using merge sort backed to
>> disk files.  [...]  Given that even my fairly optimal
>> implementation takes about 35 minutes to run on Hydra, I'd prefer to
>> avoid multiplying that by a non-trivial factor.
>
> Sure, though it’s not obvious to me how much of a difference it makes;
> my guess is that processing in large chunks is already a win, but we’d
> have to measure.
I agree, it would surely be a win. Given that it currently takes on the order of a day to run this phase on Hydra, if your proposed method takes 2 hours, that would be a huge win, but still not good, IMO. Even 35 minutes is slower than I'd like.
>> Why not just use GNU sort?  It already exists, and does exactly what we
>> need.
>
> Does ‘sort’ manage to avoid reading whole files in memory?
Yes, it does. I monitored the 'sort' process when I first ran my optimized pipeline. It created about 10 files in /tmp, approximately 70 megabytes each as I recall, and then read them all concurrently while writing the sorted output.
My guess is that it reads a manageable chunk of the input, sorts it in memory, and writes it to a temporary file. I guess it repeats this process, writing multiple temporary files, until the entire input is consumed, and then reads all of those temporary files, merging them together into the output stream.
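(For illustration only, here is a minimal external-sort sketch for lines of the form "<inode> <name>": the chunk size and bounded number of runs are assumptions made for the example, and this is not GNU sort's actual implementation, but it shows the spill-and-merge structure being described.)

/* external-sort.c: sort "<inode> <name>" lines from stdin by inode using
   bounded memory: sort fixed-size chunks with qsort(), spill each chunk
   to a temporary file ("run"), then merge the runs.  Sketch only.  */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_LINES 100000   /* lines sorted in memory at a time (assumed) */
#define MAX_RUNS 64          /* assumed upper bound on temporary runs */
#define LINE_LEN 256

static unsigned long long
line_inode (const char *line)
{
  /* Lines start with a decimal inode number.  */
  return strtoull (line, NULL, 10);
}

static int
compare_lines (const void *a, const void *b)
{
  unsigned long long ia = line_inode (*(char *const *) a);
  unsigned long long ib = line_inode (*(char *const *) b);
  return (ia > ib) - (ia < ib);
}

int
main (void)
{
  static char *lines[CHUNK_LINES];
  FILE *runs[MAX_RUNS];
  size_t nruns = 0;
  char buf[LINE_LEN];

  /* Phase 1: produce sorted runs on disk.  */
  for (;;)
    {
      size_t n = 0;
      while (n < CHUNK_LINES && fgets (buf, sizeof buf, stdin) != NULL)
        lines[n++] = strdup (buf);
      if (n == 0)
        break;

      qsort (lines, n, sizeof *lines, compare_lines);

      FILE *run = tmpfile ();
      if (run == NULL || nruns == MAX_RUNS)
        return EXIT_FAILURE;
      for (size_t i = 0; i < n; i++)
        {
          fputs (lines[i], run);
          free (lines[i]);
        }
      rewind (run);
      runs[nruns++] = run;
    }

  /* Phase 2: merge the runs, repeatedly emitting the smallest head line.  */
  static char heads[MAX_RUNS][LINE_LEN];
  int live[MAX_RUNS];
  for (size_t i = 0; i < nruns; i++)
    live[i] = fgets (heads[i], LINE_LEN, runs[i]) != NULL;

  for (;;)
    {
      int best = -1;
      for (size_t i = 0; i < nruns; i++)
        if (live[i]
            && (best < 0 || line_inode (heads[i]) < line_inode (heads[best])))
          best = i;
      if (best < 0)
        break;
      fputs (heads[best], stdout);
      live[best] = fgets (heads[best], LINE_LEN, runs[best]) != NULL;
    }

  return EXIT_SUCCESS;
}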
>> If you object to using an external program for some reason, I would
>> prefer to re-implement a similar algorithm in the daemon.
>
> Yeah, I’d rather avoid serializing the list of file name/inode number
> pairs just to invoke ‘sort’ on that.
Sure, I agree that it would be better to avoid that, but IMO not at the cost of using O(N) memory instead of O(1) memory, nor at the cost of multiplying the amount of disk I/O by a non-trivial factor.
> Also, what algorithm are you referring to?
The algorithm I described above, which I guess is close to what GNU sort does.
Thanks, Mark
Ludovic Courtès wrote on 13 Dec 2016 01:00
(name . Mark H Weaver)(address . mhw@netris.org)(address . 24937@debbugs.gnu.org)
87d1gwvgu0.fsf@gnu.org
Mark H Weaver <mhw@netris.org> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Mark H Weaver <mhw@netris.org> skribis:
>>
>>> I think we should sort the entire directory using merge sort backed to
>>> disk files.  [...]  Given that even my fairly optimal
>>> implementation takes about 35 minutes to run on Hydra, I'd prefer to
>>> avoid multiplying that by a non-trivial factor.
>>
>> Sure, though it’s not obvious to me how much of a difference it makes;
>> my guess is that processing in large chunks is already a win, but we’d
>> have to measure.
>
> I agree, it would surely be a win.  Given that it currently takes on the
> order of a day to run this phase on Hydra, if your proposed method takes
> 2 hours, that would be a huge win, but still not good, IMO.  Even 35
> minutes is slower than I'd like.
Of course.
I did some measurements with the attached program on chapters, which is a Xen VM with spinning disks underneath, similar to hydra.gnu.org. It has 600k entries in /gnu/store/.links.
Here’s a comparison of the “optimal” mode (bulk stats after we’ve fetched all the dirents) vs. the “semi-interleaved” mode (doing bulk stats every 100,000 dirents):
ludo@guix:~$ gcc -std=gnu99 -Wall links-traversal.c -DMODE=3
ludo@guix:~$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
ludo@guix:~$ time ./a.out
603858 dir_entries, 157 seconds
stat took 1 seconds

real    2m38.508s
user    0m0.324s
sys     0m1.824s
ludo@guix:~$ gcc -std=gnu99 -Wall links-traversal.c -DMODE=2
ludo@guix:~$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
ludo@guix:~$ time ./a.out
3852 dir_entries, 172 seconds (including stat)

real    2m51.827s
user    0m0.312s
sys     0m1.808s
Semi-interleaved is ~12% slower here (not sure how reproducible that is, though).
>>> Why not just use GNU sort?  It already exists, and does exactly what we
>>> need.
>>
>> Does ‘sort’ manage to avoid reading whole files in memory?
>
> Yes, it does.  I monitored the 'sort' process when I first ran my
> optimized pipeline.  It created about 10 files in /tmp, approximately 70
> megabytes each as I recall, and then read them all concurrently while
> writing the sorted output.
>
> My guess is that it reads a manageable chunk of the input, sorts it in
> memory, and writes it to a temporary file.  I guess it repeats this
> process, writing multiple temporary files, until the entire input is
> consumed, and then reads all of those temporary files, merging them
> together into the output stream.
OK. That seems to be what the comment above ‘sortlines’ in sort.c describes.
>>> If you object to using an external program for some reason, I would
>>> prefer to re-implement a similar algorithm in the daemon.
>>
>> Yeah, I’d rather avoid serializing the list of file name/inode number
>> pairs just to invoke ‘sort’ on that.
>
> Sure, I agree that it would be better to avoid that, but IMO not at the
> cost of using O(N) memory instead of O(1) memory, nor at the cost of
> multiplying the amount of disk I/O by a non-trivial factor.
Understood.
sort.c in Coreutils is very big, and we surely don’t want to duplicate all that. Yet, I’d rather not shell out to ‘sort’.
Do you know how many entries are in .links on hydra.gnu.org? If it performs comparably to chapters, the timings suggest it should have around 10.5M entries.
Thanks!
Ludo’.
#include <unistd.h>
#include <dirent.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
#include <string.h>
#include <sys/stat.h>
#include <assert.h>

#define STAT_INTERLEAVED 1
#define STAT_SEMI_INTERLEAVED 2
#define STAT_OPTIMAL 3

struct entry
{
  char *name;
  ino_t inode;
};

#define MAX_ENTRIES 1000000
static struct entry dir_entries[MAX_ENTRIES];

int
main ()
{
  struct timeval start, end;

  /* For useful timings, do:
     sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'  */
  gettimeofday (&start, NULL);
  DIR *links = opendir ("/gnu/store/.links");

  size_t count = 0;

#if MODE != STAT_INTERLEAVED
  void sort_entries (void)
  {
    int entry_lower (const void *a, const void *b)
    {
      return ((struct entry *)a)->inode < ((struct entry *)b)->inode;
    }

    qsort (dir_entries, count, sizeof (struct entry), entry_lower);
  }
#endif

  void stat_entries (void)
  {
    for (size_t i = 0; i < count; i++)
      {
        struct stat st;
        lstat (dir_entries[i].name, &st);
      }
  }

  for (struct dirent *entry = readdir (links);
       entry != NULL;
       entry = readdir (links))
    {
      assert (count < MAX_ENTRIES);
      dir_entries[count].name = strdup (entry->d_name);
      dir_entries[count].inode = entry->d_ino;
#if MODE == STAT_INTERLEAVED
      struct stat st;
      lstat (entry->d_name, &st);
#endif

#if MODE == STAT_SEMI_INTERLEAVED
      if (count++ >= 100000)
        {
          sort_entries ();
          stat_entries ();
          count = 0;
        }
#else
      count++;
#endif
    }

#if MODE == STAT_SEMI_INTERLEAVED
  sort_entries ();
  stat_entries ();
#endif

  gettimeofday (&end, NULL);
  printf ("%zi dir_entries, %zi seconds"
#if MODE != STAT_OPTIMAL
          " (including stat)"
#endif
          "\n", count, end.tv_sec - start.tv_sec);

#if MODE == STAT_OPTIMAL
  sort_entries ();
  gettimeofday (&start, NULL);
  stat_entries ();
  gettimeofday (&end, NULL);

  printf ("stat took %zi seconds\n", end.tv_sec - start.tv_sec);
#endif

  return EXIT_SUCCESS;
}
Mark H Weaver wrote on 13 Dec 2016 05:09
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 24937@debbugs.gnu.org)
8737hs1nd1.fsf@netris.org
Do as you wish. I don't have time to continue discussing this.
Mark
Mark H Weaver wrote on 13 Dec 2016 13:48
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 24937@debbugs.gnu.org)
87wpf4yoz0.fsf@netris.org
ludo@gnu.org (Ludovic Courtès) writes:
> I did some measurements with the attached program on chapters, which is
> a Xen VM with spinning disks underneath, similar to hydra.gnu.org.  It
> has 600k entries in /gnu/store/.links.

I just want to point out that 600k inodes use 150 megabytes of disk space on ext4, which is small enough to fit in the cache, so the disk I/O will not be multiplied for such a small test case.
> Here’s a comparison of the “optimal” mode (bulk stats after we’ve
> fetched all the dirents) vs. the “semi-interleaved” mode (doing bulk
> stats every 100,000 dirents):
>
> [...]
>
> Semi-interleaved is ~12% slower here (not sure how reproducible that is,
> though).
This directory you're testing on is more than an order of magnitude smaller than Hydra's when it's full. Unlike in your test above, all of the inodes in Hydra's store won't fit in the cache.
In my opinion, the reason Hydra performs so poorly is that efficiency and scalability are apparently very low priorities in the design of the software running on it. Unfortunately, I feel that my advice in this area is discarded more often than not.
>> My guess is that it reads a manageable chunk of the input, sorts it in
>> memory, and writes it to a temporary file.  I guess it repeats this
>> process, writing multiple temporary files, until the entire input is
>> consumed, and then reads all of those temporary files, merging them
>> together into the output stream.
>
> OK.  That seems to be what the comment above ‘sortlines’ in sort.c
> describes.
Also, see https://en.wikipedia.org/wiki/External_sorting. This is a well-studied problem with a long history.
>>>> If you object to using an external program for some reason, I would
>>>> prefer to re-implement a similar algorithm in the daemon.
>>>
>>> Yeah, I’d rather avoid serializing the list of file name/inode number
>>> pairs just to invoke ‘sort’ on that.

I'm fairly sure that the overhead of serializing the file names and inode numbers is *far* less than the overhead you would add by iterating over the inodes in multiple passes.
>> Sure, I agree that it would be better to avoid that, but IMO not at the
>> cost of using O(N) memory instead of O(1) memory, nor at the cost of
>> multiplying the amount of disk I/O by a non-trivial factor.
>
> Understood.
>
> sort.c in Coreutils is very big, and we surely don’t want to duplicate
> all that.  Yet, I’d rather not shell out to ‘sort’.

The "shell" would not be involved here at all, just the "sort" program. I guess you dislike launching external processes? Can you explain why?
Guix-daemon launches external processes for building derivations, so why is using one for garbage collection a problem? Emacs, a program that you cite in your talks as having many qualities that we seek to emulate, does not shy away from using external programs.
> Do you know how many entries are in .links on hydra.gnu.org?

"df -i /gnu" indicates that it currently has about 5.5M inodes, but that's with only 29% of the disk in use. A few days ago, when the disk was full, assuming that the average file size is the same, it may have had closer to 5.5M / 0.29 ~= 19M inodes, which is over 30 times as many as used in your measurements above. On ext4, which uses 256-byte inodes, that's about 5 gigabytes of inodes.
Mark
Ludovic Courtès wrote on 13 Dec 2016 18:02
(name . Mark H Weaver)(address . mhw@netris.org)
87fulrsqxx.fsf@gnu.org
Hello Mark,
Mark H Weaver <mhw@netris.org> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
>> I did some measurements with the attached program on chapters, which is
>> a Xen VM with spinning disks underneath, similar to hydra.gnu.org.  It
>> has 600k entries in /gnu/store/.links.
>
> I just want to point out that 600k inodes use 150 megabytes of disk
> space on ext4, which is small enough to fit in the cache, so the disk
> I/O will not be multiplied for such a small test case.
Right. That’s the only spinning-disk machine I could access without problem. :-/
Ricardo, Roel: would you be able to run that links-traversal.c from
<https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
on a machine with a big store, as described at
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?
>> Semi-interleaved is ~12% slower here (not sure how reproducible that is,
>> though).
>
> This directory you're testing on is more than an order of magnitude
> smaller than Hydra's when it's full.  Unlike in your test above, all of
> the inodes in Hydra's store won't fit in the cache.
Good point. I’m trying my best to get performance figures; there’s no doubt we could do better!
> In my opinion, the reason Hydra performs so poorly is that efficiency
> and scalability are apparently very low priorities in the design of the
> software running on it.  Unfortunately, I feel that my advice in this
> area is discarded more often than not.
Well, as you know, I’m currently traveling, yet I take the time to answer your email at night; I think this should suggest that, far from discarding your advice, I very much value it.
I’m a maintainer though, so I’m trying to understand the problem better. It’s not just about finding the “optimal” solution, but also about finding a tradeoff between the benefits and the maintainability costs.
>> sort.c in Coreutils is very big, and we surely don’t want to duplicate
>> all that.  Yet, I’d rather not shell out to ‘sort’.
>
> The "shell" would not be involved here at all, just the "sort" program.
> I guess you dislike launching external processes?  Can you explain why?
I find that passing strings around among programs is inelegant (subjective), but I don’t think you’re really looking to argue about that, are you? :-)
It remains that, if invoking ‘sort’ appears to be preferable from *both* performance and maintenance viewpoints, then it’s a good choice. That may be the case, but again, I prefer to have figures to back that.
>> Do you know how many entries are in .links on hydra.gnu.org?
>
> "df -i /gnu" indicates that it currently has about 5.5M inodes, but
> that's with only 29% of the disk in use.  A few days ago, when the disk
> was full, assuming that the average file size is the same, it may have
> had closer to 5.5M / 0.29 ~= 19M inodes,
OK, good to know.
Thanks!
Ludo’.
Ricardo Wurmus wrote on 13 Dec 2016 18:18
(name . Ludovic Courtès)(address . ludo@gnu.org)
87vaunbvcu.fsf@mdc-berlin.de
Ludovic Courtès <ludo@gnu.org> writes:
> Ricardo, Roel: would you be able to run that links-traversal.c from
> <https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
> on a machine with a big store, as described at
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?
I just ran this on my workstation in the office where I regularly build packages. Here’s the output of “df -i /gnu”:

Filesystem              Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/fedora-root 3301376 1098852 2202524  34% /
Probably not large enough to derive conclusions about hydra’s behaviour.
[I can’t run it on the shared store at the MDC because NFS performance is too poor. I recently ran “guix gc --optimize” to dedupe the shared store (post-build deduplication has been disabled for a few weeks) and it’s at 3,197,489 used inodes.]
Here are the results of running the link-traversal code on my workstation:
rwurmus in ~: gcc -std=gnu99 -Wall links-traversal.c -DMODE=3
rwurmus in ~: sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
rwurmus in ~: time ./a.out
412825 dir_entries, 107 seconds
stat took 0 seconds

real    1m47.264s
user    0m0.214s
sys     0m1.314s

rwurmus in ~: gcc -std=gnu99 -Wall links-traversal.c -DMODE=2
rwurmus in ~: sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
rwurmus in ~: time ./a.out
12821 dir_entries, 107 seconds (including stat)

real    1m46.475s
user    0m0.201s
sys     0m1.309s

-- Ricardo
Mark H Weaver wrote on 15 Dec 2016 02:19
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 24937@debbugs.gnu.org)
87h9663s6e.fsf@netris.org
I apologize for losing my patience earlier.
Mark
Ricardo Wurmus wrote on 16 Apr 15:26 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 24937@debbugs.gnu.org)
87ftd3muhp.fsf@elephly.net
Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Ricardo, Roel: would you be able to run that links-traversal.c from
>> <https://debbugs.gnu.org/cgi/bugreport.cgi?filename=links-traversal.c;bug=24937;msg=25;att=1>
>> on a machine with a big store, as described at
>> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24937#25>?
>
> I just ran this on my workstation in the office where I regularly build
> packages.  [...]
I ran this for the first time on ci.guix.gnu.org, which has a very big store (currently at around 29TB).

df -i /gnu:

Filesystem    Inodes     IUsed     IFree IUse% Mounted on
/dev/sdb1  610021376 132350406 477670970   22% /gnu
I had to increase MAX_ENTRIES to 135000000.
I forgot to drop caches initially. This is the first run:
root@berlin ~ [env]# gcc links-traversal.c -DMODE=3 -o links-traversal
root@berlin ~ [env]# time ./links-traversal
57079502 dir_entries, 3906 seconds
stat took 136 seconds

real    67m48.145s
user    0m59.575s
sys     2m30.065s
After dropping the caches, I aborted the next run after 67 minutes.
I’m going to continue testing on one of the build nodes, and I’ll try using statx.

-- Ricardo
Ricardo Wurmus wrote on 16 Apr 16:27 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 24937@debbugs.gnu.org)
87eesnmrow.fsf@elephly.net
Here are more benchmarks on one of the build nodes. It doesn’t have nearly as many used inodes as ci.guix.gnu.org, but I could fill it up if necessary.

root@hydra-guix-127 ~# df -i /gnu/
Filesystem   Inodes   IUsed    IFree IUse% Mounted on
/dev/sda3  28950528 2796829 26153699   10% /

root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l
2017395

I tested all three modes with statx and with lstat. The links-traversal-statx.c file is attached below.
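(The statx attachment itself is not reproduced in this excerpt. As a rough idea of what a statx-based stat step might look like — a hypothetical sketch, not Ricardo's actual links-traversal-statx.c — the call can request only the link count:)

/* Hypothetical sketch of a statx-based variant of the stat step,
   requesting only the link count.  Needs _GNU_SOURCE and a kernel/glibc
   with statx(2); mirrors the lstat() behaviour via AT_SYMLINK_NOFOLLOW.  */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

struct entry { char *name; ino_t inode; };

static void
statx_entries (const struct entry *entries, size_t count)
{
  for (size_t i = 0; i < count; i++)
    {
      struct statx stx;
      /* STATX_NLINK asks the kernel for just the link count instead of
         the full set of attributes.  */
      statx (AT_FDCWD, entries[i].name, AT_SYMLINK_NOFOLLOW,
             STATX_NLINK, &stx);
    }
}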
* mode 1 + statx
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=1 -D_GNU_SOURCE=1 -o links-traversal
links-traversal-statx.c:53:8: warning: ‘stat_entries’ defined but not used [-Wunused-function]
   53 | void stat_entries (void)
      |      ^~~~~~~~~~~~
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 9 seconds (including stat)

real    0m9.176s
user    0m0.801s
sys     0m4.236s
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 4 seconds (including stat)

real    0m3.556s
user    0m0.708s
sys     0m2.848s
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 4 seconds (including stat)

real    0m3.553s
user    0m0.599s
sys     0m2.954s
root@hydra-guix-127 ~ [env]#

* mode 2 + statx
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=2 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal
17377 dir_entries, 10 seconds (including stat)

real    0m9.598s
user    0m1.210s
sys     0m4.257s
root@hydra-guix-127 ~ [env]# time ./links-traversal
17377 dir_entries, 4 seconds (including stat)

real    0m4.094s
user    0m0.988s
sys     0m3.107s
root@hydra-guix-127 ~ [env]# time ./links-traversal
17377 dir_entries, 4 seconds (including stat)

real    0m4.095s
user    0m0.933s
sys     0m3.162s
root@hydra-guix-127 ~ [env]#

* mode 3 + statx
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal-statx.c -DMODE=3 -D_GNU_SOURCE=1 -o links-traversal
^Croot@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 7 seconds
stat took 3 seconds

real    0m9.992s
user    0m1.411s
sys     0m4.221s
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 1 seconds
stat took 2 seconds

real    0m4.265s
user    0m1.120s
sys     0m3.145s
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 2 seconds
stat took 2 seconds

real    0m4.267s
user    0m1.072s
sys     0m3.195s
root@hydra-guix-127 ~ [env]#
Now with just lstat:
* mode 1 + lstat
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=1 -D_GNU_SOURCE=1 -o links-traversal
links-traversal.c:49:8: warning: ‘stat_entries’ defined but not used [-Wunused-function]
   49 | void stat_entries (void)
      |      ^~~~~~~~~~~~
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 9 seconds (including stat)

real    0m9.303s
user    0m0.748s
sys     0m4.397s
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 4 seconds (including stat)

real    0m3.526s
user    0m0.540s
sys     0m2.987s
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 3 seconds (including stat)

real    0m3.519s
user    0m0.600s
sys     0m2.919s
root@hydra-guix-127 ~ [env]#
* mode 2 + lstat
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=2 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal
17377 dir_entries, 9 seconds (including stat)

real    0m9.614s
user    0m1.205s
sys     0m4.250s
root@hydra-guix-127 ~ [env]# time ./links-traversal
17377 dir_entries, 4 seconds (including stat)

real    0m4.060s
user    0m1.052s
sys     0m3.008s
root@hydra-guix-127 ~ [env]# time ./links-traversal
17377 dir_entries, 4 seconds (including stat)

real    0m4.057s
user    0m0.984s
sys     0m3.073s
root@hydra-guix-127 ~ [env]#
* mode 3 + lstat
root@hydra-guix-127 ~ [env]# gcc -Wall -std=c99 links-traversal.c -DMODE=3 -D_GNU_SOURCE=1 -o links-traversal
root@hydra-guix-127 ~ [env]# echo 3 > /proc/sys/vm/drop_caches
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 6 seconds
stat took 3 seconds

real    0m9.767s
user    0m1.270s
sys     0m4.339s
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 2 seconds
stat took 2 seconds

real    0m4.234s
user    0m1.136s
sys     0m3.097s
root@hydra-guix-127 ~ [env]# time ./links-traversal
2017397 dir_entries, 1 seconds
stat took 2 seconds

real    0m4.222s
user    0m1.052s
sys     0m3.170s
root@hydra-guix-127 ~ [env]#
They are all very close, so I think I need to work with a bigger store to see a difference.
Or perhaps I did something silly because I don’t know C… If so, please let me know.

-- Ricardo
Ludovic Courtès wrote on 17 Apr 10:16 +0200
(name . Ricardo Wurmus)(address . rekado@elephly.net)(address . 24937@debbugs.gnu.org)
874ktieddi.fsf@gnu.org
Hi Ricardo,
Thanks for running this benchmark!
Ricardo Wurmus <rekado@elephly.net> skribis:
> root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l
> 2017395
That’s not a lot; my laptop has 2.8M links.
It’s interesting to see that system time remains at ~4.2s in all modes. So the only thing that modes 2 and 3 achieve is increasing CPU time. It’s as if the order in which files are stat’d had no impact on I/O performance.
Ludo’.
Ricardo Wurmus wrote on 17 Apr 10:28 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 24937@debbugs.gnu.org)
874ktims89.fsf@elephly.net
Ludovic Courtès <ludo@gnu.org> writes:
>> root@hydra-guix-127 ~# ls -1 /gnu/store/.links | wc -l
>> 2017395
>
> That’s not a lot; my laptop has 2.8M links.

Let me rerun this after copying a few thousand store items from ci.guix.gnu.org over. Maybe we’ll see the times diverge then.

-- Ricardo