On Fri, 20 Nov 2020 12:02:25 +0100 Ludovic Courtès wrote: > ‘guix system init’ ends by copying the system’s closure from the > “host” store to the target store; it also initializes the database of > that target store. > > That copy is inefficient for several reasons. Let’s pick one file, > shred.1.gz, that ends up being copied, and let’s look at its > occurrences in the strace log of ‘guix system init config.scm > /tmp/os’: > > --8<---------------cut here---------------start------------->8--- > $ grep -A2 '/shred.1.gz' ,,s > lstat("/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0 openat(AT_FDCWD, > "/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > O_RDONLY) = 15 fstat(15, {st_mode=S_IFREG|0444, st_size=1490, ...}) = > 0 openat(AT_FDCWD, > "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > O_WRONLY|O_CREAT|O_TRUNC, 0444) = 16 read(15, > "\37\213\10\0\0\0\0\0\2\3\215VMs\3336\20\275\363Wluh\354\251L%vg\322:M"..., > 8192) = 1490 write(16, > "\37\213\10\0\0\0\0\0\2\3\215VMs\3336\20\275\363Wluh\354\251L%vg\322:M"..., > 1490) = 1490 -- utimensat(AT_FDCWD, > "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > [{tv_sec=1605721025, tv_nsec=616985411} /* > 2020-11-18T18:37:05.616985411+0100 */, {tv_sec=1, tv_nsec=0} /* > 1970-01-01T01:00:01+0100 */], 0) = 0 > lstat("/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/sleep.1.gz", > {st_mode=S_IFREG|0444, st_size=813, ...}) = 0 openat(AT_FDCWD, > "/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/sleep.1.gz", > O_RDONLY) = 15 -- > lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0 > lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shuf.1.gz", > {st_mode=S_IFREG|0444, st_size=972, ...}) = 0 > lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/sleep.1.gz", > {st_mode=S_IFREG|0444, st_size=813, ...}) = 0 -- > lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0 openat(AT_FDCWD, > "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > O_RDONLY) = 17 lseek(17, 0, SEEK_CUR) = 0 read(17, > "\37\213\10\0\0\0\0\0\2\3\215VMs\3336\20\275\363Wluh\354\251L%vg\322:M"..., > 1490) = 1490 -- > lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > {st_mode=S_IFREG|0444, st_size=1490, ...}) = 0 openat(AT_FDCWD, > "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > O_RDONLY) = 17 lseek(17, 0, SEEK_CUR) = 0 read(17, > "\37\213\10\0\0\0\0\0\2\3\215VMs\3336\20\275\363Wluh\354\251L%vg\322:M"..., > 1490) = 1490 -- > link("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shred.1.gz", > "/tmp/os/gnu/store/.links/0w0qcs5lp36i89yry91r2ixlghihzf0vc56bpd9yylj342gv82xl") > = 0 > lstat("/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shuf.1.gz", > {st_mode=S_IFREG|0444, st_size=972, ...}) = 0 openat(AT_FDCWD, > "/tmp/os/gnu/store/57xj5gcy1jbl9ai2lnrqnpr0dald9i65-coreutils-8.32/share/man/man1/shuf.1.gz", > O_RDONLY) = 17 --8<---------------cut > here---------------end--------------->8--- > > First, /tmp/os/…/shred.1.gz is read entirely twice: once in > ‘register-items’ (in the ‘nar-sha256’ call) to compute its hash, and a > second time for deduplication (the ‘deduplicate’ call in there.) > > The ‘nar-sha256’ call could be avoided because the database of > /gnu/store contains that value. As for deduplication, we could > perhaps create those ‘.links’ entries as we copy files instead of > re-traversing the whole thing afterwards. > > Second, all of /tmp/os is traversed to reset timestamps, although we > could have cleared those timestamps when we created those files in the > first place ( prevents that though, > unless we keep a bug-fixed copy of ‘copy-recursively’ in there.) > > Third, in the case of the installer, we’re really copying from > /mnt/guix-inst/store to /mnt/gnu/store, which is likely the same > device. In this case we could create hard links instead of actually > copying files. > > Fourth, we’re adding items one by one in the target store database, > but it may be more efficient to more or less dump the subset of the > source database in bulk. > > Surely we can do better. > > Ludo’. > > > Also, if a store is already present (eg.: because of a partial install), it could make sense to (optionally) keep its contents. AFAIK this is still not possible. It was one the bigger time sinks while I was working on the F2FS support.