Maxime Devos writes: > Andrew Tropin schreef op ma 05-07-2021 om 18:39 [+0300]: >> + (define (equal-regulars? file1 file2) >> + "Check if FILE1 and FILE2 are bit for bit identical." >> + (let* ((cmp-binary #$(file-append >> + (@ (gnu packages base) diffutils) "/bin/cmp")) >> + (status (system* cmp-binary file1 file2))) >> + (= status 0))) > > Is there any particular reason to shell out to "cmp" instead > of doing the comparison in Guile? Starting a process isn't > the most efficient thing. > > Try "time /run/current-system/profile/bin echo", on my system, > it takes about 2--3 milliseconds for "echo" to finish > even though it only had to print a newline character. > Compare with "time echo" (to use the shell built-in "echo"), > it takes 0.000s seconds on my system. > > 3 milliseconds isn't much by itself, but this can accumulate, > so I would implement the comparison in Guile. > > As an optimisation, you could look at the value returned by "lstat". > If the 'size' is different, then 'equal-regulars?' can return #f > without reading the file. If the 'inode' and 'device' are equal, > then 'equal-regulars?' can return #t I think (at least on conventional > file systems like btrfs and ext4). No specific reason. Yep, spawning a new process can be expensive, but it's not clear how much time will take the comparison itself and if it worth it to optimize "startup time". I'm not very fluent with guile internals and not sure if reimplementation of cmp in guile would improve or worsen the performance, but it obviously could intoduce some bugs. I found Xinglu's idea of the usage of well-tested cmp to be a reasonable solution here. Also, this service is expected to be used with small amount of files and because many of them are symlinks to the store even smaller number of them will trigger the execution of cmp, so I find the performance optimization to be preliminary here and propose to address the issue when and if it appear someday. However, the ideas about size and inodes are good, easy to implement and I find them potentially useful to prevent unecessary external process spawning. The patch with those improvements are below: