incorrect “guix hash” for FastQC

  • Open
  • quality assurance status badge
Details
2 participants
  • Ludovic Courtès
  • Simon Tournier
Owner
unassigned
Submitted by
Simon Tournier
Severity
normal
S
S
Simon Tournier wrote on 14 Sep 2023 18:46
incorrect “guix hash” for FastQC
(address . bug-guix@gnu.org)
87v8ccd7oy.fsf@gmail.com
Hi,

Currently Guix stores in its source the hash
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk and this has not
changed since February (Guix revision
b6a4fbb488f1f539ae45ed7924c9af8905fa0d8b).

Well, because the origin contains a snippet, “guix build -S” does not
return the result of the fixed derivation. Instead, it is one or the other:

Toggle snippet (5 lines)
$ guix gc -R $(guix build -S fastqc -d) | grep checkout.drv
/gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv
/gnu/store/7rdcyl7iq00xfdwh6azgmc3i7mr2769b-fastqc-0.11.9-checkout.drv

Let pick the correct one and I get:

Toggle snippet (4 lines)
$ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

Oh, it is not the expected result! Instead, I have to omit the option
’-x’:

Toggle snippet (4 lines)
$ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

Hum, I am missing something…

Let clone from upstream.

Toggle snippet (6 lines)
$ git clone https://github.com/s-andrews/FastQC
$ git -C FastQC checkout v0.11.9
$ guix hash -rx FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

Ah… incorrect but consistent with previously. The question is why is
this following,

Toggle snippet (5 lines)
$ mv FastQC/.git bye-bye
$ guix hash -r FastQC
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

is correct? And obviously without .git we get the same hash just as
above,

Toggle snippet (4 lines)
$ guix hash -rx FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

Any idea?

Cheers,
simon
S
S
Simon Tournier wrote on 26 Sep 2023 15:00
(address . 65979@debbugs.gnu.org)
87fs313x92.fsf@gmail.com
Hi,

On Thu, 14 Sep 2023 at 18:46, Simon Tournier <zimon.toutoune@gmail.com> wrote:

Toggle quote (9 lines)
> $ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
>
> Oh, it is not the expected result! Instead, I have to omit the option
> ’-x’:
>
> $ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

[...]

Toggle quote (5 lines)
> $ git -C FastQC checkout v0.11.9
> $ guix hash -rx FastQC
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

[...]

Toggle quote (4 lines)
> $ mv FastQC/.git bye-bye
> $ guix hash -r FastQC
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

[...]

Toggle quote (2 lines)
> Any idea?

The issue is from .svn folders.

Toggle snippet (7 lines)
$ find $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv) \
-type d -name ".svn" -print
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/1 Introduction/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/3 Analysis Modules/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/2 Basic Operations/.svn

The option ’-x’ excludes them; see vcs-file? from (guix hash) module,
and let confirm:

$ git -C FastQC checkout v0.11.9

$ guix hash -rx FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

$ find FastQC -type d -name ".svn" -exec rm -r "{}" \;
$ mv FastQC/.git bye-bye
$ guix hash -r FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

Therefore, we have a discrepancy between “guix hash” command-line and
checksum verification from ’source’ field.

The package definition reads:

(source
(origin
(method git-fetch)
(uri (git-reference
(commit (string-append "v" version))))
(file-name (git-file-name name version))
(sha256
(base32
"00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk"))

when this hash contains .svn subfolders that are excluded by ’-x’
command-line option.

What do we fix? The procedure vcs-file? or fixed-output computation?

Cheers,
simon
L
L
Ludovic Courtès wrote on 26 Sep 2023 15:29
(name . Simon Tournier)(address . zimon.toutoune@gmail.com)(address . 65979@debbugs.gnu.org)
87wmwd2hbi.fsf@gnu.org
Hi!

Simon Tournier <zimon.toutoune@gmail.com> skribis:

Toggle quote (12 lines)
> Let pick the correct one and I get:
>
> $ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
>
>
> Oh, it is not the expected result! Instead, I have to omit the option
> ’-x’:
>
> $ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

This is because:

Toggle snippet (6 lines)
$ find $( guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv) -name .svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/3 Analysis Modules/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/1 Introduction/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/2 Basic Operations/.svn

Not a bug, IMO (‘vcs-file?’ shouldn’t try to be smart in this case.)

Ludo’.
L
L
Ludovic Courtès wrote on 26 Sep 2023 15:34
(name . Simon Tournier)(address . zimon.toutoune@gmail.com)(address . 65979@debbugs.gnu.org)
87pm252h3c.fsf@gnu.org
(Oops, now I see you had already found the reason.)

Simon Tournier <zimon.toutoune@gmail.com> skribis:

Toggle quote (2 lines)
> What do we fix? The procedure vcs-file? or fixed-output computation?

There are two things are:

1. ‘vcs-file?’, used by ‘guix hash -rx’;

2. ‘git-fetch’, which does (delete-file-recursively ".git").

Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
a Git repo), whereas #1 is an approximation that, in corner cases like
this one, gives the wrong answer.

My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
would be to add complicated heuristics in the hope corner cases like
this one would be better dealt with, but it wouldn’t be bullet-proof
anyway.

WDYT?

Ludo’.
S
S
Simon Tournier wrote on 26 Sep 2023 18:23
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 65979@debbugs.gnu.org)
CAJ3okZ0EB+uPLFsouk20JCKoL3gm000LSqH7Sxfe_dk9rqsQLA@mail.gmail.com
Hi,

On Tue, 26 Sept 2023 at 15:34, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (6 lines)
> There are two things are:
>
> 1. ‘vcs-file?’, used by ‘guix hash -rx’;
>
> 2. ‘git-fetch’, which does (delete-file-recursively ".git").

Yes.

Toggle quote (4 lines)
> Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
> a Git repo), whereas #1 is an approximation that, in corner cases like
> this one, gives the wrong answer.

These corner cases matter for some SWH loader implementing Nar hashes
in Python. Since they load sources.json (conversion of hash checksum
from package recipe), read the Nar hash and compare it with the one
they internal compute, then they need to implement in Python the
correct behavior.

and all the thread for context.


Toggle quote (5 lines)
> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> would be to add complicated heuristics in the hope corner cases like
> this one would be better dealt with, but it wouldn’t be bullet-proof
> anyway.

Well, the question is what other VCS as 'svn-fetch', etc. are doing?

Maybe, we can just have a special case for Git repository.

Somehow, since the problem needs to be solved for SWH, the same
solution applies for vcs-file? and "guix hash", no?

Cheers,
simon
L
L
Ludovic Courtès wrote on 1 Oct 2023 17:07
(name . Simon Tournier)(address . zimon.toutoune@gmail.com)(address . 65979@debbugs.gnu.org)
87pm1ys7oq.fsf@gnu.org
Hi,

Simon Tournier <zimon.toutoune@gmail.com> skribis:

[...]

Toggle quote (10 lines)
>> Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
>> a Git repo), whereas #1 is an approximation that, in corner cases like
>> this one, gives the wrong answer.
>
> These corner cases matter for some SWH loader implementing Nar hashes
> in Python. Since they load sources.json (conversion of hash checksum
> from package recipe), read the Nar hash and compare it with the one
> they internal compute, then they need to implement in Python the
> correct behavior.

I agree, SWH needs something more robust than ‘vcs-file?’.

Toggle quote (7 lines)
>> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
>> would be to add complicated heuristics in the hope corner cases like
>> this one would be better dealt with, but it wouldn’t be bullet-proof
>> anyway.
>
> Well, the question is what other VCS as 'svn-fetch', etc. are doing?

Remove the relevant metadata directory/directories.

Toggle quote (5 lines)
> Maybe, we can just have a special case for Git repository.
>
> Somehow, since the problem needs to be solved for SWH, the same
> solution applies for vcs-file? and "guix hash", no?

Yes, ‘guix hash’ uses ‘vcs-file?’.

Now, SWH does not use ‘guix hash’ (I suppose) so they’ll have to address
that on their side (in Python code I guess).

Ludo’.
S
S
Simon Tournier wrote on 1 Oct 2023 17:24
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 65979@debbugs.gnu.org)
CAJ3okZ0hV=vheHK8JSRSWPWN2NfWpY+G+4z5_eF2BRY0N9+bSg@mail.gmail.com
Hi,

On Sun, 1 Oct 2023 at 17:07, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (9 lines)
> >> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> >> would be to add complicated heuristics in the hope corner cases like
> >> this one would be better dealt with, but it wouldn’t be bullet-proof
> >> anyway.
> >
> > Well, the question is what other VCS as 'svn-fetch', etc. are doing?
>
> Remove the relevant metadata directory/directories.

What does it mean "relevant"? :-)

Where do I have to look to implement the exact same behaviour of what
do the fetch methods?


Toggle quote (3 lines)
> Now, SWH does not use ‘guix hash’ (I suppose) so they’ll have to address
> that on their side (in Python code I guess).

Yes for sure. It does not change the fact that "guix hash" has a bug.

Cheers,
simon
S
S
Simon Tournier wrote on 17 Oct 2023 19:56
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 65979@debbugs.gnu.org)
87r0ltp1yr.fsf@gmail.com
Hi Ludo,

On Tue, 26 Sep 2023 at 15:34, Ludovic Courtès <ludo@gnu.org> wrote:

Toggle quote (5 lines)
> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> would be to add complicated heuristics in the hope corner cases like
> this one would be better dealt with, but it wouldn’t be bullet-proof
> anyway.

Hum, I am probably pig-headed here. :-)

I propose the change below. Well, it does not appear to me a
complicated heuristics but I am biased. The idea is to detect at the
repository root what is the kind of VCS, just by checking if .git (or
.hg or .svn or etc.) exists or not. Then, only exclude this kind… and
not the whole list.

And I think this way is almost bullet-proof, no? Or could you provide
some contrived examples?

WDYT?

Obviously, this fixes the issue:

Toggle snippet (12 lines)
$ guix hash -rx /tmp/FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

$ ./pre-inst-env guix hash -rx /tmp/FastQC
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

$ grep -A 15 'define-public fastqc' gnu/packages/bioinformatics.scm | grep -C 1 base32
(sha256
(base32
"00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk"))

And I do not observe a regression using a couple of tests.

Cheers,
simon
From 1bf6555e6a89248f079437b842dbaff9c107de9a Mon Sep 17 00:00:00 2001
Message-Id: <1bf6555e6a89248f079437b842dbaff9c107de9a.1697562942.git.zimon.toutoune@gmail.com>
From: Simon Tournier <zimon.toutoune@gmail.com>
Date: Tue, 17 Oct 2023 18:53:04 +0200
Subject: [PATCH] scripts: hash: Handle repository several VCS folders.

Reported by Simon Tournier <zimon.toutoune@gmail.com>

* guix/hash.scm (vcs-file?): Add optional argument for passing VCS kind of the
file/repository.
(file-hash*): Adjust accordingly.
* guix/scripts/hash.scm (guix-hash)[file-hash]: Detect VCS kind of the
file/repository and passes it.
---
guix/hash.scm | 18 ++++++++++++------
guix/scripts/hash.scm | 18 +++++++++++++-----
2 files changed, 25 insertions(+), 11 deletions(-)

Toggle diff (92 lines)
diff --git a/guix/hash.scm b/guix/hash.scm
index 3cb68e5c44..8fff51e8f1 100644
--- a/guix/hash.scm
+++ b/guix/hash.scm
@@ -1,6 +1,7 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2021 Sarah Morgensen <iskarian@mgsn.dev>
;;; Copyright © 2022 Maxime Devos <maximedevos@telenet.be>
+;;; Copyright © 2023 Simon Tournier <zimon.toutoune@gmail.com>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -25,21 +26,26 @@ (define-module (guix hash)
#:export (vcs-file?
file-hash*))
-(define (vcs-file? file stat)
- "Returns true if FILE is a version control system file."
+(define* (vcs-file? file stat
+ #:optional
+ (vcses (list ".bzr" ".git" ".hg" ".svn" "CVS")))
+ "Returns true if FILE matches a version control system from the list VCSES."
(case (stat:type stat)
((directory)
- (member (basename file) '(".bzr" ".git" ".hg" ".svn" "CVS")))
+ (member (basename file) vcses))
((regular)
- ;; Git sub-modules have a '.git' file that is a regular text file.
- (string=? (basename file) ".git"))
+ (if (member ".git" vcses)
+ ;; Git sub-modules have a '.git' file that is a regular text file.
+ (string=? (basename file) ".git")
+ #f))
(else
#f)))
(define* (file-hash* file #:key
(algorithm (hash-algorithm sha256))
(recursive? 'auto)
- (select? (negate vcs-file?)))
+ (select? (negate (lambda (file stat)
+ (vcs-file? file stat)))))
"Compute the hash of FILE with ALGORITHM.
Symbolic links are only dereferenced if RECURSIVE? is false.
diff --git a/guix/scripts/hash.scm b/guix/scripts/hash.scm
index 7197d3965c..ed96e6a7e1 100644
--- a/guix/scripts/hash.scm
+++ b/guix/scripts/hash.scm
@@ -3,7 +3,7 @@
;;; Copyright © 2013 Nikita Karetnikov <nikita@karetnikov.org>
;;; Copyright © 2016 Jan Nieuwenhuizen <janneke@gnu.org>
;;; Copyright © 2018 Tim Gesthuizen <tim.gesthuizen@yahoo.de>
-;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
+;;; Copyright © 2021, 2023 Simon Tournier <zimon.toutoune@gmail.com>
;;; Copyright © 2021 Sarah Morgensen <iskarian@mgsn.dev>
;;;
;;; This file is part of GNU Guix.
@@ -181,9 +181,6 @@ (define-command (guix-hash . args)
(_ #f))
(reverse opts)))
(fmt (assq-ref opts 'format))
- (select? (if (assq-ref opts 'exclude-vcs?)
- (negate vcs-file?)
- (const #t)))
(algorithm (assoc-ref opts 'hash-algorithm))
(serializer (assoc-ref opts 'serializer)))
@@ -193,7 +190,18 @@ (define-command (guix-hash . args)
(catch 'system-error
(lambda _
(with-error-handling
- (serializer file algorithm select?)))
+ (let* ((vcses (fold (lambda (vcs result)
+ (if (file-exists? (string-append file "/" vcs))
+ (cons vcs result)
+ result))
+ '()
+ (list ".bzr" ".git" ".hg" ".svn" "CVS")))
+ (select? (if (assq-ref opts 'exclude-vcs?)
+ (negate (lambda (file stat)
+ (vcs-file? file stat
+ vcses)))
+ (const #t))))
+ (serializer file algorithm select?))))
(lambda args
(leave (G_ "~a ~a~%")
file

base-commit: dcc5c34504c94732c135a85fb4db40ca9796270e
--
2.38.1
?