file-needed/recurive does not canonicalize paths

  • Open
  • quality assurance status badge
Details
2 participants
  • Lars-Dominik Braun
  • Ludovic Courtès
Owner
unassigned
Submitted by
Lars-Dominik Braun
Severity
normal
L
L
Lars-Dominik Braun wrote on 25 Jan 2023 11:23
(address . bug-guix@gnu.org)
Y9EDPtzJws2NFxfT@noor.fritz.box
Hi,

(CC-ing Ludo, who wrote the code according to git logs)

during testing of wip-haskell I observed the make-dynamic-linker-cache
phase is taking alot of time (up to two minutes on a fast machine with
SSD). Looking at ghc-hindent for example [1]:

starting phase `make-dynamic-linker-cache'
created '/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/etc/ld.so.cache' from 10085 library search path entries
phase `make-dynamic-linker-cache' succeeded after 119.5 seconds

And while Haskell packages link to a pretty large number of dynamic
libraries (116 in this case), 10000 search path entries seems wrong. Running just

(file-needed/recursive "/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/bin/hindent")

takes a long time and reveals entries like
/gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0/../directory-1.3.6.2/../unix-2.7.2.2/../bytestring-0.11.3.1/../template-haskell-2.18.0.0/../pretty-1.1.3.6/../array-0.5.4.0/../base-4.16.4.0/../ghc-bignum-1.2/../ghc-prim-0.8.0/libHSghc-prim-0.8.0-ghc9.2.5.so
so it looks like it deduplicates values, but does not canonicalize
paths. A relatively straight-forward fix could be the following change,
but I don’t know if that would cause any issues, since canonicalize-path
throws an exception if the resulting path does not exist. It’s also
a world rebuild since pretty much any package uses this phase (and the
reason and I cannot test it on a larger scale).

---snip---
Toggle diff (21 lines)
diff --git a/guix/build/gremlin.scm b/guix/build/gremlin.scm
index 2a74d51dd9..6eb8f688ea 100644
--- a/guix/build/gremlin.scm
+++ b/guix/build/gremlin.scm
@@ -285,8 +285,8 @@ (define (file-needed/recursive file)
(if (and runpath needed)
(let* ((runpath (map (cute expand-origin <> (dirname file))
runpath))
- (resolved (map (cut search-path runpath <>)
- needed))
+ (resolved (map (lambda (x) (and=> x canonicalize-path)) (map (cut search-path runpath <>)
+ needed)))
(failed (filter-map (lambda (needed resolved)
(and (not resolved)
(not (libc-library? needed))
---snap---

Cheers,
Lars

[1] https://ci.guix.gnu.org/build/366156/log/raw
L
L
Ludovic Courtès wrote on 30 Jan 2023 17:31
(name . Lars-Dominik Braun)(address . lars@6xq.net)(address . 61055@debbugs.gnu.org)
87ilgoyoei.fsf@gnu.org
Hi Lars,

Lars-Dominik Braun <lars@6xq.net> skribis:

Toggle quote (22 lines)
> during testing of wip-haskell I observed the make-dynamic-linker-cache
> phase is taking alot of time (up to two minutes on a fast machine with
> SSD). Looking at ghc-hindent for example [1]:
>
> starting phase `make-dynamic-linker-cache'
> created '/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/etc/ld.so.cache' from 10085 library search path entries
> phase `make-dynamic-linker-cache' succeeded after 119.5 seconds
>
> And while Haskell packages link to a pretty large number of dynamic
> libraries (116 in this case), 10000 search path entries seems wrong. Running just
>
> (file-needed/recursive "/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/bin/hindent")
>
> takes a long time and reveals entries like
> /gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0/../directory-1.3.6.2/../unix-2.7.2.2/../bytestring-0.11.3.1/../template-haskell-2.18.0.0/../pretty-1.1.3.6/../array-0.5.4.0/../base-4.16.4.0/../ghc-bignum-1.2/../ghc-prim-0.8.0/libHSghc-prim-0.8.0-ghc9.2.5.so
> so it looks like it deduplicates values, but does not canonicalize
> paths. A relatively straight-forward fix could be the following change,
> but I don’t know if that would cause any issues, since canonicalize-path
> throws an exception if the resulting path does not exist. It’s also
> a world rebuild since pretty much any package uses this phase (and the
> reason and I cannot test it on a larger scale).

Right. Other arguments against systematic canonicalization: (1)
‘canonicalize-path’ is costly, (2) developers and tools might choose to
write ‘x/y/../z’ for a good reason and changing that could break their
expectations.

Can you see how we end up with those entries? These at DT_NEEDED
entries, not DT_RUNPATH, right?

If so, that probably means that ghc at some points invokes the linker
along the lines of:

ld -o hindent ../foo/../bar/../baz/libbaz.so

Could you check in build logs exactly how that executable gets linked?
Is there a way we could canonicalize there, or, better, get the build
system to do something like:

ld -o hindent -L ../foo/../bar/../baz -lbaz

? That way DT_NEEDED would be just “libbaz.so” instead of the complete
file name. DT_RUNPATH would contain the weird file name, but that’s
probably okay.

HTH,
Ludo’.
L
L
Lars-Dominik Braun wrote on 1 Feb 2023 09:35
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 61055@debbugs.gnu.org)
Y9okST+5mW2ASE40@noor.fritz.box
Hi Ludo,

Toggle quote (2 lines)
> Can you see how we end up with those entries? These at DT_NEEDED
> entries, not DT_RUNPATH, right?
they definitely do not come from the hindent binary. `readelf -d` looks
like this

---snip---
0x0000000000000001 (NEEDED) Shared library: [libHSpath-io-1.7.0-Y7nuUr9RcCC0rgElOk2Zd-ghc9.2.5.so]
0x0000000000000001 (NEEDED) Shared library: [libHSunix-compat-0.5.4-2Fa5hW7FPv81iCQxP5gtt5-ghc9.2.5.so]
0x0000000000000001 (NEEDED) Shared library: [libHStemporary-1.3-K9lHfUrZ43CE7BhnE8cSVB-ghc9.2.5.so]
0x000000000000001d (RUNPATH) Library runpath: [/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/lib/x86_64-linux-ghc-9.2.5:/gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0:/gnu/store/lk1mgbds7rf9ilv425msz840ky8cfn79-ghc-diff-0.4.1/lib/x86_64-linux-ghc-9.2.5:/gnu/store/866s8qcghallds7azzcx075a06mr64h4-ghc-hunit-1.6.2.0/lib/x86_64-linux-ghc-9.2.5:/gnu/store/rgm9h2a4v1zgwm1mbfb36ycjbkywlxj6-ghc-onetuple-0.3.1/lib/x86_64-linux-ghc-9.2.5:…
---snap---

and I believe GHC adds `-L/gnu/store/…` and `-lHSXXX` to the linker
invokation only, which should be correct. However GHC’s
bundled libraries are linked differently and result in a `readelf -d
/gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0/libHSCabal-3.6.3.0-ghc9.2.5.so`
like this:

---snip---
0x0000000000000001 (NEEDED) Shared library: [libHSprocess-1.6.16.0-ghc9.2.5.so]
0x0000000000000001 (NEEDED) Shared library: [libHSparsec-3.1.15.0-ghc9.2.5.so]
0x0000000000000001 (NEEDED) Shared library: [libHStext-1.2.5.0-ghc9.2.5.so]
0x000000000000001d (RUNPATH) Library runpath: [$ORIGIN/../process-1.6.16.0:$ORIGIN/../parsec-3.1.15.0:$ORIGIN/../text-1.2.5.0:$ORIGIN/../mtl-2.2.2:$ORIGIN/../transformers-0.5.6.2:$ORIGIN/../directory-1.3.6.2:$ORIGIN/../unix-2.7.2.2:$ORIGIN/../time-1.11.1.1:$ORIGIN/../filepath-1.4.2.2:$ORIGIN/../binary-0.8.9.0:$ORIGIN/../containers-0.6.5.1:$ORIGIN/../bytestring-0.11.3.1:$ORIGIN/../template-haskell-2.18.0.0:$ORIGIN/../pretty-1.1.3.6:$ORIGIN/../ghc-boot-th-9.2.5:$ORIGIN/../deepseq-1.4.6.1:$ORIGIN/../array-0.5.4.0:$ORIGIN/../base-4.16.4.0:$ORIGIN/../ghc-bignum-1.2:$ORIGIN/../ghc-prim-0.8.0:$ORIGIN/../rts:/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib:/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib:/gnu/store/fwbiihd2sbhai63y1pvvdh0f2bakfzrf-gmp-6.2.1/lib:/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib/gcc/x86_64-unknown-linux-gnu/10.3.0/../../..]
---snap---

These obviously will get expanded to
/gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0/../process-1.6.16.0
And there’s about 40 of these bundled libraries referencing each other,
which is why the problem is amplified.

I still believe this is a bug in file-needed/recursive, because it
recurses, but does not correctly keep track of “visited” shared
libraries (by not canonicalizing paths). Its documentation also says
that it returns a “list of absolute .so file names” – which I
would expect not to have relative path elements in. The output of `ldd`
does so too. So yes, canonicalize-path may be expensive, but evaluating
10000 shared libraries is too.

Cheers,
Lars

PS: This is not a problem for wip-haskell any more, since I switched to
static linking, but it’s going to bite someone else.
?