file-needed/recurive does not canonicalize paths

  • Open
  • quality assurance status badge
Details
2 participants
  • Lars-Dominik Braun
  • Ludovic Courtès
Owner
unassigned
Submitted by
Lars-Dominik Braun
Severity
normal
L
L
Lars-Dominik Braun wrote on 25 Jan 2023 11:23
(address . bug-guix@gnu.org)
Y9EDPtzJws2NFxfT@noor.fritz.box
Hi,

(CC-ing Ludo, who wrote the code according to git logs)

during testing of wip-haskell I observed the make-dynamic-linker-cache
phase is taking alot of time (up to two minutes on a fast machine with
SSD). Looking at ghc-hindent for example [1]:

starting phase `make-dynamic-linker-cache'
created '/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/etc/ld.so.cache' from 10085 library search path entries
phase `make-dynamic-linker-cache' succeeded after 119.5 seconds

And while Haskell packages link to a pretty large number of dynamic
libraries (116 in this case), 10000 search path entries seems wrong. Running just

(file-needed/recursive "/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/bin/hindent")

takes a long time and reveals entries like
/gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0/../directory-1.3.6.2/../unix-2.7.2.2/../bytestring-0.11.3.1/../template-haskell-2.18.0.0/../pretty-1.1.3.6/../array-0.5.4.0/../base-4.16.4.0/../ghc-bignum-1.2/../ghc-prim-0.8.0/libHSghc-prim-0.8.0-ghc9.2.5.so
so it looks like it deduplicates values, but does not canonicalize
paths. A relatively straight-forward fix could be the following change,
but I don’t know if that would cause any issues, since canonicalize-path
throws an exception if the resulting path does not exist. It’s also
a world rebuild since pretty much any package uses this phase (and the
reason and I cannot test it on a larger scale).

---snip---
Toggle diff (21 lines)
diff --git a/guix/build/gremlin.scm b/guix/build/gremlin.scm
index 2a74d51dd9..6eb8f688ea 100644
--- a/guix/build/gremlin.scm
+++ b/guix/build/gremlin.scm
@@ -285,8 +285,8 @@ (define (file-needed/recursive file)
(if (and runpath needed)
(let* ((runpath (map (cute expand-origin <> (dirname file))
runpath))
- (resolved (map (cut search-path runpath <>)
- needed))
+ (resolved (map (lambda (x) (and=> x canonicalize-path)) (map (cut search-path runpath <>)
+ needed)))
(failed (filter-map (lambda (needed resolved)
(and (not resolved)
(not (libc-library? needed))
---snap---

Cheers,
Lars

[1] https://ci.guix.gnu.org/build/366156/log/raw
L
L
Ludovic Courtès wrote on 30 Jan 2023 17:31
(name . Lars-Dominik Braun)(address . lars@6xq.net)(address . 61055@debbugs.gnu.org)
87ilgoyoei.fsf@gnu.org
Hi Lars,

Lars-Dominik Braun <lars@6xq.net> skribis:

Toggle quote (22 lines)
> during testing of wip-haskell I observed the make-dynamic-linker-cache
> phase is taking alot of time (up to two minutes on a fast machine with
> SSD). Looking at ghc-hindent for example [1]:
>
> starting phase `make-dynamic-linker-cache'
> created '/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/etc/ld.so.cache' from 10085 library search path entries
> phase `make-dynamic-linker-cache' succeeded after 119.5 seconds
>
> And while Haskell packages link to a pretty large number of dynamic
> libraries (116 in this case), 10000 search path entries seems wrong. Running just
>
> (file-needed/recursive "/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/bin/hindent")
>
> takes a long time and reveals entries like
> /gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0/../directory-1.3.6.2/../unix-2.7.2.2/../bytestring-0.11.3.1/../template-haskell-2.18.0.0/../pretty-1.1.3.6/../array-0.5.4.0/../base-4.16.4.0/../ghc-bignum-1.2/../ghc-prim-0.8.0/libHSghc-prim-0.8.0-ghc9.2.5.so
> so it looks like it deduplicates values, but does not canonicalize
> paths. A relatively straight-forward fix could be the following change,
> but I don’t know if that would cause any issues, since canonicalize-path
> throws an exception if the resulting path does not exist. It’s also
> a world rebuild since pretty much any package uses this phase (and the
> reason and I cannot test it on a larger scale).

Right. Other arguments against systematic canonicalization: (1)
‘canonicalize-path’ is costly, (2) developers and tools might choose to
write ‘x/y/../z’ for a good reason and changing that could break their
expectations.

Can you see how we end up with those entries? These at DT_NEEDED
entries, not DT_RUNPATH, right?

If so, that probably means that ghc at some points invokes the linker
along the lines of:

ld -o hindent ../foo/../bar/../baz/libbaz.so

Could you check in build logs exactly how that executable gets linked?
Is there a way we could canonicalize there, or, better, get the build
system to do something like:

ld -o hindent -L ../foo/../bar/../baz -lbaz

? That way DT_NEEDED would be just “libbaz.so” instead of the complete
file name. DT_RUNPATH would contain the weird file name, but that’s
probably okay.

HTH,
Ludo’.
L
L
Lars-Dominik Braun wrote on 1 Feb 2023 09:35
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 61055@debbugs.gnu.org)
Y9okST+5mW2ASE40@noor.fritz.box
Hi Ludo,

Toggle quote (2 lines)
> Can you see how we end up with those entries? These at DT_NEEDED
> entries, not DT_RUNPATH, right?
they definitely do not come from the hindent binary. `readelf -d` looks
like this

---snip---
0x0000000000000001 (NEEDED) Shared library: [libHSpath-io-1.7.0-Y7nuUr9RcCC0rgElOk2Zd-ghc9.2.5.so]
0x0000000000000001 (NEEDED) Shared library: [libHSunix-compat-0.5.4-2Fa5hW7FPv81iCQxP5gtt5-ghc9.2.5.so]
0x0000000000000001 (NEEDED) Shared library: [libHStemporary-1.3-K9lHfUrZ43CE7BhnE8cSVB-ghc9.2.5.so]
0x000000000000001d (RUNPATH) Library runpath: [/gnu/store/2nrzbaxmqs2rq9yv52bpyn2azb3qj6h1-ghc-hindent-5.3.4/lib/x86_64-linux-ghc-9.2.5:/gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0:/gnu/store/lk1mgbds7rf9ilv425msz840ky8cfn79-ghc-diff-0.4.1/lib/x86_64-linux-ghc-9.2.5:/gnu/store/866s8qcghallds7azzcx075a06mr64h4-ghc-hunit-1.6.2.0/lib/x86_64-linux-ghc-9.2.5:/gnu/store/rgm9h2a4v1zgwm1mbfb36ycjbkywlxj6-ghc-onetuple-0.3.1/lib/x86_64-linux-ghc-9.2.5:…
---snap---

and I believe GHC adds `-L/gnu/store/…` and `-lHSXXX` to the linker
invokation only, which should be correct. However GHC’s
bundled libraries are linked differently and result in a `readelf -d
/gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0/libHSCabal-3.6.3.0-ghc9.2.5.so`
like this:

---snip---
0x0000000000000001 (NEEDED) Shared library: [libHSprocess-1.6.16.0-ghc9.2.5.so]
0x0000000000000001 (NEEDED) Shared library: [libHSparsec-3.1.15.0-ghc9.2.5.so]
0x0000000000000001 (NEEDED) Shared library: [libHStext-1.2.5.0-ghc9.2.5.so]
0x000000000000001d (RUNPATH) Library runpath: [$ORIGIN/../process-1.6.16.0:$ORIGIN/../parsec-3.1.15.0:$ORIGIN/../text-1.2.5.0:$ORIGIN/../mtl-2.2.2:$ORIGIN/../transformers-0.5.6.2:$ORIGIN/../directory-1.3.6.2:$ORIGIN/../unix-2.7.2.2:$ORIGIN/../time-1.11.1.1:$ORIGIN/../filepath-1.4.2.2:$ORIGIN/../binary-0.8.9.0:$ORIGIN/../containers-0.6.5.1:$ORIGIN/../bytestring-0.11.3.1:$ORIGIN/../template-haskell-2.18.0.0:$ORIGIN/../pretty-1.1.3.6:$ORIGIN/../ghc-boot-th-9.2.5:$ORIGIN/../deepseq-1.4.6.1:$ORIGIN/../array-0.5.4.0:$ORIGIN/../base-4.16.4.0:$ORIGIN/../ghc-bignum-1.2:$ORIGIN/../ghc-prim-0.8.0:$ORIGIN/../rts:/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib:/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib:/gnu/store/fwbiihd2sbhai63y1pvvdh0f2bakfzrf-gmp-6.2.1/lib:/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib/lib/gcc/x86_64-unknown-linux-gnu/10.3.0/../../..]
---snap---

These obviously will get expanded to
/gnu/store/1cyk8j2nd6r0cvm6kx1408kd763yf8h5-ghc-9.2.5/lib/ghc-9.2.5/Cabal-3.6.3.0/../process-1.6.16.0
And there’s about 40 of these bundled libraries referencing each other,
which is why the problem is amplified.

I still believe this is a bug in file-needed/recursive, because it
recurses, but does not correctly keep track of “visited” shared
libraries (by not canonicalizing paths). Its documentation also says
that it returns a “list of absolute .so file names” – which I
would expect not to have relative path elements in. The output of `ldd`
does so too. So yes, canonicalize-path may be expensive, but evaluating
10000 shared libraries is too.

Cheers,
Lars

PS: This is not a problem for wip-haskell any more, since I switched to
static linking, but it’s going to bite someone else.
?
Your comment

Commenting via the web interface is currently disabled.

To comment on this conversation send an email to 61055@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 61055
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch