Store references in SBCL-compiled code are "invisible"

DoneSubmitted by Ludovic Courtès.
Details
6 participants
  • Danny Milosavljevic
  • Guillaume Le Vaillant
  • Leo Famulari
  • Ludovic Courtès
  • Pierre Neidhardt
  • Mark H Weaver
Owner
unassigned
Severity
important
L
L
Ludovic Courtès wrote on 23 Dec 2018 15:19
(name . Bug Guix)(address . bug-guix@gnu.org)
87r2e8jpfx.fsf@gnu.org
Hello,
As discussed with Pierre at the R-B Summit, ‘sbcl-next’ lacks areference to ‘next-gtk-webkit’ even though is invokes it:
Toggle snippet (13 lines)$ guix gc --references $(type -P next) | grep next-/gnu/store/9d66xb8wvggsp0x9pxj61mzqy007978f-sbcl-next-1.1.0/gnu/store/pqy064fw3vkfld6lw95vi0zavj19zvrc-sbcl-next-1.1.0-lib$ ./pre-inst-env guix run next
WARNING: Setting locale failed. Check the following variables for correct values: LANG=en_US.utf8Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING {10005885B3}>: Couldn't execute "/gnu/store/7p6pbcmdgr53dff6033gcfl2jq0d762h-next-gtk-webkit-1.1.0/bin/next-gtk-webkit": No such file or directory
(Here ‘guix run’ runs ‘next’ in a container with exactly the closure of‘next’, nothing more, and the ‘next’ binary is grafted.)
So the problem looks a lot like that this GCC issue we fixed a whileback: https://bugs.gnu.org/24703.
Looking at the ‘sbcl-next’ package, the reference to ‘next-gtk-webkit’is inserted in gtk-webkit.lisp:
Toggle snippet (4 lines)(defvar *gtk-webkit-command* "next-gtk-webkit" "Path to the GTK-Webkit platform port executable.")
Through hexl-mode on the ‘next’ binary, we can find that reference:
Toggle snippet (27 lines)01d0bac0: 2f00 0000 6700 0000 6e00 0000 7500 0000 /...g...n...u...01d0bad0: 2f00 0000 7300 0000 7400 0000 6f00 0000 /...s...t...o...01d0bae0: 7200 0000 6500 0000 2f00 0000 3700 0000 r...e.../...7...01d0baf0: 7000 0000 3600 0000 7000 0000 6200 0000 p...6...p...b...01d0bb00: 6300 0000 6d00 0000 6400 0000 6700 0000 c...m...d...g...01d0bb10: 7200 0000 3500 0000 3300 0000 6400 0000 r...5...3...d...01d0bb20: 6600 0000 6600 0000 3600 0000 3000 0000 f...f...6...0...01d0bb30: 3300 0000 3300 0000 6700 0000 6300 0000 3...3...g...c...01d0bb40: 6600 0000 6c00 0000 3200 0000 6a00 0000 f...l...2...j...01d0bb50: 7100 0000 3000 0000 6400 0000 3700 0000 q...0...d...7...01d0bb60: 3600 0000 3200 0000 6800 0000 2d00 0000 6...2...h...-...01d0bb70: 6e00 0000 6500 0000 7800 0000 7400 0000 n...e...x...t...01d0bb80: 2d00 0000 6700 0000 7400 0000 6b00 0000 -...g...t...k...01d0bb90: 2d00 0000 7700 0000 6500 0000 6200 0000 -...w...e...b...01d0bba0: 6b00 0000 6900 0000 7400 0000 2d00 0000 k...i...t...-...01d0bbb0: 3100 0000 2e00 0000 3100 0000 2e00 0000 1.......1.......01d0bbc0: 3000 0000 2f00 0000 6200 0000 6900 0000 0.../...b...i...01d0bbd0: 6e00 0000 2f00 0000 6e00 0000 6500 0000 n.../...n...e...01d0bbe0: 7800 0000 7400 0000 2d00 0000 6700 0000 x...t...-...g...01d0bbf0: 7400 0000 6b00 0000 2d00 0000 7700 0000 t...k...-...w...01d0bc00: 6500 0000 6200 0000 6b00 0000 6900 0000 e...b...k...i...01d0bc10: 7400 0000 0000 0000 0000 0000 0000 0000 t...............01d0bc20: e100 0100 0000 0000 2800 0000 0000 0000 ........(.......01d0bc30: 2a47 544b 2d57 4542 4b49 542d 434f 4d4d *GTK-WEBKIT-COMM01d0bc40: 414e 442a 0000 0000 0000 0000 0000 0000 AND*............
Apparently this string literal is stored as UTF-32 (UCS-4) or similar,which prevents the reference scanner and the grafting code from findingit, and problems ensue. :-)
Pierre, Andy: is there any way to tell SBCL to store this literal asASCII/UTF-8? That would be an easy fix, though we should discuss thepros and cons and whether to enable that globally.
Thanks in advance!
Ludo’.
P
P
Pierre Neidhardt wrote on 23 Dec 2018 16:05
(name . Ludovic Courtès)(address . ludo@gnu.org)
87a7kwjnai.fsf@ambrevar.xyz
Thanks for looking into this, Ludo.
At first glance, I'd say that this is not a compilation option but the waystrings are encoded by default. It seems that multibyte encoding is used allover the place by a few compilers including SBCL (and CCL I think).
One way I know around this (I'm by no mean a Common Lisp expert) is theflexi-streams package for re-encoding.
More generally, shouldn't we make the reference scanner a bit smarter? Inparticular, how does it handle non-ASCII references? Maybe it would not beunreasonable to handle UTF-8 and UCS-4 for instance?
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwfpFUACgkQm9z0l6S7zH/rdggAnQg4EGjzFVDFFNwofXuTBydyu+uLV6FHjagMA1ijxaEPtL27NhmdWUdMoFgRKIKabQpixyDSWJhAPGPIv2JHdrqiBwNRUfaDWSKhoh8/qA654QF8NiSFCHafJpXMcdYVGvQ92Fo9OUFls8CWeWSpaEgQcTeIeeTNLwDCid8ob5gFW8doaqxraGw6dkLeqIOuenB7jI/7cBs4yD4e+r8V/IAY/mVvuTZ+gRFGu+StbMo01KRX2X1xOVaLL4WiaGhJHyDYqw3otCPfZduOvsOyfhSrov3HpPT6vocQ11Wb8tW7t+JipO+w1DLiFMG7JYdAN3ibsDXXncsKcSxgeszj1g===oKMR-----END PGP SIGNATURE-----
M
M
Mark H Weaver wrote on 23 Dec 2018 17:45
Re: bug#33848: Store references in SBCL-compiled code are "invisible"
(name . Ludovic Courtès)(address . ludo@gnu.org)
877eg0i43j.fsf@netris.org
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (61 lines)> As discussed with Pierre at the R-B Summit, ‘sbcl-next’ lacks a> reference to ‘next-gtk-webkit’ even though is invokes it:>> $ guix gc --references $(type -P next) | grep next-> /gnu/store/9d66xb8wvggsp0x9pxj61mzqy007978f-sbcl-next-1.1.0> /gnu/store/pqy064fw3vkfld6lw95vi0zavj19zvrc-sbcl-next-1.1.0-lib> $ ./pre-inst-env guix run next>> WARNING: Setting locale failed.> Check the following variables for correct values:> LANG=en_US.utf8> Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING> {10005885B3}>:> Couldn't execute "/gnu/store/7p6pbcmdgr53dff6033gcfl2jq0d762h-next-gtk-webkit-1.1.0/bin/next-gtk-webkit": No such file or directory>>> (Here ‘guix run’ runs ‘next’ in a container with exactly the closure of> ‘next’, nothing more, and the ‘next’ binary is grafted.)>> So the problem looks a lot like that this GCC issue we fixed a while> back: <https://bugs.gnu.org/24703>.>> Looking at the ‘sbcl-next’ package, the reference to ‘next-gtk-webkit’> is inserted in gtk-webkit.lisp:>> (defvar *gtk-webkit-command* "next-gtk-webkit"> "Path to the GTK-Webkit platform port executable.")>>> Through hexl-mode on the ‘next’ binary, we can find that reference:>> 01d0bac0: 2f00 0000 6700 0000 6e00 0000 7500 0000 /...g...n...u...> 01d0bad0: 2f00 0000 7300 0000 7400 0000 6f00 0000 /...s...t...o...> 01d0bae0: 7200 0000 6500 0000 2f00 0000 3700 0000 r...e.../...7...> 01d0baf0: 7000 0000 3600 0000 7000 0000 6200 0000 p...6...p...b...> 01d0bb00: 6300 0000 6d00 0000 6400 0000 6700 0000 c...m...d...g...> 01d0bb10: 7200 0000 3500 0000 3300 0000 6400 0000 r...5...3...d...> 01d0bb20: 6600 0000 6600 0000 3600 0000 3000 0000 f...f...6...0...> 01d0bb30: 3300 0000 3300 0000 6700 0000 6300 0000 3...3...g...c...> 01d0bb40: 6600 0000 6c00 0000 3200 0000 6a00 0000 f...l...2...j...> 01d0bb50: 7100 0000 3000 0000 6400 0000 3700 0000 q...0...d...7...> 01d0bb60: 3600 0000 3200 0000 6800 0000 2d00 0000 6...2...h...-...> 01d0bb70: 6e00 0000 6500 0000 7800 0000 7400 0000 n...e...x...t...> 01d0bb80: 2d00 0000 6700 0000 7400 0000 6b00 0000 -...g...t...k...> 01d0bb90: 2d00 0000 7700 0000 6500 0000 6200 0000 -...w...e...b...> 01d0bba0: 6b00 0000 6900 0000 7400 0000 2d00 0000 k...i...t...-...> 01d0bbb0: 3100 0000 2e00 0000 3100 0000 2e00 0000 1.......1.......> 01d0bbc0: 3000 0000 2f00 0000 6200 0000 6900 0000 0.../...b...i...> 01d0bbd0: 6e00 0000 2f00 0000 6e00 0000 6500 0000 n.../...n...e...> 01d0bbe0: 7800 0000 7400 0000 2d00 0000 6700 0000 x...t...-...g...> 01d0bbf0: 7400 0000 6b00 0000 2d00 0000 7700 0000 t...k...-...w...> 01d0bc00: 6500 0000 6200 0000 6b00 0000 6900 0000 e...b...k...i...> 01d0bc10: 7400 0000 0000 0000 0000 0000 0000 0000 t...............> 01d0bc20: e100 0100 0000 0000 2800 0000 0000 0000 ........(.......> 01d0bc30: 2a47 544b 2d57 4542 4b49 542d 434f 4d4d *GTK-WEBKIT-COMM> 01d0bc40: 414e 442a 0000 0000 0000 0000 0000 0000 AND*............>> Apparently this string literal is stored as UTF-32 (UCS-4) or similar,> which prevents the reference scanner and the grafting code from finding> it, and problems ensue. :-)
IMO, we should consider modifying Guix to search for store referencesencoded in UTF-32 and/or UTF-16. I wouldn't be surprised if some otherprograms use those encodings. I'd be willing to work on it.
What do you think?
Mark
L
L
Ludovic Courtès wrote on 23 Dec 2018 18:32
(name . Mark H Weaver)(address . mhw@netris.org)
87d0psi1xo.fsf@gnu.org
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (2 lines)> Ludovic Courtès <ludo@gnu.org> writes:
[...]
Toggle quote (8 lines)>> Apparently this string literal is stored as UTF-32 (UCS-4) or similar,>> which prevents the reference scanner and the grafting code from finding>> it, and problems ensue. :-)>> IMO, we should consider modifying Guix to search for store references> encoded in UTF-32 and/or UTF-16. I wouldn't be surprised if some other> programs use those encodings. I'd be willing to work on it.
I don’t think we’ve encountered the problem before. This would requirefixing both the scanner and the grafting code (though eventually thatmight be a single code base when the Scheme-implemented daemon ismerged) in non-trivial ways.
One issue is that users of an old daemon would get a different behaviorthan users of a new daemon. It would be the first time we introducesuch a significant change in the daemon since Guix was started.
For now I lean towards looking for a way to address the issuespecifically for SBCL. I’d be tempted to generalize if and only if wefind other occurrences of the problem that would make the benefitsoutweigh the development and maintenance costs.
WDYT?
I remember discussing in the past some sort of “pluggable” referencescanning mechanism that could also work for compressed archives, etc.That also looks like the right thing, but it has a development andmaintenance cost that’s pretty high whereas we might be able to addressthe same problems in much simpler ways.
Thanks,Ludo’.
P
P
Pierre Neidhardt wrote on 23 Dec 2018 23:01
(name . Ludovic Courtès)(address . ludo@gnu.org)
874lb3kin6.fsf@ambrevar.xyz
Toggle quote (2 lines)> I don’t think we’ve encountered the problem before.
Actually it does ring a bell for me. Didn't we have a similar issue with Fish,or some dependency?
Toggle quote (3 lines)> For now I lean towards looking for a way to address the issue> specifically for SBCL.
Don't forget that we currently have 5 Lisp compilers.Besides, it's not clear that this can be fixed on the compiler's side, it couldvery well be that patches will be required on a per-project basis.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwgBZ0ACgkQm9z0l6S7zH8XGAgAqUkpkfyLkBTmGkB0E4yhMQ2Mo5elI/PvQQN4NGPHl/VDJysQMZJPnYDSN4FAooCf3v5oenfr+VgZKr7NDDkDgVIdZbUCjzEw0La7FFl8DpB4+riJ0WqtghiBjCr4KRNfuSn1tgIenvMFsswH3otTaAllMIlfqMhxYJDtGYTzcjP059xgDQ0rPlKFPoAOv839rILx0AfdXAp7knIV+q4iN623ZEiGFIJQ3K2JuaPoBkBBVUHkk/lJSOvWnaGrIE56gqOnjMKJuTx9FuhhYPtN8ieNYj/VLV3y5V9v5JyvO3zwV49ahemBEK84n9PxAlU4A0D66Gy6ZmCBN38Ewgfhkw===s6io-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 24 Dec 2018 15:55
control message for bug #33848
(address . control@debbugs.gnu.org)
87a7kvgek5.fsf@gnu.org
severity 33848 important
L
L
Ludovic Courtès wrote on 24 Dec 2018 15:57
Re: bug#33848: Store references in SBCL-compiled code are "invisible"
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
875zvjgefl.fsf@gnu.org
Hi!
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (9 lines)> Thanks for looking into this, Ludo.>> At first glance, I'd say that this is not a compilation option but the way> strings are encoded by default. It seems that multibyte encoding is used all> over the place by a few compilers including SBCL (and CCL I think).>> One way I know around this (I'm by no mean a Common Lisp expert) is the> flexi-streams package for re-encoding.
OK, we need to investigate.
Toggle quote (4 lines)> More generally, shouldn't we make the reference scanner a bit smarter? In> particular, how does it handle non-ASCII references? Maybe it would not be> unreasonable to handle UTF-8 and UCS-4 for instance?
Store file names are always ASCII so problems arise when they are storedas UTF-16 or UTF-32/UCS-4.
Ludo’.
L
L
Ludovic Courtès wrote on 24 Dec 2018 16:06
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87sgynezha.fsf@gnu.org
Hi Pierre,
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (5 lines)>> I don’t think we’ve encountered the problem before.>> Actually it does ring a bell for me. Didn't we have a similar issue with Fish,> or some dependency?
We did have a problem with Fish but I can no longer find it. Do youremember what it was? Something with C++, no?
Toggle quote (7 lines)>> For now I lean towards looking for a way to address the issue>> specifically for SBCL.>> Don't forget that we currently have 5 Lisp compilers.> Besides, it's not clear that this can be fixed on the compiler's side, it could> very well be that patches will be required on a per-project basis.
I know little about CL but maybe we can find a solution that works forall five compilers. At least that would be the first approach I wouldsuggest following.
Thanks,Ludo’.
P
P
Pierre Neidhardt wrote on 24 Dec 2018 18:08
(name . Ludovic Courtès)(address . ludo@gnu.org)
87r2e6j1hw.fsf@ambrevar.xyz
Toggle quote (3 lines)> Store file names are always ASCII so problems arise when they are stored> as UTF-16 or UTF-32/UCS-4.
I understand that most programs stick to ASCII filenames, but what about the oddone using non-English, special characters?
Toggle quote (3 lines)> We did have a problem with Fish but I can no longer find it. Do you> remember what it was? Something with C++, no?
I think bug #30265.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwhEqsACgkQm9z0l6S7zH8nhAf9Hv2U1ajhnsl50XrKSr629VR3LFtu6whoiU3WJOygmulOIdlaWJ2IRFSRmwCvD8I/pE+BokAgT28BQNpvyG78+vgJeevb4adTD8eUxQsS2aRLPvJ9Js3B4epYtTDxtm6xp/kFKmxk/9WFYX/lxuyXfSYv/A7m8q3qWfngzvizZjCZVY0iQHrDLlfSxP1TVlUoiudIUo9BCjLmQQuyAkgxgDln9idzgXZKWXZMrW6HcK3Q4Ji2ymowCUf0vHRGj2mjBHo+QSYhOz/NduJPG717THk9C+9xG6eOyFa712VIwEJZc5dPKA50J/s0OaeNE8fk/mCyJ2y3yyE2V61wqhgRvA===E6l9-----END PGP SIGNATURE-----
M
M
Mark H Weaver wrote on 24 Dec 2018 19:12
(name . Ludovic Courtès)(address . ludo@gnu.org)
87tvj2yesd.fsf@netris.org
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (13 lines)> Pierre Neidhardt <mail@ambrevar.xyz> skribis:>>>> For now I lean towards looking for a way to address the issue>>> specifically for SBCL.>>>> Don't forget that we currently have 5 Lisp compilers.>> Besides, it's not clear that this can be fixed on the compiler's side, it could>> very well be that patches will be required on a per-project basis.>> I know little about CL but maybe we can find a solution that works for> all five compilers. At least that would be the first approach I would> suggest following.
I can't imagine a solution that would work for all five compilers, butperhaps that's a failure of imagination on my part. Of course, you'rewelcome to search for such a solution. Can you give me a rough outlineof what you have in mind?
Of course, the usual reason to choose UTF-32 is to support non-ASCIIcharacters while retaining fixed-width code points, so that stringlookups are straightforward and efficient. Using UTF-8 improves spaceefficiency, but at the cost of extra code complexity. That extracomplexity is what I guess we would need to add to each program thatcurrently uses UTF-32. Alternatively, we could extend the on-diskformat to support UTF-8 and then add some kind of "load hook" thatconverts the string to UTF-32 at load time. Either way, it's likely tobe a can of worms.
Consider the case of Guile. Years ago we agreed to switch to UTF-8 asits sole internal string encoding, but it hasn't yet been done becauseit's a big job, even for those of us already intimately familiar withthe code.
Now imagine how hard it would be for someone who barely uses Guile, butnevertheless felt compelled to change our internal string representationto use UTF-8. Moreover, imagine that they hoped to find a singlesolution that would work for several different Scheme implementations.
What would you say to them if they proposed to find a general solutionto convert several Scheme implementations to use UTF-8 as their stringrepresentation, to save themselves the trouble of having to understandeach implementation individually?
I really think it would be a mistake to try to force every program andlanguage implementation to use our preferred string representation. Isuspect it would be vastly easier to compromise and support a few otherpopular string representations in Guix, namely UTF-16 and UTF-32.
If you don't want to change the daemon, it could be worked around in ourbuild-side code as follows: we could add a new phase to certain buildsystems (or possibly gnu-build-system) that scans each output forUTF-16/32 encoded store references that are never referenced in UTF-8.If such references exist, a file with an unobtrusive name would be addedto that output containing those references encoded in UTF-8. This wouldenable our daemon's existing reference scanner to find all of thereferences.
Our grafting code would then need to be extended to recognize andtransform store references encoded in UTF-16/32 as well as UTF-8.
What do you think?
Regards, Mark
P
P
Pierre Neidhardt wrote on 25 Dec 2018 00:58
(name . Mark H Weaver)(address . mhw@netris.org)
87pntqiijc.fsf@ambrevar.xyz
I find Mark's points reasonable, although to be honest I have very littleknowledge of the daemon.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwhcqcACgkQm9z0l6S7zH/OWAf9EAHvRknh8vP7fQlEH0koh30BL0A62gzcn25Aw/Un5zPR7ZZ5vBhm7FqQfOS3fGF+ZYwTd5QWHz/49sNMJes4caaNqxN9x1xm/IBU374/MEkxtvnJqNctYL0zMAIXlruvch+cBYFfAyuCDjkNqFBHuqlFPP1lZCbal6xHvirMLLzNfRhQcFtYXD0Ty3YN0D5T9KfgQcrDEf78ShJSBto7lyBMKe9PqJBeKJexrzkD1XsY+sZB0PXiSrTNCT/tC2MqT8QMRrfGNpEepQIHMqowjVheJ3vcC5NDEKT7IFdY5art5d96+QpFlXNlRrCgC+3b8NGpxoWSR4oGqpR3FrjzEw===Apxv-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 26 Dec 2018 17:07
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87r2e4e0fu.fsf@gnu.org
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (6 lines)>> Store file names are always ASCII so problems arise when they are stored>> as UTF-16 or UTF-32/UCS-4.>> I understand that most programs stick to ASCII filenames, but what about the odd> one using non-English, special characters?
That’s a separate debate. :-) Essentially this restriction on storefile names has always been there in Guix (and Nix before that). If wewere to change it, that would raise compatibility issues.
Toggle quote (5 lines)>> We did have a problem with Fish but I can no longer find it. Do you>> remember what it was? Something with C++, no?>> I think bug #30265.
Oh I see, UCS-4 as well. (I can’t believe this bug is still open giventhe relatively simple solutions outlined athttps://issues.guix.info/issue/30265#8. :-))
Thanks,Ludo’.
L
L
Ludovic Courtès wrote on 26 Dec 2018 17:14
(name . Mark H Weaver)(address . mhw@netris.org)
877efwe04u.fsf@gnu.org
Hello!
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (20 lines)> Ludovic Courtès <ludo@gnu.org> writes:>>> Pierre Neidhardt <mail@ambrevar.xyz> skribis:>>>>>> For now I lean towards looking for a way to address the issue>>>> specifically for SBCL.>>>>>> Don't forget that we currently have 5 Lisp compilers.>>> Besides, it's not clear that this can be fixed on the compiler's side, it could>>> very well be that patches will be required on a per-project basis.>>>> I know little about CL but maybe we can find a solution that works for>> all five compilers. At least that would be the first approach I would>> suggest following.>> I can't imagine a solution that would work for all five compilers, but> perhaps that's a failure of imagination on my part. Of course, you're> welcome to search for such a solution. Can you give me a rough outline> of what you have in mind?
I have nothing specific in mind, I’m just brainstorming with everyonehere. :-)
For a similar situation in C++, there’s a fairly simple and localworkaround:
https://issues.guix.info/issue/30265#8
I’m not familiar with CL but I thought that it we could achievesomething similar, that would be great—I’m not suggesting to change theCL compilers in any non-trivial way.
For example I guess we could always store the file name as a literalbyte vector/list and add a call to turn that into a string.
Does that make sense?
Thanks,Ludo’.
P
P
Pierre Neidhardt wrote on 27 Dec 2018 11:37
(name . Ludovic Courtès)(address . ludo@gnu.org)
8736qji7c1.fsf@ambrevar.xyz
Toggle quote (10 lines)> : > Store file names are always ASCII so problems arise when they are stored> : > as UTF-16 or UTF-32/UCS-4.> : > : I understand that most programs stick to ASCII filenames, but what about the odd> : one using non-English, special characters?> > That’s a separate debate. :-) Essentially this restriction on store> file names has always been there in Guix (and Nix before that). If we> were to change it, that would raise compatibility issues.
But what happens if we attempt to store "á" in the store?
Toggle quote (3 lines)> For example I guess we could always store the file name as a literal> byte vector/list and add a call to turn that into a string.
In the case of Next, that would be a simple patch, but other programs could getmuch more complicated. In the end, this approach requires a linear amount ofwork. Conversely, adding UCS-* support to the scanner would fix this issue onceand for all.
Toggle quote (9 lines)> : > We did have a problem with Fish but I can no longer find it. Do you> : > remember what it was? Something with C++, no?> : > : I think bug #30265.> > Oh I see, UCS-4 as well. (I can’t believe this bug is still open given> the relatively simple solutions outlined at> <https://issues.guix.info/issue/30265#8>. :-))
Well, if currently only two packages out of 8500+ suffer from this, then I thinkit's easier to go with Ludo's suggestion of patching the code to use ASCIIstrings.
Does anyone know about more packages with this issue? It could also be thatmore packages suffer from this, unbeknownst to us.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwkq14ACgkQm9z0l6S7zH9zbAf8C7alkC/FiNu4pb3HkuSWZXKkZ/pccOIXH0ErCiND6SwQC9pBXTgxoYewp9Y3J0SrKyUMVKHidWERkA1EnVR6wBUT3sru6idmiNF2JIBw5JC+UiNdiS5RqvXdKa3eHjqxVXfL2kEINOOSoiB1t6P6chQsxHJjxOs9TTk+8UgFgDMF9VhtYubiaLYfoBOP7FVAIojHHGxth14ekyohT65TD4mgRqK3mTsLxPjrQ43/nAayo6aJWilx5BB1YoRe8bjUNzHS1G0JSsM6E8ZRwwUfwBBhwdqFml2O76LpJoWi/xi358JNldRqD7j/eV0ZNuJAZjONvVJZ9qtfJDifLPJkJQ===IU/3-----END PGP SIGNATURE-----
D
D
Danny Milosavljevic wrote on 27 Dec 2018 14:52
(name . Mark H Weaver)(address . mhw@netris.org)
20181227145258.0c420eac@scratchpost.org
Hi Mark,
On Mon, 24 Dec 2018 13:12:23 -0500Mark H Weaver <mhw@netris.org> wrote:
Toggle quote (4 lines)> Of course, the usual reason to choose UTF-32 is to support non-ASCII> characters while retaining fixed-width code points, so that string> lookups are straightforward and efficient.
This kind of lookup is almost never what is necessary. There are manypeople who assume character is the same as codepoint and to those peopleUTF-32 brings something to the table, but it's really not useful if peopledo text processing correctly, see below.
(Of course whether packages actually do this remains to be seen)
Toggle quote (3 lines)> Using UTF-8 improves space efficiency, but at the cost of extra code>complexity.
I agree.
Toggle quote (4 lines)> That extra> complexity is what I guess we would need to add to each program that> currently uses UTF-32.
Yes, but they usually have to do stream processing even with UTF-32 (becausea character can be composed of possibly infinite number of codepoints),so the infrastructure should be already there and the effort should beminimal.
Toggle quote (5 lines)> Alternatively, we could extend the on-disk> format to support UTF-8 and then add some kind of "load hook" that> converts the string to UTF-32 at load time. Either way, it's likely to> be a can of worms.
If it ever came to that, a pluggable reference scanner would be preferrable. But really, it would irk me to have so much complexityin something so basic (the reference scanner) for no end-user gain(as a distribution we could just mandate UTF-8 for references and theproblem would be gone for the user with no loss of functionality).
It's always easy to add special cases - but more code means more bugsand I think if possible it's best to have only the simple case implementedin the core - because it's less complicated which means more likelyto be correct (for the case it does handle). In the end it depends onwhat would be more code, and more widely used.
Also, if we wanted to debug reference errors, we couldn't use grep anymorebecause it can't handle utf-32 either (neither can any of the other UNIX tools).
Also, I really don't want to return to the time where I had to call iconvonce every three commands to be able to do anything useful on UNIX.
Also, the build daemon is written in C++ and C++ strings are widelyknown to have very very bad codepoint awareness (to say nothing aboutthe horrible conversion facilities).
Also, if both UTF-32 and UTF-8 are used on disk, care needs to not misdetectan UTF-8 sequence as an UTF-32 sequence of different text - or the other wayaround -, but that's unlikely for ASCII strings.
Toggle quote (5 lines)> I really think it would be a mistake to try to force every program and> language implementation to use our preferred string representation. I> suspect it would be vastly easier to compromise and support a few other> popular string representations in Guix, namely UTF-16 and UTF-32.
In 1992, UTF-8 was invented. Subsequently, most of the Internet,all new GNU Linux distributions etc, all UNIX GUI frameworks, Subversionetc standardized on UTF-8, with the eventual goal of standardizing allnetwork transfer and storage to UTF-8. I think that by now the outliersare the ones who need to change, otherwise these senseless encodingconversions will never cease. It's not like different encodings allow forbetter expression of writings or anything useful to the end user.
As a distribution we can't force upstream to change, but just filingbug reports upstream would make us see where they stand on this.
Toggle quote (9 lines)> If you don't want to change the daemon, it could be worked around in our> build-side code as follows: we could add a new phase to certain build> systems (or possibly gnu-build-system) that scans each output for> UTF-16/32 encoded store references that are never referenced in UTF-8.> If such references exist, a file with an unobtrusive name would be added> to that output containing those references encoded in UTF-8. This would> enable our daemon's existing reference scanner to find all of the> references.
I agree that that would be nice. As a first step, even just detectingproblems like that and erroring out would be okay - in order to find themin the first place. Right now, it's difficult to detect and so also difficultto say how wide-spread the problem is. If the problem is wide-spread enoughmy tune could change very quickly.
What you propose is similar to what I did in Java in Guix, only it givesus even more advantages in the Java case (faster class loading andeventual non-propagated inputs).
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAlwk2ToACgkQ5xo1VCwwuqUUzAgApbxUHv/XlbjYMXvV4cOY0maxbx92ndZlJiukCN+bIiMqhuCd7PdEoL7Z1d9ABxe+2oXO4Nkjpez71nhK8ym8KwRYNDuTkCSZbzUJwNEee2pF/OlU2Y+Jugz5ICSlYGCFfwx6Buf9bZReYq1e5qjO//QSytgYC061gYURw/abtGSEyvllHWv4qrl6DFfQuQilycHAOqrT/ACBtMgFFnsV7miHs6CrKTSPPBWKKuA3BM4STNUfHlMeb8ungNUH3ijbDviqBgRiqDy50dZ0kbFv8zSm1LytoySX0qZ7j5oidDJGHATGbEXp4sDU1dPmBSYeasLAVfJn4RQSQFAKba6TuA===gkV6-----END PGP SIGNATURE-----

M
M
Mark H Weaver wrote on 27 Dec 2018 15:03
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87tvizvzgk.fsf@netris.org
Pierre Neidhardt <mail@ambrevar.xyz> writes:
Toggle quote (12 lines)>> : > Store file names are always ASCII so problems arise when they are stored>> : > as UTF-16 or UTF-32/UCS-4.>> : >> : I understand that most programs stick to ASCII filenames, but what about the odd>> : one using non-English, special characters?>> >> That’s a separate debate. :-) Essentially this restriction on store>> file names has always been there in Guix (and Nix before that). If we>> were to change it, that would raise compatibility issues.>> But what happens if we attempt to store "á" in the store?
Indeed. Although we might restrict the immediate entries within/gnu/store to ASCII characters, file names deeper within thosedirectories may have non-ASCII characters. More generally, storereferences may occur within larger strings which might include non-ASCIIcharacters.
Mark
M
M
Mark H Weaver wrote on 27 Dec 2018 15:29
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
87pntnvy8e.fsf@netris.org
Hi Danny,
Danny Milosavljevic <dannym@scratchpost.org> writes:
Toggle quote (21 lines)> On Mon, 24 Dec 2018 13:12:23 -0500> Mark H Weaver <mhw@netris.org> wrote:>>> Of course, the usual reason to choose UTF-32 is to support non-ASCII>> characters while retaining fixed-width code points, so that string>> lookups are straightforward and efficient.>> This kind of lookup is almost never what is necessary. There are many> people who assume character is the same as codepoint and to those people> UTF-32 brings something to the table, but it's really not useful if people> do text processing correctly, see below.>> (Of course whether packages actually do this remains to be seen)>>> That extra>> complexity is what I guess we would need to add to each program that>> currently uses UTF-32.>> Yes, but they usually have to do stream processing even with UTF-32 (because> a character can be composed of possibly infinite number of codepoints),
I agree with you. However, as silly as it might be, the fact remainsthat almost every modern programming language and string library usescode points as the base units by which to index strings.
Toggle quote (3 lines)> so the infrastructure should be already there and the effort should be> minimal.
The infrastructure might or might not be there, depending on thesophistication of the program's unicode support, but even if it _is_there, it will most likely be a layer that expects to iterate overstrings indexed by code point to find graphemes, etc.
Anyway, if you truly believe the effort should be minimal, feel free toinvestigate and propose patches to fix our 5 common lisp compilers andFish to avoid storing UTF-32 in the object code.
Toggle quote (4 lines)> Also, if both UTF-32 and UTF-8 are used on disk, care needs to not misdetect> an UTF-8 sequence as an UTF-32 sequence of different text - or the other way> around -, but that's unlikely for ASCII strings.
This is not an issue because the substrings that the reference scannerand grafter are looking for are ASCII-only, even if they are part of alarger non-ASCII string. Specifically, they only need to look for thenix hashes.
Toggle quote (11 lines)>> I really think it would be a mistake to try to force every program and>> language implementation to use our preferred string representation. I>> suspect it would be vastly easier to compromise and support a few other>> popular string representations in Guix, namely UTF-16 and UTF-32.>> In 1992, UTF-8 was invented. Subsequently, most of the Internet,> all new GNU Linux distributions etc, all UNIX GUI frameworks, Subversion> etc standardized on UTF-8, with the eventual goal of standardizing all> network transfer and storage to UTF-8. I think that by now the outliers> are the ones who need to change,
I agree that we need to standardize on Unicode. However, given theperhaps unfortunate fact that almost everyone has standardized on codepoints as the units by which to index strings, choosing UTF-32 as aninternal representation is a very reasonable choice, IMO.
Anyway, feel free to engage with the developers of the Common Lispimplementations that use UTF-32 and try to convince them to change.
The remaining question is: what to do if upstream refuses to change? Dowe exclude that software in Guix, or do we maintain our own patches tooverride upstream's decision?
Toggle quote (15 lines)>> If you don't want to change the daemon, it could be worked around in our>> build-side code as follows: we could add a new phase to certain build>> systems (or possibly gnu-build-system) that scans each output for>> UTF-16/32 encoded store references that are never referenced in UTF-8.>> If such references exist, a file with an unobtrusive name would be added>> to that output containing those references encoded in UTF-8. This would>> enable our daemon's existing reference scanner to find all of the>> references.>> I agree that that would be nice. As a first step, even just detecting> problems like that and erroring out would be okay - in order to find them> in the first place. Right now, it's difficult to detect and so also difficult> to say how wide-spread the problem is. If the problem is wide-spread enough> my tune could change very quickly.
Sure, it would be useful to have more data on what packages arecurrently affected by this issue.
Regards, Mark
L
L
Ludovic Courtès wrote on 27 Dec 2018 15:45
(name . Mark H Weaver)(address . mhw@netris.org)
87o9979gfn.fsf@gnu.org
Hello,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (20 lines)> Pierre Neidhardt <mail@ambrevar.xyz> writes:>>>> : > Store file names are always ASCII so problems arise when they are stored>>> : > as UTF-16 or UTF-32/UCS-4.>>> : >>> : I understand that most programs stick to ASCII filenames, but what about the odd>>> : one using non-English, special characters?>>> >>> That’s a separate debate. :-) Essentially this restriction on store>>> file names has always been there in Guix (and Nix before that). If we>>> were to change it, that would raise compatibility issues.>>>> But what happens if we attempt to store "á" in the store?>> Indeed. Although we might restrict the immediate entries within> /gnu/store to ASCII characters, file names deeper within those> directories may have non-ASCII characters. More generally, store> references may occur within larger strings which might include non-ASCII> characters.
Right. For example ‘nss-certs’ contains non-ASCII, UTF-8-encoded filenames.
For “top-level” store file names, the restriction is enforced by‘checkStoreName’ in libstore/store-api.cc.
Ludo’.
P
P
Pierre Neidhardt wrote on 27 Dec 2018 16:02
(name . Ludovic Courtès)(address . ludo@gnu.org)
87tvizgghs.fsf@ambrevar.xyz
Just to be sure I understand: non-toplevel, non-ASCII file names willnot be scanned properly, right?
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwk6X8ACgkQm9z0l6S7zH/cXQgAnsjU66YtPvV2m1Mu2mRYRZfOFYicrKrjPiUkHKeQDY+CnBfPZQFNbMSLEICwx7JBZ86X48q0jJgJQ2PggQ7L17/4IhLKgL1brVTkuMIWyYXAj6hDF7qJ/ZK1mkUve432gbMkavmBWSqddENa38T/XdUxij3SxeYyDp0YjhRjbeLddZH2eIji+B8mqavVmR/wsyTb0u8+xdScTGaB5QoOiYKiE58g02lrZTF2PUkNXG58LhbmPwl1hpfs0abt5p4k/wvY5d9V8baXLlW9s4NQVE3/vpiE+ycQqPj90iUIummb83trbAshfUHHwhUGRZvYwnWAXVebivC8jWmR2UQdog===3pew-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 27 Dec 2018 17:15
(name . Ludovic Courtès)(address . ludo@gnu.org)
87r2e3gd3c.fsf@ambrevar.xyz
Danny Milosavljevic <dannym@scratchpost.org> writes:
Toggle quote (11 lines)> In 1992, UTF-8 was invented. Subsequently, most of the Internet,> all new GNU Linux distributions etc, all UNIX GUI frameworks, Subversion> etc standardized on UTF-8, with the eventual goal of standardizing all> network transfer and storage to UTF-8. I think that by now the outliers> are the ones who need to change, otherwise these senseless encoding> conversions will never cease. It's not like different encodings allow for> better expression of writings or anything useful to the end user.> > As a distribution we can't force upstream to change, but just filing> bug reports upstream would make us see where they stand on this.
I agree with this. Reporting upstream should be a first step.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwk+rcACgkQm9z0l6S7zH9gUgf+Nns9ka7lqwHLI14NoJfKk77nUli0A0ANr+I8yQqyll3u0KdiqulNStOcKnJtLQYny/co27/mMIMllL8im2pwoVhZ6hUxvwyp1AetR4CW5ArPqma2aFKpDFKxd+T5W1ZUA/fwyB3S1hc3qVIVOzxAHSKQp/Ik/tb++ZDmoHCEg5qlAFxJovlcsCPUnOlee9bqMXfweqZhckl+97xXmK9mJ3tZ3ijZKQ/ceBmvJvcf7t+XEOSOQQ3FQxsqYlkUh0jB39NrSTH/HbLxRzPUaihuwZRCEXJu0c29E6S8u+MmXHF04wdH9TXeHoWBBEAnq4txR6tjKKMEjpDAAeCJfHqnOw===QuoM-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 27 Dec 2018 18:03
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87k1juaomo.fsf@gnu.org
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (3 lines)> Just to be sure I understand: non-toplevel, non-ASCII file names will> not be scanned properly, right?
Every file in the store is properly scanned for references. It’s justthat users cannot create top-level items with a non-ASCII file name.
I hope this clarifies things!
Ludo’.
P
P
Pierre Neidhardt wrote on 27 Dec 2018 19:57
(name . Ludovic Courtès)(address . ludo@gnu.org)
87muoqhk62.fsf@ambrevar.xyz
Toggle quote (3 lines)> Every file in the store is properly scanned for references. It’s just> that users cannot create top-level items with a non-ASCII file name.
So if '/gnu/store/...-foo/á' is stored as UTF-8 in a binary, then it will befound? Is it because the filesystem encoding is also UTF-8 and Guix scans overbyte arrays?
Sorry for dragging on this, I guess I should look at the code at this point butI have very little time these days.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwlIKUACgkQm9z0l6S7zH+Efgf/TSVtek2CEmPI+/oF8lD2xe6oSUhog4zvhirSXvsrDvLpY/R7i8lqjhw0ZSqU/yqkPiz/b/sFZsWOdyjWUsniVjcOjujuy7tzwZynifj0RF9sCCnjJcM3j8Dm3ioKwO5ppyAZBQRwt+UbjBoOC9NNyT3oDrjs9DWsWEL9cXBvkBbzoKzh/9kH5aaPvY1HoCx7mVuHeRsmKDR+YnaclArbux5jAseNCnWszqUJjkFvuDgGlmup5supsO3Ht+tFPlVeeEKu414jynZUktyKrJZdRplgmpfqBpuB+6TlQTCG9AW4gqvjoqrgfGbRVrM1kAU01xtEILl9Zx7C2wRE2+RAtg===oyjv-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 27 Dec 2018 22:54
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87zhsq8wkj.fsf@gnu.org
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (7 lines)>> Every file in the store is properly scanned for references. It’s just>> that users cannot create top-level items with a non-ASCII file name.>> So if '/gnu/store/...-foo/á' is stored as UTF-8 in a binary, then it will be> found? Is it because the filesystem encoding is also UTF-8 and Guix scans over> byte arrays?
The reference scanner, currently written in C++, traverses wholedirectory trees. Being C++ it treats file names as byte arrays so itdoesn’t matter what the file name encoding is.
Note also that the reference scanner only looks for “xyz…-foo”; whatcomes before and after doesn’t matter. So for example if you have“/gnu/store/xyz…-foo/à”, what’s important is the “xyz…-foo” bit.
This is all happening in libstore/references.cc (which is surprisinglysmall) and in (guix build graft) for the grafting part, which Mark wrotea while back.
HTH,Ludo’.
P
P
Pierre Neidhardt wrote on 27 Dec 2018 23:05
(name . Ludovic Courtès)(address . ludo@gnu.org)
87d0pmhbgn.fsf@ambrevar.xyz
Toggle quote (4 lines)> The reference scanner, currently written in C++, traverses whole> directory trees. Being C++ it treats file names as byte arrays so it> doesn’t matter what the file name encoding is.
But what matters then is that the filename encodings on the filesystem and in thebinary match, right?
Toggle quote (4 lines)> Note also that the reference scanner only looks for “xyz…-foo”; what> comes before and after doesn’t matter. So for example if you have> “/gnu/store/xyz…-foo/à”, what’s important is the “xyz…-foo” bit.
OK, makes sense, then my main worry is just moot :)
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwlTLgACgkQm9z0l6S7zH/S3Qf/W9Oy7e1p3LkCiKM2l9t7jW3TezaUlHflLGmVd1zdiaTq/aeLdTfY2r+i+/aEweAHmQGD1oHWmSbnDMyBOQalzBNAQi8dg+oOSVNiMASWk+aHCj5OohE1mxrddSLwTxk0a4BIM03GbEc/qFtI2nOZEJGjphkHJGjSKHB/5gzilsLVXRjajYWXVh/PqVaIH3xJzjA4zIyg711PQTKMqB8qIAKpr0OKA23vpZ1FaRKoNY5NRx/g1wQpFTfi4ZdwgJwPwh3bDkU61C5mPiHcVARi8X3M65h96Aj+9RX9TCZ/+3oQTWexKTv2o0xHZfSYbHpvtAb8xkg9sKLaN47TFoM2Zg===djyT-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 27 Dec 2018 23:59
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87r2e28tkv.fsf@gnu.org
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (7 lines)>> The reference scanner, currently written in C++, traverses whole>> directory trees. Being C++ it treats file names as byte arrays so it>> doesn’t matter what the file name encoding is.>> But what matters then is that the filename encodings on the filesystem and in the> binary match, right?
I’m not sure what you call “the binary”. Do you mean the nar?
Ludo’.
P
P
Pierre Neidhardt wrote on 28 Dec 2018 08:47
(name . Ludovic Courtès)(address . ludo@gnu.org)
874laygkiy.fsf@ambrevar.xyz
Toggle quote (2 lines)> I’m not sure what you call “the binary”. Do you mean the nar?
No, in this case I referred to "/bin/next" in sbcl-next. So any file in the narpassed to the reference scanner.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAlwl1RUACgkQm9z0l6S7zH/vtAgAq1KOgRvdEXvdvjPy//H0f/nhzu9Z1qYhwxV9tGOoeBcHiG/vUo6mc7/hRkht5zFI35d6L1o09XB6oJKfsVSi0P5VE0APDZzlc3y1pifNcjFekBHUuiva5jR4zCYeOsXc8AKZSSC7Lf+OPYb2CTFV2nvoANo9dhl5OUPZHZ0B7GcCyHZDtnk4jtiZD3p/FjlLcmZ9hom0CbvHjffkDlq/0nmO22kqY5tuKqY/p6UDyJDXROeMdYi2u2lbhM3kgrw5j7MQrQthmt00PUmqDRPoudjnjc4btoGUjUIjZlVgfZidv1qygC4Bxi/C9LZ3luopQVjPL/m2is4gwUp2DctYjg===3PX/-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 30 Mar 12:15 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
87lfa5eymf.fsf@ambrevar.xyz
Many grafting issues with Nyxt have been reported recently:
https://github.com/atlas-engineer/nyxt/issues/1257
Seems that glib is being grafted, which breaks cl-cffi-gtk and thusNyxt.
Is there a way to disable grafting for a given package? This would atleast be a local workaround for Nyxt.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBi+jgSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/zX4IAKAVBaFbpQMrdhbgmIZ8p3izeO0ZKSrBIWYn8cys+gZ1nLS7zIRwqY+WVQjRPSP7pOTlP72hcXfEEC3hLg2h/wcKEXHc1TkloTUfKfoJ5pZ6lfKRan53BjCwPQwTk66HB+0IdGXYXn4NISY4Fn3oep9dYVC4fDv68OTlbl1tIFrydEnr4ZKxEX92fJbAkeP4gzCUzfjkcScQi0G6dQW3YB0nNjG2MFsmSENUdvrb+tD+ltxask7Iv/cPlcRB2Igz72vVEQMdunGRoOymi2X47kQGwPO1Z9Y2PhKe8vUWu5a2uC6daqXGHkwU/fUmOAfSvV8aghPgey9PPxKJVq3qxo0==86zr-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 30 Mar 22:09 +0200
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87tuoscsk9.fsf@gnu.org
Hi,
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (10 lines)> Many grafting issues with Nyxt have been reported recently:>> https://github.com/atlas-engineer/nyxt/issues/1257>> Seems that glib is being grafted, which breaks cl-cffi-gtk and thus> Nyxt.>> Is there a way to disable grafting for a given package? This would at> least be a local workaround for Nyxt.
No. I provided guidance in 2018 on how to address this issue:
https://issues.guix.gnu.org/33848
Did anyone talk to the SBCL folks? Please, let’s address this ratherthan look for workarounds.
Thanks,Ludo’.
P
P
Pierre Neidhardt wrote on 31 Mar 09:10 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
87sg4bdci0.fsf@ambrevar.xyz
Toggle quote (4 lines)> I provided guidance in 2018 on how to address this issue:>> https://issues.guix.gnu.org/33848
Looks like you are linking to this very same thread :)
Toggle quote (3 lines)> Did anyone talk to the SBCL folks? Please, let’s address this rather> than look for workarounds.
I've just asked them for advice:
https://bugs.launchpad.net/sbcl/+bug/1922011
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBkIHcSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/fpUIAKTjnfzzgCtdSXezcntyxl2O9z30+gVci5tdkmx+rPKPhj0hThw/GhWWyJrKXMuxh9QB5ZfHYQhEImUBlmnmasIZRcX5O15Di5jejNA5rnaCIsxmszlCd5Zv/LT5JBbN6rCONX9L1baYnc0NZ+drar39aULASIWI/cDxcl1eF5+dJz4uXdPoVfDyZVj5u8PqVbP6HJ2wkOpoYjj5pA1w7vgbUNPkSRvaZnjRvXoWHxcMxzzR5/bxEFWgc3Kh6TupvMUOAo0AOK1kdjfmul+jSadcSfUOVe2LTYnwIZGr0zX8nKYPM3g8ZNVmTPMgWdKjZJRPeEZ88+ur9BWyeb9aa3g==TqV5-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 31 Mar 18:12 +0200
87im57b8u7.fsf@ambrevar.xyz
A brief discussion has ensued on SBCL bugtracker:
- It's unlikely that SBCL will change its internal string representation.- The main recommendation for an easy fix without updating the scanner is that we tweaked our build system to dump the store reference to a separate ASCII file.
Thoughts?Guillaume?
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBkn4ASHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/CfQH/RoPIrdxAYdX2vbm+UqjYzGXZUbB4rCfQWfiUDvMOZF4afaGkXVBCqR24/UsuxlAz6doT8mtG9ADPtIkgCLWESF7XvN0OWmikcUOh5UCDnE+A/QOIVwpLCvUVVtdN09XPQysiYVAxgvDdzwlEVdSXvZsGYe/YoLs0PKIW/ddj0SXk/rXvAAm+EU8519TQ6s9oxEVuG/D8ZN6DpjZoQk33kQ/oCt2kWfGSWc15JHqqUotQvBsBgBL86Bu69S61KGOy8aDwQ5MAwbaS6Sic8T6KbUFEh8b1wsg8xqTIzLfq2mpZMJKrBvZ+LhjUV3sFBdsYnBR1SeB8HS4RzUeKdbUsRk==p0DJ-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 31 Mar 22:42 +0200
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87zgyj5a33.fsf@gnu.org
Hi,
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (2 lines)> A brief discussion has ensued on SBCL bugtracker:
Nice, thanks for starting the discussion!
Toggle quote (3 lines)> - It's unlikely that SBCL will change its internal string> representation.
Of course, I would not suggest that.
What could have been nice is if there’s a way to mark specific stringsas being ASCII, or if there’s a “byte vector” data type compatible withstrings, for instance.
Toggle quote (4 lines)> - The main recommendation for an easy fix without updating the scanner> is that we tweaked our build system to dump the store reference to a> separate ASCII file.
That’s a good idea: simple and efficient. We do that for the initrd,for instance (commit b36e06c2b0889f1d0f939465589d36887ff24d33).
This could be done in ‘asdf-build-system/sbcl’ I suppose. I can see twodrawbacks:
1. Some packages like ‘nyxt’ don’t use it, so we’d have to duplicate the phase there.
2. It may be coarse-grain compared to scanning binaries for references (for example, we might retain references to build-only tools, such as libraries used only for tests).
That’s probably acceptable though, and certainly better than the status quo.
Thanks,Ludo’.
P
P
Pierre Neidhardt wrote on 31 Mar 22:57 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
874kgravnm.fsf@ambrevar.xyz
Hi Ludo,
Toggle quote (4 lines)> What could have been nice is if there’s a way to mark specific strings> as being ASCII, or if there’s a “byte vector” data type compatible with> strings, for instance.
It does and it could work, but according to upstream it's just simplerto dump deps in a separate file.
Toggle quote (3 lines)> 1. Some packages like ‘nyxt’ don’t use it, so we’d have to duplicate> the phase there.
In the case of Nyxt, the reason it does not use it is because theasdf-build-system/sbcl binary production has some drawbacks. I couldwork on overhauling the build system to fix the uncanny behaviour, thenNyxt would use it.
Toggle quote (4 lines)> 2. It may be coarse-grain compared to scanning binaries for references> (for example, we might retain references to build-only tools, such> as libraries used only for tests).
The build system could easily leave out Lisp native-inputs, no?
Cheers!
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBk4j0SHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/DwQIAI18KsfbdGJk3Ok5nAtdF5BzhN1QdWuPNjnRbL+kMtlvEVoYXEkh6a7m8h6AMi4mKZ0e2ju6eEIYfj/TUoBgE+vz/ksKySuKhD7c8x1l4M1v4qFfLUPkPyNZ1EQ0TAHhcVtNFl1QjWRHZPsSSBDmi5jW/zVEh6c16nY3q0M+ey03Rox3+RDd3kgNLjNA//G7gmF5xl7T9CLZ1jmpQ7VbTHGht52ina0FX2ryBD2CPxCdivyap67skzAaQMLTZUB8ZXpnxRsUb1ZxG7fkRygXxoIBM84DnIsHs53HWRx3AUf5eCJ+SKrQElhERh31oStF3nMNbU9L+PdReKELsSow6B0==OGKR-----END PGP SIGNATURE-----
M
M
Mark H Weaver wrote on 1 Apr 08:03 +0200
(address . 33848@debbugs.gnu.org)
87czvebky2.fsf@netris.org
Pierre Neidhardt <mail@ambrevar.xyz> writes:
Toggle quote (4 lines)> - The main recommendation for an easy fix without updating the scanner> is that we tweaked our build system to dump the store reference to a> separate ASCII file.
Sounds good. I made a similar proposal in Dec 2018, earlier in thisthread https://bugs.gnu.org/33848#31. I wrote:
If you don't want to change the daemon, it could be worked around in our build-side code as follows: we could add a new phase to certain build systems (or possibly gnu-build-system) that scans each output for UTF-16/32 encoded store references that are never referenced in UTF-8. If such references exist, a file with an unobtrusive name would be added to that output containing those references encoded in UTF-8. This would enable our daemon's existing reference scanner to find all of the references.
Our grafting code would then need to be extended to recognize and transform store references encoded in UTF-16/32 as well as UTF-8.
Mark
P
P
Pierre Neidhardt wrote on 1 Apr 09:13 +0200
(address . 33848@debbugs.gnu.org)
87sg4aa35g.fsf@ambrevar.xyz
Thank you for the reminder, Mark, I had forgotten about your suggestion.
If we are going for a SBCL-specific solution, I believe that allreferences are stored in plain text files at some points (the .asd filesand the .lisp files) which are often encoded in ASCII / UTF-8, sosearching these files without having to deal with their encoding mightbe enough. But of course this is less general and more brittle.
Mark, Guillaume, would you have time to give this a try?I'm a bit busy at the moment. Let me know if you can't work on it, I'lltry to find time to work on a patch.
Cheers!
--Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBlcosSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/ieQIAJVk4okboi89JzT/9KpPbx+HNgf3ks7vFfVyWnUCJeq5NBDKVGWmM9LZ230/rXPU63n7y7ZTYiauSL0UTap3tXtygWfgtDM8zHWS4uc5KBybz3RoNNH6rMiPp5bGLbs15Y1NatjltpUFEiQp/EOw15my3ZjVxpcYFaUDwQSDaBRmemExSqT03E4vo7qrI7tLNfAmcW3iE3UKDECgs93WmcliUf0nuQL2YmFHefgOdSScdNOZBO80yIyDmrkWe7C51c22y4pHu86sNLaJ3mVA1N+kNAtjG/qvpUw9E34Q1mSZY/1jbraa9f0J11PR6eJNU43dQdmOUeA/QdqBDCoOXb4==uiP6-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 1 Apr 09:57 +0200
(name . Mark H Weaver)(address . mhw@netris.org)
87eefu30a4.fsf@gnu.org
Hi,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (20 lines)> Pierre Neidhardt <mail@ambrevar.xyz> writes:>> - The main recommendation for an easy fix without updating the scanner>> is that we tweaked our build system to dump the store reference to a>> separate ASCII file.>> Sounds good. I made a similar proposal in Dec 2018, earlier in this> thread <https://bugs.gnu.org/33848#31>. I wrote:>> If you don't want to change the daemon, it could be worked around in our> build-side code as follows: we could add a new phase to certain build> systems (or possibly gnu-build-system) that scans each output for> UTF-16/32 encoded store references that are never referenced in UTF-8.> If such references exist, a file with an unobtrusive name would be added> to that output containing those references encoded in UTF-8. This would> enable our daemon's existing reference scanner to find all of the> references.>> Our grafting code would then need to be extended to recognize and> transform store references encoded in UTF-16/32 as well as UTF-8.
Oh thanks for the reminder.
The separate ASCII file doesn’t solve it all because, as you write, we’dneed to change the grafting code as well.
Then it might be simpler to use a “byte vector” data type for thosestrings. We’ll have to wait for Pierre’s patches to get a better idea.:-)
Thanks,Ludo’.
P
P
Pierre Neidhardt wrote on 1 Apr 10:48 +0200
87eefu9yqe.fsf@ambrevar.xyz
Hi!
Toggle quote (6 lines)> The separate ASCII file doesn’t solve it all because, as you write, we’d> need to change the grafting code as well.>> Then it might be simpler to use a “byte vector” data type for those> strings.
Which strings and where would we use byte vectors?
Toggle quote (3 lines)> We’ll have to wait for Pierre’s patches to get a better idea.> :-)
By "Pierre's patches", you mean a patch to add a build phase thatgenerates an file listings all references?
Cheers!
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBliOkSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/NlgIAKW2OYmihOC2vAPTom6G7OEh7DU4xzU5h+e9acIDKKjBR4PwbxTBwWJdF4Hq5i9rRNBM/Vr5OU8lN/0ikpTmh/2TCxzHPBC01Ln+8AJmIKaIc6VIXW24IhsT0F3CBkrbz0ch+iQYFlV3HJ8Vi/V4pa62865Y7jC0mI4evp3puwutJjE4cPchBh6NJyF/0IGkbuS3f4E1ErrtcaFDUX3bCSsTZ29Jk/8UqEE67epxAgZYUj2OeH/464JtGWmrr8j6XQaTrmYWxeAWLDrozjDjtD0yBWxAX5OygcxOVbDcOXJbf4K3OLXoCj+jdIIrorf6y2nuPYqcoJ33h8XfcB1IdYk==4p1L-----END PGP SIGNATURE-----
G
G
Guillaume Le Vaillant wrote on 1 Apr 11:07 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
87im56l6es.fsf@yamatai
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (6 lines)> If we are going for a SBCL-specific solution, I believe that all> references are stored in plain text files at some points (the .asd files> and the .lisp files) which are often encoded in ASCII / UTF-8, so> searching these files without having to deal with their encoding might> be enough. But of course this is less general and more brittle.
A store reference to a C library in a standalone Lisp binary can comeeither from the current package or from some dependencies(cl+ssl, cl-cffi-gtk, etc.). So we would need to scan the sourcecode of all the Lisp dependencies recursively to get the full list ofstore refrences.And as Mark wrote below, with the current grafting code, this list ofstore references will not solve grafting for paths that are in UTF-32 orboth in ASCII and UTF-32 in the binary file.

Ludovic Courtès <ludo@gnu.org> skribis:
Toggle quote (31 lines)> Mark H Weaver <mhw@netris.org> skribis:>>> Pierre Neidhardt <mail@ambrevar.xyz> writes:>>> - The main recommendation for an easy fix without updating the scanner>>> is that we tweaked our build system to dump the store reference to a>>> separate ASCII file.>>>> Sounds good. I made a similar proposal in Dec 2018, earlier in this>> thread <https://bugs.gnu.org/33848#31>. I wrote:>>>> If you don't want to change the daemon, it could be worked around in our>> build-side code as follows: we could add a new phase to certain build>> systems (or possibly gnu-build-system) that scans each output for>> UTF-16/32 encoded store references that are never referenced in UTF-8.>> If such references exist, a file with an unobtrusive name would be added>> to that output containing those references encoded in UTF-8. This would>> enable our daemon's existing reference scanner to find all of the>> references.>>>> Our grafting code would then need to be extended to recognize and>> transform store references encoded in UTF-16/32 as well as UTF-8.>> Oh thanks for the reminder.>> The separate ASCII file doesn’t solve it all because, as you write, we’d> need to change the grafting code as well.>> Then it might be simpler to use a “byte vector” data type for those> strings. We’ll have to wait for Pierre’s patches to get a better idea.> :-)
I'm not sure what you mean with the "byte vector" data type here. Couldyou explain what you're thinking about with a few more details?
-----BEGIN PGP SIGNATURE-----
iIUEAREKAC0WIQTLxZxm7Ce5cXlAaz5r6CCK3yH+PwUCYGWNSw8cZ2x2QHBvc3Rlby5uZXQACgkQa+ggit8h/j8KtgEAoyvmikRXpUHbT3CLPgewYs1QE7dXwa/fSb96bR8rM6YBAJpnLOUegNHW8lTLfJu8ats/gdkdL+TFysYRBtjIU6Sz=5hTk-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 1 Apr 11:13 +0200
87wntm8j18.fsf@ambrevar.xyz
Hi Guillaume!
Toggle quote (6 lines)> A store reference to a C library in a standalone Lisp binary can come> either from the current package or from some dependencies> (cl+ssl, cl-cffi-gtk, etc.). So we would need to scan the source> code of all the Lisp dependencies recursively to get the full list of> store refrences.
I don't think there is need to scan recursively: if package A depends onB which depends on C, then A can lists the dependency on B in a file,and B can do the same for C. This way the relationship between A and Cis properly stored.
Am I missing something?
Toggle quote (4 lines)> And as Mark wrote below, with the current grafting code, this list of> store references will not solve grafting for paths that are in UTF-32 or> both in ASCII and UTF-32 in the binary file.
Indeed, and that's the core of the issue here I believe, since graftingis what breaks Nyxt in practice.
Cheers!
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBljqMSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/Dy8IAJqkvs/eVUHm+1JUqFEfkMIzMMOB4vZNi6SVjs6lIlBgj33X5KhzmvYXh8VAZYSylDh/O9HUH3lhCmBueqyUUTfBDtoPm4fjsnBU9J+QsDS9zl3VRvzdjyZUARAf5KxRrREYx/FpMMFhZUB9/lcczISkTpnH/5qcXxpRDdGGNbzJYmOH+KBZNCWD+wKIdXJmY9AncfwApA2ZlSfvEymKEf3ddCT8Mc2NliaE/gPFk9f+ws+iQOzwnOXcsLVtsAXXLt2Cufm5Fw8Z9Su2UR+pfaS0acdIImeyxFVmnFC5OaBuStfHU0YmACpx9Im/om6Xam1BAvYEBWVd2fjdizfFizk==s3Yo-----END PGP SIGNATURE-----
G
G
Guillaume Le Vaillant wrote on 1 Apr 11:52 +0200
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87a6qil4b1.fsf@yamatai
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (15 lines)> Hi Guillaume!>>> A store reference to a C library in a standalone Lisp binary can come>> either from the current package or from some dependencies>> (cl+ssl, cl-cffi-gtk, etc.). So we would need to scan the source>> code of all the Lisp dependencies recursively to get the full list of>> store refrences.>> I don't think there is need to scan recursively: if package A depends on> B which depends on C, then A can lists the dependency on B in a file,> and B can do the same for C. This way the relationship between A and C> is properly stored.>> Am I missing something?
Oh, you say this file would be created for every Lisp package. I thoughtit would only be for the standalone binary case, because the "regular"asdf-build-system/sbcl used for Lisp libraries ships the sources and itsmake-asdf-configuration phase creates links to the required Lispdependencies in '/gnu/store/...', so there should not be missingreferences.

Toggle quote (9 lines)>> And as Mark wrote below, with the current grafting code, this list of>> store references will not solve grafting for paths that are in UTF-32 or>> both in ASCII and UTF-32 in the binary file.>> Indeed, and that's the core of the issue here I believe, since grafting> is what breaks Nyxt in practice.>> Cheers!
I just wondered: does the grafting code take '.fasl' files intoconsideration?If yes, I guess a Lisp library could also end up in a strangehalf-grafted state if the grafting code modifies ASCII references andnot UTF-32 ones...
-----BEGIN PGP SIGNATURE-----
iIUEAREKAC0WIQTLxZxm7Ce5cXlAaz5r6CCK3yH+PwUCYGWX8g8cZ2x2QHBvc3Rlby5uZXQACgkQa+ggit8h/j+FZAD+NJBBrb4xTqeC8wYQb956Wv1WiGyWm9LTFln5dXAIqfEA/2O1IKrJYZG/fItBPNLo2UR5PYJwolfNVIL4fwwBHLlk=lw/O-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 1 Apr 12:06 +0200
(name . Guillaume Le Vaillant)(address . glv@posteo.net)
87czvez5c1.fsf@ambrevar.xyz
Guillaume Le Vaillant <glv@posteo.net> writes:
Toggle quote (7 lines)> Oh, you say this file would be created for every Lisp package. I thought> it would only be for the standalone binary case, because the "regular"> asdf-build-system/sbcl used for Lisp libraries ships the sources and its> make-asdf-configuration phase creates links to the required Lisp> dependencies in '/gnu/store/...', so there should not be missing> references.
Right.
The only case where there could be a missing reference is if the sourcecode contains an FFI reference stored in non-ASCII / UTF-8.
So we need to parse other encodings too as Mark suggested if Iunderstand correctly.
Toggle quote (6 lines)> I just wondered: does the grafting code take '.fasl' files into> consideration?> If yes, I guess a Lisp library could also end up in a strange> half-grafted state if the grafting code modifies ASCII references and> not UTF-32 ones...
Absolutely, .fasls suffer from the same problem since they may encodestrings as UTF-32.
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBlmz4SHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/ZFsH/A5lZeTZQWXph2ZYsOjwv6mZMRrBY+shH1Gu5mYoU+RIJv742j3gsrActW0Nb3jOY8VDlzcXV4iTCZz7/iOTg52r0wkMB9gQQvJ3aXYa23G3Yoj7bvOkYB6NN+FGSdVBEIzeBA43jU7gLijDQvw6bWKWN50rkb9pAz+8vZexu7htL5bXQvYvQ8REPN7XtejLwHqDPfJZwZnlDBL00gVJHbQqwucg3yTmiHt3e4rYHmRubKiUxtNDGdvZeQAGW0CY7JzHLEzcZSa6HvTptU9t/xNEoaXmDaQlc1G1am0XD3JZvayzPHm8YUcf8UdlGif5QbyXTNQC5+ZQEwp53CLQ1vg==rgOr-----END PGP SIGNATURE-----
P
P
Pierre Neidhardt wrote on 1 Apr 12:07 +0200
(name . Guillaume Le Vaillant)(address . glv@posteo.net)
87a6qiz5b3.fsf@ambrevar.xyz
I'm not familiar with the grafting code, so anyone who is (Mark? Ludo?)might be able to fix this much quicker than me! :)
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBlm2ASHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/jxMH/09oBJdVeh7HoBNEXMph1n7P1x+s2+j3Ez/ICMW//2RExhoYZoyVVz0ehNep9j8I7ha4SCV7KCkua82HMYB533bjhJjd6Z3enAz2txha/N+QVGpeUyZ7tIHGI69oKcHjF80OzQ1uKkMTJ5GmnVBKm6LvzK6QFOfn6+SlVWSmDBoFi1GF3/BRrGh+WggO8XFdTdaVQeww2o6aBKTsIqVQCDR2HbkFHxUn//d5w0B3OmhEtWePZ6TNuQ3albz+d3I8Wve3F/HzOVCDREGHvJw55bbyyWkn8EJQMZaKySjutHtz4ablW3pJEJsf1aOl6xYNdmmc5SjG8Tax1cWIQiY3L6k==cC1E-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 1 Apr 17:24 +0200
(name . Pierre Neidhardt)(address . mail@ambrevar.xyz)
87pmze58p6.fsf@gnu.org
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
Toggle quote (3 lines)> I'm not familiar with the grafting code, so anyone who is (Mark? Ludo?)> might be able to fix this much quicker than me! :)
There’s ‘%graft-hooks’ in (guix build graft). One could add a hook thatwould do nothing except for SBCL packages; for SBCL packages, it wouldto the UCS-4 rewriting “somehow”.
The other option, which might be easier, would be to arrange to not useUCS-4 in the first place, by choosing a bytevector data type for stringliterals known to contain a store reference. It can be done if we knowwhere those references come from—e.g., they’re introduced by‘substitute*’ on the source.
I hope this makes sense!
Ludo’.
M
M
Mark H Weaver wrote on 1 Apr 19:23 +0200
875z15c3zx.fsf@netris.org
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (4 lines)> What could have been nice is if there’s a way to mark specific strings> as being ASCII, or if there’s a “byte vector” data type compatible with> strings, for instance.
Do we know that all strings containing store references will berepresentable in ASCII?
Mark
M
M
Mark H Weaver wrote on 1 Apr 19:33 +0200
871rbtc3j5.fsf@netris.org
Pierre Neidhardt <mail@ambrevar.xyz> writes:
Toggle quote (3 lines)> I'm not familiar with the grafting code, so anyone who is (Mark? Ludo?)> might be able to fix this much quicker than me! :)
I'll think about what would be required to modify our grafting code tosupport UCS-4.
Mark
L
L
Ludovic Courtès wrote on 2 Apr 17:13 +0200
(name . Mark H Weaver)(address . mhw@netris.org)
87im541zzg.fsf@gnu.org
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (8 lines)> Ludovic Courtès <ludo@gnu.org> writes:>> What could have been nice is if there’s a way to mark specific strings>> as being ASCII, or if there’s a “byte vector” data type compatible with>> strings, for instance.>> Do we know that all strings containing store references will be> representable in ASCII?
The basename part of /gnu/store/* is guaranteed to be pure ASCII. Now,you’re right that file names that come after could be non-ASCII, thoughthat’s very unlikely. We’d have to look at concrete examples I guess.
Ludo’.
M
M
Mark H Weaver wrote on 3 Apr 00:46 +0200
87r1js9udv.fsf@netris.org
Here's a preliminary draft patch to add support for UTF-32 and UTF-16references to our grafting code. I haven't yet measured the efficiencyimpact of these changes, but I suspect it's not too bad.
I'd be curious to know whether it fixes the Nyxt graft.
Mark
From 0fcfd804570fd1c07ffb1f6c176d6ec3430907df Mon Sep 17 00:00:00 2001From: Mark H Weaver <mhw@netris.org>Date: Fri, 2 Apr 2021 18:36:23 -0400Subject: [PATCH] DRAFT: grafts: Add support for UTF-16 and UTF-32 store references.
--- guix/build/graft.scm | 138 +++++++++++++++++++++++++++++-------------- tests/grafts.scm | 68 +++++++++++++++++++++ 2 files changed, 162 insertions(+), 44 deletions(-)
Toggle diff (276 lines)diff --git a/guix/build/graft.scm b/guix/build/graft.scmindex c119ee71d1..6e7f3859cb 100644--- a/guix/build/graft.scm+++ b/guix/build/graft.scm@@ -1,6 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2014, 2015, 2016, 2018 Ludovic Courtès <ludo@gnu.org>-;;; Copyright © 2016 Mark H Weaver <mhw@netris.org>+;;; Copyright © 2016, 2021 Mark H Weaver <mhw@netris.org> ;;; ;;; This file is part of GNU Guix. ;;;@@ -55,6 +55,36 @@ (string->char-set "0123456789abcdfghijklmnpqrsvwxyz") <>)) +(define (nix-base32-char-or-nul? byte)+ (or (nix-base32-char? byte)+ (char=? byte #\nul)))++(define (has-utf16-zeroes? buffer i)+ (let loop ((j (+ 1 (- i (* 2 hash-length)))))+ (or (>= j i)+ (and (zero? (bytevector-u8-ref buffer j))+ (loop (+ j 2))))))++(define (has-utf32-zeroes? buffer i)+ (let loop ((j (+ 1 (- i (* 4 hash-length)))))+ (or (>= j i)+ (and (zero? (bytevector-u8-ref buffer j))+ (zero? (bytevector-u8-ref buffer (+ j 1)))+ (zero? (bytevector-u8-ref buffer (+ j 2)))+ (loop (+ j 4))))))++(define (expand-bytevector bv char-size)+ (let* ((len (bytevector-length bv))+ (bv* (make-bytevector (+ 1 (* char-size+ (- len 1)))+ 0)))+ (let loop ((i 0))+ (when (< i len)+ (bytevector-u8-set! bv* (* i char-size)+ (bytevector-u8-ref bv i))+ (loop (+ i 1))))+ bv*))+ (define* (replace-store-references input output replacement-table #:optional (store (%store-directory))) "Read data from INPUT, replacing store references according to@@ -76,15 +106,16 @@ bytevectors to the same value." (list->vector (map pred (iota 256))) <>)) - (define nix-base32-byte?+ (define nix-base32-byte-or-nul? (optimize-u8-predicate- (compose nix-base32-char?+ (compose nix-base32-char-or-nul? integer->char))) (define (dash? byte) (= byte 45)) (define request-size (expt 2 20)) ; 1 MiB + ;; XXX This comment is no longer accurate! ;; We scan the file for the following 33-byte pattern: 32 bytes of ;; nix-base32 characters followed by a dash. To accommodate large files, ;; we do not read the entire file, but instead work on buffers of up to@@ -116,43 +147,61 @@ bytevectors to the same value." ;; written. (if (< i end) (let ((byte (bytevector-u8-ref buffer i)))- (cond ((and (dash? byte)- ;; We've found a dash. Note that we do not know- ;; whether the preceeding 32 bytes are nix-base32- ;; characters, but we do not need to know. If- ;; they are not, the following lookup will fail.- (lookup-replacement- (string-tabulate (lambda (j)- (integer->char- (bytevector-u8-ref buffer- (+ j (- i hash-length)))))- hash-length)))- => (lambda (replacement)- ;; We've found a hash that needs to be replaced.- ;; First, write out all bytes preceding the hash- ;; that have not yet been written.- (put-bytevector output buffer written- (- i hash-length written))- ;; Now write the replacement string.- (put-bytevector output replacement)- ;; Since the byte at position 'i' is a dash,- ;; which is not a nix-base32 char, the earliest- ;; position where the next hash might start is- ;; i+1, and the earliest position where the- ;; following dash might start is (+ i 1- ;; hash-length). Also, increase the write- ;; position to account for REPLACEMENT.- (let ((len (bytevector-length replacement)))- (scan-from (+ i 1 len)- (+ i (- len hash-length))))))- ;; If the byte at position 'i' is a nix-base32 char,+ (cond ((dash? byte)+ (let* ((char-size+ (if (zero? (bytevector-u8-ref buffer (- i 1)))+ (if (zero? (bytevector-u8-ref buffer (- i 2)))+ (if (and (<= (* 4 hash-length)+ (- i written))+ (has-utf32-zeroes? buffer i))+ 4+ 1)+ (if (and (<= (* 2 hash-length)+ (- i written))+ (has-utf16-zeroes? buffer i))+ 2+ 1))+ 1))+ (replacement*+ (lookup-replacement+ (string-tabulate (lambda (j)+ (integer->char+ (bytevector-u8-ref buffer+ (- i (* char-size+ (- hash-length j))))))+ hash-length)))+ (replacement+ (and replacement*+ (expand-bytevector replacement*+ char-size))))+ (if replacement+ (begin+ ;; We've found a hash that needs to be replaced.+ ;; First, write out all bytes preceding the hash+ ;; that have not yet been written.+ (put-bytevector output buffer written+ (- i (* char-size hash-length) written))+ ;; Now write the replacement string.+ (put-bytevector output replacement)+ ;; Now compute the new value of 'written' and+ ;; the new value of 'i', and iterate.+ (let ((written (+ (- i (* char-size hash-length))+ (bytevector-length replacement))))+ (scan-from (+ written hash-length) written)))+ ;; The byte at position 'i' is a dash, which is+ ;; not a nix-base32 char, so the earliest+ ;; position where the next hash might start is+ ;; i+1, with the following dash at position (+ i+ ;; 1 hash-length).+ (scan-from (+ i 1 hash-length) written))))+ ;; If the byte at position 'i' is a nix-base32 char or nul, ;; then the dash we're looking for might be as early as ;; the following byte, so we can only advance by 1.- ((nix-base32-byte? byte)+ ((nix-base32-byte-or-nul? byte) (scan-from (+ i 1) written))- ;; If the byte at position 'i' is NOT a nix-base32- ;; char, then the earliest position where the next hash- ;; might start is i+1, with the following dash at+ ;; If the byte at position 'i' is NOT a nix-base32 char+ ;; or nul, then the earliest position where the next+ ;; hash might start is i+1, with the following dash at ;; position (+ i 1 hash-length). (else (scan-from (+ i 1 hash-length) written))))@@ -162,18 +211,19 @@ bytevectors to the same value." ;; "unget". If 'end' is less than 'request-size' then we read ;; less than we asked for, which indicates that we are at EOF, ;; so we needn't unget anything. Otherwise, we unget up to- ;; 'hash-length' bytes (32 bytes). However, we must be careful- ;; not to unget bytes that have already been written, because- ;; that would cause them to be written again from the next- ;; buffer. In practice, this case occurs when a replacement is- ;; made near or beyond the end of the buffer. When REPLACEMENT- ;; went beyond END, we consume the extra bytes from INPUT.+ ;; (* 4 hash-length) bytes. However, we must be careful not to+ ;; unget bytes that have already been written, because that+ ;; would cause them to be written again from the next buffer.+ ;; In practice, this case occurs when a replacement is made+ ;; near or beyond the end of the buffer. When REPLACEMENT went+ ;; beyond END, we consume the extra bytes from INPUT. (begin (if (> written end) (get-bytevector-n! input buffer 0 (- written end)) (let* ((unwritten (- end written)) (unget-size (if (= end request-size)- (min hash-length unwritten)+ (min (* 4 hash-length)+ unwritten) 0)) (write-size (- unwritten unget-size))) (put-bytevector output buffer written write-size)diff --git a/tests/grafts.scm b/tests/grafts.scmindex a12c6a5911..0e1c7355b1 100644--- a/tests/grafts.scm+++ b/tests/grafts.scm@@ -1,5 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2014, 2015, 2016, 2017, 2018, 2019 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2021 Mark H Weaver <mhw@netris.org> ;;; ;;; This file is part of GNU Guix. ;;;@@ -468,4 +469,71 @@ replacement "/gnu/store"))))) +(define (nul-expand str char-size)+ (string-join (map string (string->list str))+ (make-string (- char-size 1) #\nul)))++(for-each+ (lambda (char-size1)+ (for-each+ (lambda (char-size2)+ (for-each+ (lambda (gap)+ (for-each+ (lambda (offset)+ (test-equal (format #f "replace-store-references, char-sizes ~a ~a, gap ~s, offset ~a"+ char-size1 char-size2 gap offset)+ (string-append (make-string offset #\=)+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\6)+ "-BlahBlaH")+ char-size1)+ gap+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\8)+ "-SoMeTHiNG")+ char-size2)+ (list->string (map integer->char (iota 77 33))))++ ;; Create input data where the right-hand-size of the dash ("-something"+ ;; here) goes beyond the end of the internal buffer of+ ;; 'replace-store-references'.+ (let* ((content (string-append (make-string offset #\=)+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\5)+ "-blahblah")+ char-size1)+ gap+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\7)+ "-something")+ char-size2)+ (list->string+ (map integer->char (iota 77 33)))))+ (replacement (alist->vhash+ `((,(make-string 32 #\5)+ . ,(string->utf8 (string-append+ (make-string 32 #\6)+ "-BlahBlaH")))+ (,(make-string 32 #\7)+ . ,(string->utf8 (string-append+ (make-string 32 #\8)+ "-SoMeTHiNG")))))))+ (call-with-output-string+ (lambda (output)+ ((@@ (guix build graft) replace-store-references)+ (open-input-string content) output+ replacement+ "/gnu/store"))))))+ ;; offsets to test+ (map (lambda (i) (- buffer-size (* 40 char-size1) i))+ (iota 30))))+ ;; gaps+ '("" "-" " " "a")))+ ;; char-size2 values to test+ '(1 2)))+ ;; char-size1 values to test+ '(1 2 4))++ (test-end)-- 2.31.1
P
P
Pierre Neidhardt wrote on 3 Apr 08:51 +0200
87sg47vp16.fsf@ambrevar.xyz
Wow, that was fast, thank you Mark!
Any idea how I can test this, i.e. how I can force a graft?
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBoEIUSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/YPIH/R5/zbjQwOwOV+eL0ihxT3kJmJesqDAxqU3wOOFYvYSDSA3sD20CJxEUMgf/ucBOMWdBmjVgFvQimi7IkFcpGUECnEPQ0sgPP0ubBFI1tl0qMI/q2FnfmYCR1AgoboEccSeRnPJX4N4zvOS1gBaIP1eY6zoppobNmfXnSjK+J/oXHBYnXbJoy/ih8o019dDa5j1aMl0dDjxpfxCpCv2qzbPLQAGeM9AQPhJU53dVRRBtzvCVVLlkmokdng0sfydWB/kOp90qH/jifZiylch6Qnn1AhQ/nr9qt0k0J+Niih4/wF5jsMX8QXVBQAgXhjYFu1kL04ShFeMS3UeNxu+Q/lc==AVI2-----END PGP SIGNATURE-----
M
M
Mark H Weaver wrote on 3 Apr 22:10 +0200
875z139liy.fsf@netris.org
Pierre Neidhardt <mail@ambrevar.xyz> writes:
Toggle quote (4 lines)> Wow, that was fast, thank you Mark!>> Any idea how I can test this, i.e. how I can force a graft?
Just apply the patch to a git checkout of Guix, build it, and then useit to build anything you like, e.g. "./pre-inst-env guix build nyxt".
With this patch applied, all graft derivations will be rebuilt, but*only* grafts. When it's ready (i.e. when it has better comments,docstrings, etc), this change is perfectly appropriate for 'master'.
FYI, I've already applied this new grafting code to my private branch ofGuix, switched to a system and user profile built using it, rebootedinto the new system, and I'm typing this email on it. I've also checkedthat none of the processes running on my system include executables orshared libraries from ungrafted store items (after removing my ibus.cache files, see https://bugs.gnu.org/47576).
Mark
L
L
Ludovic Courtès wrote on 5 Apr 21:45 +0200
(name . Mark H Weaver)(address . mhw@netris.org)
87ft04sefs.fsf@gnu.org
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (4 lines)> Pierre Neidhardt <mail@ambrevar.xyz> writes:>>> Wow, that was fast, thank you Mark!
Yup, thumbs up!
Toggle quote (9 lines)>> Any idea how I can test this, i.e. how I can force a graft?>> Just apply the patch to a git checkout of Guix, build it, and then use> it to build anything you like, e.g. "./pre-inst-env guix build nyxt".>> With this patch applied, all graft derivations will be rebuilt, but> *only* grafts. When it's ready (i.e. when it has better comments,> docstrings, etc), this change is perfectly appropriate for 'master'.
The GC’s scanner still gets it wrong though. I wonder whether havingthe grafting code more capable than the scanner could lead to badsurprises. WDYT?
Thank you for looking into this!
Ludo’.
M
M
Mark H Weaver wrote on 6 Apr 01:04 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
8735w472oo.fsf@netris.org
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (10 lines)> Mark H Weaver <mhw@netris.org> skribis:>>> With this patch applied, all graft derivations will be rebuilt, but>> *only* grafts. When it's ready (i.e. when it has better comments,>> docstrings, etc), this change is perfectly appropriate for 'master'.>> The GC’s scanner still gets it wrong though. I wonder whether having> the grafting code more capable than the scanner could lead to bad> surprises. WDYT?
I've thought about it, and I've been unable to think of anydisadvantage to making the grafter more capable than the scanner.It seems to me a pure win.
That the scanner fails to find all references is clearly an importantproblem that should be fixed ASAP, but as far as I can tell, improvingthe grafter would not make that problem any worse or create any newproblems.
Improving the grafter should have the following effects:
(1) Reducing the number of cases where ungrafted code with security flaws is being used on our systems.
(2) Fixing problems in our Fish, Nyxt, and Common Lisp packages.
Improving the scanner, or adding phases to selected packages or buildsystems to copy hidden references to an ASCII file, should have thefollowing effects:
(1) Reducing the number of cases where run-time dependencies are not known to Guix, which could lead to dependencies being prematurely GC'd or excluded from things like "guix pack".
So, it seems to me that we should persue both of these improvementsconcurrently, and I see no practical advantage to postponing one for thesake of rolling them both out at the same time. Of course, I welcomeother opinions on this.
What do you think?
Thanks, Mark
L
L
Ludovic Courtès wrote on 6 Apr 10:16 +0200
(name . Mark H Weaver)(address . mhw@netris.org)
87v98zomjv.fsf@gnu.org
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (12 lines)> That the scanner fails to find all references is clearly an important> problem that should be fixed ASAP, but as far as I can tell, improving> the grafter would not make that problem any worse or create any new> problems.>> Improving the grafter should have the following effects:>> (1) Reducing the number of cases where ungrafted code with security> flaws is being used on our systems.>> (2) Fixing problems in our Fish, Nyxt, and Common Lisp packages.
These are cases where the scanner may get the references wrong in thefirst place though.
OTOH, there are also cases where the scanner gets the references right.For example, earlier Pierre mentioned that grafting breaks the referenceof nyxt to glib:
https://issues.guix.gnu.org/33848#26
It turns out that the scanner does find a reference to glib “by chance”:
Toggle snippet (6 lines)$ guix gc --references $(guix build nyxt --no-grafts) | grep glib-2/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6$ grep -r 4vmhbc31cpbnlw3c96kcc094ihmaf7dv $(guix build nyxt --no-grafts)Duuma dosiero /gnu/store/5pgh0cn1kzyajaanj7f1iyp91hd917d6-nyxt-2-pre-release-6/bin/.nyxt-real kongruas
So in this case, the fixed grafter is a net win.
After applying the patch you sent, I confirm that Nyxt starts just finewhen running:
./pre-inst-env guix environment --ad-hoc nyxt -CN -E ^DISPLAY --share=/tmp/.X11-unix -- nyxt
… whereas on master it fails to start with:
Toggle snippet (8 lines)Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING {10010D0103}>: Problem running initialization hook GLIB::RUN-INITIALIZERS: Unable to load any of the alternatives: ("/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6/lib/libglib-2.0.so.0" "/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6/lib/libglib-2.0.so")
Toggle quote (13 lines)> Improving the scanner, or adding phases to selected packages or build> systems to copy hidden references to an ASCII file, should have the> following effects:>> (1) Reducing the number of cases where run-time dependencies are not> known to Guix, which could lead to dependencies being prematurely> GC'd or excluded from things like "guix pack".>> So, it seems to me that we should persue both of these improvements> concurrently, and I see no practical advantage to postponing one for the> sake of rolling them both out at the same time. Of course, I welcome> other opinions on this.
As always I worry about added complexity. In this case, I think you’reright: the UTF-{16,32}-capable grafter would most likely fix a number ofissues right away, including this Nyxt issue.
Thanks!
Ludo’.
P
P
Pierre Neidhardt wrote on 6 Apr 10:23 +0200
87wntfrfd1.fsf@ambrevar.xyz
Hi Mark, hi Ludo,
Toggle quote (16 lines)> After applying the patch you sent, I confirm that Nyxt starts just fine> when running:>> ./pre-inst-env guix environment --ad-hoc nyxt -CN -E ^DISPLAY --share=/tmp/.X11-unix -- nyxt>> … whereas on master it fails to start with:>> --8<---------------cut here---------------start------------->8---> Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING> {10010D0103}>:> Problem running initialization hook GLIB::RUN-INITIALIZERS:> Unable to load any of the alternatives:> ("/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6/lib/libglib-2.0.so.0"> "/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6/lib/libglib-2.0.so")> --8<---------------cut here---------------end--------------->8---
Fantastic, this looks like it's fixing this grafting issue indeed!
I also agree this looks like a net win even though the `guix gc` issueis not fix.The latter seems to be relatively easier to fix, doesn't it?
Cheers!
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmBsGnoSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/BfoH/11cgUMkdFKk3mEPzFgwjwX0qk66UwQa4MiBuWHZtvn4nS4cLRCzdyTk20bWk+w0hg7V5G2+EI5nKfrGGe3wqoH9MAASqG1S14Q1aU6Q4QjBYdFwTy+oCKkCnPp0fL8dw95xHvYmPxRnbD2YTd0zPQ/Ei9cf6534efkJ+zpyGc3h3ilO+BelQH+qSuscWBj6AyQNxvfdHPMZdJvY3jKW+P67kv8wqNhrybX0IWD9lq+AkDdDiniS7MHmTpsOy+ZcBHOHGHT0Or7/07JRcraMU4Z3WLXQMbCQMibQ9HiOkk15Cdx+/3RZZ+LhA738NZdLTznIU9PDVlUR0yuhxGUPFqw==HVpO-----END PGP SIGNATURE-----
M
M
Mark H Weaver wrote on 6 Apr 13:19 +0200
87h7kj7j7x.fsf@netris.org
Here's a revised draft of the patch, which updates the comments andrefactors the code a bit to (hopefully) make it a bit more readable.
Mark
From 6eec36e66d20d82fe02c6de793422875477b90d8 Mon Sep 17 00:00:00 2001From: Mark H Weaver <mhw@netris.org>Date: Fri, 2 Apr 2021 18:36:23 -0400Subject: [PATCH] DRAFT: grafts: Support rewriting UTF-16 and UTF-32 store references.
* guix/build/graft.scm (replace-store-references): Add support forfinding and rewriting UTF-16 and UTF-32 store references.* tests/grafts.scm: Add tests.--- guix/build/graft.scm | 247 +++++++++++++++++++++++++++---------------- tests/grafts.scm | 68 ++++++++++++ 2 files changed, 224 insertions(+), 91 deletions(-)
Toggle diff (370 lines)diff --git a/guix/build/graft.scm b/guix/build/graft.scmindex c119ee71d1..23fca8f29c 100644--- a/guix/build/graft.scm+++ b/guix/build/graft.scm@@ -1,6 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2014, 2015, 2016, 2018 Ludovic Courtès <ludo@gnu.org>-;;; Copyright © 2016 Mark H Weaver <mhw@netris.org>+;;; Copyright © 2016, 2021 Mark H Weaver <mhw@netris.org> ;;; ;;; This file is part of GNU Guix. ;;;@@ -55,6 +55,40 @@ (string->char-set "0123456789abcdfghijklmnpqrsvwxyz") <>)) +(define (nix-base32-char-or-nul? byte)+ (or (nix-base32-char? byte)+ (char=? byte #\nul)))++(define (possible-utf16-hash? buffer i w)+ (and (<= (* 2 hash-length) (- i w))+ (let loop ((j (+ 1 (- i (* 2 hash-length)))))+ (or (>= j i)+ (and (zero? (bytevector-u8-ref buffer j))+ (loop (+ j 2)))))))++(define (possible-utf32-hash? buffer i w)+ (and (<= (* 4 hash-length) (- i w))+ (let loop ((j (+ 1 (- i (* 4 hash-length)))))+ (or (>= j i)+ (and (zero? (bytevector-u8-ref buffer j))+ (zero? (bytevector-u8-ref buffer (+ j 1)))+ (zero? (bytevector-u8-ref buffer (+ j 2)))+ (loop (+ j 4)))))))++(define (insert-nuls char-size bv)+ (if (or (not bv) (= char-size 1))+ bv+ (let* ((len (bytevector-length bv))+ (bv* (make-bytevector (+ 1 (* char-size+ (- len 1)))+ 0)))+ (let loop ((i 0))+ (when (< i len)+ (bytevector-u8-set! bv* (* i char-size)+ (bytevector-u8-ref bv i))+ (loop (+ i 1))))+ bv*)))+ (define* (replace-store-references input output replacement-table #:optional (store (%store-directory))) "Read data from INPUT, replacing store references according to@@ -76,9 +110,9 @@ bytevectors to the same value." (list->vector (map pred (iota 256))) <>)) - (define nix-base32-byte?+ (define nix-base32-byte-or-nul? (optimize-u8-predicate- (compose nix-base32-char?+ (compose nix-base32-char-or-nul? integer->char))) (define (dash? byte) (= byte 45))@@ -86,100 +120,131 @@ bytevectors to the same value." (define request-size (expt 2 20)) ; 1 MiB ;; We scan the file for the following 33-byte pattern: 32 bytes of- ;; nix-base32 characters followed by a dash. To accommodate large files,- ;; we do not read the entire file, but instead work on buffers of up to- ;; 'request-size' bytes. To ensure that every 33-byte sequence appears- ;; entirely within exactly one buffer, adjacent buffers must overlap,- ;; i.e. they must share 32 byte positions. We accomplish this by- ;; "ungetting" the last 32 bytes of each buffer before reading the next- ;; buffer, unless we know that we've reached the end-of-file.+ ;; nix-base32 characters followed by a dash. When we find such a pattern+ ;; whose hash is in REPLACEMENT-TABLE, we perform the required rewrite and+ ;; continue scanning.+ ;;+ ;; To support UTF-16 and UTF-32 store references, the 33 bytes comprising+ ;; this hash+dash pattern may optionally be interspersed by extra NUL bytes.+ ;; This simple approach works because the characters we are looking for are+ ;; restricted to ASCII. UTF-16 hashes are interspersed with single NUL+ ;; bytes ("\0"), and UTF-32 hashes are interspersed with triplets of NULs+ ;; ("\0\0\0"). Note that we require NULs to be present only *between* the+ ;; other bytes, and not at either end, in order to be insensitive to byte+ ;; order.+ ;;+ ;; To accommodate large files, we do not read the entire file at once, but+ ;; instead work on buffers of up to 'request-size' bytes. To ensure that+ ;; every hash+dash pattern appears in its entirety in at least one buffer,+ ;; adjacent buffers must overlap by one byte less than the maximum size of a+ ;; hash+dash pattern. We accomplish this by "ungetting" a suffix of each+ ;; buffer before reading the next buffer, unless we know that we've reached+ ;; the end-of-file. (let ((buffer (make-bytevector request-size)))- (let loop ()- ;; Note: We avoid 'get-bytevector-n' to work around- ;; <http://bugs.gnu.org/17466>.+ (define-syntax-rule (byte-at i)+ (bytevector-u8-ref buffer i))+ (let outer-loop () (match (get-bytevector-n! input buffer 0 request-size) ((? eof-object?) 'done) (end- ;; We scan the buffer for dashes that might be preceded by a- ;; nix-base32 hash. The key optimization here is that whenever we- ;; find a NON-nix-base32 character at position 'i', we know that it- ;; cannot be part of a hash, so the earliest position where the next- ;; hash could start is i+1 with the following dash at position i+33.- ;;- ;; Since nix-base32 characters comprise only 1/8 of the 256 possible- ;; byte values, and exclude some of the most common letters in- ;; English text (e t o u), in practice we can advance by 33 positions- ;; most of the time.- (let scan-from ((i hash-length) (written 0))- ;; 'i' is the first position where we look for a dash. 'written'- ;; is the number of bytes in the buffer that have already been- ;; written.+ (define (scan-from i w)+ ;; Scan the buffer for dashes that might be preceded by nix hashes,+ ;; where 'i' is the minimum position where such a dash might be+ ;; found, and 'w' is the number of bytes in the buffer that have+ ;; been written so far.+ ;;+ ;; The key optimization here is that whenever we find a byte at+ ;; position 'i' that cannot occur within a nix hash (because it's+ ;; neither a nix-base32 character nor NUL), we can infer that the+ ;; earliest position where the next hash could start is at i+1,+ ;; and therefore the earliest position for the following dash is+ ;; (+ i 1 hash-length), which is i+33.+ ;;+ ;; Since nix-base32-or-nul characters comprise only about 1/8 of+ ;; the 256 possible byte values, and exclude some of the most+ ;; common letters in English text (e t o u), we can advance 33+ ;; positions most of the time. (if (< i end)- (let ((byte (bytevector-u8-ref buffer i)))- (cond ((and (dash? byte)- ;; We've found a dash. Note that we do not know- ;; whether the preceeding 32 bytes are nix-base32- ;; characters, but we do not need to know. If- ;; they are not, the following lookup will fail.- (lookup-replacement- (string-tabulate (lambda (j)- (integer->char- (bytevector-u8-ref buffer- (+ j (- i hash-length)))))- hash-length)))- => (lambda (replacement)- ;; We've found a hash that needs to be replaced.- ;; First, write out all bytes preceding the hash- ;; that have not yet been written.- (put-bytevector output buffer written- (- i hash-length written))- ;; Now write the replacement string.- (put-bytevector output replacement)- ;; Since the byte at position 'i' is a dash,- ;; which is not a nix-base32 char, the earliest- ;; position where the next hash might start is- ;; i+1, and the earliest position where the- ;; following dash might start is (+ i 1- ;; hash-length). Also, increase the write- ;; position to account for REPLACEMENT.- (let ((len (bytevector-length replacement)))- (scan-from (+ i 1 len)- (+ i (- len hash-length))))))- ;; If the byte at position 'i' is a nix-base32 char,- ;; then the dash we're looking for might be as early as- ;; the following byte, so we can only advance by 1.- ((nix-base32-byte? byte)- (scan-from (+ i 1) written))- ;; If the byte at position 'i' is NOT a nix-base32- ;; char, then the earliest position where the next hash- ;; might start is i+1, with the following dash at- ;; position (+ i 1 hash-length).+ (let ((byte (byte-at i)))+ (cond ((dash? byte)+ (found-dash i w))+ ((nix-base32-byte-or-nul? byte)+ (scan-from (+ i 1) w)) (else- (scan-from (+ i 1 hash-length) written))))-- ;; We have finished scanning the buffer. Now we determine how- ;; many bytes have not yet been written, and how many bytes to- ;; "unget". If 'end' is less than 'request-size' then we read- ;; less than we asked for, which indicates that we are at EOF,- ;; so we needn't unget anything. Otherwise, we unget up to- ;; 'hash-length' bytes (32 bytes). However, we must be careful- ;; not to unget bytes that have already been written, because- ;; that would cause them to be written again from the next- ;; buffer. In practice, this case occurs when a replacement is- ;; made near or beyond the end of the buffer. When REPLACEMENT- ;; went beyond END, we consume the extra bytes from INPUT.- (begin- (if (> written end)- (get-bytevector-n! input buffer 0 (- written end))- (let* ((unwritten (- end written))- (unget-size (if (= end request-size)- (min hash-length unwritten)- 0))- (write-size (- unwritten unget-size)))- (put-bytevector output buffer written write-size)- (unget-bytevector input buffer (+ written write-size)- unget-size)))- (loop)))))))))+ (not-part-of-hash i w))))+ (finish-buffer i w)))++ (define (not-part-of-hash i w)+ ;; Position 'i' is known to not be within a nix hash. Therefore,+ ;; the earliest position where the next hash might start is i+1,+ ;; with the following dash at position (+ i 1 hash-length).+ (scan-from (+ i 1 hash-length) w))++ (define (found-dash i w)+ (cond ((not (zero? (byte-at (- i 1))))+ (found-possible-hash 1 i w))+ ((not (zero? (byte-at (- i 2))))+ (if (possible-utf16-hash? buffer i w)+ (found-possible-hash 2 i w)+ (not-part-of-hash i w)))+ (else+ (if (possible-utf32-hash? buffer i w)+ (found-possible-hash 4 i w)+ (not-part-of-hash i w)))))++ (define (found-possible-hash char-size i w)+ (let* ((hash (string-tabulate+ (lambda (j)+ (integer->char+ (byte-at (- i (* char-size+ (- hash-length j))))))+ hash-length))+ (replacement* (lookup-replacement hash))+ (replacement (and replacement*+ (insert-nuls char-size replacement*))))+ (cond+ ((not replacement)+ (not-part-of-hash i w))+ (else+ ;; We've found a hash that needs to be replaced.+ ;; First, write out all bytes preceding the hash+ ;; that have not yet been written.+ (put-bytevector output buffer w+ (- i (* char-size hash-length) w))+ ;; Now write the replacement string.+ (put-bytevector output replacement)+ ;; Now compute the new value of 'w' and+ ;; the new value of 'i', and iterate.+ (let ((w (+ (- i (* char-size hash-length))+ (bytevector-length replacement))))+ (scan-from (+ w hash-length) w))))))++ (define (finish-buffer i w)+ ;; We have finished scanning the buffer. Now we determine how+ ;; many bytes have not yet been written, and how many bytes to+ ;; "unget". If 'end' is less than 'request-size' then we read+ ;; less than we asked for, which indicates that we are at EOF,+ ;; so we needn't unget anything. Otherwise, we unget up to+ ;; (* 4 hash-length) bytes. However, we must be careful not to+ ;; unget bytes that have already been written, because that+ ;; would cause them to be written again from the next buffer.+ ;; In practice, this case occurs when a replacement is made+ ;; near or beyond the end of the buffer. When REPLACEMENT went+ ;; beyond END, we consume the extra bytes from INPUT.+ (if (> w end)+ (get-bytevector-n! input buffer 0 (- w end))+ (let* ((unwritten (- end w))+ (unget-size (if (= end request-size)+ (min (* 4 hash-length)+ unwritten)+ 0))+ (write-size (- unwritten unget-size)))+ (put-bytevector output buffer w write-size)+ (unget-bytevector input buffer (+ w write-size)+ unget-size)))+ (outer-loop))++ (scan-from hash-length 0)))))) (define (rename-matching-files directory mapping) "Apply MAPPING to the names of all the files in DIRECTORY, where MAPPING isdiff --git a/tests/grafts.scm b/tests/grafts.scmindex a12c6a5911..0e1c7355b1 100644--- a/tests/grafts.scm+++ b/tests/grafts.scm@@ -1,5 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2014, 2015, 2016, 2017, 2018, 2019 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2021 Mark H Weaver <mhw@netris.org> ;;; ;;; This file is part of GNU Guix. ;;;@@ -468,4 +469,71 @@ replacement "/gnu/store"))))) +(define (nul-expand str char-size)+ (string-join (map string (string->list str))+ (make-string (- char-size 1) #\nul)))++(for-each+ (lambda (char-size1)+ (for-each+ (lambda (char-size2)+ (for-each+ (lambda (gap)+ (for-each+ (lambda (offset)+ (test-equal (format #f "replace-store-references, char-sizes ~a ~a, gap ~s, offset ~a"+ char-size1 char-size2 gap offset)+ (string-append (make-string offset #\=)+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\6)+ "-BlahBlaH")+ char-size1)+ gap+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\8)+ "-SoMeTHiNG")+ char-size2)+ (list->string (map integer->char (iota 77 33))))++ ;; Create input data where the right-hand-size of the dash ("-something"+ ;; here) goes beyond the end of the internal buffer of+ ;; 'replace-store-references'.+ (let* ((content (string-append (make-string offset #\=)+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\5)+ "-blahblah")+ char-size1)+ gap+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\7)+ "-something")+ char-size2)+ (list->string+ (map integer->char (iota 77 33)))))+ (replacement (alist->vhash+ `((,(make-string 32 #\5)+ . ,(string->utf8 (string-append+ (make-string 32 #\6)+ "-BlahBlaH")))+ (,(make-string 32 #\7)+ . ,(string->utf8 (string-append+ (make-string 32 #\8)+ "-SoMeTHiNG")))))))+ (call-with-output-string+ (lambda (output)+ ((@@ (guix build graft) replace-store-references)+ (open-input-string content) output+ replacement+ "/gnu/store"))))))+ ;; offsets to test+ (map (lambda (i) (- buffer-size (* 40 char-size1) i))+ (iota 30))))+ ;; gaps+ '("" "-" " " "a")))+ ;; char-size2 values to test+ '(1 2)))+ ;; char-size1 values to test+ '(1 2 4))++ (test-end)-- 2.31.1
L
L
Leo Famulari wrote on 6 Apr 19:23 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
YGyZLj1PXbz9Pe+z@jasmine.lan
On Mon, Apr 05, 2021 at 09:45:43PM +0200, Ludovic Courtès wrote:
Toggle quote (4 lines)> The GC’s scanner still gets it wrong though. I wonder whether having> the grafting code more capable than the scanner could lead to bad> surprises. WDYT?
I'm going off-topic, but I've wished we had a generic fast string-search(and replace?) procedure.
The go-build-system has a slow "one byte a time" implementation becauseI couldn't figure out how to re-use the code in (guix build grafts):
https://git.savannah.gnu.org/cgit/guix.git/tree/guix/build/go-build-system.scm?h=v1.2.0#n269
There are probably some other places we could use a fast procedure. Itmight even be something to add to Guile.
M
M
Mark H Weaver wrote on 7 Apr 01:21 +0200
87mtubngmi.fsf@netris.org
Hi Leo,
Leo Famulari <leo@famulari.name> writes:
Toggle quote (8 lines)> On Mon, Apr 05, 2021 at 09:45:43PM +0200, Ludovic Courtès wrote:>> The GC’s scanner still gets it wrong though. I wonder whether having>> the grafting code more capable than the scanner could lead to bad>> surprises. WDYT?>> I'm going off-topic, but I've wished we had a generic fast string-search> (and replace?) procedure.
Guile has several functions to help with this, e.g. 'string-contains','string-replace', 'string-replace-substring', and of course the regexpfunctions.
The grafting code can't use these things because (1) we are notsearching for a single string, but rather for arbitrary nix hashes, and(2) we can't easily use the regexp functions because there could be NULspresent.
Toggle quote (5 lines)> The go-build-system has a slow "one byte a time" implementation because> I couldn't figure out how to re-use the code in (guix build grafts):>> https://git.savannah.gnu.org/cgit/guix.git/tree/guix/build/go-build-system.scm?h=v1.2.0#n269
From a glance, I would think that 'replace-store-references' from(guix build grafts) is directly applicable here. Do you remember whatthe difficulty was in using it?
Alternatively, since you are only searching for a single string toreplace, I think you could read the entire file into a single string(e.g. using 'get-string-all' with "ISO-8859-1" encoding) and use'string-replace-substring'.
What do you think?
Mark
L
L
Ludovic Courtès wrote on 8 Apr 12:13 +0200
(name . Mark H Weaver)(address . mhw@netris.org)
87ft01axta.fsf@gnu.org
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (10 lines)> From 6eec36e66d20d82fe02c6de793422875477b90d8 Mon Sep 17 00:00:00 2001> From: Mark H Weaver <mhw@netris.org>> Date: Fri, 2 Apr 2021 18:36:23 -0400> Subject: [PATCH] DRAFT: grafts: Support rewriting UTF-16 and UTF-32 store> references.>> * guix/build/graft.scm (replace-store-references): Add support for> finding and rewriting UTF-16 and UTF-32 store references.> * tests/grafts.scm: Add tests.
Please add a “Fixes” line in the commit log.
I’m not reviewing the code in depth and I trust your judgment.
The risks of bugs I can think of are: missed ASCII references (aregression, whereby some ASCII references would not get rewritten), andunrelated UTF-{16,32}-base32-looking references getting rewritten.
I guess the latter is very unlikely because only sequences found in thereplacement table may be rewritten, right?
The former should be caught by ‘tests/grafts.scm’ but we could alsocheck the closure of real-world systems, for instance, to make sure itdoesn’t refer to ungrafted things.
Do you know how this affects performance?
Some superficial comments:
Toggle quote (18 lines)> +(define (possible-utf16-hash? buffer i w)> + (and (<= (* 2 hash-length) (- i w))> + (let loop ((j (+ 1 (- i (* 2 hash-length)))))> + (or (>= j i)> + (and (zero? (bytevector-u8-ref buffer j))> + (loop (+ j 2)))))))> +> +(define (possible-utf32-hash? buffer i w)> + (and (<= (* 4 hash-length) (- i w))> + (let loop ((j (+ 1 (- i (* 4 hash-length)))))> + (or (>= j i)> + (and (zero? (bytevector-u8-ref buffer j))> + (zero? (bytevector-u8-ref buffer (+ j 1)))> + (zero? (bytevector-u8-ref buffer (+ j 2)))> + (loop (+ j 4)))))))> +> +(define (insert-nuls char-size bv)
Perhaps add short docstrings for clarity.
Toggle quote (62 lines)> +(for-each> + (lambda (char-size1)> + (for-each> + (lambda (char-size2)> + (for-each> + (lambda (gap)> + (for-each> + (lambda (offset)> + (test-equal (format #f "replace-store-references, char-sizes ~a ~a, gap ~s, offset ~a"> + char-size1 char-size2 gap offset)> + (string-append (make-string offset #\=)> + (nul-expand (string-append "/gnu/store/"> + (make-string 32 #\6)> + "-BlahBlaH")> + char-size1)> + gap> + (nul-expand (string-append "/gnu/store/"> + (make-string 32 #\8)> + "-SoMeTHiNG")> + char-size2)> + (list->string (map integer->char (iota 77 33))))> +> + ;; Create input data where the right-hand-size of the dash ("-something"> + ;; here) goes beyond the end of the internal buffer of> + ;; 'replace-store-references'.> + (let* ((content (string-append (make-string offset #\=)> + (nul-expand (string-append "/gnu/store/"> + (make-string 32 #\5)> + "-blahblah")> + char-size1)> + gap> + (nul-expand (string-append "/gnu/store/"> + (make-string 32 #\7)> + "-something")> + char-size2)> + (list->string> + (map integer->char (iota 77 33)))))> + (replacement (alist->vhash> + `((,(make-string 32 #\5)> + . ,(string->utf8 (string-append> + (make-string 32 #\6)> + "-BlahBlaH")))> + (,(make-string 32 #\7)> + . ,(string->utf8 (string-append> + (make-string 32 #\8)> + "-SoMeTHiNG")))))))> + (call-with-output-string> + (lambda (output)> + ((@@ (guix build graft) replace-store-references)> + (open-input-string content) output> + replacement> + "/gnu/store"))))))> + ;; offsets to test> + (map (lambda (i) (- buffer-size (* 40 char-size1) i))> + (iota 30))))> + ;; gaps> + '("" "-" " " "a")))> + ;; char-size2 values to test> + '(1 2)))> + ;; char-size1 values to test> + '(1 2 4))
For clarity, perhaps you can define a top-level procedure for the testand call it from ‘for-each’.
Modulo these very minor issues, it looks like it’s ready to go!
Thank you,Ludo’.
M
M
Mark H Weaver wrote on 13 Apr 22:06 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
87k0p6rlt5.fsf@netris.org
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (2 lines)> Please add a “Fixes” line in the commit log.
Done, thanks.
Toggle quote (5 lines)> I’m not reviewing the code in depth and I trust your judgment.>> The risks of bugs I can think of are: missed ASCII references (a> regression, whereby some ASCII references would not get rewritten),
This is indeed a risk whenever we modify the grafting code, which iswritten for efficiency, not clarity. I've tried to be careful, and havechecked that my newly grafted system and user profiles do not retainreferences to ungrafted code, modulo the following pre-existing issues:
https://bugs.gnu.org/47576 (ibus-daemon launches ungrafted subprocesses)
https://bugs.gnu.org/47614 (Chunked store references in .zo files in Racket 8)
Toggle quote (5 lines)> and unrelated UTF-{16,32}-base32-looking references getting rewritten.>> I guess the latter is very unlikely because only sequences found in the> replacement table may be rewritten, right?
Yes, that's correct.
Toggle quote (4 lines)> The former should be caught by ‘tests/grafts.scm’ but we could also> check the closure of real-world systems, for instance, to make sure it> doesn’t refer to ungrafted things.
As I mention above, I've already done so for my own GNOME-based Guixsystem.
Toggle quote (2 lines)> Do you know how this affects performance?
I have not yet measured it, but subjectively, I do not notice anyobvious difference in speed.
I expect that the main performance impact is that large blocks of NULswill be processed more slowly by the current draft version of the newgrafting code. That's because NULs can now be part of a nix hash, andtherefore the new grafting code can only advance 1 byte position whenseeing a NUL, whereas previously it would skip ahead 33 positions inthat case.
If desired, the handling of NULs could be made more efficient, at thecost of a bit more complexity. When seeing a NUL, we could check theadjacent bytes to see if this is part of a nix-base32 character inUTF-16 or UTF-32 encoding. If not, we could skip ahead.
Toggle quote (2 lines)> Perhaps add short docstrings for clarity.
Done.
Toggle quote (10 lines)>> +(for-each>> + (lambda (char-size1)>> + (for-each>> + (lambda (char-size2)>> + (for-each>> + (lambda (gap)>> + (for-each>> + (lambda (offset)>> + (test-equal (format #f "replace-store-references, char-sizes ~a ~a, gap ~s, offset ~a">> + char-size1 char-size2 gap offset)
[...]
Toggle quote (13 lines)>> + ;; offsets to test>> + (map (lambda (i) (- buffer-size (* 40 char-size1) i))>> + (iota 30))))>> + ;; gaps>> + '("" "-" " " "a")))>> + ;; char-size2 values to test>> + '(1 2)))>> + ;; char-size1 values to test>> + '(1 2 4))>> For clarity, perhaps you can define a top-level procedure for the test> and call it from ‘for-each’.
Good idea. I'd like to optimize the tests a bit before pushing this.They take a fairly long time to run, and lead to a huge 1.5G grafts.logfile. It might be sufficient to avoid 'test-equal' here.
Toggle quote (2 lines)> Modulo these very minor issues, it looks like it’s ready to go!
Sounds good. Thanks for the review!
Below, I've attached my current revision of this draft patch, whichincorporates your suggestions regarding the main code. What remains isto improve the tests.
Regards, Mark
From f3141eae346a66ff52c70708c87f880938bdbb24 Mon Sep 17 00:00:00 2001From: Mark H Weaver <mhw@netris.org>Date: Fri, 2 Apr 2021 18:36:23 -0400Subject: [PATCH] DRAFT: grafts: Support rewriting UTF-16 and UTF-32 store references.
Partially fixes https://bugs.gnu.org/33848.
* guix/build/graft.scm (replace-store-references): Add support forfinding and rewriting UTF-16 and UTF-32 store references.* tests/grafts.scm: Add tests.--- guix/build/graft.scm | 281 +++++++++++++++++++++++++++++-------------- tests/grafts.scm | 68 +++++++++++ 2 files changed, 258 insertions(+), 91 deletions(-)
Toggle diff (404 lines)diff --git a/guix/build/graft.scm b/guix/build/graft.scmindex c119ee71d1..30be988587 100644--- a/guix/build/graft.scm+++ b/guix/build/graft.scm@@ -1,6 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2014, 2015, 2016, 2018 Ludovic Courtès <ludo@gnu.org>-;;; Copyright © 2016 Mark H Weaver <mhw@netris.org>+;;; Copyright © 2016, 2021 Mark H Weaver <mhw@netris.org> ;;; ;;; This file is part of GNU Guix. ;;;@@ -55,6 +55,52 @@ (string->char-set "0123456789abcdfghijklmnpqrsvwxyz") <>)) +(define (nix-base32-char-or-nul? c)+ "Return true if C is a nix-base32 character or NUL, otherwise return false."+ (or (nix-base32-char? c)+ (char=? c #\nul)))++(define (possible-utf16-hash? buffer i w)+ "Return true if (I - W) is large enough to hold a UTF-16 encoded+nix-base32 hash and if BUFFER contains NULs in all positions where NULs+are to be expected in a UTF-16 encoded hash+dash pattern whose dash is+found at position I. Otherwise, return false."+ (and (<= (* 2 hash-length) (- i w))+ (let loop ((j (+ 1 (- i (* 2 hash-length)))))+ (or (>= j i)+ (and (zero? (bytevector-u8-ref buffer j))+ (loop (+ j 2)))))))++(define (possible-utf32-hash? buffer i w)+ "Return true if (I - W) is large enough to hold a UTF-32 encoded+nix-base32 hash and if BUFFER contains NULs in all positions where NULs+are to be expected in a UTF-32 encoded hash+dash pattern whose dash is+found at position I. Otherwise, return false."+ (and (<= (* 4 hash-length) (- i w))+ (let loop ((j (+ 1 (- i (* 4 hash-length)))))+ (or (>= j i)+ (and (zero? (bytevector-u8-ref buffer j))+ (zero? (bytevector-u8-ref buffer (+ j 1)))+ (zero? (bytevector-u8-ref buffer (+ j 2)))+ (loop (+ j 4)))))))++(define (insert-nuls char-size bv)+ "Given a bytevector BV, return a bytevector containing the same bytes but+with (CHAR-SIZE - 1) NULs inserted between every two adjacent bytes from BV.+For example, (insert-nuls 4 #u8(1 2 3)) => #u8(1 0 0 0 2 0 0 0 3)."+ (if (= char-size 1)+ bv+ (let* ((len (bytevector-length bv))+ (bv* (make-bytevector (+ 1 (* char-size+ (- len 1)))+ 0)))+ (let loop ((i 0))+ (when (< i len)+ (bytevector-u8-set! bv* (* i char-size)+ (bytevector-u8-ref bv i))+ (loop (+ i 1))))+ bv*)))+ (define* (replace-store-references input output replacement-table #:optional (store (%store-directory))) "Read data from INPUT, replacing store references according to@@ -76,9 +122,9 @@ bytevectors to the same value." (list->vector (map pred (iota 256))) <>)) - (define nix-base32-byte?+ (define nix-base32-byte-or-nul? (optimize-u8-predicate- (compose nix-base32-char?+ (compose nix-base32-char-or-nul? integer->char))) (define (dash? byte) (= byte 45))@@ -86,100 +132,153 @@ bytevectors to the same value." (define request-size (expt 2 20)) ; 1 MiB ;; We scan the file for the following 33-byte pattern: 32 bytes of- ;; nix-base32 characters followed by a dash. To accommodate large files,- ;; we do not read the entire file, but instead work on buffers of up to- ;; 'request-size' bytes. To ensure that every 33-byte sequence appears- ;; entirely within exactly one buffer, adjacent buffers must overlap,- ;; i.e. they must share 32 byte positions. We accomplish this by- ;; "ungetting" the last 32 bytes of each buffer before reading the next- ;; buffer, unless we know that we've reached the end-of-file.+ ;; nix-base32 characters followed by a dash. When we find such a pattern+ ;; whose hash is in REPLACEMENT-TABLE, we perform the required rewrite and+ ;; continue scanning.+ ;;+ ;; To support UTF-16 and UTF-32 store references, the 33 bytes comprising+ ;; this hash+dash pattern may optionally be interspersed by extra NUL bytes.+ ;; This simple approach works because the characters we are looking for are+ ;; restricted to ASCII. UTF-16 hashes are interspersed with single NUL+ ;; bytes ("\0"), and UTF-32 hashes are interspersed with triplets of NULs+ ;; ("\0\0\0"). Note that we require NULs to be present only *between* the+ ;; other bytes, and not at either end, in order to be insensitive to byte+ ;; order.+ ;;+ ;; To accommodate large files, we do not read the entire file at once, but+ ;; instead work on buffers of up to REQUEST-SIZE bytes. To ensure that+ ;; every hash+dash pattern appears in its entirety in at least one buffer,+ ;; adjacent buffers must overlap by one byte less than the maximum size of a+ ;; hash+dash pattern. We accomplish this by "ungetting" a suffix of each+ ;; buffer before reading the next buffer, unless we know that we've reached+ ;; the end-of-file. (let ((buffer (make-bytevector request-size)))- (let loop ()- ;; Note: We avoid 'get-bytevector-n' to work around- ;; <http://bugs.gnu.org/17466>.+ (define-syntax-rule (byte-at i)+ (bytevector-u8-ref buffer i))+ (let outer-loop () (match (get-bytevector-n! input buffer 0 request-size) ((? eof-object?) 'done) (end- ;; We scan the buffer for dashes that might be preceded by a- ;; nix-base32 hash. The key optimization here is that whenever we- ;; find a NON-nix-base32 character at position 'i', we know that it- ;; cannot be part of a hash, so the earliest position where the next- ;; hash could start is i+1 with the following dash at position i+33.- ;;- ;; Since nix-base32 characters comprise only 1/8 of the 256 possible- ;; byte values, and exclude some of the most common letters in- ;; English text (e t o u), in practice we can advance by 33 positions- ;; most of the time.- (let scan-from ((i hash-length) (written 0))- ;; 'i' is the first position where we look for a dash. 'written'- ;; is the number of bytes in the buffer that have already been- ;; written.+ (define (scan-from i w)+ ;; Scan the buffer for dashes that might be preceded by nix hashes,+ ;; where I is the minimum position where such a dash might be+ ;; found, and W is the number of bytes in the buffer that have been+ ;; written so far. We assume that I - W >= HASH-LENGTH.+ ;;+ ;; The key optimization here is that whenever we find a byte at+ ;; position I that cannot occur within a nix hash (because it's+ ;; neither a nix-base32 character nor NUL), we can infer that the+ ;; earliest position where the next hash could start is at I + 1,+ ;; and therefore the earliest position for the following dash is+ ;; (+ I 1 HASH-LENGTH), which is I + 33.+ ;;+ ;; Since nix-base32-or-nul characters comprise only about 1/8 of+ ;; the 256 possible byte values, and exclude some of the most+ ;; common letters in English text (e t o u), we can advance 33+ ;; positions much of the time. (if (< i end)- (let ((byte (bytevector-u8-ref buffer i)))- (cond ((and (dash? byte)- ;; We've found a dash. Note that we do not know- ;; whether the preceeding 32 bytes are nix-base32- ;; characters, but we do not need to know. If- ;; they are not, the following lookup will fail.- (lookup-replacement- (string-tabulate (lambda (j)- (integer->char- (bytevector-u8-ref buffer- (+ j (- i hash-length)))))- hash-length)))- => (lambda (replacement)- ;; We've found a hash that needs to be replaced.- ;; First, write out all bytes preceding the hash- ;; that have not yet been written.- (put-bytevector output buffer written- (- i hash-length written))- ;; Now write the replacement string.- (put-bytevector output replacement)- ;; Since the byte at position 'i' is a dash,- ;; which is not a nix-base32 char, the earliest- ;; position where the next hash might start is- ;; i+1, and the earliest position where the- ;; following dash might start is (+ i 1- ;; hash-length). Also, increase the write- ;; position to account for REPLACEMENT.- (let ((len (bytevector-length replacement)))- (scan-from (+ i 1 len)- (+ i (- len hash-length))))))- ;; If the byte at position 'i' is a nix-base32 char,- ;; then the dash we're looking for might be as early as- ;; the following byte, so we can only advance by 1.- ((nix-base32-byte? byte)- (scan-from (+ i 1) written))- ;; If the byte at position 'i' is NOT a nix-base32- ;; char, then the earliest position where the next hash- ;; might start is i+1, with the following dash at- ;; position (+ i 1 hash-length).+ (let ((byte (byte-at i)))+ (cond ((dash? byte)+ (found-dash i w))+ ((nix-base32-byte-or-nul? byte)+ (scan-from (+ i 1) w)) (else- (scan-from (+ i 1 hash-length) written))))-- ;; We have finished scanning the buffer. Now we determine how- ;; many bytes have not yet been written, and how many bytes to- ;; "unget". If 'end' is less than 'request-size' then we read- ;; less than we asked for, which indicates that we are at EOF,- ;; so we needn't unget anything. Otherwise, we unget up to- ;; 'hash-length' bytes (32 bytes). However, we must be careful- ;; not to unget bytes that have already been written, because- ;; that would cause them to be written again from the next- ;; buffer. In practice, this case occurs when a replacement is- ;; made near or beyond the end of the buffer. When REPLACEMENT- ;; went beyond END, we consume the extra bytes from INPUT.- (begin- (if (> written end)- (get-bytevector-n! input buffer 0 (- written end))- (let* ((unwritten (- end written))- (unget-size (if (= end request-size)- (min hash-length unwritten)- 0))- (write-size (- unwritten unget-size)))- (put-bytevector output buffer written write-size)- (unget-bytevector input buffer (+ written write-size)- unget-size)))- (loop)))))))))+ (not-part-of-hash i w))))+ (finish-buffer i w)))++ (define (not-part-of-hash i w)+ ;; Position I is known to not be within a nix hash that we must+ ;; rewrite. Therefore, the earliest position where the next hash+ ;; might start is I + 1, and therefore the earliest position of+ ;; the following dash is (+ I 1 HASH-LENGTH).+ (scan-from (+ i 1 hash-length) w))++ (define (found-dash i w)+ ;; We know that there is a dash '-' at position I, and that+ ;; I >= HASH-LENGTH. The immediately preceding bytes *might*+ ;; contain a nix-base32 hash, but that is not yet known. Here,+ ;; we rule out all but one possible encoding (ASCII, UTF-16,+ ;; UTF-32) by counting how many NULs precede the dash.+ (cond ((not (zero? (byte-at (- i 1))))+ ;; The dash is *not* preceded by a NUL, therefore it+ ;; cannot possibly be a UTF-16 or UTF-32 hash. Proceed+ ;; to check for an ASCII hash.+ (found-possible-hash 1 i w))++ ((not (zero? (byte-at (- i 2))))+ ;; The dash is preceded by exactly one NUL, therefore it+ ;; cannot be an ASCII or UTF-32 hash. Proceed to check+ ;; for a UTF-16 hash.+ (if (possible-utf16-hash? buffer i w)+ (found-possible-hash 2 i w)+ (not-part-of-hash i w)))++ (else+ ;; The dash is preceded by at least two NULs, therefore+ ;; it cannot be an ASCII or UTF-16 hash. Proceed to+ ;; check for a UTF-32 hash.+ (if (possible-utf32-hash? buffer i w)+ (found-possible-hash 4 i w)+ (not-part-of-hash i w)))))++ (define (found-possible-hash char-size i w)+ ;; We know that there is a dash '-' at position I, that+ ;; I >= CHAR-SIZE * HASH-LENGTH, and that the only possible+ ;; encoding for the preceding hash is as indicated by+ ;; CHAR-SIZE. Here we check to see if the given hash is in+ ;; REPLACEMENT-TABLE, and if so, we perform the required+ ;; rewrite.+ (let* ((hash (string-tabulate+ (lambda (j)+ (integer->char+ (byte-at (- i (* char-size+ (- hash-length j))))))+ hash-length))+ (replacement* (lookup-replacement hash))+ (replacement (and replacement*+ (insert-nuls char-size replacement*))))+ (cond+ ((not replacement)+ (not-part-of-hash i w))+ (else+ ;; We've found a hash that needs to be replaced.+ ;; First, write out all bytes preceding the hash+ ;; that have not yet been written.+ (put-bytevector output buffer w+ (- i (* char-size hash-length) w))+ ;; Now write the replacement string.+ (put-bytevector output replacement)+ ;; Now compute the new values of W and I and continue.+ (let ((w (+ (- i (* char-size hash-length))+ (bytevector-length replacement))))+ (scan-from (+ w hash-length) w))))))++ (define (finish-buffer i w)+ ;; We have finished scanning the buffer. Now we determine how many+ ;; bytes have not yet been written, and how many bytes to "unget".+ ;; If END is less than REQUEST-SIZE then we read less than we asked+ ;; for, which indicates that we are at EOF, so we needn't unget+ ;; anything. Otherwise, we unget up to (* 4 HASH-LENGTH) bytes.+ ;; However, we must be careful not to unget bytes that have already+ ;; been written, because that would cause them to be written again+ ;; from the next buffer. In practice, this case occurs when a+ ;; replacement is made near or beyond the end of the buffer. When+ ;; REPLACEMENT went beyond END, we consume the extra bytes from+ ;; INPUT.+ (if (> w end)+ (get-bytevector-n! input buffer 0 (- w end))+ (let* ((unwritten (- end w))+ (unget-size (if (= end request-size)+ (min (* 4 hash-length)+ unwritten)+ 0))+ (write-size (- unwritten unget-size)))+ (put-bytevector output buffer w write-size)+ (unget-bytevector input buffer (+ w write-size)+ unget-size)))+ (outer-loop))++ (scan-from hash-length 0)))))) (define (rename-matching-files directory mapping) "Apply MAPPING to the names of all the files in DIRECTORY, where MAPPING isdiff --git a/tests/grafts.scm b/tests/grafts.scmindex a12c6a5911..0e1c7355b1 100644--- a/tests/grafts.scm+++ b/tests/grafts.scm@@ -1,5 +1,6 @@ ;;; GNU Guix --- Functional package management for GNU ;;; Copyright © 2014, 2015, 2016, 2017, 2018, 2019 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2021 Mark H Weaver <mhw@netris.org> ;;; ;;; This file is part of GNU Guix. ;;;@@ -468,4 +469,71 @@ replacement "/gnu/store"))))) +(define (nul-expand str char-size)+ (string-join (map string (string->list str))+ (make-string (- char-size 1) #\nul)))++(for-each+ (lambda (char-size1)+ (for-each+ (lambda (char-size2)+ (for-each+ (lambda (gap)+ (for-each+ (lambda (offset)+ (test-equal (format #f "replace-store-references, char-sizes ~a ~a, gap ~s, offset ~a"+ char-size1 char-size2 gap offset)+ (string-append (make-string offset #\=)+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\6)+ "-BlahBlaH")+ char-size1)+ gap+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\8)+ "-SoMeTHiNG")+ char-size2)+ (list->string (map integer->char (iota 77 33))))++ ;; Create input data where the right-hand-size of the dash ("-something"+ ;; here) goes beyond the end of the internal buffer of+ ;; 'replace-store-references'.+ (let* ((content (string-append (make-string offset #\=)+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\5)+ "-blahblah")+ char-size1)+ gap+ (nul-expand (string-append "/gnu/store/"+ (make-string 32 #\7)+ "-something")+ char-size2)+ (list->string+ (map integer->char (iota 77 33)))))+ (replacement (alist->vhash+ `((,(make-string 32 #\5)+ . ,(string->utf8 (string-append+ (make-string 32 #\6)+ "-BlahBlaH")))+ (,(make-string 32 #\7)+ . ,(string->utf8 (string-append+ (make-string 32 #\8)+ "-SoMeTHiNG")))))))+ (call-with-output-string+ (lambda (output)+ ((@@ (guix build graft) replace-store-references)+ (open-input-string content) output+ replacement+ "/gnu/store"))))))+ ;; offsets to test+ (map (lambda (i) (- buffer-size (* 40 char-size1) i))+ (iota 30))))+ ;; gaps+ '("" "-" " " "a")))+ ;; char-size2 values to test+ '(1 2)))+ ;; char-size1 values to test+ '(1 2 4))++ (test-end)-- 2.31.1
L
L
Ludovic Courtès wrote on 14 Apr 12:55 +0200
(name . Mark H Weaver)(address . mhw@netris.org)
87eefdf840.fsf@gnu.org
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (14 lines)>> The risks of bugs I can think of are: missed ASCII references (a>> regression, whereby some ASCII references would not get rewritten),>> This is indeed a risk whenever we modify the grafting code, which is> written for efficiency, not clarity. I've tried to be careful, and have> checked that my newly grafted system and user profiles do not retain> references to ungrafted code, modulo the following pre-existing issues:>> <https://bugs.gnu.org/47576>> (ibus-daemon launches ungrafted subprocesses)>> <https://bugs.gnu.org/47614>> (Chunked store references in .zo files in Racket 8)
OK.
Toggle quote (14 lines)>> and unrelated UTF-{16,32}-base32-looking references getting rewritten.>>>> I guess the latter is very unlikely because only sequences found in the>> replacement table may be rewritten, right?>> Yes, that's correct.>>> The former should be caught by ‘tests/grafts.scm’ but we could also>> check the closure of real-world systems, for instance, to make sure it>> doesn’t refer to ungrafted things.>> As I mention above, I've already done so for my own GNOME-based Guix> system.
Yes, we should be safe.
Toggle quote (17 lines)>> Do you know how this affects performance?>> I have not yet measured it, but subjectively, I do not notice any> obvious difference in speed.>> I expect that the main performance impact is that large blocks of NULs> will be processed more slowly by the current draft version of the new> grafting code. That's because NULs can now be part of a nix hash, and> therefore the new grafting code can only advance 1 byte position when> seeing a NUL, whereas previously it would skip ahead 33 positions in> that case.>> If desired, the handling of NULs could be made more efficient, at the> cost of a bit more complexity. When seeing a NUL, we could check the> adjacent bytes to see if this is part of a nix-base32 character in> UTF-16 or UTF-32 encoding. If not, we could skip ahead.
Let’s keep it this way for now; we can always revisit later if we feelthe need for it.
Toggle quote (7 lines)>> For clarity, perhaps you can define a top-level procedure for the test>> and call it from ‘for-each’.>> Good idea. I'd like to optimize the tests a bit before pushing this.> They take a fairly long time to run, and lead to a huge 1.5G grafts.log> file. It might be sufficient to avoid 'test-equal' here.
Ah yes, rather use ‘test-assert’ or some such to avoid the huge logfile. :-)
Toggle quote (4 lines)> Below, I've attached my current revision of this draft patch, which> incorporates your suggestions regarding the main code. What remains is> to improve the tests.
LGTM! Feel free to push this version or an improved one. I think it’sgood to have it in the upcoming release, and if it’s pushed sooner,we’ll have more time to react in case something’s wrong.
Thank you!
Ludo’.
L
L
Ludovic Courtès wrote on 14 Apr 17:19 +0200
control message for bug #47297
(address . control@debbugs.gnu.org)
871rbcgagq.fsf@gnu.org
block 47297 by 33848quit
L
L
Leo Famulari wrote on 15 Apr 00:37 +0200
Re: bug#33848: Store references in SBCL-compiled code are "invisible"
(name . Ludovic Courtès)(address . ludo@gnu.org)
YHduuR+fX4/nRI4Z@jasmine.lan
On Wed, Apr 14, 2021 at 12:55:43PM +0200, Ludovic Courtès wrote:
Toggle quote (4 lines)> LGTM! Feel free to push this version or an improved one. I think it’s> good to have it in the upcoming release, and if it’s pushed sooner,> we’ll have more time to react in case something’s wrong.
Just FYI to everyone reading, we aim to release 1.2.1 this Sunday, April18.
M
M
Mark H Weaver wrote on 15 Apr 09:26 +0200
(name . Ludovic Courtès)(address . ludo@gnu.org)
871rbcj9dn.fsf@netris.org
Ludovic Courtès <ludo@gnu.org> writes:
Toggle quote (4 lines)> LGTM! Feel free to push this version or an improved one. I think it’s> good to have it in the upcoming release, and if it’s pushed sooner,> we’ll have more time to react in case something’s wrong.
I pushed an improved version of the patch to 'master' as commit1bab9b9f17256a9e4f45f5b0cceb8b52e0a1b1ed.
Thanks, Mark
P
P
Pierre Neidhardt wrote on 16 Apr 11:44 +0200
87v98mva15.fsf@ambrevar.xyz
Just tested like Ludo did: Nyxt works now without the --no-graftsoption!
Thank you so much, Mark, this is a huge step forward for Nyxt (and Guixfor course)! :)
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmB5XHYSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/WZ0H/Rlahh8Sq7sOjMiQMxmRBwLbx/ZixNrgMaDF4yiEIYBTixVTK1pXuDnbj460tp/VtgP0Dhsb0dZ49xz9FEkcY9h2X0eJD0l+gaxXoRCc5b0ObaUqdtdAWnfv5++srz9OqCUN7Bxkhh5ydVVxWXHLEkuqHsVyolOvj7sA0d501KP230NaxhntAONQtxuQD19DwVmz0IuW1trFvvWZjpVAgHR/KBXKuZs0uh8oJYRt94OE6ubVV+nY6eSOC+2V0XvhE6fqmSbvPA7R8EvRGdy9/+I07vWKUPw4lGhhioT4Ss1lsUxmK1Ls4caurTzzXb8g/MerhDJlM6vbENqpogG3n2o==EQ8X-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote on 22 Apr 15:36 +0200
control message for bug #33848
(address . control@debbugs.gnu.org)
877dku1lx7.fsf@gnu.org
tags 33848 fixedclose 33848 quit
M
M
Mark H Weaver wrote on 30 Apr 22:03 +0200
Re: bug#33848: Store references in SBCL-compiled code are "invisible"
878s4zo7z2.fsf@netris.org
Hi Pierre, and Ludovic,
Pierre Neidhardt <mail@ambrevar.xyz> writes:
Toggle quote (4 lines)> I also agree this looks like a net win even though the `guix gc` issue> is not fix.> The latter seems to be relatively easier to fix, doesn't it?
Yes. I now have a plan to finally fix this bug for good.
I intend to write a scanner in Scheme that is able to find Nix hashesencoded in ASCII, UTF-16, or UTF-32. Using that, I'll write a procedurethat, for each package output, finds all store references that are foundencoded in UTF-16 or UTF-32 but never in ASCII, and write thosereferences to a file (if that set is nonempty). This procedure can thenbe used by selected packages and/or build-systems.
However, there's one thing I will need: the set of all transitive inputs(and native-inputs, including implicit inputs) available to the build,i.e. the set of possible hashes that might legitimately be found by thescanner.
Ludovic: what's the best way to get that list from the build-side code?
Thanks, Mark
-- Disinformation flourishes because many people care deeply about injusticebut very few check the facts. Ask me about https://stallmansupport.org.
P
P
Pierre Neidhardt wrote on 1 May 11:22 +0200
87fsz6st9w.fsf@ambrevar.xyz
Hi Mark,
thanks for the update and thanks for getting your hands dirty, this is avery valuable contribution to the core of Guix!
Cheers!
-- Pierre Neidhardthttps://ambrevar.xyz/
-----BEGIN PGP SIGNATURE-----
iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAmCNHcsSHG1haWxAYW1icmV2YXIueHl6AAoJEJvc9Jeku8x/e7UIAJth7AUZl+5Y0is/vcfuMijMwswE0x2E1IYnWiB9HVnsf2GCp/RoSVwNZzowbH4ehflXCfwS92LD2LlYtv8UzTDL4qT3bbpUCUckgLXi6uSeSdUPnGqmsAozvq0hOMjTCvhgaVgdiKgEDVb4yupkdwLKgeOooBBUGhrnh4YXN/YVOXGB55PFpu5GW8ZUZvMfaav3SrOHKrvvlZZO7PbYA6dfXM/yFL6GtZH/Iicgjfy+TwgZcIfBPKjRWh9GiIzE7C5EgQ2SJkYWxRPsGEPwU81GclibR3a7IYeCpn41MbbVqmsXoD7O0GGkskZ7TPmdzKcmSocagvRo8sRVVtiu+O8==gs+Y-----END PGP SIGNATURE-----
L
L
Ludovic Courtès wrote 4 days ago
(name . Mark H Weaver)(address . mhw@netris.org)
878s4l4q0i.fsf@gnu.org
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (14 lines)> I intend to write a scanner in Scheme that is able to find Nix hashes> encoded in ASCII, UTF-16, or UTF-32. Using that, I'll write a procedure> that, for each package output, finds all store references that are found> encoded in UTF-16 or UTF-32 but never in ASCII, and write those> references to a file (if that set is nonempty). This procedure can then> be used by selected packages and/or build-systems.>> However, there's one thing I will need: the set of all transitive inputs> (and native-inputs, including implicit inputs) available to the build,> i.e. the set of possible hashes that might legitimately be found by the> scanner.>> Ludovic: what's the best way to get that list from the build-side code?
You can use #:references-graphs for that.
Sorry for the delay!
Ludo’.
?