'git-fetch' fails to delete files with non-ASCII names

Status: Done
Participants: Attila Lendvai, Liliana Marie Prikler, Ludovic Courtès, Maxime Devos
Owner: unassigned
Submitted by: Attila Lendvai
Severity: important
Attila Lendvai wrote on 12 Apr 2022 21:47
guix-daemon, locale, LANG, and unicode in git tag names
(name . bug-guix@gnu.org)(address . bug-guix@gnu.org)
dgH8LAbYICFwRYBwMqWgymICHzzpPfoGjhJaxzv82-4I5UNd6NWdkLbJehfCHGTEpQk2KvGwhI0OIeRkSu2F85hvrJelJKM_Hv2OEwTp5B0=@lendvai.name
i'm trying to build a golang package that i have just imported. its repo has a tag with unicode in it, namely v½.2.0, as observable at https://github.com/klauspost/pgzip/tags

(define-public the-pkg
  (package
    (name "go-github-com-klauspost-pgzip")
    (version "1.0.2-0.20170402124221-0bf5dcad4ada")
    (source
     (origin
       (method git-fetch)
       (uri (git-reference
             (commit "0bf5dcad4ada")))
       (file-name (git-file-name name version))
       (sha256
        (base32 "0dgp2iljvhibzxia1g3lsfg4bjmfh4kf0bfrmfi7sd49hwhrvk7s"))))
    (build-system go-build-system)
    (arguments '(#:skip-build? #t #:import-path "github.com/klauspost/pgzip"))
    (synopsis "pgzip")
    (description
     "Package pgzip implements reading and writing of gzip format compressed files, as
    (license license:expat)))

i have attached the build log, but the essence is this:

guile: warning: failed to install locale

and i can't get rid of this^ warning. i installed glibc-locales to root and my user, reconfigured, restarted the guix-daemon.

which is probably the cause of the ultimate error:

warning: failed to delete .git/refs/tags/v??.2.0: No such file or directory
r:sha256 hash mismatch for...

the daemon starts from an empty env:

https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/build.cc#n1590

and then copies the env from the derivation, but it doesn't seem to contain any LANG value. i assume guile is also launched then without a LANG env. BTW, guile could be more informative in its warning, too.

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“The unexamined life is not worth living for a human being.”
— Socrates (c. 470–399 BC, tried and executed), 'Apology' (399 BC)
Attachment: build-log
Maxime Devos wrote on 12 Apr 2022 22:40
be1fe91166c5a9e95975084d9e8dc7f80222cf4d.camel@telenet.be
Attila Lendvai wrote on Tue 12-04-2022 at 19:47 [+0000]:
> and i can't get rid of this^ warning. i installed glibc-locales to
> root and my user, reconfigured, restarted the guix-daemon.
>
> which is probably the cause of the ultimate error:
>
> warning: failed to delete .git/refs/tags/v??.2.0: No such file or
> directory
> r:sha256 hash mismatch for...
>
> the daemon starts from an empty env:
>
> https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libstore/build.cc#n1590
>
> and then copies the env from the derivation, but it doesn't seem to
> contain any LANG value. i assume guile is also launched then without
> a LANG env. BTW, guile could be more informative in its warning, too.

Some remarks:

* LANG should be set, because it is in #:leaked-env-vars (see
guix/git-download.scm). I don't know whose LANG it is though
-- the user's, or the daemon's?

* To install a UTF-8 locale, you need glibc-locales (or possibly
glibc-utf8-locales). (At least, for now. Upstream has some plans
for including a C.UTF-8 locale, so maybe eventually we can fall back
to C.UTF-8.)

* This locale data needs to be in $GUIX_LOCPATH.

* GUIX_LOCPATH is not leaked.

* Even if it was, I don't think that /gnu/store/...glibc-locales
would be accessible from the build container (though you could give
it a try?).

* So perhaps GUIX_LOCPATH needs to be set in the gexp in
guix/git-download.scm, + some setlocale as done by
gnu-build-system.

* Long-term, it could be interesting to remove the
‘file name = string encoded in current locale's encoding’
assumption from Guile.

* svn-download, hg-download, bzr-download and cvs-download
probably have the same issue.

Greetings,
Maxime.
Attila Lendvai wrote on 13 Apr 2022 09:51
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54893@debbugs.gnu.org)
4sSjKaCcadx8brYQC5HZuP-SyMku3BlXRTZwaCUH13qSv01N33lk9vyUWzzE6R889ZuQRpI_6Pl4Q_51v8jMhUhwh6f9rly5h0EhlUqHG80=@lendvai.name
> * LANG should be set, because it is in #:leaked-env-vars (see
> guix/git-download.scm). I don't know whose LANG it is though
> -- the user's, or the daemon's?


if i add this to the gexp:

(simple-format (current-error-port)
"LANG is '~A'~%"
(getenv "LANG"))
(setenv "LANG" "en_US.utf8")
(setenv "GUIX_LOCPATH" "/run/current-system/locale")
(setlocale LC_ALL (getenv "LANG"))

i see:

LANG is ''
Backtrace:
2 (primitive-load "/gnu/store/z4bis94jg0s0y0xj1xbmliv7xs8?")
In ice-9/eval.scm:
619:8 1 (_ #f)
In unknown file:
0 (setlocale 6 "en_US.utf8")

ERROR: In procedure setlocale:
In procedure setlocale: Invalid argument


> * GUIX_LOCPATH is not leaked.


it's the same if i add GUIX_LOCPATH to the #:leaked-env-vars and don't setenv it explicitly.


> * Even if it was, I don't think that /gnu/store/...glibc-locales
> would be accessible from the build container (though you could give
> it a try?).


i didn't check this specifically, but i'm afraid you are right, and this is why my kludge doesn't work.


> * So perhaps GUIX_LOCPATH needs to be set in the gexp in
> guix/git-download.scm, + some setlocale as done by
> gnu-build-system.


i don't understand why the setlocale call in gnu-build-system's install-locale works, but my setlocale kludge in git-download doesn't.

i even tried to add glibc-locale as native-inputs to the package in question, but it didn't help.


> * Long-term, it could be interesting to remove the
> ‘file name = string encoded in current locale's encoding’
> assumption from Guile.


i'm not sure why the wrong locale breaks file-system walking and deleting, though.

i assume if every function in guile uses/assumes the same locale (character encoding), then both directions through the guile FFI should be idempotent, no? and i think both ASCII and UTF-8 are idempotent wrt C bytes <-> scheme string conversions. IOW, it's only the displaying of the chars that should be broken, not file operations.

or am i wrong to assume this?

or maybe the character encoding algo used in guile's FFI silently emits actual question marks in place of bytes that are outside the valid range of the encoding used? if so, that's not a very defensive way of coding, and it's eating up hours of my life...

hrm... this is not relevant here, only a related thought: things can go wrong in the GEXP serialization, too: if the writing side and the reading side doesn't use the same character encoding. locale should be set explicitly at the relevant entry points.

i'd appreciate if someone could help me come up with at least a kludge, so that i could make progress until it's fixed properly.

thanks for your insights Maxime,

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
If you never heal from what hurt you, you'll bleed on people who didn't cut you.
Maxime Devos wrote on 13 Apr 2022 10:03
(name . Attila Lendvai)(address . attila@lendvai.name)(address . 54893@debbugs.gnu.org)
0ce6f8b91cdd2979ec7dc74fc8e4eef68d2c7e42.camel@telenet.be
Attila Lendvai wrote on Wed 13-04-2022 at 07:51 [+0000]:
> i don't understand why the setlocale call in gnu-build-system's
> install-locale works, but my setlocale kludge in git-download doesn't.

I don't expect /run/current-system/locale to exist inside the build
container. Maybe try

(setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
;; for testing
((@ (guix build utils) invoke)
#+(file-append coreutils "/bin/ls") (getenv "GUIX_LOCPATH"))

instead?

gnu-build-system has a (variant of) glibc-locales in its (implicit)
inputs, so there GUIX_LOCPATH can be set to the /gnu/store/.../locales
file name, in the 'set-paths' procedure.

> i even tried to add glibc-locale as native-inputs to the package in question, but it didn't help.

Building the package and downloading the source code are separate steps
(derivations), they don't automatically have access to each other's
inputs.

Greetings,
Maxime
Maxime Devos wrote on 13 Apr 2022 10:22
(name . Attila Lendvai)(address . attila@lendvai.name)(address . 54893@debbugs.gnu.org)
d7cf39802973624ab080d53efc4c84a33c397707.camel@telenet.be
Attila Lendvai wrote on Wed 13-04-2022 at 07:51 [+0000]:
> i'm not sure why the wrong locale breaks file-system walking and deleting, though.
>
> i assume if every function in guile uses/assumes the same locale (character
> encoding), then both directions through the guile FFI should be idempotent, no?
> and i think both ASCII and UTF-8 are idempotent wrt C bytes <-> scheme string
> conversions.

The problem is that the default character encoding is ANSI_X3.4-1968
(US-ASCII), and any byte above 127 makes things non-ASCII.

Also, the string procedures internally always use UTF-8 (or possibly
ISO-8859-1 as an optimisation?). Strings are not raw bytes; instead they
can be considered a vector of characters (string-ref returns
characters, not bytes, and doesn't use byte positions).

> IOW, it's only the displaying of the chars that should be broken,
> not file operations.

LANG=bogus guile
(guile-user)> (setlocale LC_ALL)
(guile-user)> (use-modules (ice-9 i18n))
(guile-user)> (locale-encoding)
(guile-user)> (locale-encoding)
$2 = "ANSI_X3.4-1968"

Apparently the fallback encoding is ‘ANSI_X3.4-1968’. Let's take a
look at this encoding. According to IANA
this character encoding can also be named ‘US-ASCII’ and is specified
in RFC2046. Some excerpts:

"US-ASCII" does not indicate an arbitrary 7-bit
character set[sic], but specifies that all octets in the body must
be interpreted as characters according to the US-ASCII character
set.

so it looks like, say, é cannot be encoded as US-ASCII, it does not
belong to the character set of the encoding. More generally, anything
beyond the 127 (Unicode) codepoint cannot be encoded in ANSI_X3.4-1968.

Let's test this (in a new REPL with an UTF-8 locale):

((@ (ice-9 iconv) string->bytevector) "é" "ANSI_X3.4-1968")
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
Throw to key `encoding-error' with args `("put-char" "conversion to port encoding failed" 84 #<output: string 7fd5bbc23ee0> #\é)'.

((@ (ice-9 iconv) string->bytevector) "é" "ANSI_X3.4-1968" 'substitute)
$2 = #vu8(63)
((@ (rnrs bytevectors) utf8->string) #vu8(63))
$3 = "?"

and the other direction:

((@ (ice-9 iconv) bytevector->string) #vu8(128) "ANSI_X3.4-1968" 'substitute)
$5 = "?" ;; why #\? and not #\?? I don't know, I guess Guile is inconsistent

(FWIW, I would throw an decoding-error here instead of silently corrupting the
file names.)

Greetings,
Maxime.
Maxime Devos wrote on 13 Apr 2022 10:29
(name . Attila Lendvai)(address . attila@lendvai.name)(address . 54893@debbugs.gnu.org)
4226ef15e28746f7b6ef394ec20f378fc6793776.camel@telenet.be
Attila Lendvai wrote on Wed 13-04-2022 at 07:51 [+0000]:
> hrm... this is not relevant here, only a related thought: things
> can go wrong in the GEXP serialization, too: if the writing side
> and the reading side doesn't use the same character encoding.
> locale should be set explicitly at the relevant entry points.

Serialisation is always done in UTF-8; search for UTF-8 in (guix gexp).
I don't know if deserialisation of the script is done in UTF-8, though
it should be done that way.

Also, using the same character encoding is not sufficient, the
character encoding must also encode all characters (in practice, that
means UTF-8).
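
A minimal sketch of that last point, assuming Guile's (ice-9 iconv) module. The helper name string-roundtrip is made up for illustration and is not part of Guix. Writer and reader agree on the encoding in each call, yet only an encoding that covers every character in the string should hand the string back unchanged:

```scheme
(use-modules (ice-9 iconv))   ; string->bytevector, bytevector->string

;; Hypothetical helper (not part of Guix): write STR out in ENCODING and
;; read it back in the same ENCODING, substituting whatever cannot be
;; represented instead of raising an error.
(define (string-roundtrip str encoding)
  (bytevector->string (string->bytevector str encoding 'substitute)
                      encoding 'substitute))

;; The tag name "v½.2.0", built from code points so that the example does
;; not depend on how this text itself is encoded.
(define tag (string #\v (integer->char #xBD) #\. #\2 #\. #\0))

;; Print, for each encoding, whether the string survived the round trip.
(for-each (lambda (encoding)
            (format #t "~a: ~a~%" encoding
                    (string=? tag (string-roundtrip tag encoding))))
          '("UTF-8" "ANSI_X3.4-1968"))
```

With UTF-8 the comparison should hold; with ANSI_X3.4-1968 the ½ is substituted on the way out, so the string that comes back differs even though both sides used the same encoding.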

Greetings,
Maxime.
Attila Lendvai wrote on 13 Apr 2022 10:45
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54893@debbugs.gnu.org)
JCMTCZvk2OTWWeyGIiR41DHbLWe7Ks7r-ekFRD8SVaYA8a3FaWMvTFxUBS3VQ94vVTYX0hWe5v18wrg1N8OJCW6P8dVauBMumKK_Kf2tyBU=@lendvai.name
> I don't expect /run/current-system/locale to exist inside the build
> container. Maybe try
>
> (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
> ;; for testing
> ((@ (guix build utils) invoke)
> #+(file-append coreutils "/bin/ls") (getenv "GUIX_LOCPATH"))
>
> instead?


thank you, this works indeed as a band aid:

(setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
(setlocale LC_ALL "en_US.utf8")

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“If a nation expects to be ignorant and free, in a state of civilization, it expects what never was and never will be.”
— Thomas Jefferson (1743–1826)
Liliana Marie Prikler wrote on 13 Apr 2022 12:40
(address . 54893@debbugs.gnu.org)
a4af2f36b5db04876d351d4f929bc0210a70b78e.camel@ist.tugraz.at
On Wednesday, 13.04.2022 at 10:22 +0200, Maxime Devos wrote:
> [...]
> Let's test this (in a new REPL with an UTF-8 locale):
>
> ((@ (ice-9 iconv) string->bytevector) "é" "ANSI_X3.4-1968")
> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> Throw to key `encoding-error' with args `("put-char" "conversion to
> port encoding failed" 84 #<output: string 7fd5bbc23ee0> #\é)'.
>
> ((@ (ice-9 iconv) string->bytevector) "é" "ANSI_X3.4-1968" 'substitute)
> $2 = #vu8(63)
> ((@ (rnrs bytevectors) utf8->string) #vu8(63))
> $3 = "?"
>
> and the other direction:
>
> ((@ (ice-9 iconv) bytevector->string) #vu8(128) "ANSI_X3.4-1968"
> 'substitute)
> $5 = "?" ;; why #\? and not #\?? I don't know, I guess Guile is
> inconsistent
You are first encoding a non-ASCII byte to ASCII, which has no glyph
for "I have no idea what this is", so a question mark (#\?) is used.
When converting from invalid ASCII to UTF-8 on the other hand, you do
have #\? as the WTF character, so that is used instead. This is
entirely consistent :)

Cheers
Maxime Devos wrote on 13 Apr 2022 12:57
(address . 54893@debbugs.gnu.org)
7fef2514b5495e71eb8fa720a8964d938ad6a443.camel@telenet.be
Liliana Marie Prikler wrote on Wed 13-04-2022 at 12:40 [+0200]:
> You are first encoding a non-ASCII byte to ASCII, which has no glyph
> for "I have no idea what this is", so a question mark (#\?) is used.
> When converting from invalid ASCII to UTF-8 on the other hand, you do
> have #\? as the WTF character, so that is used instead.  This is
> entirely consistent :)

Right, makes sense.
Attila Lendvai wrote on 19 Apr 2022 13:38
(name . Maxime Devos)(address . maximedevos@telenet.be)(address . 54893@debbugs.gnu.org)
WcqSj-dYaLnSoWUgtJRJ7q-cPoDkwPkyysN70f0ECkmqH8sc2TMeuKYSfY0VV1MlOxNg5YkjNs1g0BM3eoxNuPw044fcN__02VZO_wS_L-s=@lendvai.name
> thank you, this works indeed as a band aid:
>
> (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
> (setlocale LC_ALL "en_US.utf8")


i spoke too early. this works in a git checkout of guix, but it fails
to compile when i try to guix pull it.

even if i declare the dependency like this:

#:autoload (gnu packages base) (glibc-locales)

IIUC, this is due to a circular dependency: glibc-locales (and its
variants) depend on git-fetch, therefore i cannot refer to them from
the implementation of git-fetch.

i tried to set the locale to "C" or "POSIX", but it results in ASCII encoding.

i tried to set the locale to "en_US.iso-8859-1", hoping that it's
available, but it isn't.

all that is needed here is an encoding that is idempotent wrt a cycle
through bytes->string, string->bytes. i think the iso-8859-n encodings
are like that.

to verify that hypothesis:

$ mkdir -p /tmp/delme/v½.2.0
$ LANG=C guix repl
scheme@(guix-user)> (use-modules (guix build utils))
scheme@(guix-user)> (delete-file-recursively "/tmp/delme")
warning: failed to delete /tmp/delme/v??.2.0: No such file or directory
warning: failed to delete /tmp/delme: Directory not empty
$1 = #t
$2 = #<vhash 7fd60aef5540 1 pairs>
scheme@(guix-user)>

$ LANG=en_US.iso-8859-1 guix repl
scheme@(guix-user)> (use-modules (guix build utils))
scheme@(guix-user)> (delete-file-recursively "/tmp/delme")
$1 = #<vhash 7f7d7acc2040 2 pairs>
scheme@(guix-user)>
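
The same round trip can be sketched at the byte level, without touching the file system. This is only an illustration, assuming Guile's (ice-9 iconv) module; the helper name roundtrip is hypothetical. The bytevector is the UTF-8 encoding of "v½.2.0", i.e. the bytes a directory listing would return for that tag:

```scheme
(use-modules (ice-9 iconv))   ; bytevector->string, string->bytevector

;; UTF-8 bytes of the file name "v½.2.0" (½ is the two bytes 194 189).
(define name-bytes #vu8(118 194 189 46 50 46 48))

;; Hypothetical helper: decode the bytes into a Scheme string using
;; ENCODING, then encode that string back into bytes, substituting
;; whatever cannot be represented.
(define (roundtrip bv encoding)
  (string->bytevector (bytevector->string bv encoding 'substitute)
                      encoding 'substitute))

;; Print, for each encoding, whether the original bytes come back intact.
(for-each (lambda (encoding)
            (format #t "~a: ~a~%" encoding
                    (equal? name-bytes (roundtrip name-bytes encoding))))
          '("ISO-8859-1" "ANSI_X3.4-1968"))
```

ISO-8859-1 assigns a character to every byte value, so the bytes should come back intact even though the decoded string is not the "right" text. Under ANSI_X3.4-1968 the two non-ASCII bytes are substituted, which matches the v??.2.0 in the warning above: the name handed back to the kernel no longer exists on disk.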


so, is such an idempotent locale available/embedded in glibc without any external dependencies? searching the web suggests that there isn't.

if not, then what would be a bird's eye view plan to make one
available for git-fetch?

should we create a new, ASCII-only git-fetch variant used in the bootstrap process?

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“The world is changed by your example, not by your opinion.”
— Paulo Coelho (1947–)
Maxime Devos wrote on 19 Apr 2022 17:45
(name . Attila Lendvai)(address . attila@lendvai.name)(address . 54893@debbugs.gnu.org)
4ca304c8aa9a3dd176902bac7e22a678b020d69a.camel@telenet.be
Attila Lendvai wrote on Tue 19-04-2022 at 11:38 [+0000]:
> > thank you, this works indeed as a band aid:
> >
> > (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
> > (setlocale LC_ALL "en_US.utf8")
>
>
> i spoke too early. this works in a git checkout of guix, but it fails
> to compile when i try to guix pull it.
>
> even if i declare the dependency like this:
>
> #:autoload   (gnu packages base) (glibc-locales)
>
> IIUC, this is due to a circular dependency: glibc-locales (and its
> variants) depend on git-fetch, therefore i cannot refer to them from
> the implementation of git-fetch.

The module of the glibc-locales package depends on git-fetch, but I
don't think the package glibc-locales does. Anyway, circular imports
are messy and (guix build-system ...) and (guix git-download) use an
extra-lazy variant of #:autoload that doesn't load the module even when
compiling (*).

(*) Limitation: this method cannot be used to use macros.

Maybe the attached variant works?

Greetings,
Maxime.
diff --git a/guix/git-download.scm b/guix/git-download.scm
index 5e624b9ae9..a74ba5f592 100644
--- a/guix/git-download.scm
+++ b/guix/git-download.scm
@@ -104,6 +104,9 @@ (define guile-zlib
(define gnutls
(module-ref (resolve-interface '(gnu packages tls)) 'gnutls))
+ (define glibc-locales
+ (module-ref (resolve-interface '(gnu packages base)) 'glibc-locales))
+
(define modules
(delete '(guix config)
(source-module-closure '((guix build git)
@@ -122,6 +125,8 @@ (define build
(guix swh)
(ice-9 match))
+ (pk #+glibc-locales)
+ (error "see the pk")
(define recursive?
(call-with-input-string (getenv "git recursive?") read))
Maxime Devos wrote on 19 Apr 2022 18:07
(name . Attila Lendvai)(address . attila@lendvai.name)(address . 54893@debbugs.gnu.org)
37f16856d65add3926a5554ee33773a4d1e5f02f.camel@telenet.be
Attila Lendvai wrote on Tue 19-04-2022 at 11:38 [+0000]:
> so, is such an idempotent locale available/embedded in glibc without
> any external dependencies? searching the web suggests that there isn't.

Try:

$ LC_CTYPE=anything.ISO-8859-2 guix repl é
hint: Consider installing the `glibc-locales' package and defining `GUIX_LOCPATH', along these lines:

guix install glibc-locales
export GUIX_LOCPATH="$HOME/.guix-profile/lib/locale"

See the "Application Setup" section in the manual, for more info.

;;; Stat of /home/[...]/?Š failed:
;;; In procedure stat: Bestand of map bestaat niet: "/home/[...]/?Š"
guix repl: fout: open-file: Bestand of map bestaat niet: "/home/regulator/source-code/rw/?Š"

IIUC, this causes lib/localcharset.c in Guile to run 'environ_locale_charset',
which just uses environment variables and ignores glibc's locale data.

I don't know if this requires 'setlocale' or requires the absence of 'setlocale'.

Greetings,
Maxime.
Attila Lendvai wrote on 19 Apr 2022 20:09
[PATCH] guix: git-download: Set locale to deal with Unicode in git metadata.
(address . 54893@debbugs.gnu.org)(name . Attila Lendvai)(address . attila@lendvai.name)
20220419180954.9636-1-attila@lendvai.name
Without this the git-fetch GEXP is run in an environment that uses ASCII
character encoding when strings are crossing the Guile - C boundary. It means
that e.g. tag names that have Unicode chars in them will cause problems,
e.g. when walking and deleting the .git directory.

An example in the wild: https://github.com/klauspost/pgzip/tags

For more details see: https://issues.guix.gnu.org/54893

* guix/git-download.scm (git-fetch): Call setlocale to set it to en_US.utf8.
---

thanks Maxime, this indeed seems to work! and i have successfully
guix pull'ed it, too.

guix/git-download.scm | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/guix/git-download.scm b/guix/git-download.scm
index 5e624b9ae9..2fc5a06490 100644
--- a/guix/git-download.scm
+++ b/guix/git-download.scm
@@ -104,6 +104,9 @@ (define guile-zlib
(define gnutls
(module-ref (resolve-interface '(gnu packages tls)) 'gnutls))
+ (define glibc-locales
+ (module-ref (resolve-interface '(gnu packages base)) 'glibc-locales))
+
(define modules
(delete '(guix config)
(source-module-closure '((guix build git)
@@ -121,6 +124,13 @@ (define build
(guix build download-nar)
(guix swh)
(ice-9 match))
+ ;; We must set the locale to something/anything that will make the
+ ;; Guile FFI use a character encoding that is idempotent through a
+ ;; bytes->string string->bytes roundtrip. Otherwise e.g. git tags
+ ;; with Unicode characters would break things. For more details
+ ;; and an example see https://issues.guix.gnu.org/54893
+ (setenv "GUIX_LOCPATH" #+(file-append glibc-locales "/lib/locale"))
+ (setlocale LC_ALL "en_US.utf8")
(define recursive?
(call-with-input-string (getenv "git recursive?") read))
--
2.35.1
Ludovic Courtès wrote on 20 Apr 2022 22:12
Re: bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
(name . Attila Lendvai)(address . attila@lendvai.name)
875yn3o6ol.fsf_-_@gnu.org
Hi,

Attila Lendvai <attila@lendvai.name> wrote:

> Without this the git-fetch GEXP is run in an environment that uses ASCII
> character encoding when strings are crossing the Guile - C boundary. It means
> that e.g. tag names that have Unicode chars in them will cause problems,
> e.g. when walking and deleting the .git directory.
>
> An example in the wild: https://github.com/klauspost/pgzip/tags
>
> For more details see: https://issues.guix.gnu.org/54893
>
> * guix/git-download.scm (git-fetch): Call setlocale to set it to en_US.utf8.

[...]

> + (define glibc-locales
> + (module-ref (resolve-interface '(gnu packages base)) 'glibc-locales))

I changed this to ‘glibc-utf8-locales’, which is sufficient here, and
committed.

Thanks everyone for the investigation and fix!

Ludo’.
Closed
Ludovic Courtès wrote on 20 Apr 2022 23:15
control message for bug #54893
(address . control@debbugs.gnu.org)
87ee1rmp6y.fsf@gnu.org
severity 54893 important
quit
Ludovic Courtès wrote on 21 Apr 2022 00:15
Re: bug#54893: guix-daemon, locale, LANG, and unicode in git tag names
(name . Attila Lendvai)(address . attila@lendvai.name)(address . 54893-done@debbugs.gnu.org)
87sfq7l7un.fsf_-_@gnu.org
Attila Lendvai <attila@lendvai.name> wrote:

> Without this the git-fetch GEXP is run in an environment that uses ASCII
> character encoding when strings are crossing the Guile - C boundary. It means
> that e.g. tag names that have Unicode chars in them will cause problems,
> e.g. when walking and deleting the .git directory.
>
> An example in the wild: https://github.com/klauspost/pgzip/tags
>
> For more details see: https://issues.guix.gnu.org/54893
>
> * guix/git-download.scm (git-fetch): Call setlocale to set it to en_US.utf8.

I spoke a bit too fast and realized some adjustments were needed to
avoid a circular dependency on i586-gnu. Pushed as
8852f911ff506dd50b714274ba0e2143f0285f78!

Ludo’.
Closed
Ludovic Courtès wrote on 21 Apr 2022 00:16
control message for bug #54893
(address . control@debbugs.gnu.org)
87r15rl7tk.fsf@gnu.org
retitle 54893 'git-fetch' fails to delete files with non-ASCII names
quit

This issue is archived.