(guix cpio) produces corrupted archives when there are non-ASCII filenames

  • Done
  • quality assurance status badge
Details
3 participants
  • Maxim Cournoyer
  • Mark H Weaver
  • Simon Tournier
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal
M
M
Maxim Cournoyer wrote on 23 Feb 2023 04:14
(name . bug-guix)(address . bug-guix@gnu.org)
87mt55hy3x.fsf@gmail.com
Hi,

It appears that the code we have to generate CPIO archives doesn't
handle the presence of non-ASCII characters in the file names of files
to be archived well:

First, to make rpm usable on a Guix System:

Toggle snippet (5 lines)
# mkdir /var/lib/rpm
# chown root:users /var/lib/rpm
# chmod g+rw /var/lib/rpm

Then, produce a problematic CPIO via 'guix pack -f rpm', which uses
(guix cpio):

Toggle snippet (3 lines)
$ rpm_archive=$(guix pack -R -C none -f rpm nss-certs)

Notice that it cannot be installed:
Toggle snippet (6 lines)
$ mkdir /tmp/nss-certs
# rpm --prefix=/tmp/nss-certs -i $rpm_archive
error: unpacking of archive failed: cpio: Bad magic
error: nss-certs-3.81-0.x86_64: install failed

Let's now inspect the cpio itself.

Toggle snippet (14 lines)
$ guix shell rpm cpio
[env]$ rpm2cpio $rpm_archive > nss-certs.cpio
[env]$ cpio -t < nss-certs.cpio |& grep -B3 junk
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9482e63a.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9846683b.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/988a38cb.0
cpio: warning: skipped 248 bytes of junk
--
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/Microsoft_RSA_Root_Certificate_Authority_2017.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NAVER_Global_Root_Certification_Authority.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NetLock_Arany_=Class_Gold=_F?tanúsítvány.
cpio: warning: skipped 4 bytes of junk

I haven't yet pin-pointed what the problem is.

I could do with extra eyes :-).

--
Thanks,
Maxim
M
M
Maxim Cournoyer wrote on 24 Feb 2023 05:54
[PATCH] cpio: Properly handle Unicode characters in file names.
(address . 61722@debbugs.gnu.org)
20230224045402.26444-1-maxim.cournoyer@gmail.com

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

guix/cpio.scm | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

Toggle diff (56 lines)
diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..8fd7552450 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
#:size (stat:size st)
#:dev (stat:dev st)
#:rdev (stat:rdev st)
- #:name-size (string-length file-name))))
+ #:name-size (bytevector-length
+ (string->utf8 file-name)))))
(define* (file->cpio-header* file
#:optional (file-name file)
@@ -182,7 +183,8 @@ (define* (file->cpio-header* file
(make-cpio-header #:mode (stat:mode st)
#:nlink (stat:nlink st)
#:size (stat:size st)
- #:name-size (string-length file-name))))
+ #:name-size (bytevector-length
+ (string->utf8 file-name)))))
(define* (special-file->cpio-header* file
device-type
@@ -201,7 +203,8 @@ (define* (special-file->cpio-header* file
permission-bits)
#:nlink 1
#:rdev (device-number device-major device-minor)
- #:name-size (string-length file-name)))
+ #:name-size (bytevector-length
+ (string->utf8 file-name))))
(define %trailer
"TRAILER!!!")
@@ -237,7 +240,7 @@ (define (dump-file file)
;; We're padding the header + following file name + trailing zero, and
;; the header is 110 byte long.
- (write-padding (+ 110 1 (string-length file)) port)
+ (write-padding (+ 110 (bytevector-length (string->utf8 file)) 1) port)
(case (mode->type (cpio-header-mode header))
((regular)
@@ -246,7 +249,7 @@ (define (dump-file file)
(dump-port input port))))
((symlink)
(let ((target (readlink file)))
- (put-string port target)))
+ (put-bytevector port (string->utf8 target))))
((directory)
#t)
((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
--
2.39.1
M
M
Mark H Weaver wrote on 24 Feb 2023 12:46
87bklj70bb.fsf@netris.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (25 lines)
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.
>
> ---
>
> guix/cpio.scm | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/guix/cpio.scm b/guix/cpio.scm
> index d4a7d5f1e0..8fd7552450 100644
> --- a/guix/cpio.scm
> +++ b/guix/cpio.scm
> @@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
> #:size (stat:size st)
> #:dev (stat:dev st)
> #:rdev (stat:rdev st)
> - #:name-size (string-length file-name))))
> + #:name-size (bytevector-length
> + (string->utf8 file-name)))))

(string-utf8-length file-name) would produce the same result more
efficiently.

Regards,
Mark
S
S
Simon Tournier wrote on 24 Feb 2023 11:18
control message for bug #61722
(address . control@debbugs.gnu.org)
87a613mkn0.fsf@gmail.com
tags 61722 + patch
quit
M
M
Maxim Cournoyer wrote on 24 Feb 2023 14:26
[PATCH v2] cpio: Properly handle Unicode characters in file names.
(address . 61722@debbugs.gnu.org)
20230224132652.4617-1-maxim.cournoyer@gmail.com

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

Changes in v2:
- Use string-utf8-length

guix/cpio.scm | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

Toggle diff (53 lines)
diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..876f61ea3c 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,7 @@ (define* (file->cpio-header file #:optional (file-name file)
#:size (stat:size st)
#:dev (stat:dev st)
#:rdev (stat:rdev st)
- #:name-size (string-length file-name))))
+ #:name-size (string-utf8-length file-name))))
(define* (file->cpio-header* file
#:optional (file-name file)
@@ -182,7 +182,7 @@ (define* (file->cpio-header* file
(make-cpio-header #:mode (stat:mode st)
#:nlink (stat:nlink st)
#:size (stat:size st)
- #:name-size (string-length file-name))))
+ #:name-size (string-utf8-length file-name))))
(define* (special-file->cpio-header* file
device-type
@@ -201,7 +201,7 @@ (define* (special-file->cpio-header* file
permission-bits)
#:nlink 1
#:rdev (device-number device-major device-minor)
- #:name-size (string-length file-name)))
+ #:name-size (string-utf8-length file-name)))
(define %trailer
"TRAILER!!!")
@@ -237,7 +237,7 @@ (define (dump-file file)
;; We're padding the header + following file name + trailing zero, and
;; the header is 110 byte long.
- (write-padding (+ 110 1 (string-length file)) port)
+ (write-padding (+ 110 (string-utf8-length file) 1) port)
(case (mode->type (cpio-header-mode header))
((regular)
@@ -246,7 +246,7 @@ (define (dump-file file)
(dump-port input port))))
((symlink)
(let ((target (readlink file)))
- (put-string port target)))
+ (put-bytevector port (string->utf8 target))))
((directory)
#t)
((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
--
2.39.1
M
M
Maxim Cournoyer wrote on 25 Feb 2023 20:52
Re: bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames
(address . 61722-done@debbugs.gnu.org)
875ybpfrps.fsf_-_@gmail.com
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (8 lines)
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.

Pushed to master.

Closing.

--
Thanks,
Maxim
Closed
?