(guix cpio) produces corrupted archives when there are non-ASCII filenames

  • Done
  • quality assurance status badge
Details
3 participants
  • Maxim Cournoyer
  • Mark H Weaver
  • Simon Tournier
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal
M
M
Maxim Cournoyer wrote on 23 Feb 2023 04:14
(name . bug-guix)(address . bug-guix@gnu.org)
87mt55hy3x.fsf@gmail.com
Hi,

It appears that the code we have to generate CPIO archives doesn't
handle the presence of non-ASCII characters in the file names of files
to be archived well:

First, to make rpm usable on a Guix System:

Toggle snippet (5 lines)
# mkdir /var/lib/rpm
# chown root:users /var/lib/rpm
# chmod g+rw /var/lib/rpm

Then, produce a problematic CPIO via 'guix pack -f rpm', which uses
(guix cpio):

Toggle snippet (3 lines)
$ rpm_archive=$(guix pack -R -C none -f rpm nss-certs)

Notice that it cannot be installed:
Toggle snippet (6 lines)
$ mkdir /tmp/nss-certs
# rpm --prefix=/tmp/nss-certs -i $rpm_archive
error: unpacking of archive failed: cpio: Bad magic
error: nss-certs-3.81-0.x86_64: install failed

Let's now inspect the cpio itself.

Toggle snippet (14 lines)
$ guix shell rpm cpio
[env]$ rpm2cpio $rpm_archive > nss-certs.cpio
[env]$ cpio -t < nss-certs.cpio |& grep -B3 junk
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9482e63a.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9846683b.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/988a38cb.0
cpio: warning: skipped 248 bytes of junk
--
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/Microsoft_RSA_Root_Certificate_Authority_2017.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NAVER_Global_Root_Certification_Authority.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NetLock_Arany_=Class_Gold=_F?tanúsítvány.
cpio: warning: skipped 4 bytes of junk

I haven't yet pin-pointed what the problem is.

I could do with extra eyes :-).

--
Thanks,
Maxim
M
M
Maxim Cournoyer wrote on 24 Feb 2023 05:54
[PATCH] cpio: Properly handle Unicode characters in file names.
(address . 61722@debbugs.gnu.org)
20230224045402.26444-1-maxim.cournoyer@gmail.com

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

guix/cpio.scm | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

Toggle diff (56 lines)
diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..8fd7552450 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
#:size (stat:size st)
#:dev (stat:dev st)
#:rdev (stat:rdev st)
- #:name-size (string-length file-name))))
+ #:name-size (bytevector-length
+ (string->utf8 file-name)))))
(define* (file->cpio-header* file
#:optional (file-name file)
@@ -182,7 +183,8 @@ (define* (file->cpio-header* file
(make-cpio-header #:mode (stat:mode st)
#:nlink (stat:nlink st)
#:size (stat:size st)
- #:name-size (string-length file-name))))
+ #:name-size (bytevector-length
+ (string->utf8 file-name)))))
(define* (special-file->cpio-header* file
device-type
@@ -201,7 +203,8 @@ (define* (special-file->cpio-header* file
permission-bits)
#:nlink 1
#:rdev (device-number device-major device-minor)
- #:name-size (string-length file-name)))
+ #:name-size (bytevector-length
+ (string->utf8 file-name))))
(define %trailer
"TRAILER!!!")
@@ -237,7 +240,7 @@ (define (dump-file file)
;; We're padding the header + following file name + trailing zero, and
;; the header is 110 byte long.
- (write-padding (+ 110 1 (string-length file)) port)
+ (write-padding (+ 110 (bytevector-length (string->utf8 file)) 1) port)
(case (mode->type (cpio-header-mode header))
((regular)
@@ -246,7 +249,7 @@ (define (dump-file file)
(dump-port input port))))
((symlink)
(let ((target (readlink file)))
- (put-string port target)))
+ (put-bytevector port (string->utf8 target))))
((directory)
#t)
((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
--
2.39.1
M
M
Mark H Weaver wrote on 24 Feb 2023 12:46
87bklj70bb.fsf@netris.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (25 lines)
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.
>
> ---
>
> guix/cpio.scm | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/guix/cpio.scm b/guix/cpio.scm
> index d4a7d5f1e0..8fd7552450 100644
> --- a/guix/cpio.scm
> +++ b/guix/cpio.scm
> @@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
> #:size (stat:size st)
> #:dev (stat:dev st)
> #:rdev (stat:rdev st)
> - #:name-size (string-length file-name))))
> + #:name-size (bytevector-length
> + (string->utf8 file-name)))))

(string-utf8-length file-name) would produce the same result more
efficiently.

Regards,
Mark
S
S
Simon Tournier wrote on 24 Feb 2023 11:18
control message for bug #61722
(address . control@debbugs.gnu.org)
87a613mkn0.fsf@gmail.com
tags 61722 + patch
quit
M
M
Maxim Cournoyer wrote on 24 Feb 2023 14:26
[PATCH v2] cpio: Properly handle Unicode characters in file names.
(address . 61722@debbugs.gnu.org)
20230224132652.4617-1-maxim.cournoyer@gmail.com

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

Changes in v2:
- Use string-utf8-length

guix/cpio.scm | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

Toggle diff (53 lines)
diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..876f61ea3c 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,7 @@ (define* (file->cpio-header file #:optional (file-name file)
#:size (stat:size st)
#:dev (stat:dev st)
#:rdev (stat:rdev st)
- #:name-size (string-length file-name))))
+ #:name-size (string-utf8-length file-name))))
(define* (file->cpio-header* file
#:optional (file-name file)
@@ -182,7 +182,7 @@ (define* (file->cpio-header* file
(make-cpio-header #:mode (stat:mode st)
#:nlink (stat:nlink st)
#:size (stat:size st)
- #:name-size (string-length file-name))))
+ #:name-size (string-utf8-length file-name))))
(define* (special-file->cpio-header* file
device-type
@@ -201,7 +201,7 @@ (define* (special-file->cpio-header* file
permission-bits)
#:nlink 1
#:rdev (device-number device-major device-minor)
- #:name-size (string-length file-name)))
+ #:name-size (string-utf8-length file-name)))
(define %trailer
"TRAILER!!!")
@@ -237,7 +237,7 @@ (define (dump-file file)
;; We're padding the header + following file name + trailing zero, and
;; the header is 110 byte long.
- (write-padding (+ 110 1 (string-length file)) port)
+ (write-padding (+ 110 (string-utf8-length file) 1) port)
(case (mode->type (cpio-header-mode header))
((regular)
@@ -246,7 +246,7 @@ (define (dump-file file)
(dump-port input port))))
((symlink)
(let ((target (readlink file)))
- (put-string port target)))
+ (put-bytevector port (string->utf8 target))))
((directory)
#t)
((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
--
2.39.1
M
M
Maxim Cournoyer wrote on 25 Feb 2023 20:52
Re: bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames
(address . 61722-done@debbugs.gnu.org)
875ybpfrps.fsf_-_@gmail.com
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (8 lines)
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.

Pushed to master.

Closing.

--
Thanks,
Maxim
Closed
?
Your comment

This issue is archived.

To comment on this conversation send an email to 61722@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 61722
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch