(guix cpio) produces corrupted archives when there are non-ASCII filenames

  • Done
  • quality assurance status badge
Details
3 participants
  • Maxim Cournoyer
  • Mark H Weaver
  • Simon Tournier
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal

Debbugs page

Maxim Cournoyer wrote 2 years ago
(name . bug-guix)(address . bug-guix@gnu.org)
87mt55hy3x.fsf@gmail.com
Hi,

It appears that the code we have to generate CPIO archives doesn't
handle the presence of non-ASCII characters in the file names of files
to be archived well:

First, to make rpm usable on a Guix System:

Toggle snippet (5 lines)
# mkdir /var/lib/rpm
# chown root:users /var/lib/rpm
# chmod g+rw /var/lib/rpm

Then, produce a problematic CPIO via 'guix pack -f rpm', which uses
(guix cpio):

Toggle snippet (3 lines)
$ rpm_archive=$(guix pack -R -C none -f rpm nss-certs)

Notice that it cannot be installed:
Toggle snippet (6 lines)
$ mkdir /tmp/nss-certs
# rpm --prefix=/tmp/nss-certs -i $rpm_archive
error: unpacking of archive failed: cpio: Bad magic
error: nss-certs-3.81-0.x86_64: install failed

Let's now inspect the cpio itself.

Toggle snippet (14 lines)
$ guix shell rpm cpio
[env]$ rpm2cpio $rpm_archive > nss-certs.cpio
[env]$ cpio -t < nss-certs.cpio |& grep -B3 junk
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9482e63a.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9846683b.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/988a38cb.0
cpio: warning: skipped 248 bytes of junk
--
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/Microsoft_RSA_Root_Certificate_Authority_2017.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NAVER_Global_Root_Certification_Authority.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NetLock_Arany_=Class_Gold=_FÅ‘tanúsítvány.
cpio: warning: skipped 4 bytes of junk

I haven't yet pin-pointed what the problem is.

I could do with extra eyes :-).

--
Thanks,
Maxim
Maxim Cournoyer wrote 2 years ago
[PATCH] cpio: Properly handle Unicode characters in file names.
(address . 61722@debbugs.gnu.org)
20230224045402.26444-1-maxim.cournoyer@gmail.com

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

guix/cpio.scm | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

Toggle diff (56 lines)
diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..8fd7552450 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
#:size (stat:size st)
#:dev (stat:dev st)
#:rdev (stat:rdev st)
- #:name-size (string-length file-name))))
+ #:name-size (bytevector-length
+ (string->utf8 file-name)))))
(define* (file->cpio-header* file
#:optional (file-name file)
@@ -182,7 +183,8 @@ (define* (file->cpio-header* file
(make-cpio-header #:mode (stat:mode st)
#:nlink (stat:nlink st)
#:size (stat:size st)
- #:name-size (string-length file-name))))
+ #:name-size (bytevector-length
+ (string->utf8 file-name)))))
(define* (special-file->cpio-header* file
device-type
@@ -201,7 +203,8 @@ (define* (special-file->cpio-header* file
permission-bits)
#:nlink 1
#:rdev (device-number device-major device-minor)
- #:name-size (string-length file-name)))
+ #:name-size (bytevector-length
+ (string->utf8 file-name))))
(define %trailer
"TRAILER!!!")
@@ -237,7 +240,7 @@ (define (dump-file file)
;; We're padding the header + following file name + trailing zero, and
;; the header is 110 byte long.
- (write-padding (+ 110 1 (string-length file)) port)
+ (write-padding (+ 110 (bytevector-length (string->utf8 file)) 1) port)
(case (mode->type (cpio-header-mode header))
((regular)
@@ -246,7 +249,7 @@ (define (dump-file file)
(dump-port input port))))
((symlink)
(let ((target (readlink file)))
- (put-string port target)))
+ (put-bytevector port (string->utf8 target))))
((directory)
#t)
((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
--
2.39.1
Mark H Weaver wrote 2 years ago
87bklj70bb.fsf@netris.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (25 lines)
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.
>
> ---
>
> guix/cpio.scm | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/guix/cpio.scm b/guix/cpio.scm
> index d4a7d5f1e0..8fd7552450 100644
> --- a/guix/cpio.scm
> +++ b/guix/cpio.scm
> @@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
> #:size (stat:size st)
> #:dev (stat:dev st)
> #:rdev (stat:rdev st)
> - #:name-size (string-length file-name))))
> + #:name-size (bytevector-length
> + (string->utf8 file-name)))))

(string-utf8-length file-name) would produce the same result more
efficiently.

Regards,
Mark
Simon Tournier wrote 2 years ago
control message for bug #61722
(address . control@debbugs.gnu.org)
87a613mkn0.fsf@gmail.com
tags 61722 + patch
quit
Maxim Cournoyer wrote 2 years ago
[PATCH v2] cpio: Properly handle Unicode characters in file names.
(address . 61722@debbugs.gnu.org)
20230224132652.4617-1-maxim.cournoyer@gmail.com

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

Changes in v2:
- Use string-utf8-length

guix/cpio.scm | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

Toggle diff (53 lines)
diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..876f61ea3c 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,7 @@ (define* (file->cpio-header file #:optional (file-name file)
#:size (stat:size st)
#:dev (stat:dev st)
#:rdev (stat:rdev st)
- #:name-size (string-length file-name))))
+ #:name-size (string-utf8-length file-name))))
(define* (file->cpio-header* file
#:optional (file-name file)
@@ -182,7 +182,7 @@ (define* (file->cpio-header* file
(make-cpio-header #:mode (stat:mode st)
#:nlink (stat:nlink st)
#:size (stat:size st)
- #:name-size (string-length file-name))))
+ #:name-size (string-utf8-length file-name))))
(define* (special-file->cpio-header* file
device-type
@@ -201,7 +201,7 @@ (define* (special-file->cpio-header* file
permission-bits)
#:nlink 1
#:rdev (device-number device-major device-minor)
- #:name-size (string-length file-name)))
+ #:name-size (string-utf8-length file-name)))
(define %trailer
"TRAILER!!!")
@@ -237,7 +237,7 @@ (define (dump-file file)
;; We're padding the header + following file name + trailing zero, and
;; the header is 110 byte long.
- (write-padding (+ 110 1 (string-length file)) port)
+ (write-padding (+ 110 (string-utf8-length file) 1) port)
(case (mode->type (cpio-header-mode header))
((regular)
@@ -246,7 +246,7 @@ (define (dump-file file)
(dump-port input port))))
((symlink)
(let ((target (readlink file)))
- (put-string port target)))
+ (put-bytevector port (string->utf8 target))))
((directory)
#t)
((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
--
2.39.1
Maxim Cournoyer wrote 2 years ago
Re: bug#61722: (guix cpio) produces corrupted archives when there are non-ASCII filenames
(address . 61722-done@debbugs.gnu.org)
875ybpfrps.fsf_-_@gmail.com
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

Toggle quote (8 lines)
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.

Pushed to master.

Closing.

--
Thanks,
Maxim
Closed
?
Your comment

This issue is archived.

To comment on this conversation send an email to 61722@debbugs.gnu.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 61722
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch
You may also tag this issue. See list of standard tags. For example, to set the confirmed and easy tags
mumi command -t +confirmed -t +easy
Or, remove the moreinfo tag and set the help tag
mumi command -t -moreinfo -t +help