On 24-08-2022 17:24, Remco van 't Veer wrote:
  * Ruby bundles zlib.
Can you point out where it is in the source tree?  Looking at the
sources I can only find a (very beefy) wrapper around zlib which seems
to implement all kinds of zlib stuff but also depends on the zlib
library.  I dunno how to determine if this is bundling or not.


I probably mistook the wrapper for a local copy of zlib; never mind.

There's a zlib-1.2.11-mswin.patch, though; I wonder what that's for.

  * Ruby contains some things generated by bison or such.
It seems the generated parse.c file (from parse.y) is included in the
tarballs as a service to work around a bootstrap problem: generating the
parser requires Ruby.  See also:


I don't know how to deal with this properly.  The only thing I can think
of is compiling in two phases: first with the supplied parse.c and
afterwards without it.  Or try it with mruby as a native input, but that
seems to require Ruby to compile too.

We have a bunch of old Rubies packaged; maybe parse.c can be generated with one of the old versions? Then again, the old versions may have the same problem; I haven't checked.

If not, generating it fully from source might not be possible, but something in between could be an option:

  1. First, build with the pre-generated parse.c.
  2. Once Ruby is built, regenerate parse.c and verify that it is identical to the shipped parse.c (ignoring the timestamp).
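For illustration, the comparison in step 2 could be sketched in Python roughly like this. It assumes that the only expected differences between the shipped and the regenerated parse.c are timestamp-like lines; the `VOLATILE` pattern and the function names are hypothetical and would have to be adjusted to whatever actually varies in Ruby's generated parser.

```python
import re
from pathlib import Path

# Hypothetical: lines matching this pattern are assumed to be the only
# build-to-build noise (e.g. a date in a "generated on" banner).
VOLATILE = re.compile(r"\d{4}-\d{2}-\d{2}|[Gg]enerated (on|at)")


def normalized_lines(path):
    """Return the file's lines with timestamp-like lines dropped."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if not VOLATILE.search(line)]


def same_parser(shipped, regenerated):
    """True if both parse.c files agree once volatile lines are ignored."""
    return normalized_lines(shipped) == normalized_lines(regenerated)
```

In a build, one would keep a copy of the shipped file before regenerating it and fail the build when `same_parser("parse.c.orig", "parse.c")` is false, which is also the signal that the tarball was made with a different Bison.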
What's to gain by this?

(1) I would assume it is much easier to hide malware in a generated file like parse.c than in the real source code (*) (IIRC, the C code generated by Bison is much longer than the .y file). By regenerating parse.c, this potential issue is sidestepped: security reviewers wouldn't even have to look at the shipped parse.c, because it isn't used; it's regenerated.

(2) Also: generators like Bison can have bugs that get fixed in later versions. Now imagine that Bison had, say, a buffer overflow bug, and that distros just used the pre-generated parse.c. Then, once a fixed version of Bison comes out, we would have to check every package to see whether it has a pre-generated parser. It would be much less stressful to always generate parsers from source; then, once the version of Bison in Guix is updated, all packages automatically get the buffer overflow fix.

I don't think my in-between proposal helps much with (1) against a competent attacker (though it could stop some insufficiently sophisticated attacks where the parse.c malware doesn't try to subvert the later check), but it still helps with (2): it at least detects whether Ruby used an old Bison (and hence that a patch might be in order).


(*) Caveat: I don't have any statistics on this.