In an earlier bug comment [1] I corroborated that nscd was leaking
/etc/passwd information from the host OS into the Guix container, and I
wondered aloud why the container would use the host OS's nscd if there was
a risk of this happening.

I've looked into how Guix configures its own nscd [2], and it turns out
that by default it enables lookups only for `hosts` and `services` - not
for `passwd`, `group`, or `netgroup`. Presumably, then, this configuration
is sufficient for nscd to prevent the glibc compatibility issues described
in the manual [3].

After adding the following three lines to nscd.conf on my foreign distro
(Debian 10) and restarting nscd, my Guix system containers were able to
boot successfully while still talking to the host's nscd daemon:

        enable-cache            passwd          no
        enable-cache            group           no
        enable-cache            netgroup        no
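
For anyone who wants to check their own foreign distro, here is a minimal
Python sketch that reports which of these three caches are still explicitly
enabled in an nscd.conf. The function names and the RISKY_SERVICES tuple are
my own invention for illustration, and the sketch only looks at explicit
`enable-cache` lines; it makes no claim about what nscd does when a line is
absent.

```python
# Services whose nscd lookups leaked host state into the container in my
# tests (hypothetical name, chosen for this sketch).
RISKY_SERVICES = ("passwd", "group", "netgroup")


def enabled_caches(conf_text):
    """Return the set of services with an explicit `enable-cache <svc> yes`.

    If a service appears more than once, the last line wins (an assumption
    made for this sketch).
    """
    enabled = set()
    for raw in conf_text.splitlines():
        # Drop anything after '#', then split on whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "enable-cache":
            _, service, value = fields
            if value == "yes":
                enabled.add(service)
            else:
                enabled.discard(service)
    return enabled


def risky_caches(conf_text):
    """Return the sorted list of risky services that are still enabled."""
    return sorted(enabled_caches(conf_text) & set(RISKY_SERVICES))
```

Calling risky_caches() on the contents of /etc/nscd.conf should give an
empty list once the three lines above are in place.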

So I think the bug here is that the Guix manual page advising the use of
nscd on a foreign distro [3] doesn't elaborate on which types of service
lookups are safe to enable in the daemon. If Guix is used only to build and
run binaries then perhaps it could use nscd for all lookups, but this is
evidently not the case for Guix system containers.


Cheers,

Jason


[1] https://www.mail-archive.com/bug-guix@gnu.org/msg19915.html
[2] https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/base.scm?h=version-1.1.0#n1238
[3] https://guix.gnu.org/manual/en/html_node/Application-Setup.html

On Mon, Aug 24, 2020 at 11:15 PM conjaroy <conjaroy@gmail.com> wrote:

> I've observed this error under similar circumstances: launching a guix
> system container script with network sharing enabled, on a foreign distro
> (Debian 10) with nscd running.
>
> Using `strace -f /gnu/store/...-run-container`, we can observe the
> container's lookup of user accounts via the foreign distro's nscd socket:
>
> [pid 16582] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
> [pid 16582] connect(11, {sa_family=AF_UNIX,
> sun_path="/var/run/nscd/socket"}, 110) = 0
> [pid 16582] sendto(11, "\2\0\0\0\0\0\0\0\t\0\0\0postgres\0", 21,
> MSG_NOSIGNAL, NULL, 0) = 21
> [pid 16582] poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1
> ([{fd=11, revents=POLLIN}])
> [pid 16582] read(11,
> "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"...,
> 36) = 36
> [pid 16582] close(11)                   = 0
>
> Since the user ("postgres") is indeed missing in the foreign distro, the
> lookup fails. In this case, disabling nscd on the foreign distro allowed
> the container script to run without error.
>
> Based on comments in https://issues.guix.info/issue/28128, I see that it
> was a deliberate choice to bind-mount the foreign distro's nscd socket
> inside the container (instead of starting a separate containerized nscd
> instance). But I'm having trouble seeing why it's acceptable to leak state
> from the foreign distro's user space into the container. Is there something
> I'm missing?
>
> Cheers,
>
> Jason
>