In an earlier bug comment [1] I corroborated that nscd was leaking /etc/passwd information from the host OS into the Guix container, and I wondered aloud why the container would use the host OS's nscd if there was a risk of this happening.

I've looked into how Guix configures its own nscd [2], and it turns out that by default it enables lookups only for `hosts` and `services`, not for `passwd`, `group`, or `netgroup`. Presumably, then, having nscd handle just those two databases is enough to prevent the glibc compatibility issues described in the manual [3].

After adding the following three lines to nscd.conf on my foreign distro (Debian 10) and restarting nscd, my Guix system containers were able to boot successfully while still talking to the daemon:

enable-cache passwd no
enable-cache group no
enable-cache netgroup no
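
For anyone who wants to check their own host configuration, here is a minimal Python sketch that reads nscd.conf-style text and reports which of the `passwd`, `group`, and `netgroup` caches are explicitly enabled. The function name `enabled_caches` and the `hosts`/`services` lines in the sample are illustrative assumptions, not taken from any real system; the sketch only looks at explicit `enable-cache` lines and says nothing about nscd's built-in defaults.

```python
# Sketch: report which nscd caches are explicitly enabled in nscd.conf text.
# `enabled_caches` is a hypothetical helper, not part of Guix or glibc.

def enabled_caches(conf_text):
    """Return {service: bool} for every 'enable-cache <service> <yes|no>' line."""
    result = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        fields = line.split()
        if len(fields) == 3 and fields[0] == "enable-cache":
            result[fields[1]] = (fields[2] == "yes")
    return result

# Sample text: the three lines from this message, plus assumed hosts/services lines.
SAMPLE = """\
enable-cache passwd no
enable-cache group no
enable-cache netgroup no
enable-cache hosts yes      # assumed for illustration
enable-cache services yes   # assumed for illustration
"""

caches = enabled_caches(SAMPLE)
# Databases that leaked host state into the container in this report.
risky = [s for s in ("passwd", "group", "netgroup") if caches.get(s, False)]
print("explicitly enabled risky caches:", risky)
```

To check a real host, pass the contents of its nscd.conf (on Debian this is normally /etc/nscd.conf) instead of `SAMPLE`.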

So I think the bug here is that the Guix manual page advising the use of nscd on a foreign distro [3] doesn't elaborate on which types of service lookups are safe to enable in the daemon. If Guix is used only to build and run binaries then perhaps it could use nscd for all lookups, but this is evidently not the case for Guix system containers.

Cheers,

Jason

[1] https://www.mail-archive.com/bug-guix@gnu.org/msg19915.html

[2] https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/services/base.scm?h=version-1.1.0#n1238

[3] https://guix.gnu.org/manual/en/html_node/Application-Setup.html

On Mon, Aug 24, 2020 at 11:15 PM conjaroy <conjaroy@gmail.com> wrote:

I've observed this error under similar circumstances: launching a guix system container script with network sharing enabled, on a foreign distro (Debian 10) with nscd running.

Using `strace -f /gnu/store/...-run-container`, we can observe the container's lookup of user accounts via the foreign distro's nscd socket:

[pid 16582] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 11
[pid 16582] connect(11, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 16582] sendto(11, "\2\0\0\0\0\0\0\0\t\0\0\0postgres\0", 21, MSG_NOSIGNAL, NULL, 0) = 21
[pid 16582] poll([{fd=11, events=POLLIN|POLLERR|POLLHUP}], 1, 5000) = 1 ([{fd=11, revents=POLLIN}])
[pid 16582] read(11, "\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0"..., 36) = 36
[pid 16582] close(11) = 0

Since the user ("postgres") is indeed missing in the foreign distro, the lookup fails. In this case, disabling nscd on the foreign distro allowed the container script to run without error.
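
To make the trace easier to read, here is a minimal Python sketch that decodes the bytes from the `sendto` and `read` calls above. It assumes a little-endian host and my reading of glibc's nscd client protocol: a request header of three 32-bit integers (protocol version, request type, key length) followed by the key, and a passwd reply that begins with a version and a "found" flag. Treat the field names and the function name `decode_request` as assumptions rather than an authoritative description of the protocol.

```python
import struct

# Bytes copied from the strace output (octal escapes rewritten as hex).
REQUEST = b"\x02\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00postgres\x00"
# Only the first 32 of the 36 reply bytes appear in the trace.
REPLY_PREFIX = b"\x02\x00\x00\x00" + b"\x00" * 12 + b"\xff" * 8 + b"\x00" * 8

def decode_request(data):
    """Split an nscd request into (version, request type, key length, key)."""
    # Assumed layout: three little-endian int32 fields, then the key bytes.
    version, req_type, key_len = struct.unpack_from("<iii", data, 0)
    key = data[12:12 + key_len].rstrip(b"\0").decode()
    return version, req_type, key_len, key

version, req_type, key_len, key = decode_request(REQUEST)
print("request:", version, req_type, key_len, key)

# Assumed reply layout: int32 version, then int32 'found' (0 = no such entry).
reply_version, found = struct.unpack_from("<ii", REPLY_PREFIX, 0)
print("reply:", reply_version, found)
```

Under those assumptions the request is a passwd-by-name lookup for "postgres" (header plus a 9-byte key, matching the 21 bytes sent), and the reply's second field is 0, which is consistent with the host's nscd reporting that the user does not exist.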

Based on comments in https://issues.guix.info/issue/28128, I see that it was a deliberate choice to bind-mount the foreign distro's nscd socket inside the container (instead of starting a separate containerized nscd instance). But I'm having trouble seeing why it's acceptable to leak state from the foreign distro's user space into the container. Is there something I'm missing?

Cheers,

Jason