Hello Guix,
Anadon <joshua.r.marshall.1991@gmail.com> writes:
> Talking with iskarian on IRC, we've confirmed that guix successfully
> installs, almost successfully sets up (the init.d isn't set up to
> actually daemonize), but `guix build`, `guix install` and `guix
> pull` all fail with "guix <SUB_CMD>: error: cannot kill processes for
> uid `998': failed with exit code 1" when using WSL1/2.
I've investigated this a bit, and it seems to be an issue with the return
value of `kill` when there are no other processes owned by that user left
to kill. If I keep a process continually respawning under uid 998, I no
longer encounter the above error. Also, if there is a zombie process under
that uid, the error does not appear until I manually kill it.
This is on WSL1; I do not know whether this technique also applies to WSL2.
For reference, the relevant portion of `nix/libutil/util.cc`:
Pid pid = startProcess([&]() {
    if (setuid(uid) == -1)
        throw SysError("setting uid");
    while (true) {
#ifdef __APPLE__
        /* OSX's kill syscall takes a third parameter that, among
           other things, determines if kill(-1, signo) affects the
           calling process. In the OSX libc, it's set to true,
           which means "follow POSIX", which we don't want here
         */
        if (syscall(SYS_kill, -1, SIGKILL, false) == 0) break;
#elif __GNU__
        /* Killing all a user's processes using PID=-1 does currently
           not work on the Hurd. */
        if (kill(getpid(), SIGKILL) == 0) break;
#else
        if (kill(-1, SIGKILL) == 0) break;
#endif
        if (errno == ESRCH) break; /* no more processes */
        if (errno != EINTR)
            throw SysError(format("cannot kill processes for uid `%1%'") % uid);
    }
    _exit(0);
});
int status = pid.wait(true);
#if __GNU__
/* When the child killed itself, status = SIGKILL. */
if (status == SIGKILL) return;
#endif
if (status != 0)
    throw Error(format("cannot kill processes for uid `%1%': %2%") % uid % statusToString(status));
Perhaps WSL's handling of the return value of `kill` is not what Guix
expects? On a cursory inspection, though, the relevant parts of WSL2's
kernel/signal.c seem to be the same as in the vanilla Linux kernel...
Or perhaps, for some reason, `kill(-1, SIGKILL)` under WSL attempts to
kill the calling process (why?), fails, and therefore returns an error.
Note that the exit code 1 reported by Guix does not seem to be the
actual errno returned by `kill`.
I've seen Guix working on WSL2 in the wild before [0], so this is a
really odd error. I'm stumped.
--
Sarah