On WSL1/2, guix install: error: cannot kill processes for uid `999': failed with exit code 1

Status: Open
Participants: Sarah Morgensen, Anadon
Owner: unassigned
Submitted by: Anadon
Severity: normal
Anadon wrote on 18 Jul 2021 01:30
(address . bug-guix@gnu.org)
CAFkJGRf3j3=bHQ305xGO6pFiF9KRxKUnsdHH7Aib5PVrS+hF_Q@mail.gmail.com
Talking with iskarian on IRC, we've confirmed that on WSL1/2 Guix installs
successfully and almost sets up successfully (the init.d script isn't set up
to actually daemonize), but `guix build`, `guix install` and `guix pull` all
fail with "guix <SUB_CMD>: error: cannot kill processes for uid `998': failed
with exit code 1".
Sarah Morgensen wrote on 18 Jul 2021 06:20
(name . Anadon)(address . joshua.r.marshall.1991@gmail.com)(address . 49613@debbugs.gnu.org)
86lf64tfqu.fsf@mgsn.dev
Hello Guix,

Anadon <joshua.r.marshall.1991@gmail.com> writes:

> Talking with iskarian on IRC, we've confirmed that guix successfully
> installs, almost successfully sets up (the init.d isn't set up to
> actually daemonize), but for `guix build`, `guix install` and `guix
> pull` all fail with "guix <SUB_CMD>: error: cannot kill processes for
> uid `998': failed with exit code 1" when using WSL1/2.

I've investigated this a bit and it seems to be an issue with the return
code from `kill` when no other processes owned by that user exist to
kill. If I set a process to continually spawn with uid 998, I no longer
encounter the above error. Also, if there is a zombie process under that
uid, I no longer encounter the above error until I manually kill it.
This is on WSL1; I do not know if this technique also applies to WSL2.
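
A minimal sketch for checking this directly, outside of the daemon. It
assumes a hypothetical helper `probe_kill_all` (not part of Guix) that
forks, switches to the given uid, calls `kill(-1, sig)` and sends the
return value and errno back to the parent over a pipe. Comparing its
output on WSL and on a regular Linux system should show whether `kill`
itself behaves differently. Passing signal 0 performs only the
permission and existence checks, so it is safe as a dry run. Passing
SIGKILL really does kill every process of the target uid, so it should
only be used as root with a build user's uid.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Result of one probe. All names here are illustrative, not Guix code.
struct KillProbe {
    int rc;    // return value of kill(-1, sig); -1 if kill() failed or was never reached
    int err;   // errno of the failing call (setuid, kill, pipe or fork); 0 on success
    bool ran;  // true if the child got as far as calling kill() and reporting back
};

KillProbe probe_kill_all(uid_t uid, int sig) {
    KillProbe result{-1, 0, false};

    int fds[2];
    if (pipe(fds) == -1) {
        result.err = errno;
        return result;
    }

    pid_t pid = fork();
    if (pid == -1) {
        result.err = errno;
        close(fds[0]);
        close(fds[1]);
        return result;
    }

    if (pid == 0) {
        // Child: drop to the target uid, then signal "everything I may signal".
        close(fds[0]);
        KillProbe child{-1, 0, false};
        if (setuid(uid) == -1) {
            child.err = errno;
        } else {
            errno = 0;
            child.rc = kill(-1, sig);
            child.err = (child.rc == 0) ? 0 : errno;
            child.ran = true;
        }
        // If kill(-1, SIGKILL) also hit this child, this line is never reached
        // and the parent sees ran == false.
        ssize_t ignored = write(fds[1], &child, sizeof child);
        (void) ignored;
        _exit(0);
    }

    // Parent: collect the child's report, if it survived long enough to send one.
    close(fds[1]);
    KillProbe fromChild{-1, 0, false};
    ssize_t n = read(fds[0], &fromChild, sizeof fromChild);
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);

    if (n == static_cast<ssize_t>(sizeof fromChild))
        result = fromChild;
    return result;
}
```

Run as root with the uid from the error message, `ran == false` would
suggest the child was itself killed, while `ran == true` with a non-zero
`err` gives the errno that `kill` actually reported (ESRCH is the only
failure the quoted loop tolerates besides EINTR).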

For reference, the relevant portion of `nix/libutil/util.cc`:

    Pid pid = startProcess([&]() {

        if (setuid(uid) == -1)
            throw SysError("setting uid");

        while (true) {
#ifdef __APPLE__
            /* OSX's kill syscall takes a third parameter that, among
               other things, determines if kill(-1, signo) affects the
               calling process. In the OSX libc, it's set to true,
               which means "follow POSIX", which we don't want here
             */
            if (syscall(SYS_kill, -1, SIGKILL, false) == 0) break;
#elif __GNU__
            /* Killing all a user's processes using PID=-1 does currently
               not work on the Hurd.  */
            if (kill(getpid(), SIGKILL) == 0) break;
#else
            if (kill(-1, SIGKILL) == 0) break;
#endif
            if (errno == ESRCH) break; /* no more processes */
            if (errno != EINTR)
                throw SysError(format("cannot kill processes for uid `%1%'") % uid);
        }

        _exit(0);
    });

    int status = pid.wait(true);
#if __GNU__
    /* When the child killed itself, status = SIGKILL. */
    if (status == SIGKILL) return;
#endif
    if (status != 0)
        throw Error(format("cannot kill processes for uid `%1%': %2%") % uid % statusToString(status));

Perhaps the way WSL handles the return code for `kill` is not as
expected? On a cursory inspection, though, the relevant parts of WSL2's
kernel/signal.c seem the same as the vanilla Linux kernel...

Or perhaps for some reason `kill(-1, SIGKILL)` under WSL is attempting
to kill the calling process (why?) and failing, therefore returning an
error.

Note that the error code 1 reported by Guix does not seem to be the
actual errno reported by `kill`.
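
One possible reading, sketched below: the "1" in the message comes from
the child's wait status rather than from errno. This assumes that when
the lambda passed to `startProcess` throws, the child ends up exiting
with a generic status of 1 (an assumption, not checked against the Guix
source here). In that case the parent only ever sees "exit code 1",
whatever errno `kill` set inside the child. The helpers
`describe_wait_status` and `wait_status_of_child_exiting_with` are
hypothetical names for illustration and are not Guix's `statusToString`.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Turn a raw waitpid() status into text (illustrative, not Guix code).
std::string describe_wait_status(int status) {
    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        if (code == 0) return "succeeded";
        return "failed with exit code " + std::to_string(code);
    }
    if (WIFSIGNALED(status))
        return "killed by signal " + std::to_string(WTERMSIG(status));
    return "died abnormally";
}

// Fork a child that exits immediately with `code` and return the raw
// wait status seen by the parent, or -1 if fork() or waitpid() failed.
int wait_status_of_child_exiting_with(int code) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) _exit(code);  // stands in for a child that failed for any reason

    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return status;
}
```

Here `describe_wait_status(wait_status_of_child_exiting_with(1))` yields
"failed with exit code 1" no matter why the child chose to exit with 1,
which is why the real errno has to be captured inside the child itself.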

I've seen Guix working on WSL2 in the wild before [0] so this is a
really odd error. I'm stumped.


--
Sarah