On WSL1/2, guix install: error: cannot kill processes for uid `999': failed with exit code 1

  • Open
Details
2 participants
  • Sarah Morgensen
  • Anadon
Owner
unassigned
Submitted by
Anadon
Severity
normal
Anadon wrote on 18 Jul 2021 01:30
(address . bug-guix@gnu.org)
CAFkJGRf3j3=bHQ305xGO6pFiF9KRxKUnsdHH7Aib5PVrS+hF_Q@mail.gmail.com
Talking with iskarian on IRC, we've confirmed that on WSL1/2 guix
installs successfully and almost sets up successfully (the init.d script
isn't set up to actually daemonize), but `guix build`, `guix install` and
`guix pull` all fail with "guix <SUB_CMD>: error: cannot kill processes
for uid `998': failed with exit code 1".
Attachment: file
Sarah Morgensen wrote on 18 Jul 2021 06:20
(name . Anadon)(address . joshua.r.marshall.1991@gmail.com)(address . 49613@debbugs.gnu.org)
86lf64tfqu.fsf@mgsn.dev
Hello Guix,

Anadon <joshua.r.marshall.1991@gmail.com> writes:

> Talking with iskarian on IRC, we've confirmed that on WSL1/2 guix
> installs successfully and almost sets up successfully (the init.d script
> isn't set up to actually daemonize), but `guix build`, `guix install` and
> `guix pull` all fail with "guix <SUB_CMD>: error: cannot kill processes
> for uid `998': failed with exit code 1".

I've investigated this a bit, and it seems to be an issue with the return
code from `kill` when that user owns no other processes to kill. If I
set a process to continually spawn with uid 998, I no longer encounter
the above error. Likewise, if there is a zombie process under that uid,
the error does not appear until I manually kill the zombie. This is on
WSL1; I do not know if this workaround also applies to WSL2.

For reference, the relevant portion of `nix/libutil/util.cc`:

Pid pid = startProcess([&]() {

    if (setuid(uid) == -1)
        throw SysError("setting uid");

    while (true) {
#ifdef __APPLE__
        /* OSX's kill syscall takes a third parameter that, among
           other things, determines if kill(-1, signo) affects the
           calling process. In the OSX libc, it's set to true,
           which means "follow POSIX", which we don't want here
         */
        if (syscall(SYS_kill, -1, SIGKILL, false) == 0) break;
#elif __GNU__
        /* Killing all a user's processes using PID=-1 does currently
           not work on the Hurd. */
        if (kill(getpid(), SIGKILL) == 0) break;
#else
        if (kill(-1, SIGKILL) == 0) break;
#endif
        if (errno == ESRCH) break; /* no more processes */
        if (errno != EINTR)
            throw SysError(format("cannot kill processes for uid `%1%'") % uid);
    }

    _exit(0);
});

int status = pid.wait(true);
#if __GNU__
/* When the child killed itself, status = SIGKILL. */
if (status == SIGKILL) return;
#endif
if (status != 0)
    throw Error(format("cannot kill processes for uid `%1%': %2%") % uid % statusToString(status));
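
To make the wording of the error easier to interpret, here is a minimal sketch of how a raw wait status can be decoded. It is not the Guix/Nix `statusToString`; `describe_wait_status` and `run_child_and_wait` are hypothetical names chosen for illustration, and the message strings only imitate the style of the error quoted in this report. Under that assumption, "failed with exit code 1" would mean the child exited on its own with code 1 rather than being killed by a signal.

```cpp
#include <cerrno>
#include <signal.h>
#include <stdexcept>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not the Guix/Nix function): describe a raw wait
// status in the same style as the error message quoted in this report.
std::string describe_wait_status(int status)
{
    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        if (code == 0) return "succeeded";
        return "failed with exit code " + std::to_string(code);
    }
    if (WIFSIGNALED(status))
        return "failed due to signal " + std::to_string(WTERMSIG(status));
    return "died abnormally";
}

// Hypothetical helper: run `body` in a forked child and return the raw
// wait status. The child exits with 0 if `body` returns normally.
template <typename F>
int run_child_and_wait(F body)
{
    pid_t pid = fork();
    if (pid == -1) throw std::runtime_error("fork failed");
    if (pid == 0) {
        body();
        _exit(0);
    }
    int status = 0;
    // Retry if waitpid is interrupted by a signal.
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) throw std::runtime_error("waitpid failed");
    }
    return status;
}
```

For example, `describe_wait_status(run_child_and_wait([] { _exit(1); }))` yields a "failed with exit code" message, while a child that calls `kill(getpid(), SIGKILL)` yields a "failed due to signal" message instead.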

Perhaps the way WSL handles the return code for `kill` is not as
expected? On a cursory inspection, though, the relevant parts of WSL2's
kernel/signal.c seem the same as the vanilla Linux kernel...

Or perhaps for some reason `kill(-1, SIGKILL)` under WSL is attempting
to kill the calling process (why?) and failing, therefore returning an
error.

Note that the error code 1 reported by Guix does not seem to be the
actual errno reported by `kill`.
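
One way to see the real errno would be to have the child hand it back over a pipe instead of relying on the exit status. The following is only a sketch under assumptions: `child_errno_of` is a hypothetical helper, not part of Guix or Nix, and it assumes the exit status alone cannot carry the errno value (which matches the observation above, but I have not confirmed where the 1 comes from).

```cpp
#include <cerrno>
#include <functional>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `action` in a forked child and return the errno
// it left behind (0 if `action` returned 0, -1 if the child died before it
// could report). The value travels over a pipe, so it is not lost when the
// exit status can only say "1".
int child_errno_of(const std::function<int()> &action)
{
    int fds[2];
    if (pipe(fds) == -1) throw std::runtime_error("pipe failed");

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        throw std::runtime_error("fork failed");
    }

    if (pid == 0) {
        // Child: run the action and report its errno to the parent.
        close(fds[0]);
        errno = 0;
        int rc = action();
        int err = (rc == 0) ? 0 : errno;
        ssize_t n = write(fds[1], &err, sizeof err);
        _exit((n == (ssize_t) sizeof err && err == 0) ? 0 : 1);
    }

    // Parent: read the reported errno, then reap the child.
    close(fds[1]);
    int err = -1;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n == -1 && errno == EINTR);
    if (n != (ssize_t) sizeof err) err = -1;
    close(fds[0]);

    int status = 0;
    while (waitpid(pid, &status, 0) == -1 && errno == EINTR) {}
    return err;
}
```

To probe the case in this report, the action would call `setuid(998)` and then `kill(-1, SIGKILL)`, run as root (this also needs `<signal.h>`). Be careful: that kills every process owned by that uid, so it should only be tried on a throwaway WSL instance.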

I've seen Guix working on WSL2 in the wild before [0] so this is a
really odd error. I'm stumped.


--
Sarah