shepherd freezes if wireguard is started with dns config enabled

Status: Done
Participants: Ludovic Courtès, Nathan Dehnel
Owner: unassigned
Submitted by: Nathan Dehnel
Severity: important
Nathan Dehnel wrote on 13 Jan 2022 01:27
(name . Tobias Geerinckx-Rice via Bug reports for GNU Guix)(address . bug-guix@gnu.org)
CAEEhgEt5N0T+Bja2KPdSYxnZaGCR_z0L8qOpQPt4H00bx3=O5w@mail.gmail.com
When dns is specified, wireguard runs wg-quick, which runs resolvconf,
which runs /run/current-system/profile/bin/herd restart, which causes
shepherd to freeze because I guess it doesn't like being given
multiple start commands at once. I'm not sure how to fix it.
Ludovic Courtès wrote on 13 Jan 2022 16:11
(name . Nathan Dehnel)(address . ncdehnel@gmail.com)(address . 53225@debbugs.gnu.org)
87pmov7jrr.fsf@gnu.org
Hi,

Nathan Dehnel <ncdehnel@gmail.com> skribis:

> When dns is specified, wireguard runs wg-quick, which runs resolvconf,
> which runs /run/current-system/profile/bin/herd restart, which causes
> shepherd to freeze because I guess it doesn't like being given
> multiple start commands at once. I'm not sure how to fix it.

What do you mean by “freezing”? Do ‘herd status’ and similar commands
block forever? Or is it something else?

Requests in the Shepherd are currently handled sequentially. So if you
issue several ‘herd restart’ commands, they’ll be processed one at a
time. This is usually okay because ‘start’ commands are expected to be
quick (just wait for the daemon to write its PID file or similar).

Thanks,
Ludo’.
Nathan Dehnel wrote on 13 Jan 2022 23:41
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 53225@debbugs.gnu.org)
CAEEhgEv+MriPS8SwUWXV_qg8UQJrVeTYBkYkOHmDWi5OaQWy4A@mail.gmail.com
> What do you mean by “freezing”? Do ‘herd status’ and similar commands
> block forever?

Yes

> Requests in the Shepherd are currently handled sequentially. So if you
> issue several ‘herd restart’ commands, they’ll be processed one at a
> time. This is usually okay because ‘start’ commands are expected to be
> quick (just wait for the daemon to write its PID file or similar).

What is the nature of this serialization? Does wireguard need to
finish before resolvconf can start? Because that's probably the issue.


Ludovic Courtès wrote on 17 Jan 2022 14:48
(name . Nathan Dehnel)(address . ncdehnel@gmail.com)(address . 53225@debbugs.gnu.org)
87y23eo4lj.fsf@gnu.org
Hi,

Nathan Dehnel <ncdehnel@gmail.com> skribis:

>> What do you mean by “freezing”? Do ‘herd status’ and similar commands
>> block forever?
>
> Yes
>
>> Requests in the Shepherd are currently handled sequentially. So if you
>> issue several ‘herd restart’ commands, they’ll be processed one at a
>> time. This is usually okay because ‘start’ commands are expected to be
>> quick (just wait for the daemon to write its PID file or similar).
>
> What is the nature of this serialization? Does wireguard need to
> finish before resolvconf can start? Because that's probably the issue.

One command sent to shepherd by ‘herd …’ must have completed before the
next one is processed.

You can experience it like this:

sudo herd eval root '(sleep 3)' & echo status && sudo herd status

Here the first ‘herd’ command has shepherd block for 3 seconds, so the
second ‘herd’ command won’t print anything until 3 seconds have passed.

HTH,
Ludo’.
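
The serialization described in the message above can be illustrated with a minimal Python sketch. It is only an analogy and not Shepherd code: `serve_sequentially`, `slow_eval`, and `status` are hypothetical names, and a 0.3-second delay stands in for the 3-second sleep of the ‘herd eval’ command.

```python
import time
from collections import deque


def serve_sequentially(requests):
    """Handle (name, handler) requests one at a time.

    Returns a list of (name, seconds_since_start) recorded when each
    request finished.
    """
    start = time.monotonic()
    queue = deque(requests)
    finished = []
    while queue:
        name, handler = queue.popleft()
        handler()  # the loop cannot pick up the next request until this returns
        finished.append((name, time.monotonic() - start))
    return finished


def slow_eval():
    # Stand-in for: herd eval root '(sleep 3)' (shortened to 0.3 s here).
    time.sleep(0.3)


def status():
    # Stand-in for: herd status (does no real work in this sketch).
    pass


if __name__ == "__main__":
    for name, elapsed in serve_sequentially([("eval", slow_eval), ("status", status)]):
        print(f"{name} finished after {elapsed:.1f}s")
```

Even though `status` does nothing, it only completes after `slow_eval` has returned, which is the behaviour the two ‘herd’ commands show.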
Nathan Dehnel wrote on 2 Jun 2022 00:56
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 53225@debbugs.gnu.org)
CAEEhgEvwfB49rpxW-w_hj+xD5MVy-OkT_AmHThNXK1dYW34Fyw@mail.gmail.com
Just tested and Shepherd 0.9 does not fix this issue.

Ludovic Courtès wrote on 2 Jun 2022 15:38
(name . Nathan Dehnel)(address . ncdehnel@gmail.com)(address . 53225@debbugs.gnu.org)
87o7zbp4lr.fsf@gnu.org
Hi Nathan,

Nathan Dehnel <ncdehnel@gmail.com> skribis:

> Just tested and Shepherd 0.9 does not fix this issue.

Could you be more specific? Specifically, could you share
/var/log/messages for the parts related to Wireguard?

> On Mon, Jan 17, 2022 at 7:48 AM Ludovic Courtès <ludo@gnu.org> wrote:

[...]

>> One command sent to shepherd by ‘herd …’ must have completed before the
>> next one is processed.
>>
>> You can experience it like this:
>>
>> sudo herd eval root '(sleep 3)' & echo status && sudo herd status
>>
>> Here the first ‘herd’ command has shepherd block for 3 seconds, so the
>> second ‘herd’ command won’t print anything until 3 seconds have passed.

This is actually still the case with 0.9, because here we’re calling
(@ (guile) sleep), which blocks. So… not a good example.

The short story is: it is still possible to write code that blocks
shepherd, as with the ‘sleep’ example above. However, the standard
service constructors/destructors no longer block, and shepherd can serve
multiple clients concurrently.

Ludo’.
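
The distinction drawn above, between code that blocks the whole daemon and code that lets other clients be served, can be sketched with Python's asyncio. This is an analogy under stated assumptions, not how Shepherd is implemented: `blocking_action`, `cooperative_action`, and `status_delay` are hypothetical names, and the 0.3-second delays are arbitrary.

```python
import asyncio
import time


async def blocking_action():
    # A plain blocking sleep: the event loop can run nothing else meanwhile,
    # comparable to calling (@ (guile) sleep) inside shepherd.
    time.sleep(0.3)


async def cooperative_action():
    # Yields to the event loop while waiting, so other clients are served.
    await asyncio.sleep(0.3)


async def status_delay(action):
    """Return how long a second 'status' client waits while `action` runs."""
    begin = time.monotonic()

    async def status():
        return time.monotonic() - begin

    first = asyncio.create_task(action())   # scheduled first, like the first client
    second = asyncio.create_task(status())  # the second client
    delay = await second
    await first
    return delay


if __name__ == "__main__":
    print("blocking:   ", round(asyncio.run(status_delay(blocking_action)), 2))
    print("cooperative:", round(asyncio.run(status_delay(cooperative_action)), 2))
```

With the blocking action the status client waits for the full sleep; with the cooperative one it is answered almost immediately.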
Nathan Dehnel wrote on 9 Jun 2022 01:23
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 53225@debbugs.gnu.org)
CAEEhgEua1cgU0m7VBYOsePrVc6kSXaGt7AGj8VKvO1P57kK2mw@mail.gmail.com
>Could you be more specific? Specifically, could you share
>/var/log/messages for the parts related to Wireguard?

root@guixtest ~# cat /var/log/messages | grep -i wireguard
Jun 8 18:20:07 localhost vmunix: [ 6.330271] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
Jun 8 18:20:07 localhost vmunix: [ 6.330276] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.

>However, the standard
>service constructors/destructors no longer block, and shepherd can serve
>multiple clients concurrently.

I don't know, I guess wireguard uses "non-standard" constructors.

Ludovic Courtès wrote on 9 Jun 2022 17:05
(name . Nathan Dehnel)(address . ncdehnel@gmail.com)(address . 53225@debbugs.gnu.org)
87tu8t3mjj.fsf@gnu.org
Hi Nathan,

Nathan Dehnel <ncdehnel@gmail.com> skribis:

>> Could you be more specific? Specifically, could you share
>> /var/log/messages for the parts related to Wireguard?
>
> root@guixtest ~# cat /var/log/messages | grep -i wireguard
> Jun 8 18:20:07 localhost vmunix: [ 6.330271] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
> Jun 8 18:20:07 localhost vmunix: [ 6.330276] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.

There should be lines like:

shepherd[1]: Service 'wireguard-XXX' has been started.

Perhaps they’ve been moved to a different file due to log rotation?

Without these, I cannot tell what happened.

>>However, the standard
>>service constructors/destructors no longer block, and shepherd can serve
>>multiple clients concurrently.
>
> I don't know, I guess wireguard uses "non-standard" constructors.

Indeed, it invokes ‘wg-quick up’ and waits for completion.

I suppose that command blocks until it has set up the VPN, right?

If so, we’ll need to rewrite it differently.

Thanks,
Ludo’.
Nathan Dehnel wrote on 9 Jun 2022 17:49
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 53225@debbugs.gnu.org)
CAEEhgEtUOiw4UyZn3_40LPrGBOTPsK36E75YL=6Fw7gsW15VzQ@mail.gmail.com
> There should be lines like:
>
> shepherd[1]: Service 'wireguard-XXX' has been started.
>
> Perhaps they’ve been moved to a different file due to log rotation?
>
> Without these, I cannot tell what happened.

I tried it again and found this:

Jun 9 10:47:44 localhost vmunix: [ 6.497581] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
Jun 9 10:47:44 localhost vmunix: [ 6.497584] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
Jun 9 10:47:44 localhost shepherd[1]: Failed to start wireguard-test in the background.

Ludovic Courtès wrote on 9 Jun 2022 22:15
(name . Nathan Dehnel)(address . ncdehnel@gmail.com)(address . 53225@debbugs.gnu.org)
87k09p386t.fsf@gnu.org
Hi,

Nathan Dehnel <ncdehnel@gmail.com> skribis:

> I tried it again and found this
> Jun 9 10:47:44 localhost vmunix: [ 6.497581] wireguard: WireGuard
> 1.0.0 loaded. See www.wireguard.com for information.
> Jun 9 10:47:44 localhost vmunix: [ 6.497584] wireguard: Copyright
> (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights
> Reserved.
> Jun 9 10:47:44 localhost shepherd[1]: Failed to start wireguard-test
> in the background.

Could you provide me (privately if you prefer) the /var/log/messages
sequence starting from boot (the line that reads “syslogd (GNU inetutils
2.0): restart”) up to the last line above?

Thanks in advance,
Ludo’.
Ludovic Courtès wrote on 13 Jun 2022 11:31
(name . Nathan Dehnel)(address . ncdehnel@gmail.com)
87k09kucyh.fsf@gnu.org
Hi,

The /var/log/messages excerpt you sent me has nothing beyond:

> Jun 11 11:43:33 localhost shepherd[1]: Service networking has been started.

[…]

> Jun 11 11:43:33 localhost vmunix: [ 5.552395] wireguard: WireGuard
> 1.0.0 loaded. See www.wireguard.com for information.
> Jun 11 11:43:33 localhost vmunix: [ 5.552398] wireguard: Copyright
> (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights
> Reserved.
> Jun 11 11:43:33 localhost shepherd[1]: Failed to start wireguard-test
> in the background.

That there’s not a single error message from wireguard doesn’t help.

Mathieu, Guillaume: any idea what might prevent the wireguard Shepherd
service from starting, or how we could gather debugging info?


Ludo’.
L
control message for bug #53225
(address . control@debbugs.gnu.org)
87fseocaic.fsf@gnu.org
severity 53225 important
quit
Ludovic Courtès wrote on 12 Nov 2022 19:10
Re: bug#58926: Shepherd becomes unresponsive after an interrupt
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
878rkgcabz.fsf@gnu.org
Mathieu Othacehe <othacehe@gnu.org> skribis:

> 1. On my laptop with a Wireguard service trying to reach a non-existing
> DNS server.
>
> (service wireguard-service-type
>          (wireguard-configuration
>           (addresses (list "10.0.0.2/24"))
>           (dns '("10.0.0.50")) ; does not exist

This one is similar to:

  https://issues.guix.gnu.org/53225
  https://issues.guix.gnu.org/53381

It has to do with the fact that “wg-quick up” blocks until it succeeds
and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
terminates.

The solution will be to use something non-blocking instead of ‘invoke’;
I’m looking into it.

Ludo’.
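
The blocking pattern described in the message above (spawn a child, then sit in a wait call until it exits) and a non-blocking alternative can be sketched in Python. This is a hedged illustration, not the code used by the wireguard service or by the eventual fix: `SLOW_COMMAND`, `start_and_wait`, and `start_without_waiting` are hypothetical names, and a half-second child process stands in for a slow “wg-quick up”.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a slow "wg-quick up": a child that runs for 0.5 s.
SLOW_COMMAND = [sys.executable, "-c", "import time; time.sleep(0.5)"]


def start_and_wait(command):
    """Spawn the child and block until it exits; return the seconds spent."""
    begin = time.monotonic()
    subprocess.run(command, check=True)  # waits for the child to terminate
    return time.monotonic() - begin


def start_without_waiting(command):
    """Spawn the child and return at once; the caller checks on it later."""
    return subprocess.Popen(command)


if __name__ == "__main__":
    print(f"blocking start returned after {start_and_wait(SLOW_COMMAND):.1f}s")
    child = start_without_waiting(SLOW_COMMAND)
    print("child still running right after spawn:", child.poll() is None)
    child.wait()  # reap the child before exiting
```

In the first form the caller can do nothing else while the child runs; in the second it gets control back immediately and can poll the child or be notified of its exit later.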
Ludovic Courtès wrote on 17 Nov 2022 11:23
(name . Mathieu Othacehe)(address . othacehe@gnu.org)
87a64pkhgy.fsf@gnu.org
Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>> 1. On my laptop with a Wireguard service trying to reach a non-existing
>> DNS server.
>>
>> (service wireguard-service-type
>>          (wireguard-configuration
>>           (addresses (list "10.0.0.2/24"))
>>           (dns '("10.0.0.50")) ; does not exist
>
> This one is similar to:
>
> https://issues.guix.gnu.org/53225
> https://issues.guix.gnu.org/53381
>
> It has to do with the fact that “wg-quick up” blocks until it succeeds
> and that ‘invoke’ gets stuck on ‘waitpid’ until the “wg-quick” process
> terminates.
>
> The solution will be to use something non-blocking instead of ‘invoke’;
> I’m looking into it.

This is fixed in Shepherd 0.9.3, which landed in Guix commit
283d7318c5b312d7129adb6dbeea6ad205ce89d1.

As I wrote, I’m not sure whether it fixes the nginx situation since I
could not reproduce it. I’m closing and let’s open a new issue
specifically for nginx if it comes up again with 0.9.3.

Thanks,
Ludo’.
Closed