[shepherd] ntpd throws shepherd out of the loop

  • Done
Details
4 participants
  • Liliana Marie Prikler
  • Ludovic Courtès
  • Csepp
  • Timotej Lazar
Owner
unassigned
Submitted by
Liliana Marie Prikler
Severity
normal
Liliana Marie Prikler wrote on 15 Aug 2023 07:18
(address . bug-guix@gnu.org)
464a337f57514620ffcae1e426a376c6d034a4e8.camel@gmail.com
Hi Guix,

I have a laptop that's a little stuck in the past… more accurately
January of 2020 thanks to what I believe to be an empty CMOS battery.
As of recently (maybe it dates back longer, but I first experienced it
two weeks ago and just now got to debugging it a little), Shepherd gets
stuck at 100% CPU usage "early" on first boot. I can prevent this
issue by getting the system time "close enough" to the actual time
before the NTP sync, but see the first sentence. Not having a network
connection also works, but that's somewhat impractical. Also, the high
CPU usage still occurs if a sync is done later. I have yet to
encounter the bug post hibernation, but I also wish not to. There
doesn't appear to be anything particularly interesting in the logs
either.

Cheers
Csepp wrote
(name . Liliana Marie Prikler)(address . liliana.prikler@gmail.com)
87ttt0mu4g.fsf@riseup.net
Liliana Marie Prikler <liliana.prikler@gmail.com> writes:

> Hi Guix,
>
> I have a laptop that's a little stuck in the past… more accurately
> January of 2020 thanks to what I believe to be an empty CMOS battery.
> As of recently (maybe it dates back longer, but I first experienced it
> two weeks ago and just now got to debugging it a little), Shepherd gets
> stuck at 100% CPU usage "early" on first boot. I can prevent this
> issue by getting the system time "close enough" to the actual time
> before the NTP sync, but see the first sentence. Not having a network
> connection also works, but that's somewhat impractical. Also, the high
> CPU usage still occurs if a sync is done later. I have yet to
> encounter the bug post hibernation, but I also wish not to. There
> doesn't appear to be anything particularly interesting in the logs
> either.
>
> Cheers

This sounds like an issue with slow incremental system time updates,
although I don't understand why that would cause Shepherd to hang. Maybe
the NTP service is configured to report itself as initialized only once
it has finished synchronizing, which would defeat the point of
incremental updating.
There is probably a config setting to tell ntpd to perform the update in
a single step; at least I know chrony has one.

PS: don't wait until the battery starts leaking to replace it.
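
A minimal sketch of what such a setting looks like on Guix System, assuming the stock ntp-service-type is in use. The allow-large-adjustment? field of ntp-configuration controls whether ntpd may make an initial adjustment of more than 1,000 seconds. It defaults to #t, so the snippet only makes the default explicit and may well have no effect on the hang described in this report.

```scheme
(use-modules (gnu services)
             (gnu services networking))

;; Goes into the `services' field of the operating-system declaration.
(service ntp-service-type
         (ntp-configuration
          ;; Permit ntpd to make an initial adjustment of more than
          ;; 1,000 seconds (this is already the default).
          (allow-large-adjustment? #t)))
```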
Timotej Lazar wrote on 15 Aug 2023 16:27
87h6p0bcty.fsf@araneo.si
Liliana Marie Prikler <liliana.prikler@gmail.com> [2023-08-15 07:18:02+0200]:
> As of recently (maybe it dates back longer, but I first experienced it
> two weeks ago and just now got to debugging it a little), Shepherd gets
> stuck at 100% CPU usage "early" on first boot.

I have this issue on all Guix systems without a (working) RTC. It seems
to be caused by a recentish update to guile-fibers:

https://github.com/wingo/fibers/issues/89

For me this happens regardless of whether the system time is pushed
forward manually or by ntpd. Depending on the time delta and CPU speed,
the usage returns to normal after a couple of days. During that time any
socket-activated services like SSH are also unreachable.
Ludovic Courtès wrote on 2 Sep 2023 22:44
(name . Timotej Lazar)(address . timotej.lazar@araneo.si)
87fs3we25o.fsf@gnu.org
Hi,

Timotej Lazar <timotej.lazar@araneo.si> skribis:

> Liliana Marie Prikler <liliana.prikler@gmail.com> [2023-08-15 07:18:02+0200]:
>> As of recently (maybe it dates back longer, but I first experienced it
>> two weeks ago and just now got to debugging it a little), Shepherd gets
>> stuck at 100% CPU usage "early" on first boot.
>
> I have this issue on all Guix systems without a (working) RTC. It seems
> to be caused by a recentish update to guile-fibers:
>
> https://github.com/wingo/fibers/issues/89

Yeah, that’s the one.

Liliana, Timotej: could you try the Guix patch I posted at
<https://issues.guix.gnu.org/64966>?

Thanks,
Ludo’.
Liliana Marie Prikler wrote on 2 Sep 2023 23:41
(address . 65306@debbugs.gnu.org)
dd9e2da5bb5502e77bb7ed4b5538b6de2e4d8a3d.camel@gmail.com
Am Samstag, dem 02.09.2023 um 22:44 +0200 schrieb Ludovic Courtès:
> Hi,
>
> Timotej Lazar <timotej.lazar@araneo.si> skribis:
>
> > Liliana Marie Prikler <liliana.prikler@gmail.com> [2023-08-15
> > 07:18:02+0200]:
> > > As of recently (maybe it dates back longer, but I first
> > > experienced it two weeks ago and just now got to debugging it a
> > > little), Shepherd gets stuck at 100% CPU usage "early" on first
> > > boot.
> >
> > I have this issue on all Guix systems without a (working) RTC. It
> > seems to be caused by a recentish update to guile-fibers:
> >
> > https://github.com/wingo/fibers/issues/89
>
> Yeah, that’s the one.
>
> Liliana, Timotej: could you try the Guix patch I posted at
> <https://issues.guix.gnu.org/64966>?

Do we have a guide on how to swap out shepherd from the config.scm?
The machine that experiences this fault isn't set up for Guix hacking.

Cheers
Ludovic Courtès wrote on 3 Sep 2023 21:58
(name . Liliana Marie Prikler)(address . liliana.prikler@gmail.com)
87wmx7c9m7.fsf@gnu.org
Hi,

Liliana Marie Prikler <liliana.prikler@gmail.com> skribis:

> Am Samstag, dem 02.09.2023 um 22:44 +0200 schrieb Ludovic Courtès:

[...]

>> Liliana, Timotej: could you try the Guix patch I posted at
>> <https://issues.guix.gnu.org/64966>?
> Do we have a guide on how to swap out shepherd from the config.scm?
> The machine that experiences this fault isn't set up for Guix hacking.

You can do something like this in your OS config:

(essential-services
 (modify-services (operating-system-default-essential-services
                   this-operating-system)
   (shepherd-root-service-type
    config => (shepherd-configuration
               (shepherd insert-custom-shepherd-here)))))

(Initially mentioned at

HTH!

Ludo’.
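
For reference, a sketch of where that fragment might sit in a complete config.scm. The name my-shepherd is a hypothetical placeholder, bound here to the stock shepherd package from (gnu packages admin); it stands for whatever patched package is to be tested. The remaining operating-system fields are elided and must come from the existing configuration, so this is a fragment rather than a complete, buildable system definition.

```scheme
(use-modules (gnu)
             (gnu services shepherd)   ;shepherd-root-service-type, shepherd-configuration
             (gnu packages admin))     ;the stock `shepherd' package

;; Hypothetical placeholder: replace the right-hand side with the
;; patched Shepherd package under test.
(define my-shepherd shepherd)

(operating-system
  ;; ... host-name, bootloader, file-systems, services, etc. from the
  ;; existing configuration go here ...
  (essential-services
   (modify-services (operating-system-default-essential-services
                     this-operating-system)
     (shepherd-root-service-type
      config => (shepherd-configuration
                 ;; Keep the other fields of the existing configuration.
                 (inherit config)
                 (shepherd my-shepherd))))))
```

After editing, the system would be rebuilt with `guix system reconfigure` as usual; since the Shepherd runs as PID 1, a reboot is presumably needed before the replacement takes effect.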
Timotej Lazar wrote on 4 Sep 2023 07:46
(name . Ludovic Courtès)(address . ludo@gnu.org)
87r0neh4o4.fsf@araneo.si
Ludovic Courtès <ludo@gnu.org> [2023-09-02 22:44:03+0200]:
> Liliana, Timotej: could you try the Guix patch I posted at
> <https://issues.guix.gnu.org/64966>?

That patch works for my aarch64 board. I encounter the same issue on an
x86_64 system without a functional RTC, but at least now I know how to
apply a workaround. Thanks!
Ludovic Courtès wrote on 8 Sep 2023 18:50
(name . Timotej Lazar)(address . timotej.lazar@araneo.si)
8734zo4njd.fsf@gnu.org
Timotej Lazar <timotej.lazar@araneo.si> skribis:

> Ludovic Courtès <ludo@gnu.org> [2023-09-02 22:44:03+0200]:
>> Liliana, Timotej: could you try the Guix patch I posted at
>> <https://issues.guix.gnu.org/64966>?
>
> That patch works for my aarch64 board. I encounter the same issue on an
> x86_64 system without a functional RTC, but at least now I know how to
> apply a workaround. Thanks!

Right. I’ve committed a variant of this patch (will push shortly).

Thanks for testing!

Ludo’.
Closed