guix pull hangs on 32 bit

Status: Open
Participants: Maxim Cournoyer, raingloom
Owner: unassigned
Submitted by: raingloom
Severity: normal
raingloom wrote on 15 Apr 2022 02:08
(name . Guix Bugs)(address . bug-guix@gnu.org)
20220415020828.00b7865c@riseup.net
It's been at 67% on guix-packages-base for at least an hour now. The
system itself is responsive and with the swap I gave it, it has more
than enough memory. Htop shows three guile processes at the top of the
list when sorted by CPU%, their states are S, D, D.
Both CPUs are practically idling.
This looks like some kind of lockup to me.

Fresh install based on bare-bones example on a 32 bit netbook, but the
install image used is the latest tagged version, since apparently there
is no 32 bit option for edge.

I also tried pulling using channel-with-substitutes, since I'm not too
keen on locally building everything on such an old machine. Although
Guix itself should frankly not take this long to build if we want to be
competitive with other distros. Anyways, pulling with that in
channels.scm gives a certificate-related error, so that's great: it means
old images can't easily be used for installation.
Maxim Cournoyer wrote on 8 Jun 2022 22:24
(name . raingloom)(address . raingloom@riseup.net)(address . 54944@debbugs.gnu.org)
87czfikio4.fsf@gmail.com
Hi!

Have you been able to reproduce this? If so, could you share the commit
you are starting from and the CPU architecture, so that we may hopefully
reproduce too?

Thanks,

Maxim
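
The details requested above (the starting commit and the CPU architecture) can be gathered in one go. The following is a minimal sketch, not something from the thread: `report_env` is a hypothetical helper name, `uname -m` reports the kernel's machine name (for example `i686` on 32-bit x86), and `guix describe` prints the channels and commits of the currently used Guix.

```shell
#!/bin/sh
# Sketch: collect the machine type and the current Guix commit.
# report_env is a hypothetical helper name, not part of Guix.
report_env() {
    # Kernel machine name, e.g. i686 or x86_64.
    printf 'machine: %s\n' "$(uname -m)"
    if command -v guix >/dev/null 2>&1; then
        # Prints the channel list, including the commit, of the current guix.
        guix describe --format=channels
    else
        echo 'guix: not found in PATH'
    fi
}

# Example use:
# report_env > report.txt
```

The output can be pasted into a reply as is; on a machine without Guix in PATH it only reports the machine type.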
Csepp wrote on 1 Dec 2022 01:56
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 54944@debbugs.gnu.org)
87k03cndha.fsf@riseup.net
CPU architecture is x86; the commit it last happened on is 347733b.
Other possibly relevant factors:
* spinning rust storage
* 1GB RAM
* encrypted BTRFS root
* 4GB (encrypted) swap
* 128MB zswap

The last was not there when I originally submitted the bug.

The swap is relevant because if it's a timing issue it's very possible
some part of the code assumes reads are almost instant, which is not
true with swap, and delaying a read might be exposing a race condition.
(name . Csepp)(address . raingloom@riseup.net)
87pma3x359.fsf@riseup.net
Happening again.
pulled to: 8320c0c
pulled from: 4501a50

Same system.

The system version is from November of last year, because trying to
upgrade takes so damn long and often gets stuck on some package with no
substitute.
So... the situation is not great...
(name . Csepp)(address . raingloom@riseup.net)
87lekrx2m9.fsf@riseup.net
The process status says sleep so it's probably hanging in a syscall?
Maybe a kernel bug?
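
One way to check the "hanging in a syscall" guess is to read the process's entries under `/proc` on Linux. The sketch below is an illustration only: `proc_state` and `proc_report` are hypothetical helper names, and reading `/proc/PID/syscall` may require root or ownership of the process. A state of `D` means uninterruptible sleep (typically waiting on I/O), while `S` is an ordinary interruptible sleep.

```shell
#!/bin/sh
# Sketch: report why a process is sleeping, using only Linux /proc files.
# proc_state and proc_report are hypothetical helper names.

# Print the one-letter scheduler state of PID (R, S, D, ...).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Print state, kernel wait channel and current syscall info for PID.
proc_report() {
    pid=$1
    printf 'pid:     %s\n' "$pid"
    printf 'state:   %s\n' "$(proc_state "$pid")"
    # wchan names the kernel function the task is blocked in, if any.
    printf 'wchan:   %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
    # When the task is blocked in a system call, the first field of this
    # file is the syscall number; the file may be unreadable without root.
    printf 'syscall: %s\n' "$(cat "/proc/$pid/syscall" 2>/dev/null)"
}

# Example (assumes the processes are named exactly "guile"):
# for pid in $(pgrep -x guile); do proc_report "$pid"; done
```

If the wait channel and syscall stay the same across several runs, that would support the idea that the process is stuck in one kernel call rather than making slow progress.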
Re: bug#54944: guix pull hangs in guix-packages-base.drv even with offloading
(name . Csepp)(address . raingloom@riseup.net)
874jmv2us4.fsf@riseup.net
Happening again with offloading.
This is getting really annoying.
The offload machine is completely idle. There is a Guile process for
guix-packages-base-builder running on it, and it's in sleeping status. It
ran for 17 minutes; now the time is not increasing.
I'm attaching a GDB backtrace of all the threads.
Attachment: gdb.txt
System info:
offloading from: x86, 1 GB RAM, 4 GB swap, 2 cores; user guix commit is
8a47949, system commit is 038981e, commit being pulled is 01d5d68
offloading to: amd64, 8 GB RAM, no swap, 4 cores; guix system commit is
9504dd2c3eef0277369acc0944f87fb4546251b1
Re: guix pull: computing Guix derivation takes forever
(name . akib)(address . akib@disroot.org)
87wmzqs6sc.fsf@riseup.net
akib via <help-guix@gnu.org> writes:

> I've just installed Guix on a partition of my new HDD. After the
> installation I logged in to my user account on a Linux console and
> executed 'guix pull'. After that it pulled the repository and
> computed Guix derivation, but stuck while updating substitutes. So I
> thought something is wrong and restarted the command. Now the pull
> operation always stucks while computing Guix derivation. There is no
> sign of activity according to top.

Is this maybe related to the CC'd bug?
There the freeze happens later, during the building phase for
packages-base, but it seems like the symptoms are the same.
Does this happen every time you try?
Could you get a backtrace with GDB?
I think the incantation was:
set logging on
thread apply all backtrace
quit

And then the output should be gdb.txt.
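
The same backtrace can also be captured non-interactively. The sketch below assumes GDB is installed and that you are allowed to attach to the process (same user or root). `dump_backtraces` is a hypothetical helper name. It redirects GDB's output to a file instead of using `set logging on`, which newer GDB releases spell `set logging enabled on`.

```shell
#!/bin/sh
# Sketch: dump backtraces of all threads of a running process to a file.
# dump_backtraces is a hypothetical helper name.
dump_backtraces() {
    pid=$1
    out=${2:-gdb.txt}
    # Refuse anything that is not a plain decimal PID.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: dump_backtraces PID [FILE]" >&2
            return 2
            ;;
    esac
    command -v gdb >/dev/null 2>&1 || { echo "gdb not found" >&2; return 127; }
    # -batch makes gdb exit after running the -ex commands.
    gdb -p "$pid" -batch -ex 'thread apply all backtrace' >"$out" 2>&1
}

# Example: dump_backtraces 1234 gdb.txt
```

Without debugging symbols the frames will mostly be raw addresses, but the list of threads and the innermost frames can still hint at where the process is blocked.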