glibc of the running system got wiped while building a package, system broken

  • Done
Details
3 participants
  • Stefan Kuhr
  • Ludovic Courtès
  • Stefan
Owner
unassigned
Submitted by
Stefan
Severity
normal
Stefan wrote on 19 Nov 2020 09:03
(address . bug-guix@gnu.org)
9763EBD0-9CDC-4A06-A113-F6443CB4348D@vodafonemail.de
Hi!

After trying to just build a package (with a modified guix, but this is certainly unrelated), the system broke catastrophically:

stefan@guix ~/development/guix$ sudo -E ./pre-inst-env guix-daemon --build-users-group=guixbuild &
stefan@guix ~/development/guix$ /home/stefan/development/guix/pre-inst-env guix build -L /home/stefan/guix u-boot-rpi-3
accepted connection from pid 23848, user stefan
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
12,5 MB will be downloaded:
/gnu/store/ldg4jqfan2vp01lm255zz7zrb4vllixp-libxau-1.0.9
/gnu/store/m1r4jwmc56q44x31xcnvg1hcijf0lq88-libxcb-1.14
/gnu/store/8b75zmsyxc5qghfrxhyqi6g23bq993b1-libbsd-0.10.0
/gnu/store/z18hwxwgk551y4a0f6j1dxhmp208i4ha-bash-static-5.0.16
/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
/gnu/store/z7hanmdmdalqh1v0y7z8ilinfhyfh91d-glibc-2.31-static
/gnu/store/pcsl88vd66k62sk1g4wcc9i985xn369m-libxdmcp-1.1.3
/gnu/store/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc
/gnu/store/x10mk7ri4ny013km57d3h5093270r7pg-libx11-1.6.A
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substituting /gnu/store/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc...
libx11-1.6.A-doc 1.2MiB 309KiB/s 00:04 [##################] 100.0%

substituting /gnu/store/z18hwxwgk551y4a0f6j1dxhmp208i4ha-bash-static-5.0.16...
bash-static-5.0.16 502KiB 976KiB/s 00:01 [##################] 100.0%

guix build: error: cannot unlink `/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale/de/LC_MESSAGES': Directory not empty
stefan@guix ~/development/guix$ /home/stefan/development/guix/pre-inst-env guix build -L /home/stefan/guix u-boot-rpi-3
accepted connection from pid 23911, user stefan
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
12,0 MB will be downloaded:
/gnu/store/ldg4jqfan2vp01lm255zz7zrb4vllixp-libxau-1.0.9
/gnu/store/m1r4jwmc56q44x31xcnvg1hcijf0lq88-libxcb-1.14
/gnu/store/8b75zmsyxc5qghfrxhyqi6g23bq993b1-libbsd-0.10.0
/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
/gnu/store/z7hanmdmdalqh1v0y7z8ilinfhyfh91d-glibc-2.31-static
/gnu/store/pcsl88vd66k62sk1g4wcc9i985xn369m-libxdmcp-1.1.3
/gnu/store/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc
/gnu/store/x10mk7ri4ny013km57d3h5093270r7pg-libx11-1.6.A
guix build: error: cannot unlink `/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/gconv': Directory not empty
stefan@guix ~/development/guix$ /home/stefan/development/guix/pre-inst-env guix build -L /home/stefan/guix u-boot-rpi-3
-bash: /home/stefan/development/guix/pre-inst-env: /bin/sh: bad interpreter: No such file or directory
stefan@guix ~/development/guix$ ls
-bash: /run/current-system/profile/bin/ls: No such file or directory
stefan@guix ~/development/guix$ echo $PATH
/run/setuid-programs:/home/stefan/.config/guix/current/bin:/home/stefan/.guix-profile/bin:/run/current-system/profile/bin:/run/current-system/profile/sbin
stefan@guix ~/development/guix$ ls
-bash: /run/current-system/profile/bin/ls: No such file or directory


The unlink certainly failed because a file on the NFS share had been deleted while it was still open. In such a case an intermediate hidden .nfs… file gets created (“delete on last close”, “silly rename”). Moreover, RFC 5661 for NFS demands that, even if OPEN4_RESULT_PRESERVE_UNLINKED is supported, the directory entry of an open file must not be removed in this case, which prevents the directory from being removed.
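To spot the hidden files described above, a minimal POSIX sh sketch can list them below a store item. The function name list_silly_renames is hypothetical and not part of Guix; it only assumes that the leftovers follow the `.nfs…` naming seen in the diff further down in this thread. As long as such a file exists, its parent directory cannot be removed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix): list the hidden ".nfs..." files that
# an NFS client typically leaves behind ("silly rename") when a file is
# unlinked while it is still open. As long as such a file exists, the
# directory containing it cannot be removed ("Directory not empty").
list_silly_renames() {
    # $1: directory to scan, e.g. a store item on the NFS mount or export.
    find "$1" -name '.nfs*' -print
}

# Example (path taken from this report):
# list_silly_renames /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
```

An empty result means no silly-renamed files are blocking removal of that directory tree.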

Looking on the NFS server, this is now the content of the glibc directory:

# ls gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib -l
total 12
drwxr-xr-x 2 root root 12288 Nov 18 19:41 gconv
lrwxrwxrwx 7 root root 11 Jan 1 1970 libanl.so -> libanl.so.1

Everything else is missing.

Of course I tried to reboot the system, but because ld-linux-aarch64.so.1 is missing, the system does not boot properly.


This leaves some questions:

If the whole system (maybe not the booted/running one, but a reconfigured one) was already using /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31, why has it been downloaded again when building a package? Was this due to grafting?

How can it be that the store is treated like mutable “state” and modified on the fly?

How is it that files of /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31 are removed before the download has even started? Could this render a system unusable if the network connection is lost at the wrong moment?

Booting the previous system generation luckily worked – it must be using a different glibc version. But what about the broken glibc version? How can it be repaired? The garbage collector will not remove it, as it is still referenced by the latest system generation, and I don’t want to delete that generation. The database certainly believes that this glibc package is installed correctly. What should I do now?

The Guix version used when building the package was f6a42ac946edccc7de5e93ee247487cbec40072b.


Bye

Stefan
Stefan Kuhr wrote on 19 Nov 2020 09:29
(address . 44735@debbugs.gnu.org)
C3B47D7A-2170-4F72-AFE3-AAD5E6CBBFFA@arcor.de
Hi!

I didn’t look closely enough, or didn’t wait long enough: booting the previous system actually failed.

I have now tried booting several more system generations, including the oldest one: none of my previous system generations boots any longer. The system is bricked.


Bye

Stefan
Stefan wrote on 19 Nov 2020 12:45
(address . 44735@debbugs.gnu.org)
ACFC26D7-CA41-4961-A894-1045A037F866@vodafonemail.de
Hi!

I resolved the problem by copying gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31 from https://alpha.gnu.org/gnu/guix/guix-binary-1.2.0rc2.aarch64-linux.tar.xz into gnu/store on the NFS server. The system is running again.

I then retried my commands and was able to reproduce the problem:

stefan@guix ~/development/guix$ /home/stefan/development/guix/pre-inst-env guix build -L /home/stefan/guix u-boot-rpi-3
accepted connection from pid 241, user stefan
12,0 MB will be downloaded:
/gnu/store/ldg4jqfan2vp01lm255zz7zrb4vllixp-libxau-1.0.9
/gnu/store/m1r4jwmc56q44x31xcnvg1hcijf0lq88-libxcb-1.14
/gnu/store/8b75zmsyxc5qghfrxhyqi6g23bq993b1-libbsd-0.10.0
/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
/gnu/store/z7hanmdmdalqh1v0y7z8ilinfhyfh91d-glibc-2.31-static
/gnu/store/pcsl88vd66k62sk1g4wcc9i985xn369m-libxdmcp-1.1.3
/gnu/store/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc
/gnu/store/x10mk7ri4ny013km57d3h5093270r7pg-libx11-1.6.A
guix build: error: cannot unlink `/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale/de/LC_MESSAGES': Directory not empty


Here is the diff between a correct cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31 and the now broken one:

# diff -r cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/ guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/: etc
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/: include
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/gconv: .nfs0000000003a20e490000231b
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e8800002314
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e9300002316
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e9700002311
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e9a00002317
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e9d00002315
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20eac00002319
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20eb300002318
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20eb800002313
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20ec100002312
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/: libexec
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share: doc
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share: i18n
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share: info
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: be
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: ca
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale/de/LC_MESSAGES: @eaDir
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale/de/LC_MESSAGES: libc.mo
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: en_GB
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: eo
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: es
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: fi
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: fr
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: hu
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: ja
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: ko
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: lt
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: nb
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: pl
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: pt
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: pt_BR
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: rw
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: sk
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: sv
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: uk
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: vi
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: zh_TW

Indeed, the hidden .nfs files are what leads to the unlink error.
If I ran the same commands a second time, as before, more files would be removed and my system would break again.

Why is Guix trying to reinstall /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31? It is already part of the running system.


Bye

Stefan
Stefan wrote on 19 Nov 2020 14:55
(address . 44735@debbugs.gnu.org)
4A9406A5-DB4F-4A05-BCC0-4AD8DBD2A112@vodafonemail.de
Hi!

I found the root cause of this issue: I made a typo and inadvertently did a “./configure --localstatedir=/vaar”.

The manual should warn that using a wrong (or omitted) --localstatedir may destroy the guix installation and possibly the whole guix system.

What can be done to prevent that a simple mistake like this destroys a system?


Having the configure command boxed inside the manual would at least reduce the risk of overlooking this (see also https://issues.guix.gnu.org/40848). But this still would not prevent anything.


Bye

Stefan
Ludovic Courtès wrote on 20 Nov 2020 12:31
Re: bug#44735: glibc of the running system got wiped while building a package, system broken
(name . Stefan)(address . stefan-guix@vodafonemail.de)(address . 44735@debbugs.gnu.org)
878sawfemr.fsf@gnu.org
Hi Stefan,

Stefan <stefan-guix@vodafonemail.de> writes:

> I found the root cause of this issue: I made a typo and inadvertently did a “./configure --localstatedir=/vaar”.

Ouch. :-/

Your store database may no longer be in sync with your actual store so
you may have to reinstall. You can try ‘guix gc --verify’ to get an
idea of how bad the situation is.

> The manual should warn that using a wrong (or omitted) --localstatedir may destroy the guix installation and possibly the whole guix system.
>
> What can be done to prevent that a simple mistake like this destroys a system?

./configure warns or errors out and the manual warns in a couple of
places too, but evidently it remains too easy to shoot oneself in the
foot.

Could you check ‘config.log’ to see what ‘configure’ said? You can see
the source of this check at the bottom of ‘m4/guix.m4’.

Also, why did you run guix-daemon from your checkout? This is only
necessary if you’re actually hacking on the daemon, but perhaps the
manual is misleading. (Had you not run guix-daemon from the checkout,
the problem would not have occurred, even with a wrong
‘--localstatedir’.)

Thanks,
Ludo’.
Stefan wrote on 20 Nov 2020 15:35
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44735@debbugs.gnu.org)
78228B3F-FA9D-48C1-B70C-2F0B4CC65446@vodafonemail.de
Hi Ludo’!

> Your store database may no longer be in sync with your actual store so
> you may have to reinstall. You can try ‘guix gc --verify’ to get an
> idea of how bad the situation is.

stefan@guix ~/development/guix$ guix gc --verify
reading the store...
checking path existence...
path `/gnu/store/1kh1p8ypgn1yn826cc0mizw7gjjn5yfb-usbutils-012-guile-builder' disappeared, removing from database...
path `/gnu/store/da76qwnqrfravn2qd92b6vk5inp7273v-vala-0.44.5.drv' disappeared, removing from database...
path `/gnu/store/iq987sfc1bwyaijckagv59b0z2z3c4nb-vala-0.44.5.drv' disappeared, removing from database...
path `/gnu/store/m7l8381hqz4dgp12v9fbnf0k9n1ij5ja-module-import-compiled-guile-builder' disappeared, removing from database...
path `/gnu/store/mnhh9m6v88zk9k7lc6hj15db40qv5cnh-guix-packages-base-modules-builder' disappeared, removing from database...
path `/gnu/store/nal2ssav0z0qk523w5v6xp2vfqqfpc13-guix-module-union-builder' disappeared, removing from database...
path `/gnu/store/x5gczh79g5aarws1xgkcp2gc1av4fzas-vala-0.44.5.tar.xz.drv' disappeared, removing from database…

stefan@guix ~/development/guix$ guix gc --verify
reading the store...
checking path existence…

stefan@guix ~/development/guix$

That doesn’t seem to be so bad. :-)

> ./configure warns or errors out and the manual warns in a couple of
> places too, but evidently it remains too easy to shoot oneself in the
> foot.

It warns in chapter “2 Requirements”, but not in chapter “14.1 Building from Git”.

Anyway, it was just a typo. Even if I had known about that warning, this would still have happened.

> Could you check ‘config.log’ to see what ‘configure’ said? You can see
> the source of this check at the bottom of ‘m4/guix.m4’.

I retried:

stefan@guix ~/development/guix [env]$ ./configure --localstatedir=/vaar
checking for a BSD-compatible install... /gnu/store/5hj9mdr79nqfcqg9hb45dpfrrs5qqrnr-profile/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /gnu/store/5hj9mdr79nqfcqg9hb45dpfrrs5qqrnr-profile/bin/mkdir -p
checking for gawk… gawk

[pages of checking]

checking the current installation's localstatedir... /var
configure: WARNING: chosen localstatedir '/vaar' does not match that of the existing installation '/var'
configure: WARNING: installing may corrupt /gnu/store!
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating po/guix/Makefile.in
config.status: creating po/packages/Makefile.in
config.status: creating etc/guix-daemon.cil
config.status: creating guix/config.scm
config.status: creating etc/committer.scm
config.status: creating test-env
config.status: creating pre-inst-env
config.status: creating nix/config.h
config.status: nix/config.h is unchanged
config.status: executing depfiles commands
config.status: executing po-directories commands
config.status: creating po/guix/POTFILES
config.status: creating po/guix/Makefile
config.status: creating po/packages/POTFILES
config.status: creating po/packages/Makefile
stefan@guix ~/development/guix [env]$

Indeed, somewhere in all those pages of output, luckily on the last page, there is a warning. I could have noticed it, but I didn’t. Red colour could have helped. :-)

The same warning is hidden in the very middle of config.log. At least the wrong localstatedir is visible right at its beginning.

Would it be possible to do that check right at the beginning of configure and ask the user for confirmation?

> Also, why did you run guix-daemon from your checkout? This is only
> necessary if you’re actually hacking on the daemon, but perhaps the
> manual is misleading. (Hadn’t you run guix-daemon from the checkout,
> the problem would not have occurred, even with a wrong
> ‘--localstatedir’.)

I was trying to add a build-side module to guix/build. This kept failing with the error “no code for module”. As neither #:modules nor #:imported-modules is documented (see also http://debbugs.gnu.org/cgi/bugreport.cgi?bug=44758), I was a bit clueless. Then I found out that I had to add the module to Makefile.am and run configure, and that is where the typo happened. But it still wasn’t working, and I thought I might need to start the daemon with pre-inst-env so that GUILE_LOAD_PATH would properly point to guix/build. Well, and so the disaster happened.


Bye

Stefan
Ludovic Courtès wrote on 21 Nov 2020 12:00
(name . Stefan)(address . stefan-guix@vodafonemail.de)(address . 44735@debbugs.gnu.org)
87r1onc6u6.fsf@gnu.org
Hi Stefan,

Stefan <stefan-guix@vodafonemail.de> writes:

> That doesn’t seem to be so bad. :-)

Heh, good.

>> ./configure warns or errors out and the manual warns in a couple of
>> places too, but evidently it remains too easy to shoot oneself in the
>> foot.
>
> It warns in the chapter “2 Requirements”. It doesn’t warn in chapter ”14.1 Building from Git”.
>
> Anyway, it was just a typo. Even if I would have known about that warning, this would have happened.

Yeah, we could always duplicate the warning in the manual, but it can
still be overlooked.

> checking the current installation's localstatedir... /var
> configure: WARNING: chosen localstatedir '/vaar' does not match that of the existing installation '/var'
> configure: WARNING: installing may corrupt /gnu/store!

[...]

> Indeed, there in all that pages of output, luckily on the last page, there is a warning. I could have noticed it. But I did’t. Red colour could have helped. :-)

Heh OK. At least it’s there. :-)

Note that it would have been an error if you had not passed an explicit
‘--localstatedir’ (see guix.m4). The assumption here is that, since you
explicitly passed ‘--localstatedir’, you “know what you’re doing”, hence
a mere warning.

>> Also, why did you run guix-daemon from your checkout? This is only
>> necessary if you’re actually hacking on the daemon, but perhaps the
>> manual is misleading. (Hadn’t you run guix-daemon from the checkout,
>> the problem would not have occurred, even with a wrong
>> ‘--localstatedir’.)
>
> I was trying to add a build side module into guix/build. This failed all the time with an error “no code for module”. As neither #:modules nor #:imported-modules are documented (see also http://debbugs.gnu.org/cgi/bugreport.cgi?bug=44758), I was a bit clueless. Then I found out, that I have to add the module into Makefile.am and have to run configure. And there the typo happened. But still this was’t working and I thought that I may need to start the daemon with pre-inst-env to have the GUILE_LOAD_PATH properly point to guix/build. Well, and so the disaster happened.

OK. You definitely do not need to run guix-daemon from the checkout
to test these kinds of changes.

Commit 9022861dc028e99fab930721fe991a682c497bbb clarified that
guix-daemon does not have to be launched from the checkout, but if you
can think of other places that need clarification, please let me know!

In the meantime, I’m closing this issue. Glad you recovered your store!

Thanks,
Ludo’.
Ludovic Courtès wrote on 21 Nov 2020 12:01
control message for bug #44735
(address . control@debbugs.gnu.org)
87pn47c6t6.fsf@gnu.org
tags 44735 notabug
close 44735
quit