glibc of the running system got wiped while building a package, system broken

Done
Submitted by Stefan.
Details
3 participants
  • Stefan Kuhr
  • Ludovic Courtès
  • Stefan
Owner
unassigned
Severity
normal
Stefan wrote on 19 Nov 2020 09:03
(address . bug-guix@gnu.org)
9763EBD0-9CDC-4A06-A113-F6443CB4348D@vodafonemail.de
Hi!
After I merely tried to build a package (with a modified Guix, but this is certainly unrelated), the system broke catastrophically:
stefan@guix ~/development/guix$ sudo -E ./pre-inst-env guix-daemon --build-users-group=guixbuild &
stefan@guix ~/development/guix$ /home/stefan/development/guix/pre-inst-env guix build -L /home/stefan/guix u-boot-rpi-3
accepted connection from pid 23848, user stefan
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
12,5 MB will be downloaded:
   /gnu/store/ldg4jqfan2vp01lm255zz7zrb4vllixp-libxau-1.0.9
   /gnu/store/m1r4jwmc56q44x31xcnvg1hcijf0lq88-libxcb-1.14
   /gnu/store/8b75zmsyxc5qghfrxhyqi6g23bq993b1-libbsd-0.10.0
   /gnu/store/z18hwxwgk551y4a0f6j1dxhmp208i4ha-bash-static-5.0.16
   /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
   /gnu/store/z7hanmdmdalqh1v0y7z8ilinfhyfh91d-glibc-2.31-static
   /gnu/store/pcsl88vd66k62sk1g4wcc9i985xn369m-libxdmcp-1.1.3
   /gnu/store/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc
   /gnu/store/x10mk7ri4ny013km57d3h5093270r7pg-libx11-1.6.A
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substituting /gnu/store/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc...
downloading from https://ci.guix.gnu.org/nar/lzip/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc...
 libx11-1.6.A-doc  1.2MiB  309KiB/s 00:04 [##################] 100.0%

substituting /gnu/store/z18hwxwgk551y4a0f6j1dxhmp208i4ha-bash-static-5.0.16...
downloading from https://ci.guix.gnu.org/nar/lzip/z18hwxwgk551y4a0f6j1dxhmp208i4ha-bash-static-5.0.16...
 bash-static-5.0.16  502KiB  976KiB/s 00:01 [##################] 100.0%

guix build: error: cannot unlink `/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale/de/LC_MESSAGES': Directory not empty
stefan@guix ~/development/guix$ /home/stefan/development/guix/pre-inst-env guix build -L /home/stefan/guix u-boot-rpi-3
accepted connection from pid 23911, user stefan
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
12,0 MB will be downloaded:
   /gnu/store/ldg4jqfan2vp01lm255zz7zrb4vllixp-libxau-1.0.9
   /gnu/store/m1r4jwmc56q44x31xcnvg1hcijf0lq88-libxcb-1.14
   /gnu/store/8b75zmsyxc5qghfrxhyqi6g23bq993b1-libbsd-0.10.0
   /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
   /gnu/store/z7hanmdmdalqh1v0y7z8ilinfhyfh91d-glibc-2.31-static
   /gnu/store/pcsl88vd66k62sk1g4wcc9i985xn369m-libxdmcp-1.1.3
   /gnu/store/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc
   /gnu/store/x10mk7ri4ny013km57d3h5093270r7pg-libx11-1.6.A
guix build: error: cannot unlink `/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/gconv': Directory not empty
stefan@guix ~/development/guix$ /home/stefan/development/guix/pre-inst-env guix build -L /home/stefan/guix u-boot-rpi-3
-bash: /home/stefan/development/guix/pre-inst-env: /bin/sh: bad interpreter: No such file or directory
stefan@guix ~/development/guix$ ls
-bash: /run/current-system/profile/bin/ls: No such file or directory
stefan@guix ~/development/guix$ echo $PATH
/run/setuid-programs:/home/stefan/.config/guix/current/bin:/home/stefan/.guix-profile/bin:/run/current-system/profile/bin:/run/current-system/profile/sbin
stefan@guix ~/development/guix$ ls
-bash: /run/current-system/profile/bin/ls: No such file or directory

The unlink failure was certainly caused by a file on the NFS share that had been deleted while it was still open. In such a case there may be an intermediate hidden .nfs… file, which gets created for exactly this situation (“delete on last close”, “silly rename”). However, RFC 5661 for NFS demands that, even if OPEN4_RESULT_PRESERVE_UNLINKED is supported, the directory entry of an open file must not be removed in this case, which prevents the removal of the directory.
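A minimal local sketch (no NFS involved) of why a single leftover hidden entry is enough to make the directory removal fail. The file name `.nfs-simulated-leftover` is a made-up stand-in for a silly-renamed `.nfs…` file; the real ones are created by the NFS client, not by user code.

```python
import errno
import os
import tempfile


def try_rmdir(path: str) -> int:
    """Call os.rmdir(path); return 0 on success, or the errno it failed with."""
    try:
        os.rmdir(path)
        return 0
    except OSError as exc:
        return exc.errno


with tempfile.TemporaryDirectory() as root:
    # Local stand-in for a store subdirectory such as lib/gconv.
    gconv = os.path.join(root, "gconv")
    os.mkdir(gconv)

    # Simulated leftover: on NFS, the client renames a deleted-but-open file
    # to a hidden .nfsXXXX name instead of dropping the directory entry.
    leftover = os.path.join(gconv, ".nfs-simulated-leftover")
    with open(leftover, "w") as f:
        f.write("still open on some client")

    # The directory still has an entry, so rmdir(2) refuses to remove it.
    first_errno = try_rmdir(gconv)
    print("first rmdir:", errno.errorcode.get(first_errno, "OK"))

    # Once the entry is gone (on NFS: after the last close), removal succeeds.
    os.remove(leftover)
    second_errno = try_rmdir(gconv)
    print("second rmdir:", errno.errorcode.get(second_errno, "OK"))
```

On Linux the first call fails with ENOTEMPTY, the same "Directory not empty" error that `guix build` reported above.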
Looking at the NFS server, this is now the content of the glibc lib directory:
# ls gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib -l
total 12
drwxr-xr-x 2 root root 12288 Nov 18 19:41 gconv
lrwxrwxrwx 7 root root    11 Jan  1  1970 libanl.so -> libanl.so.1
Everything else is missing.
Of course I tried to reboot the system, but because ld-linux-aarch64.so.1 is missing, the system does not boot properly.

This leaves some questions:
If the whole system (maybe not the booted/running one, but a reconfigured one) was already using /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31, why has it been downloaded again when building a package? Was this due to grafting?
How can it be that the store is treated like mutable “state” and modified on the fly?
How is it that files of /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31 are removed before the download has even started? Could this render a system unusable if the network connection is lost at the wrong moment?
Booting the previous system generation luckily worked; it must be using a different glibc version. But what about that broken glibc? How can it be repaired? The garbage collector will not remove it, as it is still referenced by the latest system generation, and I don’t want to delete that generation. The database surely believes that this glibc package is installed correctly. What should I do now?
The Guix version used when building the package was f6a42ac946edccc7de5e93ee247487cbec40072b.

Bye
Stefan
Stefan Kuhr wrote on 19 Nov 2020 09:29
(address . 44735@debbugs.gnu.org)
C3B47D7A-2170-4F72-AFE3-AAD5E6CBBFFA@arcor.de
Hi!
I didn’t look closely enough or didn’t wait long enough before: booting the previous system actually failed, too.
I then tried to boot several more system generations, even the oldest one: none of my previous system generations boots any longer. The system is bricked.

Bye
Stefan
Stefan wrote on 19 Nov 2020 12:45
(address . 44735@debbugs.gnu.org)
ACFC26D7-CA41-4961-A894-1045A037F866@vodafonemail.de
Hi!
I resolved the problem by copying gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31 from https://alpha.gnu.org/gnu/guix/guix-binary-1.2.0rc2.aarch64-linux.tar.xz into gnu/store on the NFS server. The system is running again.
I then retried my commands and was able to reproduce the problem:
stefan@guix ~/development/guix$ /home/stefan/development/guix/pre-inst-env guix build -L /home/stefan/guix u-boot-rpi-3
accepted connection from pid 241, user stefan
12,0 MB will be downloaded:
   /gnu/store/ldg4jqfan2vp01lm255zz7zrb4vllixp-libxau-1.0.9
   /gnu/store/m1r4jwmc56q44x31xcnvg1hcijf0lq88-libxcb-1.14
   /gnu/store/8b75zmsyxc5qghfrxhyqi6g23bq993b1-libbsd-0.10.0
   /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
   /gnu/store/z7hanmdmdalqh1v0y7z8ilinfhyfh91d-glibc-2.31-static
   /gnu/store/pcsl88vd66k62sk1g4wcc9i985xn369m-libxdmcp-1.1.3
   /gnu/store/785ldh00ix897pamyg5p6fpjls6ddwzz-libx11-1.6.A-doc
   /gnu/store/x10mk7ri4ny013km57d3h5093270r7pg-libx11-1.6.A
guix build: error: cannot unlink `/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale/de/LC_MESSAGES': Directory not empty

This is the diff between a correct cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31 and the now-broken one:
# diff -r cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/ guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/: etc
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/: include
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib/gconv: .nfs0000000003a20e490000231b
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e8800002314
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e9300002316
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e9700002311
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e9a00002317
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20e9d00002315
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20eac00002319
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20eb300002318
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20eb800002313
Only in guix-system/gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/lib: .nfs0000000003a20ec100002312
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/: libexec
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share: doc
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share: i18n
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share: info
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: be
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: ca
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale/de/LC_MESSAGES: @eaDir
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale/de/LC_MESSAGES: libc.mo
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: en_GB
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: eo
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: es
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: fi
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: fr
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: hu
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: ja
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: ko
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: lt
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: nb
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: pl
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: pt
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: pt_BR
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: rw
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: sk
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: sv
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: uk
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: vi
Only in cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31/share/locale: zh_TW
Indeed, there are the hidden .nfs files that lead to the unlink error. If I ran the same commands a second time, as before, more files would be removed and my system would break again.
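To check a store item for such leftovers without running a full `diff -r` against a known-good copy, a small scan for hidden `.nfs` entries is enough. This is a hypothetical helper (the name `find_nfs_leftovers` is made up), assuming the store item is reachable as a normal directory, e.g. on the NFS server itself:

```python
import os
import sys
from typing import List


def find_nfs_leftovers(store_item: str) -> List[str]:
    """Return sorted paths of hidden '.nfs*' entries anywhere under store_item."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(store_item):
        for name in dirnames + filenames:
            # Silly-renamed files on NFS carry the '.nfs' prefix.
            if name.startswith(".nfs"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Example:
    #   python3 find_nfs_leftovers.py \
    #     /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31
    for item in sys.argv[1:]:
        for path in find_nfs_leftovers(item):
            print(path)
```

An empty result only says that no silly-renamed files remain; it does not prove the store item is complete, which is what a comparison against a good copy (as done above) or the daemon's own verification is for.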
Why is Guix trying to reinstall /gnu/store/cb88z63hyg1icd2kkahiink2p291mhr2-glibc-2.31? It is already part of the running system.

Bye
Stefan
Stefan wrote on 19 Nov 2020 14:55
(address . 44735@debbugs.gnu.org)
4A9406A5-DB4F-4A05-BCC0-4AD8DBD2A112@vodafonemail.de
Hi!
I found the root cause of this issue: I made a typo and inadvertently did a “./configure --localstatedir=/vaar”.
The manual should warn that using a wrong (or omitted) --localstatedir may destroy the guix installation and possibly the whole guix system.
What can be done to prevent a simple mistake like this from destroying a system?

Having the configure command boxed in the manual would at least reduce the risk of overlooking this; see also https://issues.guix.gnu.org/40848. But this still does not prevent anything.

Bye
Stefan
Ludovic Courtès wrote on 20 Nov 2020 12:31
Re: bug#44735: glibc of the running system got wiped while building a package, system broken
(name . Stefan)(address . stefan-guix@vodafonemail.de)(address . 44735@debbugs.gnu.org)
878sawfemr.fsf@gnu.org
Hi Stefan,
Stefan <stefan-guix@vodafonemail.de> skribis:
> I found the root cause of this issue: I made a typo and inadvertently did a “./configure --localstatedir=/vaar”.
Ouch. :-/
Your store database may no longer be in sync with your actual store so you may have to reinstall. You can try ‘guix gc --verify’ to get an idea of how bad the situation is.
> The manual should warn that using a wrong (or omitted) --localstatedir may destroy the guix installation and possibly the whole guix system.
>
> What can be done to prevent a simple mistake like this from destroying a system?
./configure warns or errors out and the manual warns in a couple of places too, but evidently it remains too easy to shoot oneself in the foot.
Could you check ‘config.log’ to see what ‘configure’ said? You can see the source of this check at the bottom of ‘m4/guix.m4’.
Also, why did you run guix-daemon from your checkout? This is only necessary if you’re actually hacking on the daemon, but perhaps the manual is misleading. (Hadn’t you run guix-daemon from the checkout, the problem would not have occurred, even with a wrong ‘--localstatedir’.)
Thanks,
Ludo’.
Stefan wrote on 20 Nov 2020 15:35
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 44735@debbugs.gnu.org)
78228B3F-FA9D-48C1-B70C-2F0B4CC65446@vodafonemail.de
Hi Ludo’!
> Your store database may no longer be in sync with your actual store so
> you may have to reinstall. You can try ‘guix gc --verify’ to get an
> idea of how bad the situation is.
stefan@guix ~/development/guix$ guix gc --verify
reading the store...
checking path existence...
path `/gnu/store/1kh1p8ypgn1yn826cc0mizw7gjjn5yfb-usbutils-012-guile-builder' disappeared, removing from database...
path `/gnu/store/da76qwnqrfravn2qd92b6vk5inp7273v-vala-0.44.5.drv' disappeared, removing from database...
path `/gnu/store/iq987sfc1bwyaijckagv59b0z2z3c4nb-vala-0.44.5.drv' disappeared, removing from database...
path `/gnu/store/m7l8381hqz4dgp12v9fbnf0k9n1ij5ja-module-import-compiled-guile-builder' disappeared, removing from database...
path `/gnu/store/mnhh9m6v88zk9k7lc6hj15db40qv5cnh-guix-packages-base-modules-builder' disappeared, removing from database...
path `/gnu/store/nal2ssav0z0qk523w5v6xp2vfqqfpc13-guix-module-union-builder' disappeared, removing from database...
path `/gnu/store/x5gczh79g5aarws1xgkcp2gc1av4fzas-vala-0.44.5.tar.xz.drv' disappeared, removing from database…
stefan@guix ~/development/guix$ guix gc --verify
reading the store...
checking path existence…
stefan@guix ~/development/guix$
That doesn’t seem to be so bad. :-)
> ./configure warns or errors out and the manual warns in a couple of
> places too, but evidently it remains too easy to shoot oneself in the
> foot.
It warns in chapter “2 Requirements”. It doesn’t warn in chapter “14.1 Building from Git”.
Anyway, it was just a typo. Even if I had known about that warning, this would still have happened.
> Could you check ‘config.log’ to see what ‘configure’ said? You can see
> the source of this check at the bottom of ‘m4/guix.m4’.
I retried:
stefan@guix ~/development/guix [env]$ ./configure --localstatedir=/vaar
checking for a BSD-compatible install... /gnu/store/5hj9mdr79nqfcqg9hb45dpfrrs5qqrnr-profile/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /gnu/store/5hj9mdr79nqfcqg9hb45dpfrrs5qqrnr-profile/bin/mkdir -p
checking for gawk… gawk
[pages of checking]
checking the current installation's localstatedir... /var
configure: WARNING: chosen localstatedir '/vaar' does not match that of the existing installation '/var'
configure: WARNING: installing may corrupt /gnu/store!
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating po/guix/Makefile.in
config.status: creating po/packages/Makefile.in
config.status: creating etc/guix-daemon.cil
config.status: creating guix/config.scm
config.status: creating etc/committer.scm
config.status: creating test-env
config.status: creating pre-inst-env
config.status: creating nix/config.h
config.status: nix/config.h is unchanged
config.status: executing depfiles commands
config.status: executing po-directories commands
config.status: creating po/guix/POTFILES
config.status: creating po/guix/Makefile
config.status: creating po/packages/POTFILES
config.status: creating po/packages/Makefile
stefan@guix ~/development/guix [env]$
Indeed, among all those pages of output, luckily on the last page, there is a warning. I could have noticed it. But I didn’t. Red colour could have helped. :-)
The same warning is hidden in the very middle of config.log. At least the wrong localstatedir is visible right at its beginning.
Would it be possible to do that check right at the beginning of configure and ask the user for confirmation?
> Also, why did you run guix-daemon from your checkout? This is only
> necessary if you’re actually hacking on the daemon, but perhaps the
> manual is misleading. (Hadn’t you run guix-daemon from the checkout,
> the problem would not have occurred, even with a wrong
> ‘--localstatedir’.)
I was trying to add a build-side module to guix/build. This kept failing with the error “no code for module”. As neither #:modules nor #:imported-modules is documented (see also http://debbugs.gnu.org/cgi/bugreport.cgi?bug=44758), I was a bit clueless. Then I found out that I had to add the module to Makefile.am and run configure. And that is where the typo happened. But it still wasn’t working, and I thought that I might need to start the daemon with pre-inst-env to have GUILE_LOAD_PATH properly point to guix/build. Well, and so the disaster happened.

Bye
Stefan
Ludovic Courtès wrote on 21 Nov 2020 12:00
(name . Stefan)(address . stefan-guix@vodafonemail.de)(address . 44735@debbugs.gnu.org)
87r1onc6u6.fsf@gnu.org
Hi Stefan,
Stefan <stefan-guix@vodafonemail.de> skribis:
> That doesn’t seem to be so bad. :-)
Heh, good.
>> ./configure warns or errors out and the manual warns in a couple of
>> places too, but evidently it remains too easy to shoot oneself in the
>> foot.
>
> It warns in chapter “2 Requirements”. It doesn’t warn in chapter “14.1 Building from Git”.
>
> Anyway, it was just a typo. Even if I had known about that warning, this would still have happened.
Yeah, we could always duplicate the warning in the manual, but it can still be overlooked.
> checking the current installation's localstatedir... /var
> configure: WARNING: chosen localstatedir '/vaar' does not match that of the existing installation '/var'
> configure: WARNING: installing may corrupt /gnu/store!
[...]
> Indeed, among all those pages of output, luckily on the last page, there is a warning. I could have noticed it. But I didn’t. Red colour could have helped. :-)
Heh OK. At least it’s there. :-)
Note that it would have been an error if you had not passed an explicit ‘--localstatedir’ (see guix.m4). The assumption here is that, since you explicitly passed ‘--localstatedir’, you “know what you’re doing”, hence a mere warning.
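The policy described here can be summarised as a small decision function. The following Python sketch is hypothetical: the function name and return values are made up for illustration, detecting the existing installation's localstatedir is not modelled, and the real check lives in ‘m4/guix.m4’ as Autoconf/M4 code that is not reproduced here.

```python
from typing import Optional, Tuple


def check_localstatedir(chosen: str, existing: Optional[str],
                        explicit: bool) -> Tuple[str, str]:
    """Return (level, message) where level is "ok", "warning" or "error".

    Sketch of the policy described in this thread: a mismatch with the
    existing installation is an error when --localstatedir was not given
    explicitly, and only a warning when it was.
    """
    # No existing installation detected, or the directories agree.
    if existing is None or chosen == existing:
        return ("ok", "")
    msg = (f"chosen localstatedir '{chosen}' does not match that of the "
           f"existing installation '{existing}'")
    if explicit:
        # The user passed --localstatedir, so assume they know what they do.
        return ("warning", msg + "; installing may corrupt /gnu/store!")
    # Implicit default (e.g. Autoconf's ${prefix}/var) that does not match.
    return ("error", msg)


# The typo from this thread: explicit, but wrong, so only a warning.
print(check_localstatedir("/vaar", "/var", explicit=True)[0])
# Omitting --localstatedir with the default prefix yields /usr/local/var.
print(check_localstatedir("/usr/local/var", "/var", explicit=False)[0])
```

The asymmetry is the point of the sketch: an explicit but mistyped value such as ‘/vaar’ falls into the "warning" branch, which is exactly the case that slipped through here.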
>> Also, why did you run guix-daemon from your checkout? This is only
>> necessary if you’re actually hacking on the daemon, but perhaps the
>> manual is misleading. (Hadn’t you run guix-daemon from the checkout,
>> the problem would not have occurred, even with a wrong
>> ‘--localstatedir’.)
>
> I was trying to add a build-side module to guix/build. This kept failing with the error “no code for module”. As neither #:modules nor #:imported-modules is documented (see also http://debbugs.gnu.org/cgi/bugreport.cgi?bug=44758), I was a bit clueless. Then I found out that I had to add the module to Makefile.am and run configure. And that is where the typo happened. But it still wasn’t working, and I thought that I might need to start the daemon with pre-inst-env to have GUILE_LOAD_PATH properly point to guix/build. Well, and so the disaster happened.
OK. You definitely do not need to run guix-daemon from the checkout to test this kind of change.
Commit 9022861dc028e99fab930721fe991a682c497bbb clarified that guix-daemon does not have to be launched from the checkout, but if you can think of other places that need clarification, please let me know!
In the meantime, I’m closing this issue. Glad you recovered your store!
Thanks,
Ludo’.
Ludovic Courtès wrote on 21 Nov 2020 12:01
control message for bug #44735
(address . control@debbugs.gnu.org)
87pn47c6t6.fsf@gnu.org
tags 44735 notabug
close 44735
quit