guix system reconfigure after kernel panic user or group not created

Open. Submitted by Oleg Pykhalov.
Details
2 participants
  • Oleg Pykhalov
  • Ludovic Courtès
Owner: unassigned
Severity: normal
Oleg Pykhalov wrote on 10 Oct 2017 07:51
(address . bug-guix@gnu.org)
8760bnh7os.fsf@gmail.com
Hello Guix,
During 'guix system reconfigure' I got a kernel panic.
Getting it quite often while 'guix system reconfigure' at Linux magnolia 4.13.4-gnu #1 SMP 1 x86_64 GNU/Linux. But it's not a subject of current report.
After reboot I made another attempt at 'guix system reconfigure', but because of /etc/group.lock and /etc/passwd.lock the operating-system groups and users fields were not applied.
I tried to create the adbusers group and add my user to it, but because of those files I could not do it until I manually removed them.
So, do we need to check whether *.lock files exist and remove them at the start of 'guix system reconfigure' OR abort reconfigure and the creation of the Grub boot entry OR something else?
Thanks,
Oleg.
Oleg Pykhalov wrote on 10 Oct 2017 16:55
(address . 28772@debbugs.gnu.org)
87y3ojxdas.fsf@gmail.com
Oleg Pykhalov <go.wigust@gmail.com> writes:
[...]
> Getting it quite often while 'guix system reconfigure' at Linux magnolia
> 4.13.4-gnu #1 SMP 1 x86_64 GNU/Linux. But it's not a subject of current
> report.
The kernel panic was caused by my misconfiguration. I used a Guile module on reconfigure via GUILE_LOAD_PATH and (use-modules (my-module)) in config.scm. I guess it's also required after reconfigure, since I sometimes get errors after reconfigure: cannot find my-module. So I need to add my-module globally on the system.
[...]
Ludovic Courtès wrote on 10 Oct 2017 17:37
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 28772@debbugs.gnu.org)
87d15vm2t8.fsf@gnu.org
Hello,
Oleg Pykhalov <go.wigust@gmail.com> skribis:
> During 'guix system reconfigure' I got a kernel panic.
Can you show the exact command and its output?
A user-land program is not supposed to be able to cause a kernel panic; if it does, that’s a kernel bug.
But perhaps you got the kernel panic *after* rebooting in the reconfigured system? That could well be a GuixSD bug, indeed.
> After reboot I made another attempt at 'guix system reconfigure', but
> because of /etc/group.lock and /etc/passwd.lock the operating-system
> groups and users fields were not applied.
>
> I tried to create the adbusers group and add my user to it, but
> because of those files I could not do it until I manually removed them.
>
> So, do we need to check whether *.lock files exist and remove them at
> the start of 'guix system reconfigure' OR abort reconfigure and the
> creation of the Grub boot entry OR something else?
I’m not sure I understand what happened. Could you copy/paste commands and outputs, or transcribe what happened?
TIA,
Ludo’.
Ludovic Courtès wrote on 11 Oct 2017 15:27
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 28772@debbugs.gnu.org)
87o9pdbyrt.fsf@gnu.org
Oleg Pykhalov <go.wigust@gmail.com> skribis:
> Oleg Pykhalov <go.wigust@gmail.com> writes:
>
> [...]
>
>> Getting it quite often while 'guix system reconfigure' at Linux magnolia
>> 4.13.4-gnu #1 SMP 1 x86_64 GNU/Linux. But it's not a subject of current
>> report.
>
> The kernel panic was caused by my misconfiguration. I used a Guile
> module on reconfigure via GUILE_LOAD_PATH and (use-modules (my-module))
> in config.scm. I guess it's also required after reconfigure, since I
> sometimes get errors after reconfigure: cannot find my-module. So
> I need to add my-module globally on the system.
Without seeing the config, I cannot tell whether it’s a bug on our side or not.
However, if you’re confident that the bug is in your code, that’s good. ;-) Should we close this bug?
Thanks,
Ludo’.
Oleg Pykhalov wrote on 11 Oct 2017 19:19
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 28772@debbugs.gnu.org)
87r2u9fvpz.fsf@gmail.com
Hello Ludovic,
apologies for not adding logs before. It's hard to do when I do guix commands from Xterm and not from Emacs. Emacs *shell* or *compilation* buffers will eat all memory if they get too much text.
I probably need to redirect stdout and stderr to a file when I run guix from Xterm.
I heard the Guix folks are working on printing a terse log to the console and redirecting everything else to a log file. That will be a lifesaver for me.
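The redirection mentioned above could look roughly like the following. This is only a minimal sketch for a POSIX shell: the helper name run_logged, the log path, and the configuration path in the usage comment are hypothetical and not part of Guix.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND with stdout and stderr saved to LOGFILE, then show only
# the last few lines on the terminal.  Returns COMMAND's exit status.
# (Hypothetical helper, not something Guix provides.)
run_logged () {
    log=$1
    shift
    "$@" > "$log" 2>&1
    status=$?
    tail -n 5 "$log"
    return "$status"
}

# Usage (hypothetical paths):
#   run_logged "$HOME/reconfigure.log" \
#       sudo guix system reconfigure /etc/config.scm
```

Writing to a file first and calling tail afterwards, instead of piping through tee, keeps the command's own exit status available without any shell-specific tricks.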
ludo@gnu.org (Ludovic Courtès) writes:
> Hello,
>
> Oleg Pykhalov <go.wigust@gmail.com> skribis:
>
>> During 'guix system reconfigure' I got a kernel panic.
>
> Can you show the exact command and its output?
Sorry, as I said this is not the topic of this report; I don't want to trigger it again and I am wary of doing so on my current system.
I will set up a dedicated Guix VM for this, where I can run 'system reconfigure'. Then I'll create a new bug report with a full log.
Nevertheless, I'll leave a how-to for reproducing it below, at least for my own TODO list.
The problem
===========

The bigger problem from my view are files like /etc/group.lock and /etc/passwd.lock. For example:

    sudo touch /etc/group.lock

/etc/config.scm

    (operating-system
      ;; …
      (groups (cons
               (user-group (name "test"))
               %base-groups)))

reconfigure log

$ guix system reconfigure $HOME/dotfiles/guix/system-magnolia.scm
substitute: updating list of substitutes from 'https://berlin.guixsd.org'... 100.0%The following derivations will be built: /gnu/store/v9dp6193rpxrx1rqfdw59s5ss4wlrfdh-system.drv /gnu/store/carkycnf6zcarbmnk5745pgsx1nv478y-grub.cfg.drv /gnu/store/r5p953fx3dl18aav1ggwmiy2bqnv991s-activate-service.drv /gnu/store/pjjm6595562ysk40zjrznhmsfsid1k8r-activate.drv /gnu/store/l41adszqk24sb200dwm8sj6ky71ivwpi-boot.drv/gnu/store/qqhzapsv5w8mrbz3s8hgy7w42r3dbyv9-system/gnu/store/b4i4drp7lpxmgpcfkbvgmrig2hlszl3j-grub.cfg/gnu/store/0b459jxdmyz5vf22avav9sm8ig03173k-grub-efi-2.02/gnu/store/ijw065yljn1np4x0p5l1qkx9w4z9ikcl-bootloader-installeractivating system...making '/gnu/store/qqhzapsv5w8mrbz3s8hgy7w42r3dbyv9-system' the current system...setting up setuid programs in '/run/setuid-programs'...populating /etc from /gnu/store/iyr9ji3idg3iphi3fskh2hqjlmg4h5w0-etc...usermod: no changesadding group 'test'...groupadd: existing lock file /etc/group.lock without a PIDgroupadd: cannot lock /etc/group; try again later.usermod: no changesusermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: no changesusermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again 
later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: existing lock file /etc/group.lock without a PIDusermod: cannot lock /etc/group; try again later.usermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changesusermod: no changescreating nginx log directory '/var/log/nginx'creating nginx run directory '/var/run/nginx'creating nginx temp directories '/var/run/nginx/{client_body,proxy,fastcgi,uwsgi,scgi}_temp'nginx: [alert] could not open error log file: open() "/gnu/store/vyj2vkmdmlpxn3mnj71vz8zc8j30ahkf-nginx-1.12.1/logs/error.log" failed (2: No such file or directory)nginx: the configuration file /gnu/store/xms1g2z62rcj2h9i9d6fbqyl65a4yycm-nginx.conf syntax is oknginx: configuration file /gnu/store/xms1g2z62rcj2h9i9d6fbqyl65a4yycm-nginx.conf test is successfulguix system: unloading service 'user-homes'...shepherd: Removing service 'user-homes'...shepherd: Done.guix system: loading new services: user-homes...shepherd: Evaluating user expression (register-services (primitive-load "/gnu/sto?")).shepherd: Service user-homes could not be started.Installing for x86_64-efi platform.Installation finished. No error reported.
The new system generation was produced without the "test" group, yet you could still reboot into it. This could lead to problems once we have only a tiny output on the console and the big output in a log file, I guess.
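One way to notice the situation before reconfiguring is a small pre-flight check. The following is a minimal sketch in POSIX shell, not something 'guix system' does itself: the function name list_stale_locks is hypothetical, and it only looks for the two lock file names mentioned in this report.

```shell
#!/bin/sh
# list_stale_locks [DIR]
# Print the account-database lock files found in DIR (default: /etc).
# Returns 0 if at least one lock file exists, 1 otherwise.
# (Hypothetical helper; it only reports, it never deletes anything.)
list_stale_locks () {
    dir=${1:-/etc}
    found=1
    for name in group.lock passwd.lock; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            found=0
        fi
    done
    return "$found"
}

# Usage: warn before running 'guix system reconfigure'.
#   if list_stale_locks /etc; then
#       echo "lock files found; check that no other tool is running" >&2
#   fi
```

The sketch deliberately leaves removal to the administrator, since a lock file may belong to a tool that is still running.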
> A user-land program is not supposed to be able to cause a kernel panic;
> if it does, that’s a kernel bug.
How to make a kernel panic

The problem will be No defined variable IPTABLES-SSH after 'guix system reconfigure' and kernel crash after.

$HOME/src/iptables/iptables/ru.scm

    (define-module (iptables ru)
      ;; …
      )

    (define %iptables-ssh
      "-A INPUT -p tcp --dport 22 \
    -m state --state NEW -m recent --set --name SSH -j ACCEPT")

/etc/config.scm

    (use-modules ;; …
                 (iptables ru))

    (define start-firewall
      #~(let ((iptables
               (lambda (str)
                 (zero? (system (string-join
                                 `(,#$(file-append iptables "/sbin/iptables") ,str)
                                 " "))))))
          (format #t "Install iptables rules.~%")
          (and
           ;; …
           (iptables %iptables-ssh))))

    (define firewall-service
      (simple-service 'firewall shepherd-root-service-type
                      (list
                       (shepherd-service
                        (provision '(firewall))
                        (requirement '())
                        (start #~(lambda _ #$start-firewall))
                        (respawn? #f)
                        (stop #~(lambda _
                                  (zero?
                                   (system* #$(file-append iptables "/sbin/iptables")
                                            "-F"))))))))

    (operating-system
      ;; …
      (services (cons* ;; …
                       firewall-service)))

Make a kernel panic

    sudo GUILE_LOAD_PATH=\"$HOME/src/iptables\
    :$GUILE_LOAD_PATH\" guix system reconfigure \
    $HOME/dotfiles/guix/system-magnolia.scm

    # Run above again and kernel will panic.
> But perhaps you got the kernel panic *after* rebooting in the
> reconfigured system? That could well be a GuixSD bug, indeed.
No, it happens after the second 'guix system reconfigure', following the how-to above.
[...]
Thanks,
Oleg.
Ludovic Courtès wrote on 12 Oct 2017 09:57
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 28772@debbugs.gnu.org)
871sm8ssru.fsf@gnu.org
Hi Oleg,
Oleg Pykhalov <go.wigust@gmail.com> skribis:
> apologies for not adding logs before. It's hard to do when I do guix
> commands from Xterm and not from Emacs. Emacs *shell* or *compilation*
> buffers will eat all memory if they get too much text.
I sympathize…
> The problem
> ===========
>
> The bigger problem from my view are files like /etc/group.lock and
> /etc/passwd.lock. For example:
>
> sudo touch /etc/group.lock
>
> /etc/config.scm
>
> (operating-system
> ;; …
> (groups (cons
> (user-group (name "test"))
> %base-groups)))
>
> reconfigure log
I think we can avoid the problem by forcefully removing these two lock files at boot time:
diff --git a/gnu/services.scm b/gnu/services.scm
index 329b7b151..2ef1d8530 100644
--- a/gnu/services.scm
+++ b/gnu/services.scm
@@ -368,6 +368,8 @@ boot."
                 #t))))
   ;; Ignore I/O errors so the system can boot.
   (fail-safe
+   (delete-file "/etc/group.lock")
+   (delete-file "/etc/passwd.lock")
    (delete-file-recursively "/tmp")
    (delete-file-recursively "/var/run")
    (mkdir "/tmp")
> How to make a kernel panic
>
> The problem will be No defined variable IPTABLES-SSH after 'guix system
> reconfigure' and kernel crash after.
[...]
> Make a kernel panic
>
> sudo GUILE_LOAD_PATH=\"$HOME/src/iptables\
> :$GUILE_LOAD_PATH\" guix system reconfigure \
> $HOME/dotfiles/guix/system-magnolia.scm
>
> # Run above again and kernel will panic.
I tried to reproduce it with my user’s shepherd, but that didn’t work:
ludo@ribbon ~/src/guix$ cat ,t.scm
(define s
  (make <service>
    #:provides '(nothing)
    #:start (lambda _ unbound)))

(register-services s)
(start s)
ludo@ribbon ~/src/guix$ herd load root ,t.scm
Loading ,t.scm.
herd: exception caught while executing 'load' on service 'root':
ERROR: Unbound variable: foo
ludo@ribbon ~/src/guix$ herd load root ,t.scm
Loading ,t.scm.
herd: exception caught while executing 'start' on service 'nothing':
ERROR: Unbound variable: unbound
ludo@ribbon ~/src/guix$ herd load root ,t.scm
Loading ,t.scm.
Assertion (null? (lookup-services (canonical-name new))) failed.
herd: exception caught while executing 'load' on service 'root':
ERROR: Throw to key `assertion-failed' with args `()'.
ludo@ribbon ~/src/guix$ echo $?
1
ludo@ribbon ~/src/guix$ herd status

[...]

ludo@ribbon ~/src/guix$ echo $?
0
IOW, shepherd caught the exceptions and didn’t die.
What am I missing?
Ludo’.
Oleg Pykhalov wrote on 12 Oct 2017 10:15
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 28772@debbugs.gnu.org)
87o9pceq8b.fsf@gmail.com
Hello Ludovic,
ludo@gnu.org (Ludovic Courtès) writes:
[...]
> I think we can avoid the problem by forcefully removing these two lock
> files at boot time:
>
> diff --git a/gnu/services.scm b/gnu/services.scm
> index 329b7b151..2ef1d8530 100644
> --- a/gnu/services.scm
> +++ b/gnu/services.scm
> @@ -368,6 +368,8 @@ boot."
>                 #t))))
>   ;; Ignore I/O errors so the system can boot.
>   (fail-safe
> +   (delete-file "/etc/group.lock")
> +   (delete-file "/etc/passwd.lock")
>    (delete-file-recursively "/tmp")
>    (delete-file-recursively "/var/run")
>    (mkdir "/tmp")
>
>
There is also a '/etc/.pwd.lock'. Info about this file:
https://lists.debian.org/debian-user/2005/07/msg02949.html

I'm not sure whether any of these files exist. Days have passed since the reconfigure failure.
    $ sudo find /etc -name '*.lock' # Shows nothing.
[...]
> IOW, shepherd caught the exceptions and didn’t die.
>
> What am I missing?
I'll try to make a reproducible thing later.
Thanks,
Oleg.
Ludovic Courtès wrote on 13 Oct 2017 10:25
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 28772@debbugs.gnu.org)
87wp3zqwrz.fsf@gnu.org
Hi Oleg,
Oleg Pykhalov <go.wigust@gmail.com> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
> [...]
>
>> I think we can avoid the problem by forcefully removing these two lock
>> files at boot time:
>>
>> diff --git a/gnu/services.scm b/gnu/services.scm
>> index 329b7b151..2ef1d8530 100644
>> --- a/gnu/services.scm
>> +++ b/gnu/services.scm
>> @@ -368,6 +368,8 @@ boot."
>>                 #t))))
>>   ;; Ignore I/O errors so the system can boot.
>>   (fail-safe
>> +   (delete-file "/etc/group.lock")
>> +   (delete-file "/etc/passwd.lock")
>>    (delete-file-recursively "/tmp")
>>    (delete-file-recursively "/var/run")
>>    (mkdir "/tmp")
>>
>>
>
> There is also a '/etc/.pwd.lock'. Info about this file:
> https://lists.debian.org/debian-user/2005/07/msg02949.html
>
>
> I'm not sure whether any of these files exist. Days have passed since
> the reconfigure failure.
>
> $ sudo find /etc -name '*.lock' # Shows nothing.
I’ve pushed it as aad8a143000600abec5c8ebfadec4c09f34f1b73.
> [...]
>
>> IOW, shepherd caught the exceptions and didn’t die.
>>
>> What am I missing?
>
> I'll try to make a reproducible thing later.
Awesome.
Thanks,
Ludo’.
Ludovic Courtès wrote on 13 Oct 2017 10:35
Re: 02/02: doc: Add an example to the documentation of the udev-service.
(address . guix-devel@gnu.org)
87r2u7qwbk.fsf@gnu.org
Hi!
rekado@elephly.net (Ricardo Wurmus) skribis:
> commit e0c1d080b520c1bbd2dcd7bc90a750f5ce580486
> Author: Ricardo Wurmus <rekado@elephly.net>
> Date: Mon Oct 9 23:03:56 2017 +0200
>
> doc: Add an example to the documentation of the udev-service.
>
> * doc/guix.texi (Base Services): Update 'udev-service' documentation.
Good idea.
> +@example
> +(define %example-udev-rule
> +  (udev-rule "90-usb-thing.rules"
> +             "ACTION==\"add\", SUBSYSTEM==\"usb\", ATTR@{product@}==\"Example\", RUN+=\"/path/to/script\""))
> +
> +(operating-system
> +  ;; @dots{}
> +  (services (modify-services %desktop-services
> +              (udev-service-type config =>
> +                (udev-configuration (inherit config)
> +                  (rules (append (udev-configuration-rules config)
> +                                 (list %example-udev-rule))))))))
> +@end example
https://bugs.gnu.org/28647 is somewhat related.
Fundamentally though, to simplify this use case, we should have:
    (define (additional-udev-rules . rules)
      "Add RULES, a list of file-like object, as a udev rules."
      (simple-service 'udev-rule udev-service-type rules))
so one can write:
    (operating-system
      ;; …
      (services (cons (additional-udev-rules %example-udev-rule)
                      %desktop-services)))
Thoughts?
Ludo’.
Ludovic Courtès wrote on 20 Oct 2017 18:02
Re: bug#28772: guix system reconfigure after kernel panic user or group not created
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 28772@debbugs.gnu.org)
87a80lvmdb.fsf@gnu.org
Hi,
Oleg Pykhalov <go.wigust@gmail.com> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
> [...]
>
>> I think we can avoid the problem by forcefully removing these two lock
>> files at boot time:
>>
>> diff --git a/gnu/services.scm b/gnu/services.scm
>> index 329b7b151..2ef1d8530 100644
>> --- a/gnu/services.scm
>> +++ b/gnu/services.scm
>> @@ -368,6 +368,8 @@ boot."
>>                 #t))))
>>   ;; Ignore I/O errors so the system can boot.
>>   (fail-safe
>> +   (delete-file "/etc/group.lock")
>> +   (delete-file "/etc/passwd.lock")
>>    (delete-file-recursively "/tmp")
>>    (delete-file-recursively "/var/run")
>>    (mkdir "/tmp")
>>
>>
>
> There is also a '/etc/.pwd.lock'. Info about this file:
> https://lists.debian.org/debian-user/2005/07/msg02949.html
>
>
> I'm not sure whether any of these files exist. Days have passed since
> the reconfigure failure.
>
> $ sudo find /etc -name '*.lock' # Shows nothing.
>
> [...]
>
>> IOW, shepherd caught the exceptions and didn’t die.
>>
>> What am I missing?
>
> I'll try to make a reproducible thing later.
Did you eventually gather more info?
Thanks,
Ludo’.
Ludovic Courtès wrote on 20 Oct 2017 18:02
control message for bug #28772
(address . control@debbugs.gnu.org)
878tg5vmd6.fsf@gnu.org
tags 28772 moreinfo
Oleg Pykhalov wrote on 22 Oct 2017 16:41
Re: bug#28772: guix system reconfigure after kernel panic user or group not created
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 28772@debbugs.gnu.org)
87a80jjlcx.fsf@gmail.com
Hello Ludovic,
Apologies for the late reply.
ludo@gnu.org (Ludovic Courtès) writes:
[...]
> Did you eventually gather more info?
Yes, I got an undefined %iptables-rst variable in reconfigure output at first run.
But at second run substitutions didn't work. GuixSD rebuilds the world. I didn't wait for this.
I also didn't make guix pull.
/tmp/guixsd
├── bootstrap.sh
├── iptables
│   └── iptables.scm
├── panic.sh
└── vm-image.scm

1 directory, 4 files
Attachment: bootstrap.sh
Attachment: panic.sh
Attachment: vm-image.scm
Attachment: iptables.scm
Oleg Pykhalov wrote on 22 Oct 2017 22:39
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 28772@debbugs.gnu.org)
87fuaaq5lw.fsf@gmail.com
ludo@gnu.org (Ludovic Courtès) writes:
[...]
> Did you eventually gather more info?
In addition to previous message.
--- /home/natsu/r/bootstrap.sh 2017-10-22 04:48:42.992394510 +0300
+++ ./bootstrap.sh 2017-10-22 23:33:54.113495658 +0300
@@ -11,7 +11,7 @@
 SIG=guixsd-vm-image-$VERSION.$KERNEL.xz.sig
 pull_release () {
-    if [ ! -f $RELEASE ] || [ ! -f $RELEASE.xz ] || ! gpg --verify $SIG
+    if [ ! -f $RELEASE ] || [ ! -f $RELEASE.xz ]
     then
         gpg --keyserver pgp.mit.edu --recv-keys $GPG_KEY
         wget --output-document=$RELEASE.xz https://alpha.gnu.org/gnu/guix/$RELEASE.xz
@@ -23,6 +23,7 @@
 if pull_release
 then
     qemu-system-x86_64 \
+        -enable-kvm \
         -daemonize \
         -m 1024 \
         -virtfs local,path=$PWD,security_model=none,mount_tag=TAG_pwd \
Also need to increase a size of qemu image. I don't know how I reconfigured successfully in previous message.
Ludovic Courtès wrote on 23 Oct 2017 01:01
(name . Oleg Pykhalov)(address . go.wigust@gmail.com)(address . 28772@debbugs.gnu.org)
878tg2yegc.fsf@gnu.org
Hi Oleg,
Oleg Pykhalov <go.wigust@gmail.com> skribis:
> ludo@gnu.org (Ludovic Courtès) writes:
>
> [...]
>
>> Did you eventually gather more info?
>
> Yes, I got an undefined %iptables-rst variable in reconfigure output
> at first run.
>
> But at second run substitutions didn't work. GuixSD rebuilds the world.
> I didn't wait for this.
In my previous message I showed a way to (attempt to) reproduce the problem by directly using ‘herd load’.
Could you try to reproduce the problem in this way?
You should even be able to check with a shepherd instance not running as PID 1, which is more convenient.
Thanks in advance,
Ludo’.