User account password got locked when booting old generation

DoneSubmitted by pelzflorian (Florian Pelz).
Details
4 participants
  • Gábor Boskovits
  • Danny Milosavljevic
  • Ludovic Courtès
  • pelzflorian (Florian Pelz)
Owner
unassigned
Severity
important
Merged with
P
P
pelzflorian (Florian Pelz) wrote on 29 May 2019 22:45
(address . bug-guix@gnu.org)
20190529204517.mqn5xrw23xib4i3u@pelzflorian.localdomain
After I reconfigured to a broken, unbootable generation and thenrebooted to an old working generation, I found my user accountpassword was locked. This was my /etc/shadow:
root::18045::::::florian:!:18045::::::nobody:!:18045::::::guixbuilder01:!:18045::::::guixbuilder02:!:18045::::::guixbuilder03:!:18045::::::guixbuilder04:!:18045::::::guixbuilder05:!:18045::::::guixbuilder06:!:18045::::::guixbuilder07:!:18045::::::guixbuilder08:!:18045::::::guixbuilder09:!:18045::::::guixbuilder10:!:18045::::::ntpd:!:18045::::::messagebus:!:18045::::::polkitd:!:18045::::::geoclue:!:18045::::::colord:!:18045::::::avahi:!:18045::::::gdm:!:18045::::::httpd:!:18045::::::
Logging in as root (root had an empty password before as well) andrunning `passwd florian` fixed it, and I *cannot* reproduce the buganymore by booting the broken generation again, i.e. my passwordremains set now.
I presume it is not possible to lock a password by typing the wrongpassword too often on the virtual console?
If you think this is not enough material to work on, feel free toclose this bug, but there seems to be some misbehavior somewhere inGuix’ password management.
(The reason for the new generation’s brokenness seems unrelated; itcould not boot after I tried adding syslogd to the requirements ofudev-shepherd-service.)
Regards,Florian
L
L
Ludovic Courtès wrote on 1 Jun 2019 00:05
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
877ea6l1on.fsf@gnu.org
Hello,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (31 lines)> After I reconfigured to a broken, unbootable generation and then> rebooted to an old working generation, I found my user account> password was locked. This was my /etc/shadow:>> root::18045::::::> florian:!:18045::::::> nobody:!:18045::::::> guixbuilder01:!:18045::::::> guixbuilder02:!:18045::::::> guixbuilder03:!:18045::::::> guixbuilder04:!:18045::::::> guixbuilder05:!:18045::::::> guixbuilder06:!:18045::::::> guixbuilder07:!:18045::::::> guixbuilder08:!:18045::::::> guixbuilder09:!:18045::::::> guixbuilder10:!:18045::::::> ntpd:!:18045::::::> messagebus:!:18045::::::> polkitd:!:18045::::::> geoclue:!:18045::::::> colord:!:18045::::::> avahi:!:18045::::::> gdm:!:18045::::::> httpd:!:18045::::::>> Logging in as root (root had an empty password before as well) and> running `passwd florian` fixed it, and I *cannot* reproduce the bug> anymore by booting the broken generation again, i.e. my password> remains set now.
Did the old generation you booted have user ‘florian’?
Also, how old was it? User account management changed in March (commit0ae735bcc8ff7fdc89d67b492bdee9091ee19e86).
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 1 Jun 2019 07:52
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190601055238.jkhefpupavz7aipi@pelzflorian.localdomain
On Sat, Jun 01, 2019 at 12:05:44AM +0200, Ludovic Courtès wrote:
Toggle quote (3 lines)> Did the old generation you booted have user ‘florian’?>
Yes, this is my main machine. The users are the same.
Toggle quote (4 lines)> Also, how old was it? User account management changed in March (commit> 0ae735bcc8ff7fdc89d67b492bdee9091ee19e86).>
The git commit for both old and new generation is
commit ad7466aafd7f166d0b6be5eb32dda1d3ee8a6445 (origin/master, origin/HEAD)Author: Ludovic Court<C3><A8>s <ludo@gnu.org>Date: Sun May 26 23:18:21 2019 +0200
with unrelevant patches (for USB_ModeSwitch) applied on both, as wellas a non-working patch adding syslogd as a requirement toudev-shepherd-service (making the generation unbootable and makingsomething display Stack overflow) and a working patch to add a --debugargument to udevd for the new, unbootable generation. The oldgeneration has a patch that adds wrong arguments to udevd that udevdignores. All seems harmless.
I still cannot reproduce despite rebooting the broken generationmultiple times and then booting back into the old generation.
I wonder what would change /etc/shadow.
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 1 Jun 2019 16:58
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190601145834.f4wgm4oqmdyej7n5@pelzflorian.localdomain
On Sat, Jun 01, 2019 at 07:52:38AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (3 lines)> I wonder what would change /etc/shadow.>
If the error occurred on common non-Guix distros, it hopefully wouldhave been fixed before, maybe. Of course Guix recreates /etc/shadowmuch more frequently.
Guix appears to add shadow files atomically in gnu/build/accounts.scm.I do not know if there could have been an error reading the old shadowfile, e.g. because it is locked or something?
The elogind source code in src/basic/user-util.c contains code forlocking /etc/shadow, with a comment that explains why its lckpwdf isimplemented differently from shadow-utils.
AccountsService appears to only be usable for reading /etc/shadow, notfor writing it, contrary to what the Guix manual claims (??). Forwriting passwords, gnome-control-center does not use AccountsService,it calls /usr/bin/passwd directly in its source code inpanels/user-accounts/run-passwd.c.
Regards,Florian
L
L
Ludovic Courtès wrote on 1 Jun 2019 23:37
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87r28dc7gw.fsf@gnu.org
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (8 lines)> On Sat, Jun 01, 2019 at 07:52:38AM +0200, pelzflorian (Florian Pelz) wrote:>> I wonder what would change /etc/shadow.>> >> If the error occurred on common non-Guix distros, it hopefully would> have been fixed before, maybe. Of course Guix recreates /etc/shadow> much more frequently.
Definitely.
Toggle quote (4 lines)> Guix appears to add shadow files atomically in gnu/build/accounts.scm.> I do not know if there could have been an error reading the old shadow> file, e.g. because it is locked or something?
(gnu build accounts) doesn’t care at all about /etc/.pwd.lock, the lockfile used by libc’s ‘lckpwdf’ function.
This is definitely not a problem when booting. It could be a problem ifyou’re concurrently running ‘guix system reconfigure’ (which runsactivation snippets, including the account updating code) and some otherprogram, such as ‘passwd’, that assumes it holds an exclusive lock onthe file. Though in that case, the worst that could happen is that thechanges made by Guix would be undoed by that other program.
Toggle quote (7 lines)> The elogind source code in src/basic/user-util.c contains code for> locking /etc/shadow, with a comment that explains why its lckpwdf is> implemented differently from shadow-utils.>> AccountsService appears to only be usable for reading /etc/shadow, not> for writing it, contrary to what the Guix manual claims (??).
That might be a bug.
Toggle quote (4 lines)> For writing passwords, gnome-control-center does not use> AccountsService, it calls /usr/bin/passwd directly in its source code> in panels/user-accounts/run-passwd.c.
That’s definitely a bug to fix: it should invoke/run/setuid-programs/passwd instead.
Thanks for investigating,Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 2 Jun 2019 09:05
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190602070545.xp2pqlnzsthpjtbw@pelzflorian.localdomain
On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
Toggle quote (7 lines)> This is definitely not a problem when booting. It could be a problem if> you’re concurrently running ‘guix system reconfigure’ (which runs> activation snippets, including the account updating code) and some other> program, such as ‘passwd’, that assumes it holds an exclusive lock on> the file. Though in that case, the worst that could happen is that the> changes made by Guix would be undoed by that other program.
Thank you for explaining!
OK. Also why would a read error cause the password to get lockedanyway. I suppose this is an issue with mingetty or something lockingthe password, perhaps after too many failed login attempts. (Notethat I’m prone to making typos because with this Macbook keyboard keypresses are sometimes recognized twice on and only on a virtualconsole.) I will investigate mingetty next.

I also tried running this script:
#!/run/current-system/profile/bin/bashMD5=$(sudo md5sum /etc/shadow)echo "Current /etc/shadow has md5sum: $MD5"until [ "$(sudo md5sum /etc/shadow)" != "$MD5" ]; do sudo guix system roll-back sudo guix system reconfigure /etc/config.scmdonenotify-send "/etc/shadow changed!" "Maybe I reproduced the issue."
After repeatedly reconfiguring for some 40 minutes I still got thesame /etc/shadow.
(But I think it broke my motherboard, because recently the output ofreconfigure changed from
[…]shepherd: Evaluating user expression (let* ((services (map primitive-load (?))) # ?) ?).shepherd: Service user-homes has been started.shepherd: Service term-auto could not be started.bootloader successfully installed on '/boot/efi'
to
[…]shepherd: Evaluating user expression (let* ((services (map primitive-load (?))) # ?) ?).shepherd: Service user-homes has been started.shepherd: Service term-auto could not be started.error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:
Installing for x86_64-efi platform. Could not prepare Boot variable: No such file or directory /gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.
guix system: error: failed to install bootloader /gnu/store/y6p93xjdbpbp0z2kc0gw5yqjppmdsq7g-bootloader-installer

I now get this on every reconfigure. I tried rebooting; this was abad idea; I csnnot boot anymore. But that is unrelated. Maybe I willinstall GRUB as if it were an external hard drive from now on.)

Toggle quote (20 lines)> > > The elogind source code in src/basic/user-util.c contains code for> > locking /etc/shadow, with a comment that explains why its lckpwdf is> > implemented differently from shadow-utils.> >> > AccountsService appears to only be usable for reading /etc/shadow, not> > for writing it, contrary to what the Guix manual claims (??). > > That might be a bug.> > > For writing passwords, gnome-control-center does not use> > AccountsService, it calls /usr/bin/passwd directly in its source code> > in panels/user-accounts/run-passwd.c.> > That’s definitely a bug to fix: it should invoke> /run/setuid-programs/passwd instead.> > Thanks for investigating,> Ludo’.
I will try and make patches once I can reboot into Guix again.
Regards,Florian
L
L
Ludovic Courtès wrote on 2 Jun 2019 11:38
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87sgss9vj7.fsf@gnu.org
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (8 lines)> On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:>> This is definitely not a problem when booting. It could be a problem if>> you’re concurrently running ‘guix system reconfigure’ (which runs>> activation snippets, including the account updating code) and some other>> program, such as ‘passwd’, that assumes it holds an exclusive lock on>> the file. Though in that case, the worst that could happen is that the>> changes made by Guix would be undoed by that other program.
Actually, another thing that could happen is that Guix reads anincomplete /etc/shadow because some other program is writing to it.
In that case, suppose Guix reads a partial /etc/shadow where user“florian” is missing. It would then create a new /etc/shadow where thepassword for “florian” is uninitialized (or set to the initial valuethat appears in config.scm.)
Could it be what happened to you? You’d have to be running ‘passwd’ or‘usermod’ or whatever at exactly the same time as ‘guix systemreconfigure’ (and you’d have to be “lucky”).
Toggle quote (11 lines)> I also tried running this script:>> #!/run/current-system/profile/bin/bash> MD5=$(sudo md5sum /etc/shadow)> echo "Current /etc/shadow has md5sum: $MD5"> until [ "$(sudo md5sum /etc/shadow)" != "$MD5" ]; do> sudo guix system roll-back> sudo guix system reconfigure /etc/config.scm> done> notify-send "/etc/shadow changed!" "Maybe I reproduced the issue."
The code in (gnu build accounts) is purely functional and deterministic,so you have no chance of getting a different /etc/shadow with thisscript, unless perhaps you concurrently run ‘passwd’ or similar.
Toggle quote (6 lines)> error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:>> Installing for x86_64-efi platform.> Could not prepare Boot variable: No such file or directory> /gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.
Maybe you’ve exhausted the room for those EFI “variables” or something?
Thanks for your debugging work!
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 2 Jun 2019 12:21
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190602102122.bzapwt36vg32nmwq@pelzflorian.localdomain
On Sun, Jun 02, 2019 at 11:38:36AM +0200, Ludovic Courtès wrote:
Toggle quote (25 lines)> Hi Florian,> > "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:> > > On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:> >> This is definitely not a problem when booting. It could be a problem if> >> you’re concurrently running ‘guix system reconfigure’ (which runs> >> activation snippets, including the account updating code) and some other> >> program, such as ‘passwd’, that assumes it holds an exclusive lock on> >> the file. Though in that case, the worst that could happen is that the> >> changes made by Guix would be undoed by that other program.> > Actually, another thing that could happen is that Guix reads an> incomplete /etc/shadow because some other program is writing to it.> > In that case, suppose Guix reads a partial /etc/shadow where user> “florian” is missing. It would then create a new /etc/shadow where the> password for “florian” is uninitialized (or set to the initial value> that appears in config.scm.)> > Could it be what happened to you? You’d have to be running ‘passwd’ or> ‘usermod’ or whatever at exactly the same time as ‘guix system> reconfigure’ (and you’d have to be “lucky”).>
No, I did not change my password in a very long time.
Is there no proper cross-application locking mechanism for/etc/passwd? elogind uses
struct flock flock = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0,};[…]fd = open(path, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY|O_NOFOLLOW, 0600);[…]r = fcntl(fd, F_SETLKW, &flock;
Should Guix adopt something similar for shadow/passwd/… databasereads?



Toggle quote (12 lines)> > error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:> >> > Installing for x86_64-efi platform.> > Could not prepare Boot variable: No such file or directory> > /gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.> > Maybe you’ve exhausted the room for those EFI “variables” or something?> > Thanks for your debugging work!> > Ludo’.
Maybe exhausted, maybe it is an error with the NVRAM. I will trymaking grub-install execute like when installing on external USBdrives so it writes nothing to the motherboard.
Regards,Florian
L
L
Ludovic Courtès wrote on 2 Jun 2019 18:00
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87o93g9dv5.fsf@gnu.org
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (2 lines)> On Sun, Jun 02, 2019 at 11:38:36AM +0200, Ludovic Courtès wrote:
[...]
Toggle quote (32 lines)>> Actually, another thing that could happen is that Guix reads an>> incomplete /etc/shadow because some other program is writing to it.>> >> In that case, suppose Guix reads a partial /etc/shadow where user>> “florian” is missing. It would then create a new /etc/shadow where the>> password for “florian” is uninitialized (or set to the initial value>> that appears in config.scm.)>> >> Could it be what happened to you? You’d have to be running ‘passwd’ or>> ‘usermod’ or whatever at exactly the same time as ‘guix system>> reconfigure’ (and you’d have to be “lucky”).>>>> No, I did not change my password in a very long time.>> Is there no proper cross-application locking mechanism for> /etc/passwd? elogind uses>> struct flock flock = {> .l_type = F_WRLCK,> .l_whence = SEEK_SET,> .l_start = 0,> .l_len = 0,> };> […]> fd = open(path, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY|O_NOFOLLOW, 0600);> […]> r = fcntl(fd, F_SETLKW, &flock;>> Should Guix adopt something similar for shadow/passwd/… database> reads?
We could do that yes, that I’d lean towards using the same thing as libcand Shadow. The whole scenario just sounds very unlikely though.
Thanks,Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 3 Jun 2019 08:03
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190603060301.2nu2zqi5j3v3j5ki@pelzflorian.localdomain
After I booted to a Guix install USB, chrooted as described on theArch wiki and started a Guix daemon, I could reconfigure as before.There was no need to fiddle with grub-install.
After multiple reconfigures, it happened again, my /etc/shadow has !again in the password field. My recently changed root password becameempty as well, like 35902. I did not even run sudo concurrently. Thepassword just got locked.
The /etc from the “populating from /gnu/store/*-etc” messages has nosignificant differences either.


On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
Toggle quote (7 lines)> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:> > AccountsService appears to only be usable for reading /etc/shadow, not> > for writing it, contrary to what the Guix manual claims (??). > > That might be a bug.>
AccountsService obviously can change passwords. No bug here. Sorry.I was confused.

Toggle quote (8 lines)> > For writing passwords, gnome-control-center does not use> > AccountsService, it calls /usr/bin/passwd directly in its source code> > in panels/user-accounts/run-passwd.c.> > That’s definitely a bug to fix: it should invoke> /run/setuid-programs/passwd instead.>
Find attached two patches that fix GNOME password changing. Both arerequired.
Regards,Florian
From 1eb7699d5036062993a080393bfb4a46d2dc1bea Mon Sep 17 00:00:00 2001From: Florian Pelz <pelzflorian@pelzflorian.de>Date: Mon, 3 Jun 2019 07:19:20 +0200Subject: [PATCH 1/2] =?UTF-8?q?Add=20cracklib=E2=80=99s=20password=20dicti?= =?UTF-8?q?onary=20to=20cracklib=E2=80=99s=20default=20output.?=MIME-Version: 1.0Content-Type: text/plain; charset=UTF-8Content-Transfer-Encoding: 8bit
* gnu/packages/password-utils.scm (cracklib): Use `make dict`.--- gnu/packages/password-utils.scm | 9 +++++++++ 1 file changed, 9 insertions(+)
Toggle diff (22 lines)diff --git a/gnu/packages/password-utils.scm b/gnu/packages/password-utils.scmindex 2b844c9a1c..88f933e43e 100644--- a/gnu/packages/password-utils.scm+++ b/gnu/packages/password-utils.scm@@ -310,6 +310,15 @@ and vice versa.") (sha256 (base32 "1rimpjsdnmw8f5b7k558cic41p2qy2n2yrlqp5vh7mp4162hk0py")))) (build-system gnu-build-system)+ (arguments+ `(#:phases+ (modify-phases %standard-phases+ (add-after 'install 'install-dict+ (lambda* (#:key make-flags #:allow-other-keys)+ (begin+ (chmod (string-append "util/cracklib-format") #o755)+ (apply invoke "make" "dict" make-flags)+ #t)))))) (synopsis "Password checking library") (home-page "https://github.com/cracklib/cracklib") (description-- 2.21.0
From c7c016adc34c591febd0d3630f32dbecdd20ad7c Mon Sep 17 00:00:00 2001From: Florian Pelz <pelzflorian@pelzflorian.de>Date: Sun, 2 Jun 2019 20:01:23 +0200Subject: [PATCH 2/2] Make gnome-control-center find passwd binary.
* gnu/packages/gnome.scm (gnome-control-center): Substitute correct path to passwd.--- gnu/packages/gnome.scm | 3 +++ 1 file changed, 3 insertions(+)
Toggle diff (16 lines)diff --git a/gnu/packages/gnome.scm b/gnu/packages/gnome.scmindex 6a2a683f58..2917107d18 100644--- a/gnu/packages/gnome.scm+++ b/gnu/packages/gnome.scm@@ -5879,6 +5879,9 @@ devices using the GNOME desktop.") (("\"nm-connection-editor") (string-append "\"" nm-applet "/bin/nm-connection-editor")))+ (substitute* '("panels/user-accounts/run-passwd.c")+ (("/usr/bin/passwd")+ "/run/setuid-programs/passwd")) #t)))))) (native-inputs `(("glib:bin" ,glib "bin") ; for glib-mkenums, etc.-- 2.21.0
G
G
Gábor Boskovits wrote on 3 Jun 2019 08:14
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
CAE4v=piJkUZzjhFkF6OEtTBd6WKY5UusvviWZJ31kY3pFRfu5Q@mail.gmail.com
Hello,



pelzflorian (Florian Pelz) <pelzflorian@pelzflorian.de> ezt írta (idÅ‘pont:2019. jún. 3., Hét 8:04):
Toggle quote (9 lines)> After I booted to a Guix install USB, chrooted as described on the> Arch wiki and started a Guix daemon, I could reconfigure as before.> There was no need to fiddle with grub-install.>> After multiple reconfigures, it happened again, my /etc/shadow has !> again in the password field. My recently changed root password became> empty as well, like 35902. I did not even run sudo concurrently. The> password just got locked.>
This is the same thing that happened to me, and there is another report,regarding passwords being reset. I believe we should merge these two bugs.I am on a mobile with no convinient way to look up the issue number.
Toggle quote (33 lines)>> The /etc from the “populating from /gnu/store/*-etc” messages has no> significant differences either.>>>> On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:> > "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:> > > AccountsService appears to only be usable for reading /etc/shadow, not> > > for writing it, contrary to what the Guix manual claims (??).> >> > That might be a bug.> >>> AccountsService obviously can change passwords. No bug here. Sorry.> I was confused.>>> > > For writing passwords, gnome-control-center does not use> > > AccountsService, it calls /usr/bin/passwd directly in its source code> > > in panels/user-accounts/run-passwd.c.> >> > That’s definitely a bug to fix: it should invoke> > /run/setuid-programs/passwd instead.> >>> Find attached two patches that fix GNOME password changing. Both are> required.>> Regards,> Florian>
On my machine it turned out that the hdd is faulty, so this might be ahardware error, I will get a replacement drive tomorrow, and check if theproblem still persist.
Best regards,g_bor
Toggle quote (1 lines)>
Attachment: file
P
P
pelzflorian (Florian Pelz) wrote on 3 Jun 2019 09:18
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190603071830.q4bf76ioaszjdoo6@pelzflorian.localdomain
Please add more logging and/or locking. Note that the elogind has thefollowing comment in its locking implementation in/gnu/store/dm2ri0qvjirl0iq2ndfk5z9lq9dyk4jf-elogind-241.3-checkout/src/basic/user-util.c:
/* This is roughly the same as lckpwdf(), but not as awful. We * don't want to use alarm() and signals, hence we implement * our own trivial version of this. * * Note that shadow-utils also takes per-database locks in * addition to lckpwdf(). However, we don't given that they * are redundant as they invoke lckpwdf() first and keep * it during everything they do. The per-database locks are * awfully racy, and thus we just won't do them. */
Regards,Florian
L
L
Ludovic Courtès wrote on 3 Jun 2019 15:22
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87tvd6erbo.fsf@gnu.org
Hi Florian,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (9 lines)> After I booted to a Guix install USB, chrooted as described on the> Arch wiki and started a Guix daemon, I could reconfigure as before.> There was no need to fiddle with grub-install.>> After multiple reconfigures, it happened again, my /etc/shadow has !> again in the password field. My recently changed root password became> empty as well, like 35902. I did not even run sudo concurrently. The> password just got locked.
What were the differences between your config files when youreconfigured?
Did the config files describe the exact same set of user accounts?
Did the user accounts in the config files differ in any way?
Were the user accounts altered in any way in between reconfigures(‘passwd’, ‘usermod’, GNOME, etc.)?

Looking at ‘user+group-databases’ in (gnu build accounts), which takes alist of <user-account> and a list of <user-group> to produce/etc/{passwd,shadow,group}, the only way this could happen is if/etc/shadow does not exist at the time of reconfigure.
In that case, ‘user+group-databases’ assumes we’re starting anew, so itcreates /etc/shadow with the initial passwords specified in the OSconfig. At that point, the other passwords are lost forever.
Does Shadow or gnome-control-center or something remove /etc/shadowaltogether while it’s accessing it?
At the very least, adding locking like you suggested should avoid thisclass of problems; I’ll look into that.
Toggle quote (11 lines)> From 1eb7699d5036062993a080393bfb4a46d2dc1bea Mon Sep 17 00:00:00 2001> From: Florian Pelz <pelzflorian@pelzflorian.de>> Date: Mon, 3 Jun 2019 07:19:20 +0200> Subject: [PATCH 1/2] =?UTF-8?q?Add=20cracklib=E2=80=99s=20password=20dicti?=> =?UTF-8?q?onary=20to=20cracklib=E2=80=99s=20default=20output.?=> MIME-Version: 1.0> Content-Type: text/plain; charset=UTF-8> Content-Transfer-Encoding: 8bit>> * gnu/packages/password-utils.scm (cracklib): Use `make dict`.
[...]
Toggle quote (8 lines)> From c7c016adc34c591febd0d3630f32dbecdd20ad7c Mon Sep 17 00:00:00 2001> From: Florian Pelz <pelzflorian@pelzflorian.de>> Date: Sun, 2 Jun 2019 20:01:23 +0200> Subject: [PATCH 2/2] Make gnome-control-center find passwd binary.>> * gnu/packages/gnome.scm (gnome-control-center): Substitute correct path to> passwd.
Great, applied both.
Thank you!
Ludo’.
L
L
Ludovic Courtès wrote on 3 Jun 2019 15:25
control message for bug #35996
(address . control@debbugs.gnu.org)
87sgsqer7x.fsf@gnu.org
severity 35996 importantquit
P
P
pelzflorian (Florian Pelz) wrote on 3 Jun 2019 16:52
Re: bug#35996: User account password got locked when booting old generation
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190603145209.ub7663zp7yh7n7i4@pelzflorian.localdomain
On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:
Toggle quote (9 lines)> > After multiple reconfigures, it happened again, my /etc/shadow has !> > again in the password field. My recently changed root password became> > empty as well, like 35902. I did not even run sudo concurrently. The> > password just got locked.> > What were the differences between your config files when you> reconfigured?>
For the last reconfigure, there were no differences, although I hadrebooted into an unbootable, older generation with a differentsyslog.conf and broken Udevd arguments before booting the newgeneration. I suppose the other victims of this bug have not bootedto unbootable generations?

Toggle quote (3 lines)> Did the config files describe the exact same set of user accounts?>
Yes, they’re the same.
Toggle quote (3 lines)> Did the user accounts in the config files differ in any way?>
No, they do not differ.
Toggle quote (17 lines)> Were the user accounts altered in any way in between reconfigures> (‘passwd’, ‘usermod’, GNOME, etc.)?> >> Looking at ‘user+group-databases’ in (gnu build accounts), which takes a> list of <user-account> and a list of <user-group> to produce> /etc/{passwd,shadow,group}, the only way this could happen is if> /etc/shadow does not exist at the time of reconfigure.>> In that case, ‘user+group-databases’ assumes we’re starting anew, so it> creates /etc/shadow with the initial passwords specified in the OS> config. At that point, the other passwords are lost forever.> > Does Shadow or gnome-control-center or something remove /etc/shadow> altogether while it’s accessing it?>
I did not use gnome-control-center or shadow or sudo during the lastreconfigure, except sudo for starting the reconfigure.
Toggle quote (4 lines)> At the very least, adding locking like you suggested should avoid this> class of problems; I’ll look into that.>
I do not know if something somehow accessed /etc/shadow in thebackground without my knowledge. I believe locks are important anywayto have more guarantees passwords are not lost when Guix is run on asensitive multi-user setup. Thank you for looking into it.
If locks do not stop these issues, it would be nice to have moredetailed logs of shadow changes written to syslog on reconfigureand/or on reboot.
Regards,Florian
L
L
Ludovic Courtès wrote on 3 Jun 2019 17:22
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87h896d774.fsf@gnu.org
Hello,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (8 lines)> Please add more logging and/or locking. Note that the elogind has the> following comment in its locking implementation in> /gnu/store/dm2ri0qvjirl0iq2ndfk5z9lq9dyk4jf-elogind-241.3-checkout/src/basic/user-util.c:>> /* This is roughly the same as lckpwdf(), but not as awful. We> * don't want to use alarm() and signals, hence we implement> * our own trivial version of this.
Attach is a set of patches to lock /etc/.pwd.lock when we access/etc/{passwd,shadow,group} from the activation snippets. It shouldensure that, when running ‘guix system reconfigure’, we’re honoring thelocking protocol that Shadow & co. follow.
It would be great if you could test again in this context.
Thanks!
Ludo’.
From e26d9759ebe58c93e7eb449ea6bd3317b2840e99 Mon Sep 17 00:00:00 2001From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>Date: Mon, 3 Jun 2019 16:23:01 +0200Subject: [PATCH 1/5] syscalls: Add 'with-file-lock' macro.
* guix/scripts/offload.scm (lock-file, unlock-file, with-file-lock):Move to...* guix/build/syscalls.scm: ... here.--- .dir-locals.el | 2 ++ guix/build/syscalls.scm | 27 +++++++++++++++++++++++++++ guix/scripts/offload.scm | 25 ------------------------- 3 files changed, 29 insertions(+), 25 deletions(-)
Toggle diff (104 lines)diff --git a/.dir-locals.el b/.dir-locals.elindex f1196fd781..228685a69f 100644--- a/.dir-locals.el+++ b/.dir-locals.el@@ -34,6 +34,8 @@ (eval . (put 'modify-services 'scheme-indent-function 1)) (eval . (put 'with-directory-excursion 'scheme-indent-function 1))+ (eval . (put 'with-file-lock 'scheme-indent-function 1))+ (eval . (put 'package 'scheme-indent-function 0)) (eval . (put 'origin 'scheme-indent-function 0)) (eval . (put 'build-system 'scheme-indent-function 0))diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scmindex 3abe65bc4f..04fbebb8a2 100644--- a/guix/build/syscalls.scm+++ b/guix/build/syscalls.scm@@ -81,7 +81,11 @@ fdatasync pivot-root scandir*+ fcntl-flock+ lock-file+ unlock-file+ with-file-lock set-thread-name thread-name@@ -1067,6 +1071,29 @@ exception if it's already taken." ;; Presumably we got EAGAIN or so. (throw 'flock-error err)))))) +(define (lock-file file)+ "Wait and acquire an exclusive lock on FILE. Return an open port."+ (let ((port (open-file file "w0")))+ (fcntl-flock port 'write-lock)+ port))++(define (unlock-file port)+ "Unlock PORT, a port returned by 'lock-file'."+ (fcntl-flock port 'unlock)+ (close-port port)+ #t)++(define-syntax-rule (with-file-lock file exp ...)+ "Wait to acquire a lock on FILE and evaluate EXP in that context."+ (let ((port (lock-file file)))+ (dynamic-wind+ (lambda ()+ #t)+ (lambda ()+ exp ...)+ (lambda ()+ (unlock-file port)))))+ ;;; ;;; Miscellaneous, aka. 'prctl'.diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scmindex eb02672dbf..0c0dd9d516 100644--- a/guix/scripts/offload.scm+++ b/guix/scripts/offload.scm@@ -236,30 +236,6 @@ instead of '~a' of type '~a'~%") ;;; Synchronization. ;;; -(define (lock-file file)- "Wait and acquire an exclusive lock on FILE. Return an open port."- (mkdir-p (dirname file))- (let ((port (open-file file "w0")))- (fcntl-flock port 'write-lock)- port))--(define (unlock-file lock)- "Unlock LOCK."- (fcntl-flock lock 'unlock)- (close-port lock)- #t)--(define-syntax-rule (with-file-lock file exp ...)- "Wait to acquire a lock on FILE and evaluate EXP in that context."- (let ((port (lock-file file)))- (dynamic-wind- (lambda ()- #t)- (lambda ()- exp ...)- (lambda ()- (unlock-file port)))))- (define (machine-slot-file machine slot) "Return the file name of MACHINE's file for SLOT." ;; For each machine we have a bunch of files representing each build slot.@@ -829,7 +805,6 @@ This tool is meant to be used internally by 'guix-daemon'.\n")) (leave (G_ "invalid arguments: ~{~s ~}~%") x)))) ;;; Local Variables:-;;; eval: (put 'with-file-lock 'scheme-indent-function 1) ;;; eval: (put 'with-error-to-port 'scheme-indent-function 1) ;;; eval: (put 'with-timeout 'scheme-indent-function 2) ;;; End:-- 2.21.0
From cf802dcd04425ce783714d70444b023902b36947 Mon Sep 17 00:00:00 2001From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>Date: Mon, 3 Jun 2019 16:24:31 +0200Subject: [PATCH 2/5] syscalls: 'with-file-lock' expands to a call to 'call-with-file-lock'.
* guix/build/syscalls.scm (call-with-file-lock): New procedure.(with-file-lock): Expand to a call to 'call-with-file-lock'.--- guix/build/syscalls.scm | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
Toggle diff (30 lines)diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scmindex 04fbebb8a2..3af41f2cf5 100644--- a/guix/build/syscalls.scm+++ b/guix/build/syscalls.scm@@ -1083,17 +1083,19 @@ exception if it's already taken." (close-port port) #t) -(define-syntax-rule (with-file-lock file exp ...)- "Wait to acquire a lock on FILE and evaluate EXP in that context."+(define (call-with-file-lock file thunk) (let ((port (lock-file file))) (dynamic-wind (lambda () #t)- (lambda ()- exp ...)+ thunk (lambda () (unlock-file port))))) +(define-syntax-rule (with-file-lock file exp ...)+ "Wait to acquire a lock on FILE and evaluate EXP in that context."+ (call-with-file-lock file (lambda () exp ...)))+ ;;; ;;; Miscellaneous, aka. 'prctl'.-- 2.21.0
From 7ddcb15289a25a478a6bf7f353f2b4754b4af017 Mon Sep 17 00:00:00 2001From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>Date: Mon, 3 Jun 2019 17:13:30 +0200Subject: [PATCH 3/5] syscalls: 'with-lock-file' catches ENOSYS.
* guix/build/syscalls.scm (call-with-file-lock): Catch ENOSYS raised by'lock-file'.--- guix/build/syscalls.scm | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
Toggle diff (33 lines)diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scmindex 3af41f2cf5..5c2eb3c14d 100644--- a/guix/build/syscalls.scm+++ b/guix/build/syscalls.scm@@ -1084,13 +1084,24 @@ exception if it's already taken." #t) (define (call-with-file-lock file thunk)- (let ((port (lock-file file)))+ (let ((port (catch 'system-error+ (lambda ()+ (lock-file file))+ (lambda args+ ;; When using the statically-linked Guile in the initrd,+ ;; 'fcntl-flock' returns ENOSYS unconditionally. Ignore+ ;; that error since we're typically the only process running+ ;; at this point.+ (if (= ENOSYS (system-error-errno args))+ #f+ (apply throw args)))))) (dynamic-wind (lambda () #t) thunk (lambda ()- (unlock-file port)))))+ (when port+ (unlock-file port)))))) (define-syntax-rule (with-file-lock file exp ...) "Wait to acquire a lock on FILE and evaluate EXP in that context."-- 2.21.0
From fffb20dddfd8b5def7255874ce38061acd7f7d10 Mon Sep 17 00:00:00 2001From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>Date: Mon, 3 Jun 2019 17:14:17 +0200Subject: [PATCH 4/5] activation: Lock /etc/.pwd.lock before accessing databases.
Fixes http://bugs.gnu.org/35996.Reported by Florian Pelz <pelzflorian@pelzflorian.de>.
* gnu/build/accounts.scm (%password-lock-file): New variable.* gnu/build/activation.scm (activate-users+groups): Wrap calls to'user+group-databases', 'write-group', etc. into 'with-file-lock'.--- gnu/build/accounts.scm | 6 ++++++ gnu/build/activation.scm | 33 +++++++++++++++++++-------------- 2 files changed, 25 insertions(+), 14 deletions(-)
Toggle diff (79 lines)diff --git a/gnu/build/accounts.scm b/gnu/build/accounts.scmindex c43ce85b60..8687446aa6 100644--- a/gnu/build/accounts.scm+++ b/gnu/build/accounts.scm@@ -51,6 +51,7 @@ group-entry-gid group-entry-members + %password-lock-file write-group write-passwd write-shadow@@ -224,6 +225,11 @@ each field." (serialization list->comma-separated comma-separated->list) (default '()))) +(define %password-lock-file+ ;; The password database lock file used by libc's 'lckpwdf'. Users should+ ;; grab this lock with 'with-file-lock' when they access the databases.+ "/etc/.pwd.lock")+ (define (database-writer file mode entry->string) (lambda* (entries #:optional (file-or-port file)) "Write ENTRIES to FILE-OR-PORT. When FILE-OR-PORT is a file name, writediff --git a/gnu/build/activation.scm b/gnu/build/activation.scmindex cfdf17df0f..c6c7e7fd3b 100644--- a/gnu/build/activation.scm+++ b/gnu/build/activation.scm@@ -22,6 +22,7 @@ #:use-module (gnu build accounts) #:use-module (gnu build linux-boot) #:use-module (guix build utils)+ #:use-module ((guix build syscalls) #:select (with-file-lock)) #:use-module (ice-9 ftw) #:use-module (ice-9 match) #:use-module (ice-9 vlist)@@ -129,22 +130,26 @@ group records) are all available." ;; Allow home directories to be created under /var/lib. (mkdir-p "/var/lib") - (let-values (((groups passwd shadow)- (user+group-databases users groups)))- (write-group groups)- (write-passwd passwd)- (write-shadow shadow)+ ;; Take same lock as libc's 'lckpwdf' (but without a timeout) while we read+ ;; and write the databases. This ensures there's no race condition with+ ;; other tools that might be accessing it at the same time.+ (with-file-lock %password-lock-file+ (let-values (((groups passwd shadow)+ (user+group-databases users groups)))+ (write-group groups)+ (write-passwd passwd)+ (write-shadow shadow))) - ;; Home directories of non-system accounts are created by- ;; 'activate-user-home'.- (for-each make-home-directory system-accounts)+ ;; Home directories of non-system accounts are created by+ ;; 'activate-user-home'.+ (for-each make-home-directory system-accounts) - ;; Turn shared home directories, such as /var/empty, into root-owned,- ;; read-only places.- (for-each (lambda (directory)- (chown directory 0 0)- (chmod directory #o555))- (duplicates (map user-account-home-directory system-accounts)))))+ ;; Turn shared home directories, such as /var/empty, into root-owned,+ ;; read-only places.+ (for-each (lambda (directory)+ (chown directory 0 0)+ (chmod directory #o555))+ (duplicates (map user-account-home-directory system-accounts)))) (define (activate-user-home users) "Create and populate the home directory of USERS, a list of tuples, unless-- 2.21.0
From dbe1fbab1e7d46258b20382d1587d18e915a5fee Mon Sep 17 00:00:00 2001From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>Date: Mon, 3 Jun 2019 17:18:41 +0200Subject: [PATCH 5/5] nar: Really lock store files.
Previously, 'lock-store-file' would immediately close the filedescriptor of the '.lock' file, and thus it would immediately releasethe lock.
* guix/nar.scm (lock-store-file, unlock-store-file): Remove.(finalize-store-file): Use 'lock-file' and 'unlock-file' instead.--- guix/nar.scm | 42 ++++++++++++++++-------------------------- 1 file changed, 16 insertions(+), 26 deletions(-)
Toggle diff (71 lines)diff --git a/guix/nar.scm b/guix/nar.scmindex 8894f10d2b..29636aa0f8 100644--- a/guix/nar.scm+++ b/guix/nar.scm@@ -1,5 +1,5 @@ ;;; GNU Guix --- Functional package management for GNU-;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2018 Ludovic Courtès <ludo@gnu.org>+;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2018, 2019 Ludovic Courtès <ludo@gnu.org> ;;; Copyright © 2014 Mark H Weaver <mhw@netris.org> ;;; ;;; This file is part of GNU Guix.@@ -76,16 +76,6 @@ ;; most of the daemon is in Scheme :-)). But note that we do use a couple of ;; RPCs for functionality not available otherwise, like 'valid-path?'. -(define (lock-store-file file)- "Acquire exclusive access to FILE, a store file."- (call-with-output-file (string-append file ".lock")- (cut fcntl-flock <> 'write-lock)))--(define (unlock-store-file file)- "Release access to FILE."- (call-with-input-file (string-append file ".lock")- (cut fcntl-flock <> 'unlock)))- (define* (finalize-store-file source target #:key (references '()) deriver (lock? #t)) "Rename SOURCE to TARGET and register TARGET as a valid store item, with@@ -94,25 +84,25 @@ before attempting to register it; otherwise, assume TARGET's locks are already held." (with-database %default-database-file db (unless (path-id db target)- (when lock?- (lock-store-file target))+ (let ((lock (and lock?+ (lock-file (string-append target ".lock"))))) - (unless (path-id db target)- ;; If FILE already exists, delete it (it's invalid anyway.)- (when (file-exists? target)- (delete-file-recursively target))+ (unless (path-id db target)+ ;; If FILE already exists, delete it (it's invalid anyway.)+ (when (file-exists? target)+ (delete-file-recursively target)) - ;; Install the new TARGET.- (rename-file source target)+ ;; Install the new TARGET.+ (rename-file source target) - ;; Register TARGET. As a side effect, it resets the timestamps of all- ;; its files, recursively, and runs a deduplication pass.- (register-path target- #:references references- #:deriver deriver))+ ;; Register TARGET. As a side effect, it resets the timestamps of all+ ;; its files, recursively, and runs a deduplication pass.+ (register-path target+ #:references references+ #:deriver deriver)) - (when lock?- (unlock-store-file target)))))+ (when lock?+ (unlock-file lock)))))) (define (temporary-store-file) "Return the file name of a temporary file created in the store."-- 2.21.0
D
D
Danny Milosavljevic wrote on 3 Jun 2019 18:01
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190603180149.4e44df43@scratchpost.org
For debugging, I've used "chattr +i" in the past in order to make such files(i.e. /etc/shadow) immutable. This would hopefully expose the mutator (otherthan Guix)--it should log some error somewhere.
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAlz1RG0ACgkQ5xo1VCwwuqVN1wgAhs6xubWsLleLmBIEmoDDfsDvvXgwHA0cYPInceoAkJsVH0xjt5JjJwG6NQ92bIPNpIv5YV1+2xxEKoDFHyv2xg14hC5Capd85D+FKyalv3bnX80H4Oij/p7kE4uv6gmfqNoSzdCsdsSvQzwl0Htiv25FF9QU5YoYaNcsLOKJwzJSHW33Bx7L1Yd4cXndbh2KP3QGvz8gN41InOTwIrwrQhckf7qysiB2gMFhEN17NkVJ8T3GiW++D8qc1lp5KiVVNESbQH9pJBe6M+vQ5R8c/OcFBWtP6EBPq194322mMOcO9dJxDQN7TXo6z/TYB2nu41tJM0npOUAlPogVIoN0eg===TT4A-----END PGP SIGNATURE-----

P
P
pelzflorian (Florian Pelz) wrote on 3 Jun 2019 19:07
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190603170757.bfj7fydd5swhi247@pelzflorian.localdomain
All this is not a problem with name service cache daemon, is it?
On Mon, Jun 03, 2019 at 05:22:55PM +0200, Ludovic Courtès wrote:
Toggle quote (8 lines)> Attach is a set of patches to lock /etc/.pwd.lock when we access> /etc/{passwd,shadow,group} from the activation snippets. It should> ensure that, when running ‘guix system reconfigure’, we’re honoring the> locking protocol that Shadow & co. follow.> > It would be great if you could test again in this context.>
Thank you! Will do.
Regards,Florian
L
L
Ludovic Courtès wrote on 4 Jun 2019 11:22
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87d0jtemca.fsf@gnu.org
Hi,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (15 lines)> On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:>> > After multiple reconfigures, it happened again, my /etc/shadow has !>> > again in the password field. My recently changed root password became>> > empty as well, like 35902. I did not even run sudo concurrently. The>> > password just got locked.>> >> What were the differences between your config files when you>> reconfigured?>>>> For the last reconfigure, there were no differences, although I had> rebooted into an unbootable, older generation with a different> syslog.conf and broken Udevd arguments before booting the new> generation.
What’s the effect of this brokenness concretely? Is the wrong root filesystem mounted, or something like that?
Could it somehow lead Guix to stumble upon an empty or missing/etc/shadow when it boots?
Toggle quote (3 lines)> I suppose the other victims of this bug have not booted to unbootable> generations?
It’d be great if the other victims would speak up. :-)
Toggle quote (4 lines)> If locks do not stop these issues, it would be nice to have more> detailed logs of shadow changes written to syslog on reconfigure> and/or on reboot.
There really isn’t much to log: the activation code reads/etc/{shadow,passwd,group}, computes the list of shadow/passwd/groupentries as a function of that, and writes it.
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 4 Jun 2019 14:17
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190604121710.uqni7cwp5jo4pwmq@pelzflorian.localdomain
On Tue, Jun 04, 2019 at 11:22:45AM +0200, Ludovic Courtès wrote:
Toggle quote (23 lines)> Hi,> > "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:> > > On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:> >> > After multiple reconfigures, it happened again, my /etc/shadow has !> >> > again in the password field. My recently changed root password became> >> > empty as well, like 35902. I did not even run sudo concurrently. The> >> > password just got locked.> >> > >> What were the differences between your config files when you> >> reconfigured?> >>> >> > For the last reconfigure, there were no differences, although I had> > rebooted into an unbootable, older generation with a different> > syslog.conf and broken Udevd arguments before booting the new> > generation.> > What’s the effect of this brokenness concretely? Is the wrong root file> system mounted, or something like that?>
I have multiple broken generation. On one that now for a third time(on old generations without Ludo’s patches) led to a locked/etc/shadow after booting I changed the line(let ((pid (fork+exec-command (list udevd))))in gnu/services/base.scm to, I believe, this:(let ((pid (fork+exec-command (list udevd "--debug-trace"))))
(I am unsure if this is the same broken generation as on my firstreport of the issue. I may have gotten confused.)
This is unbootable, correct would have been --debug and not--debug-trace.
I may also have changed my syslog configuration to the incorrect
(modify-services %desktop-services (syslog-service-type config => (syslog-configuration (inherit config) (config-file(plain-file "my-syslog.conf" " # Log all error messages, authentication messages of # level notice or higher and anything of level err or # higher to the console. # Don't log private authentication messages! * /var/log/full[…]")))))))
Correct would have been *.* instead of * This latter error iswithout relevant effect I believe.
I will try to find the /gnu/store files for this generation.
Danny’s suggestion to `chattr +i /etc/shadow` leads to an error withrename-file trying to rename an empty /etc/shadow.Gi… temporary fileon both this old broken and on healthy generations.

Toggle quote (5 lines)> There really isn’t much to log: the activation code reads> /etc/{shadow,passwd,group}, computes the list of shadow/passwd/group> entries as a function of that, and writes it.>
If I cannot find a more deterministic way, I will try making (guixbuild accounts) print the content of shadow.
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 4 Jun 2019 16:12
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190604141217.7tq33idseebne5v2@pelzflorian.localdomain
On Tue, Jun 04, 2019 at 02:17:11PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (6 lines)> On Tue, Jun 04, 2019 at 11:22:45AM +0200, Ludovic Courtès wrote:> > What’s the effect of this brokenness concretely? Is the wrong root file> > system mounted, or something like that?> > >
When removing quiet from the linux command line, shepherd complainsincessantly that it is trying to load udev.

All seems irrelevant, but:
Toggle quote (14 lines)> I have multiple broken generation. On one that now for a third time> (on old generations without Ludo’s patches) led to a locked> /etc/shadow after booting I changed the line> (let ((pid (fork+exec-command (list udevd))))> in gnu/services/base.scm to, I believe, this:> (let ((pid (fork+exec-command (list udevd "--debug-trace"))))> > (I am unsure if this is the same broken generation as on my first> report of the issue. I may have gotten confused.)> > This is unbootable, correct would have been --debug and not> --debug-trace.>
The line in/gnu/store/kdql26k1pxgm74d94ryzk8fb4lg5q0ra-shepherd-udev.scmreferenced from/gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referencedfrom the broken /var/guix/profiles/system-35-link/bootis
(let ((pid (fork+exec-command (list udevd "--debug-trace" "--verbose"))))
with an unneeded --verbose.
Toggle quote (3 lines)> I may also have changed my syslog configuration to the incorrect>
No, the syslog.conf in/gnu/store/y5nrfbj52vlnj77iyki9hbji8qjwk86d-syslog.conf referencedfrom /gnu/store/5bp6c0p357gaqikxkmvs0idrmvdrzf7h-shepherd-syslogd.scmreferenced from/gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referencedfrom the broken /var/guix/profiles/system-35-link/bootappears to be the default syslog.conf.
Booting this old generation and then an older working generationsometimes leads to a broken /etc/shadow. I do not yet know if abroken /etc/shadow can result when booting this old generation andthen a new patched generation.
I will reconfigure some more and try getting a bad /etc/shadow evenwith Ludo’s patches.
Regards,Florian
P
P
pelzflorian (Florian Pelz) wrote on 4 Jun 2019 19:17
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190604171715.gvwr54wiek4xs24e@pelzflorian.localdomain
I got a locked /etc/shadow again now despite Ludovic’s patches (whichwould nonetheless give me a better feeling when pushed).
When booting an unbootable generation with Ludovic’s patches and thenrebooting a normal generation with Ludovic’s patches, /etc/shadow islocked.
Note that I get a message like “/dev/mapper/Guix: recovering journal”when booting (I did not pay attention to that before). I shut downthe unbootable generation with Ctrl+Alt+Del. When I normally shutdown with Ctrl+Alt+Del I get no such message.
The unbootable generation was deliberately made unbootable like this:
On Tue, Jun 04, 2019 at 04:12:17PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (24 lines)> > I have multiple broken generation. On one that now for a third time> > (on old generations without Ludo’s patches) led to a locked> > /etc/shadow after booting I changed the line> > (let ((pid (fork+exec-command (list udevd))))> > in gnu/services/base.scm to, I believe, this:> > (let ((pid (fork+exec-command (list udevd "--debug-trace"))))> > > > (I am unsure if this is the same broken generation as on my first> > report of the issue. I may have gotten confused.)> > > > This is unbootable, correct would have been --debug and not> > --debug-trace.> >> > The line in> /gnu/store/kdql26k1pxgm74d94ryzk8fb4lg5q0ra-shepherd-udev.scm> referenced from> /gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referenced> from the broken /var/guix/profiles/system-35-link/boot> is> > (let ((pid (fork+exec-command (list udevd "--debug-trace" "--verbose"))))>
Regards,Florian
L
L
Ludovic Courtès wrote on 4 Jun 2019 23:21
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87o93d6o8u.fsf@gnu.org
Hi,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (3 lines)> I got a locked /etc/shadow again now despite Ludovic’s patches (which> would nonetheless give me a better feeling when pushed).
Will do. :-)
Toggle quote (4 lines)> When booting an unbootable generation with Ludovic’s patches and then> rebooting a normal generation with Ludovic’s patches, /etc/shadow is> locked.
So with this scenario, the problem is 100% reproducible, right?
Toggle quote (5 lines)> Note that I get a message like “/dev/mapper/Guix: recovering journal”> when booting (I did not pay attention to that before). I shut down> the unbootable generation with Ctrl+Alt+Del. When I normally shut> down with Ctrl+Alt+Del I get no such message.
Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (whichdisables “hard” reboots upon ctrl-alt-del and instead notifies it) afterit has loaded its config file.
In your case, loading the config file never completes because the‘start’ method is called from the config file for every service, and oneof them, udev, never starts. Thus, when you press Ctrl-Alt-Del, youperform a hard reboot.
The hard reboot happens after Guix has written to /etc/shadow. Onepossibility is that changes to this file haven’t been flushed to disk.Thus, on the next boot, we start off with an empty or truncated/etc/shadow, leading the activation snippet to initialize passwords.

If that theory holds, the patch below (on top of the others) shouldhelp. Could you give it a try?
Actually, the fact that ‘rename-file’ was called *before* ‘close-port’could be problematic in itself; so perhaps, even without the ‘fdatasync’call, we’d get better results… especially since ‘fdatasync’ won’t beavailable in the initrd anyway, hmm…
Thanks,Ludo’.
Toggle diff (41 lines)diff --git a/gnu/build/accounts.scm b/gnu/build/accounts.scmindex 8687446aa6..c13e6f2e89 100644--- a/gnu/build/accounts.scm+++ b/gnu/build/accounts.scm@@ -19,6 +19,7 @@ (define-module (gnu build accounts) #:use-module (guix records) #:use-module (guix combinators)+ #:use-module ((guix build syscalls) #:select (fdatasync)) #:use-module (gnu system accounts) #:use-module (srfi srfi-1) #:use-module (srfi srfi-11)@@ -230,6 +231,14 @@ each field." ;; grab this lock with 'with-file-lock' when they access the databases. "/etc/.pwd.lock") +(define-syntax-rule (catch-ENOSYS exp)+ (catch 'system-error+ (lambda () exp)+ (lambda args+ (if (= ENOSYS (system-error-errno args))+ #f+ (apply throw args)))))+ (define (database-writer file mode entry->string) (lambda* (entries #:optional (file-or-port file)) "Write ENTRIES to FILE-OR-PORT. When FILE-OR-PORT is a file name, write@@ -249,9 +258,12 @@ to it atomically and set the appropriate permissions." (lambda () (chmod port mode) (write-entries port)- (rename-file template file-or-port))- (lambda ()+ (catch-ENOSYS (fdatasync port)) (close-port port)+ (rename-file template file-or-port))+ (lambda ()+ (unless (port-closed? port)+ (close-port port)) (when (file-exists? template) (delete-file template))))))))
P
P
pelzflorian (Florian Pelz) wrote on 5 Jun 2019 08:16
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190605061611.py3v3msydbfn2eoe@pelzflorian.localdomain
On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
Toggle quote (8 lines)> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:> > When booting an unbootable generation with Ludovic’s patches and then> > rebooting a normal generation with Ludovic’s patches, /etc/shadow is> > locked.> > So with this scenario, the problem is 100% reproducible, right?>
/etc/shadow is not always locked, even if I get a recovering journalmessage.

Toggle quote (4 lines)> If that theory holds, the patch below (on top of the others) should> help. Could you give it a try?>
Will do.
Regards,Florian
L
L
Ludovic Courtès wrote on 5 Jun 2019 11:54
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87imtk73xs.fsf@gnu.org
Hi,
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (2 lines)> On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
[...]
Toggle quote (6 lines)>> If that theory holds, the patch below (on top of the others) should>> help. Could you give it a try?>> >> Will do.
Note that you’ll have to create a new “broken” generation with thispatch (because we already know that the old one can corrupt/etc/shadow.)
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 5 Jun 2019 13:06
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190605110658.7metilrqike4juml@pelzflorian.localdomain
It appears your patch fixes the issue. I admire the speed at whichyou write patches. :) Thank you!
On Wed, Jun 05, 2019 at 11:54:23AM +0200, Ludovic Courtès wrote:
Toggle quote (5 lines)> Note that you’ll have to create a new “broken” generation with this> patch (because we already know that the old one can corrupt> /etc/shadow.)>
I created a new working generation and then a new unbootablegeneration with broken udevd args, both with all your patches. Irebooted the broken and then the working generation repeatedly twelvetimes. I waited varying amounts of time before doing Ctrl+Alt+Del inthe broken generation. /etc/shadow is still in good health.
However:
On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
Toggle quote (4 lines)> Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (which> disables “hard” reboots upon ctrl-alt-del and instead notifies it) after> it has loaded its config file.
Is there a good reason shepherd calls disable-reboot-on-ctrl-alt-delat the end? I get recovering journal messages unless on the previousboot I waited for the whole GDM to start (I can login on the TTYbefore GDM has fully started), which takes a long time during whichusers could change their mind and decide they do not want to boot.(The Macbook is not fast anyway and Guix is even slower when bootingcompared to Debian.)
Regards,Florian
L
L
Ludovic Courtès wrote on 5 Jun 2019 17:31
control message for bug #35996
(address . control@debbugs.gnu.org)
87imtkyrpf.fsf@gnu.org
merge 35996 35902quit
L
L
Ludovic Courtès wrote on 5 Jun 2019 23:13
Re: bug#35996: User account password got locked when booting old generation
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87a7evwxa9.fsf@gnu.org
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (3 lines)> It appears your patch fixes the issue. I admire the speed at which> you write patches. :) Thank you!
Awesome! I must say that I’m really glad you’re putting this muchenergy into reproducing issues and investigating—it’s rare for peoplewho report bug to dig this deep, but it’s super helpful and motivating!
I’ve pushed the whole series:
d088d5c484 accounts: Call 'fdatasync' when writing databases. ed8570dce3 accounts: Close database before renaming it. 70a7a1b5dc nar: Really lock store files. d497b6ab39 activation: Lock /etc/.pwd.lock before accessing databases. 5f0cf1df71 syscalls: 'with-lock-file' catches ENOSYS. 89ceb86ad4 syscalls: 'with-file-lock' expands to a call to 'call-with-file-lock'. b7178c22bf syscalls: Add 'with-file-lock' macro.
The actual fix is ed8570dce3, AIUI.
Toggle quote (6 lines)> I created a new working generation and then a new unbootable> generation with broken udevd args, both with all your patches. I> rebooted the broken and then the working generation repeatedly twelve> times. I waited varying amounts of time before doing Ctrl+Alt+Del in> the broken generation. /etc/shadow is still in good health.
Good.
Toggle quote (13 lines)> On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:>> Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (which>> disables “hard” reboots upon ctrl-alt-del and instead notifies it) after>> it has loaded its config file.>> Is there a good reason shepherd calls disable-reboot-on-ctrl-alt-del> at the end? I get recovering journal messages unless on the previous> boot I waited for the whole GDM to start (I can login on the TTY> before GDM has fully started), which takes a long time during which> users could change their mind and decide they do not want to boot.> (The Macbook is not fast anyway and Guix is even slower when booting> compared to Debian.)
I agree.
The attached patch for Shepherd moves everything before loading theconfig file. I think it will have the desired effect, though I’m notentirely sure the signal handler would run at the right time etc.
You can test it on the metal if you want (you need to add the patch tothe ‘shepherd’ package), but I’ll see if I can test in a VM.
Thank you!
Ludo’.
Toggle diff (74 lines)diff --git a/modules/shepherd.scm b/modules/shepherd.scmindex 8b2cc1d..769085a 100644--- a/modules/shepherd.scm+++ b/modules/shepherd.scm@@ -198,34 +198,6 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC." ;; Start the 'root' service. (start root-service) - ;; This _must_ succeed. (We could also put the `catch' around- ;; `main', but it is often useful to get the backtrace, and- ;; `caught-error' does not do this yet.)- (catch #t- (lambda ()- (load-in-user-module (or config-file (default-config-file))))- (lambda (key . args)- (caught-error key args)- (quit 1)))- ;; Start what was started last time.- (and persistency- (catch 'system-error- (lambda ()- (start-in-order (read (open-input-file- persistency-state-file))))- (lambda (key . args)- (apply format #f (gettext (cadr args)) (caddr args))- (quit 1))))-- (when (provided? 'threads)- ;; XXX: This terrible hack allows us to make sure that signal handlers- ;; get a chance to run in a timely fashion. Without it, after an EINTR,- ;; we could restart the accept(2) call below before the corresponding- ;; async has been queued. See the thread at- ;; <https://lists.gnu.org/archive/html/guile-devel/2013-07/msg00004.html>.- (sigaction SIGALRM (lambda _ (alarm 1)))- (alarm 1))- (when (= 1 (getpid)) ;; When running as PID 1, disable hard reboots upon ctrl-alt-del. ;; Instead, the kernel will send us SIGINT so that we can gracefully@@ -259,6 +231,34 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC." (lambda _ (stop root-service))) + ;; This _must_ succeed. (We could also put the `catch' around+ ;; `main', but it is often useful to get the backtrace, and+ ;; `caught-error' does not do this yet.)+ (catch #t+ (lambda ()+ (load-in-user-module (or config-file (default-config-file))))+ (lambda (key . args)+ (caught-error key args)+ (quit 1)))+ ;; Start what was started last time.+ (and persistency+ (catch 'system-error+ (lambda ()+ (start-in-order (read (open-input-file+ persistency-state-file))))+ (lambda (key . args)+ (apply format #f (gettext (cadr args)) (caddr args))+ (quit 1))))++ (when (provided? 'threads)+ ;; XXX: This terrible hack allows us to make sure that signal handlers+ ;; get a chance to run in a timely fashion. Without it, after an EINTR,+ ;; we could restart the accept(2) call below before the corresponding+ ;; async has been queued. See the thread at+ ;; <https://lists.gnu.org/archive/html/guile-devel/2013-07/msg00004.html>.+ (sigaction SIGALRM (lambda _ (alarm 1)))+ (alarm 1))+ ;; Ignore SIGPIPE so that we don't die if a client closes the connection ;; prematurely. (sigaction SIGPIPE SIG_IGN)
P
P
pelzflorian (Florian Pelz) wrote on 6 Jun 2019 09:01
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190606070143.poiz4anyd3i3bjkg@pelzflorian.localdomain
On Wed, Jun 05, 2019 at 11:13:34PM +0200, Ludovic Courtès wrote:
Toggle quote (4 lines)> Awesome! I must say that I’m really glad you’re putting this much> energy into reproducing issues and investigating—it’s rare for people> who report bug to dig this deep, but it’s super helpful and motivating!
:)

Toggle quote (5 lines)> The attached patch for Shepherd moves everything before loading the> config file. I think it will have the desired effect, though I’m not> entirely sure the signal handler would run at the right time etc.>
It works for me without recovering journal message (and taking aninsignificantly longer time to reboot).
Regards,Florian
L
L
Ludovic Courtès wrote on 6 Jun 2019 10:04
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996-done@debbugs.gnu.org)
87v9xjqgvi.fsf@gnu.org
Hello!
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
Toggle quote (2 lines)> On Wed, Jun 05, 2019 at 11:13:34PM +0200, Ludovic Courtès wrote:
[...]
Toggle quote (8 lines)>> The attached patch for Shepherd moves everything before loading the>> config file. I think it will have the desired effect, though I’m not>> entirely sure the signal handler would run at the right time etc.>>>> It works for me without recovering journal message (and taking an> insignificantly longer time to reboot).
Excellent. Pushed as Shepherd commitc6f250d1fd1afa9ee49c8bb2414eee087b672789.
Thank you!
Ludo’.
Closed
?
Your comment

This issue is archived.

To comment on this conversation send email to 35996@debbugs.gnu.org