User account password got locked when booting old generation

  • Done
  • quality assurance status badge
Details
4 participants
  • Gábor Boskovits
  • Danny Milosavljevic
  • Ludovic Courtès
  • pelzflorian (Florian Pelz)
Owner
unassigned
Submitted by
pelzflorian (Florian Pelz)
Severity
important
Merged with
P
P
pelzflorian (Florian Pelz) wrote on 29 May 2019 22:45
(address . bug-guix@gnu.org)
20190529204517.mqn5xrw23xib4i3u@pelzflorian.localdomain
After I reconfigured to a broken, unbootable generation and then
rebooted to an old working generation, I found my user account
password was locked. This was my /etc/shadow:

root::18045::::::
florian:!:18045::::::
nobody:!:18045::::::
guixbuilder01:!:18045::::::
guixbuilder02:!:18045::::::
guixbuilder03:!:18045::::::
guixbuilder04:!:18045::::::
guixbuilder05:!:18045::::::
guixbuilder06:!:18045::::::
guixbuilder07:!:18045::::::
guixbuilder08:!:18045::::::
guixbuilder09:!:18045::::::
guixbuilder10:!:18045::::::
ntpd:!:18045::::::
messagebus:!:18045::::::
polkitd:!:18045::::::
geoclue:!:18045::::::
colord:!:18045::::::
avahi:!:18045::::::
gdm:!:18045::::::
httpd:!:18045::::::

Logging in as root (root had an empty password before as well) and
running `passwd florian` fixed it, and I *cannot* reproduce the bug
anymore by booting the broken generation again, i.e. my password
remains set now.

I presume it is not possible to lock a password by typing the wrong
password too often on the virtual console?

If you think this is not enough material to work on, feel free to
close this bug, but there seems to be some misbehavior somewhere in
Guix’ password management.

(The reason for the new generation’s brokenness seems unrelated; it
could not boot after I tried adding syslogd to the requirements of
udev-shepherd-service.)

Regards,
Florian
L
L
Ludovic Courtès wrote on 1 Jun 2019 00:05
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
877ea6l1on.fsf@gnu.org
Hello,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (31 lines)
> After I reconfigured to a broken, unbootable generation and then
> rebooted to an old working generation, I found my user account
> password was locked. This was my /etc/shadow:
>
> root::18045::::::
> florian:!:18045::::::
> nobody:!:18045::::::
> guixbuilder01:!:18045::::::
> guixbuilder02:!:18045::::::
> guixbuilder03:!:18045::::::
> guixbuilder04:!:18045::::::
> guixbuilder05:!:18045::::::
> guixbuilder06:!:18045::::::
> guixbuilder07:!:18045::::::
> guixbuilder08:!:18045::::::
> guixbuilder09:!:18045::::::
> guixbuilder10:!:18045::::::
> ntpd:!:18045::::::
> messagebus:!:18045::::::
> polkitd:!:18045::::::
> geoclue:!:18045::::::
> colord:!:18045::::::
> avahi:!:18045::::::
> gdm:!:18045::::::
> httpd:!:18045::::::
>
> Logging in as root (root had an empty password before as well) and
> running `passwd florian` fixed it, and I *cannot* reproduce the bug
> anymore by booting the broken generation again, i.e. my password
> remains set now.

Did the old generation you booted have user ‘florian’?

Also, how old was it? User account management changed in March (commit
0ae735bcc8ff7fdc89d67b492bdee9091ee19e86).

Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 1 Jun 2019 07:52
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190601055238.jkhefpupavz7aipi@pelzflorian.localdomain
On Sat, Jun 01, 2019 at 12:05:44AM +0200, Ludovic Courtès wrote:
Toggle quote (3 lines)
> Did the old generation you booted have user ‘florian’?
>

Yes, this is my main machine. The users are the same.

Toggle quote (4 lines)
> Also, how old was it? User account management changed in March (commit
> 0ae735bcc8ff7fdc89d67b492bdee9091ee19e86).
>

The git commit for both old and new generation is

commit ad7466aafd7f166d0b6be5eb32dda1d3ee8a6445 (origin/master, origin/HEAD)
Author: Ludovic Court<C3><A8>s <ludo@gnu.org>
Date: Sun May 26 23:18:21 2019 +0200

with unrelevant patches (for USB_ModeSwitch) applied on both, as well
as a non-working patch adding syslogd as a requirement to
udev-shepherd-service (making the generation unbootable and making
something display Stack overflow) and a working patch to add a --debug
argument to udevd for the new, unbootable generation. The old
generation has a patch that adds wrong arguments to udevd that udevd
ignores. All seems harmless.

I still cannot reproduce despite rebooting the broken generation
multiple times and then booting back into the old generation.

I wonder what would change /etc/shadow.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 1 Jun 2019 16:58
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190601145834.f4wgm4oqmdyej7n5@pelzflorian.localdomain
On Sat, Jun 01, 2019 at 07:52:38AM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (3 lines)
> I wonder what would change /etc/shadow.
>

If the error occurred on common non-Guix distros, it hopefully would
have been fixed before, maybe. Of course Guix recreates /etc/shadow
much more frequently.

Guix appears to add shadow files atomically in gnu/build/accounts.scm.
I do not know if there could have been an error reading the old shadow
file, e.g. because it is locked or something?

The elogind source code in src/basic/user-util.c contains code for
locking /etc/shadow, with a comment that explains why its lckpwdf is
implemented differently from shadow-utils.

AccountsService appears to only be usable for reading /etc/shadow, not
for writing it, contrary to what the Guix manual claims (??). For
writing passwords, gnome-control-center does not use AccountsService,
it calls /usr/bin/passwd directly in its source code in
panels/user-accounts/run-passwd.c.

Regards,
Florian
L
L
Ludovic Courtès wrote on 1 Jun 2019 23:37
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87r28dc7gw.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (8 lines)
> On Sat, Jun 01, 2019 at 07:52:38AM +0200, pelzflorian (Florian Pelz) wrote:
>> I wonder what would change /etc/shadow.
>>
>
> If the error occurred on common non-Guix distros, it hopefully would
> have been fixed before, maybe. Of course Guix recreates /etc/shadow
> much more frequently.

Definitely.

Toggle quote (4 lines)
> Guix appears to add shadow files atomically in gnu/build/accounts.scm.
> I do not know if there could have been an error reading the old shadow
> file, e.g. because it is locked or something?

(gnu build accounts) doesn’t care at all about /etc/.pwd.lock, the lock
file used by libc’s ‘lckpwdf’ function.

This is definitely not a problem when booting. It could be a problem if
you’re concurrently running ‘guix system reconfigure’ (which runs
activation snippets, including the account updating code) and some other
program, such as ‘passwd’, that assumes it holds an exclusive lock on
the file. Though in that case, the worst that could happen is that the
changes made by Guix would be undoed by that other program.

Toggle quote (7 lines)
> The elogind source code in src/basic/user-util.c contains code for
> locking /etc/shadow, with a comment that explains why its lckpwdf is
> implemented differently from shadow-utils.
>
> AccountsService appears to only be usable for reading /etc/shadow, not
> for writing it, contrary to what the Guix manual claims (??).

That might be a bug.

Toggle quote (4 lines)
> For writing passwords, gnome-control-center does not use
> AccountsService, it calls /usr/bin/passwd directly in its source code
> in panels/user-accounts/run-passwd.c.

That’s definitely a bug to fix: it should invoke
/run/setuid-programs/passwd instead.

Thanks for investigating,
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 2 Jun 2019 09:05
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190602070545.xp2pqlnzsthpjtbw@pelzflorian.localdomain
On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
Toggle quote (7 lines)
> This is definitely not a problem when booting. It could be a problem if
> you’re concurrently running ‘guix system reconfigure’ (which runs
> activation snippets, including the account updating code) and some other
> program, such as ‘passwd’, that assumes it holds an exclusive lock on
> the file. Though in that case, the worst that could happen is that the
> changes made by Guix would be undoed by that other program.

Thank you for explaining!

OK. Also why would a read error cause the password to get locked
anyway. I suppose this is an issue with mingetty or something locking
the password, perhaps after too many failed login attempts. (Note
that I’m prone to making typos because with this Macbook keyboard key
presses are sometimes recognized twice on and only on a virtual
console.) I will investigate mingetty next.


I also tried running this script:

#!/run/current-system/profile/bin/bash
MD5=$(sudo md5sum /etc/shadow)
echo "Current /etc/shadow has md5sum: $MD5"
until [ "$(sudo md5sum /etc/shadow)" != "$MD5" ]; do
sudo guix system roll-back
sudo guix system reconfigure /etc/config.scm
done
notify-send "/etc/shadow changed!" "Maybe I reproduced the issue."

After repeatedly reconfiguring for some 40 minutes I still got the
same /etc/shadow.

(But I think it broke my motherboard, because recently the output of
reconfigure changed from

[…]
shepherd: Evaluating user expression (let* ((services (map primitive-load (?))) # ?) ?).
shepherd: Service user-homes has been started.
shepherd: Service term-auto could not be started.
bootloader successfully installed on '/boot/efi'

to

[…]
shepherd: Evaluating user expression (let* ((services (map primitive-load (?))) # ?) ?).
shepherd: Service user-homes has been started.
shepherd: Service term-auto could not be started.
error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:

Installing for x86_64-efi platform.
Could not prepare Boot variable: No such file or directory
/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.

guix system: error: failed to install bootloader /gnu/store/y6p93xjdbpbp0z2kc0gw5yqjppmdsq7g-bootloader-installer


I now get this on every reconfigure. I tried rebooting; this was a
bad idea; I csnnot boot anymore. But that is unrelated. Maybe I will
install GRUB as if it were an external hard drive from now on.)


Toggle quote (20 lines)
>
> > The elogind source code in src/basic/user-util.c contains code for
> > locking /etc/shadow, with a comment that explains why its lckpwdf is
> > implemented differently from shadow-utils.
> >
> > AccountsService appears to only be usable for reading /etc/shadow, not
> > for writing it, contrary to what the Guix manual claims (??).
>
> That might be a bug.
>
> > For writing passwords, gnome-control-center does not use
> > AccountsService, it calls /usr/bin/passwd directly in its source code
> > in panels/user-accounts/run-passwd.c.
>
> That’s definitely a bug to fix: it should invoke
> /run/setuid-programs/passwd instead.
>
> Thanks for investigating,
> Ludo’.

I will try and make patches once I can reboot into Guix again.

Regards,
Florian
L
L
Ludovic Courtès wrote on 2 Jun 2019 11:38
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87sgss9vj7.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (8 lines)
> On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
>> This is definitely not a problem when booting. It could be a problem if
>> you’re concurrently running ‘guix system reconfigure’ (which runs
>> activation snippets, including the account updating code) and some other
>> program, such as ‘passwd’, that assumes it holds an exclusive lock on
>> the file. Though in that case, the worst that could happen is that the
>> changes made by Guix would be undoed by that other program.

Actually, another thing that could happen is that Guix reads an
incomplete /etc/shadow because some other program is writing to it.

In that case, suppose Guix reads a partial /etc/shadow where user
“florian” is missing. It would then create a new /etc/shadow where the
password for “florian” is uninitialized (or set to the initial value
that appears in config.scm.)

Could it be what happened to you? You’d have to be running ‘passwd’ or
‘usermod’ or whatever at exactly the same time as ‘guix system
reconfigure’ (and you’d have to be “lucky”).

Toggle quote (11 lines)
> I also tried running this script:
>
> #!/run/current-system/profile/bin/bash
> MD5=$(sudo md5sum /etc/shadow)
> echo "Current /etc/shadow has md5sum: $MD5"
> until [ "$(sudo md5sum /etc/shadow)" != "$MD5" ]; do
> sudo guix system roll-back
> sudo guix system reconfigure /etc/config.scm
> done
> notify-send "/etc/shadow changed!" "Maybe I reproduced the issue."

The code in (gnu build accounts) is purely functional and deterministic,
so you have no chance of getting a different /etc/shadow with this
script, unless perhaps you concurrently run ‘passwd’ or similar.

Toggle quote (6 lines)
> error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:
>
> Installing for x86_64-efi platform.
> Could not prepare Boot variable: No such file or directory
> /gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.

Maybe you’ve exhausted the room for those EFI “variables” or something?

Thanks for your debugging work!

Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 2 Jun 2019 12:21
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190602102122.bzapwt36vg32nmwq@pelzflorian.localdomain
On Sun, Jun 02, 2019 at 11:38:36AM +0200, Ludovic Courtès wrote:
Toggle quote (25 lines)
> Hi Florian,
>
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>
> > On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
> >> This is definitely not a problem when booting. It could be a problem if
> >> you’re concurrently running ‘guix system reconfigure’ (which runs
> >> activation snippets, including the account updating code) and some other
> >> program, such as ‘passwd’, that assumes it holds an exclusive lock on
> >> the file. Though in that case, the worst that could happen is that the
> >> changes made by Guix would be undoed by that other program.
>
> Actually, another thing that could happen is that Guix reads an
> incomplete /etc/shadow because some other program is writing to it.
>
> In that case, suppose Guix reads a partial /etc/shadow where user
> “florian” is missing. It would then create a new /etc/shadow where the
> password for “florian” is uninitialized (or set to the initial value
> that appears in config.scm.)
>
> Could it be what happened to you? You’d have to be running ‘passwd’ or
> ‘usermod’ or whatever at exactly the same time as ‘guix system
> reconfigure’ (and you’d have to be “lucky”).
>

No, I did not change my password in a very long time.

Is there no proper cross-application locking mechanism for
/etc/passwd? elogind uses

struct flock flock = {
.l_type = F_WRLCK,
.l_whence = SEEK_SET,
.l_start = 0,
.l_len = 0,
};
[…]
fd = open(path, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY|O_NOFOLLOW, 0600);
[…]
r = fcntl(fd, F_SETLKW, &flock;

Should Guix adopt something similar for shadow/passwd/… database
reads?




Toggle quote (12 lines)
> > error: '/gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install --boot-directory //boot --bootloader-id=Guix --efi-directory //boot/efi' exited with status 1; output follows:
> >
> > Installing for x86_64-efi platform.
> > Could not prepare Boot variable: No such file or directory
> > /gnu/store/h5bi85lgnpqcjx2avy126lwiss01idsj-grub-efi-2.02/sbin/grub-install: error: efibootmgr failed to register the boot entry: Input/output error.
>
> Maybe you’ve exhausted the room for those EFI “variables” or something?
>
> Thanks for your debugging work!
>
> Ludo’.

Maybe exhausted, maybe it is an error with the NVRAM. I will try
making grub-install execute like when installing on external USB
drives so it writes nothing to the motherboard.

Regards,
Florian
L
L
Ludovic Courtès wrote on 2 Jun 2019 18:00
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87o93g9dv5.fsf@gnu.org
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (2 lines)
> On Sun, Jun 02, 2019 at 11:38:36AM +0200, Ludovic Courtès wrote:

[...]

Toggle quote (32 lines)
>> Actually, another thing that could happen is that Guix reads an
>> incomplete /etc/shadow because some other program is writing to it.
>>
>> In that case, suppose Guix reads a partial /etc/shadow where user
>> “florian” is missing. It would then create a new /etc/shadow where the
>> password for “florian” is uninitialized (or set to the initial value
>> that appears in config.scm.)
>>
>> Could it be what happened to you? You’d have to be running ‘passwd’ or
>> ‘usermod’ or whatever at exactly the same time as ‘guix system
>> reconfigure’ (and you’d have to be “lucky”).
>>
>
> No, I did not change my password in a very long time.
>
> Is there no proper cross-application locking mechanism for
> /etc/passwd? elogind uses
>
> struct flock flock = {
> .l_type = F_WRLCK,
> .l_whence = SEEK_SET,
> .l_start = 0,
> .l_len = 0,
> };
> […]
> fd = open(path, O_WRONLY|O_CREAT|O_CLOEXEC|O_NOCTTY|O_NOFOLLOW, 0600);
> […]
> r = fcntl(fd, F_SETLKW, &flock;
>
> Should Guix adopt something similar for shadow/passwd/… database
> reads?

We could do that yes, that I’d lean towards using the same thing as libc
and Shadow. The whole scenario just sounds very unlikely though.

Thanks,
Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 3 Jun 2019 08:03
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190603060301.2nu2zqi5j3v3j5ki@pelzflorian.localdomain
After I booted to a Guix install USB, chrooted as described on the
Arch wiki and started a Guix daemon, I could reconfigure as before.
There was no need to fiddle with grub-install.

After multiple reconfigures, it happened again, my /etc/shadow has !
again in the password field. My recently changed root password became
empty as well, like 35902. I did not even run sudo concurrently. The
password just got locked.

The /etc from the “populating from /gnu/store/*-etc” messages has no
significant differences either.



On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
Toggle quote (7 lines)
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > AccountsService appears to only be usable for reading /etc/shadow, not
> > for writing it, contrary to what the Guix manual claims (??).
>
> That might be a bug.
>

AccountsService obviously can change passwords. No bug here. Sorry.
I was confused.


Toggle quote (8 lines)
> > For writing passwords, gnome-control-center does not use
> > AccountsService, it calls /usr/bin/passwd directly in its source code
> > in panels/user-accounts/run-passwd.c.
>
> That’s definitely a bug to fix: it should invoke
> /run/setuid-programs/passwd instead.
>

Find attached two patches that fix GNOME password changing. Both are
required.

Regards,
Florian
From 1eb7699d5036062993a080393bfb4a46d2dc1bea Mon Sep 17 00:00:00 2001
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Mon, 3 Jun 2019 07:19:20 +0200
Subject: [PATCH 1/2] =?UTF-8?q?Add=20cracklib=E2=80=99s=20password=20dicti?=
=?UTF-8?q?onary=20to=20cracklib=E2=80=99s=20default=20output.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* gnu/packages/password-utils.scm (cracklib): Use `make dict`.
---
gnu/packages/password-utils.scm | 9 +++++++++
1 file changed, 9 insertions(+)

Toggle diff (22 lines)
diff --git a/gnu/packages/password-utils.scm b/gnu/packages/password-utils.scm
index 2b844c9a1c..88f933e43e 100644
--- a/gnu/packages/password-utils.scm
+++ b/gnu/packages/password-utils.scm
@@ -310,6 +310,15 @@ and vice versa.")
(sha256
(base32 "1rimpjsdnmw8f5b7k558cic41p2qy2n2yrlqp5vh7mp4162hk0py"))))
(build-system gnu-build-system)
+ (arguments
+ `(#:phases
+ (modify-phases %standard-phases
+ (add-after 'install 'install-dict
+ (lambda* (#:key make-flags #:allow-other-keys)
+ (begin
+ (chmod (string-append "util/cracklib-format") #o755)
+ (apply invoke "make" "dict" make-flags)
+ #t))))))
(synopsis "Password checking library")
(home-page "https://github.com/cracklib/cracklib")
(description
--
2.21.0
From c7c016adc34c591febd0d3630f32dbecdd20ad7c Mon Sep 17 00:00:00 2001
From: Florian Pelz <pelzflorian@pelzflorian.de>
Date: Sun, 2 Jun 2019 20:01:23 +0200
Subject: [PATCH 2/2] Make gnome-control-center find passwd binary.

* gnu/packages/gnome.scm (gnome-control-center): Substitute correct path to
passwd.
---
gnu/packages/gnome.scm | 3 +++
1 file changed, 3 insertions(+)

Toggle diff (16 lines)
diff --git a/gnu/packages/gnome.scm b/gnu/packages/gnome.scm
index 6a2a683f58..2917107d18 100644
--- a/gnu/packages/gnome.scm
+++ b/gnu/packages/gnome.scm
@@ -5879,6 +5879,9 @@ devices using the GNOME desktop.")
(("\"nm-connection-editor")
(string-append "\"" nm-applet
"/bin/nm-connection-editor")))
+ (substitute* '("panels/user-accounts/run-passwd.c")
+ (("/usr/bin/passwd")
+ "/run/setuid-programs/passwd"))
#t))))))
(native-inputs
`(("glib:bin" ,glib "bin") ; for glib-mkenums, etc.
--
2.21.0
G
G
Gábor Boskovits wrote on 3 Jun 2019 08:14
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
CAE4v=piJkUZzjhFkF6OEtTBd6WKY5UusvviWZJ31kY3pFRfu5Q@mail.gmail.com
Hello,




pelzflorian (Florian Pelz) <pelzflorian@pelzflorian.de> ezt írta (id?pont:
2019. jún. 3., Hét 8:04):

Toggle quote (9 lines)
> After I booted to a Guix install USB, chrooted as described on the
> Arch wiki and started a Guix daemon, I could reconfigure as before.
> There was no need to fiddle with grub-install.
>
> After multiple reconfigures, it happened again, my /etc/shadow has !
> again in the password field. My recently changed root password became
> empty as well, like 35902. I did not even run sudo concurrently. The
> password just got locked.
>
This is the same thing that happened to me, and there is another report,
regarding passwords being reset. I believe we should merge these two bugs.
I am on a mobile with no convinient way to look up the issue number.

Toggle quote (33 lines)
>
> The /etc from the “populating from /gnu/store/*-etc” messages has no
> significant differences either.
>
>
>
> On Sat, Jun 01, 2019 at 11:37:51PM +0200, Ludovic Courtès wrote:
> > "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > > AccountsService appears to only be usable for reading /etc/shadow, not
> > > for writing it, contrary to what the Guix manual claims (??).
> >
> > That might be a bug.
> >
>
> AccountsService obviously can change passwords. No bug here. Sorry.
> I was confused.
>
>
> > > For writing passwords, gnome-control-center does not use
> > > AccountsService, it calls /usr/bin/passwd directly in its source code
> > > in panels/user-accounts/run-passwd.c.
> >
> > That’s definitely a bug to fix: it should invoke
> > /run/setuid-programs/passwd instead.
> >
>
> Find attached two patches that fix GNOME password changing. Both are
> required.
>
> Regards,
> Florian
>

On my machine it turned out that the hdd is faulty, so this might be a
hardware error, I will get a replacement drive tomorrow, and check if the
problem still persist.

Best regards,
g_bor

Toggle quote (1 lines)
>
Attachment: file
P
P
pelzflorian (Florian Pelz) wrote on 3 Jun 2019 09:18
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190603071830.q4bf76ioaszjdoo6@pelzflorian.localdomain
Please add more logging and/or locking. Note that the elogind has the
following comment in its locking implementation in
/gnu/store/dm2ri0qvjirl0iq2ndfk5z9lq9dyk4jf-elogind-241.3-checkout/src/basic/user-util.c:

/* This is roughly the same as lckpwdf(), but not as awful. We
* don't want to use alarm() and signals, hence we implement
* our own trivial version of this.
*
* Note that shadow-utils also takes per-database locks in
* addition to lckpwdf(). However, we don't given that they
* are redundant as they invoke lckpwdf() first and keep
* it during everything they do. The per-database locks are
* awfully racy, and thus we just won't do them. */

Regards,
Florian
L
L
Ludovic Courtès wrote on 3 Jun 2019 15:22
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87tvd6erbo.fsf@gnu.org
Hi Florian,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (9 lines)
> After I booted to a Guix install USB, chrooted as described on the
> Arch wiki and started a Guix daemon, I could reconfigure as before.
> There was no need to fiddle with grub-install.
>
> After multiple reconfigures, it happened again, my /etc/shadow has !
> again in the password field. My recently changed root password became
> empty as well, like 35902. I did not even run sudo concurrently. The
> password just got locked.

What were the differences between your config files when you
reconfigured?

Did the config files describe the exact same set of user accounts?

Did the user accounts in the config files differ in any way?

Were the user accounts altered in any way in between reconfigures
(‘passwd’, ‘usermod’, GNOME, etc.)?


Looking at ‘user+group-databases’ in (gnu build accounts), which takes a
list of <user-account> and a list of <user-group> to produce
/etc/{passwd,shadow,group}, the only way this could happen is if
/etc/shadow does not exist at the time of reconfigure.

In that case, ‘user+group-databases’ assumes we’re starting anew, so it
creates /etc/shadow with the initial passwords specified in the OS
config. At that point, the other passwords are lost forever.

Does Shadow or gnome-control-center or something remove /etc/shadow
altogether while it’s accessing it?

At the very least, adding locking like you suggested should avoid this
class of problems; I’ll look into that.

Toggle quote (11 lines)
> From 1eb7699d5036062993a080393bfb4a46d2dc1bea Mon Sep 17 00:00:00 2001
> From: Florian Pelz <pelzflorian@pelzflorian.de>
> Date: Mon, 3 Jun 2019 07:19:20 +0200
> Subject: [PATCH 1/2] =?UTF-8?q?Add=20cracklib=E2=80=99s=20password=20dicti?=
> =?UTF-8?q?onary=20to=20cracklib=E2=80=99s=20default=20output.?=
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> * gnu/packages/password-utils.scm (cracklib): Use `make dict`.

[...]

Toggle quote (8 lines)
> From c7c016adc34c591febd0d3630f32dbecdd20ad7c Mon Sep 17 00:00:00 2001
> From: Florian Pelz <pelzflorian@pelzflorian.de>
> Date: Sun, 2 Jun 2019 20:01:23 +0200
> Subject: [PATCH 2/2] Make gnome-control-center find passwd binary.
>
> * gnu/packages/gnome.scm (gnome-control-center): Substitute correct path to
> passwd.

Great, applied both.

Thank you!

Ludo’.
L
L
Ludovic Courtès wrote on 3 Jun 2019 15:25
control message for bug #35996
(address . control@debbugs.gnu.org)
87sgsqer7x.fsf@gnu.org
severity 35996 important
quit
P
P
pelzflorian (Florian Pelz) wrote on 3 Jun 2019 16:52
Re: bug#35996: User account password got locked when booting old generation
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190603145209.ub7663zp7yh7n7i4@pelzflorian.localdomain
On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:
Toggle quote (9 lines)
> > After multiple reconfigures, it happened again, my /etc/shadow has !
> > again in the password field. My recently changed root password became
> > empty as well, like 35902. I did not even run sudo concurrently. The
> > password just got locked.
>
> What were the differences between your config files when you
> reconfigured?
>

For the last reconfigure, there were no differences, although I had
rebooted into an unbootable, older generation with a different
syslog.conf and broken Udevd arguments before booting the new
generation. I suppose the other victims of this bug have not booted
to unbootable generations?


Toggle quote (3 lines)
> Did the config files describe the exact same set of user accounts?
>

Yes, they’re the same.

Toggle quote (3 lines)
> Did the user accounts in the config files differ in any way?
>

No, they do not differ.

Toggle quote (17 lines)
> Were the user accounts altered in any way in between reconfigures
> (‘passwd’, ‘usermod’, GNOME, etc.)?
>
>
> Looking at ‘user+group-databases’ in (gnu build accounts), which takes a
> list of <user-account> and a list of <user-group> to produce
> /etc/{passwd,shadow,group}, the only way this could happen is if
> /etc/shadow does not exist at the time of reconfigure.
>
> In that case, ‘user+group-databases’ assumes we’re starting anew, so it
> creates /etc/shadow with the initial passwords specified in the OS
> config. At that point, the other passwords are lost forever.
>
> Does Shadow or gnome-control-center or something remove /etc/shadow
> altogether while it’s accessing it?
>

I did not use gnome-control-center or shadow or sudo during the last
reconfigure, except sudo for starting the reconfigure.

Toggle quote (4 lines)
> At the very least, adding locking like you suggested should avoid this
> class of problems; I’ll look into that.
>

I do not know if something somehow accessed /etc/shadow in the
background without my knowledge. I believe locks are important anyway
to have more guarantees passwords are not lost when Guix is run on a
sensitive multi-user setup. Thank you for looking into it.

If locks do not stop these issues, it would be nice to have more
detailed logs of shadow changes written to syslog on reconfigure
and/or on reboot.

Regards,
Florian
L
L
Ludovic Courtès wrote on 3 Jun 2019 17:22
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87h896d774.fsf@gnu.org
Hello,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (8 lines)
> Please add more logging and/or locking. Note that the elogind has the
> following comment in its locking implementation in
> /gnu/store/dm2ri0qvjirl0iq2ndfk5z9lq9dyk4jf-elogind-241.3-checkout/src/basic/user-util.c:
>
> /* This is roughly the same as lckpwdf(), but not as awful. We
> * don't want to use alarm() and signals, hence we implement
> * our own trivial version of this.

Attach is a set of patches to lock /etc/.pwd.lock when we access
/etc/{passwd,shadow,group} from the activation snippets. It should
ensure that, when running ‘guix system reconfigure’, we’re honoring the
locking protocol that Shadow & co. follow.

It would be great if you could test again in this context.

Thanks!

Ludo’.
From e26d9759ebe58c93e7eb449ea6bd3317b2840e99 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Mon, 3 Jun 2019 16:23:01 +0200
Subject: [PATCH 1/5] syscalls: Add 'with-file-lock' macro.

* guix/scripts/offload.scm (lock-file, unlock-file, with-file-lock):
Move to...
* guix/build/syscalls.scm: ... here.
---
.dir-locals.el | 2 ++
guix/build/syscalls.scm | 27 +++++++++++++++++++++++++++
guix/scripts/offload.scm | 25 -------------------------
3 files changed, 29 insertions(+), 25 deletions(-)

Toggle diff (104 lines)
diff --git a/.dir-locals.el b/.dir-locals.el
index f1196fd781..228685a69f 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -34,6 +34,8 @@
(eval . (put 'modify-services 'scheme-indent-function 1))
(eval . (put 'with-directory-excursion 'scheme-indent-function 1))
+ (eval . (put 'with-file-lock 'scheme-indent-function 1))
+
(eval . (put 'package 'scheme-indent-function 0))
(eval . (put 'origin 'scheme-indent-function 0))
(eval . (put 'build-system 'scheme-indent-function 0))
diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 3abe65bc4f..04fbebb8a2 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -81,7 +81,11 @@
fdatasync
pivot-root
scandir*
+
fcntl-flock
+ lock-file
+ unlock-file
+ with-file-lock
set-thread-name
thread-name
@@ -1067,6 +1071,29 @@ exception if it's already taken."
;; Presumably we got EAGAIN or so.
(throw 'flock-error err))))))
+(define (lock-file file)
+ "Wait and acquire an exclusive lock on FILE. Return an open port."
+ (let ((port (open-file file "w0")))
+ (fcntl-flock port 'write-lock)
+ port))
+
+(define (unlock-file port)
+ "Unlock PORT, a port returned by 'lock-file'."
+ (fcntl-flock port 'unlock)
+ (close-port port)
+ #t)
+
+(define-syntax-rule (with-file-lock file exp ...)
+ "Wait to acquire a lock on FILE and evaluate EXP in that context."
+ (let ((port (lock-file file)))
+ (dynamic-wind
+ (lambda ()
+ #t)
+ (lambda ()
+ exp ...)
+ (lambda ()
+ (unlock-file port)))))
+
;;;
;;; Miscellaneous, aka. 'prctl'.
diff --git a/guix/scripts/offload.scm b/guix/scripts/offload.scm
index eb02672dbf..0c0dd9d516 100644
--- a/guix/scripts/offload.scm
+++ b/guix/scripts/offload.scm
@@ -236,30 +236,6 @@ instead of '~a' of type '~a'~%")
;;; Synchronization.
;;;
-(define (lock-file file)
- "Wait and acquire an exclusive lock on FILE. Return an open port."
- (mkdir-p (dirname file))
- (let ((port (open-file file "w0")))
- (fcntl-flock port 'write-lock)
- port))
-
-(define (unlock-file lock)
- "Unlock LOCK."
- (fcntl-flock lock 'unlock)
- (close-port lock)
- #t)
-
-(define-syntax-rule (with-file-lock file exp ...)
- "Wait to acquire a lock on FILE and evaluate EXP in that context."
- (let ((port (lock-file file)))
- (dynamic-wind
- (lambda ()
- #t)
- (lambda ()
- exp ...)
- (lambda ()
- (unlock-file port)))))
-
(define (machine-slot-file machine slot)
"Return the file name of MACHINE's file for SLOT."
;; For each machine we have a bunch of files representing each build slot.
@@ -829,7 +805,6 @@ This tool is meant to be used internally by 'guix-daemon'.\n"))
(leave (G_ "invalid arguments: ~{~s ~}~%") x))))
;;; Local Variables:
-;;; eval: (put 'with-file-lock 'scheme-indent-function 1)
;;; eval: (put 'with-error-to-port 'scheme-indent-function 1)
;;; eval: (put 'with-timeout 'scheme-indent-function 2)
;;; End:
--
2.21.0
From cf802dcd04425ce783714d70444b023902b36947 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Mon, 3 Jun 2019 16:24:31 +0200
Subject: [PATCH 2/5] syscalls: 'with-file-lock' expands to a call to
'call-with-file-lock'.

* guix/build/syscalls.scm (call-with-file-lock): New procedure.
(with-file-lock): Expand to a call to 'call-with-file-lock'.
---
guix/build/syscalls.scm | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

Toggle diff (30 lines)
diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 04fbebb8a2..3af41f2cf5 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -1083,17 +1083,19 @@ exception if it's already taken."
(close-port port)
#t)
-(define-syntax-rule (with-file-lock file exp ...)
- "Wait to acquire a lock on FILE and evaluate EXP in that context."
+(define (call-with-file-lock file thunk)
(let ((port (lock-file file)))
(dynamic-wind
(lambda ()
#t)
- (lambda ()
- exp ...)
+ thunk
(lambda ()
(unlock-file port)))))
+(define-syntax-rule (with-file-lock file exp ...)
+ "Wait to acquire a lock on FILE and evaluate EXP in that context."
+ (call-with-file-lock file (lambda () exp ...)))
+
;;;
;;; Miscellaneous, aka. 'prctl'.
--
2.21.0
From 7ddcb15289a25a478a6bf7f353f2b4754b4af017 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Mon, 3 Jun 2019 17:13:30 +0200
Subject: [PATCH 3/5] syscalls: 'with-lock-file' catches ENOSYS.

* guix/build/syscalls.scm (call-with-file-lock): Catch ENOSYS raised by
'lock-file'.
---
guix/build/syscalls.scm | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

Toggle diff (33 lines)
diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 3af41f2cf5..5c2eb3c14d 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -1084,13 +1084,24 @@ exception if it's already taken."
#t)
(define (call-with-file-lock file thunk)
- (let ((port (lock-file file)))
+ (let ((port (catch 'system-error
+ (lambda ()
+ (lock-file file))
+ (lambda args
+ ;; When using the statically-linked Guile in the initrd,
+ ;; 'fcntl-flock' returns ENOSYS unconditionally. Ignore
+ ;; that error since we're typically the only process running
+ ;; at this point.
+ (if (= ENOSYS (system-error-errno args))
+ #f
+ (apply throw args))))))
(dynamic-wind
(lambda ()
#t)
thunk
(lambda ()
- (unlock-file port)))))
+ (when port
+ (unlock-file port))))))
(define-syntax-rule (with-file-lock file exp ...)
"Wait to acquire a lock on FILE and evaluate EXP in that context."
--
2.21.0
From fffb20dddfd8b5def7255874ce38061acd7f7d10 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Mon, 3 Jun 2019 17:14:17 +0200
Subject: [PATCH 4/5] activation: Lock /etc/.pwd.lock before accessing
databases.

Reported by Florian Pelz <pelzflorian@pelzflorian.de>.

* gnu/build/accounts.scm (%password-lock-file): New variable.
* gnu/build/activation.scm (activate-users+groups): Wrap calls to
'user+group-databases', 'write-group', etc. into 'with-file-lock'.
---
gnu/build/accounts.scm | 6 ++++++
gnu/build/activation.scm | 33 +++++++++++++++++++--------------
2 files changed, 25 insertions(+), 14 deletions(-)

Toggle diff (79 lines)
diff --git a/gnu/build/accounts.scm b/gnu/build/accounts.scm
index c43ce85b60..8687446aa6 100644
--- a/gnu/build/accounts.scm
+++ b/gnu/build/accounts.scm
@@ -51,6 +51,7 @@
group-entry-gid
group-entry-members
+ %password-lock-file
write-group
write-passwd
write-shadow
@@ -224,6 +225,11 @@ each field."
(serialization list->comma-separated comma-separated->list)
(default '())))
+(define %password-lock-file
+ ;; The password database lock file used by libc's 'lckpwdf'. Users should
+ ;; grab this lock with 'with-file-lock' when they access the databases.
+ "/etc/.pwd.lock")
+
(define (database-writer file mode entry->string)
(lambda* (entries #:optional (file-or-port file))
"Write ENTRIES to FILE-OR-PORT. When FILE-OR-PORT is a file name, write
diff --git a/gnu/build/activation.scm b/gnu/build/activation.scm
index cfdf17df0f..c6c7e7fd3b 100644
--- a/gnu/build/activation.scm
+++ b/gnu/build/activation.scm
@@ -22,6 +22,7 @@
#:use-module (gnu build accounts)
#:use-module (gnu build linux-boot)
#:use-module (guix build utils)
+ #:use-module ((guix build syscalls) #:select (with-file-lock))
#:use-module (ice-9 ftw)
#:use-module (ice-9 match)
#:use-module (ice-9 vlist)
@@ -129,22 +130,26 @@ group records) are all available."
;; Allow home directories to be created under /var/lib.
(mkdir-p "/var/lib")
- (let-values (((groups passwd shadow)
- (user+group-databases users groups)))
- (write-group groups)
- (write-passwd passwd)
- (write-shadow shadow)
+ ;; Take same lock as libc's 'lckpwdf' (but without a timeout) while we read
+ ;; and write the databases. This ensures there's no race condition with
+ ;; other tools that might be accessing it at the same time.
+ (with-file-lock %password-lock-file
+ (let-values (((groups passwd shadow)
+ (user+group-databases users groups)))
+ (write-group groups)
+ (write-passwd passwd)
+ (write-shadow shadow)))
- ;; Home directories of non-system accounts are created by
- ;; 'activate-user-home'.
- (for-each make-home-directory system-accounts)
+ ;; Home directories of non-system accounts are created by
+ ;; 'activate-user-home'.
+ (for-each make-home-directory system-accounts)
- ;; Turn shared home directories, such as /var/empty, into root-owned,
- ;; read-only places.
- (for-each (lambda (directory)
- (chown directory 0 0)
- (chmod directory #o555))
- (duplicates (map user-account-home-directory system-accounts)))))
+ ;; Turn shared home directories, such as /var/empty, into root-owned,
+ ;; read-only places.
+ (for-each (lambda (directory)
+ (chown directory 0 0)
+ (chmod directory #o555))
+ (duplicates (map user-account-home-directory system-accounts))))
(define (activate-user-home users)
"Create and populate the home directory of USERS, a list of tuples, unless
--
2.21.0
From dbe1fbab1e7d46258b20382d1587d18e915a5fee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Mon, 3 Jun 2019 17:18:41 +0200
Subject: [PATCH 5/5] nar: Really lock store files.

Previously, 'lock-store-file' would immediately close the file
descriptor of the '.lock' file, and thus it would immediately release
the lock.

* guix/nar.scm (lock-store-file, unlock-store-file): Remove.
(finalize-store-file): Use 'lock-file' and 'unlock-file' instead.
---
guix/nar.scm | 42 ++++++++++++++++--------------------------
1 file changed, 16 insertions(+), 26 deletions(-)

Toggle diff (71 lines)
diff --git a/guix/nar.scm b/guix/nar.scm
index 8894f10d2b..29636aa0f8 100644
--- a/guix/nar.scm
+++ b/guix/nar.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2018 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2018, 2019 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2014 Mark H Weaver <mhw@netris.org>
;;;
;;; This file is part of GNU Guix.
@@ -76,16 +76,6 @@
;; most of the daemon is in Scheme :-)). But note that we do use a couple of
;; RPCs for functionality not available otherwise, like 'valid-path?'.
-(define (lock-store-file file)
- "Acquire exclusive access to FILE, a store file."
- (call-with-output-file (string-append file ".lock")
- (cut fcntl-flock <> 'write-lock)))
-
-(define (unlock-store-file file)
- "Release access to FILE."
- (call-with-input-file (string-append file ".lock")
- (cut fcntl-flock <> 'unlock)))
-
(define* (finalize-store-file source target
#:key (references '()) deriver (lock? #t))
"Rename SOURCE to TARGET and register TARGET as a valid store item, with
@@ -94,25 +84,25 @@ before attempting to register it; otherwise, assume TARGET's locks are already
held."
(with-database %default-database-file db
(unless (path-id db target)
- (when lock?
- (lock-store-file target))
+ (let ((lock (and lock?
+ (lock-file (string-append target ".lock")))))
- (unless (path-id db target)
- ;; If FILE already exists, delete it (it's invalid anyway.)
- (when (file-exists? target)
- (delete-file-recursively target))
+ (unless (path-id db target)
+ ;; If FILE already exists, delete it (it's invalid anyway.)
+ (when (file-exists? target)
+ (delete-file-recursively target))
- ;; Install the new TARGET.
- (rename-file source target)
+ ;; Install the new TARGET.
+ (rename-file source target)
- ;; Register TARGET. As a side effect, it resets the timestamps of all
- ;; its files, recursively, and runs a deduplication pass.
- (register-path target
- #:references references
- #:deriver deriver))
+ ;; Register TARGET. As a side effect, it resets the timestamps of all
+ ;; its files, recursively, and runs a deduplication pass.
+ (register-path target
+ #:references references
+ #:deriver deriver))
- (when lock?
- (unlock-store-file target)))))
+ (when lock?
+ (unlock-file lock))))))
(define (temporary-store-file)
"Return the file name of a temporary file created in the store."
--
2.21.0
D
D
Danny Milosavljevic wrote on 3 Jun 2019 18:01
(name . Ludovic Courtès)(address . ludo@gnu.org)
20190603180149.4e44df43@scratchpost.org
For debugging, I've used "chattr +i" in the past in order to make such files
(i.e. /etc/shadow) immutable. This would hopefully expose the mutator (other
than Guix)--it should log some error somewhere.
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEds7GsXJ0tGXALbPZ5xo1VCwwuqUFAlz1RG0ACgkQ5xo1VCww
uqVN1wgAhs6xubWsLleLmBIEmoDDfsDvvXgwHA0cYPInceoAkJsVH0xjt5JjJwG6
NQ92bIPNpIv5YV1+2xxEKoDFHyv2xg14hC5Capd85D+FKyalv3bnX80H4Oij/p7k
E4uv6gmfqNoSzdCsdsSvQzwl0Htiv25FF9QU5YoYaNcsLOKJwzJSHW33Bx7L1Yd4
cXndbh2KP3QGvz8gN41InOTwIrwrQhckf7qysiB2gMFhEN17NkVJ8T3GiW++D8qc
1lp5KiVVNESbQH9pJBe6M+vQ5R8c/OcFBWtP6EBPq194322mMOcO9dJxDQN7TXo6
z/TYB2nu41tJM0npOUAlPogVIoN0eg==
=TT4A
-----END PGP SIGNATURE-----


P
P
pelzflorian (Florian Pelz) wrote on 3 Jun 2019 19:07
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190603170757.bfj7fydd5swhi247@pelzflorian.localdomain
All this is not a problem with name service cache daemon, is it?

On Mon, Jun 03, 2019 at 05:22:55PM +0200, Ludovic Courtès wrote:
Toggle quote (8 lines)
> Attach is a set of patches to lock /etc/.pwd.lock when we access
> /etc/{passwd,shadow,group} from the activation snippets. It should
> ensure that, when running ‘guix system reconfigure’, we’re honoring the
> locking protocol that Shadow & co. follow.
>
> It would be great if you could test again in this context.
>

Thank you! Will do.

Regards,
Florian
L
L
Ludovic Courtès wrote on 4 Jun 2019 11:22
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87d0jtemca.fsf@gnu.org
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (15 lines)
> On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:
>> > After multiple reconfigures, it happened again, my /etc/shadow has !
>> > again in the password field. My recently changed root password became
>> > empty as well, like 35902. I did not even run sudo concurrently. The
>> > password just got locked.
>>
>> What were the differences between your config files when you
>> reconfigured?
>>
>
> For the last reconfigure, there were no differences, although I had
> rebooted into an unbootable, older generation with a different
> syslog.conf and broken Udevd arguments before booting the new
> generation.

What’s the effect of this brokenness concretely? Is the wrong root file
system mounted, or something like that?

Could it somehow lead Guix to stumble upon an empty or missing
/etc/shadow when it boots?

Toggle quote (3 lines)
> I suppose the other victims of this bug have not booted to unbootable
> generations?

It’d be great if the other victims would speak up. :-)

Toggle quote (4 lines)
> If locks do not stop these issues, it would be nice to have more
> detailed logs of shadow changes written to syslog on reconfigure
> and/or on reboot.

There really isn’t much to log: the activation code reads
/etc/{shadow,passwd,group}, computes the list of shadow/passwd/group
entries as a function of that, and writes it.

Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 4 Jun 2019 14:17
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190604121710.uqni7cwp5jo4pwmq@pelzflorian.localdomain
On Tue, Jun 04, 2019 at 11:22:45AM +0200, Ludovic Courtès wrote:
Toggle quote (23 lines)
> Hi,
>
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
>
> > On Mon, Jun 03, 2019 at 03:22:51PM +0200, Ludovic Courtès wrote:
> >> > After multiple reconfigures, it happened again, my /etc/shadow has !
> >> > again in the password field. My recently changed root password became
> >> > empty as well, like 35902. I did not even run sudo concurrently. The
> >> > password just got locked.
> >>
> >> What were the differences between your config files when you
> >> reconfigured?
> >>
> >
> > For the last reconfigure, there were no differences, although I had
> > rebooted into an unbootable, older generation with a different
> > syslog.conf and broken Udevd arguments before booting the new
> > generation.
>
> What’s the effect of this brokenness concretely? Is the wrong root file
> system mounted, or something like that?
>

I have multiple broken generation. On one that now for a third time
(on old generations without Ludo’s patches) led to a locked
/etc/shadow after booting I changed the line
(let ((pid (fork+exec-command (list udevd))))
in gnu/services/base.scm to, I believe, this:
(let ((pid (fork+exec-command (list udevd "--debug-trace"))))

(I am unsure if this is the same broken generation as on my first
report of the issue. I may have gotten confused.)

This is unbootable, correct would have been --debug and not
--debug-trace.

I may also have changed my syslog configuration to the incorrect

(modify-services %desktop-services
(syslog-service-type config =>
(syslog-configuration
(inherit config)
(config-file
(plain-file "my-syslog.conf" "
# Log all error messages, authentication messages of
# level notice or higher and anything of level err or
# higher to the console.
# Don't log private authentication messages!
* /var/log/full
[…]")))))))

Correct would have been *.* instead of * This latter error is
without relevant effect I believe.

I will try to find the /gnu/store files for this generation.

Danny’s suggestion to `chattr +i /etc/shadow` leads to an error with
rename-file trying to rename an empty /etc/shadow.Gi… temporary file
on both this old broken and on healthy generations.


Toggle quote (5 lines)
> There really isn’t much to log: the activation code reads
> /etc/{shadow,passwd,group}, computes the list of shadow/passwd/group
> entries as a function of that, and writes it.
>

If I cannot find a more deterministic way, I will try making (guix
build accounts) print the content of shadow.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 4 Jun 2019 16:12
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190604141217.7tq33idseebne5v2@pelzflorian.localdomain
On Tue, Jun 04, 2019 at 02:17:11PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (6 lines)
> On Tue, Jun 04, 2019 at 11:22:45AM +0200, Ludovic Courtès wrote:
> > What’s the effect of this brokenness concretely? Is the wrong root file
> > system mounted, or something like that?
> >
>

When removing quiet from the linux command line, shepherd complains
incessantly that it is trying to load udev.


All seems irrelevant, but:

Toggle quote (14 lines)
> I have multiple broken generation. On one that now for a third time
> (on old generations without Ludo’s patches) led to a locked
> /etc/shadow after booting I changed the line
> (let ((pid (fork+exec-command (list udevd))))
> in gnu/services/base.scm to, I believe, this:
> (let ((pid (fork+exec-command (list udevd "--debug-trace"))))
>
> (I am unsure if this is the same broken generation as on my first
> report of the issue. I may have gotten confused.)
>
> This is unbootable, correct would have been --debug and not
> --debug-trace.
>

The line in
/gnu/store/kdql26k1pxgm74d94ryzk8fb4lg5q0ra-shepherd-udev.scm
referenced from
/gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referenced
from the broken /var/guix/profiles/system-35-link/boot
is

(let ((pid (fork+exec-command (list udevd "--debug-trace" "--verbose"))))

with an unneeded --verbose.

Toggle quote (3 lines)
> I may also have changed my syslog configuration to the incorrect
>

No, the syslog.conf in
/gnu/store/y5nrfbj52vlnj77iyki9hbji8qjwk86d-syslog.conf referenced
from /gnu/store/5bp6c0p357gaqikxkmvs0idrmvdrzf7h-shepherd-syslogd.scm
referenced from
/gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referenced
from the broken /var/guix/profiles/system-35-link/boot
appears to be the default syslog.conf.

Booting this old generation and then an older working generation
sometimes leads to a broken /etc/shadow. I do not yet know if a
broken /etc/shadow can result when booting this old generation and
then a new patched generation.

I will reconfigure some more and try getting a bad /etc/shadow even
with Ludo’s patches.

Regards,
Florian
P
P
pelzflorian (Florian Pelz) wrote on 4 Jun 2019 19:17
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190604171715.gvwr54wiek4xs24e@pelzflorian.localdomain
I got a locked /etc/shadow again now despite Ludovic’s patches (which
would nonetheless give me a better feeling when pushed).

When booting an unbootable generation with Ludovic’s patches and then
rebooting a normal generation with Ludovic’s patches, /etc/shadow is
locked.

Note that I get a message like “/dev/mapper/Guix: recovering journal”
when booting (I did not pay attention to that before). I shut down
the unbootable generation with Ctrl+Alt+Del. When I normally shut
down with Ctrl+Alt+Del I get no such message.

The unbootable generation was deliberately made unbootable like this:

On Tue, Jun 04, 2019 at 04:12:17PM +0200, pelzflorian (Florian Pelz) wrote:
Toggle quote (24 lines)
> > I have multiple broken generation. On one that now for a third time
> > (on old generations without Ludo’s patches) led to a locked
> > /etc/shadow after booting I changed the line
> > (let ((pid (fork+exec-command (list udevd))))
> > in gnu/services/base.scm to, I believe, this:
> > (let ((pid (fork+exec-command (list udevd "--debug-trace"))))
> >
> > (I am unsure if this is the same broken generation as on my first
> > report of the issue. I may have gotten confused.)
> >
> > This is unbootable, correct would have been --debug and not
> > --debug-trace.
> >
>
> The line in
> /gnu/store/kdql26k1pxgm74d94ryzk8fb4lg5q0ra-shepherd-udev.scm
> referenced from
> /gnu/store/b7mrb5pzsbbvhjmi8lbm9xa4wgvqbc7g-shepherd.conf referenced
> from the broken /var/guix/profiles/system-35-link/boot
> is
>
> (let ((pid (fork+exec-command (list udevd "--debug-trace" "--verbose"))))
>

Regards,
Florian
L
L
Ludovic Courtès wrote on 4 Jun 2019 23:21
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87o93d6o8u.fsf@gnu.org
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (3 lines)
> I got a locked /etc/shadow again now despite Ludovic’s patches (which
> would nonetheless give me a better feeling when pushed).

Will do. :-)

Toggle quote (4 lines)
> When booting an unbootable generation with Ludovic’s patches and then
> rebooting a normal generation with Ludovic’s patches, /etc/shadow is
> locked.

So with this scenario, the problem is 100% reproducible, right?

Toggle quote (5 lines)
> Note that I get a message like “/dev/mapper/Guix: recovering journal”
> when booting (I did not pay attention to that before). I shut down
> the unbootable generation with Ctrl+Alt+Del. When I normally shut
> down with Ctrl+Alt+Del I get no such message.

Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (which
disables “hard” reboots upon ctrl-alt-del and instead notifies it) after
it has loaded its config file.

In your case, loading the config file never completes because the
‘start’ method is called from the config file for every service, and one
of them, udev, never starts. Thus, when you press Ctrl-Alt-Del, you
perform a hard reboot.

The hard reboot happens after Guix has written to /etc/shadow. One
possibility is that changes to this file haven’t been flushed to disk.
Thus, on the next boot, we start off with an empty or truncated
/etc/shadow, leading the activation snippet to initialize passwords.


If that theory holds, the patch below (on top of the others) should
help. Could you give it a try?

Actually, the fact that ‘rename-file’ was called *before* ‘close-port’
could be problematic in itself; so perhaps, even without the ‘fdatasync’
call, we’d get better results… especially since ‘fdatasync’ won’t be
available in the initrd anyway, hmm…

Thanks,
Ludo’.
Toggle diff (41 lines)
diff --git a/gnu/build/accounts.scm b/gnu/build/accounts.scm
index 8687446aa6..c13e6f2e89 100644
--- a/gnu/build/accounts.scm
+++ b/gnu/build/accounts.scm
@@ -19,6 +19,7 @@
(define-module (gnu build accounts)
#:use-module (guix records)
#:use-module (guix combinators)
+ #:use-module ((guix build syscalls) #:select (fdatasync))
#:use-module (gnu system accounts)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-11)
@@ -230,6 +231,14 @@ each field."
;; grab this lock with 'with-file-lock' when they access the databases.
"/etc/.pwd.lock")
+(define-syntax-rule (catch-ENOSYS exp)
+ (catch 'system-error
+ (lambda () exp)
+ (lambda args
+ (if (= ENOSYS (system-error-errno args))
+ #f
+ (apply throw args)))))
+
(define (database-writer file mode entry->string)
(lambda* (entries #:optional (file-or-port file))
"Write ENTRIES to FILE-OR-PORT. When FILE-OR-PORT is a file name, write
@@ -249,9 +258,12 @@ to it atomically and set the appropriate permissions."
(lambda ()
(chmod port mode)
(write-entries port)
- (rename-file template file-or-port))
- (lambda ()
+ (catch-ENOSYS (fdatasync port))
(close-port port)
+ (rename-file template file-or-port))
+ (lambda ()
+ (unless (port-closed? port)
+ (close-port port))
(when (file-exists? template)
(delete-file template))))))))
P
P
pelzflorian (Florian Pelz) wrote on 5 Jun 2019 08:16
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190605061611.py3v3msydbfn2eoe@pelzflorian.localdomain
On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
Toggle quote (8 lines)
> "pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:
> > When booting an unbootable generation with Ludovic’s patches and then
> > rebooting a normal generation with Ludovic’s patches, /etc/shadow is
> > locked.
>
> So with this scenario, the problem is 100% reproducible, right?
>

/etc/shadow is not always locked, even if I get a recovering journal
message.


Toggle quote (4 lines)
> If that theory holds, the patch below (on top of the others) should
> help. Could you give it a try?
>

Will do.

Regards,
Florian
L
L
Ludovic Courtès wrote on 5 Jun 2019 11:54
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87imtk73xs.fsf@gnu.org
Hi,

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (2 lines)
> On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:

[...]

Toggle quote (6 lines)
>> If that theory holds, the patch below (on top of the others) should
>> help. Could you give it a try?
>>
>
> Will do.

Note that you’ll have to create a new “broken” generation with this
patch (because we already know that the old one can corrupt
/etc/shadow.)

Ludo’.
P
P
pelzflorian (Florian Pelz) wrote on 5 Jun 2019 13:06
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190605110658.7metilrqike4juml@pelzflorian.localdomain
It appears your patch fixes the issue. I admire the speed at which
you write patches. :) Thank you!

On Wed, Jun 05, 2019 at 11:54:23AM +0200, Ludovic Courtès wrote:
Toggle quote (5 lines)
> Note that you’ll have to create a new “broken” generation with this
> patch (because we already know that the old one can corrupt
> /etc/shadow.)
>

I created a new working generation and then a new unbootable
generation with broken udevd args, both with all your patches. I
rebooted the broken and then the working generation repeatedly twelve
times. I waited varying amounts of time before doing Ctrl+Alt+Del in
the broken generation. /etc/shadow is still in good health.

However:

On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
Toggle quote (4 lines)
> Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (which
> disables “hard” reboots upon ctrl-alt-del and instead notifies it) after
> it has loaded its config file.

Is there a good reason shepherd calls disable-reboot-on-ctrl-alt-del
at the end? I get recovering journal messages unless on the previous
boot I waited for the whole GDM to start (I can login on the TTY
before GDM has fully started), which takes a long time during which
users could change their mind and decide they do not want to boot.
(The Macbook is not fast anyway and Guix is even slower when booting
compared to Debian.)

Regards,
Florian
L
L
Ludovic Courtès wrote on 5 Jun 2019 17:31
control message for bug #35996
(address . control@debbugs.gnu.org)
87imtkyrpf.fsf@gnu.org
merge 35996 35902
quit
L
L
Ludovic Courtès wrote on 5 Jun 2019 23:13
Re: bug#35996: User account password got locked when booting old generation
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996@debbugs.gnu.org)
87a7evwxa9.fsf@gnu.org
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (3 lines)
> It appears your patch fixes the issue. I admire the speed at which
> you write patches. :) Thank you!

Awesome! I must say that I’m really glad you’re putting this much
energy into reproducing issues and investigating—it’s rare for people
who report bug to dig this deep, but it’s super helpful and motivating!

I’ve pushed the whole series:

d088d5c484 accounts: Call 'fdatasync' when writing databases.
ed8570dce3 accounts: Close database before renaming it.
70a7a1b5dc nar: Really lock store files.
d497b6ab39 activation: Lock /etc/.pwd.lock before accessing databases.
5f0cf1df71 syscalls: 'with-lock-file' catches ENOSYS.
89ceb86ad4 syscalls: 'with-file-lock' expands to a call to 'call-with-file-lock'.
b7178c22bf syscalls: Add 'with-file-lock' macro.

The actual fix is ed8570dce3, AIUI.

Toggle quote (6 lines)
> I created a new working generation and then a new unbootable
> generation with broken udevd args, both with all your patches. I
> rebooted the broken and then the working generation repeatedly twelve
> times. I waited varying amounts of time before doing Ctrl+Alt+Del in
> the broken generation. /etc/shadow is still in good health.

Good.

Toggle quote (13 lines)
> On Tue, Jun 04, 2019 at 11:21:05PM +0200, Ludovic Courtès wrote:
>> Indeed, ‘shepherd’ calls ‘disable-reboot-on-ctrl-alt-del’ (which
>> disables “hard” reboots upon ctrl-alt-del and instead notifies it) after
>> it has loaded its config file.
>
> Is there a good reason shepherd calls disable-reboot-on-ctrl-alt-del
> at the end? I get recovering journal messages unless on the previous
> boot I waited for the whole GDM to start (I can login on the TTY
> before GDM has fully started), which takes a long time during which
> users could change their mind and decide they do not want to boot.
> (The Macbook is not fast anyway and Guix is even slower when booting
> compared to Debian.)

I agree.

The attached patch for Shepherd moves everything before loading the
config file. I think it will have the desired effect, though I’m not
entirely sure the signal handler would run at the right time etc.

You can test it on the metal if you want (you need to add the patch to
the ‘shepherd’ package), but I’ll see if I can test in a VM.

Thank you!

Ludo’.
Toggle diff (74 lines)
diff --git a/modules/shepherd.scm b/modules/shepherd.scm
index 8b2cc1d..769085a 100644
--- a/modules/shepherd.scm
+++ b/modules/shepherd.scm
@@ -198,34 +198,6 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC."
;; Start the 'root' service.
(start root-service)
- ;; This _must_ succeed. (We could also put the `catch' around
- ;; `main', but it is often useful to get the backtrace, and
- ;; `caught-error' does not do this yet.)
- (catch #t
- (lambda ()
- (load-in-user-module (or config-file (default-config-file))))
- (lambda (key . args)
- (caught-error key args)
- (quit 1)))
- ;; Start what was started last time.
- (and persistency
- (catch 'system-error
- (lambda ()
- (start-in-order (read (open-input-file
- persistency-state-file))))
- (lambda (key . args)
- (apply format #f (gettext (cadr args)) (caddr args))
- (quit 1))))
-
- (when (provided? 'threads)
- ;; XXX: This terrible hack allows us to make sure that signal handlers
- ;; get a chance to run in a timely fashion. Without it, after an EINTR,
- ;; we could restart the accept(2) call below before the corresponding
- ;; async has been queued. See the thread at
- ;; <https://lists.gnu.org/archive/html/guile-devel/2013-07/msg00004.html>.
- (sigaction SIGALRM (lambda _ (alarm 1)))
- (alarm 1))
-
(when (= 1 (getpid))
;; When running as PID 1, disable hard reboots upon ctrl-alt-del.
;; Instead, the kernel will send us SIGINT so that we can gracefully
@@ -259,6 +231,34 @@ socket file at FILE-NAME upon exit of PROC. Return the values of PROC."
(lambda _
(stop root-service)))
+ ;; This _must_ succeed. (We could also put the `catch' around
+ ;; `main', but it is often useful to get the backtrace, and
+ ;; `caught-error' does not do this yet.)
+ (catch #t
+ (lambda ()
+ (load-in-user-module (or config-file (default-config-file))))
+ (lambda (key . args)
+ (caught-error key args)
+ (quit 1)))
+ ;; Start what was started last time.
+ (and persistency
+ (catch 'system-error
+ (lambda ()
+ (start-in-order (read (open-input-file
+ persistency-state-file))))
+ (lambda (key . args)
+ (apply format #f (gettext (cadr args)) (caddr args))
+ (quit 1))))
+
+ (when (provided? 'threads)
+ ;; XXX: This terrible hack allows us to make sure that signal handlers
+ ;; get a chance to run in a timely fashion. Without it, after an EINTR,
+ ;; we could restart the accept(2) call below before the corresponding
+ ;; async has been queued. See the thread at
+ ;; <https://lists.gnu.org/archive/html/guile-devel/2013-07/msg00004.html>.
+ (sigaction SIGALRM (lambda _ (alarm 1)))
+ (alarm 1))
+
;; Ignore SIGPIPE so that we don't die if a client closes the connection
;; prematurely.
(sigaction SIGPIPE SIG_IGN)
P
P
pelzflorian (Florian Pelz) wrote on 6 Jun 2019 09:01
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 35996@debbugs.gnu.org)
20190606070143.poiz4anyd3i3bjkg@pelzflorian.localdomain
On Wed, Jun 05, 2019 at 11:13:34PM +0200, Ludovic Courtès wrote:
Toggle quote (4 lines)
> Awesome! I must say that I’m really glad you’re putting this much
> energy into reproducing issues and investigating—it’s rare for people
> who report bug to dig this deep, but it’s super helpful and motivating!

:)


Toggle quote (5 lines)
> The attached patch for Shepherd moves everything before loading the
> config file. I think it will have the desired effect, though I’m not
> entirely sure the signal handler would run at the right time etc.
>

It works for me without recovering journal message (and taking an
insignificantly longer time to reboot).

Regards,
Florian
L
L
Ludovic Courtès wrote on 6 Jun 2019 10:04
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)(address . 35996-done@debbugs.gnu.org)
87v9xjqgvi.fsf@gnu.org
Hello!

"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> skribis:

Toggle quote (2 lines)
> On Wed, Jun 05, 2019 at 11:13:34PM +0200, Ludovic Courtès wrote:

[...]

Toggle quote (8 lines)
>> The attached patch for Shepherd moves everything before loading the
>> config file. I think it will have the desired effect, though I’m not
>> entirely sure the signal handler would run at the right time etc.
>>
>
> It works for me without recovering journal message (and taking an
> insignificantly longer time to reboot).

Excellent. Pushed as Shepherd commit
c6f250d1fd1afa9ee49c8bb2414eee087b672789.

Thank you!

Ludo’.
Closed
?