System activation fails due to preexisting /etc/modprobe.d

  • Done
Details
2 participants
  • Ludovic Courtès
  • Maxim Cournoyer
Owner
unassigned
Submitted by
Maxim Cournoyer
Severity
normal
Maxim Cournoyer wrote on 26 Sep 2022 04:37
guix deploy fails, leaving the newly installed system generation active
(name . bug-guix)(address . bug-guix@gnu.org)
8735ce7tdv.fsf@gmail.com
Hi,

While attempting to deploy to overdrive1, using the 9971141 commit in
the maintenance repo, I encountered the following error:

maxim@hurd ~/src/guix-maintenance/hydra$ guix time-machine --commit=08d515233241ee0921b8b5ab706f98170c62437c -- deploy -L modules deploy-overdrive1.scm
The following 1 machine will be deployed:
overdrive1

guix deploy: deploying to overdrive1...
guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
guix deploy: error: failed to deploy overdrive1: failed to switch systems while deploying 'overdrive1':
system-error "symlink" "~A" ("File exists") (17)

It also looks like, even though the above failed to deploy fully, the
new system generation was left as the active one:

[...]
Generation 28 Sep 26 2022 04:04:36 (current)
file name: /var/guix/profiles/system-28-link
canonical file name: /gnu/store/c02w7nyl5nr19x856455p2wh959r25h8-system
label: GNU with Linux-Libre 5.19.10
bootloader: grub-efi
root device: /dev/sda3
kernel: /gnu/store/nmdy7c4i34y12w8af7zl6sl9fmrp8wa0-linux-libre-5.19.10/Image
channels:
sfl-packages:
repository URL: https://gitlab.com/Apteryks/sfl-guix-channel
branch: master
commit: 6385881124429016f750b0f562b70e07f592275e
guix:
repository URL: https://git.savannah.gnu.org/git/guix.git
commit: 08d515233241ee0921b8b5ab706f98170c62437c
configuration file: /gnu/store/myvzd1kpw2pfzfj3krl4lzpcbqsdn48x-configuration.scm

Which leaves me with two questions:

1. why did it fail?

2. when it encounters any error while deploying, shouldn't the
generation be removed instead of left as the active one?

Thanks,

Maxim
Ludovic Courtès wrote on 26 Sep 2022 17:39
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 58084@debbugs.gnu.org)
87h70ukuuz.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> While attempting to deploy to overdrive1, using the 9971141 commit in
> the maintenance repo, I encountered the following error:
>
> maxim@hurd ~/src/guix-maintenance/hydra$ guix time-machine --commit=08d515233241ee0921b8b5ab706f98170c62437c -- deploy -L modules deploy-overdrive1.scm
> The following 1 machine will be deployed:
> overdrive1
>
> guix deploy: deploying to overdrive1...
> guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
> guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
> guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
> guix deploy: error: failed to deploy overdrive1: failed to switch systems while deploying 'overdrive1':
> system-error "symlink" "~A" ("File exists") (17)

I can reproduce it.

The failing code is in /gnu/store/…-switch-to-system.scm:

(begin
  (use-modules (guix config)
               (guix profiles)
               (guix utils))
  (define profile
    (or #f
        (string-append %state-directory "/profiles/system")))
  (let* ((number (#{1+}# (generation-number profile)))
         (generation (generation-file-name profile number)))
    (switch-symlinks generation "/gnu/store/kifxq4hmp4ihn6nb06ia8wms33qrndxn-system")
    (switch-symlinks profile generation)
    (setenv "GUIX_NEW_SYSTEM" "/gnu/store/kifxq4hmp4ihn6nb06ia8wms33qrndxn-system")
    (primitive-load "/gnu/store/1wdwlaqkmixb1d7by7fj23lxppw8x44r-activate.scm")))

We can run it manually to get debugging data:

ludo@overdrive1 ~$ sudo -E env -i COLUMNS=100 "/gnu/store/xv7j4im9ap92mv0mbsm1wa4px93zxrms-switch-to-system.scm"
making '/gnu/store/kifxq4hmp4ihn6nb06ia8wms33qrndxn-system' the current system...
WARNING: (guile-user): imported module (guix build utils) overrides core binding `delete'
setting up setuid programs in '/run/setuid-programs'...
populating /etc from /gnu/store/hf3qxlaiajvapwis0lq20avgl2whfa5w-etc...
Backtrace:
6 (primitive-load "/gnu/store/xv7j4im9ap92mv0mbsm1wa4px93zxrms-switch-to-system.scm")
5 (primitive-load "/gnu/store/1wdwlaqkmixb1d7by7fj23lxppw8x44r-activate.scm")
In ice-9/boot-9.scm:
260:13 4 (for-each #<procedure primitive-load (_)> _)
In unknown file:
3 (primitive-load "/gnu/store/v03vaksmkpj7wv4dhm0yrd3y65lzbixz-activate-service.scm")
In srfi/srfi-1.scm:
634:9 2 (for-each #<procedure ffffaaff10e0 at gnu/build/activation.scm:257:12 (file)> _)
In gnu/build/activation.scm:
267:20 1 (_ "modprobe.d")
In unknown file:
0 (symlink "/etc/static/modprobe.d" "/etc/modprobe.d")

ERROR: In procedure symlink:
In procedure symlink: File exists

This is because ‘zram-device-service-type’ contributes a file to
/etc/modprobe.d:

(define %zram-device-config
  `("modprobe.d/zram.conf"
    ,(plain-file "zram.conf"
                 "options zram num_devices=1")))

(define zram-device-service-type
  (service-type
   (name 'zram)
   (default-value (zram-device-configuration))
   (extensions
    (list (service-extension kernel-module-loader-service-type
                             (const (list "zram")))
          (service-extension etc-service-type
                             (const (list %zram-device-config)))
          (service-extension udev-service-type
                             (compose list zram-device-udev-rule))))
   (description "Creates a zram swap device.")))

… which is fine, except that there was already a pre-existing
/etc/modprobe.d directory (coming from openSuSE, the distro that was
initially installed on this machine), which caused this activation code
to break:

ludo@overdrive1 ~$ ls -l /etc/modprobe.d
total 36
-rw-r--r-- 1 root root 3221 Nov 6 2016 00-system.conf
-rw-r--r-- 1 root root 532 Nov 14 2012 10-unsupported-modules.conf
-rw-r--r-- 1 root root 181 May 5 2017 50-alsa.conf
-rw-r--r-- 1 root root 5009 Sep 15 2016 50-blacklist.conf
-rw-r--r-- 1 root root 128 Oct 12 2017 50-bluetooth.conf
-rw-r--r-- 1 root root 33 Oct 20 2016 50-ipw2200.conf
-rw-r--r-- 1 root root 34 Oct 20 2016 50-iwl3945.conf
-rw-r--r-- 1 root root 47 Nov 22 2011 99-local.conf
ludo@overdrive1 ~$ ls -ld /etc/modprobe.d
drwxr-xr-x 1 root root 260 Jan 29 2018 /etc/modprobe.d/

Once moved out of the way, reconfiguration proceeds just fine and
happiness ensues:

ludo@overdrive1 ~$ ls -l /etc/modprobe.d
lrwxrwxrwx 1 root root 22 Sep 26 17:19 /etc/modprobe.d -> /etc/static/modprobe.d
ludo@overdrive1 ~$ ls -l /etc/modprobe.d/
total 4
lrwxrwxrwx 1 root root 53 Jan 1 1970 zram.conf -> /gnu/store/srl5xij6hf4x6iksx98grb1spcj3rch1-zram.conf
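
The message does not show the command used to move the old directory out of the way. A minimal sketch of one way to do it follows; the helper name `move_aside` and the `.pre-guix` backup suffix are assumptions made for illustration and come from neither the thread nor Guix:

```shell
# Hypothetical helper, not part of Guix: rename a real directory so that
# system activation can create its symlink at the same path.
move_aside() {
    dir=$1
    backup=$dir.pre-guix
    # Refuse to overwrite or nest into an earlier backup.
    if [ -e "$backup" ] || [ -L "$backup" ]; then
        echo "backup $backup already exists" >&2
        return 1
    fi
    # Only touch real directories; leave symlinks and missing paths alone.
    if [ -d "$dir" ] && [ ! -L "$dir" ]; then
        mv -- "$dir" "$backup"
        echo "moved $dir to $backup"
    else
        echo "nothing to move at $dir"
    fi
}
```

On the affected machine this would presumably be run as root, as `move_aside /etc/modprobe.d`, before retrying the deployment. Renaming rather than deleting keeps the old configuration files available for inspection.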

Ludo’.
Ludovic Courtès wrote on 26 Sep 2022 17:40
control message for bug #58084
(address . control@debbugs.gnu.org)
87fsgekutm.fsf@gnu.org
retitle 58084 System activation fails due to preexisting /etc/modprobe.d
quit
Ludovic Courtès wrote on 26 Sep 2022 17:41
(address . control@debbugs.gnu.org)
87czbikus6.fsf@gnu.org
tags 58084 notabug
close 58084
quit
Maxim Cournoyer wrote on 26 Sep 2022 19:46
Re: bug#58084: guix deploy fails, leaving the newly installed system generation active
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 58084@debbugs.gnu.org)
87h70uyqoj.fsf@gmail.com
Hi,

Ludovic Courtès <ludo@gnu.org> writes:

[...]

> We can run it manually to get debugging data:
>
> ludo@overdrive1 ~$ sudo -E env -i COLUMNS=100 "/gnu/store/xv7j4im9ap92mv0mbsm1wa4px93zxrms-switch-to-system.scm"
> making '/gnu/store/kifxq4hmp4ihn6nb06ia8wms33qrndxn-system' the current system...
> WARNING: (guile-user): imported module (guix build utils) overrides core binding `delete'
> setting up setuid programs in '/run/setuid-programs'...
> populating /etc from /gnu/store/hf3qxlaiajvapwis0lq20avgl2whfa5w-etc...
> Backtrace:
> 6 (primitive-load "/gnu/store/xv7j4im9ap92mv0mbsm1wa4px93zxrms-switch-to-system.scm")
> 5 (primitive-load "/gnu/store/1wdwlaqkmixb1d7by7fj23lxppw8x44r-activate.scm")
> In ice-9/boot-9.scm:
> 260:13 4 (for-each #<procedure primitive-load (_)> _)
> In unknown file:
> 3 (primitive-load "/gnu/store/v03vaksmkpj7wv4dhm0yrd3y65lzbixz-activate-service.scm")
> In srfi/srfi-1.scm:
> 634:9 2 (for-each #<procedure ffffaaff10e0 at gnu/build/activation.scm:257:12 (file)> _)
> In gnu/build/activation.scm:
> 267:20 1 (_ "modprobe.d")
> In unknown file:
> 0 (symlink "/etc/static/modprobe.d" "/etc/modprobe.d")
>
> ERROR: In procedure symlink:
> In procedure symlink: File exists
>
>
> This is because ‘zram-device-service-type’ contributes a file to
> /etc/modprobe.d:
>
> (define %zram-device-config
> `("modprobe.d/zram.conf"
> ,(plain-file "zram.conf"
> "options zram num_devices=1")))
>
> (define zram-device-service-type
> (service-type
> (name 'zram)
> (default-value (zram-device-configuration))
> (extensions
> (list (service-extension kernel-module-loader-service-type
> (const (list "zram")))
> (service-extension etc-service-type
> (const (list %zram-device-config)))
> (service-extension udev-service-type
> (compose list zram-device-udev-rule))))
> (description "Creates a zram swap device.")))
>
>
> … which is fine, except that there was already a pre-existing
> /etc/modprobe.d directory (coming from openSuSE, the distro that was
> initially installed on this machine), which caused this activation code
> to break:

Oh wow! Should we be extra careful and always rm files before linking to
their location? Or define our own 'symlink' procedure that'd take care
of it? That's not very elegant but better than obscure crashes like
this.

What do you think?

Thanks for the debugging!

Maxim
Maxim Cournoyer wrote on 26 Sep 2022 19:48
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 58084@debbugs.gnu.org)
87a66myqlg.fsf@gmail.com
Hello again,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

[...]

>> … which is fine, except that there was already a pre-existing
>> /etc/modprobe.d directory (coming from openSuSE, the distro that was
>> initially installed on this machine), which caused this activation code
>> to break:
>
> Oh wow! Should we be extra careful and always rm files before linking to
> their location? Or define our own 'symlink' procedure that'd take care
> of it? That's not very elegant but better than obscure crashes like
> this.

I just had a better idea: fail and report that an unexpected file was
found there, leaving the user to inspect it and choose a proper action.

Thanks,

Maxim
Ludovic Courtès wrote on 29 Sep 2022 16:47
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 58084@debbugs.gnu.org)
87zgei9r0s.fsf@gnu.org
Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
> [...]
>
>>> … which is fine, except that there was already a pre-existing
>>> /etc/modprobe.d directory (coming from openSuSE, the distro that was
>>> initially installed on this machine), which caused this activation code
>>> to break:
>>
>> Oh wow! Should we be extra careful and always rm files before linking to
>> their location? Or define our own 'symlink' procedure that'd take care
>> of it? That's not very elegant but better than obscure crashes like
>> this.
>
> I just had a better idea: fail and report that an unexpected file was
> found there, leaving the user to inspect it and choose a proper action.

Yeah, that’d be nice. It’s really a corner case that you’ll only hit
when installing on a non-empty file system, but gracefully handling it
would be nice for sure.

Ludo’.
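
The behaviour Maxim proposes, failing with a clear report when an unexpected file is found, can be approximated outside Guix as a check run on the target before deploying. The sketch below is not the activation code: the function name `check_etc_entry` and the message wording are hypothetical, and it assumes that only a real directory (not a symlink) is a conflict, since that is the case seen in this report.

```shell
# Hypothetical pre-flight check, not part of Guix: report a path that
# exists as a real directory where activation would create a symlink.
check_etc_entry() {
    target=$1
    if [ -d "$target" ] && [ ! -L "$target" ]; then
        echo "error: $target is a real directory, not a symlink;" \
             "inspect it and move it aside" >&2
        return 1
    fi
    # Absent paths and existing symlinks are treated as fine.
    return 0
}
```

A caller could then stop early with `check_etc_entry /etc/modprobe.d || exit 1`, leaving the choice of what to do with the directory to the administrator.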