GRUB prevents booting a degraded RAID1 array atop LUKS

Status: Open
Participants (2): Giovanni Biscuolo, maxim.cournoyer
Owner: unassigned
Submitted by: maxim.cournoyer
Severity: normal
maxim.cournoyer wrote on 1 May 2020 15:56
On a system where:

1) Each disk comprising the array is fully LUKS-encrypted
2) Each mapped disk is made part of a Btrfs RAID1 array

When attempting to boot the system after disconnecting one of the drives
(in the BIOS or by unplugging its cable) to simulate a complete disk
failure, GRUB hangs, prompting for the LUKS password of the missing
drive and (unsurprisingly) failing to open it.

This prevents booting a degraded, LUKS-encrypted Btrfs RAID1 array on
Guix System.

Maxim
Maxim Cournoyer wrote on 7 Aug 2021 07:06
Hello,

maxim.cournoyer@gmail.com writes:

> On a system where:
>
> 1) Each disk comprising the array is fully LUKS encrypted
> 2) Each mapped disk is made part of a Btrfs RAID1 array
>
> When attempting to boot the system after pulling out (in BIOS or using
> the cable) the drive to simulate a complete disk failure, GRUB hangs,
> prompting for the LUKS password of the disappeared drive and
> (unsurprisingly) failing to open it.
>
> This prevents booting in a degraded LUKS encrypted, Btrfs RAID1 on Guix
> System.

I retested this today, and the problem still occurs. Here's a
screenshot from the failed boot (GRUB):
Ideally, GRUB (or is it our boot script?) should be smart enough to
realize that oh, that's Btrfs RAID1, it ought to work in degraded mode,
so let's keep going.

Thanks,

Maxim
Giovanni Biscuolo wrote on 11 Aug 2021 16:45
Hello Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

[...]

>> On a system where:
>>
>> 1) Each disk comprising the array is fully LUKS encrypted
>> 2) Each mapped disk is made part of a Btrfs RAID1 array
>>
>> When attempting to boot the system after pulling out (in BIOS or using
>> the cable) the drive to simulate a complete disk failure, GRUB hangs,
>> prompting for the LUKS password of the disappeared drive and
>> (unsurprisingly) failing to open it.

[...]

> Ideally, GRUB (or is it our boot script?)

Since the end result is that your system entered "grub rescue" mode,
AFAIU it's a GRUB issue.

> should be smart enough to realize that oh, that's Btrfs RAID1, it
> ought to work in degraded mode, so let's keep going.

I (still) don't have a Guix System to test your setup and (try to) patch
things up, so we need more info to debug the situation.

Can you please provide the output of the "ls" command and the "set"
command from the grub rescue shell?

Also, what is your /proc/cmdline (when Linux correctly boots)?

Best regards, Gio

--
Giovanni Biscuolo

Xelera IT Infrastructures

Maxim Cournoyer wrote on 12 Aug 2021 04:25
Hello Giovanni,

Giovanni Biscuolo <g@xelera.eu> writes:

> Hello Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
> [...]
>
>>> On a system where:
>>>
>>> 1) Each disk comprising the array is fully LUKS encrypted
>>> 2) Each mapped disk is made part of a Btrfs RAID1 array
>>>
>>> When attempting to boot the system after pulling out (in BIOS or using
>>> the cable) the drive to simulate a complete disk failure, GRUB hangs,
>>> prompting for the LUKS password of the disappeared drive and
>>> (unsurprisingly) failing to open it.
>
> [...]
>
>> Ideally, GRUB (or is it our boot script?)
>
> Since the end result is your system entered "grub rescue" mode AFAIU
> it's a GRUB issue

Yeah, it looks like it. The grub.cfg file only has basic things in it,
nothing that could explain the failure.

>> should be smart enough to realize that oh, that's Btrfs RAID1, it
>> ought to work in degraded mode, so let's keep going.
>
> I (still) don't have a Guix System to test your setup and (try to) patch
> things up, so we need more info to debug the situation.

I believe this is the basic recipe to reproduce it:

1. Partition two drives like so (GPT with 2MiB BIOS boot):

$ sudo sfdisk -l /dev/sda
Disk /dev/sda: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC WD1002FAEX-0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: B5BB7BA4-23A3-4E7C-87BB-8339B02C5905

Device    Start        End    Sectors   Size Type
/dev/sda1  2048       6143       4096     2M BIOS boot
/dev/sda2  6144 1953523711 1953517568 931.5G Linux filesystem

$ sudo sfdisk -l /dev/sdb
Disk /dev/sdb: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC WD1002FAEX-0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 45C58C18-7B39-A745-B22F-6A2321FB1999

Device    Start        End    Sectors   Size Type
/dev/sdb1  2048       6143       4096     2M BIOS boot
/dev/sdb2  6144 1953523711 1953517568 931.5G Linux filesystem

2. LUKS-encrypt the whole second (main) partition of each drive.

3. Format the mapped devices as Btrfs RAID1.

4. Reconfigure a Guix system on top of that (see the config snippet below).

5. Disconnect one of the two drives and reboot.

6. Contemplate the failure to get past GRUB.
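A minimal bash sketch of steps 1 to 3 of the recipe above. The disk, partition and mapping names are taken from this report; the sfdisk script, the choice of `--type luks1` (GRUB's LUKS2 support being limited) and the `my-root` label are assumptions. By default it only prints the commands, since running them destroys any data on the disks.

```shell
#!/usr/bin/env bash
# Dry-run sketch of steps 1-3 of the reproduction recipe.
# Assumptions: names mirror this report; LUKS1 is used because GRUB's
# LUKS2 support is limited. Commands are only printed unless DRY_RUN=0,
# and they DESTROY existing data when actually run.

DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: sfdisk script for a GPT label with a 2 MiB (4096-sector)
# BIOS boot partition followed by one Linux partition using the rest.
# The GUIDs are the standard "BIOS boot" and "Linux filesystem" types.
partition_layout() {
  cat <<'EOF'
label: gpt
start=2048, size=4096, type=21686148-6449-6E6F-744E-656564454649
start=6144, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4
EOF
}

# Steps 2 and 3: setup_array LABEL PARTITION:MAPPING...
setup_array() {
  local label=$1
  shift
  local mapped=() spec
  for spec in "$@"; do
    run cryptsetup luksFormat --type luks1 "${spec%%:*}"
    run cryptsetup open "${spec%%:*}" "${spec#*:}"
    mapped+=("/dev/mapper/${spec#*:}")
  done
  # Data (-d) and metadata (-m) both use the raid1 profile.
  run mkfs.btrfs -L "$label" -m raid1 -d raid1 "${mapped[@]}"
}

for disk in /dev/sda /dev/sdb; do
  partition_layout | run sfdisk "$disk"
done
setup_array my-root /dev/sda2:cryptroot /dev/sdb2:cryptroot-mirror
```

Running it with `DRY_RUN=0` would execute the commands; cryptsetup then asks for a confirmation and a passphrase for each partition.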

> Can you please provide the output of the "ls" command and the "set"
> command from the grub rescue shell?

I'll post after rebooting.

> Also, please what is your /proc/cmdline (when Linux correctly boots)?

BOOT_IMAGE=/@root/gnu/store/1c0dkkkv5vdnyp73gvcl9k1kym5jjm54-linux-libre-5.13.8/bzImage
--root=/dev/mapper/cryptroot
--system=/gnu/store/815481yf1kfacwgkh4aa11rlb3lm6gvi-system
--load=/gnu/store/815481yf1kfacwgkh4aa11rlb3lm6gvi-system/boot quiet
snd_hda_intel.dmic_detect=0 modprobe.blacklist=rtl8187

The system config relevant sections are:

(operating-system
  (host-name "hurd")
  (timezone "America/Montreal")
  (keyboard-layout (keyboard-layout "dvorak"))
  (bootloader (bootloader-configuration
                (bootloader grub-bootloader)
                (target "/dev/sda")
                (terminal-outputs '(console))
                (keyboard-layout keyboard-layout)))
  (kernel-arguments '("quiet" "snd_hda_intel.dmic_detect=0"
                      "modprobe.blacklist=rtl8187"))
  (mapped-devices
   (list (mapped-device
          (source "/dev/sda2")
          (target "cryptroot")
          (type luks-device-mapping))
         (mapped-device
          (source "/dev/sdb2")
          (target "cryptroot-mirror")
          (type luks-device-mapping))
         (mapped-device
          (source "/dev/sdc2")
          (target "cryptroot-mirror2")
          (type luks-device-mapping))))

  ;; Note: Using any of the LUKS encrypted drives exposed under
  ;; /dev/mapper is enough to reference the Btrfs RAID-1 array,
  ;; since the 'btrfs device scan' command is executed in the init
  ;; RAM disk and takes care of assembling the array.
  (file-systems (cons* (file-system
                         (mount-point "/")
                         (device "/dev/mapper/cryptroot")
                         (type "btrfs")
                         (options (alist->file-system-options
                                   (cons '("subvol" . "@root")
                                         %common-btrfs-options)))
                         (dependencies mapped-devices))
                       (file-system
                         (device "/dev/mapper/cryptroot")
                         (mount-point "/home")
                         (type "btrfs")
                         (options (alist->file-system-options
                                   (cons '("subvol" . "@home")
                                         %common-btrfs-options)))
                         (dependencies mapped-devices))
                       (file-system
                         (device "/dev/mapper/cryptroot")
                         (mount-point "/data")
                         (type "btrfs")
                         (options (alist->file-system-options
                                   (cons '("subvol" . "@data")
                                         %common-btrfs-options)))
                         (dependencies mapped-devices))
                       %base-file-systems))
[...]
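The note in the config relies on `btrfs device scan` to assemble the array. When a member is missing, Btrfs additionally refuses a normal mount unless the `degraded` mount option is given. The following is a hypothetical manual sequence (for example from a rescue shell; it is not taken from the Guix initrd), printing the commands by default. The device and mapping names come from the config above; `mount_degraded` and the `/mnt` mount point are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical manual sequence for mounting the array when one member
# is missing. Not taken from Guix's initrd. Commands are only printed
# unless DRY_RUN=0.

DRY_RUN=${DRY_RUN:-1}

run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# mount_degraded MOUNT_POINT SUBVOLUME PARTITION:MAPPING...
# List only the members that are still present.
mount_degraded() {
  local mount_point=$1 subvol=$2
  shift 2
  local spec first=""
  for spec in "$@"; do
    run cryptsetup open "${spec%%:*}" "${spec#*:}"
    if [[ -z "$first" ]]; then
      first="/dev/mapper/${spec#*:}"
    fi
  done
  # Register the Btrfs member devices that are currently available.
  run btrfs device scan
  # Btrfs refuses to mount with a missing device unless 'degraded' is set.
  run mount -o "degraded,subvol=$subvol" "$first" "$mount_point"
}

# Example: /dev/sdb is gone; only /dev/sda2 and /dev/sdc2 remain.
mount_degraded /mnt @root /dev/sda2:cryptroot /dev/sdc2:cryptroot-mirror2
```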

Thanks,

Maxim
Giovanni Biscuolo wrote on 13 Aug 2021 17:05
Hi Maxim,

I'll try to "debug" the issue by comparing my Debian system config with
yours, since I'm also using a Btrfs RAID1 file system on LUKS.

I have not yet unplugged one of the two disks on mine to simulate a
drive failure; Soon™ I'd like to test this condition... but it's a
busy machine, so I don't know when.

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

[...]

>>> Ideally, GRUB (or is it our boot script?)
>>
>> Since the end result is your system entered "grub rescue" mode AFAIU
>> it's a GRUB issue
>
> Yeah, it looks like it. The grub.cfg file only has basic things in it,
> nothing that could explain the failure.

Could you please also provide the output of "lsblk -f"?

This is (part of) my disks layout:

sdc
├─sdc1
├─sdc2 vfat F6D8-67E3 470.8M 1% /boot/efi
├─sdc3 crypto_L e554b806-19ac-48b2-b521-b4e89839a756
│ └─crypt_swap01
│   swap a43ce70c-dd35-47d8-a2ef-ef9d3c6d0885 [SWAP]
└─sdc4 crypto_L 820bfdf7-46f7-46f5-8536-7e1b0f04e70e
  └─crypt_btrfs01_03
    btrfs btrfs_pool01 82afe97a-bb97-4b3d-90cb-93a058185b97
sdd
├─sdd1
├─sdd2
├─sdd3 crypto_L 960aa919-182b-4604-a8be-8477c86386cc
│ └─crypt_swap02
│   swap 3f8f6974-05a9-4047-993a-c4ccb27eaa1d [SWAP]
└─sdd4 crypto_L c590c62e-6ac8-418c-9ea7-7ae9c79058c8
  └─crypt_btrfs01_04
    btrfs btrfs_pool01 82afe97a-bb97-4b3d-90cb-93a058185b97 802.3G 57% /mnt/btrfs


btrfs_pool01 is my Btrfs RAID1 file system; it includes /boot and /
(root) and sits on two encrypted LUKS partitions, as you can see.
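Each member of a multi-device Btrfs file system reports the same file system UUID, as the two crypt_btrfs01_* entries above show. A small sketch, assuming the raw output format of `lsblk -rno NAME,FSTYPE,UUID`, that lists those members; `btrfs_members` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the devices that carry a given Btrfs file system UUID.
# Input (stdin): the output of 'lsblk -rno NAME,FSTYPE,UUID'
# (raw format, no headings).
btrfs_members() {
  local uuid=$1
  # Column 2 is FSTYPE, column 3 is the file system UUID.
  awk -v uuid="$uuid" '$2 == "btrfs" && $3 == uuid { print $1 }'
}

# Usage on a live system:
#   lsblk -rno NAME,FSTYPE,UUID | btrfs_members 82afe97a-bb97-4b3d-90cb-93a058185b97
```

Rows with an empty FSTYPE can shift columns in raw mode, but that does not affect the btrfs rows this filter looks for.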

Also, what's your grub.cfg?

This is the config of a menuentry of mine:

menuentry 'Debian GNU/Linux' --class debian --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-82afe97a-bb97-4b3d-90cb-93a058185b97' {
    load_video
    insmod gzio
    if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
    insmod part_gpt
    insmod cryptodisk
    insmod luks
    insmod gcry_rijndael
    insmod gcry_rijndael
    insmod gcry_sha256
    insmod btrfs
    cryptomount -u c590c62e6ac8418c9ea77ae9c79058c8
    set root='cryptouuid/c590c62e6ac8418c9ea77ae9c79058c8'
    if [ x$feature_platform_search_hint = xy ]; then
      search --no-floppy --fs-uuid --set=root --hint='cryptouuid/c590c62e6ac8418c9ea77ae9c79058c8' 82afe97a-bb97-4b3d-90cb-93a058185b97
    else
      search --no-floppy --fs-uuid --set=root 82afe97a-bb97-4b3d-90cb-93a058185b97
    fi
    echo 'Loading Linux 5.10.0-0.bpo.3-amd64 ...'
    linux /debian_root/boot/vmlinuz-5.10.0-0.bpo.3-amd64 root=UUID=82afe97a-bb97-4b3d-90cb-93a058185b97 ro rootflags=subvol=debian_root ip=10.38.2.2::10.38.2.1:255.255.255.0:anemone:eth0:none quiet
    echo 'Loading initial ramdisk ...'
    initrd /debian_root/boot/initrd.img-5.10.0-0.bpo.3-amd64
}


AFAIU this code (from the snippet above):

if [ x$feature_platform_search_hint = xy ]; then
  search --no-floppy --fs-uuid --set=root --hint='cryptouuid/c590c62e6ac8418c9ea77ae9c79058c8' 82afe97a-bb97-4b3d-90cb-93a058185b97
else
  search --no-floppy --fs-uuid --set=root 82afe97a-bb97-4b3d-90cb-93a058185b97
fi


sets [1] the 'root' GRUB environment variable to the first device found
containing the UUID 82afe97a-bb97-4b3d-90cb-93a058185b97, which is the
UUID of my Btrfs file system.

AFAIU (but still not tested) this means that if the device with UUID
c590c62[...] is missing, the search ensures that GRUB will find the next
device containing the Btrfs file system identified by UUID 82afe97a[...].

WDYT?
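The reasoning above can be illustrated with a simplified bash model of `search --fs-uuid --set=root --hint=...`. It is not GRUB's implementation: it only captures "try the hinted device first, then the first other device with that file system UUID", and it assumes the surviving LUKS container has already been unlocked, since encrypted devices only become visible to `search` after `cryptomount`. The function name and the NAME=FSUUID argument format are made up for illustration.

```shell
#!/usr/bin/env bash
# Simplified model (NOT GRUB's implementation) of
#   search --fs-uuid --set=root --hint=HINT FSUUID
# Each visible device is passed as NAME=FSUUID. Prints the chosen device
# name, or fails when no visible device carries FSUUID.
grub_search_model() {
  local fsuuid=$1 hint=$2
  shift 2
  local pair
  # The hinted device is tried first, if it is visible and matches.
  for pair in "$@"; do
    if [[ "$pair" == "$hint=$fsuuid" ]]; then
      echo "$hint"
      return 0
    fi
  done
  # Otherwise, the first other visible device with that UUID wins.
  for pair in "$@"; do
    if [[ "${pair#*=}" == "$fsuuid" ]]; then
      echo "${pair%%=*}"
      return 0
    fi
  done
  echo "error: no such device: $fsuuid" >&2
  return 1
}
```

With sdd4 (cryptouuid/c590c62e...) gone and only sdc4's container unlocked, the model picks cryptouuid/820bfdf746f746f585367e1b0f04e70e. Whether GRUB carries on after the failing `cryptomount -u c590c62e...` line that precedes the search is not modeled here.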


[...]

>> Can you please provide the output of the "ls" command and the "set"
>> command from the grub rescue shell?
>
> I'll post after rebooting.

OK thanks.

>> Also, please what is your /proc/cmdline (when Linux correctly boots)?
>
> --8<---------------cut here---------------start------------->8---
> BOOT_IMAGE=/@root/gnu/store/1c0dkkkv5vdnyp73gvcl9k1kym5jjm54-linux-libre-5.13.8/bzImage
> --root=/dev/mapper/cryptroot
> --system=/gnu/store/815481yf1kfacwgkh4aa11rlb3lm6gvi-system
> --load=/gnu/store/815481yf1kfacwgkh4aa11rlb3lm6gvi-system/boot quiet
> snd_hda_intel.dmic_detect=0 modprobe.blacklist=rtl8187
> --8<---------------cut here---------------end--------------->8---

This is mine (derived from the GRUB menu entry shown above):

BOOT_IMAGE=/debian_root/boot/vmlinuz-5.10.0-0.bpo.3-amd64 root=UUID=82afe97a-bb97-4b3d-90cb-93a058185b97 ro rootflags=subvol=debian_root ip=10.38.2.2::10.38.2.1:255.255.255.0:anemone:eth0:none quiet


AFAIU using "root=UUID=..." is more robust than using the (possibly
missing) device mapper path.
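To make the difference concrete: in the Guix config above, /dev/mapper/cryptroot is the mapping of /dev/sda2 only, so that name exists only if that particular LUKS container was opened, whereas root=UUID=... names the file system itself. A hypothetical bash helper (`root_arg_kind` is an illustrative name) that reports which form a kernel command line uses:

```shell
#!/usr/bin/env bash
# Report how a kernel command line designates the root file system.
# Handles 'root=' (as in the Debian entry) and '--root=' (as in the Guix
# command lines in this thread). Words are split on whitespace only.
root_arg_kind() {
  local -a words
  local word value=""
  read -ra words <<< "$1"
  for word in "${words[@]}"; do
    case "$word" in
      root=*|--root=*) value=${word#*root=} ;;
    esac
  done
  case "$value" in
    "") echo "none" ;;
    UUID=*) echo "uuid ${value#UUID=}" ;;
    /dev/mapper/*) echo "device-mapper ${value}" ;;
    *) echo "other ${value}" ;;
  esac
}

# Usage: root_arg_kind "$(cat /proc/cmdline)"
```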

[...]

Hope this helps.

Best regards, Gio'

--
Giovanni Biscuolo

Xelera IT Infrastructures

Maxim Cournoyer wrote on 29 Aug 2021 08:15
Hello Giovanni!

I've finally rebooted the machine, so I am sharing the requested
information:

Giovanni Biscuolo <g@xelera.eu> writes:

> Hi Maxim,
>
> I'd "debug" the issue trying to compare my Debian system config with
> yours since I'm also using a BTRFS RAID1 filesystem on LUKS.

Sounds useful!

[...]

> Please could you also provide the result of "lsblk -f"?

NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
sda
├─sda1
└─sda2 crypto_LU 0792432c-78d8-4dcc-87c5-30200c3d02db
  └─cryptroot btrfs my-root 2e97fbbd-fa4e-4858-948b-b3a89278a39b 201.2G 77% /var/lib/dock
sdb
├─sdb1
└─sdb2 crypto_LU a9aead40-9d01-4f7a-bb83-be70dd192b7b
  └─cryptroot-mirror
    btrfs my-root 2e97fbbd-fa4e-4858-948b-b3a89278a39b
sdc
├─sdc1
└─sdc2 crypto_LU f0afd5c9-da70-46a7-9c6f-5d22913638bf
  └─cryptroot-mirror2
    btrfs my-root 2e97fbbd-fa4e-4858-948b-b3a89278a39b
sdd crypto_LU f04928db-90aa-458c-8908-036a620b74f6
└─luks-f04928db-90aa-458c-8908-036a620b74f6
  btrfs Seagate2TB 231e9e86-e841-4c97-81f1-013a2b8d99c2 1.6T 12% /media/maxim/
sr0
sr1
zram0 swap 76423fb7-9d60-47fc-b64c-313f0a7b1f55 [SWAP]
The Btrfs file system in my case is labeled 'my-root' and composed of 3
drives in a raid1c3 btrfs array (3 copies). @root is a subvolume on
which the root file system lives.
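For context, the Btrfs profiles differ in how many missing devices they survive. A sketch encoding the values from the Btrfs profile documentation (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# How many devices a Btrfs block-group profile can lose while every block
# keeps at least one intact copy (per the Btrfs profile documentation).
tolerated_failures() {
  case "$1" in
    single|dup|raid0) echo 0 ;;
    raid1|raid10|raid5) echo 1 ;;
    raid1c3|raid6) echo 2 ;;
    raid1c4) echo 3 ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

echo "raid1c3 tolerates $(tolerated_failures raid1c3) missing device(s)"
```

With raid1c3 on three drives, as here, losing any single drive still leaves two copies of every block, so the file system itself should remain mountable in degraded mode.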

> This is (part of) my disks layout:
>
>
> sdc
> ├─sdc1
> ├─sdc2 vfat F6D8-67E3 470.8M 1% /boot/efi
> ├─sdc3 crypto_L e554b806-19ac-48b2-b521-b4e89839a756
> │ └─crypt_swap01
> │   swap a43ce70c-dd35-47d8-a2ef-ef9d3c6d0885 [SWAP]
> └─sdc4 crypto_L 820bfdf7-46f7-46f5-8536-7e1b0f04e70e
>   └─crypt_btrfs01_03
>     btrfs btrfs_pool01 82afe97a-bb97-4b3d-90cb-93a058185b97
> sdd
> ├─sdd1
> ├─sdd2
> ├─sdd3 crypto_L 960aa919-182b-4604-a8be-8477c86386cc
> │ └─crypt_swap02
> │   swap 3f8f6974-05a9-4047-993a-c4ccb27eaa1d [SWAP]
> └─sdd4 crypto_L c590c62e-6ac8-418c-9ea7-7ae9c79058c8
>   └─crypt_btrfs01_04
>     btrfs btrfs_pool01 82afe97a-bb97-4b3d-90cb-93a058185b97 802.3G 57% /mnt/btrfs
>
>
> btrfs_pool01 is my BTRFS RAID1 filesystem, it includes /boot and /
> (root) and is on two encrypted LUKS partitions, as you can see.

> Also, please what's your grub.cfg?

Here it is:

# This file was generated from your Guix configuration. Any changes
# will be lost upon reconfiguration.

# Set 'root' to the partition that contains /gnu/store.
search --file --set /@root/gnu/store/wlf9ccsl9pmch1dyv5x8c2gdngwn9m5i-grub-image.png


terminal_output console


insmod png
if background_image /@root/gnu/store/wlf9ccsl9pmch1dyv5x8c2gdngwn9m5i-grub-image.png; then
  set color_normal=light-gray/black
  set color_highlight=yellow/black
else
  set menu_color_normal=cyan/blue
  set menu_color_highlight=white/blue
fi
# Localization configuration.
# search --file --set /@root/gnu/store/q1cf63j2az4wlajg0caqy4nbndp0mvpm-grub-locales/en@quot.mo
set locale_dir=/@root/gnu/store/q1cf63j2az4wlajg0caqy4nbndp0mvpm-grub-locales
set lang=en_US
insmod keylayouts
keymap /@root/gnu/store/25s8pbpv2fnidrgir26mn97g0ciq52gz-grub-keymap.dvorak

set default=0
set timeout=5
menuentry "GNU with Linux-Libre 5.13.12" {
  search --file --set /@root/gnu/store/hvmyb8maz32dy6ra5g68gr4wd08pzq3r-linux-libre-5.13.12/bzImage
  linux /@root/gnu/store/hvmyb8maz32dy6ra5g68gr4wd08pzq3r-linux-libre-5.13.12/bzImage --root=/dev/mapper/cryptroot --system=/gnu/store/6qa5ga0pkjbmz8ix8gfrpy65zkl16xi7-system --load=/gnu/store/6qa5ga0pkjbmz8ix8gfrpy65zkl16xi7-system/boot quiet snd_hda_intel.dmic_detect=0 modprobe.blacklist=rtl8187
  initrd /@root/gnu/store/kllyldndnazfxxrhkabgifx5zvgyz82q-raw-initrd/initrd.cpio.gz
}

submenu "GNU system, old configurations..." {
  menuentry "GNU with Linux-Libre 5.13.11 (#275, 2021-08-23 23:17)" {
    search --file --set /@root/gnu/store/fznnj7bgs46czizzhn186606jgr52qnp-linux-libre-5.13.11/bzImage
    linux /@root/gnu/store/fznnj7bgs46czizzhn186606jgr52qnp-linux-libre-5.13.11/bzImage --root=/dev/mapper/cryptroot --system=/var/guix/profiles/system-275-link --load=/var/guix/profiles/system-275-link/boot quiet snd_hda_intel.dmic_detect=0 modprobe.blacklist=rtl8187
    initrd /@root/gnu/store/g73vj8qy6kfrgmr8gnmmzh2q59cbnf2w-raw-initrd/initrd.cpio.gz
  }

[...]

if [ "${grub_platform}" == efi ]; then
  menuentry "Firmware setup" {
    fwsetup
  }
fi


> This is the config of a menuentry of mine:
>
>
> menuentry 'Debian GNU/Linux' --class debian --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-82afe97a-bb97-4b3d-90cb-93a058185b97' {
> load_video
> insmod gzio
> if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
> insmod part_gpt
> insmod cryptodisk
> insmod luks
> insmod gcry_rijndael
> insmod gcry_rijndael
> insmod gcry_sha256
> insmod btrfs
> cryptomount -u c590c62e6ac8418c9ea77ae9c79058c8
> set root='cryptouuid/c590c62e6ac8418c9ea77ae9c79058c8'
> if [ x$feature_platform_search_hint = xy ]; then
> search --no-floppy --fs-uuid --set=root --hint='cryptouuid/c590c62e6ac8418c9ea77ae9c79058c8' 82afe97a-bb97-4b3d-90cb-93a058185b97
> else
> search --no-floppy --fs-uuid --set=root 82afe97a-bb97-4b3d-90cb-93a058185b97
> fi
> echo 'Loading Linux 5.10.0-0.bpo.3-amd64 ...'
> linux /debian_root/boot/vmlinuz-5.10.0-0.bpo.3-amd64 root=UUID=82afe97a-bb97-4b3d-90cb-93a058185b97 ro rootflags=subvol=debian_root ip=10.38.2.2::10.38.2.1:255.255.255.0:anemone:eth0:none quiet
> echo 'Loading initial ramdisk ...'
> initrd /debian_root/boot/initrd.img-5.10.0-0.bpo.3-amd64
> }
>
>
> AFAIU this code (from the snippet above):
>
>
> if [ x$feature_platform_search_hint = xy ]; then
> search --no-floppy --fs-uuid --set=root --hint='cryptouuid/c590c62e6ac8418c9ea77ae9c79058c8' 82afe97a-bb97-4b3d-90cb-93a058185b97
> else
> search --no-floppy --fs-uuid --set=root 82afe97a-bb97-4b3d-90cb-93a058185b97
> fi
>
>
> sets [1] the root GRUB env variable to the first found device containing
> the UUID 82afe97a-bb97-4b3d-90cb-93a058185b97, that is the UUID of my
> BTRFS filesystem
>
> AFAIU (but still not tested) this means that if the device with UUID
> c590c62[...] is missing the search ensures that GRUB will find the next
> device containing the BTRFS filesystem identified by UUID 82afe97a[...]
>
> WDYT?

[...]


>>> Can you please provide the output of the "ls" command and the "set"
>>> command from the grub rescue shell?

See the attached screenshot of the result:
I then tried messing around in GRUB to edit the prefix, cmdline and
root values and ran `insmod normal` and `normal` to proceed with the
boot, but the init RAM disk then failed like so:
So there is more than one thing to be adjusted :-).

Thank you, I'll look at the data with a fresh head later, but it seems
to me that we'd need GRUB fallback logic for the root devices when RAID
setups are detected. I'll read up on what GRUB has in store for this
kind of thing when I have a chance.

Your Debian GRUB config also has me wondering about the 'btrfs' module
and the others that Guix System is not using.
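One way to list the modules the Debian entry loads but the Guix grub.cfg does not, as a sketch using grep, sort and comm (the function names and file names are hypothetical). Note that grub-install may embed modules such as cryptodisk, luks or btrfs in GRUB's core image, so their absence from grub.cfg does not by itself mean Guix System lacks them.

```shell
#!/usr/bin/env bash
# Print the GRUB modules loaded with 'insmod' in grub.cfg file $1 but not
# in grub.cfg file $2. Caveat: grub-install can embed modules in GRUB's
# core image, so a module missing from a grub.cfg may still be available.

# All module names passed to 'insmod' in a file, sorted and deduplicated.
insmod_modules() {
  grep -o 'insmod [A-Za-z0-9_]*' "$1" | awk '{ print $2 }' | LC_ALL=C sort -u
}

# Modules present in the first file only (comm needs sorted input).
modules_only_in() {
  LC_ALL=C comm -23 <(insmod_modules "$1") <(insmod_modules "$2")
}

# Usage (hypothetical file names):
#   modules_only_in debian-grub.cfg guix-grub.cfg
```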

Thank you!

Maxim
Maxim Cournoyer wrote on 5 Mar 2022 04:33
Hi,

I'm writing here because I just found a much easier way to trigger this
than opening the case of my desktop and pulling out a drive. It uses
this QEMU script:

#!/usr/bin/env bash

devices=(sda sdb sdc)
args=(-enable-kvm -snapshot -m 2G)

i=0
for d in "${devices[@]}"; do
  args+=(-drive file=/dev/$d,index=$i,media=disk)
  let i++
done

qemu-system-x86_64 "${args[@]}" "$@"

This attempts to boot the drives of the *live* system in QEMU; don't
fret, it's not dangerous, as the '-snapshot' option ensures no actual
writes reach the drives. It seems to fail at the mount command in our
initrd, but it at least allows testing GRUB easily.

With the above script and my Btrfs RAID1C3 array on drives /dev/sda,
/dev/sdb and /dev/sdc, after removing 'sdb' from the devices list, for
example, I get:

Booting from Hard Disk...
GRUB loading...
Welcome to GRUB!

Attempting to decrypt master key...
Enter passphrase for hd0,gpt2 (0792432c78d84dcc87c530200c3d02db):
Slot 0 opened
error: failure reading sector 0x0 from `fd0'.
error: no such cryptodisk found.
Attempting to decrypt master key...
Enter passphrase for hd1,gpt2 (f0afd5c9da7046a79c6f5d22913638bf):
Slot 0 opened
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.
error: failure reading sector 0x80 from `fd0'.

Dropping just sdc instead, I get:

Booting from Hard Disk...
GRUB loading...
Welcome to GRUB!

Attempting to decrypt master key...
Enter passphrase for hd0,gpt2 (0792432c78d84dcc87c530200c3d02db):
Slot 0 opened
Attempting to decrypt master key...
Enter passphrase for hd1,gpt2 (a9aead409d014f7abb83be70dd192b7b):
Slot 0 opened
error: failure reading sector 0x0 from `fd0'.
error: no such cryptodisk found.
error: failure reading sector 0x80 from `fd0'.
error: unknown filesystem.
Entering rescue mode...

This should make a future fix cheaper to try (but a system test will be
best anyway :-)).
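The drive removal described above can also be scripted. A sketch of a variant of the QEMU script that takes the drive to leave out as an argument; the function name, the DEVICES variable, the dry-run switch and `format=raw` (which silences QEMU's warning about guessing the format of raw devices) are additions, not part of the original script.

```shell
#!/usr/bin/env bash
# Variant of the QEMU script above that leaves out one drive to simulate
# its failure. Only prints the command unless DRY_RUN=0.

DRY_RUN=${DRY_RUN:-1}
DEVICES=(sda sdb sdc)

qemu_without() {
  local missing=$1
  shift
  local args=(-enable-kvm -snapshot -m 2G)
  local i=0 d
  for d in "${DEVICES[@]}"; do
    [[ "$d" == "$missing" ]] && continue
    # Indexes stay contiguous, so the remaining drives shift down.
    args+=(-drive "file=/dev/$d,index=$i,media=disk,format=raw")
    i=$((i + 1))
  done
  if [[ "$DRY_RUN" == 1 ]]; then
    echo qemu-system-x86_64 "${args[@]}" "$@"
  else
    qemu-system-x86_64 "${args[@]}" "$@"
  fi
}

# Usage: ./script.sh sdc   (defaults to leaving out sdb)
qemu_without "${1:-sdb}" "${@:2}"
```

As in the original script, the indexes stay contiguous, so with sdb left out, sdc becomes index 1; this matches the first log above, where sdc's container (f0afd5c9...) shows up as hd1.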

Thanks,

Maxim
Maxim Cournoyer wrote on 27 Mar 2022 06:07
Hi,

maxim.cournoyer@gmail.com writes:

> On a system where:
>
> 1) Each disk comprising the array is fully LUKS encrypted
> 2) Each mapped disk is made part of a Btrfs RAID1 array
>
> When attempting to boot the system after pulling out (in BIOS or using
> the cable) the drive to simulate a complete disk failure, GRUB hangs,
> prompting for the LUKS password of the disappeared drive and
> (unsurprisingly) failing to open it.
>
> This prevents booting in a degraded LUKS encrypted, Btrfs RAID1 on Guix
> System.

It seems this problem also affects other (non-Btrfs) software RAID
setups, such as mdadm. There was recently a fix for it in Ubuntu [0],
which can probably provide cues on how to fix it in Guix System.
