"basic" system tests fail (and all the other ones) on guix master

  • Done
Details
7 participants
  • Danny Milosavljevic
  • Oleg Pykhalov
  • Leo Famulari
  • Ludovic Courtès
  • Marius Bakke
  • Mark H Weaver
  • Mathieu Othacehe
Owner
unassigned
Submitted by
Danny Milosavljevic
Severity
normal

Danny Milosavljevic wrote 4 years ago
(address . bug-guix@gnu.org)
20200911195058.6dc013b4@scratchpost.org
Hi,

as of guix master commit 0fb974be9c3e1e22a2145c9c602c44cd10cef2b0 all system
tests, including "basic", fail:

$ guix environment --pure guix --ad-hoc git guile-readline guile-json nano guile-zlib guile-lzlib
(env)$ make TESTS=basic check-system
loading '/gnu/store/s3limrgxj4pd6b4psra66phary2nmqx4-linux-vm-loader'...
[ 2.776050] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
environment variable `PATH' set to `/gnu/store/j3jlpncfqvykkq6sx7h4ly1rdcr2a8qq'
creating partition table with 2 partitions (20.0 MiB, 40.0 MiB)...
Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, .
Backtrace:
3 (primitive-load "/gnu/store/liacrhaiab35pcfzdqsa71fvp7f?")
In ./gnu/build/vm.scm:
470:21 2 (initialize-hard-disk "/dev/vda" #:bootloader-package _ ?)
300:2 1 (initialize-partition-table "/dev/vda" (#<<partitio?

In ./guix/build/utils.scm:
654:6 0 (invoke _ . _)

./guix/build/utils.scm:654:6: In procedure invoke:
ERROR:
1. &invoke-error:
program: "parted"
arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)
exit-status: 1
term-signal: #f
stop-signal: #f
[ 3.446031] Unregister pv shared memory for cpu 0
[ 3.447989] reboot: Restarting system
[ 3.449167] reboot: machine restart
Backtrace:
1 (primitive-load "/gnu/store/1xp94hddh88szwkbqyb49jirmf5?")
In ./gnu/build/vm.scm:
198:12 0 (load-in-linux-vm _ #:output _ #:qemu _ #:memory-size _ ?)

./gnu/build/vm.scm:198:12: In procedure load-in-linux-vm:
guest VM code exited with a non-zero status 256
note: keeping build directory `/tmp/guix-build-qemu-image.drv-0'
builder for `/gnu/store/ks5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-image.drv' failed1
build of /gnu/store/ks5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-image.drv failed
View build log at '/var/log/guix/drvs/ks/5w1bwcm2rg1fq5s9hc78grxpnm0wpf-qemu-im.
cannot build derivation `/gnu/store/8vhwchcxlns64nnqr13p0liw745s6pp8-run-vm.sh.t
cannot build derivation `/gnu/store/672vqgm8mx0a6k2iwsf13q73gmk5vdc4-basic.drv't
guix build: error: build of `/gnu/store/672vqgm8mx0a6k2iwsf13q73gmk5vdc4-basic.d
make: *** [Makefile:6007: check-system] Error 1
dannym@bayfront ~/src/guix-master/guix [env]$

The same happens on my laptop.

./pre-inst-env guix describe
Git checkout:
repository: /home/dannym/src/guix-master/guix
branch: master
commit: 0fb974be9c3e1e22a2145c9c602c44cd10cef2b0

(ssh://dannym@git.sv.gnu.org/srv/git/guix.git)

I've tried to add a call to wipefs before parted--it doesn't help.

Before the problem surfaced I saw a major qemu update.


Oleg Pykhalov wrote 4 years ago
merge 43344 43352
(name . control)(address . control@debbugs.gnu.org)
878sdfz2ic.fsf@gmail.com
merge 43344 43352
Oleg Pykhalov wrote 4 years ago
control message for bug #43344
(address . control@debbugs.gnu.org)
87y2lfgumt.fsf@gmail.com
merge 43344 43352
quit
Mathieu Othacehe wrote 4 years ago
Re: bug#43344: "basic" system tests fail (and all the other ones) on guix master
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)(address . 43344@debbugs.gnu.org)
877dsw7alf.fsf@gnu.org
Hello Danny,

> 1. &invoke-error:
> program: "parted"
> arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)

So it looks like the parted script failed inside the VM, while running
"initialize-partition-table".

This works fine both on the build farm and on my machine, which makes
debugging harder :(.

Anything special with your hardware? KVM support disabled maybe?

Thanks,

Mathieu
Danny Milosavljevic wrote 4 years ago
(name . Mathieu Othacehe)(address . othacehe@gnu.org)(address . 43344@debbugs.gnu.org)
20200914183037.624cc347@scratchpost.org
Hi Mathieu,

On Mon, 14 Sep 2020 15:26:52 +0200
Mathieu Othacehe <othacehe@gnu.org> wrote:

> Anything special with your hardware? KVM support disabled maybe?

The culprit had been the Linux kernel update to 5.8.8.

After downgrading to 5.8.7 it works just fine--no other changes done.

Previously, I had tried also to add wipefs -a before the parted--that
hadn't helped either.


Mark H Weaver wrote 4 years ago
87y2laactz.fsf@netris.org
Hi,

Danny Milosavljevic <dannym@scratchpost.org> writes:

> On Mon, 14 Sep 2020 15:26:52 +0200
> Mathieu Othacehe <othacehe@gnu.org> wrote:
>
>> Anything special with your hardware? KVM support disabled maybe?
>
> The culprit had been the Linux kernel update to 5.8.8.
>
> After downgrading to 5.8.7 it works just fine--no other changes done.
>
> Previously, I had tried also to add wipefs -a before the parted--that
> hadn't helped either.

That's useful information, but we should not stay frozen on the 5.8.7
kernel for much longer. 5.8.8 contains many bug fixes, some of which
might fix potentially exploitable flaws.

It would be useful to know if this problem still occurs with 5.8.9,
which has since come out. If so, we should do a bisection between 5.8.7
and 5.8.8 to find out which upstream commit introduced the problem.

Would anyone like to investigate this further?

Mark
Leo Famulari wrote 4 years ago
(name . Mark H Weaver)(address . mhw@netris.org)
20200915230628.GA20807@jasmine.lan
On Tue, Sep 15, 2020 at 06:34:21PM -0400, Mark H Weaver wrote:
> That's useful information, but we should not stay frozen on the 5.8.7
> kernel for much longer. 5.8.8 contains many bug fixes, some of which
> might fix potentially exploitable flaws.
>
> It would be useful to know if this problem still occurs with 5.8.9,
> which has since come out. If so, we should do a bisection between 5.8.7
> and 5.8.8 to find out which upstream commit introduced the problem.
>
> Would anyone like to investigate this further?

I will try to reproduce the bug with 5.8.9 now. I will try the bisection
if time permits.
Ludovic Courtès wrote 4 years ago
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
87pn6ln7q3.fsf@gnu.org
Hi,

Danny Milosavljevic <dannym@scratchpost.org> skribis:

> environment variable `PATH' set to `/gnu/store/j3jlpncfqvykkq6sx7h4ly1rdcr2a8qq'
> creating partition table with 2 partitions (20.0 MiB, 40.0 MiB)...
> Error: Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, .

[...]

> 1. &invoke-error:
> program: "parted"
> arguments: ("--script" "/dev/vda" "mklabel" "msdos" "mkpart" "primary" "e)
> exit-status: 1
> term-signal: #f
> stop-signal: #f

The code in question in Parted:

if (!add_partition (disk, part)) {
    ok[i - 1] = 0;
    errnums[i - 1] = errno;
}

[…]

char *bad_part_list = NULL;
/* now warn about any errors */
for (i = 1; i <= lpn; i++) {
    if (ok[i - 1] || errnums[i - 1] == ENXIO)
        continue;
    if (bad_part_list == NULL) {
        bad_part_list = malloc (lpn * 5);
        if (!bad_part_list)
            goto cleanup;
        bad_part_list[0] = 0;
    }
    sprintf (bad_part_list + strlen (bad_part_list), "%d, ", i);
}
if (bad_part_list == NULL)
    ret = 1;
else {
    bad_part_list[strlen (bad_part_list) - 2] = 0;
    if (ped_exception_throw (
            PED_EXCEPTION_ERROR,
            PED_EXCEPTION_IGNORE_CANCEL,
            _("Partition(s) %s on %s have been written, but we have "
              "been unable to inform the kernel of the change, "
              "probably because it/they are in use. As a result, "
              "the old partition(s) will remain in use. You "
              "should reboot now before making further changes."),
            bad_part_list, disk->dev->path) == PED_EXCEPTION_IGNORE)
        ret = 1;
    free (bad_part_list);
}

With the patch below, I strace’d ‘parted’, which gives:

$ make check-system TESTS=basic

[…]

ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=253, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=254, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=255, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
ioctl(3, BLKPG, {op=BLKPG_DEL_PARTITION, flags=0, datalen=152, data={start=0, length=0, pno=256, devname="", volname=""}}) = -1 ENOMEM (Cannot allocate memory)
write(2, "Error", 5Error) = 5
write(2, ": ", 2: ) = 2
write(2, "Partition(s) 1, 2, 3, 4, 5, 6, 7"..., 495Partition(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 on /dev/vda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.) = 495
write(2, "\n", 1
) = 1

So I threw more virtual RAM at it:
diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
@@ -446,6 +450,7 @@ system that is passed to 'populate-root-file-system'."
#:bootloader-installer
#+(bootloader-installer bootloader)))))))
#:system system
+ #:memory-size 1024
#:make-disk-image? #t
#:disk-image-size disk-image-size
#:disk-image-format disk-image-format
… but that doesn’t help.

Ideas?

Thanks,
Ludo’.
diff --git a/gnu/build/vm.scm b/gnu/build/vm.scm
index 287d099f79..e793b5b518 100644
--- a/gnu/build/vm.scm
+++ b/gnu/build/vm.scm
@@ -297,7 +297,7 @@ actual /dev name based on DEVICE."
partition-size)
partitions)
", "))
- (apply invoke "parted" "--script"
+ (apply invoke "strace" "parted" "--script"
device "mklabel" label-type
(options partitions offset))
diff --git a/gnu/packages/linux.scm b/gnu/packages/linux.scm
index 59ffb334e0..72fb3ca49d 100644
--- a/gnu/packages/linux.scm
+++ b/gnu/packages/linux.scm
@@ -349,15 +349,15 @@ corresponding UPSTREAM-SOURCE (an origin), using the given DEBLOB-SCRIPTS."
;; The current "stable" kernel. That is, the most recently released major
;; version.
-(define-public linux-libre-5.8-version "5.8.7")
+(define-public linux-libre-5.8-version "5.8.8")
(define deblob-scripts-5.8
(linux-libre-deblob-scripts
- linux-libre-5.8-version
+ "5.8.7"
(base32 "07z7sglyrfh0706icqqf3shadf638pvyid9386r661ds5lbsa2mw")
(base32 "0j6jba5fcddqlb42f95gjl78jisfla4nswqila074gglcrbnl9q7")))
(define-public linux-libre-5.8-pristine-source
(let ((version linux-libre-5.8-version)
- (hash (base32 "1zhpzlhl2ykna2nc70m72wlgyv1pkvkpfssb4k8p5pwlkh1ga2vv")))
+ (hash (base32 "0xm901zvvrwsb9k88la6pb65nybi43bygiyz1z68njwsx6ripxik")))
(make-linux-libre-source version
(%upstream-linux-source version hash)
deblob-scripts-5.8)))
diff --git a/gnu/system/vm.scm b/gnu/system/vm.scm
index 80a8618729..49489b6159 100644
--- a/gnu/system/vm.scm
+++ b/gnu/system/vm.scm
@@ -376,6 +376,10 @@ system that is passed to 'populate-root-file-system'."
(set-path-environment-variable "PATH" '("bin" "sbin") inputs)
+ (setenv "PATH"
+ (string-append #+(file-append strace "/bin") ":"
+ (getenv "PATH")))
+
(let* ((graphs '#$(match inputs
(((names . _) ...)
names)))
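
The reporting loop quoted from Parted above can be exercised on its own. The following is only a sketch under stated assumptions: `bad_partition_list` is a hypothetical helper name, not a Parted function, and it uses a bounded `snprintf` instead of the original `malloc (lpn * 5)` plus `sprintf`. It shows why a failure with ENOMEM ends up in the "Partition(s) ..." message while a failure with ENXIO is silently skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper modelled on the Parted loop quoted above.
   ok[i - 1] is nonzero when partition i was handled successfully;
   errnums[i - 1] holds the errno recorded for a failure.
   Returns a malloc'ed "1, 4"-style list of the partitions that would be
   reported, or NULL when there is nothing to report (or malloc failed).
   The caller frees the result. */
char *bad_partition_list(const int *ok, const int *errnums, int lpn)
{
    assert(lpn >= 0);

    char *list = NULL;
    size_t cap = 0, len = 0;

    for (int i = 1; i <= lpn; i++) {
        /* Same filter as the quoted code: only ENXIO is tolerated. */
        if (ok[i - 1] || errnums[i - 1] == ENXIO)
            continue;
        if (list == NULL) {
            /* 16 bytes per entry is more than "%d, " can ever need. */
            cap = (size_t)lpn * 16;
            list = malloc(cap);
            if (list == NULL)
                return NULL;
            list[0] = '\0';
        }
        len += (size_t)snprintf(list + len, cap - len, "%d, ", i);
    }
    if (list != NULL)
        list[len - 2] = '\0';   /* drop the trailing ", " */
    return list;
}
```

With this filter, a run in which every slot fails with ENOMEM produces a list covering every slot, which is consistent with the "1, 2, 3, ..." message in the log above.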
Leo Famulari wrote 4 years ago
(name . Mark H Weaver)(address . mhw@netris.org)
20200916141231.GA28015@jasmine.lan
On Tue, Sep 15, 2020 at 07:06:28PM -0400, Leo Famulari wrote:
> I will try to reproduce the bug with 5.8.9 now. I will try the bisection
> if time permits.

It also fails with 5.8.9.
Danny Milosavljevic wrote 4 years ago
(name . Leo Famulari)(address . leo@famulari.name)
20200916165245.421e3c4c@scratchpost.org
commit 692d0626557451c4b557397f20b7394b612d0289
Author: Christoph Hellwig <hch@lst.de>
Date: Tue Sep 1 11:59:41 2020 +0200

block: fix locking in bdev_del_partition

[ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]

We need to hold the whole device bd_mutex to protect against
other thread concurrently deleting out partition before we get
to it, and thus causing a use after free.

Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
Reported-by: syzbot+6448f3c229bc52b82f69@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

?


Danny Milosavljevic wrote 4 years ago
(name . Leo Famulari)(address . leo@famulari.name)
20200916170239.580ac59a@scratchpost.org
On Wed, 16 Sep 2020 16:52:45 +0200
Danny Milosavljevic <dannym@scratchpost.org> wrote:

> commit 692d0626557451c4b557397f20b7394b612d0289
> Author: Christoph Hellwig <hch@lst.de>
> Date: Tue Sep 1 11:59:41 2020 +0200
>
> block: fix locking in bdev_del_partition
>
> [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>
> We need to hold the whole device bd_mutex to protect against
> other thread concurrently deleting out partition before we get
> to it, and thus causing a use after free.
>
> Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
> Reported-by: syzbot+6448f3c229bc52b82f69@syzkaller.appspotmail.com
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
>

int bdev_del_partition(struct block_device *bdev, int partno)
{
    struct block_device *bdevp;
    struct hd_struct *part = NULL;
    int ret;

    bdevp = bdget_disk(bdev->bd_disk, partno);
    if (!bdevp)
        return -ENOMEM; <--------------

    ...
}

struct block_device *bdget_disk(struct gendisk *disk, int partno)
{
    struct hd_struct *part;
    struct block_device *bdev = NULL;

    part = disk_get_part(disk, partno);
    if (part)
        bdev = bdget(part_devt(part));
    disk_put_part(part);

    return bdev;
}

struct block_device *bdget(dev_t dev)
{
    struct block_device *bdev;
    struct inode *inode;

    inode = iget5_locked(blockdev_superblock, hash(dev),
                         bdev_test, bdev_set, &dev);

    if (!inode)
        return NULL; <--------------------
    [...]
}
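
The call chain above can be mimicked in user space to make the consequence visible. This is a rough sketch, not kernel code: `mock_part`, `mock_disk`, `mock_lookup` and both `mock_del_partition_*` functions are hypothetical stand-ins invented for illustration. The first variant follows the early return marked above, where a failed lookup of a partition that does not exist is reported as -ENOMEM; the second returns -ENXIO, the only error code the Parted loop quoted earlier in this thread skips.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real types. */
struct mock_part { int partno; };
struct mock_disk { const struct mock_part *parts; size_t nparts; };

/* Plays the role of bdget_disk(): NULL when no such partition exists. */
static const struct mock_part *mock_lookup(const struct mock_disk *disk,
                                           int partno)
{
    for (size_t i = 0; i < disk->nparts; i++)
        if (disk->parts[i].partno == partno)
            return &disk->parts[i];
    return NULL;
}

/* Follows the early return marked above: any failed lookup, including
   "no such partition", comes back as -ENOMEM. */
int mock_del_partition_enomem(const struct mock_disk *disk, int partno)
{
    assert(disk != NULL);
    if (mock_lookup(disk, partno) == NULL)
        return -ENOMEM;
    return 0;
}

/* Variant returning the error code that Parted's loop tolerates. */
int mock_del_partition_enxio(const struct mock_disk *disk, int partno)
{
    assert(disk != NULL);
    if (mock_lookup(disk, partno) == NULL)
        return -ENXIO;
    return 0;
}
```

Deleting partition 3 on a mock disk that only has partitions 1 and 2 gives -ENOMEM from the first variant and -ENXIO from the second, which is the difference the strace output earlier in this thread shows from Parted's side.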


Leo Famulari wrote 4 years ago
(name . Danny Milosavljevic)(address . dannym@scratchpost.org)
20200916162244.GA3262@jasmine.lan
On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
> commit 692d0626557451c4b557397f20b7394b612d0289
> Author: Christoph Hellwig <hch@lst.de>
> Date: Tue Sep 1 11:59:41 2020 +0200
>
> block: fix locking in bdev_del_partition
>
> [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>
> We need to hold the whole device bd_mutex to protect against
> other thread concurrently deleting out partition before we get
> to it, and thus causing a use after free.
>
> Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
> Reported-by: syzbot+6448f3c229bc52b82f69@syzkaller.appspotmail.com
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> ?

Do you think that's the faulty commit? I won't have time to test it
today.


Mark H Weaver wrote 4 years ago
87363h9oab.fsf@netris.org
Leo Famulari <leo@famulari.name> writes:

> On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
>> commit 692d0626557451c4b557397f20b7394b612d0289
>> Author: Christoph Hellwig <hch@lst.de>
>> Date: Tue Sep 1 11:59:41 2020 +0200
>>
>> block: fix locking in bdev_del_partition
>>
>> [ Upstream commit 08fc1ab6d748ab1a690fd483f41e2938984ce353 ]
>>
>> We need to hold the whole device bd_mutex to protect against
>> other thread concurrently deleting out partition before we get
>> to it, and thus causing a use after free.
>>
>> Fixes: cddae808aeb7 ("block: pass a hd_struct to delete_partition")
>> Reported-by: syzbot+6448f3c229bc52b82f69@syzkaller.appspotmail.com
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>>
>> ?
>
> Do you think that's the faulty commit? I won't have time to test it
> today.

Looks like a good bet to me. Many thanks to Leo for testing 5.8.9, and
to Danny for identifying the likely culprit.

Next, it probably makes sense to test 5.8.9 with the above commit
reverted. Would someone like to try it? If that works, we can avoid
the bisection, resume kernel updates, and revert this change across all
of our kernels until a better solution is found.

Mark
Leo Famulari wrote 4 years ago
(name . Mark H Weaver)(address . mhw@netris.org)
20200917022117.GA1845@jasmine.lan
On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> Next, it probably makes sense to test 5.8.9 with the above commit
> reverted. Would someone like to try it? If that works, we can avoid
> the bisection, resume kernel updates, and revert this change across all
> of our kernels until a better solution is found.

I'll do this tonight.
Marius Bakke wrote 4 years ago
87r1r06bko.fsf@gnu.org
Leo Famulari <leo@famulari.name> writes:

> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
>> Next, it probably makes sense to test 5.8.9 with the above commit
>> reverted. Would someone like to try it? If that works, we can avoid
>> the bisection, resume kernel updates, and revert this change across all
>> of our kernels until a better solution is found.
>
> I'll do this tonight.

A fix for this is available in mainline:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a

(via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)

Leo Famulari wrote 4 years ago
(name . Mark H Weaver)(address . mhw@netris.org)
20200917140544.GA14066@jasmine.lan
On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> > On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
> >> commit 692d0626557451c4b557397f20b7394b612d0289
> >> Author: Christoph Hellwig <hch@lst.de>
> >> Date: Tue Sep 1 11:59:41 2020 +0200
> >>
> >> block: fix locking in bdev_del_partition

> Next, it probably makes sense to test 5.8.9 with the above commit
> reverted. Would someone like to try it? If that works, we can avoid
> the bisection, resume kernel updates, and revert this change across all
> of our kernels until a better solution is found.

Using linux-libre 5.8.9 with that commit reverted, `guix system vm` does
work again.
Leo Famulari wrote 4 years ago
(name . Marius Bakke)(address . marius@gnu.org)
20200917142026.GB14066@jasmine.lan
On Thu, Sep 17, 2020 at 10:40:07AM +0200, Marius Bakke wrote:
> Leo Famulari <leo@famulari.name> writes:
>
> > On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> >> Next, it probably makes sense to test 5.8.9 with the above commit
> >> reverted. Would someone like to try it? If that works, we can avoid
> >> the bisection, resume kernel updates, and revert this change across all
> >> of our kernels until a better solution is found.
> >
> > I'll do this tonight.
>
> A fix for this is available in mainline:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a
>
> (via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)

What is the recommended way to report this to the stable maintainers?


Danny Milosavljevic wrote 4 years ago
(name . Leo Famulari)(address . leo@famulari.name)
20200917172806.1d03de5c@scratchpost.org
Hi Leo,

On Thu, 17 Sep 2020 10:05:44 -0400
Leo Famulari <leo@famulari.name> wrote:

> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
> > > On Wed, Sep 16, 2020 at 04:52:45PM +0200, Danny Milosavljevic wrote:
> > >> commit 692d0626557451c4b557397f20b7394b612d0289
> > >> Author: Christoph Hellwig <hch@lst.de>
> > >> Date: Tue Sep 1 11:59:41 2020 +0200
> > >>
> > >> block: fix locking in bdev_del_partition
>
> > Next, it probably makes sense to test 5.8.9 with the above commit
> > reverted. Would someone like to try it? If that works, we can avoid
> > the bisection, resume kernel updates, and revert this change across all
> > of our kernels until a better solution is found.
>
> Using linux-libre 5.8.9 with that commit reverted, `guix system vm` does
> work again.

Thanks for testing it!

What the commit does is lock the drive whose partition table is being
modified. That sounds like a good idea in general. But parted does not
expect the error code ENOMEM and thus misjudges the situation.

See also:


for the fix that goes into Linux 5.9.

We could also patch parted to accept ENOMEM in addition to ENXIO if we
wanted to. I wouldn't (it diverges from what upstream is doing--too much work).
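
For illustration only, the Parted-side workaround mentioned above (and not pursued in this thread) would amount to widening the filter in the quoted reporting loop. The sketch below assumes a hypothetical predicate `should_report_failure` with a hypothetical `tolerate_enomem` switch; neither exists in Parted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical predicate, not part of Parted.
   ok:              nonzero when the partition was handled successfully
   err:             errno recorded for a failure
   tolerate_enomem: false reproduces the quoted filter (only ENXIO is
                    skipped); true additionally skips ENOMEM, as the
                    workaround discussed above would. */
bool should_report_failure(int ok, int err, bool tolerate_enomem)
{
    assert(err >= 0);

    if (ok)
        return false;
    if (err == ENXIO)
        return false;
    if (tolerate_enomem && err == ENOMEM)
        return false;
    return true;
}
```

With `tolerate_enomem` set to false, a failure with ENOMEM is reported, as in the log at the top of this thread; with it set to true, the same failure is ignored. Other error codes such as EBUSY are reported either way.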


Mark H Weaver wrote 4 years ago
87zh5o8bkq.fsf@netris.org
Marius Bakke <marius@gnu.org> writes:

> Leo Famulari <leo@famulari.name> writes:
>
>> On Wed, Sep 16, 2020 at 09:36:49PM -0400, Mark H Weaver wrote:
>>> Next, it probably makes sense to test 5.8.9 with the above commit
>>> reverted. Would someone like to try it? If that works, we can avoid
>>> the bisection, resume kernel updates, and revert this change across all
>>> of our kernels until a better solution is found.
>>
>> I'll do this tonight.
>
> A fix for this is available in mainline:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88ce2a530cc9865a894454b2e40eba5957a60e1a
>
> (via <https://gitlab.gnome.org/GNOME/gparted/-/issues/111>)

Thank you, Marius! I see that this patch has been included in the
recently released linux-libre-5.8.10. Assuming that it fixes our
problem (which I suspect it does), we can simply update our kernels and
close this bug.

Leo, would you like to update our kernels? I won't be able to get to it
today, and I'm looking to offload that job anyway.

Mark
Leo Famulari wrote 4 years ago
(no subject)
(address . control@debbugs.gnu.org)
20200918134110.GA24439@jasmine.lan
close 43344