Re: [PATCH -next 3/4] block/rq_qos: use a global mutex to protect rq_qos apis

From: Yu Kuai
Date: Sun Jan 08 2023 - 20:38:26 EST


Hi,

在 2023/01/07 2:23, Tejun Heo 写道:
Hello,

On Fri, Jan 06, 2023 at 09:33:26AM +0800, Yu Kuai wrote:
wbt's lazy init is tied to one of the block device sysfs files, right? So,
it *should* already be protected against device removal.

That seems not true, I don't think q->sysfs_lock can protect that,
consider that queue_wb_lat_store() doesn't check if del_gendisk() is
called or not:

t1: wbt lazy init t2: remove device
queue_attr_store
del_gendisk
blk_unregister_queue
mutex_lock(&q->sysfs_lock)
...
mutex_unlock(&q->sysfs_lock);
rq_qos_exit
mutex_lock(&q->sysfs_lock);
queue_wb_lat_store
wbt_init
rq_qos_add
mutex_unlock(&q->sysfs_lock);

So, it's not sysfs_lock but sysfs file deletion. When a kernfs, which backs
sysfs, file is removed, it disables future operations and drains all
inflight ones before returning, so you remove the interface files before
cleaning up the object that it interacts with, you don't have to worry about
racing against file operations as none can be in flight at that point.

Ok, thanks for explanation, I'll look into this and try to find out how
this works.


I tried to comfirm that by adding following delay:

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 93d9e9c9a6ea..101c33cb0a2b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -11,6 +11,7 @@
#include <linux/blktrace_api.h>
#include <linux/blk-mq.h>
#include <linux/debugfs.h>
+#include <linux/delay.h>

#include "blk.h"
#include "blk-mq.h"
@@ -734,6 +735,8 @@ queue_attr_store(struct kobject *kobj, struct attribute
*attr,
if (!entry->store)
return -EIO;

+ msleep(10000);
+
mutex_lock(&q->sysfs_lock);
res = entry->store(q, page, length);
mutex_unlock(&q->sysfs_lock);

And then do the following test:

1) echo 10000 > /sys/block/sdb/queue/wbt_lat_usec &
2) echo 1 > /sys/block/sda/device/delete

Then, following bug is triggered:

[ 51.923642] BUG: unable to handle page fault for address:
ffffffffffffffed
[ 51.924294] #PF: supervisor read access in kernel mode
[ 51.924773] #PF: error_code(0x0000) - not-present page
[ 51.925252] PGD 1820b067 P4D 1820b067 PUD 1820d067 PMD 0
[ 51.925754] Oops: 0000 [#1] PREEMPT SMP
[ 51.926123] CPU: 1 PID: 539 Comm: bash Tainted: G W
6.2.0-rc1-next-202212267
[ 51.927124] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
?-20190727_073836-b4
[ 51.928334] RIP: 0010:__rq_qos_issue+0x30/0x60

This indicates that we aren't getting the destruction order right. It could
be that there are other reasons why the ordering is like this and we might
have to synchronize separately.

Sorry that I've been asking you to go round and round but block device
add/remove paths have always been really tricky and we wanna avoid adding
more complications if at all possible. Can you see why the device is being
destroyed before the queue attr is removed?

Of course, I'll glad to help, I'll let you know if I have any progress.

Thanks,
Kuai