RE: invalid counter value in request_queue

From: 小川 修一
Date: Thu Dec 09 2021 - 22:10:24 EST


OK, I now roughly understand the logic:

-- A negative per-cpu value is fine. The values in all the percpu_count_ptr slots are summed up in percpu_ref_switch_to_atomic_rcu() and stored into the atomic_t count.
-- For some devices the atomic mode is preferred over the percpu mode so that shutdown does not need RCU synchronization; such refs simply stay atomic.
On the other hand, percpu access needs less SMP synchronization than an atomic counter, and my system had taken the percpu mode.
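
For my own notes, the summing step boils down to something like this (a rough sketch of percpu_ref_switch_to_atomic_rcu() as I read lib/percpu-refcount.c; the name switch_to_atomic_sketch and the dropped bias handling are my simplification, not the real code):

/*
 * Rough sketch: only the sum of the per-cpu slots matters, so an
 * individual slot being negative is harmless.
 */
static void switch_to_atomic_sketch(struct percpu_ref *ref,
                                    unsigned long __percpu *percpu_count)
{
        unsigned long count = 0;
        int cpu;

        for_each_possible_cpu(cpu)
                count += *per_cpu_ptr(percpu_count, cpu);

        /* fold the accumulated per-cpu deltas into the shared counter */
        atomic_long_add((long)count, &ref->count);
}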

In the end, no problem. Thanks

Best regards, shuichi ogawa

-----Original Message-----
From: 小川 修一 <ogawa.syuuichi@xxxxxxxxxxxx>
Sent: Friday, December 10, 2021 10:07 AM
To: linux-kernel@xxxxxxxxxxxxxxx
Cc: 山本 達史 <yamamoto.tatsushi@xxxxxxxxxxxx>; Natsume, Wataru (ADITJ/SWG) <wnatsume@xxxxxxxxxxxxxx>; 小川 修一 <ogawa.syuuichi@xxxxxxxxxxxx>
Subject: RE: invalid counter value in request_queue

I forgot to make the gdb script handle ATOMIC mode in percpu_count_ptr. After fixing the script I got:

super_block 0xffff9bb3f8bbcc30 "vdb"
q=(struct request_queue *) 0xffff9bb3fb53d1c0,q->q_usage_counter.percpu_count_ptr=(unsigned long *) 0x3a80c000b2b9
0:0xffffd634bfc0b2b8,0x0,0
1:0xffffd634bfc8b2b8,0x0,0
2:0xffffd634bfd0b2b8,0x0,0

The result is the same as in my previous e-mail.
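
For anyone writing a similar script: the low bits of percpu_count_ptr are flag bits, so they have to be masked off before the value is used as a per-cpu pointer. This is roughly what the kernel's own helper does (paraphrased from lib/percpu-refcount.c, not a verbatim quote):

/* strip the flag bits before dereferencing the per-cpu pointer;
 * __PERCPU_REF_ATOMIC is bit 0, __PERCPU_REF_DEAD is bit 1 */
static unsigned long __percpu *percpu_count_ptr(struct percpu_ref *ref)
{
        return (unsigned long __percpu *)
                (ref->percpu_count_ptr & ~__PERCPU_REF_ATOMIC_DEAD);
}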

-----Original Message-----
From: 小川 修一
Sent: Friday, December 10, 2021 9:54 AM
To: linux-kernel@xxxxxxxxxxxxxxx
Cc: 山本 達史 <yamamoto.tatsushi@xxxxxxxxxxxx>; Natsume, Wataru (ADITJ/SWG) <wnatsume@xxxxxxxxxxxxxx>
Subject: RE: invalid counter value in request_queue

Hi all

I understood why the per-cpu counter became negative.
As I mentioned in my previous mail, I think q_usage_counter should be an atomic counter; and in fact percpu_ref has not only a percpu mode but also an ATOMIC mode.

Indeed, blk_alloc_queue_node() initializes request_queue->q_usage_counter in atomic mode like this:

/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
 * See blk_register_queue() for details.
 */
if (percpu_ref_init(&q->q_usage_counter,
                    blk_queue_usage_counter_release,
                    PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))

However, q_usage_counter is switched to percpu mode in blk_register_queue():

blk_register_queue()

/*
 * SCSI probing may synchronously create and destroy a lot of
 * request_queues for non-existent devices. Shutting down a fully
 * functional queue takes measureable wallclock time as RCU grace
 * periods are involved. To avoid excessive latency in these
 * cases, a request_queue starts out in a degraded mode which is
 * faster to shut down and is made fully functional here as
 * request_queues for non-existent devices never get registered.
 */
if (!blk_queue_init_done(q)) {
        queue_flag_set_unlocked(QUEUE_FLAG_INIT_DONE, q);
        percpu_ref_switch_to_percpu(&q->q_usage_counter);
        blk_queue_bypass_end(q);
}


When I remove the percpu_ref_switch_to_percpu(&q->q_usage_counter) call in blk_register_queue():

- At least the per-cpu counters are no longer used.

super_block 0xffff9bb3f8bbcc30 "vdb"
q=(struct request_queue *) 0xffff9bb3fb53d1c0,q->q_usage_counter.percpu_count_ptr=(unsigned long *) 0x3a80c000b2b9
0:0xffffd634bfc0b2b9,0x0,0
1:0xffffd634bfc8b2b9,0x0,0
2:0xffffd634bfd0b2b9,0x0,0

You can see that percpu_count_ptr now has the __PERCPU_REF_ATOMIC bit (0x1) set, and the per-cpu counters look unused.
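
That matches how the ref itself decides which counter to use. Roughly (paraphrased from include/linux/percpu-refcount.h; simplified, not verbatim):

/* if the ATOMIC or DEAD flag is set in percpu_count_ptr, the per-cpu
 * slots are bypassed and only the shared atomic count is touched */
static inline bool __ref_is_percpu(struct percpu_ref *ref,
                                   unsigned long __percpu **percpu_countp)
{
        unsigned long percpu_ptr = READ_ONCE(ref->percpu_count_ptr);

        if (percpu_ptr & __PERCPU_REF_ATOMIC_DEAD)
                return false;

        *percpu_countp = (unsigned long __percpu *)percpu_ptr;
        return true;
}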

- I checked q_usage_counter again:

crash> p $q->q_usage_counter
$1 = {
  count = {
    counter = 5
  },
  percpu_count_ptr = 64324651496121,
  release = 0xffffffff9d44445d,
  confirm_switch = 0x0,
  force_atomic = true,
  rcu = {
    next = 0x0,
    func = 0x0
  }
}

The atomic_t count is 5.
Whether 5 is correct or not, I will check.
Another block device, the one ext4 is mounted on, has counter=1.

Now my question is:
why does blk_register_queue() switch q_usage_counter to percpu mode?

I assume this code has some reason to switch to percpu mode, even though blk-core.c itself initializes the counter in atomic mode.
Does this cause any problem?
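
The comment quoted above suggests that percpu mode is the "fully functional" fast mode and atomic mode is only the degraded, easy-to-shutdown mode. The hot-path difference looks roughly like this (paraphrased from include/linux/percpu-refcount.h; percpu_ref_get_sketch is my own name for the illustration):

static inline void percpu_ref_get_sketch(struct percpu_ref *ref)
{
        unsigned long __percpu *percpu_count;

        rcu_read_lock_sched();
        if (__ref_is_percpu(ref, &percpu_count))
                this_cpu_inc(*percpu_count);    /* local slot, no locked op */
        else
                atomic_long_inc(&ref->count);   /* shared cacheline on every I/O */
        rcu_read_unlock_sched();
}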

Best Regards, shuichi ogawa

-----Original Message-----
From: 小川 修一 <ogawa.syuuichi@xxxxxxxxxxxx>
Sent: Thursday, December 9, 2021 1:50 PM
To: linux-kernel@xxxxxxxxxxxxxxx
Cc: 小川 修一 <ogawa.syuuichi@xxxxxxxxxxxx>; 山本 達史 <yamamoto.tatsushi@xxxxxxxxxxxx>; Natsume, Wataru (ADITJ/SWG) <wnatsume@xxxxxxxxxxxxxx>
Subject: invalid counter value in request_queue

Hi all,
This is my first time posting to this list, so please let me know if there is any problem with my mail.
I am studying the Linux kernel, using kdump, to investigate a system performance problem.
I have found a strange value behind request_queue->q_usage_counter.percpu_count_ptr.

Kernel version: 4.9.52. I also briefly checked 5.10.80 and it looks similar.

super_block 0xffff9a563820e430 "vdb"
q=(struct request_queue *) 0xffff9a563b948920,q->q_usage_counter.percpu_count_ptr=(unsigned long *) 0x39dbc000b2b8
0:0xffffd431ffc0b2b8,0xffffffffffffdaf1,-9487
1:0xffffd431ffc8b2b8,0x0,0
2:0xffffd431ffd0b2b8,0x2510,9488

This is the output of a gdb script run inside the crash command. The format is <cpu>:<address>,<value in hex>,<value as signed long in decimal>.

Among the values behind percpu_count_ptr, cpu0 holds a huge (i.e. negative) value and cpu2 holds a positive value.
If I sum them up over all cpus the total is 1 (-9487 + 0 + 9488 = 1), which would mean 1 request remained for /dev/vdb at the time of the kdump.

It is easy to guess that the count-up and the count-down happened on different cpus.
I think q_usage_counter then cannot work as a reference counter guarding against disposing of the request queue while it is in use, although I have not found where this counter is actually checked.

The system looks fine so far. Still, I wonder whether this can cause trouble such as freeing a resource that is still in use.
I would like to ask whether this is really no problem at all.

---

As far as I understand, this counter is a reference counter for request queue access. For example, in generic_make_request (sketched just below):
blk_queue_enter(q, false) -> percpu_ref_tryget_live(&q->q_usage_counter) : count up
blk_queue_exit(q) -> percpu_ref_put(&q->q_usage_counter) : count down
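
In percpu mode both calls only touch the slot of the cpu the caller happens to be running on at that moment. A simplified sketch (paraphrased from include/linux/percpu-refcount.h; the *_sketch names are mine, and dead-ref handling is omitted):

static inline bool tryget_live_sketch(struct percpu_ref *ref)
{
        unsigned long __percpu *percpu_count;
        bool ret = false;

        rcu_read_lock_sched();
        if (__ref_is_percpu(ref, &percpu_count)) {
                this_cpu_inc(*percpu_count);            /* e.g. on cpu2 */
                ret = true;
        } else {
                ret = atomic_long_inc_not_zero(&ref->count);
        }
        rcu_read_unlock_sched();
        return ret;
}

static inline void put_sketch(struct percpu_ref *ref)
{
        unsigned long __percpu *percpu_count;

        rcu_read_lock_sched();
        if (__ref_is_percpu(ref, &percpu_count))
                this_cpu_dec(*percpu_count);            /* maybe on cpu0 */
        else if (atomic_long_sub_and_test(1, &ref->count))
                ref->release(ref);
        rcu_read_unlock_sched();
}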

If the count goes up on cpu2 and down on cpu0, this counter becomes invalid.
I found 2 cases where this can happen:

case-1: the normal case of counting an actually requested I/O

blk_mq_map_request() submits the bio to the block device and counts up in blk_queue_enter_live(q).
blk_mq_end_request() is called when the I/O terminates, in IRQ context, and counts down via
blk_mq_free_request() -> blk_queue_exit(q).

IRQ handling normally runs on cpu0 in my system, so if an application issues file I/O on cpu2, this situation is reproduced.

case-2: preemption

generic_make_request does not run with preemption disabled, so the task may migrate to a different cpu between blk_queue_enter and blk_queue_exit.

Now I think q_usage_counter should be a simple atomic_t or kref instead of a percpu_ref.
The system looks fine as of now, and I have not made any patch to change this yet.
If I get a chance to make such a patch, I will post again.

Best Regards, shuichi ogawa, NTT-Data MSE corporation