Re: [PATCH 8/8] blk-mq: drain I/O when all CPUs in a hctx are offline

From: Bart Van Assche
Date: Thu May 28 2020 - 09:37:56 EST


On 2020-05-27 22:19, Ming Lei wrote:
> On Wed, May 27, 2020 at 08:33:48PM -0700, Bart Van Assche wrote:
>> My understanding is that operations that have acquire semantics pair
>> with operations that have release semantics. I haven't been able to find
>> any documentation that shows that smp_mb__after_atomic() has release
>> semantics. So I looked up its definition. This is what I found:
>>
>> $ git grep -nH 'define __smp_mb__after_atomic'
>> arch/ia64/include/asm/barrier.h:49:#define __smp_mb__after_atomic() barrier()
>> arch/mips/include/asm/barrier.h:133:#define __smp_mb__after_atomic() smp_llsc_mb()
>> arch/s390/include/asm/barrier.h:50:#define __smp_mb__after_atomic() barrier()
>> arch/sparc/include/asm/barrier_64.h:57:#define __smp_mb__after_atomic() barrier()
>> arch/x86/include/asm/barrier.h:83:#define __smp_mb__after_atomic() do { } while (0)
>> arch/xtensa/include/asm/barrier.h:20:#define __smp_mb__after_atomic() barrier()
>> include/asm-generic/barrier.h:116:#define __smp_mb__after_atomic() __smp_mb()
>>
>> My interpretation of the above is that not all smp_mb__after_atomic()
>> implementations have release semantics. Do you agree with this conclusion?
>
> I understand smp_mb__after_atomic() orders set_bit(BLK_MQ_S_INACTIVE)
> and reading the tag bit which is done in blk_mq_all_tag_iter().
>
> So the two pair of OPs are ordered:
>
> 1) if one request(tag bit) is allocated before setting BLK_MQ_S_INACTIVE,
> the tag bit will be observed in blk_mq_all_tag_iter() from blk_mq_hctx_has_requests(),
> so the request will be drained.
>
> OR
>
> 2) if one request(tag bit) is allocated after setting BLK_MQ_S_INACTIVE,
> the request(tag bit) will be released and retried on another CPU
> finally, see __blk_mq_alloc_request().
>
> Cc Paul and linux-kernel list.

I do not agree with the above conclusion. My understanding of
acquire/release labels is that if the following holds:
(1) A store operation that stores the value V into memory location M has
a release label.
(2) A load operation that reads memory location M has an acquire label.
(3) The load operation (2) retrieves the value V that was stored by (1).

then the following ordering property holds: all load and store
instructions that happened before the store instruction (1) in program
order are guaranteed to happen before the load and store instructions
that follow (2) in program order.
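
To illustrate conditions (1)-(3), here is a minimal user-space sketch
that uses C11 atomics and pthreads instead of kernel primitives. The
names payload, flag, producer(), consumer() and run_message_passing()
are hypothetical and only serve this example; none of them come from
the block layer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data written before the release store */
static atomic_int flag;    /* plays the role of memory location M */

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;	/* precedes (1) in program order */
	/* (1) store with a release label, V == 1 */
	atomic_store_explicit(&flag, 1, memory_order_release);
	return NULL;
}

static void *consumer(void *arg)
{
	int *seen = arg;

	/* (2) load with an acquire label; spin until (3) it reads V */
	while (atomic_load_explicit(&flag, memory_order_acquire) != 1)
		;
	*seen = payload;	/* follows (2), so it must observe 42 */
	return NULL;
}

/* Runs one producer/consumer pair and returns what the consumer saw. */
int run_message_passing(void)
{
	pthread_t p, c;
	int seen = 0;

	payload = 0;
	atomic_store(&flag, 0);
	pthread_create(&c, NULL, consumer, &seen);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return seen;
}
```

The guarantee only exists because the acquire load reads the value
written by the release store. If the consumer used an acquire load on
some other location, or the producer used a plain store, nothing would
order the payload accesses.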

In the ARM manual these semantics have been described as follows: "A
Store-Release instruction is multicopy atomic when observed with a
Load-Acquire instruction".

In this case the load-acquire operation is the
"test_and_set_bit_lock(nr, word)" statement from the sbitmap code. That
code is executed indirectly by blk_mq_get_tag(). Since there is no
matching store-release instruction in __blk_mq_alloc_request() for
'word', ordering of the &data->hctx->state and 'tag' memory locations is
not guaranteed by the acquire property of the "test_and_set_bit_lock(nr,
word)" statement from the sbitmap code.
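
For comparison, the "1) OR 2)" guarantee described above has the shape
of a store-buffering pattern: each side writes one location and then
reads the other. The sketch below is a user-space analogue with C11
atomics, not kernel code. The variables state and tag merely stand in
for &data->hctx->state and the tag bit, and all function names are
hypothetical. It assumes a full fence on both sides, which is
what rules out the outcome in which neither side sees the other's
store. An acquire-only operation on one side would not be enough.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int state;	/* stand-in for the hctx state bit */
static atomic_int tag;		/* stand-in for one tag bit */
static int r_tag, r_state;	/* what each side observed */

static void *offline_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&state, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier */
	r_tag = atomic_load_explicit(&tag, memory_order_relaxed);
	return NULL;
}

static void *alloc_side(void *arg)
{
	(void)arg;
	atomic_store_explicit(&tag, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full barrier */
	r_state = atomic_load_explicit(&state, memory_order_relaxed);
	return NULL;
}

/* Returns 1 if at least one side observed the other side's store. */
int run_store_buffering(void)
{
	pthread_t a, b;

	atomic_store(&state, 0);
	atomic_store(&tag, 0);
	r_tag = r_state = 0;
	pthread_create(&a, NULL, offline_side, NULL);
	pthread_create(&b, NULL, alloc_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return r_tag == 1 || r_state == 1;
}
```

With both fences in place the function always returns 1. Dropping
either fence, or weakening it to acquire or release, permits both loads
to return 0 on weakly ordered hardware (and on x86 as well, since
stores may be delayed past later loads).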

Thanks,

Bart.