Re: [PATCH 1/2] nvme: set io-scheduler requirement for ZNS

From: Damien Le Moal
Date: Mon Sep 07 2020 - 04:22:26 EST


On 2020/09/07 16:01, Kanchan Joshi wrote:
>> Even for SMR, the user is free to set the elevator to none, which disables zone
>> write locking. Issuing writes correctly then becomes the responsibility of the
>> application. This can be useful, for instance, for setups that use NCQ I/O
>> priorities, which give better results when "none" is used.
>
> Isn't it still a problem that, even if the application sends writes
> correctly, the scheduler may not preserve the order?
> And even when none is used, a requeue can happen, which may lead
> to a different ordering.

"Issuing writes correctly" means doing small writes, at most one in flight per
zone. In that case, it does not matter if the block layer reorders writes: per
zone, they will still be sequential.
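A small simulation can make the point concrete. It is a sketch only, assuming a hypothetical 64-block zone size and a device that rejects any write not starting at the zone write pointer; the names `apply_writes`, `all_orders_succeed` and `UnalignedWriteError` are illustrative, not kernel APIs. It tries every dispatch order of a set of writes: with one write per zone every order succeeds, while two writes to the same zone fail as soon as they are swapped.

```python
import itertools

ZONE_SIZE = 64  # hypothetical zone size in blocks, chosen for illustration


class UnalignedWriteError(Exception):
    """Raised when a write does not start at its zone's write pointer."""


def apply_writes(writes, nr_zones):
    """Execute writes in the given order against a simulated zoned device.

    Each write is a (start_lba, nr_blocks) tuple. Returns the final write
    pointers, or raises UnalignedWriteError, mimicking how a zoned drive
    rejects a write that is not sequential within its zone.
    """
    wp = [z * ZONE_SIZE for z in range(nr_zones)]  # all zones start empty
    for lba, nr in writes:
        zone = lba // ZONE_SIZE
        if lba != wp[zone]:
            raise UnalignedWriteError(f"zone {zone}: lba {lba} != wp {wp[zone]}")
        wp[zone] += nr
    return wp


def all_orders_succeed(writes, nr_zones):
    """Return True if every possible dispatch order of `writes` succeeds."""
    for order in itertools.permutations(writes):
        try:
            apply_writes(order, nr_zones)
        except UnalignedWriteError:
            return False
    return True
```

For example, `all_orders_succeed([(0, 8), (64, 8), (128, 8)], 3)` is true (one write per zone), whereas `all_orders_succeed([(0, 8), (8, 8)], 1)` is false, because dispatching the second write first hits a non-sequential LBA. That second case is what zone write locking in the scheduler protects against.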

>> As far as I know, zoned drives are always used in tightly controlled
>> environments. Problems like "does not know what other applications would be
>> doing" are non-existent. Setting up the drive correctly for the use case at hand
>> is a sysadmin/server setup problem, based on *the* application (singular)
>> requirements.
>
> Fine.
> But what about zoned null_blk, which sets mq-deadline but does not
> actually use the write lock to avoid races among multiple appends on a
> zone?
> Does that deserve a fix?

In null_blk, commands are executed under a spinlock, so there is no concurrency
problem: the spinlock serializes the execution of all commands. null_blk zone
append emulation thus does not need to take the scheduler-level zone write lock
like SCSI does.
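As a rough user-space illustration (not null_blk code), one lock around command execution is enough to make concurrent emulated zone appends safe: each append reads and advances the write pointer atomically, so no two appends can be given the same sector. This sketch assumes `threading.Lock` as a stand-in for the kernel spinlock; the class and function names are hypothetical.

```python
import threading


class EmulatedZonedDevice:
    """Toy model of a device that executes every command under one lock.

    Hypothetical names; the lock plays the role of the spinlock that
    serializes command execution, not any actual null_blk structure.
    """

    def __init__(self, nr_zones, zone_size):
        self.zone_size = zone_size
        self.wp = [z * zone_size for z in range(nr_zones)]  # zones start empty
        self.lock = threading.Lock()

    def zone_append(self, zone, nr_blocks):
        """Emulate zone append: write at the current write pointer, return that LBA."""
        with self.lock:  # every command is serialized here
            start = self.wp[zone]
            if start + nr_blocks > (zone + 1) * self.zone_size:
                raise OSError("zone full")
            self.wp[zone] = start + nr_blocks
            return start


def run_concurrent_appends(dev, zone, nr_threads, nr_blocks):
    """Issue one append per thread to the same zone; return the sorted LBAs."""
    results = []
    results_lock = threading.Lock()

    def worker():
        lba = dev.zone_append(zone, nr_blocks)
        with results_lock:
            results.append(lba)

    threads = [threading.Thread(target=worker) for _ in range(nr_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

With 16 threads each appending 8 blocks to zone 1 of a device with 1024-block zones, the returned LBAs are exactly 1024, 1032, ..., 1144, with no duplicates or gaps. Because the device itself serializes the append, the block layer does not have to hold a per-zone write lock for correctness.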



--
Damien Le Moal
Western Digital Research