Re: [PATCH 2/2] nvme: add emulation for zone-append

From: Damien Le Moal
Date: Wed Aug 19 2020 - 05:14:25 EST


On 2020/08/19 17:34, Javier Gonzalez wrote:
> On 19.08.2020 09:40, Christoph Hellwig wrote:
>> On Tue, Aug 18, 2020 at 08:04:28PM +0200, Javier Gonzalez wrote:
>>> I understand that you want vendor alignment in the NVMe driver and I
>>> agree. We are not pushing for a non-append model - you can see that we
>>> are investing effort in implementing the append path in the block layer
>>> and io_uring and we will continue doing so as patches get merged.
>>>
>>> This said, we do have some OEM models that do not implement append and I
>>> would like them to be supported in Linux. As you know, new TPs are being
>>> standardized now and the append emulation is the basis for adding
>>> support for this. I do not believe it is unreasonable to find a way to
>>> add support for these SSDs.
>>
>> I do not think we should support anything but Zone Append, especially not
>> the new TP, which is going to add even more horrible code for absolutely
>> no good reason.
>
> I must admit that this is a bit frustrating. The new TP adds
> functionality beyond operating as an Append alternative that I would
> very much like to see upstream (do not want to discuss details here).
>
> I understand the concerns about deviating from the Append model, but I
> believe we should find a way to add these new features. We are hiding
> all the logic in the NVMe driver and not touching the interface with the
> block layer, so the overall model is really not changed.
>
>>
>>> If you completely close the door on this approach, the alternative is
>>> carrying out-of-tree patches for the several OEMs that use these devices.
>>> This is not good for the zoned ecosystem nor for the future of Zone
>>> Append.
>>
>> I really don't have a problem with that. If these OEMs want to use
>> an inferior access model only, they have to pay the price for it.
>> I also don't think that proxy arguments are very useful. If your OEMs
>> are troubled by carrying patches because they decided to buy inferior
>> drives they are perfectly welcome to argue their cause here on the list.
>
> I am not arguing as a proxy, I am stating the trouble we see from our
> perspective in having to diverge from mainline when our approach is
> to be upstream first.
>
> Whether the I/O mode is inferior or superior, they can answer that
> themselves if they read this list.
>>
>>> Are you open to us doing some characterization and if the impact
>>> to the fast path is not significant, moving ahead to a Zone Append
>>> emulation like in SCSI? I will promise that we will remove this path if
>>> requests for these devices terminate.
>>
>> As said, I do not think implementing zone append emulation or the TP that
>> shall not be named is a good idea for Linux.
>
> I would ask you to reconsider this position. I have a hard time
> understanding how zone append emulation is a good idea in SCSI and not
> in NVMe, when there is no performance penalty.

While defining a zone append command for SCSI/ZBC is possible (using sense data
for returning the written offset), there is no way to define zone append for
SATA/ZAC without entirely breaking the ATA command model. This is why we went
after an emulation implementation instead of trying to standardize native
commands. That implementation does not have any performance impact over regular
writes *and* zone write locking does not in general degrade HDD write
performance (only a few corner cases suffer from it). Comparing things equally,
the same could be said of NVMe drives that do not have native zone append
support: performance will be essentially the same using regular writes and
emulated zone append. But mq-deadline and zone write locking will significantly
lower performance for emulated zone append compared to native zone append
support in the drive.
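
To make the comparison concrete, here is a minimal Python sketch of what
emulation amounts to. It is a toy model under stated assumptions, not the sd
or NVMe driver code: Zone, ZoneAppendEmulator and their methods are names made
up for illustration, and a threading.Lock stands in for zone write locking.

```python
import threading


class Zone:
    """Hypothetical device-side model of one sequential-write-required zone."""

    def __init__(self, start, size):
        self.start = start      # first LBA of the zone
        self.size = size        # zone size in blocks
        self.wp = start         # device write pointer

    def write(self, lba, nr_blocks):
        # A regular write is only accepted at the write pointer.
        if lba != self.wp:
            raise ValueError("unaligned write")
        if lba + nr_blocks > self.start + self.size:
            raise ValueError("write crosses zone boundary")
        self.wp += nr_blocks


class ZoneAppendEmulator:
    """Hypothetical host-side emulation of zone append with regular writes."""

    def __init__(self, zone):
        self.zone = zone
        self.cached_wp = zone.wp        # host-side copy of the write pointer
        self.lock = threading.Lock()    # stand-in for zone write locking

    def append(self, nr_blocks):
        # The lock is held across the whole write, so at most one write
        # per zone is in flight at any time.
        with self.lock:
            lba = self.cached_wp
            self.zone.write(lba, nr_blocks)
            # Advance the cached pointer only after the write succeeded.
            self.cached_wp = lba + nr_blocks
            return lba                  # the "written offset" given to the caller
```

Because the lock is held for the duration of each write, writes to a zone are
serialized on the host. A drive with native zone append picks the write
location itself and has no such host-side serialization.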


--
Damien Le Moal
Western Digital Research