Re: [PATCH v6 0/2] Block layer support ZAC/ZBC commands

From: Damien Le Moal
Date: Wed Aug 10 2016 - 00:38:28 EST


Shaun,

On 8/10/16 12:58, Shaun Tancheff wrote:
On Tue, Aug 9, 2016 at 3:09 AM, Damien Le Moal <Damien.LeMoal@xxxxxxxx> wrote:
On Aug 9, 2016, at 15:47, Hannes Reinecke <hare@xxxxxxx> wrote:

[trim]

Since disk type == 0 for everything that isn't HM so I would prefer the
sysfs 'zoned' file just report if the drive is HA or HM.

Okay. So let's put in the 'zoned' attribute the device type:
'host-managed', 'host-aware', or 'device managed'.

I hacked your patches and simply put a "0" or "1" in the sysfs zoned file.
Any drive that has ZBC/ZAC command support gets a "1", "0" for everything
else. This means that drive managed models are not exposed as zoned block
devices. For HM vs HA differentiation, an application can look at the
device type file since it is already present.

We could indeed set the "zoned" file to the device type, but HM drives and
regular drives will both have "0" in it, so no differentiation possible.
The other choice could be the "zoned" bits defined by ZBC, but these
do not define a value for host managed drives, and the drive managed value
being not "0" could be confusing too. So I settled for a simple 0/1 boolean.

This seems good to me.

Another option I forgot is for the "zoned" file to indicate the total number of zones of the device, and 0 for a non zoned regular block device. That would work as well.

[...]
Done: I hacked Shaun ioctl code and added finish zone too. The
difference with Shaun initial code is that the ioctl are propagated down to
the driver (__blkdev_driver_ioctl -> sd_ioctl) so that there is no need for
BIO request definition for the zone operations. So a lot less code added.

The purpose of the BIO flags is not to enable the ioctls so much as
the other way round. Creating BIO op's is to enable issuing ZBC
commands from device mapper targets and file systems without some
heinous ioctl hacks.
Making the resulting block layer interfaces available via ioctls is just a
reasonable way to exercise the code ... or that was my intent.

Yes, I understood your code. However, since (or if) we keep the zone information in the RB-tree cache, there is no need for the report zone operation BIO interface. Same for reset write pointer by keeping the mapping to discard. blk_lookup_zone can be used in kernel as a report zone BIO replacement and works as well for the report zone ioctl implementation. For reset, there is blkdev_issue_discrad in kernel, and the reset zone ioctl becomes equivalent to BLKDISCARD ioctl. These are simple. Open, close and finish zone remains. For these, adding the BIO interface seemed an overkill. Hence my choice of propagating the ioctl to the driver.
This is debatable of course, and adding an in-kernel interface is not hard: we can implement blk_open_zone, blk_close_zone and blk_finish_zone using __blkdev_driver_ioctl. That looks clean to me.

Overall, my concern with the BIO based interface for the ZBC commands is that it adds one flag for each command, which is not really the philosophy of the interface and potentially opens the door for more such implementations in the future with new standards and new commands coming up. Clearly that is not a sustainable path. So I think that a more specific interface for these zone operations is a better choice. That is consistent with what happens with the tons of ATA and SCSI commands not actually doing data I/Os (mode sense, log pages, SMART, etc). All these do not use BIOs and are processed as request REQ_TYPE_BLOCK_PC.

The ioctls do not mimic exactly the ZBC standard. For instance, there is no
reporting options for report zones, nor is the "all" bit supported for open,
close or finish zone commands. But the information provided on zones is complete
and maps to the standard definitions.

For the reporting options I have planned to reuse the stream_id in
struct bio when that is formalized. There are certainly other places in
struct bio to stuff a few extra bits ...

We could add reporting options to blk_lookup_zones to filter the result and use that in the ioctl implementation as well. This can be added without any problem.

As far as the all bit ... this is being handled by all the zone action
commands. Just pass a sector of ~0ul and it's handled in sd.c by
sd_setup_zone_action_cmnd().

Apologies as apparently my documentation here is lacking :-(

Yes, I got it (libzbc does the same actually). I did not add it for simplicity. But indeed may be it should be. The same trick can be used with the ioctl to driver interface. No problems.

I also added a reset_wp ioctl for completeness, but this one simply calls
blkdev_issue_discard internally, so it is in fact equivalent to the BLKDISCARD
ioctl call with a range exactly aligned on a zone.

I'm confused as my patch set includes a Reset WP (BLKRESETZONE) that
creates a REQ_OP_ZONE_RESET .. same as open and close. My
expectation being that BLKDISCARD doesn't really need yet another alias.

Yes, we could remove the BLKRESETZONE ioctl and have applications use the BLKDISCARD ioctl instead. In the kernel, both generate blkdev_issue_discard calls and are thus identical. Only the ioctl interface differs (sector vs range argument). Again, I added it to have the full set of 5 ZBC/ZAC operations mapping to one BLKxxxZONE ioctl. But granted, reset is optional.

Best regards.

--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital brand
Damien.LeMoal@xxxxxxxx
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.hgst.com
Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.