Re: [PATCH v2 0/2] zone-append support in io-uring and aio

From: Kanchan Joshi
Date: Fri Jun 26 2020 - 18:18:49 EST


On Fri, Jun 26, 2020 at 03:11:55AM +0000, Damien Le Moal wrote:
On 2020/06/26 2:18, Kanchan Joshi wrote:
Semantics --->
Zone-append, by its nature, may perform the write at a different location than
the one specified. It does not fit into POSIX, and trying to make it fit may just
undermine its benefit. It may be better to keep the semantics as close to
zone-append as possible, i.e. specify the zone-start location and obtain the
actual write location after completion. Towards that goal, the existing async
APIs seem to fit fine. Async APIs (uring, linux aio) do not work on an implicit
write-pointer and demand an explicit write offset (which is what we need for
append). Neither write-pointer

What do you mean by "implicit write pointer"? Are you referring to the behavior
of an AIO write with a block device file opened with O_APPEND? Then yes, it does
not work. But that is perfectly fine for regular files, that is, for zonefs.
Sorry, I meant file pointer.
Yes, a block device opened with O_APPEND does not advance the file pointer
to end-of-device. That said, for uring and aio, the file-pointer position
plays no role, and it is the application's responsibility to pass the right
write location.
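
To illustrate the point about explicit offsets, here is a minimal sketch using
Linux native AIO through raw syscalls (no libaio). It runs against a temporary
regular file, not a zoned block device, and the helper names aio_write_at() and
aio_offset_demo() are made up for this example. The write location is taken
from the iocb, and the file position of the descriptor is neither consulted
nor moved.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: submit one AIO write at an explicit offset and wait
 * for its completion. Returns the completion result (bytes written) or -1.
 */
static long long aio_write_at(int fd, const void *buf, size_t len, long long off)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long long res = -1;

    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off; /* location comes from the request, not from the fd */

    if (syscall(SYS_io_submit, ctx, 1, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1)
        res = ev.res;

    syscall(SYS_io_destroy, ctx);
    return res;
}

/*
 * Demo on a temporary regular file: write 4 bytes at offset 8 and report
 * the descriptor's file position afterwards through *pos_after.
 */
static long long aio_offset_demo(off_t *pos_after)
{
    char path[] = "/tmp/aio_demo_XXXXXX";
    long long res;
    int fd;

    assert(pos_after != NULL);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    res = aio_write_at(fd, "zone", 4, 8);
    *pos_after = lseek(fd, 0, SEEK_CUR);

    close(fd);
    unlink(path);
    return res;
}
```

Calling aio_offset_demo() should leave the reported file position where it
started, even though data was written further into the file.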
I would prefer that this paragraph simply state the semantic that is implemented
first. Then explain why the choice. But first, clarify how the API works, what
is allowed, what's not etc. That will also simplify reviewing the code as one
can then check the code against the goal.

In this path (block IO) there is hardly any scope for, or attempt at, abstracting
anything away. So raw zoned-storage rules/semantics apply. I expect zone-aware
applications, which already know the rules, to be the consumers of this.

is taken as input, nor is it updated on completion. And there is a clear way to
get the zone-append result. Zone-aware applications using these async APIs
can be fine with, for lack of a better word, zone-append semantics itself.
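
The semantics described above (pass the zone-start location, learn the actual
write location on completion) can be sketched with a toy in-memory model. This
is not the code from the patches; struct zone_model and zone_model_append() are
hypothetical names, offsets are in bytes, and nothing here touches a device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of one sequential-write zone. */
struct zone_model {
    uint64_t start;    /* absolute offset of the first byte of the zone */
    uint64_t capacity; /* writable bytes in the zone */
    uint64_t wp;       /* device-side write pointer, absolute offset */
};

/*
 * Model of an append. The caller names the zone by its start offset only.
 * On success, returns 0 and stores the offset where the data landed in
 * *written_at. Returns -1 if the target is not the zone start or the
 * data does not fit in the remaining space.
 */
static int zone_model_append(struct zone_model *z, uint64_t target,
                             uint64_t len, uint64_t *written_at)
{
    assert(z != NULL && written_at != NULL);

    if (target != z->start)
        return -1;
    if (len > z->start + z->capacity - z->wp)
        return -1;

    *written_at = z->wp; /* reported back only after the "write" is placed */
    z->wp += len;
    return 0;
}
```

Two appends issued against the same zone start land back to back, and each
caller finds out where its data went only from the value returned on
completion.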

Sync APIs work with an implicit write-pointer (at least a few of them), and there
is no way to obtain the zone-append result, making user-space zone-append hard.

Sync APIs are executed under the inode lock, at least for regular files. So there
is absolutely no problem using zone append. zonefs does it already. The problem
is the lack of locking for block device files.
Yes. I was referring to the problem of returning the actual write location using
sync APIs like write, pwrite, pwritev/v2.
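
As a small illustration of that limitation, the sketch below calls plain
pwrite() on a temporary regular file (not a zoned device). The helper name
sync_write_demo() is made up for this example. The only value the call hands
back is a byte count, so there is no channel through which a relocated write
offset could be reported to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: positional synchronous write to a temporary file.
 * Returns whatever pwrite() returned (bytes written, or -1 on error) and
 * reports the descriptor's file position afterwards through *pos_after.
 */
static ssize_t sync_write_demo(const char *msg, off_t off, off_t *pos_after)
{
    char path[] = "/tmp/sync_demo_XXXXXX";
    ssize_t ret;
    int fd;

    assert(msg != NULL && pos_after != NULL);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    /* The return value is a length, never a location. */
    ret = pwrite(fd, msg, strlen(msg), off);
    *pos_after = lseek(fd, 0, SEEK_CUR);

    close(fd);
    unlink(path);
    return ret;
}
```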

Tests --->
Using new interface in fio (uring and libaio engine) by extending zbd tests
for zone-append: https://github.com/axboe/fio/pull/1026

Changes since v1:
- No new opcodes in uring or aio. Use RWF_ZONE_APPEND flag instead.
- linux-aio changes vanish because of no new opcode
- Fixed the overflow and other issues mentioned by Damien
- Simplified uring support code, fixed the issues mentioned by Pavel
- Added error checks

Kanchan Joshi (1):
fs,block: Introduce RWF_ZONE_APPEND and handling in direct IO path

Selvakumar S (1):
io_uring: add support for zone-append

 fs/block_dev.c          | 28 ++++++++++++++++++++++++----
 fs/io_uring.c           | 32 ++++++++++++++++++++++++++++++--
 include/linux/fs.h      |  9 +++++++++
 include/uapi/linux/fs.h |  5 ++++-
 4 files changed, 67 insertions(+), 7 deletions(-)



--
Damien Le Moal
Western Digital Research