Re: [PATCH v3 0/4] open/accept directly into io_uring fixed file table

From: Jens Axboe
Date: Tue Aug 24 2021 - 10:02:38 EST


On 8/24/21 3:48 AM, Pavel Begunkov wrote:
> On 8/23/21 8:40 PM, Jens Axboe wrote:
>> On 8/23/21 1:13 PM, Josh Triplett wrote:
>>> On Sat, Aug 21, 2021 at 08:18:12PM -0600, Jens Axboe wrote:
>>>> On 8/21/21 9:52 AM, Pavel Begunkov wrote:
>>>>> Add an optional feature to open/accept directly into io_uring's fixed
>>>>> file table bypassing the normal file table. Same behaviour if as the
>>>>> snippet below, but in one operation:
>>>>>
>>>>> sqe = prep_[open,accept](...);
>>>>> cqe = submit_and_wait(sqe);
>>>>> io_uring_register_files_update(uring_idx, (fd = cqe->res));
>>>>> close((fd = cqe->res));
>>>>>
>>>>> The idea in pretty old, and was brough up and implemented a year ago
>>>>> by Josh Triplett, though haven't sought the light for some reasons.
>>>>>
>>>>> The behaviour is controlled by setting sqe->file_index, where 0 implies
>>>>> the old behaviour. If non-zero value is specified, then it will behave
>>>>> as described and place the file into a fixed file slot
>>>>> sqe->file_index - 1. A file table should be already created, the slot
>>>>> should be valid and empty, otherwise the operation will fail.
>>>>>
>>>>> we can't use IOSQE_FIXED_FILE to switch between modes, because accept
>>>>> takes a file, and it already uses the flag with a different meaning.
>>>>>
>>>>> since RFC:
>>>>> - added attribution
>>>>> - updated descriptions
>>>>> - rebased
>>>>>
>>>>> since v1:
>>>>> - EBADF if slot is already used (Josh Triplett)
>>>>> - alias index with splice_fd_in (Josh Triplett)
>>>>> - fix a bound check bug
>>>>
>>>> With the prep series, this looks good to me now. Josh, what do you
>>>> think?
>>>
>>> I would still like to see this using a union with the `nofile` field in
>>> io_open and io_accept, rather than overloading the 16-bit buf_index
>>> field. That would avoid truncating to 16 bits, and make less work for
>>> expansion to more than 16 bits of fixed file indexes.
>>>
>>> (I'd also like that to actually use a union, rather than overloading the
>>> meaning of buf_index/nofile.)
>>
>> Agree, and in fact there's room in the open and accept command parts, so
>> we can just make it a separate entry there instead of using ->buf_index.
>> Then just pass in the index to io_install_fixed_file() instead of having
>> it pull it from req->buf_index.
>
> That's internal details, can be expanded at wish in the future, if we'd
> ever need larger tables. ->buf_index already holds indexes to different
> resources just fine.

Sure it's internal and can always be changed, doesn't change the fact
that it's a bit iffy that it's used differently in different spots. As
it costs us nothing to simply add a 'fixed_file' u32 for io_accept and
io_open, I really think that should be done instead.

> Aliasing with nofile would rather be ugly, so the only option, as you
> mentioned, is to grab some space from open/accept structs, but don't see
> why we'd want it when there is a more convenient alternative.

Because it's a lot more readable and less error prone imho. Agree on the
union, we don't have to resort to that.

>>> I personally still feel that using non-zero to signify index-plus-one is
>>> both error-prone and not as future-compatible. I think we could do
>>> better with no additional overhead. But I think the final call on that
>>> interface is up to you, Jens. Do you think it'd be worth spending a flag
>>> bit or using a different opcode, to get a cleaner interface? If you
>>> don't, then I'd be fine with seeing this go in with just the io_open and
>>> io_accept change.
>>
>> I'd be inclined to go the extra opcode route instead, as the flag only
>> really would make sense to requests that instantiate file descriptors.
>> For this particular case, we'd need 3 new opcodes for
>> openat/openat2/accept, which is probably a worthwhile expenditure.
>>
>> Pavel, what do you think? Switch to using a different opcode for the new
>> requests, and just grab some space in io_open and io_accept for the fd
>> and pass it in to install.
>
> I don't get it, why it's even called hackish? How that's anyhow better?
> To me the feature looks like a natural extension to the operations, just
> like a read can be tuned with flags, so and creating new opcodes seems
> a bit ugly, unnecessary taking space from opcodes and adding duplication
> (even if both versions call the same handler).

I agree that it's a natural extension, the problem is that we have to do
unnatural things (somewhat) to make it work. I'm fine with using the
union for the splice_fd_in to pass it in, I don't think it's a big deal.

I do wish that IORING_OP_CLOSE would work with them, though. I think we
should to that as a followup patch. It's a bit odd to be able to open a
file with IORING_OP_OPENAT and not being able to close it with
IORING_OP_CLOSE. For the latter, we should just give it fixed file
support, which would be pretty trivial.

> First, why it's not future-compatible? It's a serious argument, but I
> don't see where it came from. Do I miss something?
>
> It's u32 now, and so will easily cover all indexes. SQE fields should
> always be zeroed, that's a rule, liburing follows it, and there would
> have been already lots of problems for users not honoring it. And there
> will be a helper hiding all the index conversions for convenience.
>
> void io_uring_prep_open_direct(sqe, index, ...)
> {
> io_uring_prep_open(sqe, ...);
> sqe->file_index = index + 1;
> }

Let's keep it the way that it is, but I do want to see the buf_index
thing go away and just req->open.fixed_file or whatever being used for
open and accept. We should fold that in.

--
Jens Axboe