Re: [RFC 00/12] io_uring zerocopy send

From: Pavel Begunkov
Date: Fri Dec 03 2021 - 11:19:42 EST


On 12/2/21 21:25, Willem de Bruijn wrote:
What if the ubuf pool can be found from the sk, and the index in that
pool is passed as a cmsg?

It looks to me that ubufs are by nature not tightly bound to a
socket (at least for the io_uring API in the patchset), so
it'll be pretty ugly:

1) io_uring would need to take care of registering the pool in the
socket. Having multiple rings using the same socket would be horrible.
It may be that it doesn't make much sense to send in parallel from
multiple rings, but a per-thread io_uring is a popular solution, and
then someone would want to pass a socket from one thread to another and
we'd need to support it.

2) And io_uring would also need to unregister it, so the pool would
have to store a list of the sockets where it's used, and so hold
references to those sockets. Then we'd need to bind it somehow to
io_uring fixed files, or register all that for tracking circular
reference dependencies.

3) IIRC, we can't add a cmsg entry from the kernel, right? I may be wrong,
but if so I don't like exposing what is basically io_uring's referencing
through cmsg. And it sounds like io_uring would need to parse cmsg then.


A lot of nuances :) I'd really prefer to pass it on a per-request basis,

Ok

it's much cleaner, but still haven't got what's up with msghdr
initialisation...

And passing the struct through multiple layers of functions.

If you refer to ip_make_skb(ubuf) -> __ip_append_data(ubuf), I agree
it's a bit messier, will see what can be done. If you mean
msghdr::msg_ubuf, for me it's more like passing a callback,
which sounds like a normal thing to do.
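
To illustrate the "more like passing a callback" point, here is a rough
userspace sketch, not the patchset code. All names (toy_ubuf, cb_msghdr,
toy_sendmsg, toy_append_data, count_completion) are hypothetical, and the
callback fires synchronously here only to keep the example short; in the
real stack the notification would come later, once the data is no longer
referenced.

```c
#include <stddef.h>

struct toy_ubuf;

/* Hypothetical completion callback type; not the kernel's signature. */
typedef void (*toy_ubuf_cb)(struct toy_ubuf *ubuf, int success);

/* Toy notifier object: the caller owns it and decides what "done" means. */
struct toy_ubuf {
	toy_ubuf_cb callback;
	int completions;
};

/* Toy message header: the notifier rides along as one optional pointer. */
struct cb_msghdr {
	struct toy_ubuf *msg_ubuf;	/* NULL: no notification requested */
};

/* Example callback: count successful completions. */
static inline void count_completion(struct toy_ubuf *ubuf, int success)
{
	if (success)
		ubuf->completions++;
}

/* Stand-in for a lower layer that receives the ubuf as an argument. */
static inline int toy_append_data(struct toy_ubuf *ubuf, size_t len)
{
	/* Simplification: notify immediately instead of on buffer release. */
	if (ubuf && ubuf->callback)
		ubuf->callback(ubuf, len > 0);
	return 0;
}

/* Stand-in for sendmsg: it only forwards what the header carries. */
static inline int toy_sendmsg(struct cb_msghdr *msg, size_t len)
{
	return toy_append_data(msg->msg_ubuf, len);
}
```

The socket itself never learns about any pool in this sketch; the
per-request header is the only thing that carries the notifier.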


Maybe it's better to add a flags field, which would include
"msg_control_is_user : 1" and whether msghdr includes msg_iocb, msg_ubuf,
and everything else that may be optional. Does it sound sane?

If sendmsg takes the argument, it will just have to be initialized, I think.

Other functions are not aware of its existence so it can remain
uninitialized there.

Got it. I need to double check, but it looks like 1/12 should be
something like what you outlined.

And if there are multiple optional fields that have to be
initialised, we would be able to hide all the zeroing under a
single bitmask. E.g. instead of

msg->field1 = NULL;
...
msg->fieldN = NULL;

It may look like

msg->mask = 0; // HAS_FIELD1 | HAS_FIELDN;
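
A slightly fuller sketch of the same idea, in plain userspace C. It is
only an illustration under assumptions: toy_msghdr, the MSG_HAS_* flag
names and the helper functions are all made up here and are not the
kernel's struct msghdr or anything from the patchset. The point it shows
is that one store to the mask replaces N pointer stores, as long as every
reader checks the mask before touching an optional field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names; not taken from the kernel or the patchset. */
#define MSG_HAS_IOCB	(1u << 0)
#define MSG_HAS_UBUF	(1u << 1)

/* Toy stand-in for struct msghdr; only the optional fields are modelled. */
struct toy_msghdr {
	unsigned int mask;	/* which optional fields below are valid */
	void *iocb;		/* meaningful only if MSG_HAS_IOCB is set */
	void *ubuf;		/* meaningful only if MSG_HAS_UBUF is set */
};

/* One store marks every optional field absent; pointers may hold garbage. */
static inline void toy_msg_init(struct toy_msghdr *msg)
{
	msg->mask = 0;
}

/* Setting a field stores the pointer and flips the matching bit. */
static inline void toy_msg_set_ubuf(struct toy_msghdr *msg, void *ubuf)
{
	msg->ubuf = ubuf;
	msg->mask |= MSG_HAS_UBUF;
}

/* Readers consult the mask first and never read an unset pointer. */
static inline void *toy_msg_ubuf(const struct toy_msghdr *msg)
{
	return (msg->mask & MSG_HAS_UBUF) ? msg->ubuf : NULL;
}

static inline void *toy_msg_iocb(const struct toy_msghdr *msg)
{
	return (msg->mask & MSG_HAS_IOCB) ? msg->iocb : NULL;
}
```

The cost is that direct reads of the optional fields would have to go
through accessors like these, otherwise a stale pointer could be used.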

--
Pavel Begunkov