Re: [PATCH 3/5] io_uring: let fast poll support multishot

From: Hao Xu
Date: Sat May 07 2022 - 03:08:16 EST


On 2022/5/7 1:19 AM, Pavel Begunkov wrote:
On 5/6/22 08:01, Hao Xu wrote:
From: Hao Xu <howeyxu@xxxxxxxxxxx>

For operations like accept, multishot is a useful feature, since we can
reduce the number of accept SQEs. Let's integrate it with fast poll; it
may be useful for other operations in the future.

Signed-off-by: Hao Xu <howeyxu@xxxxxxxxxxx>
---
  fs/io_uring.c | 41 ++++++++++++++++++++++++++---------------
  1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8ebb1a794e36..d33777575faf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5952,7 +5952,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
   * either spurious wakeup or multishot CQE is served. 0 when it's done with
   * the request, then the mask is stored in req->cqe.res.
   */
-static int io_poll_check_events(struct io_kiocb *req, bool locked)
+static int io_poll_check_events(struct io_kiocb *req, bool *locked)
  {
      struct io_ring_ctx *ctx = req->ctx;
      int v;
@@ -5981,17 +5981,26 @@ static int io_poll_check_events(struct io_kiocb *req, bool locked)
          /* multishot, just fill an CQE and proceed */
          if (req->cqe.res && !(req->apoll_events & EPOLLONESHOT)) {
-            __poll_t mask = mangle_poll(req->cqe.res & req->apoll_events);
-            bool filled;
-
-            spin_lock(&ctx->completion_lock);
-            filled = io_fill_cqe_aux(ctx, req->cqe.user_data, mask,
-                         IORING_CQE_F_MORE);
-            io_commit_cqring(ctx);
-            spin_unlock(&ctx->completion_lock);
-            if (unlikely(!filled))
-                return -ECANCELED;
-            io_cqring_ev_posted(ctx);
+            if (req->flags & REQ_F_APOLL_MULTISHOT) {
+                io_tw_lock(req->ctx, locked);
+                if (likely(!(req->task->flags & PF_EXITING)))
+                    io_queue_sqe(req);

That looks dangerous: io_queue_sqe() usually takes ownership of the request
and doesn't expect that someone, i.e. io_poll_check_events(), may still be
actively using it.

E.g. io_accept() fails on fd < 0 and returns an error, then
io_queue_sqe() -> io_queue_async() -> io_req_complete_failed()
kills it, while io_poll_check_events() and polling in general
carry on using the freed request => UAF. Didn't look at it
too carefully, but there might be other similar cases.
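
A minimal userspace model of that hazard, just to make the shape of the bug
concrete (hypothetical names: submit_req() stands in for io_queue_sqe() and
poll_work() for io_poll_check_events(); a sketch, not kernel code):

#include <stdlib.h>

struct req {
	int res;	/* result, e.g. from accept() */
	int poll_refs;
};

/* stands in for io_queue_sqe(): takes ownership, may free on failure */
static void submit_req(struct req *req)
{
	if (req->res < 0) {
		/* the io_queue_async() -> io_req_complete_failed() path */
		free(req);
		return;
	}
	/* ... normal issue path ... */
}

/* stands in for io_poll_check_events(): keeps using req after submit */
static void poll_work(struct req *req)
{
	submit_req(req);
	req->poll_refs--;	/* use-after-free if submit_req() freed it */
}

int main(void)
{
	struct req *req = malloc(sizeof(*req));

	req->poll_refs = 1;
	req->res = -1;		/* model io_accept() getting fd < 0 */
	poll_work(req);
	return 0;
}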

I checked this when I did the coding; it seems the only such case is
while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
which uses req again after the req has been recycled in the io_queue_sqe()
path like you pointed out above, but that case should be ok since we
haven't reused the struct req{} at that point.
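
For context, that line is the tail of the ref-drop loop at the bottom of
io_poll_check_events(); a self-contained C11 model of just that loop
(illustrative mask value; atomic_fetch_sub() stands in for the kernel's
atomic_sub_return(), which returns the new value rather than the old one):

#include <stdatomic.h>
#include <stdio.h>

/* low bits hold the reference count (value illustrative) */
#define IO_POLL_REF_MASK	((1u << 20) - 1)

struct req {
	atomic_uint poll_refs;
};

int main(void)
{
	struct req req = { .poll_refs = 2 };
	unsigned int v, left;

	do {
		v = atomic_load(&req.poll_refs);
		/* ... handle events; the submit path may run in here ... */

		/*
		 * This read-modify-write is the access that still touches
		 * req->poll_refs after the submit path may have taken over
		 * the request.
		 */
		left = atomic_fetch_sub(&req.poll_refs, v & IO_POLL_REF_MASK)
		       - (v & IO_POLL_REF_MASK);
	} while (left);

	printf("poll refs drained\n");
	return 0;
}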
In my first version, I skipped the do-while{} in io_poll_check_events()
for multishot apoll and did the reap in io_req_task_submit():

static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
{
	int ret;

	ret = io_poll_check_events(req, locked);
	if (ret > 0)
		return;

	__io_poll_clean(req);

	if (!ret)
		io_req_task_submit(req, locked);    <------ here
	else
		io_req_complete_failed(req, ret);
}

But the disadvantage is that in a high-frequency workload it may loop in
io_poll_check_events() for a long time, and the CQEs are only generated at
the end, in the io_req_task_submit() above, which is not good in terms of
latency.
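
A toy illustration of that latency difference (plain C, hypothetical event
count): with per-iteration posting each completion becomes visible as soon
as its event is handled, while the v1 above makes even the first completion
wait until the whole loop has drained:

#include <stdio.h>

#define NR_EVENTS 4

int main(void)
{
	/* this patch: CQE posted inside the event loop */
	for (int i = 0; i < NR_EVENTS; i++)
		printf("in-loop:   event %d -> CQE visible after %d event(s)\n",
		       i, i + 1);

	/* v1: CQEs generated only once the loop has drained */
	for (int i = 0; i < NR_EVENTS; i++)
		printf("post-loop: event %d -> CQE visible after %d event(s)\n",
		       i, NR_EVENTS);

	return 0;
}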