Re: [PATCH v4 2/2] io_uring: Add support for napi_busy_poll

From: Hao Xu
Date: Wed Mar 02 2022 - 01:27:42 EST



On 3/2/22 04:06, Olivier Langlois wrote:
On Wed, 2022-03-02 at 02:31 +0800, Hao Xu wrote:
+       ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
+       if (!ne)
+               goto out;
IMHO, we need to handle -ENOMEM here; I cut off the error handling when
I did the quick coding. Sorry for misleading.
If you are correct, I would be shocked about this.

I went back to my 'Linux Device Drivers' book, and nowhere is it
mentioned that kmalloc() can return anything other than a pointer.

No mention at all about the return value in the man page:
https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html
API doc:
https://www.kernel.org/doc/html/latest/core-api/mm-api.html?highlight=kmalloc#c.kmalloc

header file:
https://elixir.bootlin.com/linux/latest/source/include/linux/slab.h#L522

I did browse the kmalloc code. There are a lot of paths to cover,
but from a preliminary reading, it pretty much seems that kmalloc only
returns a valid pointer or NULL...

/**
* kmem_cache_alloc - Allocate an object
* @cachep: The cache to allocate from.
* @flags: See kmalloc().
*
* Allocate an object from this cache. The flags are only relevant
* if the cache has no available objects.
*
* Return: pointer to the new object or %NULL in case of error
*/
/**
* __do_kmalloc - allocate memory
* @size: how many bytes of memory are required.
* @flags: the type of memory to allocate (see kmalloc).
* @caller: function caller for debug tracking of the caller
*
* Return: pointer to the allocated memory or %NULL in case of error
*/

I'll need someone else to confirm the possible kmalloc() return
values, perhaps with an example.

I am a bit skeptical that something special needs to be done here...

Or perhaps you are suggesting that io_add_napi() returns an error code
when allocation fails.
This is what I mean.

as done here:
https://elixir.bootlin.com/linux/latest/source/arch/alpha/kernel/core_marvel.c#L867

If that is what you suggest, what would this info do for the caller?

IMHO, it wouldn't help in any way...

Hmm, I'm not sure. You're probably right, given that ENOMEM here shouldn't
fail the arm poll. But if we want to do it, we can do something like what we
do for kmalloc() in io_arm_poll_handler(). I'll leave it to others.
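
To make the two positions above concrete, here is a minimal user-space sketch, not the patch's code. It assumes an allocator that, like kmalloc() as documented above, returns either a valid pointer or NULL, and shows a helper that reports -ENOMEM to its caller instead of silently bailing out. The names napi_entry_sketch, add_napi_sketch, alloc_fn and failing_alloc are hypothetical, and malloc() stands in for kmalloc().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct napi_entry. */
struct napi_entry_sketch {
	unsigned int napi_id;
	struct napi_entry_sketch *next;
};

/* Allocator hook so that the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t size);

/* Always-failing allocator, used to simulate memory pressure. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Push a new entry onto the list.
 * Returns 0 on success and -ENOMEM if the allocation fails.
 * The allocator yields a valid pointer or NULL, so a plain
 * NULL check is all that is needed.
 */
static int add_napi_sketch(struct napi_entry_sketch **head,
			   unsigned int napi_id, alloc_fn alloc)
{
	struct napi_entry_sketch *ne;

	assert(head != NULL && alloc != NULL);

	ne = alloc(sizeof(*ne));
	if (!ne)
		return -ENOMEM;

	ne->napi_id = napi_id;
	ne->next = *head;
	*head = ne;
	return 0;
}
```

A caller that treats busy polling as best effort can simply ignore the return value, which is the behaviour argued for above; a caller that cares can propagate it. In both cases the list is left unchanged when the allocation fails.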

@@ -7519,7 +7633,11 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
                    !(ctx->flags & IORING_SETUP_R_DISABLED))
                        ret = io_submit_sqes(ctx, to_submit);
                mutex_unlock(&ctx->uring_lock);
-
+#ifdef CONFIG_NET_RX_BUSY_POLL
+               if (!list_empty(&ctx->napi_list) &&
+                   io_napi_busy_loop(&ctx->napi_list))
I'm afraid we may need a lock for sqpoll too, since io_add_napi() could be
in iowq context.

I'll take a look at the lock stuff of this patch tomorrow, too late now
in my timezone.
Ok, please do. I'm not a big user of io workers. I may have omitted to
consider this possibility.

If that is the case, I think that this would be very easy to fix by
locking the spinlock while __io_sq_thread() is using napi_list.
How about:

if (list is singular) {
     do something;
     return;
}

while (!io_busy_loop_end() && io_napi_busy_loop())
     ;

is there a concern with the current code?
What would be the benefit of your suggestion over current code?

No, it's just a coding style concern, since I see

do {
    if () {
        break;
    }
} while ();

which means the if statement is actually not in the loop. Anyway, it's just
personal taste.


To me, it seems that if io_blocking_napi_busy_loop() is called, a
reasonable expectation would be that some busy looping is done or else
you could return the function without doing anything which would, IMHO,
be misleading.

By definition, napi_busy_loop() is not blocking and if you desire the
device to be in busy poll mode, you need to do it once in a while or
else, after a certain time, the device will return back to its
interrupt mode.

IOW, io_blocking_napi_busy_loop() follows the same logic used by
napi_busy_loop(), which does not call loop_end() before having performed
one loop iteration.
I see, thanks for explanation. I'm ok with this.

Btw, start_time seems to be unused in the singular branch.
I know. This is why it is conditionally initialized.

like what I said, just personal taste.


+static void io_blocking_napi_busy_loop(struct list_head *napi_list,
+                                      struct io_wait_queue *iowq)
+{
+       if (list_is_singular(napi_list)) {
+               struct napi_entry *ne =
+                       list_first_entry(napi_list,
+                                        struct napi_entry, list);
+
+               napi_busy_loop(ne->napi_id, io_busy_loop_end, iowq,
+                              true, BUSY_POLL_BUDGET);
+               io_check_napi_entry_timeout(ne);
+               return;
+       }
+
+       while (io_napi_busy_loop(napi_list)) {
+               if (io_busy_loop_end(iowq, busy_loop_current_time()))
+                       break;
+       }
+}



Greetings,