Re: [RFC v3 0/9] fixed worker

From: Hao Xu
Date: Sun May 01 2022 - 02:32:48 EST


On 4/30/22 21:11, Jens Axboe wrote:
> On 4/29/22 4:18 AM, Hao Xu wrote:
>> This is the third version of the fixed worker implementation.
>> I wrote a nop test program to test it: 3 fixed workers vs. 3 normal workers.
>> normal workers:
>> ./run_nop_wqe.sh nop_wqe_normal 200000 100 3 1-3
>> time spent: 10464397 usecs IOPS: 1911242
>> time spent: 9610976 usecs IOPS: 2080954
>> time spent: 9807361 usecs IOPS: 2039284
>>
>> fixed workers:
>> ./run_nop_wqe.sh nop_wqe_fixed 200000 100 3 1-3
>> time spent: 17314274 usecs IOPS: 1155116
>> time spent: 17016942 usecs IOPS: 1175299
>> time spent: 17908684 usecs IOPS: 1116776

> I saw these numbers in v2 as well, and I have to admit I don't
> understand them. Because on the surface, it sure looks like the first
> set of results (labeled "normal") are better than the second "fixed"
> set. Am I reading them wrong, or did you transpose them?
Sorry, I transposed them. The first set of results is from the fixed
workers and the second set is from the normal workers.
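For reference, each IOPS figure is the total request count divided by the
elapsed time. The helper below reproduces every printed value. It is a
sketch that assumes run_nop_wqe.sh's first two arguments (200000 and 100)
multiply to the 20,000,000 nop requests issued per run, because the script
itself isn't shown. nop_iops() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper. Assumption: arg1 * arg2 is the total number of nop
 * requests completed in one run (200000 * 100 = 20,000,000 here).
 */
static uint64_t nop_iops(uint64_t arg1, uint64_t arg2, uint64_t usecs)
{
	uint64_t total_ops = arg1 * arg2;

	assert(usecs > 0);
	/* Integer division truncates, which matches the printed IOPS values. */
	return total_ops * 1000000ULL / usecs;
}
```

For example, nop_iops(200000, 100, 10464397) gives 1911242. With the labels
swapped back, the fixed workers average about 2.01M IOPS against about 1.15M
for the normal workers, roughly 1.75x.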

> I think this patch series would benefit from a higher level description
> of what fixed workers mean in this context. How are they different from
> the existing workers, and why would they improve things?
Sure. I put that in patch 7/9; I'll move it to the cover letter as
well.

>> things to be done:
>> - work cancellation still needs more thought

> Can you expand? What are the challenges with fixed workers and
> cancelation?
Currently, when a fixed worker fetches all the work items from its private
work list, I use a temporary acct struct to hold them. This means that, at
that moment, cancellation cannot find these work items: they are about to
run but are no longer on the private work list. This shouldn't be a big
problem; adding another acct member to io_worker{} should be enough to
resolve it.
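A rough user-space model of that fix follows. It assumes the extra acct
member behaves like a second "running" list that cancellation also searches.
The worker moves its batch from the private list into that list while
holding both locks, so no window exists in which a work item is invisible to
cancel. All names here (struct fixed_worker, fw_grab_batch(), fw_cancel(),
and the others) are hypothetical, and the model is not the real io-wq
structures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the real io-wq data structures. */
struct work {
	int id;
	struct work *next;
};

struct work_list {
	pthread_mutex_t lock;
	struct work *head;
};

struct fixed_worker {
	struct work_list private_list; /* submitters queue work here */
	struct work_list running_list; /* batch taken by the worker, still visible to cancel */
};

static void fw_init(struct fixed_worker *w)
{
	pthread_mutex_init(&w->private_list.lock, NULL);
	pthread_mutex_init(&w->running_list.lock, NULL);
	w->private_list.head = NULL;
	w->running_list.head = NULL;
}

/* Submitter side: push onto the private list (LIFO, for brevity). */
static void fw_queue(struct fixed_worker *w, struct work *work)
{
	pthread_mutex_lock(&w->private_list.lock);
	work->next = w->private_list.head;
	w->private_list.head = work;
	pthread_mutex_unlock(&w->private_list.lock);
}

/* Worker side: splice the private list into running_list under both locks. */
static void fw_grab_batch(struct fixed_worker *w)
{
	pthread_mutex_lock(&w->private_list.lock);
	pthread_mutex_lock(&w->running_list.lock);
	assert(w->running_list.head == NULL); /* previous batch fully drained */
	w->running_list.head = w->private_list.head;
	w->private_list.head = NULL;
	pthread_mutex_unlock(&w->running_list.lock);
	pthread_mutex_unlock(&w->private_list.lock);
}

/* Worker side: pop the next item to execute, or NULL when the batch is done. */
static struct work *fw_next(struct fixed_worker *w)
{
	struct work *work;

	pthread_mutex_lock(&w->running_list.lock);
	work = w->running_list.head;
	if (work)
		w->running_list.head = work->next;
	pthread_mutex_unlock(&w->running_list.lock);
	return work;
}

/* Unlink the item with the given id from one list; caller holds its lock. */
static bool list_remove(struct work_list *l, int id)
{
	for (struct work **pp = &l->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			*pp = (*pp)->next;
			return true;
		}
	}
	return false;
}

/* Cancel side: search both lists, locking in the same order as fw_grab_batch(). */
static bool fw_cancel(struct fixed_worker *w, int id)
{
	bool found;

	pthread_mutex_lock(&w->private_list.lock);
	pthread_mutex_lock(&w->running_list.lock);
	found = list_remove(&w->private_list, id) ||
		list_remove(&w->running_list, id);
	pthread_mutex_unlock(&w->running_list.lock);
	pthread_mutex_unlock(&w->private_list.lock);
	return found;
}
```

Suppose three items are queued and then grabbed. fw_cancel() can still find
one of them in running_list. With a plain temporary list, that item would
have been unreachable.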

>> - not sure IO_WORKER_F_EXIT is synchronized safely enough
>> - the iowq hash stuff is not compatible with fixed workers for now

> We might need to extract the hashing out a bit so it's not as tied to
> the existing implementation.
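One possible shape for the extracted hashing, sketched purely under
assumptions: a standalone "hash gate" holds a bitmap of busy buckets shared
by every worker of a wq. Before running hashed work, any worker (normal or
fixed) claims the item's bucket, so serialization no longer depends on the
worker type. struct hash_gate, HASH_BUCKETS, and the hash_gate_*() functions
are hypothetical names, not existing io-wq code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: at most one work item per hash bucket runs at a time. */
#define HASH_BUCKETS 64

struct hash_gate {
	_Atomic uint64_t busy; /* bit N set => bucket N has a running item */
};

static void hash_gate_init(struct hash_gate *g)
{
	atomic_init(&g->busy, 0);
}

/* Returns true if the caller now owns the bucket and may run the work. */
static bool hash_gate_try_claim(struct hash_gate *g, unsigned int bucket)
{
	uint64_t bit;

	assert(bucket < HASH_BUCKETS);
	bit = 1ULL << bucket;
	/* atomic_fetch_or returns the old value; success only if the bit was clear. */
	return !(atomic_fetch_or(&g->busy, bit) & bit);
}

/* Called by the owning worker once the hashed work item has finished. */
static void hash_gate_release(struct hash_gate *g, unsigned int bucket)
{
	uint64_t bit;

	assert(bucket < HASH_BUCKETS);
	bit = 1ULL << bucket;
	atomic_fetch_and(&g->busy, ~bit);
}
```

In this sketch, a worker that fails to claim a bucket would leave the item
queued and pick other work. The choice between retrying later and waiting is
left open here.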