Re: [PATCH v2] EXP rcu: Move expedited grace period (GP) work to RT kthread_worker

From: Joel Fernandes
Date: Wed Apr 13 2022 - 13:21:35 EST


Hi Paul,


On Wed, Apr 13, 2022 at 8:07 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Wed, Apr 13, 2022 at 07:37:11PM +0800, Hillf Danton wrote:
> > On Sat, 9 Apr 2022 08:56:12 -0700 Paul E. McKenney wrote:
> > > On Sat, Apr 09, 2022 at 03:17:40PM +0800, Hillf Danton wrote:
> > > > On Fri, 8 Apr 2022 10:53:53 -0700 Kalesh Singh wrote
> > > > > Thanks for the discussion everyone.
> > > > >
> > > > > We didn't fully switch to kthread workers to avoid changing the
> > > > > behavior for users that don't need these low-latency exp GPs. Another
> > > > > (and perhaps more important) reason is that kthread_worker offers
> > > > > less concurrency than workqueues, which Paul reported can pose
> > > > > issues on systems with a large number of CPUs.
> > > >
> > > > A second ... what issues were reported wrt concurrency, given the output
> > > > of grep -nr workqueue block mm drivers.
> > > >
> > > > Feel free to post a URL link to the issues.
> > >
> > > The issues can be easily seen by inspecting kthread_queue_work() and
> > > the functions that it invokes. In contrast, normal workqueues use
> > > per-CPU mechanisms to avoid contention, as can equally easily be seen
> > > by inspecting queue_work_on() and the functions that it invokes.
> >
> > The worker from kthread_create_worker() roughly matches an unbound workqueue
> > that can get every CPU overloaded, thus the difference in implementation
> > details between kthread worker and WQ worker (either bound or unbound) can
> > be safely ignored if the kthread method works, given that priority is barely
> > a cure for concurrency issues.
>
> Please look again, this time taking lock contention into account,
> keeping in mind that systems with several hundred CPUs are reasonably
> common and that systems with more than a thousand CPUs are not unheard of.


You are talking about lock contention in the kthread_worker infra
which unbound WQ does not suffer from, right? I don't think the worker
lock contention will be an issue unless several
synchronize_rcu_expedited() calls are trying to queue work at the same
time. Did I miss something? Considering synchronize_rcu_expedited()
can block in the normal case (blocking is a pretty heavy operation
involving the scheduler and load balancers), I don't see how
contending on the worker infra locks can be an issue. If it were
call_rcu(), then I could understand the contention concern, since that
executes much more often.
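
To make the reasoning above concrete, here is a toy userspace model, not
kernel code. The function name total_lock_wait(), its parameters, and the
64-lock cap are all made up for illustration, and it ignores cache-line
bouncing and everything else a real benchmark would measure. Each caller
arrives at a given time and holds a queue lock for a fixed duration; with
one lock it behaves roughly like a single shared worker lock, and with one
lock per caller it behaves roughly like per-CPU queuing.

```c
#include <stddef.h>

/*
 * Toy model (assumption-laden sketch, not kernel code): caller i arrives
 * at arrival[i] (arbitrary time units, sorted ascending) and holds its
 * queue lock for `hold` units. Callers are spread round-robin over
 * `nlocks` locks: nlocks == 1 mimics a single shared worker lock,
 * nlocks == ncallers mimics fully per-CPU locking.
 *
 * Returns the summed time callers spent waiting for a lock.
 */
static unsigned long total_lock_wait(const unsigned long *arrival,
				     size_t ncallers, size_t nlocks,
				     unsigned long hold)
{
	unsigned long free_at[64] = { 0 };	/* when each lock is next free */
	unsigned long waited = 0;
	size_t i;

	if (nlocks == 0 || nlocks > 64)
		return 0;	/* out of range for this sketch */

	for (i = 0; i < ncallers; i++) {
		size_t l = i % nlocks;
		unsigned long start = arrival[i];

		/* Lock still held by an earlier caller: wait until it is free. */
		if (free_at[l] > start) {
			waited += free_at[l] - start;
			start = free_at[l];
		}
		free_at[l] = start + hold;
	}
	return waited;
}
```

In this model, four callers arriving at the same instant and sharing one
lock accumulate waiting time, while the same four callers arriving far
apart (as I would expect when each one blocks for a whole grace period)
wait for nothing, whether there is one lock or four.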

I think the argument about too many things being RT is stronger though.

Thanks,

Joel


>
>
> Thanx, Paul
>
> > Hillf
> > >
> > > Please do feel free to take a look.
> > >
> > > If taking a look does not convince you, please construct some in-kernel
> > > benchmarks to test the scalability of these two mechanisms. Please note
> > > that some care will be required to make sure that you are doing a valid
> > > apples-to-apples comparison.
> > >
> > > Thanx, Paul
> > >