Re: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

From: Paul E. McKenney
Date: Mon Mar 07 2022 - 14:15:20 EST


On Mon, Mar 07, 2022 at 02:03:17AM +0000, Zhang, Qiang1 wrote:
> On 3/4/2022 2:56 PM, Zqiang wrote:
> > When RCU_BOOST is enabled, the boost kthreads boost readers that
> > are blocking a given grace period. If a reader task already has a
> > priority higher than or equal to that of the boost kthreads (which
> > is not always 1, because kthread_prio may be set), boosting it is
> > useless. Skip such a task and select the next task to boost, which
> > reduces the time needed for a given grace period.
> >
> > Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>

Adding to CC to get more eyes on this. I am not necessarily opposed to
it, but I don't do that much RT work myself these days.

Thanx, Paul

> > ---
> > kernel/rcu/tree_plugin.h | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index c3d212bc5338..d35b6da66bbd 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -12,6 +12,7 @@
> > */
> >
> > #include "../locking/rtmutex_common.h"
> > +#include <linux/sched/deadline.h>
> >
> > static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> > {
> > @@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
> > * section.
> > */
> > t = container_of(tb, struct task_struct, rcu_node_entry);
> > + if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
> > + tb = rcu_next_node_entry(t, rnp);
> > + WRITE_ONCE(rnp->boost_tasks, tb);
> > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > + goto end;
> > + }
> > +
> > rt_mutex_init_proxy_locked(&rnp->boost_mtx.rtmutex, t);
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > /* Lock only for side effect: boosts task t's priority. */
> > rt_mutex_lock(&rnp->boost_mtx);
> > rt_mutex_unlock(&rnp->boost_mtx); /* Then keep lockdep happy. */
> > rnp->n_boosts++;
> > -
> > +end:
> >>
> >>Nit: maybe rename the label to "skip_boost:" ?
> >>
> >>Code looks fine. However, out of curiosity: given that higher
> >>priority tasks would, in general, exit their read-side critical
> >>sections quickly and boost the next blocking reader on exiting
> >>those sections, do you see a noticeable reduction in grace period
> >>timings with this change for certain types of workloads?
>
> Thanks for the feedback. On PREEMPT_RT systems there are often many real-time threads
> (most of them created by users), with priorities either higher or lower than that of the
> boost kthreads (when kthread_prio is set). An RT task with a higher priority than the boost
> kthreads may exit its read-side critical section quickly, or it may not, for example if it
> is preempted by an even higher-priority task. Attempting to boost such a task only increases
> the boost kthread's waiting time, so the next blocked tasks cannot be boosted in time.
> Of course, I don't deny that an inappropriate user priority setting can also be a cause.
>
> Thanks
> Zqiang
>
> >>
> >>
> >>Thanks
> >>Neeraj
>
> > return READ_ONCE(rnp->exp_tasks) != NULL ||
> > READ_ONCE(rnp->boost_tasks) != NULL;
> > }
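
For anyone following along, the skip condition in the hunk above can be
modeled in user space. This is only a sketch, not kernel code: struct
model_task, model_rt_prio(), model_dl_task() and model_skip_boost() are
hypothetical names. It assumes the usual kernel conventions that a
numerically lower prio means a higher priority, that a SCHED_FIFO or
SCHED_RR task with rt_priority p has prio 99 - p, and that deadline
tasks have a negative prio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one task_struct field the check reads. */
struct model_task {
	int prio;	/* kernel-style: numerically lower means higher priority */
};

/* Assumed mapping from a user-visible rt_priority (1..99) to kernel prio. */
static int model_rt_prio(int rt_priority)
{
	return 99 - rt_priority;
}

/* Models dl_task(): under the stated assumption, deadline tasks have prio < 0. */
static bool model_dl_task(const struct model_task *t)
{
	return t->prio < 0;
}

/*
 * Models the condition added by the patch:
 *   !rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)
 * reader         - the blocked reader the boost kthread is looking at
 * booster        - the boost kthread itself ("current" in the patch)
 * exp_gp_blocked - true if tasks are blocking an expedited grace period
 */
static bool model_skip_boost(const struct model_task *reader,
			     const struct model_task *booster,
			     bool exp_gp_blocked)
{
	assert(reader != NULL && booster != NULL);

	/* Readers blocking an expedited grace period are never skipped. */
	if (exp_gp_blocked)
		return false;

	/* Skip readers already running at or above the booster's priority. */
	return model_dl_task(reader) || reader->prio <= booster->prio;
}
```

Under this model, with kthread_prio=2 the boost kthread runs at prio 97,
so a SCHED_FIFO reader at rt_priority 50 (prio 49) is skipped, while a
SCHED_OTHER reader at prio 120 is still boosted.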