RE: [PATCH] rcu-tasks: Directly invoke rcuwait_wake_up() in call_rcu_tasks_generic()

From: Zhang, Qiang1
Date: Thu Feb 23 2023 - 22:22:26 EST


On Thu, Feb 23, 2023 at 10:05 PM Zhang, Qiang1 <qiang1.zhang@xxxxxxxxx> wrote:
>
> On Thu, Feb 23, 2023 at 9:35 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Feb 23, 2023 at 9:25 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Feb 24, 2023 at 12:36:05AM +0000, Zhang, Qiang1 wrote:
> > > > On Thu, Feb 23, 2023 at 08:43:05AM +0000, Zhang, Qiang1 wrote:
> > > > > > From: Zqiang <qiang1.zhang@xxxxxxxxx>
> > > > > > Sent: Thursday, February 23, 2023 2:30 PM
> > > > > > To: paulmck@xxxxxxxxxx; frederic@xxxxxxxxxx; quic_neeraju@xxxxxxxxxxx;
> > > > > > joel@xxxxxxxxxxxxxxxxx
> > > > > > Cc: rcu@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > > > > Subject: [PATCH] rcu-tasks: Directly invoke rcuwait_wake_up() in
> > > > > > call_rcu_tasks_generic()
> > > > > >
> > > > > > According to commit '3063b33a347c ("Avoid raw-spinlocked wakeups from
> > > > > > call_rcu_tasks_generic()")', the grace-period kthread is delayed to wakeup
> > > > > > using irq_work_queue() is because if the caller of
> > > > > > call_rcu_tasks_generic() holds a raw spinlock, when the kernel is built with
> > > > > > CONFIG_PROVE_RAW_LOCK_NESTING=y, due to a spinlock will be hold in
> > > > > > wake_up(), so the lockdep splats will happen. but now using
> > > > > > rcuwait_wake_up() to wakeup grace-period kthread instead of wake_up(), in
> > > > > > rcuwait_wake_up() no spinlock will be acquired, so this commit remove using
> > > > > >
> > > > > >There are still spinlock-acquisition and spinlock-release invocations within the call path from rcuwait_wake_up().
> > > > > >
> > > > > >rcuwait_wake_up() -> wake_up_process() -> try_to_wake_up(), then:
> > > > > >
> > > > > > raw_spin_lock_irqsave()
> > > > > > ...
> > > > > > raw_spin_unlock_irqrestore
> > > > >
> > > > > Yes, but this is raw_spinlock acquisition and release(note: spinlock will convert to
> > > > > sleepable lock in Preempt-RT kernel, but raw spinlock is not change).
> > > > >
> > > > > acquire raw_spinlock -> acquire spinlock will trigger lockdep warning.
> > > > >
> > > > >Is this really safe in the long run though? I seem to remember there are
> > > > >weird locking dependencies if RCU is used from within the scheduler [1].
> > > > >
> > > >
> > > >
> > > > I have been running rcutorture with rcutorture.type = tasks-tracing,
> > > > so far no problems have been found.
> > > >
> > > >
> > > > >I prefer to keep it as irq_work_queue() unless you are seeing some benefit.
> > > > >Generally, there has to be a 'win' or other justification for adding more
> > > > >risk.
> > > > >
> > > > >thanks,
> > > > >
> > > > >- Joel
> > > > >[1] http://www.joelfernandes.org/rcu/scheduler/locking/2019/09/02/rcu-schedlocks.html
> > > >
> > > >
> > > > The problem in this link, in an earlier RCU version, rcu_read_unlock_special()
> > > > Invoke wakeup and enter scheduler can lead to deadlock, but my modification is for
> > > > call_rcu_tasks_generic(), even if there is a lock dependency problem, we should pay
> > > > more attention to rcu_read_unlock_trace_special()
> > >
> > > Consider ABBA deadlocks as well, not just self-deadlocks (which IIRC is what
> > > the straight-RCU rcu_read_unlock() issues were about).
> > >
> > > What prevents the following scenario?
> > >
> > > In the scheduler you have code like this:
> > > rq = task_rq_lock(p, &rf);
> > > trace_sched_wait_task(p);
> > >
> > > Someone can hook up a BPF program to that tracepoint that then calls
> > > rcu_read_unlock_trace() -> rcu_read_unlock_trace_special(). All of
> > > this while holding the rq and pi scheduler locks.
> > >
> > > That's A (rq lock) -> B (rtpcp lock).
>
> In rcu_read_unlock_trace_special(), the premise of acquiring the rtpcp lock is that
> before that, we have task switch in the rcu_read_lock_trace/unlock_trace critical section.
> but after we already hold the rq lock, no task switch is generated in the
> rcu_read_lock_trace/unlock_trace critical section.
>
> Please correct me if my understanding is wrong.
>
>Yes, but in the next reply I corrected myself and I am still concerned
>about ABBA. There is obviously *some lock* that is held by the callers
>of call_rcu_tasks*(). So there is a dependency that gets created
>between _that_ lock and the rq lock, if you do a wakeup here. And I
>am not sure whether that lock is also acquired when the BPF program
>runs. If it is, then the BPF programs may hang. It is probably worth
>checking with the BPF guys.
>
>More importantly, do you see a benefit with this change in terms of
>anything more than deleting a few lines of code? Paul typically favors
>robustness and guard rails (as do I), unless there is significant
>benefit in performance, power or both.

because I found that the purpose of using irq_work_queue() early is to solve the problem of lockep splat,
my modified junior is also to avoid unnecessary IPI. but like you said, indeed we are not completely sure
whether there is a potential lock dependency problem, so I agree your opinion.

Thanks
Zqiang

>
>Thanks,
>
> - Joel