Re: Tasks RCU, ftrace, and trampolines (was: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling)

From: Paul E. McKenney
Date: Tue Mar 19 2024 - 19:33:30 EST


On Tue, Mar 19, 2024 at 11:45:15AM +0000, Mark Rutland wrote:
> Hi Paul,
>
> On Fri, Mar 01, 2024 at 05:16:33PM -0800, Paul E. McKenney wrote:
> > The networking NAPI code ends up needing special help to avoid starving
> > Tasks RCU grace periods [1]. I am therefore revisiting trying to make
> > Tasks RCU directly detect trampoline usage, but without quite as much
> > need to identify specific trampolines...
> >
> > I am putting this information in a Google document for future
> > reference [2].
> >
> > Thoughts?
>
> Sorry for the long delay! I've been looking into this general area over the
> last couple of weeks due to the latent bugs I mentioned in:
>
> https://lore.kernel.org/lkml/Zenx_Q0UiwMbSAdP@FVFF77S0Q05N/
>
> I was somewhat hoping that staring at the code for long enough would result in
> an epiphany (and a nice simple-to-backport solution for the latent issues), but
> so far that has eluded me.
>
> I believe some of those cases will need to use synchronize_rcu_tasks() and we
> might be able to make some structural changes to minimize the number of times
> we'd need to synchronize (e.g. having static ftrace call ops->func from the ops
> pointer, so we can switch ops+func atomically), but those look pretty invasive
> so far.
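
To make the "switch ops+func atomically" idea above concrete, here is a
minimal userspace sketch, assuming C11 atomics. Every name in it
(struct sketch_ops, active_ops, sketch_set_ops(), sketch_callsite()) is
hypothetical and is not the kernel's ftrace API. The callsite reads one
pointer and takes func from the ops it points to, so it cannot pair a
new func with old ops or vice versa.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops structure; NOT the kernel's struct ftrace_ops. */
struct sketch_ops {
	long (*func)(const struct sketch_ops *ops, long ip);
	long bias;	/* private data that func depends on */
};

static long add_bias(const struct sketch_ops *ops, long ip)
{
	return ip + ops->bias;
}

static long sub_bias(const struct sketch_ops *ops, long ip)
{
	return ip - ops->bias;
}

static const struct sketch_ops ops_a = { .func = add_bias, .bias = 1 };
static const struct sketch_ops ops_b = { .func = sub_bias, .bias = 100 };

/* The single word a callsite reads; func is reached only through it. */
static _Atomic(const struct sketch_ops *) active_ops = &ops_a;

/* Publish a new ops; func and its data change together in one store. */
static void sketch_set_ops(const struct sketch_ops *ops)
{
	atomic_store_explicit(&active_ops, ops, memory_order_release);
}

/* Assumed callsite behavior: load ops once, then call ops->func. */
static long sketch_callsite(long ip)
{
	const struct sketch_ops *ops =
		atomic_load_explicit(&active_ops, memory_order_acquire);

	return ops->func(ops, ip);
}
```

With ops_a installed, sketch_callsite(10) goes through add_bias() with
bias 1; after sketch_set_ops(&ops_b) it goes through sub_bias() with
bias 100, and never a mixture of the two. The sketch says nothing about
when the old ops may be freed or reused, since a caller may still be
running in the old func, which is presumably where
synchronize_rcu_tasks() would still be needed.
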
>
> I haven't been able to come up with "a precise and completely reliable way to
> determine whether the current preemption occurred within a trampoline". Since
> preemption might occur within a trampoline's callee that eventually returns
> back to the trampoline, I believe that'll either depend on having a reliable
> stacktrace or requiring the trampoline to dynamically register/unregister
> somewhere around calling other functions. That, and we do also care about those
> callees themselves, and it's not just about the trampolines...
>
> On arm64, we kinda have "permanent trampolines", as our
> DYNAMIC_FTRACE_WITH_CALL_OPS implementation uses a common trampoline. However,
> that will tail-call direct functions (and those could also be directly called
> from ftrace callsites), so we don't have a good way of handling those without a
> change to the direct func calling convention.
>
> I assume that permanent trampolines wouldn't be an option on architectures
> where trampolines are a spectre mitigation.

Thank you for checking! I placed a pointer to this email in the document
and updated the relevant sections accordingly.

> Mark.
>
> > Thanx, Paul
> >
> > [1] https://lore.kernel.org/all/Zd4DXTyCf17lcTfq@debian.debian/
> > [2] https://docs.google.com/document/d/1kZY6AX-AHRIyYQsvUX6WJxS1LsDK4JA2CHuBnpkrR_U/edit?usp=sharing