Re: [PATCH] kprobes: Fix to delay the kprobes jump optimization

From: Sebastian Andrzej Siewior
Date: Mon Feb 22 2021 - 05:08:24 EST


On 2021-02-19 10:18:11 [-0800], Paul E. McKenney wrote:
> If Masami's patch works for the PowerPC guys on v5.10-rc7, then it can
> be backported. The patch making RCU Tasks initialize itself early won't
> have any effect and can be left or reverted, as we choose. The self-test
> patch will need to be either adjusted or reverted.
>
> However...
>
> The root cause of this problem is that softirq only kind-of works
> during a window of time during boot. It works only if the number and
> duration of softirq handlers during this time is small enough, for some
> ill-defined notion of "small enough". If there are too many, whatever
> that means exactly, then we get a failed attempt to awaken ksoftirqd, which

Neither the number of registered softirq handlers nor the number of
times the individual softirqs were scheduled matters. The only problem is
that one schedules a softirq and then waits for its completion.
So scheduling a timer_list timer works. Waiting for its completion does
not. Once ksoftirqd is up, the pending softirq will be processed.
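
To illustrate the difference between scheduling and waiting, here is a
toy userspace model. It is only a sketch, not kernel code: every name in
it is made up for illustration, and the bounded poll loop stands in for
what would be an unbounded wait (a silent hang) on a real system.

```c
#include <stdbool.h>

/* Toy model only: these names imitate the kernel's but are hypothetical. */
bool ksoftirqd_running;	/* false during the early-boot window */
bool softirq_pending;	/* raised but not yet handled */
bool timer_done;	/* set by the handler when it has run */

void toy_timer_handler(void)
{
	timer_done = true;
}

/* Raising a softirq only marks work as pending; it never blocks. */
void toy_raise_softirq(void)
{
	softirq_pending = true;
}

/* In this model, deferred work runs only once ksoftirqd exists. */
void toy_ksoftirqd_run(void)
{
	if (ksoftirqd_running && softirq_pending) {
		softirq_pending = false;
		toy_timer_handler();
	}
}

/*
 * Bounded wait for the handler. Returns false in the case where a real
 * waiter would hang because nothing processes the pending softirq.
 */
bool toy_wait_for_completion(int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		toy_ksoftirqd_run();
		if (timer_done)
			return true;
	}
	return false;
}
```

Calling toy_raise_softirq() and then toy_wait_for_completion() while
ksoftirqd_running is false times out. The same wait succeeds once
ksoftirqd_running is set to true.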

> (sometimes!) results in a silent hang. Which, as you pointed out earlier,
> is a really obnoxious error message. And any minor change could kick
> us into silent-hang state because of the heuristics used to hand off
> to ksoftirqd. The straw that broke the camel's back and all that.

The problem is that a softirq is raised and its completion is then
waited for.
Something like synchronize_rcu() would be such a thing, I guess.

> One approach would be to add WARN_ON_ONCE() so that if softirq tries
> to awaken ksoftirqd before it is spawned, we get a nice obvious splat.
> Unfortunately, this gives false positives because there is code that
> needs a softirq handler to run eventually, but is OK with that handler
> being delayed until some random point in the early_initcall() sequence.
>
> Besides which, if we are going to add a check, why not use that check
> to just make things work by forcing handler execution to remain within the
> softirq back-of-interrupt context instead of awakening a not-yet-spawned
> ksoftirqd? We can further prevent entry into dyntick-idle state until
> the ksoftirqd kthreads have been spawned, which means that if softirq
> handlers must be deferred, they will be resumed within one jiffy by the
> next scheduler-clock interrupt.

This should work.
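
As I read the proposal, it would look roughly like the toy model below.
This is a userspace sketch under my own assumptions, not the actual
softirq code: the names, the "budget" parameter and the hand-off rule
are all hypothetical and only imitate the idea of keeping handlers in
the back-of-interrupt path until ksoftirqd has been spawned.

```c
#include <stdbool.h>

/* Toy model only: all names here are hypothetical. */
bool toy_ksoftirqd_spawned;	/* false during early boot */
unsigned int toy_pending;	/* bitmask of raised softirqs */
unsigned int toy_handled;	/* handlers run inline so far */
unsigned int toy_wakeups;	/* hand-offs to ksoftirqd */

void toy_raise(unsigned int nr)
{
	toy_pending |= 1u << nr;
}

/*
 * Called at the end of every simulated interrupt, including the
 * scheduler-clock tick. Handlers run inline. Once the budget is used
 * up, the rest is handed to ksoftirqd, but only if it exists. Before
 * that, processing simply continues inline instead of waking a thread
 * that has not been spawned yet.
 */
void toy_irq_exit(int budget)
{
	while (toy_pending) {
		if (budget-- <= 0 && toy_ksoftirqd_spawned) {
			toy_wakeups++;
			return;
		}
		toy_pending &= toy_pending - 1;	/* handle lowest pending bit */
		toy_handled++;
	}
}
```

With toy_ksoftirqd_spawned false, toy_irq_exit() drains everything
inline regardless of the budget, so nothing is left waiting for a
thread. With it true, work beyond the budget stays pending and is
counted as a wakeup.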

> Yes, this can allow softirq handlers to impose large latencies, but only
> during early boot, long before any latency-sensitive applications can
> possibly have been created. So this does not seem like a real problem.
>
> Am I missing something here?
>
> Thanx, Paul

Sebastian