Re: [PATCH 11/32] nohz/cpuset: Don't turn off the tick if rcu needs it
From: Gilad Ben-Yossef
Date: Wed Mar 28 2012 - 08:57:42 EST
On Wed, Mar 28, 2012 at 2:39 PM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
> On Tue, Mar 27, 2012 at 05:21:34PM +0200, Gilad Ben-Yossef wrote:
>> On Thu, Mar 22, 2012 at 6:18 PM, Christoph Lameter <cl@xxxxxxxxx> wrote:
>> > On Thu, 22 Mar 2012, Gilad Ben-Yossef wrote:
>> >
>> >> > Is there any way for userspace to know that the tick is not off yet due to
>> >> > this? It would make sense for us to have a busy loop in user space that
>> >> > waits until the OS has completed all processing if that avoids future
>> >> > latencies for the application.
>> >> >
>> >>
>> >> I previously suggested having the user register to receive a signal
>> >> when the tick is turned off. Since, by design, the tick is only ever
>> >> turned off while the user task is the current task, *I think* you can
>> >> simply mark the signal pending when you turn the tick off.
>> >
>> > OK, that sounds good. Would you define a new signal for this?
>> >
>>
>> My gut instinct is to let the process register the specific signal
>> (probably from the RT range) it wants to receive when the tick goes off
>> and/or on.
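A minimal userspace sketch of the idea above, assuming a hypothetical interface through which a process tells the kernel which real-time signal to send when its CPU's tick is stopped. No such registration call exists in this patch series, so the sketch only installs the handler for the chosen signal. The choice of `SIGRTMIN + 1` and the names `tick_stopped`, `tick_notify_handler` and `install_tick_notify_handler` are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the handler; read by the application's main loop. */
static volatile sig_atomic_t tick_stopped = 0;

/* Hypothetical "tick stopped" notification handler. Only
 * async-signal-safe work is allowed here, so it just sets a flag. */
static void tick_notify_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    (void)ctx;
    tick_stopped = 1;
}

/* Install the handler for the (assumed) notification signal.
 * Returns the signal number on success, or -1 on error. */
static int install_tick_notify_handler(void)
{
    struct sigaction sa;
    int sig = SIGRTMIN + 1;   /* assumption: any free RT signal works */

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = tick_notify_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;
    return sig;
}
```

Real-time signals are a natural fit here because they are queued rather than merged, and an application can pick one that does not collide with signals it already uses.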
>
> Note that the signal itself could trigger an event that restarts the tick;
> calling call_rcu() is sufficient for that. We can probably optimize that
> one day by assigning another CPU to handle the callbacks of a tickless
> CPU, but for now...
>
>>
>> > So we would start up the application. The app would do all its prep work
>> > (memory allocation, device setup, etc.) and then wait for the signal to
>> > be received. After that it would enter the low-latency processing phase.
>> >
>> > Could we also get a signal if something disrupts the peace and switches
>> > the timer interrupt on again?
>> >
>>
>> I think you'll have to, since once you have the tick turned off there is
>> no guarantee that it won't get turned on again by a timer scheduling a
>> task or by an IPI.
>
> The problem with this scheme is that if the task is running with the
> guarantee that nothing is going to disturb it (it assumes so when it
> is notified that the timer is stopped), can it seriously recover from
> the fact that the timer has been restarted once it gets notified about it?
Recovery in this context involves a programmer/system architect looking
into what made the tick start and making sure that won't happen the next
time around.
I know it's not quite what you had in mind, but it works :-)
>
> I have a hard time imagining that. It's like an RT task running a
> critical part that suddenly receives a notification from the kernel that
> says "what's up dude? hey by the way you're not real time anymore" :)
> How are we recovering from that?
The point is that it is the difference between a QA report that says:
"Performance dropped below the acceptable level for 10 ms sometime
during the test run"
and
"We got an indication that the kernel resumed the tick on us, so the test
was stopped and here is the stack trace for all the tasks running,
plus the logs".
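A minimal sketch of the test-harness side of this, assuming a second hypothetical signal (`SIGRTMIN + 2`) that the kernel would send when it restarts the tick on the task's CPU. `TICK_ON_SIG`, `tick_on_handler`, `do_work_step` and `run_until_tick_resumes` are illustrative names; the workload step is a placeholder counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Assumption: hypothetical "tick restarted" notification signal. */
#define TICK_ON_SIG (SIGRTMIN + 2)

static volatile sig_atomic_t tick_restarted = 0;

static void tick_on_handler(int sig)
{
    static const char msg[] = "tick resumed: stopping test run\n";
    ssize_t ret;

    (void)sig;
    tick_restarted = 1;
    /* write() is async-signal-safe; printf() is not. */
    ret = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ret;
}

static int install_tick_on_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = tick_on_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(TICK_ON_SIG, &sa, NULL);
}

static volatile long iterations_done = 0;

/* Placeholder for one unit of the latency-sensitive test workload. */
static void do_work_step(void)
{
    iterations_done++;
}

/* Run up to max_iters steps, stopping early once the tick is reported
 * as resumed so the harness can collect diagnostics at that point.
 * Returns the number of steps actually run. */
static long run_until_tick_resumes(long max_iters)
{
    long i;

    for (i = 0; i < max_iters && !tick_restarted; i++)
        do_work_step();
    return i;
}
```

Once the loop stops early, the harness could capture stack traces for all tasks with the magic SysRq `t` command (for example `echo t > /proc/sysrq-trigger` as root), which dumps the current tasks and their information to the console log.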
> Maybe instead of focusing on these notifications, we should try hard to
> shut down the tick before we reach userspace: delegate RCU work
> to another CPU, avoid needless IPIs, avoid needless timer list timers, etc.
> Fix those things one by one so that we can configure things to the point
> where we get closer to a guarantee of CPU isolation.
>
> Does that sound reasonable?
It does to me :-)
Gilad
--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@xxxxxxxxxxxxx
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com
"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
-- Jean-Baptiste Queru