Re: [PATCH] nohz1: Documentation

From: Rob Landley
Date: Mon Mar 18 2013 - 16:00:09 EST

On 03/18/2013 01:46:32 PM, Frederic Weisbecker wrote:
> 2013/3/18 Rob Landley <rob@xxxxxxxxxxx>:
>> On 03/18/2013 11:29:42 AM, Paul E. McKenney wrote:
>> And it really seems like kconfig help text?
>
> It's more exhaustive than a Kconfig help. A Kconfig help text should
> have the level of detail that describes the purpose and impact of a
> feature, as well as some quick reference/pointer to the interface.
>
> Deeper explanations, including implementation internals, fine-grained
> constraints, the TODO list and the detailed interface, belong here.
> I really think we want to keep all the detailed explanations from
> Paul's doc. What we need is not a quick reference but a very detailed

It's much _longer_; I'm not sure it contains significantly more information. ("Using more power will shorten battery life" is a nice observation, but is it specific to your subsystem? I dunno, maybe it's a personal idiosyncrasy, but I tend to think that people start with use cases and need to find infrastructure. The other direction seems less interesting somehow. It's like a pan with a picture on the front of what you might bake with it.)

>>> +1. It increases the number of instructions executed on the path
>>> + to and from the idle loop.
>> This detail didn't get mentioned in my summary.
>
> And it's an important point.

I mentioned increased latency coming out of idle. Increased latency going _to_ idle is an important point? (And pretty much _every_ kconfig option has ramifications at that level which realtime people tend to want to bench.)

Also, I mentioned this one because all the other details I deleted pretty much _did_ get taken into account in my summary.

>>> +5. The LB_BIAS scheduler feature is disabled by adaptive ticks.
>> I have no idea what that one is; my summary didn't mention it.
>
> Nobody seems to know what that thing is, except probably the scheduler
> warlocks :o)
> All I know is that it's hard to implement without the tick, so I
> disabled it in my tree.

Is it also an important point?

>>> +o At least one CPU must keep the scheduling-clock interrupt going
>>> + in order to support accurate timekeeping.
>> How? You never said how to tell a processor _not_ to suppress interrupts
>> when CONFIG_THE_OTHER_HALF_OF_NOHZ is enabled.
>
> Ah, indeed, it would be nice to point out that there must be an online
> CPU outside the range covered by the nohz_mask= boot parameter.

There's a nohz_mask boot parameter?
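Frederic's requirement comes down to a bitmask check: at least one online CPU must fall outside the set of CPUs that are asked to run tickless. The following is a minimal sketch. It assumes at most 64 CPUs, held in plain uint64_t masks. The function names are hypothetical. This is not the kernel's cpumask API, and it is not the actual nohz_mask= parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check, not kernel code. Bit N set in a mask means CPU N.
 * online_mask: CPUs currently online.
 * nohz_mask:   CPUs asked (via an assumed nohz_mask= setting) to run
 *              without the scheduling-clock tick.
 * Timekeeping needs at least one online CPU that is NOT in nohz_mask.
 */
bool has_timekeeping_cpu(uint64_t online_mask, uint64_t nohz_mask)
{
    return (online_mask & ~nohz_mask) != 0;
}

/* Lowest-numbered online CPU outside nohz_mask, or -1 if there is none. */
int pick_timekeeping_cpu(uint64_t online_mask, uint64_t nohz_mask)
{
    uint64_t candidates = online_mask & ~nohz_mask;

    for (int cpu = 0; cpu < 64; cpu++) {
        if (candidates & (UINT64_C(1) << cpu)) {
            assert(has_timekeeping_cpu(online_mask, nohz_mask));
            return cpu;
        }
    }
    return -1; /* misconfiguration: every online CPU would stop ticking */
}
```

As an example, take four online CPUs (mask 0xF) with nohz_mask= covering CPUs 1-3 (mask 0xE). Only CPU 0 is left to keep the tick going, and pick_timekeeping_cpu() returns 0.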

>> I take it the problem is the value in the sysenter page won't get updated,
>> so gettimeofday() will see a stale value until the CPU hog stops
>> suppressing interrupts? I thought the first half of NOHZ had a way of
>> dealing with that many moons ago? (Did sysenter cause a regression?)
>
> With CONFIG_NO_HZ, there is always a tick running that updates GTOD
> and jiffies as long as there is a non-idle CPU. If all CPUs are idle
> and one suddenly wakes up, the GTOD and jiffies values are caught up.
>
> With full dynticks we have a new problem: there can be a CPU using
> jiffies or GTOD without running the tick (we are not idle, so there can
> be such users). So there must be a ticking CPU somewhere.
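The CONFIG_NO_HZ catch-up Frederic mentions can be modeled as plain arithmetic. The waking CPU counts how many whole tick periods passed since the last update, then advances jiffies by that many in one step. The following is a minimal sketch. It assumes HZ=1000 and a monotonic nanosecond clock value supplied by the caller. The struct and function names are hypothetical. The kernel's real logic (tick_do_update_jiffies64() in kernel/time/tick-sched.c) also deals with locking and timekeeping updates, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 1000                            /* assumed tick rate */
#define TICK_NSEC (1000000000ULL / HZ)     /* nanoseconds per tick */

/* Hypothetical model of the global tick bookkeeping. */
struct jiffies_state {
    uint64_t jiffies;        /* ticks accounted so far */
    uint64_t last_update_ns; /* clock time of the last accounted tick */
};

/*
 * Run by a CPU leaving idle. It accounts for every whole tick period
 * that elapsed since the last update, however many ticks were skipped
 * while all CPUs slept, and returns the number of ticks caught up.
 */
uint64_t catch_up_jiffies(struct jiffies_state *s, uint64_t now_ns)
{
    uint64_t missed;

    if (now_ns < s->last_update_ns + TICK_NSEC)
        return 0;                               /* less than one tick elapsed */

    missed = (now_ns - s->last_update_ns) / TICK_NSEC;
    s->jiffies += missed;
    s->last_update_ns += missed * TICK_NSEC;    /* keep the sub-tick remainder */
    assert(now_ns - s->last_update_ns < TICK_NSEC);
    return missed;
}
```

Under full dynticks, a busy CPU that has stopped its tick never reaches a call like this. That is why some other CPU has to keep ticking and doing the update.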

I.e., because gettimeofday() just checks a memory location without a kernel transition, the kernel never gets an opportunity to run catch-up code.
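The read side Rob describes can be modeled in userspace. The reader only loads shared memory and retries if it raced with a writer. It has no hook through which catch-up code could run, so it returns whatever the last tick stored. The following is a minimal sketch. It assumes a single writer and a sequence-counter protocol similar in spirit to the kernel's seqcount. The names are hypothetical. A real vDSO also adds a clocksource delta to the snapshot, which this model omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared "time page": written only by the tick, read locklessly. */
struct time_page {
    atomic_uint seq;          /* odd while an update is in progress */
    _Atomic uint64_t jiffies;
    _Atomic uint64_t wall_ns; /* wall-clock time recorded at the last tick */
};

struct time_snapshot {
    uint64_t jiffies;
    uint64_t wall_ns;
};

void time_page_init(struct time_page *p)
{
    atomic_init(&p->seq, 0u);
    atomic_init(&p->jiffies, 0);
    atomic_init(&p->wall_ns, 0);
}

/* Writer side: what the ticking CPU does on each tick (single writer assumed). */
void tick_update(struct time_page *p, uint64_t jiffies, uint64_t wall_ns)
{
    unsigned s = atomic_load_explicit(&p->seq, memory_order_relaxed);

    assert((s & 1u) == 0); /* no other writer may be mid-update */
    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed); /* odd: busy */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->jiffies, jiffies, memory_order_relaxed);
    atomic_store_explicit(&p->wall_ns, wall_ns, memory_order_relaxed);
    atomic_store_explicit(&p->seq, s + 2, memory_order_release); /* even: done */
}

/*
 * Reader side: plain memory loads, no kernel entry. If nobody has called
 * tick_update() recently, this simply returns the old values. There is no
 * point at which catch-up code could run.
 */
struct time_snapshot read_time(struct time_page *p)
{
    struct time_snapshot t;
    unsigned s0, s1;

    do {
        s0 = atomic_load_explicit(&p->seq, memory_order_acquire);
        t.jiffies = atomic_load_explicit(&p->jiffies, memory_order_relaxed);
        t.wall_ns = atomic_load_explicit(&p->wall_ns, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&p->seq, memory_order_relaxed);
    } while ((s0 & 1u) || s0 != s1); /* retry if a writer was mid-update */
    return t;
}
```

If read_time() is called repeatedly with no tick_update() in between, it keeps returning the same snapshot. That is the staleness problem in miniature.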

So you'd need a timer that removes read permission on the page containing the jiffies value once that value is considered sufficiently stale. The resulting page fault would then update the value, restore read permission, and reset the timer so it can revoke permission again later. Then you'd just tell CPU-intensive code that wants to run uninterrupted not to touch jiffies unless it's willing to take interrupts to keep the value current.
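This is a userspace analogy of Rob's proposed scheme. It revokes read access to the page holding the value once the value goes stale, refreshes the value in the fault handler, then restores read access. The following is a minimal sketch with several assumptions:

- It assumes Linux, where an access to a PROT_NONE page raises SIGSEGV.
- It uses a hypothetical millisecond "tick" derived from CLOCK_MONOTONIC.
- mark_stale() stands in for the staleness timer.
- Calling mprotect() from a signal handler is not on POSIX's async-signal-safe list, although it is a common trick on Linux.

All names are hypothetical.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static uint64_t *jiffies_page;               /* one page holding the value */
static size_t page_size;
static volatile sig_atomic_t refresh_faults; /* stale reads that trapped */

/* Hypothetical time source: milliseconds of CLOCK_MONOTONIC as "ticks". */
static uint64_t current_ticks(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

/* "Page fault" path: refresh the value, then make the page readable again. */
static void stale_read_handler(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t page = (uintptr_t)jiffies_page;

    (void)ctx;
    if (addr < page || addr >= page + page_size) {
        signal(sig, SIG_DFL); /* unrelated fault: re-fault and crash normally */
        return;
    }
    /* mprotect() is not formally async-signal-safe, but works on Linux. */
    mprotect(jiffies_page, page_size, PROT_READ | PROT_WRITE);
    *jiffies_page = current_ticks();
    mprotect(jiffies_page, page_size, PROT_READ);
    /* A real design would re-arm the staleness timer here. */
    refresh_faults++;
}

/* Map the page, store an initial value and install the fault handler. */
int stale_page_setup(void)
{
    struct sigaction sa;
    long ps = sysconf(_SC_PAGESIZE);
    void *p;

    assert(ps > 0);
    page_size = (size_t)ps;
    p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    jiffies_page = p;
    *jiffies_page = current_ticks();
    if (mprotect(jiffies_page, page_size, PROT_READ) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = stale_read_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* What the staleness timer would do when it fires: revoke read access. */
int mark_stale(void)
{
    return mprotect(jiffies_page, page_size, PROT_NONE);
}

/* Reader: a plain load that traps once if the page was marked stale. */
uint64_t read_jiffies(void)
{
    return *(volatile uint64_t *)jiffies_page;
}
```

After mark_stale(), the first read_jiffies() traps exactly once and gets a fresh value. Later reads are plain loads again until the next mark_stale().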

By the way, I find this "full" name strange, given that you yourself have a list of further cases where ticks could be dropped but which you haven't implemented yet. The system being entirely idle means unnecessary ticks can be dropped. The system having no scheduling decisions to make on a processor also means unnecessary ticks can be dropped. But there are two config options, and they get treated as entirely different subsystems...

I suppose the reason is that one of them comes with a bucket of workarounds and caveats? One is just "let the system behave more efficiently; the only reason it's a config option is that increased latency waking up from idle can annoy the realtime guys". The second is "let the system behave more efficiently in a way that opens up a bunch of sharp edges and requires extensive micromanagement". But those sharp edges seem more like "unfinished" than a real design limitation...
