Re: [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock

From: Frederic Weisbecker
Date: Tue Oct 01 2013 - 10:26:11 EST


2013/10/1 Frederic Weisbecker <fweisbec@xxxxxxxxx>:
> On Wed, Aug 21, 2013 at 06:41:46PM +0200, Oleg Nesterov wrote:
>> On 08/21, Peter Zijlstra wrote:
>> >
>> > The other consideration is that this adds two branches to the normal
>> > schedule path. I really don't know what the regular ratio between
>> > schedule() and io_schedule() is -- and I suspect it can very much depend
>> > on workload -- but it might be a net loss due to that, even if it makes
>> > io_schedule() 'lots' cheaper.
>>
>> Yes, agreed. Please ignore it for now, I didn't try to actually suggest
>> this change. And even if this is fine perfomance wise, this needs some
>> benchmarking.
>>
>> Well. actually I have a vague feeling that _perhaps_ this change can
>> help to solve other problems we are discussing, but I am not sure and
>> right now I can't even explain the idea to me.
>>
>> In short: please ignore ;)
>>
>> Oleg.
>>
>
> Ok, the discussion is hard to synthesize but I think I now have more
> clues to send a better new iteration.
>
> So we have the choice between keeping atomics or using rq->lock. It seems
> that using rq->lock is going to be worrisome for several reasons. So let's
> stick to atomics (but tell me if you prefer the other way).
>
> So the idea for the next try is to do something along the lines of:
>
> struct cpu_idletime {
> nr_iowait,
> seqlock,
> idle_start,
> idle_time,
> iowait_time,
> } __cacheline_aligned_in_smp;
>
> DEFINE_PER_CPU(struct cpu_idletime, cpu_idletime);
>
> io_schedule()
> {
> int prev_cpu;
>
> preempt_disable();
> prev_cpu_idletime = __this_cpu_ptr(&cpu_idletime);
> atomic_inc(prev_cpu_idletime->nr_iowait);
> WARN_ON_ONCE(is_idle_task(current));
> preempt_enable_no_resched();
>
> schedule();
>
> write_seqlock(prev_cpu_idletime->seqlock)
> if (!atomic_dec_return(prev_cpu_idletime->nr_iowait))
> flush_cpu_idle_time(prev_cpu_idletime, 1)

I forgot...
cpu_idletime->idle_start;

after the update.

Also now I wonder if we actually should lock the inc part. Otherwise
it may be hard to get the readers right...

Thanks.

> write_sequnlock(prev_cpu_idletime->seqlock)
>
> }
>
> flush_cpu_idle_time(cpu_idletime, iowait)
> {
> if (!cpu_idletime->idle_start)
> return;
>
> if (nr_iowait)
> cpu_idletime->iowait_time = NOW() - cpu_idletime->idle_start;
> else
> cpu_idletime->idle_time = NOW() - cpu_idletime->idle_start;
> }
>
> idle_entry()
> {
> write_seqlock(this_cpu_idletime->seqlock)
> this_cpu_idletime->idle_start = NOW();
> write_sequnlock(iowait(cpu)->seqlock)
> }
>
> idle_exit()
> {
> write_seqlock(this_cpu_idletime->seqlock)
> flush_cpu_idle_time(this_cpu_idletime, atomic_read(&this_cpu_idletime->nr_iowait));
> this_cpu_idletime->idle_start = 0;
> write_sequnlock(this_cpu_idletime->seqlock)
> }
>
>
> Now this all realy on the fact that atomic_inc(cpu_idletime->nr_iowait) can't happen
> in a CPU that is already idle. So it can't happen between idle_entry() and idle_exit().
> Hence the WARN_ON(is_idle_task(current)) below after the inc.
>
> Hmm?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/