Re: [PATCH 2/4] nohz: Synchronize sleep time stats with seqlock

From: Frederic Weisbecker
Date: Tue Oct 01 2013 - 10:05:47 EST


On Wed, Aug 21, 2013 at 06:41:46PM +0200, Oleg Nesterov wrote:
> On 08/21, Peter Zijlstra wrote:
> >
> > The other consideration is that this adds two branches to the normal
> > schedule path. I really don't know what the regular ratio between
> > schedule() and io_schedule() is -- and I suspect it can very much depend
> > on workload -- but it might be a net loss due to that, even if it makes
> > io_schedule() 'lots' cheaper.
>
> Yes, agreed. Please ignore it for now, I didn't try to actually suggest
> this change. And even if this is fine perfomance wise, this needs some
> benchmarking.
>
> Well. actually I have a vague feeling that _perhaps_ this change can
> help to solve other problems we are discussing, but I am not sure and
> right now I can't even explain the idea to me.
>
> In short: please ignore ;)
>
> Oleg.
>

Ok, the discussion is hard to summarize, but I think I now have enough
clues to send a better new iteration.

So we have the choice between keeping atomics or using rq->lock. It seems
that using rq->lock is going to be worrisome for several reasons. So let's
stick to atomics (but tell me if you prefer the other way).

So the idea for the next try is to do something along the lines of:

struct cpu_idletime {
	atomic_t	nr_iowait;
	seqlock_t	seqlock;
	u64		idle_start;
	u64		idle_time;
	u64		iowait_time;
} __cacheline_aligned_in_smp;

DEFINE_PER_CPU(struct cpu_idletime, cpu_idletime);

io_schedule()
{
	struct cpu_idletime *prev_cpu_idletime;

	preempt_disable();
	prev_cpu_idletime = __this_cpu_ptr(&cpu_idletime);
	atomic_inc(&prev_cpu_idletime->nr_iowait);
	/* nr_iowait must never be incremented from an already idle CPU */
	WARN_ON_ONCE(is_idle_task(current));
	preempt_enable_no_resched();

	schedule();

	/*
	 * Drop the count on the CPU we slept from; if we were the last
	 * iowaiter there, account its pending idle period as iowait.
	 */
	write_seqlock(&prev_cpu_idletime->seqlock);
	if (!atomic_dec_return(&prev_cpu_idletime->nr_iowait))
		flush_cpu_idle_time(prev_cpu_idletime, 1);
	write_sequnlock(&prev_cpu_idletime->seqlock);
}

flush_cpu_idle_time(struct cpu_idletime *cpu_idletime, int iowait)
{
	/* Nothing to flush if the CPU is not idle */
	if (!cpu_idletime->idle_start)
		return;

	if (iowait)
		cpu_idletime->iowait_time += NOW() - cpu_idletime->idle_start;
	else
		cpu_idletime->idle_time += NOW() - cpu_idletime->idle_start;
}

idle_entry()
{
	write_seqlock(&this_cpu_idletime->seqlock);
	this_cpu_idletime->idle_start = NOW();
	write_sequnlock(&this_cpu_idletime->seqlock);
}

idle_exit()
{
	write_seqlock(&this_cpu_idletime->seqlock);
	flush_cpu_idle_time(this_cpu_idletime, atomic_read(&this_cpu_idletime->nr_iowait));
	this_cpu_idletime->idle_start = 0;
	write_sequnlock(&this_cpu_idletime->seqlock);
}


Now this all relies on the fact that atomic_inc(&cpu_idletime->nr_iowait) can't happen
on a CPU that is already idle, so it can't happen between idle_entry() and idle_exit().
Hence the WARN_ON_ONCE(is_idle_task(current)) above, right after the inc.
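For completeness, the read side (a get_cpu_iowait_time_us()-like helper, not shown
in the sketch above, name hypothetical) could then just retry on the seqlock to get a
consistent snapshot, including the currently elapsed idle period. Roughly, with the
same NOW() pseudo-helper as above:

u64 get_cpu_iowait_time(int cpu)
{
	struct cpu_idletime *it = &per_cpu(cpu_idletime, cpu);
	unsigned int seq;
	u64 time;

	do {
		seq = read_seqbegin(&it->seqlock);
		time = it->iowait_time;
		/* CPU idle with pending iowaiters: add the in-progress delta */
		if (it->idle_start && atomic_read(&it->nr_iowait))
			time += NOW() - it->idle_start;
	} while (read_seqretry(&it->seqlock, seq));

	return time;
}

The idle_time variant would be the same with the nr_iowait check inverted.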

Hmm?