Re: [patch 1/3] sched: init rt_avg stat whenever rq comes online

From: Peter Zijlstra
Date: Thu Aug 19 2010 - 04:54:05 EST


On Wed, 2010-08-18 at 17:20 -0700, Suresh Siddha wrote:
> On Tue, 2010-08-17 at 01:51 -0700, Peter Zijlstra wrote:
> > On Mon, 2010-08-16 at 21:25 +0200, Peter Zijlstra wrote:
> > > You can use something like:
> > >
> > > suspend:
> > > 	__get_cpu_var(cyc2ns_suspend) = sched_clock();
> > >
> > > resume:
> > > 	for_each_possible_cpu(i)
> > > 		per_cpu(cyc2ns_offset, i) += per_cpu(cyc2ns_suspend);
> > >
> > > or something like that to keep sched_clock() stable, which is exactly
> > > what most (all?) of its users expect when we report the TSC is usable.
> >
> > That's actually broken, you only want a single offset, otherwise we
> > de-sync the TSC, which is bad.
> >
> > So simply store the sched_clock() value at suspend time on the single
> > CPU that is still running, then on resume make sure sched_clock()
> > continues there by adding that stamp to all CPU offsets.
>
>
> Peter, that might not be enough. I should add that on my Lenovo T410
> (having a 2 core wsm cpu), the TSCs are somehow set to a strange big value
> (for example 0xfffffffebc22f02e) after resume from S3. It looks like the
> BIOS might be writing the TSC during resume. I am not sure if this is the
> case for other OEM laptops as well. I am checking.

ARGH, please kill all SMM support for future CPUs ;-)

Are the TSCs still sync'ed though? If so, we can still compute an offset
and continue with things, although it requires something like:

	local_irq_save(flags);
	__get_cpu_var(cyc2ns_offset) = 0;
	offset = cyc2ns_suspend - sched_clock();
	local_irq_restore(flags);

	for_each_possible_cpu(i)
		per_cpu(cyc2ns_offset, i) = offset;

Which would take the funny offset into account and make it resume
where we left off.
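
To see why a single offset absorbs the bogus TSC value, here is a rough
user-space model of the fixup, not kernel code. It assumes the TSCs are
still in sync (one shared raw counter), pretends the raw counter is
already scaled to nanoseconds, and uses plain arrays in place of per-CPU
variables. NR_CPUS, fake_tsc_ns, sched_clock_on(), model_suspend() and
model_resume() are hypothetical names made up for this sketch; only
cyc2ns_offset and cyc2ns_suspend come from the snippet above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2	/* hypothetical CPU count for this model */

/* Hypothetical stand-ins for kernel state. */
uint64_t fake_tsc_ns;			/* raw, still-synchronized TSC, pre-scaled to ns */
uint64_t cyc2ns_offset[NR_CPUS];	/* per-CPU offset added to the raw clock */
uint64_t cyc2ns_suspend;		/* clock stamp taken at suspend */

/* Model of sched_clock() on a given CPU: raw clock plus that CPU's offset. */
uint64_t sched_clock_on(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return fake_tsc_ns + cyc2ns_offset[cpu];	/* unsigned wraparound is intended */
}

/* Suspend: record the clock on the one CPU that is still running. */
void model_suspend(int cpu)
{
	cyc2ns_suspend = sched_clock_on(cpu);
}

/* Resume: compute one offset on this CPU and give it to every CPU. */
void model_resume(int cpu)
{
	uint64_t offset;
	int i;

	cyc2ns_offset[cpu] = 0;				/* read the raw clock */
	offset = cyc2ns_suspend - sched_clock_on(cpu);	/* may wrap, that is fine */

	for (i = 0; i < NR_CPUS; i++)
		cyc2ns_offset[i] = offset;		/* same offset everywhere */
}
```

Because the arithmetic is modulo 2^64, a raw value such as
0xfffffffebc22f02e after resume still yields a clock that continues from
the suspend stamp on every CPU, as long as all CPUs see the same raw value.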

If they got out of sync, we need to flip sched_clock_stable and work on
getting the sched_clock.c code to be monotonic over such a flip.
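
As a rough illustration of what "monotonic" means here, and not a
description of what sched_clock.c actually does, the simplest possible
guard just refuses to hand out a value smaller than the last one. The
names last_clock and monotonic_clock() are hypothetical, and the sketch
is single-threaded with no per-CPU state or locking:

```c
#include <stdint.h>

/* Hypothetical: the last value handed out to any caller. */
uint64_t last_clock;

/* Return raw_ns, clamped so the result never goes backwards. */
uint64_t monotonic_clock(uint64_t raw_ns)
{
	if (raw_ns > last_clock)
		last_clock = raw_ns;	/* clock moved forward, accept it */
	return last_clock;		/* otherwise hold at the last value */
}
```

A clamp like this only hides a backwards jump by stalling the clock until
the raw source catches up; it does nothing about a large forward jump.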

> So such large values of TSC (leading to a very big difference between
> rq->clock and rq->age_stamp) won't be correctly handled by
> scale_rt_power() either.

Still, we need to fix the clock, not fudge the users.