Re: [PATCH] Skip tsc synchronization checks if CONSTANT_TSC bit isset.

From: Alok Kataria
Date: Thu Oct 23 2008 - 19:39:33 EST


On Thu, 2008-10-23 at 01:10 -0700, Andi Kleen wrote:
> >
> > The acpi_pm timer wrap problem has come up only with the clocksource and
> > NO_HZ kernels, without NO_HZ there were periodic interrupts which caused
> > the guest to be scheduled before ACPI_PM could wrap around.


> ACPI_PM should be just fixed. My old independent noidlehz implementation
> just always limited the sleep times to half the wrap time of the
> timer. I suspect this needs to be done here too.
>

Yeah fixing ACPI_PM is a good idea, but this too won't fix the problem
in the virtualized space as your VCPU can be descheduled for more than
4secs. Now since time was kept using interrupts before clocksources, the
counter wrap around never made a difference.

> The tsc frequency one didn't sound like a valid bug.

It is not. The calibration algorithm until 2.6.27 did work well for
virtualization.
But with the new fast path algorithm the error in calibration could now
be as high as 7600ppm, as explained on this thread.
http://kerneltrap.org/mailarchive/linux-kernel/2008/9/5/3208194

I know that this is a corner case but can be triggered in virtualization
environment and i have seen instances of it. Its not because of any
virtualization defects but because these races are magnified here.


> I though there were some efforts to make it 64bit too?
> Or is there no VMI ROM on 64bit? Perhaps you could do the
> timer without the ROM then.

Yeah i could, the point though is that the current TSC implementation is
already good enough for virtualization with these small changes, so
adding a new algorithm doesn't seem quite that attractive to me.


> >
> > I guess, the only thing that you don't agree over here is the enabling
> > of CONSTANT_TSC bit when VMware is detected, right ?
>
> My POV is that code supposed to drive real hardware shouldn't
> have any "is hypervisor X|Y|Z" hacks. We already got a whole
> lot of infrastructure for PV hypervisors.

These "is_hypervisor" checks are not in fast path.
Apart from that, with a field in the cpuinfo_structure we wont be
calling all these detection functions over and over again. The move is
already towards standardizing the detection process for any hypervisor.

>
> For tsc_sync I suspect the fix is to either completely trust CONSTANT_TSC
> or make the check accept more offset or possibly a combination of both

I am ok with the CONSTANT_TSC bit check, but if people think that its
not important to skip this for native, i think adding a new flag to skip
this should be safe enough.


Ingo, HPA your views on this whole detection and skipping thing ?

Thanks,
Alok

> .
>
> -Andi
> --
> ak@xxxxxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/