Re: [PATCH] processor passthru - upload _Cx and _Pxx data to hypervisor (v5).

From: Konrad Rzeszutek Wilk
Date: Fri Feb 24 2012 - 19:25:36 EST


On Fri, Feb 24, 2012 at 10:23:42AM +0000, Jan Beulich wrote:
> >>> On 23.02.12 at 23:31, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> > This module (processor-passthru) collects the information that the cpufreq
> > drivers and the ACPI processor code save in the 'struct acpi_processor' and
> > then uploads it to the hypervisor.
>
> This looks conceptually wrong to me - there shouldn't be a need for a
> CPUFreq driver to be loaded in Dom0 (or your module should masquerade
> as the one and only suitable one).

So before your email I had been thinking that, because of Len's cpuidle rework,
the cpufreq drivers, when active, would be started from the cpu_idle call - and
since the cpu_idle call ends up being default_idle on pvops (which calls safe_halt),
that would be fine. This is the work that Len did in
"cpuidle: replace xen access to x86 pm_idle and default_idle" and
"cpuidle: stop depending on pm_idle".

But cpufreq != cpuidle != cpufreq governor, and they all run by different rules.
The ondemand cpufreq governor, for example, runs a timer and calls the appropriate cpufreq
driver. So with the patches I posted we end up with a cpufreq driver in the kernel
and one in the Xen hypervisor - both of them trying to change P-states. Not good (to be fair,
if powernow-k8/acpi-cpufreq tried it via WRMSR, those writes would end up being trapped and
ignored by the hypervisor. I am not sure about the outw though).
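
To see which cpufreq driver and governor dom0 has actually bound to each CPU, the
standard sysfs files scaling_driver and scaling_governor can be read. A minimal
sketch; the function name list_cpufreq and its optional root argument are
hypothetical and exist only to make the snippet testable:

```shell
#!/bin/sh
# Hypothetical helper: print "<cpu> <driver> <governor>" for each CPU
# that exposes cpufreq in sysfs. The optional argument overrides the
# sysfs root and is only there so the function can be tested.
list_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    for d in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq (and the unexpanded glob pattern).
        [ -r "$d/scaling_driver" ] || continue
        printf '%s %s %s\n' "$(basename "$(dirname "$d")")" \
            "$(cat "$d/scaling_driver")" "$(cat "$d/scaling_governor")"
    done
}
```

Calling list_cpufreq with no argument reads the live sysfs tree.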

The pre-RFC version of this driver implemented a cpufreq governor that was a
nop and, as future work, was going to make a hypercall to get the true cpufreq value
to report properly in /proc/cpuinfo - but I hadn't figured out a way to make it
the default one dynamically.

Perhaps having xencommons do
echo "xen" > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

And s/processor-passthru/cpufreq-xen/ would do it? That would eliminate the [performance,
ondemand,powersave,etc] cpufreq governors from calling into the cpufreq drivers to alter P-states.
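
One practical wrinkle with the one-liner: in bash, a redirection target that
expands to more than one file is rejected as an ambiguous redirect, and other
shells may not expand the pattern at all, so an init script would probably need a
loop. A rough sketch, assuming the proposed "xen" governor exists (it does not
today); set_governor and its optional root argument are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: write one governor name into every CPU's
# scaling_governor file. "xen" is the governor proposed in this thread,
# not an existing one; the second argument is only for testing.
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    rc=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern when no CPU exposes cpufreq.
        [ -e "$f" ] || continue
        # Remember a failure but keep going for the remaining CPUs.
        echo "$gov" > "$f" || rc=1
    done
    return $rc
}
```

xencommons would then call set_governor xen once at start-up; a non-zero return
means at least one CPU refused the governor.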

Let me CC Dave Jones and the cpufreq mailing list - perhaps they might have
some ideas?
[The patch is http://lwn.net/Articles/483668/]