RE: [Xen-devel] [PATCH 3/8] ACPI: processor: add__acpi_processor_[un]register_driver helpers.

From: Tian, Kevin
Date: Mon Jan 16 2012 - 22:05:18 EST


> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> Sent: Saturday, January 14, 2012 6:25 AM
>
> > > > sure. VCPU!=PCPU requirement is orthogonal to the basic part for gearing
> > > > ACPI information to Xen.
>
> .. snip..
> > > >
> > > > > 1). For new distros (Ubuntu, Fedora), the default is all VCPUs.
> > > >
> > > > good to know that.
> > > >
> > > > > Enterprising users might use dom0_max_vcpus to limit the VCPU
> > > count,
> > > > > but most won't.
> > > > > Which mean we can concentrate on bringing the _Pxx/_Cxx
> parsing
> > > > > up to the hypervisor. Which is really neccessary on any chipset
> > > > > which has the notion of TurboBoost (otherwise the Xen scheduler
> > > > > won't pick this up and won't engage this mode in certain
> > > > > workloads).
>
> .. snip..
>
> > yes, this is a big question. First, I'd like to give my sincere thanks to you and
> > Liang for help push this part to Linux upstream. You've done a great job. :-)
>
> Thanks!
> > Unfortunately I can't afford the time in the short term, due to extremely busy
> > tasks in other projects, at least in the whole Q1. Really sorry about that. :/
>
> Bummer.
> >
> > I would very appreciate your help if you could continue lending some time
> here,
> > since you've done plenty of works on the cleanup. The majority of the tricky
> > part is related to VCPU/PCPU handling. If putting that part aside, the
> remaining
> > logic should be much simpler.
>
> I was trying to figure out how difficult it would be to just bring Pxx states to
> the Xen hypervisor using the existing ACPI interfaces. And while it did not pass
> all the _Pxx states (seems that all the _PCT, _PSS, _PSD, _PPC flags need to
> be enabled in the hypercall to make this work), it demonstrates what I had in
> mind.
>
>
> #include <linux/device.h>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/init.h>
> #include <linux/types.h>
> #include <acpi/acpi_bus.h>
> #include <acpi/acpi_drivers.h>
> #include <acpi/processor.h>
> #include <linux/cpumask.h>
>
> #include <xen/interface/platform.h>
> #include <asm/xen/hypercall.h>
>
> #define DRV_NAME "ACPI_PXX"
> #define DRV_CLASS "ACPI_PXX_CLASS"
> MODULE_AUTHOR("Konrad Rzeszutek Wilk");
> MODULE_DESCRIPTION("ACPI Processor Driver to send data to Xen
> hypervisor");
> MODULE_LICENSE("GPL");
>
> static int parse_acpi_cxx(struct acpi_processor *_pr)
> {
> struct acpi_processor_cx *cx;
> int i;
>
> for (i = 1; i <= _pr->power.count; i++) {
> cx = &_pr->power.states[i];
> if (!cx->valid)
> continue;
> pr_info("%s: %d %d %d 0x%x\n", __func__,
> cx->type, cx->latency, cx->power, (u32)cx->address);
> }
> /* TODO: Under Xen, the C-states information is not present.
> * Figure out why. */

it's possible related to this long thread:

http://lists.xen.org/archives/html/xen-devel/2011-08/msg00511.html

IOW, Xen doesn't export mwait capability to dom0, which impacts _PDC setting.
Final solution is to have a para-virtualized PDC call for that.

> return 0;
> }
> static int push_pxx_to_hypervisor(struct acpi_processor *_pr)
> {
> int ret = -EINVAL;
> struct xen_platform_op op = {
> .cmd = XENPF_set_processor_pminfo,
> .interface_version = XENPF_INTERFACE_VERSION,
> .u.set_pminfo.id = _pr->acpi_id,
> .u.set_pminfo.type = XEN_PM_PX,
> };
> struct xen_processor_performance *xen_perf;
> struct xen_processor_px *xen_states, *xen_px = NULL;
> struct acpi_processor_px *px;
> unsigned i;
>
> xen_perf = &op.u.set_pminfo.perf;
> xen_perf->flags = XEN_PX_PSS;
>
> xen_perf->state_count = _pr->performance->state_count;
> xen_states = kzalloc(xen_perf->state_count *
> sizeof(struct xen_processor_px), GFP_KERNEL);
> if (!xen_states)
> return -ENOMEM;
>
> for (i = 0; i < _pr->performance->state_count; i++) {
> xen_px = &(xen_states[i]);
> px = &(_pr->performance->states[i]);
>
> xen_px->core_frequency = px->core_frequency;
> xen_px->power = px->power;
> xen_px->transition_latency = px->transition_latency;
> xen_px->bus_master_latency = px->bus_master_latency;
> xen_px->control = px->control;
> xen_px->status = px->status;
> }
> set_xen_guest_handle(xen_perf->states, xen_states);
> ret = HYPERVISOR_dom0_op(&op);
> kfree(xen_states);
> return ret;
> }
> static int parse_acpi_pxx(struct acpi_processor *_pr)
> {
> struct acpi_processor_px *px;
> int i;
>
> for (i = 0; i < _pr->performance->state_count;i++) {
> px = &(_pr->performance->states[i]);
> pr_info("%s: [%d]: %d, %d, %d, %d, %d, %d\n", __func__,
> i, (u32)px->core_frequency, (u32)px->power,
> (u32)px->transition_latency,
> (u32)px->bus_master_latency,
> (u32)px->control, (u32)px->status);
> }
> if (xen_initial_domain())
> return push_pxx_to_hypervisor(_pr);
> return 0;
> }
> static int parse_acpi_data(void)
> {
> int cpu;
> int err = -ENODEV;
> struct acpi_processor *_pr;
> struct cpuinfo_x86 *c = &cpu_data(0);
>
> /* TODO: Under AMD, the information is populated
> * using the powernow-k8 driver which does an MSR_PSTATE_CUR_LIMIT
> * MSR which returns the wrong value so the population of 'processors'
> * has bogus data. So only run this under Intel for right now. */
> if (!cpu_has(c, X86_FEATURE_EST))
> return -ENODEV;
> for_each_possible_cpu(cpu) {
> _pr = per_cpu(processors, cpu);
> if (!_pr)
> continue;
>
> if (_pr->flags.power)
> (void)parse_acpi_cxx(_pr);
>
> if (_pr->performance->states)
> err = parse_acpi_pxx(_pr);
> if (err)
> break;
> }
> return -ENODEV; /* force it to unload */
> }
> static int __init acpi_pxx_init(void)
> {
> return parse_acpi_data();
> }
> static void __exit acpi_pxx_exit(void)
> {
> }
> module_init(acpi_pxx_init);
> module_exit(acpi_pxx_exit);

the prerequisites for this module to work correctly, is that dom0 has the right
configurations to have all necessary Cx/Px information ready before this
module is loaded. That may mean enabling full CONFIG_CPU_IDLE and CONFIG_CPUFREQ,
which in current form may add some negative impact, e.g. dom0 will try to control
Px/Cx to conflict with Xen. So some tweaks may be required in that part.

given our purpose now, is to come up a cleaner approach which tolerate some
assumptions (e.g. #VCPU of dom0 == #PCPU), there's another option following this
trend (perhaps compensate your idea). We can register a Xen-cpuidle and
xen-cpufreq driver to current Linux cpuidle and cpufreq framework, which plays
mainly two roles:
- a dummy driver to prevent dom0 touching actual Px/Cx
- parse ACPI Cx/Px information to Xen, in a similar way you did above

there may have some other trickiness, but the majority code will be self-contained.

Thanks
Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/