Re: [PATCH v4] acpi: Fix CPU hot removal problem

From: Bjorn Helgaas
Date: Thu Sep 22 2011 - 12:53:52 EST


On Wed, Sep 14, 2011 at 8:56 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> On Wed, Sep 14, 2011 at 7:06 PM, canquan.shen <shencanquan@xxxxxxxxxx> wrote:
>> We run Linux as a guest in a Xen environment. When we used the Xen tools
>> (xm vcpu-set <n>) to hot-add and hot-remove vCPUs in the guest, vCPU
>> removal failed. The cause is that the CPU removal code path never
>> actually removes the CPU.
>>
>> This patch adds acpi_bus_trim() to acpi_processor_hotplug_notify() to fix
>> this issue. With this patch, removal works for us.
>>
>> Signed-off-by: Canquan Shen <shencanquan@xxxxxxxxxx>
>
> Reviewed-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>

On second thought, let's think about this a bit more.

As I mentioned before, I have a long-term goal to move the hotplug
flow out of drivers and into the ACPI core. That will be easier if
the code in the drivers is as generic as possible.

The dock and acpiphp hot-remove code calls acpi_bus_trim(), then
evaluates _EJ0. The core acpi_bus_hot_remove_device() function
already does both acpi_bus_trim() and _EJ0. This function is
currently only used when we write to sysfs "eject" files, but I wonder
if we should use it in acpi_processor_hotplug_notify() as well.

That would get us one step closer to removing this gunk from the
drivers and having acpi_bus_notify() look something like this:

        case ACPI_NOTIFY_EJECT_REQUEST:
                driver->ops.remove(device);
                acpi_bus_hot_remove_device(device);
                break;
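
To make the ordering concrete, here is a rough user-space model of that
flow. It is not kernel code: every model_* name is a hypothetical
stand-in, and the has_ej0 flag and the early return on a failed trim are
assumptions of the model, not claims about what the ACPI core does today.
It only records the order of the steps: driver remove, then trim, then
_EJ0.

```c
/*
 * User-space model of the eject flow; NOT kernel code.
 * All model_* names are hypothetical stand-ins for illustration.
 */
#include <assert.h>
#include <stddef.h>

enum model_step {
	MODEL_STEP_DRIVER_REMOVE,	/* stand-in for driver->ops.remove() */
	MODEL_STEP_TRIM,		/* stand-in for acpi_bus_trim() */
	MODEL_STEP_EJ0,			/* stand-in for evaluating _EJ0 */
};

#define MODEL_MAX_STEPS 8

struct model_device {
	enum model_step log[MODEL_MAX_STEPS];	/* steps in the order taken */
	size_t nr_steps;
	int has_ej0;		/* assumption: node provides an _EJ0 method */
	int trim_fails;		/* test knob: force the trim step to fail */
};

/* Append one step to the device's log, dropping it if the log is full. */
static void model_record(struct model_device *dev, enum model_step step)
{
	assert(dev != NULL);
	if (dev->nr_steps < MODEL_MAX_STEPS)
		dev->log[dev->nr_steps++] = step;
}

/* Models the trim step; returns 0 on success, -1 on failure. */
static int model_bus_trim(struct model_device *dev)
{
	model_record(dev, MODEL_STEP_TRIM);
	return dev->trim_fails ? -1 : 0;
}

/* Models the combined helper: trim first, then _EJ0. */
static int model_hot_remove_device(struct model_device *dev)
{
	int err = model_bus_trim(dev);

	if (err)
		return err;	/* model choice: no eject after a failed trim */
	if (dev->has_ej0)
		model_record(dev, MODEL_STEP_EJ0);
	return 0;
}

/* Models the eject-request case of a generic notify handler. */
static int model_notify_eject(struct model_device *dev)
{
	model_record(dev, MODEL_STEP_DRIVER_REMOVE);
	return model_hot_remove_device(dev);
}
```

Calling model_notify_eject() on a zero-initialized struct model_device
with has_ej0 set leaves three entries in log[], in the order above. With
trim_fails set it returns nonzero and stops after the trim step.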

There is a description of a CPU hot-remove that does include _EJ0
methods in the "DIG64 Hot-Plug & Partitioning Flows Specification"
[1], sec 2.2.4. I know this document is Itanium-oriented, but this
part seems fairly generic and it's the only description of the process
I've seen so far.

So would using acpi_bus_hot_remove_device() instead of acpi_bus_trim()
also solve your problem, Canquan?

Bjorn

[1] http://www.dig64.org/home/DIG64_HPPF_R1_0.pdf

>> ---
>>  drivers/acpi/processor_driver.c |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
>> index a4e0f1b..03d92d6 100644
>> --- a/drivers/acpi/processor_driver.c
>> +++ b/drivers/acpi/processor_driver.c
>> @@ -641,6 +641,7 @@ static void acpi_processor_hotplug_notify(acpi_handle handle,
>>        struct acpi_processor *pr;
>>        struct acpi_device *device = NULL;
>>        int result;
>> +       u32 id;
>>
>>
>>        switch (event) {
>> @@ -677,6 +678,11 @@ static void acpi_processor_hotplug_notify(acpi_handle handle,
>>                                    "Driver data is NULL, dropping EJECT\n");
>>                        return;
>>                }
>> +               id = pr->id;
>> +               if (acpi_bus_trim(device, 1)) {
>> +                       printk(KERN_ERR  PREFIX
>> +                                   "Fail to Remove CPU %d\n", id);
>> +               }
>>                break;
>>        default:
>>                ACPI_DEBUG_PRINT((ACPI_DB_INFO,
>> --
>> 1.7.6.0
>>
>>
>