Re: [PATCH v4] acpi: Fix CPU hot removal problem

From: canquan.shen
Date: Fri Sep 23 2011 - 20:20:38 EST


On 2011/9/23 22:16, Bjorn Helgaas wrote:
On Fri, Sep 23, 2011 at 1:49 AM, canquan.shen<shencanquan@xxxxxxxxxx> wrote:
On 2011/9/23 0:53, Bjorn Helgaas wrote:

On Wed, Sep 14, 2011 at 8:56 PM, Bjorn Helgaas<bhelgaas@xxxxxxxxxx>
wrote:

On Wed, Sep 14, 2011 at 7:06 PM, canquan.shen<shencanquan@xxxxxxxxxx>
wrote:

We run linux as a guest in Xen environment. When we used the xen tools
(xm vcpu-set<n>) to hot add and remove vcpu to and from the guest, we
encountered the failure on vcpu removal. We found the reason is that it
didn't go to really remove cpu in the cpu removal code path.

This patch adds acpi_bus_trim in acpi_process_hotplug_notify to fix this
issue. With this patch, it works fine for us.

Signed-off-by:Canquan Shen<shencanquan@xxxxxxxxxx>

Reviewed-by: Bjorn Helgaas<bhelgaas@xxxxxxxxxx>

On second thought, let's think about this a bit more.

As I mentioned before, I have a long-term goal to move the hotplug
flow out of drivers and into the ACPI core. That will be easier if
the code in the drivers is as generic as possible.

The dock and acpiphp hot-remove code calls acpi_bus_trim(), then
evaluates _EJ0. The core acpi_bus_hot_remove_device() function
already does both acpi_bus_trim() and _EJ0. This function is
currently only used when we write to sysfs "eject" files, but I wonder
if we should use it in acpi_processor_hotplug_notify() as well.

That would get us one step closer to removing this gunk from the
drivers and having acpi_bus_notify() look something like this:

case ACPI_NOTIFY_EJECT_REQUEST:
driver->ops.remove(device);
acpi_bus_hot_remove_device(device);
break;

There is a description of a CPU hot-remove that does include _EJ0
methods in the "DIG64 Hot-Plug& Partitioning Flows Specification"
[1], sec 2.2.4. I know this document is Itanium-oriented, but this
part seems fairly generic and it's the only description of the process
I've seen so far.

So would using acpi_bus_hot_remove_device() instead of acpi_bus_trim()
also solve your problem, Canquan?

Yes. It can solve my problem.
I fully aggree to replace acpi_bus_hot_remove_device() to acpi_bus_trim().
Initially I insert the acpi_bus_hot_remove_device() in acpi_bus_notify
function . lately I think I should give a chance for user,and so send
KOBJ_OFFLINE message to the udvev module.

But why add the driver->ops.remove(device) before
acpi_bus_hot_remove_device(device). it can be called in
acpi_bus_hot_remove_device code path as bellowing:
acpi_bus_trim
acpi_bus_remove
device_release_driver
__device_release_driver
acpi_device_remove
acpi_drv->ops.remove

OK. Maybe the future acpi_bus_notify() code would be even simpler.
The point is that the ACPI core should handle the notification, call
the driver's .remove() method, and do whatever namespace cleanup is
required (i.e., acpi_bus_trim()). None of this should be in the
driver itself.

Can you re-post your patch, using acpi_bus_hot_remove_device() instead
of acpi_bus_trim()? Please include Khalid's tweak, too, so we don't
print warnings for CPUs that don't supply _EJ0 methods.

Bjorn

.

OK. I will modify it and re-post the patch which will include Khalid's tweak. and Thank Khalid for attention this patch.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/