Re: Cannot hot remove a memory device

From: Rafael J. Wysocki
Date: Fri Aug 02 2013 - 19:33:26 EST


On Friday, August 02, 2013 03:46:15 PM Toshi Kani wrote:
> On Thu, 2013-08-01 at 23:43 +0200, Rafael J. Wysocki wrote:
> > Hi,
> >
> > Thanks for your report.
> >
> > On Thursday, August 01, 2013 05:37:21 PM Yasuaki Ishimatsu wrote:
> > > Since the following commit, I cannot hot remove a memory device.
> > >
> > > ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes
> > > commit e2ff39400d81233374e780b133496a2296643d7d
> > >
> > > Details are as follows:
> > > When I add a memory device, acpi_memory_enable_device() always fails
> > > as follows:
> > >
> > > ...
> > > [ 1271.114116] [ffffea121c400000-ffffea121c7fffff] PMD -> [ffff880813c00000-ffff880813ffffff] on node 3
> > > [ 1271.128682] [ffffea121c800000-ffffea121cbfffff] PMD -> [ffff880813800000-ffff880813bfffff] on node 3
> > > [ 1271.143298] [ffffea121cc00000-ffffea121cffffff] PMD -> [ffff880813000000-ffff8808133fffff] on node 3
> > > [ 1271.157799] [ffffea121d000000-ffffea121d3fffff] PMD -> [ffff880812c00000-ffff880812ffffff] on node 3
> > > [ 1271.172341] [ffffea121d400000-ffffea121d7fffff] PMD -> [ffff880812800000-ffff880812bfffff] on node 3
> > > [ 1271.186872] [ffffea121d800000-ffffea121dbfffff] PMD -> [ffff880812400000-ffff8808127fffff] on node 3
> > > [ 1271.201481] [ffffea121dc00000-ffffea121dffffff] PMD -> [ffff880812000000-ffff8808123fffff] on node 3
> > > [ 1271.216041] [ffffea121e000000-ffffea121e3fffff] PMD -> [ffff880811c00000-ffff880811ffffff] on node 3
> > > [ 1271.230623] [ffffea121e400000-ffffea121e7fffff] PMD -> [ffff880811800000-ffff880811bfffff] on node 3
> > > [ 1271.245148] [ffffea121e800000-ffffea121ebfffff] PMD -> [ffff880811400000-ffff8808117fffff] on node 3
> > > [ 1271.259683] [ffffea121ec00000-ffffea121effffff] PMD -> [ffff880811000000-ffff8808113fffff] on node 3
> > > [ 1271.274194] [ffffea121f000000-ffffea121f3fffff] PMD -> [ffff880810c00000-ffff880810ffffff] on node 3
> > > [ 1271.288764] [ffffea121f400000-ffffea121f7fffff] PMD -> [ffff880810800000-ffff880810bfffff] on node 3
>
> It appears that each memory object only has 64MB of memory. This is
> less than the memory block size, which is 128MB. This means that a
> single memory block associates with two ACPI memory device objects.

That'd be bad.

How did that work before if that indeed is the case?
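
To make the suspected overlap concrete, here is a small userspace C sketch
(not kernel code) of the arithmetic. The helper names block_index() and
ranges_share_block() and the example addresses are made up for illustration;
only the 64 MB and 128 MB sizes come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Index of the memory block that contains physical address addr. */
uint64_t block_index(uint64_t addr, uint64_t block_size)
{
	assert(block_size > 0);
	return addr / block_size;
}

/*
 * Return 1 if the ranges [start_a, start_a + len_a) and
 * [start_b, start_b + len_b) touch at least one common memory block,
 * 0 otherwise.  Both lengths must be nonzero.
 */
int ranges_share_block(uint64_t start_a, uint64_t len_a,
		       uint64_t start_b, uint64_t len_b,
		       uint64_t block_size)
{
	uint64_t first_a, last_a, first_b, last_b;

	assert(len_a > 0 && len_b > 0);

	first_a = block_index(start_a, block_size);
	last_a = block_index(start_a + len_a - 1, block_size);
	first_b = block_index(start_b, block_size);
	last_b = block_index(start_b + len_b - 1, block_size);

	/* Two closed index intervals overlap iff each starts before the other ends. */
	return first_a <= last_b && first_b <= last_a;
}
```

For two adjacent 64 MB devices at hypothetical addresses 0 and 64 MB,
ranges_share_block(0, 64 * MiB, 64 * MiB, 64 * MiB, 128 * MiB) is nonzero,
while the same call with a 64 MB block size returns 0.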

> > > ...
> > > [ 1271.325841] acpi PNP0C80:03: acpi_memory_enable_device() error
> >
> > Well, the only new way acpi_memory_enable_device() can fail after that commit
> > is a failure in acpi_bind_memory_blocks().
>
> Agreed.
>
> > This means that either handle is NULL, which I think we can exclude, because
> > acpi_memory_enable_device() wouldn't be called at all if that were the case, or
> > there's a more subtle error in acpi_bind_one().
> >
> > One that comes to mind is that we may be calling acpi_bind_one() twice for the
> > same memory region, in which case it will trigger -EINVAL from the sanity check in
> > there.
>
> I think it fails with -EINVAL at the place with dev_warn(dev, "ACPI
> handle is already set\n"). When two ACPI memory objects associate with
> the same memory block, the bind procedure of the 2nd ACPI memory object
> sees that ACPI_HANDLE(dev) is already set to the 1st ACPI memory object.

That sounds plausible, but I wonder how we can fix that?

There's no way for a single physical device to have two different ACPI
"companions". It looks like the memory blocks should be 64 MB each in that
case. Or we need to create two child devices for each memory block and
associate each of them with an ACPI object. That would lead to complications
in the user space interface, though.
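
The one-companion constraint can be modelled with a toy userspace C sketch.
This is not the real acpi_bind_one(), whose signature and checks differ;
struct toy_block and toy_bind_one() are hypothetical names, and the only
behaviour borrowed from the discussion is that a second bind to a device
whose handle is already set fails with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a memory block device: it has room for one companion. */
struct toy_block {
	const void *companion;	/* NULL until something is bound */
};

/*
 * Bind handle to blk.  Returns 0 on success, -EINVAL if handle is NULL
 * or if blk already has a companion (the "handle is already set" case).
 */
int toy_bind_one(struct toy_block *blk, const void *handle)
{
	assert(blk != NULL);

	if (!handle)
		return -EINVAL;

	if (blk->companion)
		return -EINVAL;	/* second ACPI object for the same block */

	blk->companion = handle;
	return 0;
}
```

Binding two distinct handles to the same struct toy_block succeeds the first
time and returns -EINVAL the second time, leaving the first companion in
place.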

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.