Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and memory leaks

From: Rafael J. Wysocki
Date: Fri Feb 15 2013 - 07:43:36 EST


On Thursday, February 14, 2013 05:28:02 PM Toshi Kani wrote:
> On Fri, 2013-02-15 at 01:23 +0100, Rafael J. Wysocki wrote:
> > On Thursday, February 14, 2013 11:45:27 PM Moore, Robert wrote:
> > >
> > > > -----Original Message-----
> > > > From: Rafael J. Wysocki [mailto:rjw@xxxxxxx]
> > > > Sent: Thursday, February 14, 2013 12:59 PM
> > > > To: Moore, Robert
> > > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang Liu;
> > > > Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe; linux-pci@xxxxxxxxxxxxxxx
> > > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues and
> > > > memory leaks
> > > >
> > > > On Thursday, February 14, 2013 08:45:14 PM Moore, Robert wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Rafael J. Wysocki [mailto:rjw@xxxxxxx]
> > > > > > Sent: Thursday, February 14, 2013 4:04 AM
> > > > > > To: Moore, Robert
> > > > > > Cc: Toshi Kani; ACPI Devel Maling List; LKML; Bjorn Helgaas; Jiang
> > > > > > Liu; Yinghai Lu; Yasuaki Ishimatsu; Myron Stowe;
> > > > > > linux-pci@xxxxxxxxxxxxxxx
> > > > > > Subject: Re: [Update][PATCH] ACPI / hotplug: Fix concurrency issues
> > > > > > and memory leaks
> > > > > >
> > > > > > On Thursday, February 14, 2013 02:31:22 AM Moore, Robert wrote:
> > > > > > > > > > I thought about that, but actually there's no guarantee that
> > > > > > > > > > the handle will be valid after _EJ0 as far as I can say. So
> > > > > > > > > > the race condition is going to be there anyway and using
> > > > > > > > > > struct acpi_device just makes it easier to avoid it.
> > > > > > > > >
> > > > > > > > > In theory, yes, a stale handle could be a problem, if _EJ0
> > > > > > > > > performs unload table and if ACPICA frees up its internal data
> > > > > > > > > structure pointed by the handle as a result. But we should
> > > > > > > > > not see such issue now since we do not support dynamic ACPI
> > > > > > > > > namespace
> > > > > > yet.
> > > > > > > >
> > > > > > > > I'm waiting for information from Bob about that. If we can
> > > > > > > > assume ACPI handles to be always valid, that will simplify
> > > > > > > > things quite a
> > > > > > bit.
> > > > > > >
> > > > > > > If a table is unloaded, all the namespace nodes for that table are
> > > > > > > removed from the namespace, and thus any ACPI_HANDLE pointers go
> > > > > > > stale
> > > > > > and invalid.
> > > > > >
> > > > > > OK, thanks!
> > > > > >
> > > > > > To me this means that we cannot assume a handle to stay valid
> > > > > > between a notify handler and acpi_bus_hot_remove_device() run from a
> > > > workqueue.
> > > > > >
> > > > > > Is there a mechanism in ACPICA to ensure that a handle won't become
> > > > > > stale while a notify handler is running for it or is the OS
> > > > > > responsible for ensuring that
> > > > > > _EJ0 won't be run in parallel with notify handlers for device
> > > > > > objects being ejected?
> > > > > >
> > > > >
> > > > > It is up to the host.
> > > >
> > > > I was afraid that that might be the case. :-)
> > > >
> > > > So far the (Linux) host has been happily ignoring that potential problem,
> > > > so I guess it can still be ignored for a while, although we'll need to
> > > > address it eventually at one point.
> > >
> > > I would think it should be fairly simple to setup a mechanism to either tell
> > > the driver or for the driver to figure it out -- such that the driver knows
> > > that all handles associated with the device are now invalid. Another way
> > > to look at it is that when the device is re-installed, the driver should
> > > reinitialize such that it obtains new handles for the devices and subobjects
> > > in question.
> >
> > Unfortunately, there is quite strong assumption in our code that ACPI handles
> > will not become stale before the device objects associated with them are
> > removed. For this reason, we need to know in advance which handles will
> > become stale as a result of a table unload and remove their device objects
> > beforehand.
> >
> > Moreover, when there's a notify handler installed for a given ACPI handle
> > and that handle becomes stale while the notify handler is running, we'll be
> > in trouble. To avoid that we need to ensure that table unloads and notifies
> > will always be mutually exclusive.
>
> I wonder if we can make acpi_ns_validate_handle() to actually be able to
> verify if a given handle is valid. This way, ACPICA can fail gracefully
> (AE_BAD_PARAMETER) when a stable handle is passed to the interfaces.

That'd be good, but to implement it, I think, it would be necessary to
introduce some reference counting of namespace objects such that the given
object would only be deleted after the last reference to it had been dropped.
On table unload it would just be marked as invalid, but it would stay in
memory as long as there were any references to it.

So, for example, a notify handler would start from something like
acpi_add_reference(handle), which would guarantee that the object pointed to by
handle would stay in memory, and it would finish by doing
acpi_drop_reference(handle) or a work item scheduled by it would do that.

We do that for objects based on struct device and it works well.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/