RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers

From: Zheng, Lv
Date: Sun Jul 28 2013 - 21:43:35 EST


> On Friday, July 26, 2013 10:01 PM Rafael J. Wysocki wrote:
> > On Friday, July 26, 2013 12:47:44 AM Zheng, Lv wrote:
> >
> > > On Friday, July 26, 2013 4:27 AM Rafael J. Wysocki wrote:
> > >
> > > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers
> > > > to fix races caused by the ACPICA address space callback invocations.
> > > >
> > > > ACPICA address space callback invocation is not suitable for Linux
> > > > CONFIG_MODULE=y execution environment. This patch tries to protect
> > > > the address space callbacks by invoking them under a module-safe
> > > > environment.
> > > > The IPMI address space handler is also upgraded in this patch.
> > > > The acpi_unregister_region() is designed to meet the following
> > > > requirements:
> > > > 1. It acts as a barrier for operation region callbacks - no callback will
> > > > happen after acpi_unregister_region().
> > > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > > > functions.
> > > > Using reference counting rather than module referencing allows such
> > > > benefits to be achieved even when acpi_unregister_region() is called
> > > > in the environments other than module->exit().
> > > > The header file of include/acpi/acpi_bus.h should contain the
> > > > declarations that have references to some ACPICA defined types.
> > > >
> > > > Signed-off-by: Lv Zheng <lv.zheng@xxxxxxxxx>
> > > > Reviewed-by: Huang Ying <ying.huang@xxxxxxxxx>
> > > > ---
> > > > drivers/acpi/acpi_ipmi.c | 16 ++--
> > > > drivers/acpi/osl.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > include/acpi/acpi_bus.h | 5 ++
> > > > 3 files changed, 235 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > > 5f8f495..2a09156 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -539,20 +539,18 @@ out_ref:
> > > > static int __init acpi_ipmi_init(void)
> > > > {
> > > > int result = 0;
> > > > - acpi_status status;
> > > >
> > > > if (acpi_disabled)
> > > > return result;
> > > >
> > > > mutex_init(&driver_data.ipmi_lock);
> > > >
> > > > - status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > > - ACPI_ADR_SPACE_IPMI,
> > > > - &acpi_ipmi_space_handler,
> > > > - NULL, NULL);
> > > > - if (ACPI_FAILURE(status)) {
> > > > + result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > > + &acpi_ipmi_space_handler,
> > > > + NULL, NULL);
> > > > + if (result) {
> > > > pr_warn("Can't register IPMI opregion space handle\n");
> > > > - return -EINVAL;
> > > > + return result;
> > > > }
> > > >
> > > > result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > > > }
> > > > mutex_unlock(&driver_data.ipmi_lock);
> > > >
> > > > - acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > > - ACPI_ADR_SPACE_IPMI,
> > > > - &acpi_ipmi_space_handler);
> > > > + acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > > > }
> > > >
> > > > module_init(acpi_ipmi_init);
> > > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index
> > > > 6ab2c35..8398e51 100644
> > > > --- a/drivers/acpi/osl.c
> > > > +++ b/drivers/acpi/osl.c
> > > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> > > > static struct workqueue_struct *kacpi_notify_wq;
> > > > static struct workqueue_struct *kacpi_hotplug_wq;
> > > >
> > > > +struct acpi_region {
> > > > + unsigned long flags;
> > > > +#define ACPI_REGION_DEFAULT 0x01
> > > > +#define ACPI_REGION_INSTALLED 0x02
> > > > +#define ACPI_REGION_REGISTERED 0x04
> > > > +#define ACPI_REGION_UNREGISTERING 0x08
> > > > +#define ACPI_REGION_INSTALLING 0x10
> > >
> > > What about (1UL << 1), (1UL << 2) etc.?
> > >
> > > Also please remove the #defines out of the struct definition.
> >
> > OK.
> >
> > >
> > > > + /*
> > > > + * NOTE: Upgrading All Region Handlers
> > > > + * This flag is only used during the period where not all of the
> > > > + * region handlers are upgraded to the new interfaces.
> > > > + */
> > > > +#define ACPI_REGION_MANAGED 0x80
> > > > + acpi_adr_space_handler handler;
> > > > + acpi_adr_space_setup setup;
> > > > + void *context;
> > > > + /* Invoking references */
> > > > + atomic_t refcnt;
> > >
> > > Actually, why don't you use krefs?
> >
> > If you take a look at other pieces of my code, you'll find there are two
> > reasons:
> >
> > 1. I'm using while (atomic_read() > 1) to implement the objects' flushing,
> > and there is no kref API to do so.
>
> No, there's not any, but you can read kref.refcount directly, can't you?
>
> Moreover, it is not entirely clear to me that doing the while (atomic_read() > 1)
> is actually correct.
>
> > I just think it is not suitable for me to introduce such an API into kref.h
> > and start another argument around kref design in this bug-fix patch. :-)
> > I'll start a discussion about kref design in another thread.
>
> You don't need to do that at all.
>
> > 2. I'm using ipmi_dev|msg_release() as a pair for ipmi_dev|msg_alloc();
> > it's a kind of atomic_t coding style.
> > If atomic_t is changed to struct kref, I will need to implement an extra
> > API, __ipmi_dev_release(), which takes a struct kref as parameter and
> > calls ipmi_dev_release() inside it.
> > By not using kref, I needn't write code to implement such an API.
>
> I'm not following you, sorry.
>
> Please just use krefs for reference counting, the same way as you use
> struct list_head for implementing lists. This is the way everyone does
> that in the kernel and that's for a reason.
>
> Unless you do your reference counting under a lock, in which case using
> atomic_t isn't necessary at all and you can use a non-atomic counter.

I'll follow your suggestion and use kref.
You can find the changes related to my concern 2 in the next revision.
It's trivial.

>
> > > > +};
> > > > +
> > > > +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS] = {
> > > > + [ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > > + .flags = ACPI_REGION_DEFAULT,
> > > > + },
> > > > + [ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > > + .flags = ACPI_REGION_DEFAULT,
> > > > + },
> > > > + [ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > > + .flags = ACPI_REGION_DEFAULT,
> > > > + },
> > > > + [ACPI_ADR_SPACE_IPMI] = {
> > > > + .flags = ACPI_REGION_MANAGED,
> > > > + },
> > > > +};
> > > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > > +
> > > > /*
> > > > * This list of permanent mappings is for memory that may be accessed
> > > from
> > > > * interrupt context, where we can't do the ioremap().
> > > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle, u32 type, void *context,
> > > > kfree(hp_work);
> > > > }
> > > > EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > > +
> > > > +static bool acpi_region_managed(struct acpi_region *rgn)
> > > > +{
> > > > + /*
> > > > + * NOTE: Default and Managed
> > > > + * We only need to avoid region management on the regions managed
> > > > + * by ACPICA (ACPI_REGION_DEFAULT). Currently, we need additional
> > > > + * checks as many operation region handlers are not upgraded, so
> > > > + * only those known to be safe are managed (ACPI_REGION_MANAGED).
> > > > + */
> > > > + return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > > + (rgn->flags & ACPI_REGION_MANAGED);
> > > > +}
> > > > +
> > > > +static bool acpi_region_callable(struct acpi_region *rgn)
> > > > +{
> > > > + return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > > + !(rgn->flags & ACPI_REGION_UNREGISTERING);
> > > > +}
> > > > +
> > > > +static acpi_status
> > > > +acpi_region_default_handler(u32 function,
> > > > + acpi_physical_address address,
> > > > + u32 bit_width, u64 *value,
> > > > + void *handler_context, void *region_context)
> > > > +{
> > > > + acpi_adr_space_handler handler;
> > > > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > > + void *context;
> > > > + acpi_status status = AE_NOT_EXIST;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return status;
> > > > + }
> > > > +
> > > > + atomic_inc(&rgn->refcnt);
> > > > + handler = rgn->handler;
> > > > + context = rgn->context;
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + status = handler(function, address, bit_width, value, context,
> > > > + region_context);
> > >
> > > Why don't we call the handler under the mutex?
> > >
> > > What exactly prevents context from becoming NULL before the call above?
> >
> > It's a programming-style concern.
> > IMO, using locks around callback invocations is a buggy programming style
> > that can lead to deadlocks.
> > Let me explain this using an example.
> >
> > Object A exports a register/unregister API for other objects.
> > Object B calls A's register/unregister API to register/unregister B's callback.
> > It's likely that object B will hold lock_of_B around unregister/register when
> > object B is destroyed/created, and lock_of_B is likely also used inside the
> > callback.
>
> Why is it likely to be used inside the callback? Clearly, if a callback is
> executed under a lock, that lock can't be acquired by that callback.

I think this is not really related to why we must not hold a lock in this
situation, so let's ignore this paragraph.

>
> > So when object A holds lock_of_A around the callback invocation, it leads
> > to a deadlock, since:
> > 1. the locking order for the register/unregister side will be:
> > lock(lock_of_B), lock(lock_of_A)
> > 2. the locking order for the callback side will be:
> > lock(lock_of_A), lock(lock_of_B)
> > They are in the reversed order!
> >
> > IMO, Linux may need to introduce __callback and __api as decorators for
> > functions, and use sparse to enforce this rule; sparse knows if a callback
> > is invoked under some locks.
>
> Oh, dear. Yes, sparse knows such things, and so what?

I was thinking sparse could give us warnings on __api-marked function
invocations where the __acquire count is not 0; this might be mandatory for
high-quality code.
And sparse could also give us warnings on __callback-marked function
invocations where the __acquire count is not 0; this should be optional.
But since it is not related to our topic, let's ignore this paragraph.

>
> > In the case of ACPICA space_handlers, as you may know, when an ACPI
> > operation region handler is invoked, no lock is held inside ACPICA (the
> > interpreter lock must be released before executing operation region
> > handlers).
> > So the likelihood of the deadlock is pretty high here!
>
> Sorry, what are you talking about?
>
> Please let me rephrase my question: What *practical* problems would it lead to
> if we executed this particular callback under this particular mutex?
>
> Not *theoretical* in the general theory of everything, *practical* in this
> particular piece of code.
>
> And we are talking about a *global* mutex here, not something object-specific.

I think you have additional replies on this in another email.
Let me reply to you there.

>
> > > > + atomic_dec(&rgn->refcnt);
> > > > +
> > > > + return status;
> > > > +}
> > > > +
> > > > +static acpi_status
> > > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > > + void *handler_context, void **region_context)
> > > > +{
> > > > + acpi_adr_space_setup setup;
> > > > + struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > > + void *context;
> > > > + acpi_status status = AE_OK;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return status;
> > > > + }
> > > > +
> > > > + atomic_inc(&rgn->refcnt);
> > > > + setup = rgn->setup;
> > > > + context = rgn->context;
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + status = setup(handle, function, context, region_context);
> > >
> > > Can setup drop rgn->refcnt ?
> >
> > The reason is same as the handler, as a setup is also a callback.
>
> Let me rephrase: Is it legitimate for setup to modify rgn->refcnt?
> If so, then why?

Yes, the race is the same as for the handler.
When ACPICA is executing the text segment of the setup function, the module
that owns the setup function can also be unloaded, as there is no lock held
before invoking setup - note that ExitInter (releasing the interpreter lock)
also happens before setup invocations.

>
> > >
> > > > + atomic_dec(&rgn->refcnt);
> > > > +
> > > > + return status;
> > > > +}
> > > > +
> > > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > > + acpi_adr_space_type space_id)
> > > > +{
> > > > + int res = 0;
> > > > + acpi_status status;
> > > > + int installing = 0;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (rgn->flags & ACPI_REGION_INSTALLED)
> > > > + goto out_lock;
> > > > + if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > > + res = -EBUSY;
> > > > + goto out_lock;
> > > > + }
> > > > +
> > > > + installing = 1;
> > > > + rgn->flags |= ACPI_REGION_INSTALLING;
> > > > + status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT, space_id,
> > > > + acpi_region_default_handler,
> > > > + acpi_region_default_setup,
> > > > + rgn);
> > > > + rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > > + if (ACPI_FAILURE(status))
> > > > + res = -EINVAL;
> > > > + else
> > > > + rgn->flags |= ACPI_REGION_INSTALLED;
> > > > +
> > > > +out_lock:
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + if (installing) {
> > > > + if (res)
> > > > + pr_err("Failed to install region %d\n", space_id);
> > > > + else
> > > > + pr_info("Region %d installed\n", space_id);
> > > > + }
> > > > + return res;
> > > > +}
> > > > +
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > + acpi_adr_space_handler handler,
> > > > + acpi_adr_space_setup setup, void *context)
> > > > +{
> > > > + int res;
> > > > + struct acpi_region *rgn;
> > > > +
> > > > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > > + return -EINVAL;
> > > > +
> > > > + rgn = &acpi_regions[space_id];
> > > > + if (!acpi_region_managed(rgn))
> > > > + return -EINVAL;
> > > > +
> > > > + res = __acpi_install_region(rgn, space_id);
> > > > + if (res)
> > > > + return res;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return -EBUSY;
> > > > + }
> > > > +
> > > > + rgn->handler = handler;
> > > > + rgn->setup = setup;
> > > > + rgn->context = context;
> > > > + rgn->flags |= ACPI_REGION_REGISTERED;
> > > > + atomic_set(&rgn->refcnt, 1);
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + pr_info("Region %d registered\n", space_id);
> > > > +
> > > > + return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > > +
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id)
> > > > +{
> > > > + struct acpi_region *rgn;
> > > > +
> > > > + if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > > + return;
> > > > +
> > > > + rgn = &acpi_regions[space_id];
> > > > + if (!acpi_region_managed(rgn))
> > > > + return;
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return;
> > > > + }
> > > > + if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > + return;
> > >
> > > What about
> > >
> > > if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > > || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > mutex_unlock(&acpi_mutex_region);
> > > return;
> > > }
> > >
> >
> > OK.
> >
> > > > + }
> > > > +
> > > > + rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > > + rgn->handler = NULL;
> > > > + rgn->setup = NULL;
> > > > + rgn->context = NULL;
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + while (atomic_read(&rgn->refcnt) > 1)
> > > > + schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> > >
> > > Wouldn't it be better to use a wait queue here?
> >
> > Yes, I'll try.
>
> By the way, why do we need to do that?

I think you have additional replies on this in another email.
Let me reply to you there.

Thanks for commenting.

Best regards
-Lv

>
> > > > + atomic_dec(&rgn->refcnt);
> > > > +
> > > > + mutex_lock(&acpi_mutex_region);
> > > > + rgn->flags &= ~(ACPI_REGION_REGISTERED | ACPI_REGION_UNREGISTERING);
> > > > + mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > + pr_info("Region %d unregistered\n", space_id);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index
> > > > a2c2fbb..15fad0d 100644
> > > > --- a/include/acpi/acpi_bus.h
> > > > +++ b/include/acpi/acpi_bus.h
> > > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void
> > > > *bus) { return 0; }
> > > >
> > > > #endif /* CONFIG_ACPI */
> > > >
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > + acpi_adr_space_handler handler,
> > > > + acpi_adr_space_setup setup, void *context);
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id);
> > > > +
> > > > #endif /*__ACPI_BUS_H__*/
>
> Thanks,
> Rafael
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.