Re: [PATCH] sysfs: driver core: Fix glue dir race condition

From: Yijing Wang
Date: Thu Nov 06 2014 - 20:47:25 EST


On 2014/11/7 1:22, Greg KH wrote:
> On Thu, Nov 06, 2014 at 11:55:47AM -0500, Tejun Heo wrote:
>> Maybe "fix glue dir race condition by not removing them" is a better
>> title?
>>
>> On Thu, Nov 06, 2014 at 04:16:38PM +0800, Yijing Wang wrote:
>>> There is a race condition when removing glue directory.
>>> It can be reproduced in following test:
>>>
>>> path 1: Add first child device
>>> device_add()
>>> get_device_parent()
>>> /*find parent from glue_dirs.list*/
>>> list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
>>> if (k->parent == parent_kobj) {
>>> kobj = kobject_get(k);
>>> break;
>>> }
>>> ....
>>> class_dir_create_and_add()
>>>
>>> path2: Remove last child device under glue dir
>>> device_del()
>>> cleanup_device_parent()
>>> cleanup_glue_dir()
>>> kobject_put(glue_dir);
>>>
>>> If path2 has been called cleanup_glue_dir(), but not
>>> call kobject_put(glue_dir), the glue dir is still
>>> in parent's kset list. Meanwhile, path1 find the glue
>>> dir from the glue_dirs.list. Path2 may release glue dir
>>> before path1 call kobject_get(). So kernel will report
>>> the warning and bug_on.
>>>
>>> This fix keep glue dir around once it created suggested
>>> by Tejun Heo.
>>
>> I think you prolly want to explain why this is okay / desired.
>> e.g. list how the glue dir is used and how many of them are there and
>> explain that there's no real benefit in removing them.
>
> I'd really _like_ to remove them if at all possible, as if there isn't
> any "children" in the subdirectory, there shouldn't be a need for that
> directory to be there.
>
> This seems to be the "classic" problem we have of a kref in a list that
> can be found while the last instance could be removed at the same time.
> I hate to just throw another lock at the problem, but wouldn't a lock to
> protect the list of glue_dirs be the answer here?

Hi Greg, in this case we need to close the race between the traversal of
dev->class->p->glue_dirs.list and the kobject_put(glue_dir) call in cleanup_glue_dir().

glue_dirs.list_lock only protects glue_dirs.list itself. What we need is to prevent
kobject_put(glue_dir) from dropping the glue_dir ref count while we are traversing
dev->class->p->glue_dirs.list.


---------------------------------------------------------------------------
/* find our class-directory at the parent and reference it */
spin_lock(&dev->class->p->glue_dirs.list_lock);
list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) ------>A
	if (k->parent == parent_kobj) {
		kobj = kobject_get(k);
		break;
	}
spin_unlock(&dev->class->p->glue_dirs.list_lock);
------------------------------------------------------------------------------
static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
{
	/* see if we live in a "glue" directory */
	if (!glue_dir || !dev->class ||
	    glue_dir->kset != &dev->class->p->glue_dirs)
		return;

	kobject_put(glue_dir); --------------->B
}
------------------------------------------------------------------------------
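
To make the window between A and B concrete, here is a small userspace
model of the interleaving. It is only a sketch, not kernel code: struct
model_obj and the model_* helpers are hypothetical names, and it assumes
that the put drops the reference first and only unlinks the entry from
the list afterwards, in the release path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a glue dir kobject. */
struct model_obj {
	int refcount;	/* models the kobject reference count */
	int on_list;	/* models membership in glue_dirs.list */
};

/* First half of B: drop the reference; the entry is still on the list. */
static int model_put_drop_ref(struct model_obj *o)
{
	assert(o != NULL);
	return --o->refcount == 0;	/* nonzero if this was the last ref */
}

/* Second half of B: the release path unlinks the entry from the list. */
static void model_put_unlink(struct model_obj *o)
{
	assert(o != NULL);
	o->on_list = 0;
}

/*
 * A: look the entry up and take a reference. Returns the refcount seen
 * before the get, or -1 if the entry is no longer on the list.
 * A return value of 0 means we referenced an object that is being freed.
 */
static int model_lookup_get(struct model_obj *o)
{
	int seen;

	assert(o != NULL);
	if (!o->on_list)
		return -1;
	seen = o->refcount;
	o->refcount++;
	return seen;
}
```

If model_lookup_get() runs between model_put_drop_ref() and
model_put_unlink(), it returns 0, which corresponds to path1 taking a
reference on a glue dir that path2 is already releasing.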


Tejun introduced the mutex gdp_mutex in commit 77d3d7c1d561f49 to fix a race condition in get_device_parent().
We could reuse that mutex to fix the race between the glue_dirs.list traversal and kobject_put(glue_dir).

Greg, which of the two solutions do you prefer: reusing gdp_mutex, or not removing the glue_dir?


diff --git a/drivers/base/core.c b/drivers/base/core.c
index 28b808c..645eacf 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -724,12 +724,12 @@ class_dir_create_and_add(struct class *class, struct kobject *parent_kobj)
 	return &dir->kobj;
 }

+static DEFINE_MUTEX(gdp_mutex);

 static struct kobject *get_device_parent(struct device *dev,
 					 struct device *parent)
 {
 	if (dev->class) {
-		static DEFINE_MUTEX(gdp_mutex);
 		struct kobject *kobj = NULL;
 		struct kobject *parent_kobj;
 		struct kobject *k;
@@ -793,7 +793,9 @@ static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
 	    glue_dir->kset != &dev->class->p->glue_dirs)
 		return;

+	mutex_lock(&gdp_mutex);
 	kobject_put(glue_dir);
+	mutex_unlock(&gdp_mutex);
 }

 static void cleanup_device_parent(struct device *dev)
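
The idea behind reusing the mutex can be shown with a userspace model.
This is only a sketch under assumptions: it uses a pthread mutex in place
of gdp_mutex, struct model_dir and the model_* functions are hypothetical
names, and it assumes the lookup side already runs with the same mutex
held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a glue dir. */
struct model_dir {
	int refcount;	/* models the kobject reference count */
	int on_list;	/* models membership in glue_dirs.list */
};

/* Plays the role of gdp_mutex in this model. */
static pthread_mutex_t model_gdp_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Lookup side: find and reference under the mutex. Returns 1 on success. */
static int model_find_and_get(struct model_dir *d)
{
	int found = 0;

	pthread_mutex_lock(&model_gdp_mutex);
	if (d->on_list) {
		/* An entry on the list is never at refcount 0 here. */
		assert(d->refcount > 0);
		d->refcount++;
		found = 1;
	}
	pthread_mutex_unlock(&model_gdp_mutex);
	return found;
}

/* Put side: drop the reference and unlink in one critical section. */
static void model_put(struct model_dir *d)
{
	pthread_mutex_lock(&model_gdp_mutex);
	if (--d->refcount == 0)
		d->on_list = 0;
	pthread_mutex_unlock(&model_gdp_mutex);
}
```

Because the final put and the unlink happen while the mutex is held, the
lookup either finds a live entry or does not find it at all.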

>
> thanks,
>
> greg k-h
>
> .
>


--
Thanks!
Yijing
