Re: [PATCH] module: Fix race condition between load and unload module

From: Greg Kroah-Hartman
Date: Fri Apr 12 2013 - 19:54:00 EST


On Fri, Apr 12, 2013 at 04:47:50PM -0700, Linus Torvalds wrote:
> On Fri, Apr 12, 2013 at 3:32 PM, Anatol Pomozov
> <anatol.pomozov@xxxxxxxxx> wrote:
> >
> > Here is timeline for the crash in case if kset_find_obj() searches for
> > an object tht nobody holds and other thread is doing kobject_put()
> > on the same kobject:
> >
> > THREAD A (calls kset_find_obj()) THREAD B (calls kobject_put())
> > splin_lock()
> > atomic_dec_return(kobj->kref), counter gets zero here
> > ... starts kobject cleanup ....
> > spin_lock() // WAIT thread A in kobj_kset_leave()
> > iterate over kset->list
> > atomic_inc(kobj->kref) (counter becomes 1)
> > spin_unlock()
> > spin_lock() // taken
> > // it does not know that thread A increased counter so it
> > remove obj from list
> > spin_unlock()
> > vfree(module) // frees module object with containing kobj
> >
> > // kobj points to freed memory area!!
> > koubject_put(kobj) // OOPS!!!!
>
> This is a much more generic bug in kobjects, and I would hate to add
> some random workaround for just one case of this bug like you do. The
> more fundamental bug needs to be fixed too.
>
> I think the more fundamental bugfix is to just fix kobject_get() to
> return NULL if the refcount was zero, because in that case the kobject
> no longer really exists.
>
> So instead of having
>
> kref_get(&kobj->kref);
>
> it should do
>
> if (!atomic_inc_not_zero(&kobj->kref.refcount))
> kobj = NULL;
>
> and I think that should fix your race automatically, no? Proper patch
> attached (but TOTALLY UNTESTED - it seems to compile, though).
>
> The problem is that we lose the warning for when the refcount is zero
> and somebody does a kobject_get(), but that is ok *assuming* that
> people actually check the return value of kobject_get() rather than
> just "know" that if they passed in a non-NULL kobj, they'll get it
> right back.
>
> Greg - please take a look... I'm adding Al to the discussion too,
> because Al just *loooves* these kinds of races ;)

We "should" have some type of "higher-up" lock to prevent the
release/get races from happening, we have that in the driver core, and I
thought we had such a lock already in the module subsystem as well,
which will prevent any of this from being needed.

Rusty, don't we have a lock for this somewhere?

Linus, I think your patch will reduce the window the race could happen,
but it should still be there, although testing with it would be
interesting to see if the original problem can be triggered with it.

I'll look at it some more tomorrow, about to go to dinner now...

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/