Re: [PATCH] kobject: Make sure the parent does not get released before its children

From: Greg Kroah-Hartman
Date: Sun May 24 2020 - 07:43:01 EST


On Sat, May 23, 2020 at 12:04:30PM -0700, Dmitry Torokhov wrote:
> On Sat, May 23, 2020 at 8:48 AM Randy Dunlap <rdunlap@xxxxxxxxxxxxx> wrote:
> >
> > On 5/23/20 8:36 AM, Greg Kroah-Hartman wrote:
> > > On Wed, May 13, 2020 at 06:18:40PM +0300, Heikki Krogerus wrote:
> > >> In the function kobject_cleanup(), kobject_del(kobj) is
> > >> called before the kobj->release(). That makes it possible to
> > >> release the parent of the kobject before the kobject itself.
> > >>
> > >> To fix that, add a function __kobject_del() that does
> > >> everything that kobject_del() does except release the parent
> > >> reference. kobject_cleanup() then calls __kobject_del()
> > >> instead of kobject_del(), and separately decrements the
> > >> reference count of the parent kobject after kobj->release()
> > >> has been called.
> > >>
> > >> Reported-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> > >> Reported-by: kernel test robot <rong.a.chen@xxxxxxxxx>
> > >> Fixes: 7589238a8cf3 ("Revert "software node: Simplify software_node_release() function"")
> > >> Suggested-by: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> > >> Signed-off-by: Heikki Krogerus <heikki.krogerus@xxxxxxxxxxxxxxx>
> > >> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > >> Reviewed-by: Brendan Higgins <brendanhiggins@xxxxxxxxxx>
> > >> Tested-by: Brendan Higgins <brendanhiggins@xxxxxxxxxx>
> > >> Acked-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
> > >> ---
> > >> lib/kobject.c | 30 ++++++++++++++++++++----------
> > >> 1 file changed, 20 insertions(+), 10 deletions(-)
> > >
> > > Stepping back, now that it turns out this patch causes more problems
> > > than it fixes, how is everyone reproducing the original crash here?
> >
> > Just load lib/test_printf.ko and boom!
> >
> >
> > > Is it just the KUNIT_DRIVER_PE_TEST that is causing the issue?
> > >
> > > In looking at 7589238a8cf3 ("Revert "software node: Simplify
> > > software_node_release() function""), the log messages there look
> > > correct. sysfs can't create a duplicate file, and so when your test is
> > > written to try to create software nodes, you always have to check the
> > > return value. If you run the test in parallel, or before another test
> > > has had a chance to clean up, the function will fail, correctly.
> > >
> > > So what real-world thing is this test "failure" trying to show?
>
> Well, not sure about the test, but speaking more generally, shouldn't
> we postpone releasing the parent's reference until we are in
> kobj->release() handler? I.e. after all child state is cleared, and
> all memory is freed, _then_ we unpin the parent?

That's what the patch was trying to do in a way. But I think you are
right, we should _only_ be doing it at that point in time, and no other,
which the patch was not doing.

Let me go try that and see what happens...

thanks,

greg k-h