Re: [PATCH RFC] drivers/core: Replace lockdep_set_novalidate_class() with unique class keys

From: Kent Overstreet
Date: Sun Feb 12 2023 - 15:55:22 EST


On Sun, Feb 12, 2023 at 03:19:16PM -0500, Alan Stern wrote:
> Maybe I didn't understand your suggestion. Did you mean that all
> callers of device_initialize() (or whatever) should be able to specify
> their own lock_class_key? Or were you just trying to avoid the single
> static allocation of a lock_class_key in device_initialize() done as a
> side-effect of the mutex_init() call?
>
> > And whatever effect
> > would only be when lockdep is enabled, so not a concern.
>
> Not if a new function is created (i.e., __device_initialize()).

Follow the same pattern as mutex_init() - have a look over that code.

> > But consider that the name of a lock as registered with lockdep really
> > should correspond to a source code location - i.e. it should be
> > something you can grep for. (We should probably consider adding file and
> > line number to that string, since current names are not unambiguous).
> >
> > Whereas in your pass, you're calling lockdep_set_class(), which
> > generates a class name via stringifcation - with the same name every
> > time.
> >
> > Definitely _don't_ do that. With your patch, the generated lockdep
> > traces will be useless.
>
> I'll revise the patch to use the device's name for the class. While it
> may not be globally unique, it should be sufficiently specific.
>
> (Device names are often set after the device is initialized. Does
> lockdep mind if a lock_class_key's name is changed after it has been
> registered?)

The device name should _not_ be something dynamic, it should be
something easily tied back to a source code location - i.e. related to
the driver name, not the device name.

That's how people read and use lockdep reports!

Do it exactly the same way mutex_init() does it, just lift it up a level
to a wrapper around device_initialize() - stringify the pointer to the
mutex (embedded in struct device, embedded in what-have-you driver code)
and use that.

> You're right. There are no explicitly documented rules for device
> locking as far as I'm aware. Each subsystem handles its own locking
> independently (but self-consistently, we hope). That's one of the
> reasons that creating lockdep rules for devices is so difficult.
>
> The business about not locking a parent if you already own the child's
> lock is perhaps the only universal -- and I don't know that it's written
> down anywhere.

Yeah that's sketchy; if the rules are too complicated to be written
down, they're too complicated.

One thing that could be contemplated is adding support for user-defined
comparison functions to lockdep, to define a lock ordering within a
class when subclass isn't sufficient.

That would work for bcache - for bcache the lock ordering is parent
nodes before children, and if multiple nodes are locked at the same
level they have to be locked in natural key order.

But, this would add a lot of complexity to lockdep, and this is the sort
of situation where if you have a bug in the comparison function (i.e. it
doesn't define a total ordering) it'll break things in terribly fun
ways.

> > The patch I posted would be an improvement over the current situation,
> > because it'd get you checking w.r.t. other lock types - but with that
> > you would still have to have your own deadlock avoidance strategy, and
> > you'd have to be _really_ clear on what it is and how you know that
> > you're getting it right - you're still opting out of checking.
>
> Same with the patch I posted, except that it opts back into checking.
>
> > I think you should really be investigating wait/wound mutexes here.
>
> At this stage, converting would be most impractical. And I don't think
> it's really needed.

They do make you deal with lock restarts; unwinding typical stateful
kernel code is not necessarily super practical :)

Anyways, it sounds like the lockdep-class-per-driver approach will get
you more information, that's certainly a reasonable approach for now.

Cheers,
-Kent