Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
From: Danilo Krummrich
Date: Wed Jun 04 2025 - 05:50:10 EST
On Tue, Jun 03, 2025 at 04:26:01PM -0700, Boqun Feng wrote:
> On Tue, Jun 03, 2025 at 10:48:52PM +0200, Danilo Krummrich wrote:
> > In Devres::drop() we first remove the devres action and then drop the
> > wrapped device resource.
> >
> > The design goal is to give the owner of a Devres object control over when
> > the device resource is dropped, but limit the overall scope to the
> > corresponding device being bound to a driver.
> >
> > However, there's a race that was introduced with commit 8ff656643d30
> > ("rust: devres: remove action in `Devres::drop`"), but also has been
> > (partially) present from the initial version on.
> >
> > In Devres::drop(), the devres action is removed successfully and
> > subsequently the destructor of the wrapped device resource runs.
> > However, there is no guarantee that the destructor of the wrapped device
> > resource completes before the driver core is done unbinding the
> > corresponding device.
> >
> > If in Devres::drop(), the devres action can't be removed, it means that
> > the devres callback has been executed already, or is still running
> > concurrently. In case of the latter, either Devres::drop() wins revoking
> > the Revocable or the devres callback wins revoking the Revocable. If
> > Devres::drop() wins, we (again) have no guarantee that the destructor of
> > the wrapped device resource completes before the driver core is done
> > unbinding the corresponding device.
> >
> > Depending on the specific device resource, this can potentially lead to
> > user-after-free bugs.
> >
>
> This all sounds reasonable, one question though: it seems to me the
> problem exists only for the device resources that expect the device
> being bounded, so hypothetically if the device resources can be
> programmed against unbound devices, then the current behavior should be
> fine?
I don't think that such device resources exist from a semantical point of view.
We always have to guarantee that a driver released the device resources once the
corresponding device is unbound from the driver.
However, there certainly are differences between how fatal it is if we don't do
so.
Complementing your example below, if we for instance fail to release a memory
region in time, a subsequent driver probing the device may fail requesting the
corresponding region.
> For example, in your case, you want free_irq() to happen before
> the device becomes unbound, which is of course reasonable, but it sounds
> more like a design choice (or what device model we want to use), because
> hypothetically you can program an irq that still works even if the
> device is unbound, no?
You can, just like for every other registration (e.g. class devices, such as
misc device), but it's sub-optimal, since then we could not treat the
registering device of the registration as &Device<Bound>, which allows direct
access to device resources with Devres::access(). Please see also [1] and [2].
We have two (safe and correct) ways to access device resources, one is the
runtime checked access through Revocable::try_access() (which implies the RCU
read-side critical section and atomic check); the other one is the compile-time
checked access through providing a &Device<Bound> as cookie for directy access
without runtime overhead.
Wherever possible, we want to enable the latter, which means that registrations
need to be properly guarded.
[1] https://