Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()

From: Benno Lossin
Date: Thu Jun 12 2025 - 04:48:31 EST


On Thu Jun 12, 2025 at 10:15 AM CEST, Alice Ryhl wrote:
> On Thu, Jun 12, 2025 at 10:13 AM Benno Lossin <lossin@xxxxxxxxxx> wrote:
>> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
>> > In Devres::drop() we first remove the devres action and then drop the
>> > wrapped device resource.
>> >
>> > The design goal is to give the owner of a Devres object control over when
>> > the device resource is dropped, but limit the overall scope to the
>> > corresponding device being bound to a driver.
>> >
>> > However, there's a race that was introduced with commit 8ff656643d30
>> > ("rust: devres: remove action in `Devres::drop`"), but also has been
>> > (partially) present from the initial version on.
>> >
>> > In Devres::drop(), the devres action is removed successfully and
>> > subsequently the destructor of the wrapped device resource runs.
>> > However, there is no guarantee that the destructor of the wrapped device
>> > resource completes before the driver core is done unbinding the
>> > corresponding device.
>> >
>> > If in Devres::drop(), the devres action can't be removed, it means that
>> > the devres callback has been executed already, or is still running
>> > concurrently. In case of the latter, either Devres::drop() wins revoking
>> > the Revocable or the devres callback wins revoking the Revocable. If
>> > Devres::drop() wins, we (again) have no guarantee that the destructor of
>> > the wrapped device resource completes before the driver core is done
>> > unbinding the corresponding device.
>>
>> I don't understand the exact sequence of events here. Here is what I got
>> from your explanation:
>>
>> * the driver created a `Devres<T>` associated with their device.
>> * their physical device gets disconnected and thus the driver core
>>   starts unbinding the device.
>> * simultaneously, the driver drops the `Devres<T>` (e.g. because the
>>   driver initiated the physical removal).
>> * now `devres_callback` is being called from both `Devres::drop` (which
>>   calls `Devres::remove_action`) and from the driver core.
>> * they both call `inner.data.revoke()`, but only one wins, in our
>>   example `Devres::drop`.
>> * but now the driver core has finished running `devres_callback` and
>>   finalizes unbinding the device, even though the `Devres` still exists
>>   (although it is almost done being dropped).
>>
>> I don't see a race here. Also the `dev: ARef<Device>` should keep the
>> device alive until the `Devres` is dropped, no?
>
> The race is that Devres is used when the contents *must* be dropped
> before the device is unbound. This example violates that by having
> device unbind finish before the contents are dropped.

If `Devres::drop` is being run, nobody else has access to the `Devres`
any longer. Additionally, the data in the revocable has already been
dropped once `revoke()` has been run, so isn't that fine?
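
To make the "both call `revoke()`, but only one wins" step concrete, here
is a rough userspace sketch. It is not the kernel's `Revocable`; the type,
its fields and the `revoke_twice()` helper are hypothetical stand-ins, and
it assumes the losing caller returns without waiting for the winner to
finish dropping the data:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical userspace stand-in for a revocable wrapper.
/// This is NOT the kernel's `Revocable<T>`; it only models "one caller wins".
pub struct Revocable<T> {
    available: AtomicBool,
    data: Mutex<Option<T>>,
}

impl<T> Revocable<T> {
    pub fn new(data: T) -> Self {
        Self {
            available: AtomicBool::new(true),
            data: Mutex::new(Some(data)),
        }
    }

    /// Returns `true` only for the caller that actually revoked the data.
    pub fn revoke(&self) -> bool {
        if self.available.swap(false, Ordering::AcqRel) {
            // Winner: the wrapped data is dropped inside this call.
            drop(self.data.lock().unwrap().take());
            true
        } else {
            // Loser: returns right away (assumption of this sketch), so it
            // does not know whether the winner's drop has completed yet.
            false
        }
    }

    pub fn is_revoked(&self) -> bool {
        !self.available.load(Ordering::Acquire)
    }
}

/// Models two revoke attempts on the same object, e.g. one from
/// `Devres::drop` and one from the devres callback.
pub fn revoke_twice() -> (bool, bool) {
    let rev = Revocable::new(String::from("device resource"));
    let first = rev.revoke();
    let second = rev.revoke();
    (first, second)
}
```

In this model the data is dropped exactly once, by whichever caller wins;
the open question is only whether the loser may proceed before that drop
has finished.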

---
Cheers,
Benno