Re: [PATCH] iommu: Avoid races around device probe

From: Robin Murphy
Date: Mon Nov 07 2022 - 11:41:45 EST


On 2022-11-05 01:36, Brian Norris wrote:
> On Fri, Nov 04, 2022 at 07:51:43PM +0000, Robin Murphy wrote:
>> We currently have 3 different ways that __iommu_probe_device() may be
>> called, but no real guarantee that multiple callers can't tread on each
>> other, especially once asynchronous driver probe gets involved. It would
>> likely have taken a fair bit of luck to hit this previously, but commit
>> 57365a04c921 ("iommu: Move bus setup to IOMMU device registration") ups
>> the odds since now it's not just omap-iommu that may trigger multiple
>> bus_iommu_probe() calls in parallel if probing asynchronously.
>>
>> Add a lock to ensure we can't try to double-probe a device, and also
>> close some possible race windows to make sure we're truly robust against
>> trying to double-initialise a group via two different member devices.
>>
>> Reported-by: Brian Norris <briannorris@xxxxxxxxxxxx>
>> Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
>> ---
>> drivers/iommu/iommu.c | 28 ++++++++++++++++++++++------
>> 1 file changed, 22 insertions(+), 6 deletions(-)
>
> If I've tested appropriately (there's always room for operator error),
> this seems to resolve the problems I reported:
>
> Tested-by: Brian Norris <briannorris@xxxxxxxxxxxx>
>
> I haven't reviewed closely enough to know how precisely this is a
> regression (your description sounds like you think the bug existed some
> time before that), but based on testing, this sounds like:
>
> Fixes: 57365a04c921 ("iommu: Move bus setup to IOMMU device
> registration")

That commit did not introduce the race; it just made it more visible. The underlying condition probably goes back at least 3 years, to when we started allocating and freeing per-device data around what was then the ops->add_device() call.

In practice, you'd have to be absurdly lucky for an iommu_probe_device() call via {of,acpi}_dma_configure() to line up with bus_iommu_probe() touching the same device, but by inspection I think it's theoretically possible. Thus previously there was probably only a realistic chance of seeing it on certain OMAP systems, where the explicit bus_iommu_probe() calls could overlap if both instances probed in parallel - my commit just brings all the other drivers in line with that same behaviour via iommu_device_register(). Other systems - like Rockchip in particular - may have greater numbers of IOMMU instances and thus even more chance for parallel probes to line up just right.
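
For illustration only, the general shape of the fix can be sketched in userspace C with pthreads. This is not the actual patch to drivers/iommu/iommu.c: struct fake_device, probe_lock, probe_device() and run_demo() are all hypothetical names invented for this sketch, and a pthread mutex stands in for whatever kernel locking the real change uses. The point is only that the "already probed?" check and the per-device setup happen under one lock, so parallel callers cannot both run the setup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct fake_device {
	bool probed;     /* set once per-device setup has completed */
	int init_count;  /* how many times the setup actually ran */
};

/* A single lock serialising every probe path (assumption for this sketch). */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 if this caller performed the setup, 1 if it was already done. */
int probe_device(struct fake_device *dev)
{
	int ret = 0;

	assert(dev != NULL);

	pthread_mutex_lock(&probe_lock);
	if (dev->probed) {
		/* Another caller won the race; do not initialise twice. */
		ret = 1;
	} else {
		dev->init_count++;   /* the "allocate per-device data" step */
		dev->probed = true;
	}
	pthread_mutex_unlock(&probe_lock);

	return ret;
}

static void *probe_thread(void *arg)
{
	probe_device(arg);
	return NULL;
}

/* Race several callers on one device; return how many times setup ran. */
int run_demo(void)
{
	enum { NTHREADS = 8 };
	struct fake_device dev = { .probed = false, .init_count = 0 };
	pthread_t threads[NTHREADS];
	int started = 0;

	for (int i = 0; i < NTHREADS; i++) {
		if (pthread_create(&threads[i], NULL, probe_thread, &dev) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	return dev.init_count;
}
```

As long as at least one thread starts, run_demo() should report that the setup ran exactly once, however the threads interleave. Without the lock, two threads could both observe probed == false and both run the setup, which is the double-probe case described above.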

Since nobody's ever reported real-world issues on OMAP (although it's quite likely nobody's ever tried driver_async_probe with omap-iommu anyway) there doesn't seem to be a compelling reason for backporting, so I didn't fancy spending hours digging through subsystem-wide history trying to figure out an appropriate fixes tag; as long as this can make 6.1 that should be enough :)

Thanks,
Robin.

> But even if not, the report could probably use:
>
> Link: https://lore.kernel.org/lkml/Y1CHh2oM5wyHs06J@xxxxxxxxxx/
>
> And most of all, thanks!
>
> Brian