Re: [PATCH v5 19/19] irqdomain: Switch to per-domain locking

From: Johan Hovold
Date: Sat Feb 11 2023 - 06:35:00 EST


On Fri, Feb 10, 2023 at 03:06:37PM +0000, Marc Zyngier wrote:
> On Fri, 10 Feb 2023 12:57:40 +0000,
> Johan Hovold <johan@xxxxxxxxxx> wrote:
> >
> > On Fri, Feb 10, 2023 at 11:38:58AM +0000, Marc Zyngier wrote:
> > > On Fri, 10 Feb 2023 09:56:03 +0000,
> > > Johan Hovold <johan@xxxxxxxxxx> wrote:

> > > > > > @@ -1132,6 +1147,7 @@ struct irq_domain *irq_domain_create_hierarchy(struct irq_domain *parent,
> > > > > > else
> > > > > > domain = irq_domain_create_tree(fwnode, ops, host_data);
> > > > > > if (domain) {
> > > > > > + domain->root = parent->root;
> > > > > > domain->parent = parent;
> > > > > > domain->flags |= flags;
> > > > >
> > > > > So we still have a bug here, as we have published a domain that we
> > > > > keep updating. A parallel probing could find it in the interval and do
> > > > > something completely wrong.
> > > >
> > > > Indeed we do, even if device links should make this harder to hit these
> > > > days.
> > > >
> > > > > Splitting the work would help, as per the following patch.
> > > >
> > > > Looks good to me. Do you want to submit that as a patch that I'll rebase
> > > > on or should I submit it as part of a v6?
> > >
> > > Just take it directly.
> >
> > Ok, thanks.

I've added a commit message and turned it into a patch to include in v6
now:

commit 3af395aa894c7df94ef2337e572e5e1710b4bbda (HEAD -> work)
Author: Marc Zyngier <maz@xxxxxxxxxx>
Date: Thu Feb 9 16:00:55 2023 +0000

irqdomain: Fix domain registration race

Hierarchical domains created using irq_domain_create_hierarchy() are
currently added to the domain list before having been fully initialised.

This specifically means that a racing allocation request might fail to
allocate irq data for the inner domains of a hierarchy in case the
parent domain pointer has not yet been set up.

Note that this is not really an issue for irqchip drivers that are
registered early via IRQCHIP_DECLARE() or IRQCHIP_ACPI_DECLARE(), but
could potentially cause trouble with drivers that are registered later
(e.g. when using IRQCHIP_PLATFORM_DRIVER_BEGIN(), gpiochip drivers,
etc.).

Fixes: afb7da83b9f4 ("irqdomain: Introduce helper function irq_domain_add_hierarchy()")
Cc: stable@xxxxxxxxxxxxxxx # 3.19
...
[ johan: add a commit message ]
Signed-off-by: Johan Hovold <johan+linaro@xxxxxxxxxx>

Could you just give your SoB for the diff here so I can credit you as
author?
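For readers following the thread, here is a minimal userspace sketch of the split that the commit message above describes. The domain is allocated and fully initialised (root, parent, flags) before it is added to the global list under a lock, so a concurrent lookup can never observe a half-built domain. Everything here is hypothetical and is not the kernel's irqdomain API: struct domain, domain_create(), domain_publish(), domain_find(), and a pthread mutex standing in for the kernel's locking. The actual diff is elided above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct irq_domain. */
struct domain {
	const char *name;          /* lookup key (stands in for the fwnode) */
	struct domain *parent;
	struct domain *root;
	unsigned int flags;
	struct domain *next;       /* global list linkage */
};

static struct domain *domain_list;
static pthread_mutex_t domain_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Step 1: allocate and initialise; the domain is not yet visible. */
static struct domain *domain_create(const char *name)
{
	struct domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->name = name;
	d->root = d;               /* a new domain is its own root until linked */
	return d;
}

/* Step 2: make the fully initialised domain visible to lookups. */
static void domain_publish(struct domain *d)
{
	pthread_mutex_lock(&domain_list_lock);
	d->next = domain_list;
	domain_list = d;
	pthread_mutex_unlock(&domain_list_lock);
}

/* Lookups only ever see domains that went through domain_publish(). */
static struct domain *domain_find(const char *name)
{
	struct domain *d;

	pthread_mutex_lock(&domain_list_lock);
	for (d = domain_list; d; d = d->next)
		if (strcmp(d->name, name) == 0)
			break;
	pthread_mutex_unlock(&domain_list_lock);
	return d;
}

/* Hierarchy creation: link to the parent *before* publishing, so a
 * concurrent domain_find() cannot observe a domain without its parent. */
static struct domain *domain_create_hierarchy(struct domain *parent,
					      unsigned int flags,
					      const char *name)
{
	struct domain *d;

	assert(parent != NULL);
	d = domain_create(name);
	if (!d)
		return NULL;
	d->root = parent->root;
	d->parent = parent;
	d->flags |= flags;
	domain_publish(d);         /* last step: nothing is written after this */
	return d;
}
```

The ordering is the whole point: domain_publish() is called only after the last field has been written, which is what the old code got wrong by setting parent and flags after the domain was already on the list.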

> > I guess this turns the "Use irq_domain_create_hierarchy()" patches into
> > fixes that should be backported as well.
>
> Maybe. Backports are not my immediate concern.

Turns out all of those drivers are registered early via
IRQCHIP_DECLARE() or IRQCHIP_ACPI_DECLARE() so there shouldn't really be
any risk of hitting this race for those.

> > But note that your proposed diff may not be sufficient to prevent
> > lookups from racing with domain registration generally. Many drivers
> > still update the bus token after the domain has been added (and, as I
> > just noticed, some apparently still set flags after creating hierarchies,
> > e.g. amd_iommu_create_irq_domain()).
>
> The bus token should only rarely be a problem, as it is often set on
> an intermediate level which isn't directly looked-up by anything else.
> And if it did happen, it would probably result in the domain not
> being found.
>
> Flags, on the other hand, are more problematic. But I consider this a
> driver bug which should be fixed independently.

I agree.

> > It seems we'd need to expose a separate allocation and registration
> > interface, or at least pass in the bus token to a new combined
> > interface.
>
> Potentially, yes. But this could come later down the line. I'm more
> concerned with getting this series into -next, as the merge window is
> fast approaching.
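To illustrate the "pass in the bus token to a new combined interface" option, here is a rough sketch with the same caveats. Every field a lookup matches on (a name standing in for the fwnode, plus a bus token) and the flags are passed to the constructor, so nothing is updated after the domain becomes visible. The names domain_create_hierarchy_token(), domain_find_token() and enum bus_token are hypothetical. The enum is only loosely modelled on the kernel's enum irq_domain_bus_token, and this is not a proposed kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bus tokens; TOKEN_ANY acts as a wildcard in lookups. */
enum bus_token { TOKEN_ANY, TOKEN_WIRED, TOKEN_MSI };

struct domain {
	const char *name;          /* stands in for the fwnode */
	enum bus_token token;
	unsigned int flags;
	struct domain *parent;
	struct domain *next;
};

static struct domain *domain_list;
static pthread_mutex_t domain_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Combined interface: token and flags are set before publication, so a
 * driver never needs to update them on an already-visible domain. */
static struct domain *domain_create_hierarchy_token(struct domain *parent,
						    unsigned int flags,
						    enum bus_token token,
						    const char *name)
{
	struct domain *d;

	assert(name != NULL);
	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->name = name;
	d->parent = parent;
	d->flags = flags;
	d->token = token;

	/* Publish last, under the lock. */
	pthread_mutex_lock(&domain_list_lock);
	d->next = domain_list;
	domain_list = d;
	pthread_mutex_unlock(&domain_list_lock);
	return d;
}

/* Match on name, and on the token unless the caller passes TOKEN_ANY. */
static struct domain *domain_find_token(const char *name, enum bus_token token)
{
	struct domain *d;

	pthread_mutex_lock(&domain_list_lock);
	for (d = domain_list; d; d = d->next)
		if (strcmp(d->name, name) == 0 &&
		    (token == TOKEN_ANY || d->token == token))
			break;
	pthread_mutex_unlock(&domain_list_lock);
	return d;
}
```

With the token fixed at creation, a racing lookup either misses the domain entirely or finds it with its final token and flags, never an intermediate state.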

I'll post a v6 first thing Monday if you can give me that SoB before
then.

Johan