Re: [PATCH v5 25/32] iommu/mtk: Migrate to aggregate driver

From: Yong Wu
Date: Fri Jan 14 2022 - 04:06:44 EST


On Wed, 2022-01-12 at 20:25 -0800, Stephen Boyd wrote:
> >
> > [ 2.654526] ------------[ cut here ]------------
> > [ 2.655558] refcount_t: addition on 0; use-after-free.
> >
> > After this patch, the aggregate_driver flow looks ok. But our
> > driver
> > still aborts like this:
> >
> > [ 2.721316] Unable to handle kernel NULL pointer dereference at
> > virtual address 0000000000000000
> > ...
> > [ 2.731658] pc :
> > mtk_smi_larb_config_port_gen2_general+0xa4/0x138
> > [ 2.732434] lr : mtk_smi_larb_resume+0x54/0x98
> > ...
> > [ 2.742457] Call trace:
> > [ 2.742768] mtk_smi_larb_config_port_gen2_general+0xa4/0x138
> > [ 2.743496] pm_generic_runtime_resume+0x2c/0x48
> > [ 2.744090] __genpd_runtime_resume+0x30/0xa8
> > [ 2.744648] genpd_runtime_resume+0x94/0x2c8
> > [ 2.745191] __rpm_callback+0x44/0x150
> > [ 2.745669] rpm_callback+0x6c/0x78
> > [ 2.746114] rpm_resume+0x314/0x558
> > [ 2.746559] __pm_runtime_resume+0x3c/0x88
> > [ 2.747080] pm_runtime_get_suppliers+0x7c/0x110
> > [ 2.747668] __driver_probe_device+0x4c/0xe8
> > [ 2.748212] driver_probe_device+0x44/0x130
> > [ 2.748745] __device_attach_driver+0x98/0xd0
> > [ 2.749300] bus_for_each_drv+0x68/0xd0
> > [ 2.749787] __device_attach+0xec/0x148
> > [ 2.750277] device_attach+0x14/0x20
> > [ 2.750733] bus_rescan_devices_helper+0x50/0x90
> > [ 2.751319] bus_for_each_dev+0x7c/0xd8
> > [ 2.751806] bus_rescan_devices+0x20/0x30
> > [ 2.752315] __component_add+0x7c/0xa0
> > [ 2.752795] component_add+0x14/0x20
> > [ 2.753253] mtk_smi_larb_probe+0xe0/0x120
> >
> > This is because the device runtime_resume is called before the bind
> > operation(In our case this detailed function is mtk_smi_larb_bind).
> > The issue doesn't happen without this patchset. I'm not sure the
> > right
> > sequence. If we should fix in mediatek driver, the patch could be:
>
> Oh, the runtime PM is moved around with these patches. The aggregate
> device is runtime PM enabled before the probe is called,

In our case, the component device may probe before the aggregate
device. thus the component device runtime PM has already been enabled
when aggregate device probe.

> and there are
> supplier links made to each component, so each component is runtime
> resumed before the aggregate probe function is called.

Yes. This is the current flow.

> It means that all
> the component drivers need to have their resources ready to power on
> before their component_bind() callback is made.

Sorry, I don't understand here well. In this case, The component
drivers prepare the resource for power on in the component_bind since
the resource comes from the aggregate driver. Thus, we expect the
component_bind run before the runtime resume callback.

Another solution is moving the component's pm_runtime_enable into the
component_bind(It's mtk_smi_larb_bind here), then the runtime callback
is called after component_bind in which the resource for power on is
ready.

> Thinking more about it
> that may be wrong if something from the aggregate device is needed to
> fully power on the component. Is that what is happening here?
>
> > diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
> > index b883dcc0bbfa..288841555067 100644
> > --- a/drivers/memory/mtk-smi.c
> > +++ b/drivers/memory/mtk-smi.c
> > @@ -483,8 +483,9 @@ static int __maybe_unused
> > mtk_smi_larb_resume(struct device *dev)
> > if (ret < 0)
> > return ret;
> >
> > - /* Configure the basic setting for this larb */
> > - larb_gen->config_port(dev);
> > + /* Configure the basic setting for this larb after it binds
> > with iommu */
> > + if (larb->mmu)
> > + larb_gen->config_port(dev);
> >
> > return 0;
> > }
> >