Re: [PATCH] drm/msm: Initialize mode_config earlier

From: Johan Hovold
Date: Tue Jan 17 2023 - 03:04:29 EST


On Mon, Jan 16, 2023 at 08:51:22PM -0600, Bjorn Andersson wrote:
> On Fri, Jan 13, 2023 at 10:57:18AM +0200, Dmitry Baryshkov wrote:
> > On 13/01/2023 06:23, Dmitry Baryshkov wrote:
> > > On 13/01/2023 06:10, Bjorn Andersson wrote:
> > > > Invoking drm_bridge_hpd_notify() on a drm_bridge with a HPD-enabled
> > > > bridge_connector ends up in drm_bridge_connector_hpd_cb() calling
> > > > drm_kms_helper_hotplug_event(), which assumes that the associated
> > > > drm_device's mode_config.funcs is a valid pointer.
> > > >
> > > > But in the MSM DisplayPort driver the HPD enablement happens at bind
> > > > time and mode_config.funcs is initialized late in msm_drm_init(). This
> > > > means that there's a window for hot plug events to dereference a NULL
> > > > mode_config.funcs.
> > > >
> > > > Move the assignment of mode_config.funcs before the bind, to avoid this
> > > > scenario.
> > >
> > > Can we make the DP driver not report HPD events until enable_hpd()
> > > has been called? I think this is what was fixed by your internal_hpd
> > > patchset.
> >
> > Or to express this in other words: I thought that internal_hpd already
> > deferred enabling hpd event reporting till the time when we need it, didn't
> > it?
> >
>
> I added a WARN_ON(1) in drm_bridge_hpd_enable() to get a sense of when
> this window of "opportunity" opens up, and here's the callstack:
>
> ------------[ cut here ]------------
> WARNING: CPU: 6 PID: 99 at drivers/gpu/drm/drm_bridge.c:1260 drm_bridge_hpd_enable+0x48/0x94 [drm]
> ...
> Call trace:
> drm_bridge_hpd_enable+0x48/0x94 [drm]
> drm_bridge_connector_enable_hpd+0x30/0x3c [drm_kms_helper]
> drm_kms_helper_poll_enable+0xa4/0x114 [drm_kms_helper]
> drm_kms_helper_poll_init+0x6c/0x7c [drm_kms_helper]
> msm_drm_bind+0x370/0x628 [msm]
> try_to_bring_up_aggregate_device+0x170/0x1bc
> __component_add+0xb0/0x168
> component_add+0x20/0x2c
> dp_display_probe+0x40c/0x468 [msm]
> platform_probe+0xb4/0xdc
> really_probe+0x13c/0x300
> __driver_probe_device+0xc0/0xec
> driver_probe_device+0x48/0x204
> __device_attach_driver+0x124/0x14c
> bus_for_each_drv+0x90/0xdc
> __device_attach+0xdc/0x1a8
> device_initial_probe+0x20/0x2c
> bus_probe_device+0x40/0xa4
> deferred_probe_work_func+0x94/0xd0
> process_one_work+0x1a8/0x3c0
> worker_thread+0x254/0x47c
> kthread+0xf8/0x1b8
> ret_from_fork+0x10/0x20
> ---[ end trace 0000000000000000 ]---
>
> As drm_kms_helper_poll_init() is the last thing being called in
> msm_drm_init(), shifting around the mode_config.funcs assignment would not
> have any impact.
>
> Perhaps we have shuffled other things around to avoid this bug? Either
> way, let's put this on hold until further proof that it's still
> reproducible.

As I've mentioned off list, I haven't hit the apparent race I reported
here:

https://lore.kernel.org/all/Y1efJh11B5UQZ0Tz@xxxxxxxxxxxxxxxxxxxx/

since moving to 6.2. I did hit it with both 6.0 and 6.1-rc2, but it
could very well be that something has changed that fixes (or hides) the
issue since.

Johan