Re: [PATCH] coresight: Defer probe when the child dev is not probed

From: Jinlong Mao
Date: Tue Mar 01 2022 - 21:29:22 EST



On 3/1/2022 11:03 PM, Suzuki K Poulose wrote:
Hi

On 01/03/2022 13:30, Jinlong Mao wrote:
Hi Mike,

On 3/1/2022 9:15 PM, Mike Leach wrote:
Hi,

On Tue, 1 Mar 2022 at 11:42, Jinlong Mao <quic_jinlmao@xxxxxxxxxxx> wrote:
On 2/28/2022 10:51 PM, Suzuki K Poulose wrote:


...


Hi Suzuki,

This issue happens when race condition happens.
The condition is that the device and its child_device's probe happens at the same time.

For example: device0 and its child device device1.
Both of them are calling coresight_register function. device0 is calling coresight_fixup_device_conns.
device1 is waiting for device0 to release the coresight_mutex. Because device1's csdev node is allocated,
coresight_make_links will be called for device0. Then in coresight_add_sysfs_link, has_conns_grp is true
for device0, but has_conns_grp is false for device1 as has_conns_grp is set to true in coresight_create_conns_sysfs_group .
The probe of device0 will fail for at this condition.


struct coresight_device *coresight_register(struct coresight_desc *desc)
{
    .........
     mutex_lock(&coresight_mutex);

     ret = coresight_create_conns_sysfs_group(csdev);
     if (!ret)
         ret = coresight_fixup_device_conns(csdev);
     if (!ret)
         ret = coresight_fixup_orphan_conns(csdev);
     if (!ret && cti_assoc_ops && cti_assoc_ops->add)
         cti_assoc_ops->add(csdev);

     mutex_unlock(&coresight_mutex);

.........

}

static int coresight_fixup_device_conns(struct coresight_device *csdev)
{
    ..........
         conn->child_dev =
coresight_find_csdev_by_fwnode(conn->child_fwnode);
The issue appears to be a constraint hidden in the lower layers of the code.
Would a better solution not be to alter the code here:

if (conn->child_dev && conn->child_dev->has_conns_grp) {
    ...
} else {
       csdev->orphan = true;
}

which would mean that the connection attempt would drop through to
label the connection as an orphan, to be cleaned up by the child
itself when it runs coresight_fixup_orphan_conns()


Tnanks Mike, I think that is a good solution. Alternatively, we
could make sure that device_register() and the fixup following
that are atomic.

i.e.

    mutex_lock()

    device_register()
    fixup_connections()
    create_sysfs()

    mutex_unlock();

The fix may be a bit invasive than Mike's proposal, but it makes
sure we don't end up with half baked device on the coresight-bus.

Suzuki

Thanks Mike & Suzuki.

I will combine your proposals and make the changes.

I will get back to you after the test.


Thanks
Jinlong Mao