Re: [PATCH] platform/chrome: cros_ec_rpmsg: Fix race with host event.

From: Pi-Hsun Shih
Date: Mon Mar 02 2020 - 01:00:46 EST


(Resending since I forgot to use plain text mode in the previous mail,
and got blocked by mailing lists. Sorry for the duplicate email.)

Hi Enric,

On Fri, Feb 28, 2020 at 4:52 PM Enric Balletbo i Serra
<enric.balletbo@xxxxxxxxxxxxx> wrote:
>
> Hi Pi-Hsun,
>
> On 17/2/20 16:55, Enric Balletbo i Serra wrote:
> > Dear remoteproc experts,
> >
> > cc'ing you for if we can have your feedback on this change.
> >
> > Thanks Pi-Hsun, for your quick answer, makes sense but I'm still feeling that I
> > miss something (probably because I'm not a remoteproc expert), so I added the
> > Remoteproc people for if they can comment this patch. We have time as we're in
> > rc2 only, so I'd like to wait a bit in case they can take a look.
> >
> > If no answer is received I'll take a second look and apply the patch.
> >
>
> I'll pick this patch, just I want to request a minor change.
>
> > Thanks,
> > Enric
> >
> > On 15/2/20 4:56, Pi-Hsun Shih wrote:
> >> Hi Enric,
> >>
> >> On Fri, Feb 14, 2020 at 11:10 PM Enric Balletbo i Serra
> >> <enric.balletbo@xxxxxxxxxxxxx> wrote:
> >>>
> >>> Hi Pi-Hsun,
> >>>
> >>> On 14/2/20 9:26, Pi-Hsun Shih wrote:
> >>>> Host event can be sent by remoteproc by any time, and
> >>>> cros_ec_rpmsg_callback would be called after cros_ec_rpmsg_create_ept.
> >>>> But the cros_ec_device is initialized after that, which cause host event
> >>>> handler to use cros_ec_device that are not initialized properly yet.
> >>>>
> >>>
> >>> I don't have the hardware to test but, can't we call first cros_ec_register and
> >>> then cros_ec_rpmsg_create_ept?
> >>>
> >>> Start receiving driver callbacks before finishing to probe the drivers itself
> >>> sounds weird to me.
> >>>
> >>> Thanks,
> >>> Enric
> >>
> >> Since cros_ec_register calls cros_ec_query_all, which sends message to
> >> remoteproc using cros_ec_pkt_xfer_rpmsg (to query protocol version),
> >> the ec_rpmsg->ept need to be ready before calling cros_ec_register.
> >>
> >>>
> >>>> Fix this by don't schedule host event handler before cros_ec_register
> >>>> returns. Instead, remember that we have a pending host event, and
> >>>> schedule host event handler after cros_ec_register.
> >>>>
> >>>> Fixes: 71cddb7097e2 ("platform/chrome: cros_ec_rpmsg: Fix race with host command when probe failed.")
> >>>> Signed-off-by: Pi-Hsun Shih <pihsun@xxxxxxxxxxxx>
> >>>> ---
> >>>> drivers/platform/chrome/cros_ec_rpmsg.c | 16 +++++++++++++++-
> >>>> 1 file changed, 15 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/platform/chrome/cros_ec_rpmsg.c b/drivers/platform/chrome/cros_ec_rpmsg.c
> >>>> index dbc3f5523b83..7e8629e3db74 100644
> >>>> --- a/drivers/platform/chrome/cros_ec_rpmsg.c
> >>>> +++ b/drivers/platform/chrome/cros_ec_rpmsg.c
> >>>> @@ -44,6 +44,8 @@ struct cros_ec_rpmsg {
> >>>> struct completion xfer_ack;
> >>>> struct work_struct host_event_work;
> >>>> struct rpmsg_endpoint *ept;
> >>>> + bool has_pending_host_event;
> >>>> + bool probe_done;
>
>
> Could you try if just calling driver_probe_done() when needed works, so we don't
> need to add a new boolean flag for this?

Changing from "if (ec_rpmsg->probe_done)" to "if (driver_probe_done()
== 0)" works in my testing.

But since driver_probe_done() returns 0 after all driver probes are
done, not after the probe of cros_ec_rpmsg driver, I think it's
possible that we got a host event after the cros_ec_rpmsg driver probe
is done (ec_rpmsg->probe_done is true), but before all driver probe
done (driver_probe_done() is still -EBUSY). In this case the host
event would be lost since we would set the has_pending_host_event flag
but no one would be processing it.

>
> Thanks,
> Enric
>
> >>>> };
> >>>>
> >>>> /**
> >>>> @@ -177,7 +179,14 @@ static int cros_ec_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
> >>>> memcpy(ec_dev->din, resp->data, len);
> >>>> complete(&ec_rpmsg->xfer_ack);
> >>>> } else if (resp->type == HOST_EVENT_MARK) {
> >>>> - schedule_work(&ec_rpmsg->host_event_work);
> >>>> + /*
> >>>> + * If the host event is sent before cros_ec_register is
> >>>> + * finished, queue the host event.
> >>>> + */
> >>>> + if (ec_rpmsg->probe_done)
> >>>> + schedule_work(&ec_rpmsg->host_event_work);
> >>>> + else
> >>>> + ec_rpmsg->has_pending_host_event = true;
> >>>> } else {
> >>>> dev_warn(ec_dev->dev, "rpmsg received invalid type = %d",
> >>>> resp->type);
> >>>> @@ -240,6 +249,11 @@ static int cros_ec_rpmsg_probe(struct rpmsg_device *rpdev)
> >>>> return ret;
> >>>> }
> >>>>
> >>>> + ec_rpmsg->probe_done = true;
> >>>> +
> >>>> + if (ec_rpmsg->has_pending_host_event)
> >>>> + schedule_work(&ec_rpmsg->host_event_work);
> >>>> +
> >>>> return 0;
> >>>> }
> >>>>
> >>>>
> >>>> base-commit: b19e8c68470385dd2c5440876591fddb02c8c402
> >>>>