Re: [BUG] New arm scmi check in linux-next causing rk3568 not to boot due to firmware bug

From: Cristian Marussi
Date: Thu May 05 2022 - 06:47:49 EST


On Thu, May 05, 2022 at 11:40:09AM +0200, Etienne Carriere wrote:
> Hello Nicolas, Cristian,
>
> On Thu, 5 May 2022 at 10:03, Cristian Marussi <cristian.marussi@xxxxxxx> wrote:
> >
> > On Wed, May 04, 2022 at 07:51:45PM +0200, Nicolas Frattaroli wrote:
> > > On Mittwoch, 4. Mai 2022 15:21:30 CEST Sudeep Holla wrote:
> > > > + Cristian
> >
> > +Etieenne
> >
> > Hi Nicolas,
> >

Hi Etienne,

[snip]

> > Having a quick look at TF-A SCMI code in charge of this message (at least in
> > the upstream):
> >
> > https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/drivers/scmi-msg/base.c#n136
> >
> > it seems to me that the bug lies in the fact that the BASE_DISCOVER_PROTOCOLS
> > response message built by the function above is not properly sized: the trailing
> > message payload carrying the list of protocols (after returned_protos field) returns
> > always a fixed arbitrarily sized payload, possibly zeroed if fewer protocols are
> > carried.
> >
> > IOW, even though the answer in this case carries 3 items/protocols, the payload
> > is anyway 8 bytes (with the last 5 bytes zeroed), while by the spec it should have
> > been just 4 bytes.
> >
> > (in fact testing the kernel fix on a JUNO with last SCP fw release did NOT expose
> > any issue...)
> >
> > I think a fix FW side could be something along these lines (UNTESTED NOR
> > BUILT ! ... I Cc'ed Etienne that seems the author of this bit)
> >
> > This basically mirrors the same checks introduced in kernel...if someone
> > is fancy/able to test it....
>
> Indeed the firmware implementation is wrong in TF-A.
> And also in OP-TEE by the way:
> https://github.com/OP-TEE/optee_os/blob/3.17.0/core/drivers/scmi-msg/base.c#L163-L166
>
> @Nicoals, do you want to send a patch to TF-A, or do you want me to do it?
>
> I can fix the optee_os implementation. I'll tell you when I'll have
> created a P-R.
> The fix is the same for TF-A and OP-TEE.
> Proposal from Cristian looks good to me, maybe simplified:
>
> ```patch
> memcpy(outargs, &p2a, sizeof(p2a));
> memcpy(outargs + sizeof(p2a), list + a2p->skip, count);
>
> - scmi_write_response(msg, outargs, sizeof(outargs));
> + list_sz = (1 + (count - 1) / sizeof(uint32_t)) * sizeof(uint32_t);
> + scmi_write_response(msg, outargs, sizeof(p2a) + list_sz);
> ```
>

I don't think list_sz is properly calculated if you don't rule out the
count = 0 case (did the same mistake in Kernel at first :D)...if count is
zero list_sz ends up being 4 [(1 + (0 - 1) / 4 ) * 4] while it should be
zero. (...and 'if (count)' also avoid an unneeded memcpy of zero bytes)

Moreover reviewing my own proposal code below it's probably easier to use
some sort of macro like ALIGN(count, 4) if there's one in TF-A/OP-TEE.
(...checking anyway that can handle correctly the zero case..)

Thanks,
Etienne