Re: [PATCH v2 1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

From: Alison Schofield
Date: Fri Apr 05 2024 - 13:37:54 EST


On Fri, Apr 05, 2024 at 05:40:56PM +0100, Jonathan Cameron wrote:
> On Fri, 5 Apr 2024 09:04:16 -0700
> Alison Schofield <alison.schofield@xxxxxxxxx> wrote:
>
> > On Tue, Apr 02, 2024 at 05:14:03PM +0900, Kwangjin Ko wrote:
> > > Since mbox_cmd.size_out is overwritten with the actual output size in
> > > the function below, it needs to be initialized every time.
> > >
> > > cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> > >
> > > Problem scenario:
> > >
> > > 1) The size_out variable is initially set to the size of the mailbox.
> > > 2) Read an event.
> > > - size_out is set to 160 bytes(header 32B + one event 128B).
> > > - Two event are created while reading.
> > > 3) Read the new *two* events.
> > > - size_out is still set to 160 bytes.
> > > - Although the value of out_len is 288 bytes, only 160 bytes are
> > > copied from the mailbox register to the local variable.
> > > - record_count is set to 2.
> > > - Accessing records[1] will result in reading incorrect data.
> >
> > Agree with the other comments on need to set .out_size when doing
> > cxl_internal_send_cmd() in a loop. Poison list retrieval can hit
> > this case if the MORE flag is set and a follow on read of the list
> > delivers more records than the previous read. ie. device gives one
> > record, sets the _MORE flag, then gives 5.
> >
> > 2 other things appeared to me while looking at this:
> >
> > First, it seems that there is another cleanup wrt accessing records
> > with invalid data. Still focusing on get_events and get_poison
> > since those loop through output data based on a device supplied
> > record count. The min_out check means the driver at least gets a
> > count of records to expect. That's good. The problem occurs::
> >
> > if (mbox.size_out != struct_size(payload, records, 'record_count'))
> >
> > The driver will log garbage trace events, and that could lead to
> > bad actions based on bad data. (like a needless scan of device based
> > on a false overflow flag). So, checking that size.out is the proper
> > multiple of record_count protects driver from bad device behavior.
> >
> > I think that can be combined w the patch Dan is suggesting to
> > reset mbox.size_out on each loop.
> Hi Alison,
>
> I'd split it. Dan's one is a bug fix, this is hardening against
> a device bug. Good to have but not really backport material unless
> we think there are devices like this out there.

Agree, not aware of actual device behavior.

>
> >
> > Second thing is the pci-driver quiet handling of PAYLOAD LENGTH
> > values reported by the device. It seems like at a minimum the
> > pci-driver could emit an info or debug message when the device
> > is reporting payload lengths that exceed what the driver can
> > copy in.
>
> When does this happen?
> 1. New fields on end of a fixed length message.
> Correct to silently eat it as the spec is buggy if we don't
> have backwards compatibility.
> I don't think the spec has had that particular type of bug yet,
> but maybe I'm forgetting one.
> 2. Device bug. Can't tell that is different from 1.
>

My thought was 2) device bug. Bad device is returning payload length
that exceeds what pci/cxl-driver can consume, so gets ignored. Am I
worrying about debugging the hardware needlessly?

--Alison

> So maybe dev_dbg(). I'm not sure why the cxl-driver would ever want to
> know.
>
> > I'm referring to the mbox.size_out adjustment in
> > __cxl_pci_mbox_send_cmd(). Or, if it's not the pci-drivers job
> > to judge, pass that actual payload length value back in the
> > mbox structure (new field) so that the cxl-driver can use it.
> > The pci driver would still do it's "#8 Sanitize the copy" work,
> > but it would allow the cxl-driver to clearly see why it got the
> > .size_out it got, and squawk about it if needed.
> >
> > --Alison
> >
> > >
> > > Signed-off-by: Kwangjin Ko <kwangjin.ko@xxxxxx>
> > > ---
> > > drivers/cxl/core/mbox.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > > index 9adda4795eb7..a38531a055c8 100644
> > > --- a/drivers/cxl/core/mbox.c
> > > +++ b/drivers/cxl/core/mbox.c
> > > @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > > .payload_in = &log_type,
> > > .size_in = sizeof(log_type),
> > > .payload_out = payload,
> > > - .size_out = mds->payload_size,
> > > .min_out = struct_size(payload, records, 0),
> > > };
> > >
> > > do {
> > > int rc, i;
> > >
> > > + mbox_cmd.size_out = mds->payload_size;
> > > +
> > > rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> > > if (rc) {
> > > dev_err_ratelimited(dev,
> > > --
> > > 2.34.1
> > >
>