Re: [PATCH] Urgent bug fix causing Apple SSDs to not work.

From: Keith Busch
Date: Sat Sep 25 2021 - 16:39:02 EST


On Sat, Sep 25, 2021 at 10:08:53PM +0200, Sven Peter wrote:
> I actually ran into a similar issue while adding support for the NVMe
> controller found on the M1 and assumed it was only present there.
>
> Some background why this happens: ANS2 is a co-processor that emulates
> an NVMe MMIO interface and uses the tag as an index to an internal data
> structure.

Thanks for confirming the behavior. The patch should restore the
command_id values to when everything was working. I'll just need to
update the quirk description to better align with the actual limitation
if the patch is successful.

> On the M1 we can directly talk to ANS2 and while we can submit
> commands with a higher index it'll just ignore the upper bits and only
> return the lowest eight IIRC in the completion queue.
> I guess whatever software is running on the T2 actually has an assert to
> ensure that the tag is within the limits before forwarding the command
> to ANS2.

Is the PCI Device ID for the M1 the same as reported for the T2? Either
0x2005 or 0x2003 should make this quirk apply.

Side note, I mistakenly added an entry for device ID 0x2006, but that
was from me misreading the report.

> Best,
>
> Sven
>
>
> On Sat, Sep 25, 2021, at 21:54, Keith Busch wrote:
> > On Sat, Sep 25, 2021 at 11:47:08AM -0700, Linus Torvalds wrote:
> >> On Fri, Sep 24, 2021 at 9:02 PM Aditya Garg <gargaditya08@xxxxxxxx> wrote:
> >> >
> >> > From: Aditya Garg <gargaditya08@xxxxxxxx>
> >> > Date: Fri, 24 Sep 2021 15:36:45 +0530
> >> > Subject: [PATCH] Revert nvme to 5.14.5 to fix incompatibility arised in Apple SSDs.
> >> > Fixes: e7006de6c238 (nvme: code command_id with a genctr for use-after-free validation)
> >>
> >> I think we need to hear more about the problem than just revert a
> >> commit like this randomly. That commit has already been picked up for
> >> -stable,
> >>
> >> What are the exact symptoms, and which Apple SSD is this?
> >>
> >> I do find this:
> >>
> >> https://lore.kernel.org/all/cjJiSFV77WM51ciS8EuBcdeBcv9T83PUB-Kw3yi8PuC_LwrrUUnQ3w5RC1PbKvSYE72KryXp3wOJhv4Ov_WWIe2gKWOOo5uwuUjbbFA8HDM=@protonmail.com/
> >>
> >> which instead of a revert has an actual patch. Can you try that one?
> >>
> >> Keith Busch replied to that one, saying that the Apple SSD might not
> >> be spec compliant, but hey, what else is new? If we start demanding
> >> that hardware comply with specs, we'd have to scrap the whole notion
> >> of working in the real world. Plus it would be very hypocritical of
> >> us, since we ignore all specs when we deem them too limiting (whether
> >> they be language specs, POSIX OS specs, or whatever).
> >
> > Right, we have a lot of quirks for the apple controllers, what's one
> > more? :)
> >
> > Could the following patch be tried? I'm basing this off the 'lspci' from
> > Orlando, but I'm assuming the previous model has the same limitation,
> > too.
> >
> > ---
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 7efb31b87f37..f0787233557f 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -979,6 +979,7 @@ EXPORT_SYMBOL_GPL(nvme_cleanup_cmd);
> > blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
> > {
> > struct nvme_command *cmd = nvme_req(req)->cmd;
> > + struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
> > blk_status_t ret = BLK_STS_OK;
> >
> > if (!(req->rq_flags & RQF_DONTPREP)) {
> > @@ -1027,7 +1028,8 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns,
> > struct request *req)
> > return BLK_STS_IOERR;
> > }
> >
> > - nvme_req(req)->genctr++;
> > + if (!(ctrl->quirks & NVME_QUIRK_SKIP_CID_GEN))
> > + nvme_req(req)->genctr++;
> > cmd->common.command_id = nvme_cid(req);
> > trace_nvme_setup_cmd(req, cmd);
> > return ret;
> > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> > index 9871c0c9374c..b49761d30df7 100644
> > --- a/drivers/nvme/host/nvme.h
> > +++ b/drivers/nvme/host/nvme.h
> > @@ -86,6 +86,12 @@ enum nvme_quirks {
> > */
> > NVME_QUIRK_NO_DEEPEST_PS = (1 << 5),
> >
> > + /*
> > + * The controller requires the command_id value be be limited to the
> > + * queue depth.
> > + */
> > + NVME_QUIRK_SKIP_CID_GEN = (1 << 6),
> > +
> > /*
> > * Set MEDIUM priority on SQ creation
> > */
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index b82492cd7503..d9f22ed68185 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -3369,7 +3369,10 @@ static const struct pci_device_id nvme_id_table[] = {
> > { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2005),
> > .driver_data = NVME_QUIRK_SINGLE_VECTOR |
> > NVME_QUIRK_128_BYTES_SQES |
> > - NVME_QUIRK_SHARED_TAGS },
> > + NVME_QUIRK_SHARED_TAGS ,
> > + NVME_QUIRK_SKIP_CID_GEN },
> > + { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2006),
> > + .driver_data = NVME_QUIRK_SKIP_CID_GEN },
> >
> > { PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
> > { 0, }
> > --
>
> --
> Sven Peter