Re: [PATCH v2 3/3] scsi: ufs: ufs-exynos: implement exynos isr

From: Bart Van Assche
Date: Mon Sep 13 2021 - 12:23:23 EST


On 9/13/21 12:55 AM, Kiwoong Kim wrote:
This patch is to raise recovery in some abnormal
conditions using an vendor specific interrupt
for some cases, such as a situation that some
contexts of a pending request in the host isn't
the same with those of its corresponding UPIUs
if they should have been the same exactly.

The representative case is shown like below.
In the case, a broken UTRD entry, for internal
coherent problem or whatever, that had smaller value
of PRDT length than expected was transferred to the host.
So, the host raised an interrupt of transfer complete
even if device didn't finish its data transfer because
the host sees a fetched version of UTRD to determine
if data tranfer is over or not. Then the application level
seemed to recogize this as a sort of corruption and this
symptom led to boot failure.

How can a UTRD entry be broken? Does that perhaps indicate memory
corruption at the host side? Working around host-side memory
corruption in a driver seems wrong to me. I think the root cause
of the memory corruption should be fixed.

+static irqreturn_t exynos_ufs_isr(struct ufs_hba *hba)
+{
+ struct exynos_ufs *ufs = ufshcd_get_variant(hba);
+ u32 status;
+ unsigned long flags;
+
+ if (!hba->priv) return IRQ_HANDLED;

Please verify patches with checkpatch before posting these on the
linux-scsi mailing list. The above if-statement does not follow the
Linux kernel coding style.

+ if (status & RX_UPIU_HIT_ERROR) {
+ pr_err("%s: status: 0x%08x\n", __func__, status);
+ hba->force_reset = true;
+ hba->force_requeue = true;
+ scsi_schedule_eh(hba->host);
+ spin_unlock_irqrestore(hba->host->host_lock, flags);
+ return IRQ_HANDLED;
+ }
+ return IRQ_NONE;
+}

So the above code unlocks the host_lock depending on whether or not
status & RX_UPIU_HIT_ERROR is true? Yikes ...

Additionally, in the above code I found the following pattern:

unsigned long flags;
[ ... ]
spin_unlock_irqrestore(hba->host->host_lock, flags);

Such code is ALWAYS wrong. The value of the 'flags' argument passed to
spin_unlock_irqrestore() must come from spin_lock_irqsave().

Bart.