Re: [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity

From: John Meneghini
Date: Wed Jun 18 2025 - 15:37:54 EST


Martin,

This is the AACRAID patch we talked about at LSFMM this spring. This change has undergone extensive internal review and testing at Red Hat while working closely with Microchip. We'd really like to get this merged into v6.16 as a fix because the current upstream aacraid driver is badly broken when CPU offline is used.

I've based this patch on scsi/6.16/scsi-fixes.

Thanks,

/John

On 6/18/25 3:24 PM, John Meneghini wrote:
From: Sagar Biradar <sagar.biradar@xxxxxxxxxxxxx>

From: Sagar Biradar <sagar.biradar@xxxxxxxxxxxxx>

This patch fixes a bug in the original path that caused I/O hangs. The
I/O hangs were because of an MSIx vector not having a mapped online CPU
upon receiving completion.

This patch enables Multi-Q support in the aacriad driver. Multi-Q support
in the driver is needed to support CPU offlining.

SCSI cmds use the mq_map to get the vector_no via blk_mq_unique_tag()
and blk_mq_unique_tag_to_hwq() - which are setup during the blk_mq init.
For reserved cmds, or the ones before the blk_mq init, use the vector_no
0, which is the norm since don't yet have a proper mapping to the queues.

Note that this change can cause a drop in performance in some
configurations. To address any concerns about performance the
CONFIG_SCSI_AACRAID_MULTIQ option has been added.

The CONFIG_SCSI_AACRAID_MULTIQ option is on by default to ensure that
CPU offlining with MultiQ support is enabled. To disable MultiQ support
compile the kernel with CONFIG_SCSI_AACRAID_MULTIQ=N. Disabling MultiQ
support should not be done if your application uses CPU offling.

Closes: https://lore.kernel.org/linux-scsi/20250130173314.608836-1-sagar.biradar@xxxxxxxxxxxxx/

Fixes: c5becf57dd56 ("Revert "scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity"")
Signed-off-by: Sagar Biradar <sagar.biradar@xxxxxxxxxxxxx>
[jmeneghi: replace aac_cpu_offline_feature with Kconfig option]
Co-developed-by: John Meneghini <jmeneghi@xxxxxxxxxx>
Signed-off-by: John Meneghini <jmeneghi@xxxxxxxxxx>
Reviewed-by: Gilbert Wu <gilbert.wu@xxxxxxxxxxxxx>
Reviewed-by: Tomas Henzl <thenzl@xxxxxxxxxx>
Tested-by: Marco Patalano <mpatalan@xxxxxxxxxx>
---
Documentation/scsi/aacraid.rst | 11 +++++++++++
MAINTAINERS | 1 +
drivers/scsi/Kconfig | 2 +-
drivers/scsi/aacraid/Kconfig | 20 ++++++++++++++++++++
drivers/scsi/aacraid/aachba.c | 1 -
drivers/scsi/aacraid/aacraid.h | 3 +++
drivers/scsi/aacraid/commsup.c | 16 ++++++++++++++--
drivers/scsi/aacraid/linit.c | 23 ++++++++++++++++++++++-
drivers/scsi/aacraid/src.c | 28 +++++++++++++++++++++++++++-
9 files changed, 99 insertions(+), 6 deletions(-)
create mode 100644 drivers/scsi/aacraid/Kconfig

diff --git a/Documentation/scsi/aacraid.rst b/Documentation/scsi/aacraid.rst
index 1904674b94f3..0fc35edd0ac9 100644
--- a/Documentation/scsi/aacraid.rst
+++ b/Documentation/scsi/aacraid.rst
@@ -129,6 +129,17 @@ Supported Cards/Chipsets
People
======
+Sagar Biradar <Sagar.Biradar@xxxxxxxxxxxxx>
+
+ - Added support for CPU offlining and updated the driver to support MultiQ.
+ - Introduced the option CONFIG_SCSI_AACRAID_MULTIQ to control this feature.
+
+By default, MultiQ support is disabled to provide optimal I/O performance.
+
+Enable CONFIG_SCSI_AACRAID_MULTIQ only if CPU offlining support is required,
+as enabling it may result in reduced performance in some configurations.
+Note : Disabling MultiQ while still offlining CPUs may lead to I/O hangs.
+
Alan Cox <alan@xxxxxxxxxxxxxxxxxxx>
Christoph Hellwig <hch@xxxxxxxxxxxxx>
diff --git a/MAINTAINERS b/MAINTAINERS
index a92290fffa16..996c90959caa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -184,6 +184,7 @@ L: linux-scsi@xxxxxxxxxxxxxxx
S: Supported
W: http://www.adaptec.com/
F: Documentation/scsi/aacraid.rst
+F: drivers/scsi/aacraid/Kconfig
F: drivers/scsi/aacraid/
AAEON UPBOARD FPGA MFD DRIVER
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 5522310bab8d..a6739cd94db7 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -456,7 +456,7 @@ config SCSI_AACRAID
To compile this driver as a module, choose M here: the module
will be called aacraid.
-
+source "drivers/scsi/aacraid/Kconfig"
source "drivers/scsi/aic7xxx/Kconfig.aic7xxx"
source "drivers/scsi/aic7xxx/Kconfig.aic79xx"
source "drivers/scsi/aic94xx/Kconfig"
diff --git a/drivers/scsi/aacraid/Kconfig b/drivers/scsi/aacraid/Kconfig
new file mode 100644
index 000000000000..c69e1a7a77d2
--- /dev/null
+++ b/drivers/scsi/aacraid/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Kernel configuration file for the aacraid driver.
+#
+# Copyright (c) 2025 Microchip Technology Inc. and its subsidiaries
+# (mailto:storagedev@xxxxxxxxxxxxx)
+#
+config SCSI_AACRAID_MULTIQ
+ bool "AACRAID Multiq support"
+ depends on SCSI_AACRAID
+ default n
+ help
+ This option enables MultiQ support in the aacraid driver.
+
+ Enabling MultiQ allows the driver to safely handle CPU offlining.
+ However, it may cause a performance drop in certain configurations.
+
+ Enable this option only if your system requires CPU offlining support.
+
+ If unsure, say N.
diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 0be719f38377..db9bef348834 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -328,7 +328,6 @@ MODULE_PARM_DESC(wwn, "Select a WWN type for the arrays:\n"
"\t1 - Array Meta Data Signature (default)\n"
"\t2 - Adapter Serial Number");
-
static inline int aac_valid_context(struct scsi_cmnd *scsicmd,
struct fib *fibptr) {
struct scsi_device *device;
diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 0a5888b53d6d..557f9fc50012 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1672,6 +1672,9 @@ struct aac_dev
u32 handle_pci_error;
bool init_reset;
u8 soft_reset_support;
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+ u8 use_map_queue;
+#endif
};
#define aac_adapter_interrupt(dev) \
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 7d9a4dce236b..309a166ce6e4 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -215,8 +215,17 @@ int aac_fib_setup(struct aac_dev * dev)
struct fib *aac_fib_alloc_tag(struct aac_dev *dev, struct scsi_cmnd *scmd)
{
struct fib *fibptr;
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+ u32 blk_tag;
+ int i;
+ blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
+ i = blk_mq_unique_tag_to_tag(blk_tag);
+ fibptr = &dev->fibs[i];
+#else
fibptr = &dev->fibs[scsi_cmd_to_rq(scmd)->tag];
+#endif
+
/*
* Null out fields that depend on being zero at the start of
* each I/O
@@ -242,14 +251,17 @@ struct fib *aac_fib_alloc(struct aac_dev *dev)
{
struct fib * fibptr;
unsigned long flags;
+
spin_lock_irqsave(&dev->fib_lock, flags);
+ /* Management FIB allocation: use free list within reserved range */
fibptr = dev->free_fib;
- if(!fibptr){
+ if (!fibptr) {
spin_unlock_irqrestore(&dev->fib_lock, flags);
- return fibptr;
+ return NULL;
}
dev->free_fib = fibptr->next;
spin_unlock_irqrestore(&dev->fib_lock, flags);
+
/*
* Set the proper node type code and node byte size
*/
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 4b12e6dd8f07..8c56579a8efc 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -506,6 +506,17 @@ static int aac_sdev_configure(struct scsi_device *sdev,
return 0;
}
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+static void aac_map_queues(struct Scsi_Host *shost)
+{
+ struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
+ struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+ blk_mq_map_hw_queues(qmap, &aac->pdev->dev, 0);
+ aac->use_map_queue = true;
+}
+#endif
+
/**
* aac_change_queue_depth - alter queue depths
* @sdev: SCSI device we are considering
@@ -1490,6 +1501,9 @@ static const struct scsi_host_template aac_driver_template = {
.bios_param = aac_biosparm,
.shost_groups = aac_host_groups,
.sdev_configure = aac_sdev_configure,
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+ .map_queues = aac_map_queues,
+#endif
.change_queue_depth = aac_change_queue_depth,
.sdev_groups = aac_dev_groups,
.eh_abort_handler = aac_eh_abort,
@@ -1777,7 +1791,11 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
shost->max_lun = AAC_MAX_LUN;
pci_set_drvdata(pdev, shost);
-
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+ shost->nr_hw_queues = aac->max_msix;
+ shost->can_queue = min_t(int, aac->vector_cap, shost->can_queue);
+ shost->host_tagset = 1;
+#endif
error = scsi_add_host(shost, &pdev->dev);
if (error)
goto out_deinit;
@@ -1908,6 +1926,9 @@ static void aac_remove_one(struct pci_dev *pdev)
struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
aac_cancel_rescan_worker(aac);
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+ aac->use_map_queue = false;
+#endif
scsi_remove_host(shost);
__aac_shutdown(aac);
diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 28115ed637e8..5881bce3ccfc 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -493,6 +493,12 @@ static int aac_src_deliver_message(struct fib *fib)
#endif
u16 vector_no;
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+ struct scsi_cmnd *scmd;
+ u32 blk_tag;
+ struct Scsi_Host *shost = dev->scsi_host_ptr;
+ struct blk_mq_queue_map *qmap;
+#endif
atomic_inc(&q->numpending);
@@ -505,8 +511,28 @@ static int aac_src_deliver_message(struct fib *fib)
if ((dev->comm_interface == AAC_COMM_MESSAGE_TYPE3)
&& dev->sa_firmware)
vector_no = aac_get_vector(dev);
- else
+ else {
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+ if (!fib->vector_no || !fib->callback_data) {
+ if (shost && dev->use_map_queue) {
+ qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+ vector_no = qmap->mq_map[raw_smp_processor_id()];
+ }
+ /*
+ * We hardcode the vector_no for reserved commands
+ * as a valid shost is absent during the init.
+ */
+ else
+ vector_no = 0;
+ } else {
+ scmd = (struct scsi_cmnd *)fib->callback_data;
+ blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
+ vector_no = blk_mq_unique_tag_to_hwq(blk_tag);
+ }
+#else
vector_no = fib->vector_no;
+#endif
+ }
if (native_hba) {
if (fib->flags & FIB_CONTEXT_FLAG_NATIVE_HBA_TMF) {