[RFC PATCH 2/2] block: adaptive rq_affinity

From: Dan Williams
Date: Fri Jul 22 2011 - 16:59:52 EST


For some storage configurations the coarse-grained CPU grouping (socket)
does not supply enough CPU time to keep up with the demands of high IOPS.
Bypass the grouping and complete the request on the direct requester CPU
when the local CPU is under softirq pressure (as measured by ksoftirqd
being in the TASK_RUNNING state).
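
In other words, the test added below boils down to "is this CPU's
ksoftirqd runnable?", which only becomes true once softirq work has
already spilled out of interrupt context into the ksoftirqd thread. As
a minimal sketch, the check could be factored into a standalone helper
(hypothetical name, not part of this patch; it assumes the
DECLARE_PER_CPU(ksoftirqd) declaration added below):

	/*
	 * Illustrative helper, equivalent to the open-coded test in
	 * the patch: treat a CPU as "under softirq pressure" when its
	 * ksoftirqd thread is runnable, i.e. softirq handling has
	 * already been pushed out of irq context into process context.
	 */
	static inline bool local_softirq_pressure(int cpu)
	{
		struct task_struct *tsk = per_cpu(ksoftirqd, cpu);

		return tsk && tsk->state == TASK_RUNNING;
	}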

Cc: Matthew Wilcox <matthew@xxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Roland Dreier <roland@xxxxxxxxxxxxxxx>
Tested-by: Dave Jiang <dave.jiang@xxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
---
block/blk-softirq.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 475fab8..f0cda19 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -101,16 +101,20 @@ static struct notifier_block __cpuinitdata blk_cpu_notifier = {
 	.notifier_call	= blk_cpu_notify,
 };
 
+DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+
 void __blk_complete_request(struct request *req)
 {
 	int ccpu, cpu, group_cpu = NR_CPUS;
 	struct request_queue *q = req->q;
+	struct task_struct *tsk;
 	unsigned long flags;
 
 	BUG_ON(!q->softirq_done_fn);
 
 	local_irq_save(flags);
 	cpu = smp_processor_id();
+	tsk = per_cpu(ksoftirqd, cpu);
 
 	/*
 	 * Select completion CPU
@@ -124,7 +128,13 @@ void __blk_complete_request(struct request *req)
 	} else
 		ccpu = cpu;
 
-	if (ccpu == cpu || ccpu == group_cpu) {
+	/*
+	 * try to skip a remote softirq-trigger if the completion is
+	 * within the same group, but not if local softirqs have already
+	 * spilled to ksoftirqd
+	 */
+	if (ccpu == cpu ||
+	    (ccpu == group_cpu && tsk->state != TASK_RUNNING)) {
 		struct list_head *list;
 do_local:
 		list = &__get_cpu_var(blk_cpu_done);
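
Note that, per the surrounding code, this path only matters when
completion affinity is enabled on the queue (QUEUE_FLAG_SAME_COMP,
i.e. rq_affinity set to 1 through /sys/block/<dev>/queue/rq_affinity).
With rq_affinity disabled, ccpu is always the local cpu, the first
half of the condition short-circuits, and the new ksoftirqd test is
never consulted.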
