Re: [PATCH -v3] use per cpu data for single cpu ipi calls

From: Jens Axboe
Date: Sat Jan 31 2009 - 03:46:29 EST


On Fri, Jan 30 2009, Peter Zijlstra wrote:
> > If another CPU hasn't even received its IPI before the same CPU sends the
> > next one, I'm not sure we _want_ to send one, in fact.
>
> I think the intent was to re-route IO-completion interrupts to whatever
> cpu/node issued the IO with the idea that that cpu/node has the page
> hottest etc. and transferring the completion is cheaper than bouncing
> the page.

Correct

> Since that would be relaying hardware interrupts, there's nothing much
> you can do about the rate, or something, that's up to the firmware on
> $$$ scsi thing.
>
> But Jens already said that that path was using the __ variant and
> providing its own csds, the kmalloc isn't needed there, so it might all
> be moot.

In fact the block layer already does attempt to do what Linus describes.
We queue the events for the target cpu, and then do:

local_irq_save(flags);
list = &__get_cpu_var(blk_cpu_done);
list_add_tail(&rq->csd.list, list);

if (list->next == &rq->csd.list)
raise_softirq_irqoff(BLOCK_SOFTIRQ);

thus only triggering a new softirq interrupt, if the preceeding one
hasn't run already. So this is done for the block layer
trigger_softirq() part, but could be provided by the lower layer as well
instead.

> > But that's a secondary issue, and isn't a correctness thing, just a "do we
> > really need three different allocations?" musing..
>
> Nick, Jens, I was under the presumption that the kmalloc was needed for
> something other than failing to deadlock, happen to remember what?

As far as I remember, it was just the way to allocate memory for the
non-wait case. The per-cpu single csd will limit you to a single pending
entry on the cpu queue, you could have more (like the block layer will
do) and get a nice batching effect for ipi busy workloads instead of a
1:1 mapping between work and ipi's fired.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/