Re: [patch] voluntary-preempt-2.6.9-rc1-bk12-R6

From: William Lee Irwin III
Date: Thu Sep 09 2004 - 17:48:22 EST


Ingo Molnar <mingo@xxxxxxx> wrote:
>> yep, the get_swap_page() latency. I can easily trigger 10+ msec
>> latencies on a box with alot of swap by just letting stuff swap out. I
>> had a quick look but there was no obvious way to break the lock. Maybe
>> Andrew has better ideas? get_swap_page() is pretty stupid, it does a
>> near linear search for a free slot in the swap bitmap - this not only is
>> a latency issue but also an overhead thing as we do it for every other
>> page that touches swap.

On Thu, Sep 09, 2004 at 01:05:26PM -0700, Andrew Morton wrote:
> Someone needs to get down and redesign the swap block allocator. I bet
> latency improvements would fall out of that automatically.
> The main problem is that swap blocks are now physically clustered according
> to the page lru ordering, which doesn't have much relationship to
> process-virtual-address-ordering.
> The swap allocator made sense when we were doing a virtual scan. It
> doesn't make much sense now.

Something odd is going on, in part because I get *blistering* IO speeds
running benchmarks like dbench, tiobench, et al on tmpfs with striped
swap. In fact, IO speeds markedly faster than any other filesystem I've
ever tried, by about 30MB/s (i.e. wirespeed, where others fall about
37.5% short of it). Virtual alignment issues do hurt, but the core
allocation algorithm appears to be better than good, it's astounding.


On Thu, Sep 09, 2004 at 01:05:26PM -0700, Andrew Morton wrote:
> I did a patch a while back which switches the swapspace allocator over to
> perform program-virtual-address clustering, but it didn't help much in
> brief testing and I haven't got back onto it.
> And contrary to my above assertion, I don't think it'll help latency ;)
> A short-term bodge would be to scan the map without locks held, take the
> lock just to actually claim the block, retry if we raced. Use swapon_sem
> to avoid races. After checking that we never perform GFP_WAIT allocations
> while holding swapon_sem.
> The whole thing needs work.

Well, yes, dbench on tmpfs isn't really the load we're shooting for.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/