Re: [patch] voluntary-preempt-2.6.9-rc1-bk12-R6

From: William Lee Irwin III
Date: Tue Sep 21 2004 - 19:20:07 EST


* Lee Revell <rlrevell@xxxxxxxxxxx> wrote:
>> preemption latency trace v1.0.6 on 2.6.9-rc1-bk12-VP-R6
>> --------------------------------------------------
>> latency: 605 us, entries: 5 (5) [VP:1 KP:1 SP:1 HP:1 #CPUS:1]
>> -----------------
>> | task: kswapd0/35, uid:0 nice:0 policy:0 rt_prio:0
>> -----------------
>> => started at: get_swap_page+0x23/0x490
>> => ended at: get_swap_page+0x13f/0x490
>> =======>
>> 00000001 0.000ms (+0.606ms): get_swap_page (add_to_swap)
>> 00000001 0.606ms (+0.000ms): sub_preempt_count (get_swap_page)
>> 00000001 0.606ms (+0.000ms): update_max_trace (check_preempt_timing)
>> 00000001 0.606ms (+0.000ms): _mmx_memcpy (update_max_trace)
>> 00000001 0.607ms (+0.000ms): kernel_fpu_begin (_mmx_memcpy)

This appears to be the middle of the function, which is about right...


On Thu, Sep 09, 2004 at 09:29:24PM +0200, Ingo Molnar wrote:
> yep, the get_swap_page() latency. I can easily trigger 10+ msec
> latencies on a box with a lot of swap just by letting stuff swap out. I
> had a quick look but there was no obvious way to break the lock. Maybe
> Andrew has better ideas? get_swap_page() is pretty stupid: it does a
> near-linear search for a free slot in the swap bitmap, which is not
> only a latency issue but also an overhead thing, as we do it for every
> other page that touches swap.
> rationale: this is pretty much the only latency we still have during
> heavy VM load, and it would Just Be Cool if we fixed this final one.
> audio daemons and apps like jackd use mlockall(), so they are not
> affected by swapping.
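
For reference, the search being described boils down to something like
the following; a hedged sketch using 2.6-style field names, not the
literal scan_swap_map() source (the real function layers cluster
heuristics on top, but its worst case is this same walk):

#include <linux/swap.h>

/*
 * Minimal sketch of the linear free-slot probe (hypothetical helper;
 * the worst case is an O(si->max) walk over the whole swap map with
 * the device lock held, which is where the latency comes from).
 */
static unsigned long linear_scan_sketch(struct swap_info_struct *si)
{
	unsigned long offset;

	for (offset = si->lowest_bit; offset <= si->highest_bit; offset++) {
		if (!si->swap_map[offset]) {	/* zero count => slot is free */
			si->swap_map[offset] = 1;	/* claim the slot */
			return offset;
		}
	}
	return 0;	/* slot 0 is reserved, so 0 means "nothing free" */
}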

I presume most of the time is due to scan_swap_map() and not much of the
rest of get_swap_page(). Dangling hierarchical bitmaps off of the
swap_info structures to accelerate the search sounds plausible, though
code reuse is largely infeasible due to memory allocation concerns (the
bitmap must be fully populated at swaplist element creation). The space
overhead should be 1 bit per unsigned short at the first level and 1
bit per word for higher levels until the terms vanish. So that's
	sum(i >= 0, BITS_PER_LONG**i <= si->max) floor(si->max/BITS_PER_LONG**i)
		<= si->max/(1 - 1/BITS_PER_LONG) bits,
or si->max*sizeof(long)/(BITS_PER_LONG-1) bytes, which is
sizeof(long)/sizeof(short)/(BITS_PER_LONG-1) times the size of the
original swap map: 2/31 of it on 32-bit and 4/63 on 64-bit, both just
above 1/16 (by 1/496 on 32-bit and 1/1008 on 64-bit), so the space
overhead appears to be acceptable. A hierarchical bitmap should reduce
the time required for the search from O(si->max) per device, summed
over devices, to O(lg(si->max)/lg(BITS_PER_LONG)) per device, i.e. one
word scan per level of the hierarchy. Sound reasonable?
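
A minimal sketch of the descent, assuming hypothetical names
(swap_free_hint, find_free_slot, mark_slot_used) and the usual
asm/bitops.h primitives; level 0 holds 1 bit per swap slot, each higher
level holds 1 bit per word of the level below, and a set bit means
"free slot somewhere underneath":

#include <linux/bitops.h>

#define HINT_MAX_LEVELS	4	/* covers si->max up to BITS_PER_LONG**4 slots */

/* Hypothetical per-device structure, fully populated at swapon time. */
struct swap_free_hint {
	int nr_levels;				/* levels in use for this device */
	unsigned long *level[HINT_MAX_LEVELS];	/* level[0]: 1 bit per slot */
};

/*
 * Follow set bits from the top level down; one find-first-set per
 * level, so O(lg(si->max)/lg(BITS_PER_LONG)) word scans in total.
 */
static unsigned long find_free_slot(struct swap_free_hint *h)
{
	unsigned long idx = 0;
	int lvl;

	for (lvl = h->nr_levels - 1; lvl >= 0; lvl--) {
		unsigned long word = h->level[lvl][idx];

		if (!word)
			return ~0UL;	/* only possible at the top: device full */
		/* lowest set bit picks the child word (or slot, at level 0) */
		idx = idx * BITS_PER_LONG + __ffs(word);
	}
	return idx;	/* offset of a free slot in si->swap_map */
}

/* Clear the slot's bit and propagate "word went empty" up the levels. */
static void mark_slot_used(struct swap_free_hint *h, unsigned long off)
{
	int lvl;

	for (lvl = 0; lvl < h->nr_levels; lvl++) {
		__clear_bit(off % BITS_PER_LONG, &h->level[lvl][off / BITS_PER_LONG]);
		if (h->level[lvl][off / BITS_PER_LONG])
			break;		/* still free slots under this word */
		off /= BITS_PER_LONG;	/* word emptied: clear its summary bit */
	}
}

Freeing a slot is the mirror image: set the level-0 bit and propagate
"word went nonempty" upward, so both update paths also stay
O(nr_levels), and a device with no free slots is rejected by a single
look at its top-level word.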


-- wli