Re: [PATCH -mm] make swapin readahead skip over holes

From: KOSAKI Motohiro
Date: Wed Jan 11 2012 - 02:15:00 EST


2012/1/9 Rik van Riel <riel@xxxxxxxxxx>:
> On 01/09/2012 06:49 PM, KOSAKI Motohiro wrote:
>>
>> (1/9/12 6:10 PM), Rik van Riel wrote:
>>>
>>> Ever since abandoning the virtual scan of processes, for scalability
>>> reasons, swap space has been a little more fragmented than before.
>>> This can lead to the situation where a large memory user is killed,
>>> swap space ends up full of "holes" and swapin readahead is totally
>>> ineffective.
>>>
>>> On my home system, after killing a leaky firefox it took over an
>>> hour to page just under 2GB of memory back in, slowing the virtual
>>> machines down to a crawl.
>>>
>>> This patch makes swapin readahead simply skip over holes, instead
>>> of stopping at them. This allows the system to swap things back in
>>> at rates of several MB/second, instead of a few hundred kB/second.
>>
>>
>> If I understand correctly, this patch has
>>
>> Pros
>> - increases IO throughput
>
>
> By about a factor of 3-10 in my tests here.
>
>
>> Cons
>> - increases the risk of swap readahead picking up unrelated swap entries
>
>
> I do not believe there is a very large risk of this, because
> since we introduced rmap, we have been placing unrelated
> pages right next to each other in swap.
>
> This is also why, since 2.6.28, the kernel places newly swapped
> in pages on the INACTIVE_ANON list, where they should not
> displace the working set.
>
> Another factor is that swapping on modern systems is often a
> temporary thing. During a load spike, things get swapped out
> and run slowly. After the load spike is over, or some memory
> hog process got killed, we want the system to recover to normal
> performance as soon as possible.  This often involves swapping
> everything back into memory.

Hmmm... OK, I have to agree with this.
But if so, skipping holes is not the best way. I think we should always
issue one big IO, even if the swap cluster has some holes. One big IO is
usually faster than multiple small IOs, isn't it?
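
To make the trade-off concrete, here is a rough user-space sketch (not
kernel code) that counts pages read and IO requests for the three
approaches on one swap cluster. Everything in it is an assumption for
illustration: the struct and function names are hypothetical, a cluster
is modelled as an array of "slot in use" flags, the old behaviour is
simplified to "stop at the nearest hole on either side of the faulting
slot", and adjacent in-use slots are assumed to merge into a single
request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: what one readahead pass would submit. */
struct ra_result {
	size_t pages_read;   /* slots transferred from swap */
	size_t io_requests;  /* contiguous runs submitted to the device */
};

/*
 * Stop at holes (simplified model of the old behaviour): read only the
 * contiguous run of in-use slots that contains the faulting slot.
 */
static struct ra_result ra_stop_at_hole(const bool *used, size_t n,
					size_t target)
{
	struct ra_result r = { 0, 0 };
	size_t lo = target, hi = target;

	assert(target < n);
	if (!used[target])
		return r;
	while (lo > 0 && used[lo - 1])
		lo--;
	while (hi + 1 < n && used[hi + 1])
		hi++;
	r.pages_read = hi - lo + 1;
	r.io_requests = 1;
	return r;
}

/*
 * Skip holes: walk the whole cluster and read every in-use slot.
 * Each run of adjacent in-use slots is counted as one request
 * (assumption: adjacent reads get merged).
 */
static struct ra_result ra_skip_holes(const bool *used, size_t n)
{
	struct ra_result r = { 0, 0 };
	bool in_run = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!used[i]) {
			in_run = false;
			continue;
		}
		r.pages_read++;
		if (!in_run) {
			r.io_requests++;
			in_run = true;
		}
	}
	return r;
}

/* One big IO: read the whole cluster in a single request, holes included. */
static struct ra_result ra_one_big_io(const bool *used, size_t n)
{
	struct ra_result r = { n, n ? 1 : 0 };

	(void)used;  /* holes are transferred too, so the flags are not consulted */
	return r;
}
```

For an 8-slot cluster with slots 2, 4 and 5 free and a fault on slot 3,
this model gives 1 page in 1 request when stopping at holes, 5 pages in
3 requests when skipping holes, and 8 pages in 1 request for one big IO.
Whether the 3 wasted pages cost less than the 2 extra requests depends on
the device, which is exactly the question above.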

Also, I doubt the current swap_cluster default is the best value nowadays.