Re: questions about init_memory_mapping_high()

From: Yinghai Lu
Date: Thu Feb 24 2011 - 20:38:08 EST


On 02/24/2011 01:15 AM, Tejun Heo wrote:
> Hey, again.
>
> On Wed, Feb 23, 2011 at 02:17:34PM -0800, Yinghai Lu wrote:
>>> Hmmm... I'm not really following. Can you elaborate? The reason why
>>> smaller mapping is bad is because of increased TLB pressure. What
>>> does using the existing entries have to do with it?
>>
>> assume 1g page is used. first node will actually mapped 512G already.
>> so if the system only have 1024g. then first 512g page table will on node0 ram.
>> second 512g page table will be on node4.
>>
>> when only 2M are used, it is 1G boundary. for 1024g system.
>> page table (about 512k) for mem 0-128g is on node0.
>> page table (about 512k) for mem 128g-256g is on node1.
>> ...
>> Do you mean we need to put those all 512k together to reduce TLB presure?
>
> Nope, let's say the machine supports 1GiB mapping, has 8GiB of memory
> where [0,4)GiB is node 0 and [4,8)GiB node1, and there's a hole of
> 128MiB right on top of 4GiB. Before the change, the page mapping code
> wouldn't care about the whole and just map the whole [0,8)GiB area
> with eight 1GiB mapping. Now with your change, [4, 5)GiB will be
> mapped using 2MiB mappings to avoid mapping the 128MiB hole.
>
> We end up unnecessarily using smaller size mappings (512 2MiB mappings
> instead of 1 1GiB mapping) thus increasing TLB pressure. There is no
> reason to match the linear address mapping exactly to the physical
> memory map. It is no accident that the original code didn't consider
> memory holes. Using larger mappings over them is more beneficial to
> trying to punch holes with smaller mappings.
>
> This rather important change was made without any description or
> explanation, which I find somewhat disturbing. Anyways, what we can
> do is just taking bottom and top addresses of occupied NUMA regions
> and round them down and up, respectively, to the largest page mapping
> size supported as long as the top address doesn't go over max_pfn
> instead of mapping exactly according to the memblocks.
>

ok, please check two patches that fix the problem.

thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/