Re: [PATCH 08/10] x86, xen, mm: fix mapping_pagetable_reserve logic

From: H. Peter Anvin
Date: Tue Oct 09 2012 - 03:22:36 EST


On 10/09/2012 02:33 PM, Yinghai Lu wrote:
>>
>> make_range_readwrite is particularly toxic, though, because it makes it
>> sound like it is something along the lines of set_memory_rw(), which it
>> most distinctly is not.
>
> it just changes some page range from RO back to RW.
>
> so how about update_range_ro_to_rw?
>

You're focusing on the low-level mechanics of one particular
implementation (Xen) of the hook, and then trying to make that describe
the hook itself.

What the hook does, if I am reading it correctly, is take a range that
used to be page tables and turn it back to "ordinary memory". As such,
assuming I'm following the logic correctly, something like
pagetable_unreserve() seems like a reasonable name.
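
To illustrate the naming point, here is a minimal userspace sketch, not
code from the patchset. The struct name, both implementations, and the
counters are hypothetical, and the assumption that the native side has
nothing to do is mine. The hook is named for its contract (the range
stops being page tables), while the RO-to-RW flip stays a private detail
of the Xen implementation:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumed 4 KiB pages */

/* Hypothetical ops table; the hook is named for what it means to callers. */
struct init_mapping_ops {
	/* Return [start, end) from page-table use to ordinary memory. */
	void (*pagetable_unreserve)(uint64_t start, uint64_t end);
};

static unsigned long native_calls;   /* times the native hook ran */
static unsigned long xen_rw_pages;   /* pages the Xen stand-in "made RW" */

/* Assumption: on bare metal there is no protection change to undo. */
static void native_pagetable_unreserve(uint64_t start, uint64_t end)
{
	assert(start <= end);
	native_calls++;
}

/* Stand-in for the Xen side: flipping RO -> RW is its own mechanism. */
static void xen_pagetable_unreserve(uint64_t start, uint64_t end)
{
	assert(start <= end);
	xen_rw_pages += (end - start) / SKETCH_PAGE_SIZE;
}
```

A caller only ever sees ops.pagetable_unreserve(start, end); nothing in
the name promises set_memory_rw()-like behaviour.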

However, why, during initialization, do we have to unreserve memory
that has already been reserved for pagetables? (There may very well be
an entirely sensible answer to that question -- I just can't tell from
the patchset without a much more in-depth analysis. That in-depth
analysis sucks up time, and it doesn't scale to expect the maintainer
to have to do it.)

>>
>> Magic variables augmented with more magic variables. Why? This also
>> seems to assume that we still do all the kernel page tables in one
>> chunk, which is exactly what we don't want to do.
>
> for 64bit, page table will be three parts
> 1. initial page table from arch/x86/kernel/head_64.S
> 2. page table from BRK.
> 3. page near end of RAM.
>
> verified from /sys/kernel/debug/kernel_page_tables
>
> only range E820_RAM is mapped.
>
> all initial page table for hole between [0, 1G) get cleared too.
>

No, this is wrong, and more importantly, your choice of data structures
encodes this. There should not be any requirement for the page tables
near the end of RAM to be contiguous -- consider the case of a memory
hole near the end of RAM, or a large-memory machine where memory is
highly discontiguous and we have to use more than one chunk before we
run out. Then the questions become:

1. do we *have* to have this tracking at all? Obviously we have to know
this memory is in use, but memblock reserve should take care of that.

2. if we do, please use a data structure which can handle an arbitrary
number of ranges.
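
For point 2, a minimal userspace sketch of such a structure, under stated
assumptions: all names are hypothetical, realloc() stands in for whatever
early allocator would actually be available, and ranges are assumed to be
added in ascending order. It keeps a growable array of [start, end)
ranges and merges a new range into the previous one only when the two are
directly adjacent, so discontiguous chunks simply become extra entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One chunk of memory used for page tables: [start, end). */
struct pgt_range {
	uint64_t start;
	uint64_t end;
};

/* Growable list; zero-initialize before first use. */
struct pgt_range_list {
	struct pgt_range *r;
	size_t nr;    /* entries in use */
	size_t cap;   /* entries allocated */
};

/* Record [start, end). Returns 0 on success, -1 if allocation fails. */
static int pgt_range_add(struct pgt_range_list *l, uint64_t start, uint64_t end)
{
	assert(start < end);

	/* Extend the last range when the new one begins exactly at its end. */
	if (l->nr && l->r[l->nr - 1].end == start) {
		l->r[l->nr - 1].end = end;
		return 0;
	}

	if (l->nr == l->cap) {
		size_t cap = l->cap ? l->cap * 2 : 4;
		struct pgt_range *n = realloc(l->r, cap * sizeof(*n));

		if (!n)
			return -1;
		l->r = n;
		l->cap = cap;
	}

	l->r[l->nr].start = start;
	l->r[l->nr].end = end;
	l->nr++;
	return 0;
}

/* Return 1 if addr lies inside any recorded range, 0 otherwise. */
static int pgt_range_contains(const struct pgt_range_list *l, uint64_t addr)
{
	for (size_t i = 0; i < l->nr; i++)
		if (addr >= l->r[i].start && addr < l->r[i].end)
			return 1;
	return 0;
}
```

Whether any such list is needed at all is still question 1; if memblock
reserve already records the memory as in use, this tracking may be
redundant.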

-hpa