Re: [PATCH] x86: overmapped fix when 4K pages on tail - 64bit

From: Ingo Molnar
Date: Thu Jul 10 2008 - 03:21:01 EST



* Yinghai Lu <yhlu.kernel@xxxxxxxxx> wrote:

> > that the number of mapping ranges depends on our programming, not on
> > any external factor. I.e. if anyone adds a new mapping range to the
> > kernel for any purpose, it must be extended - but otherwise it
> > cannot run out due to new hardware.
>
> 4k, 2M, 1G, 2M, 4k
>
> some day will get 512g page?

i'd not be surprised to see that in ~10 years. Then we'll have to extend
the array to 7 entries ;-)
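
A minimal sketch of how one physical range breaks up into the quoted
4k, 2M, 1G, 2M, 4k pattern. This is not the kernel's code: split_ranges()
and its helpers are hypothetical names, and the only assumption is that
each page size is used from its first aligned boundary to its last one.

```python
PAGE_4K = 1 << 12
PAGE_2M = 1 << 21
PAGE_1G = 1 << 30


def round_up(x, align):
    return (x + align - 1) & ~(align - 1)


def round_down(x, align):
    return x & ~(align - 1)


def split_ranges(start, end):
    """Split [start, end) into at most five (start, end, page_size) ranges."""
    out = []

    def emit(pos, limit, size):
        # Record the range only if it is non-empty; return the new position.
        if limit > pos:
            out.append((pos, limit, size))
            return limit
        return pos

    end_2m = round_down(end, PAGE_2M)
    pos = start
    # 4K head, up to the first 2M boundary (or the end of the range)
    pos = emit(pos, min(round_up(pos, PAGE_2M), end), PAGE_4K)
    # 2M head, up to the first 1G boundary (or the last 2M boundary)
    pos = emit(pos, min(round_up(pos, PAGE_1G), end_2m), PAGE_2M)
    # 1G body
    pos = emit(pos, round_down(end, PAGE_1G), PAGE_1G)
    # 2M tail
    pos = emit(pos, end_2m, PAGE_2M)
    # 4K tail
    emit(pos, end, PAGE_4K)
    return out
```

With three page sizes the list never grows beyond five entries, whatever
the input range is; a fourth page size (512G) would raise that bound to
seven.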

btw., i have a weird system:

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000003ed93000 (usable)
[ 0.000000] BIOS-e820: 000000003ed93000 - 000000003ee4d000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000003ee4d000 - 000000003fea2000 (usable)
[ 0.000000] BIOS-e820: 000000003fea2000 - 000000003fee9000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000003fee9000 - 000000003feed000 (usable)
[ 0.000000] BIOS-e820: 000000003feed000 - 000000003feff000 (ACPI data)
[ 0.000000] BIOS-e820: 000000003feff000 - 000000003ff00000 (usable)

look at the RAM splitup:

640K + BIOS-hole + ~1GB + acpi + 17MB + acpi + 16K + acpi + 4K

and the end of it is not 1024 MB but 1023 MB.
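
The splitup above can be re-derived from the e820 lines. A small sketch,
with the map copied from the boot log and the helper names (usable_sizes,
human) made up for illustration:

```python
# (start, end, type) entries copied from the BIOS-e820 lines above
E820 = [
    (0x0000000000000000, 0x000000000009fc00, "usable"),
    (0x000000000009fc00, 0x00000000000a0000, "reserved"),
    (0x00000000000e0000, 0x0000000000100000, "reserved"),
    (0x0000000000100000, 0x000000003ed93000, "usable"),
    (0x000000003ed93000, 0x000000003ee4d000, "ACPI NVS"),
    (0x000000003ee4d000, 0x000000003fea2000, "usable"),
    (0x000000003fea2000, 0x000000003fee9000, "ACPI NVS"),
    (0x000000003fee9000, 0x000000003feed000, "usable"),
    (0x000000003feed000, 0x000000003feff000, "ACPI data"),
    (0x000000003feff000, 0x000000003ff00000, "usable"),
]


def usable_sizes(e820):
    """Sizes in bytes of the usable ranges, in map order."""
    return [end - start for start, end, kind in e820 if kind == "usable"]


def human(n):
    """Format a byte count in MB or KB for quick reading."""
    for unit, shift in (("MB", 20), ("KB", 10)):
        if n >= 1 << shift:
            return f"{n / (1 << shift):.1f}{unit}"
    return f"{n}B"


for size in usable_sizes(E820):
    print(human(size))
print("end of RAM:", E820[-1][1] >> 20, "MB")
```

The sizes come out close to the rounded figures in the splitup, and the
last entry ends at 0x3ff00000, i.e. 1023 MB.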

so the _best_ mapping strategy would probably be to do 2MB granular
mapping up to 1GB, i.e. to 'overmap' past the end of RAM. But we also
have to make sure that we have no PCI resources or weird chipset
resources in the final 1MB that could hurt us with PAT, aliasing-wise.
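
To put a number on the overmap: a short sketch that rounds the end of
RAM up to the next 2MB boundary and reports how much non-RAM address
space would get mapped. overmap_bytes() is a hypothetical helper, not a
kernel function.

```python
PAGE_2M = 1 << 21


def overmap_bytes(ram_end, page_size=PAGE_2M):
    """Bytes past ram_end that get mapped if the tail is rounded up."""
    mapped_end = (ram_end + page_size - 1) & ~(page_size - 1)
    return mapped_end - ram_end


end_of_ram = 0x3ff00000  # end of the last usable e820 range above
print(hex(end_of_ram + overmap_bytes(end_of_ram)))  # 0x40000000
print(overmap_bytes(end_of_ram) >> 20)              # 1
```

For this box the 2MB mapping would end exactly at 1GB and cover 1MB that
is not RAM, which is the region where any PCI or chipset resource would
be aliased.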

Since i'm not sure we can really ensure sanity on that level, i guess
your solution to precisely map everything without overmapping is our
best choice. Sane hw whose RAM ends on a 2MB boundary, such as:

BIOS-e820: 0000000100000000 - 0000000120000000 (usable)

and another one with:

BIOS-e820: 0000000100000000 - 0000000830000000 (usable)

... would be slightly faster (because it would use 2MB TLB entries at
the end of kernel RAM, instead of broken-up 4K ones).

perhaps we could also have a config and boot option that would sanitize
the e820 map to just ignore all non-2MB granular RAM. Losing 1-2MB of
RAM is not an issue on a 32GB system.
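
A rough sketch of what such a sanitizing pass could look like. This is
only an illustration of the idea: sanitize_e820() is a hypothetical
name, no such config or boot option is implied to exist, and the naive
rule used here (trim every usable range inward to 2MB boundaries, drop
it if nothing is left) is an assumption.

```python
PAGE_2M = 1 << 21


def sanitize_e820(e820, align=PAGE_2M):
    """Trim usable ranges to align-sized granularity.

    e820 is a list of (start, end, type) tuples. Returns the new map
    and the number of usable bytes that were dropped.
    """
    out = []
    lost = 0
    for start, end, kind in e820:
        if kind != "usable":
            out.append((start, end, kind))  # leave other types untouched
            continue
        a_start = (start + align - 1) & ~(align - 1)  # round start up
        a_end = end & ~(align - 1)                    # round end down
        if a_start < a_end:
            out.append((a_start, a_end, kind))
            lost += (end - start) - (a_end - a_start)
        else:
            lost += end - start  # range smaller than one aligned page
    return out, lost


# Made-up map: 512MB above 4GB with a ragged tail, then a reserved page
new_map, lost = sanitize_e820([
    (0x100000000, 0x1200ff000, "usable"),
    (0x1200ff000, 0x120100000, "reserved"),
])
print(lost >> 10, "KB dropped")
```

Note that this naive rule would also throw away the whole sub-1MB usable
range, so a real option would presumably need to special-case low memory.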

Ingo