Re: [Linux-nvdimm] [PATCH 1/2] x86: add support for the non-standard protected e820 type

From: Boaz Harrosh
Date: Sun Apr 05 2015 - 05:18:39 EST


On 04/03/2015 08:12 PM, Yinghai Lu wrote:
> On Fri, Apr 3, 2015 at 9:14 AM, Toshi Kani <toshi.kani@xxxxxx> wrote:
>> On Wed, 2015-04-01 at 09:12 +0200, Christoph Hellwig wrote:
>> :
>>> @@ -748,7 +758,7 @@ u64 __init early_reserve_e820(u64 size, u64 align)
>>> /*
>>> * Find the highest page frame number we have available
>>> */
>>> -static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>>> +static unsigned long __init e820_end_pfn(unsigned long limit_pfn)
>>> {
>>> int i;
>>> unsigned long last_pfn = 0;
>>> @@ -759,7 +769,11 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, unsigned type)
>>> unsigned long start_pfn;
>>> unsigned long end_pfn;
>>>
>>> - if (ei->type != type)
>>> + /*
>>> + * Persistent memory is accounted as ram for purposes of
>>> + * establishing max_pfn and mem_map.
>>> + */
>>> + if (ei->type != E820_RAM && ei->type != E820_PRAM)
>>> continue;
>>
>> Should we also delete this code, accounting E820_PRAM as ram, along with
>> the deletion of reserve_pmem() in this version?
>

Hi Yinghai, Toshi

My old patches did not have these updates either, and everything
was perfectly usable for a long time.

However, I actually liked these changes in Christoph's patches and
think they should stay. Here is why.

Today I will be sending patches that let pmem optionally be backed by
struct page, as an alternative to using ioremap. This is for advanced
users who want to do RDMA, direct I/O and so on directly out of pmem.

At one point we hit a BUG in some mm/memory.c code that was checking
max_pfn. That was actually a bug, and we do not go through this code
anymore. Between us, the global max_pfn variable is a bad hack. But as
long as it is used, I would rather have it cover pmem, so that code
which guards itself with max_pfn can still accept pmem memory
submitted to it.

I have tried to audit the kernel's use of max_pfn and I do not see how
this can hurt. I do see where it would theoretically help.

Think of a system that looks like this as a memory map:
1. VM (Volatile mem)
2. PM
3. VM
4. PM

This is the kind of layout returned by current and planned NUMA
implementations. If max_pfn counts only RAM, pmem region 2 is covered
by max_pfn but pmem region 4 is not. Any code that checks against
max_pfn will then be OK with pmem-2 but *not* with pmem-4. This is
highly unexpected.
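
To make the layout above concrete, here is a minimal sketch in plain
user-space C. It is not kernel code: the region struct, the pfn values
and the helper names (end_pfn, pfn_in_range, example_map) are all made
up for illustration, and only loosely mirror what e820_end_pfn() does.

```c
/* Hypothetical stand-ins for E820_RAM / E820_PRAM; illustration only. */
enum region_type { REGION_RAM, REGION_PRAM };

struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;		/* exclusive */
	enum region_type type;
};

/* The four-region layout from the text: VM, PM, VM, PM (made-up pfns). */
static const struct region example_map[] = {
	{ 0x000, 0x100, REGION_RAM  },	/* region 1 */
	{ 0x100, 0x200, REGION_PRAM },	/* region 2 */
	{ 0x200, 0x300, REGION_RAM  },	/* region 3 */
	{ 0x300, 0x400, REGION_PRAM },	/* region 4 */
};

/*
 * Highest page frame number over the regions that are accounted as
 * memory. With count_pram == 0 only RAM is considered; with
 * count_pram != 0 persistent memory is accounted as well.
 */
static unsigned long end_pfn(const struct region *map, int n, int count_pram)
{
	unsigned long last_pfn = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (map[i].type != REGION_RAM &&
		    !(count_pram && map[i].type == REGION_PRAM))
			continue;
		if (map[i].end_pfn > last_pfn)
			last_pfn = map[i].end_pfn;
	}
	return last_pfn;
}

/* The kind of guard applied by code that protects itself with max_pfn. */
static int pfn_in_range(unsigned long pfn, unsigned long max_pfn)
{
	return pfn < max_pfn;
}
```

With RAM only, end_pfn() stops at the top of region 3, so a pfn inside
region 2 passes pfn_in_range() while a pfn inside region 4 does not.
With pmem accounted as well, both pass.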

I think max_pfn as a whole should be killed ASAP, but until it is, it
will not hurt for pmem to be covered.

Thanks
Boaz
