Re: another pmem variant V2

From: Boaz Harrosh
Date: Tue Mar 31 2015 - 11:14:23 EST


On 03/31/2015 12:25 PM, Christoph Hellwig wrote:
> On Thu, Mar 26, 2015 at 06:57:47PM +0200, Boaz Harrosh wrote:
<>
>
> Any news? I'd really like to resend this ASAP to get it into 4.1..


Hi Christoph

I hate to be the bearer of bad news, but we have a problem with the
e820 patch:
x86: add support for the non-standard protected e820 type

We cannot accept it as is right now.
We have conducted further tests, and it messes up NUMA.

Everything below is based on your latest patches on top of 4.0-rc5,
after a clean boot and before any modprobe of pmem.ko.

With the exact same kernel, if we use memmap=nn!aa, i.e. add a "type-12"
section, we see the problems below, but if we use memmap=nn\$aa, i.e. add
a "reserved" section, then everything is fine.
[With my old e820 patches it all works fine, because they are closer
to the memmap=nn\$aa "reserved" section way]

The problems we see on a NUMA machine:
* On some machines we cannot boot if a single memmap=nn!aa crosses
a NUMA boundary.
Some VMs sometimes boot and sometimes do not. Some VMs never boot.
Some machines boot just fine.
Doing memmap=nn1!aa1,nn2!aa2, where the split is at the NUMA
boundary, enables all machines to boot.

* Regardless of whether we use memmap=nn!aa crossing a NUMA boundary or
memmap=nn1!aa1,nn2!aa2, the output of
cat /sys/devices/system/node/node1/meminfo
is all ZEROs. Yes, very scary, everything is ZERO. Even though
the dmesg prints show the correct numbers, the above
node1/meminfo is broken.
If with the same kernel we do memmap=nn\$aa, then everything
is clean.
So something in the way we defined our new type-12 region
upsets the NUMA code, and we need further investigation.
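
For anyone trying to reproduce the two observations above, here is a
minimal sketch in Python, not part of any patch. It assumes the NUMA
boundary address is already known and passed in by hand, and that the
per-node meminfo lines look like "Node 1 MemTotal: 0 kB". The function
names split_memmap and node_meminfo_all_zero are hypothetical helpers
invented for this illustration.

```python
import re


def split_memmap(start, size, boundary, sep="!"):
    """Build a memmap= argument for [start, start + size).

    If the range crosses `boundary` (assumed to be the NUMA node
    boundary, supplied by the caller), emit two nn<sep>aa entries
    split at that address. `sep` is "!" for the type-12 form and
    "$" for the reserved form.
    """
    end = start + size
    if start < boundary < end:
        parts = [(boundary - start, start), (end - boundary, boundary)]
    else:
        parts = [(size, start)]
    return "memmap=" + ",".join(f"{n:#x}{sep}{a:#x}" for n, a in parts)


# Assumed line shape: "Node <id> <Field>:   <value> [kB]"
_MEMINFO_LINE = re.compile(r"^Node\s+\d+\s+(\S+):\s+(\d+)")


def parse_node_meminfo(text):
    """Map each field name in a per-node meminfo dump to its integer value."""
    fields = {}
    for line in text.splitlines():
        match = _MEMINFO_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def node_meminfo_all_zero(text):
    """True if at least one field was parsed and every parsed value is 0."""
    fields = parse_node_meminfo(text)
    return bool(fields) and all(value == 0 for value in fields.values())
```

On a test machine the check could be run as
node_meminfo_all_zero(open("/sys/devices/system/node/node1/meminfo").read()),
once after booting with the "!" form and once with the "\$" form.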

Perhaps you would like to start with my much more conservative e820
fix, which makes type-12 memory behave exactly like reserved memory,
and go from there, step by step, until we fix the problem above.
That way we can submit something like that for 4.1.

Talk to me, tell me what you need me to experiment with. Should I try
your platform device way but based on my e820 fix? How can I
help push this through?

Thanks
Boaz
