Re: [RFC PATCH 0/7] evacuate struct page from the block layer

From: Boaz Harrosh
Date: Sun Mar 22 2015 - 11:51:43 EST


On 03/20/2015 06:21 PM, Rik van Riel wrote:
> On 03/19/2015 09:43 AM, Matthew Wilcox wrote:
>
>> 1. Construct struct pages for persistent memory
>> 1a. Permanently
>> 1b. While the pages are under I/O
>
> Michael Tsirkin and I have been doing some thinking about what
> it would take to allocate struct pages per 2MB area permanently,
> and allocate additional struct pages for 4kB pages on demand,
> when a 2MB area is broken up into 4kB pages.
>
> This should work for both DRAM and persistent memory.
>

My thoughts as well. This need *not* be a hugely invasive change. It is,
however, careful surgery in very core code, with lots of sleepless, scary
nights and testing to make sure all the side effects are ironed out.

BTW: Basic core block code may very well already work with:
	bv_page, bv_len > PAGE_SIZE, bv_offset > PAGE_SIZE.

Meaning the pfns starting at bv_page are contiguous in physical space (and
virtual, of course). So much so that there are already rumors that this is
supposed to be supported, and there are already out-of-tree drivers that use
this today by allocating a higher page order and feeding BIOs with
bv_len=64K.
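
To make the multi-page segment case concrete, here is a rough userspace
sketch, not kernel code. The names sketch_bvec, sketch_chunk and
sketch_split_bvec are hypothetical and only model the bv_page/bv_len/bv_offset
triple with a pfn instead of a struct page pointer. It shows what a consumer
that only understands single pages would have to do with a segment whose
length or offset exceeds the page size, assuming 4k base pages:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for struct bio_vec: the first page is identified
 * by its page frame number rather than by a struct page pointer. */
struct sketch_bvec {
    unsigned long first_pfn; /* pfn of the page bv_page would point at */
    unsigned long bv_len;    /* may exceed SKETCH_PAGE_SIZE */
    unsigned long bv_offset; /* may exceed SKETCH_PAGE_SIZE */
};

/* One single-page piece of a larger segment. */
struct sketch_chunk {
    unsigned long pfn;
    unsigned long offset; /* always < SKETCH_PAGE_SIZE */
    unsigned long len;    /* never crosses a page boundary */
};

/*
 * Split a physically contiguous segment into per-page chunks.
 * Returns the number of chunks required; at most max are written to out.
 */
static size_t sketch_split_bvec(const struct sketch_bvec *bv,
                                struct sketch_chunk *out, size_t max)
{
    assert(bv != NULL && (out != NULL || max == 0));

    /* Fold whole pages of offset into the starting pfn. */
    unsigned long pfn = bv->first_pfn + (bv->bv_offset >> SKETCH_PAGE_SHIFT);
    unsigned long off = bv->bv_offset & (SKETCH_PAGE_SIZE - 1);
    unsigned long left = bv->bv_len;
    size_t n = 0;

    while (left > 0) {
        unsigned long len = SKETCH_PAGE_SIZE - off;
        if (len > left)
            len = left;
        if (n < max) {
            out[n].pfn = pfn;
            out[n].offset = off;
            out[n].len = len;
        }
        n++;
        left -= len;
        pfn++;   /* contiguity assumption: the next pfn is the next page */
        off = 0; /* only the first chunk starts mid-page */
    }
    return n;
}
```

With this model a bv_len=64K segment starting at offset 0 comes out as 16
single-page chunks on consecutive pfns.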

But go outside the block layer, say to networking via iSCSI, and this
breaks pretty fast. Let's fix that, and then let's introduce a:
	page_size(page)
The page already knows its size (i.e. whether it belongs to a 2M THP).
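
A minimal sketch of what such a helper could look like, again in userspace
and under assumptions: struct sketch_page and its order field are
hypothetical stand-ins for whatever struct page records about the contiguous
block it heads, and 4k base pages are assumed.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct page: it only records the allocation
 * order of the contiguous block this page heads. */
struct sketch_page {
    unsigned int order; /* 0 = single 4k page, 9 = 2M block of 4k pages */
};

/* Size in bytes of the contiguous payload described by this page. */
static unsigned long sketch_page_size(const struct sketch_page *page)
{
    assert(page != 0);
    return SKETCH_PAGE_SIZE << page->order;
}
```

Callers that hard-code PAGE_SIZE today would ask the page instead, so an
order-0 page still reports 4k and nothing changes for them.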

> I am still not convinced it is worthwhile to have struct pages
> for persistent memory though, but I am willing to change my mind.
>

If we want copy-less, we need a common memory descriptor carrier. Today this
is struct page. So for me your above statement means:
	"still not convinced I care about copy-less pmem"

Otherwise you either enhance what you have today or devise a new
system, which means changing the whole kernel.

Lastly: Why does pmem need to wait out-of-tree? Even you say above that
machines with lots of DRAM can benefit from the HUGE-to-4k split. So why
not let pmem waste 4k pages like everyone else and fix it as above
down the line, for both pmem and RAM, and save both ways?
Why do we need to first change the whole kernel, and only then have pmem?
Why not use the current infrastructure, for better or for worse, and
incrementally do better?
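
For a rough sense of the cost being argued about, the arithmetic can be
sketched as below. The 64 bytes per struct page is an assumption (typical
for x86-64, not a number from this thread), and sketch_memmap_bytes is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64 bytes per struct page, typical for x86-64 but not
 * stated anywhere in this thread. */
#define SKETCH_STRUCT_PAGE_BYTES 64ULL

/* Bytes of struct page metadata needed to describe mem_bytes of memory
 * with one descriptor per granule_bytes. */
static uint64_t sketch_memmap_bytes(uint64_t mem_bytes, uint64_t granule_bytes)
{
    assert(granule_bytes != 0);
    /* Round up so a partial trailing granule still gets a descriptor. */
    uint64_t descriptors = (mem_bytes + granule_bytes - 1) / granule_bytes;
    return descriptors * SKETCH_STRUCT_PAGE_BYTES;
}
```

Under that assumption, 1 TiB described at 4k granularity needs 16 GiB of
descriptors, while one descriptor per 2M area needs 32 MiB, a factor of 512.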

May I call you on the phone to try and work things out? I believe the
huge page thing + 4k on demand is not a very big change, as long as
struct page *page is left as is, everywhere.

But it may *now* carry a different physically/virtually contiguous payload
bigger than 4k. Is not PAGE_SIZE the real bug? Let's fix that problem.

Thanks
Boaz
