Re: shmget limited by SHMEM_MAX_BYTES to 0x4020010000 bytes (Resend).

From: Linus Torvalds
Date: Sat Jan 22 2011 - 10:59:39 EST


On Sat, Jan 22, 2011 at 7:34 AM, Robin Holt <holt@xxxxxxx> wrote:
> I have a customer system with 12 TB of memory.  The customer is trying
> to do a shmget() call with size of 4TB and it fails due to the check in
> shmem_file_setup() against SHMEM_MAX_BYTES which is 0x4020010000.
>
> I have considered a bunch of options and really do not know which
> direction I should take this.
>
> I could add a third level and fourth level with a similar 1/4 size being
> the current level of indirection, and the next quarter being a next level.
> That would get me closer, but not all the way there.

Ugh.

How about just changing the indexing to use a bigger page allocation?
Right now it uses PAGE_CACHE_SIZE and ENTRIES_PER_PAGE, but as far as
I can tell, the indexing logic is entirely independent from PAGE_SIZE
and PAGE_CACHE_SIZE, and could just use its own SHM_INDEX_PAGE_SIZE or
something.

That would allow increasing the indexing capability fairly easily, no?
No actual change to the (messy) algorithm at all, just make the block
size for the index pages bigger.
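
The effect of a larger index block can be estimated with a small userspace calculation. This is a sketch, not kernel code. It assumes the limit has the form SHMEM_NR_DIRECT + (ENTRIES_PER_PAGE^2 / 2) * (ENTRIES_PER_PAGE + 1) pages, with 16 direct entries, 8-byte index entries and 4 KiB data pages. Those assumptions reproduce the 0x4020010000 figure from the report. shm_max_bytes() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

#define SHMEM_NR_DIRECT  16  /* assumed: direct entries held outside the index blocks */
#define PAGE_CACHE_SHIFT 12  /* assumed: 4 KiB data pages */

/*
 * Hypothetical helper: largest object size in bytes when every index
 * block is index_block_size bytes. Only the index block size varies;
 * the data page size stays at 4 KiB.
 */
uint64_t shm_max_bytes(uint64_t index_block_size)
{
    /* The index block size must be a non-zero power of two. */
    assert(index_block_size != 0 &&
           (index_block_size & (index_block_size - 1)) == 0);

    /* Assumed: each index entry is 8 bytes (unsigned long on 64-bit). */
    uint64_t entries = index_block_size / sizeof(uint64_t);

    /* Assumed form of the limit, expressed in pages. */
    uint64_t max_index = SHMEM_NR_DIRECT +
                         (entries * entries / 2) * (entries + 1);

    return max_index << PAGE_CACHE_SHIFT;
}
```

Under these assumptions, 8 KiB index blocks give roughly 2 TiB, still short of the 4TB request, while 16 KiB index blocks (an order-2 allocation with 4 KiB pages) give roughly 16 TiB.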

Sure, it means that you now require multipage allocations in
shmem_dir_alloc(), but that doesn't sound all that hard. The code is
already set up to try to handle it (because we have that conceptual
difference between PAGE_SIZE and PAGE_CACHE_SIZE, even though the two
end up being the same).

NOTE! I didn't look very closely at the details, so there may be some
really basic reason why the above is a completely idiotic idea.

The alternative (and I think it might be a good alternative) is to get
rid of the shmem magic indexing entirely, and rip all the code out and
replace it with something that uses the generic radix tree functions.
Again, I didn't actually look at the code enough to judge whether that
would be the most painful effort ever or even possible at all.
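
For comparison, the radix tree idea can be illustrated in userspace. This is a minimal sketch, not the kernel's radix tree implementation and not its API: the rt_* names, the 64-slot fan-out and the fixed depth of 11 levels are all assumptions made for the example. Each level consumes 6 bits of the page index, so the only ceiling is the width of the 64-bit index itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RT_SHIFT  6                   /* assumed fan-out: 64 slots per node */
#define RT_SLOTS  (1u << RT_SHIFT)
#define RT_MASK   (RT_SLOTS - 1)
#define RT_LEVELS 11                  /* ceil(64 / 6): covers any 64-bit index */

struct rt_node {
    void *slots[RT_SLOTS];            /* child nodes, or items at level 0 */
};

/* Slot number used by 'index' at the given level (0 is the leaf level). */
static unsigned rt_slot(uint64_t index, int level)
{
    assert(level >= 0 && level < RT_LEVELS);
    return (unsigned)((index >> (level * RT_SHIFT)) & RT_MASK);
}

/* Store 'item' at 'index', allocating interior nodes lazily.
 * Returns 0 on success, -1 if an allocation fails. */
int rt_insert(struct rt_node **root, uint64_t index, void *item)
{
    if (*root == NULL && (*root = calloc(1, sizeof(**root))) == NULL)
        return -1;

    struct rt_node *node = *root;
    for (int level = RT_LEVELS - 1; level > 0; level--) {
        unsigned s = rt_slot(index, level);
        if (node->slots[s] == NULL) {
            node->slots[s] = calloc(1, sizeof(*node));
            if (node->slots[s] == NULL)
                return -1;
        }
        node = node->slots[s];
    }
    node->slots[rt_slot(index, 0)] = item;
    return 0;
}

/* Return the item stored at 'index', or NULL if nothing is there. */
void *rt_lookup(const struct rt_node *root, uint64_t index)
{
    const struct rt_node *node = root;
    for (int level = RT_LEVELS - 1; level > 0 && node != NULL; level--)
        node = node->slots[rt_slot(index, level)];
    return node != NULL ? node->slots[rt_slot(index, 0)] : NULL;
}
```

The sketch never frees nodes and always walks all 11 levels. A real implementation would grow the tree height on demand and handle locking, neither of which is shown here.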

Linus