Re: Large stack usage in fs code (especially for PPC64)

From: Linus Torvalds
Date: Mon Nov 17 2008 - 16:11:13 EST




On Mon, 17 Nov 2008, Steven Rostedt wrote:
>
> 45) 4992 1280 .block_read_full_page+0x23c/0x430
> 46) 3712 1280 .do_mpage_readpage+0x43c/0x740

Ouch.

> Notice at line 45 and 46 the stack usage of block_read_full_page and
> do_mpage_readpage. They each use 1280 bytes of stack! Looking at the start
> of these two:
>
> int block_read_full_page(struct page *page, get_block_t *get_block)
> {
> 	struct inode *inode = page->mapping->host;
> 	sector_t iblock, lblock;
> 	struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];

Yeah, that's unacceptable.

Well, it's not unacceptable on good CPUs with 4kB pages (just an 8-entry
array), but as you say:

> On PPC64 I'm told that the page size is 64K, which makes the above equal
> to: 64K / 512 = 128; multiply that by 8-byte words and we have 1024 bytes.

Yeah. Not good. I think 64kB pages are insane. In fact, I think 32kB
pages are insane, and 16kB pages are borderline. I've told people so.

The ppc people run databases, and they don't care about sane people
telling them the big pages suck. It's made worse by the fact that they
also have horribly bad TLB fills on their broken CPUs, and years and
years of telling people that the MMU on ppc is sh*t has only been
met with "talk to the hand, we know better".

Quite frankly, 64kB pages are INSANE. But yes, in this case they actually
cause bugs. With a sane page size (4kB), that *arr[MAX_BUF_PER_PAGE] thing
uses 64 bytes, not 1kB.

I suspect the PPC people need to figure out some way to handle this in
their broken setups (since I don't really expect them to finally admit
that they were full of sh*t with their big pages), but since I think it's
a ppc bug, I'm not at all interested in a fix that penalizes the _good_
case.

So either make it some kind of (clean) conditional dynamic non-stack
allocation, or make it do some outer loop over the whole page that turns
into a compile-time no-op when the page is sufficiently small to be done
in one go.

Or perhaps say "if you have 64kB pages, you're a moron, and to counteract
that moronic page size, you cannot do 512-byte granularity IO any more".

Of course, that would likely mean that FAT etc. wouldn't work on ppc64, so
I don't think that's a valid model either. But if the 64kB page size is
just a "database server crazy-people config option", then maybe it's
acceptable.

Database people usually don't want to connect their cameras or mp3-players
with their FAT12 filesystems.

Linus