Re: PG_zero

From: Martin J. Bligh
Date: Mon Nov 01 2004 - 21:16:29 EST


--On Monday, November 01, 2004 22:57:09 +0100 Andrea Arcangeli <andrea@xxxxxxxxxx> wrote:

> On Sun, Oct 31, 2004 at 07:35:03AM -0800, Martin J. Bligh wrote:
>> I'll non-micro-benchmark stuff for you on big machines if you want, but I've
>> wasted enough time coding this stuff already ;-)
>>
>> BTW, the one really useful thing the whole page zeroing stuff did was to
>> shift the profiled cost of page zeroing out to the routine acutally using
>> the pages, as it's no longer just do_anonymous_page taking the cache hit.
>
> I'm not a big fan of the idle zeroing myself despite I implemented it.
> The only performance difference I can measure is the microbenchmark
> running 3 times faster, everything else seems the same.
>
> However the real point of the patch is to address all other issues with
> the per-cpu list and to fixup the lack of pte_quicklist in 2.6, and to
> avoid wasting zeropages (like the pte_quicklists did) by sharing them
> with all other page-zero users.
>
> I'm fine to drop the idle zeroing stuff, it just took another half an
> hour to add it so I did it just in case (it can be already disabled via
> sysctl of course).
>
> btw, how did you implement idle zeroing? had you two per-cpu lists,
> where the idle zeroing only refile across the two per-cpu lists like I
> did?

I did a nasty hack (I was just testing perf). I think Andy added another
per-cpu queue, when he redid my stuff properly.

> Did you address the COW with zeropage? the design I did for it is
> the only one I could imagine beneficial at all, I would never attempt
> taking any spinlock on the idle task.

I really don't see much point in the COW'ed zeropage thing - it just
generates more minor faults. From the stuff I measured, I couldn't find
anything that read from an allocated page, and never wrote to it. I've
turned it off before, and just given them their own page straight away,
and it had no perf impact.

> No idea if you also were just
> refiling pages across two per-cpu lists. I need two lists anyways, one
> is the hot-cold one (shared to avoid wasting memory like 2.6 mainline)
> and one is the zero-list that is needed to avoid the lack of
> pte_quicklists and to cache the PG_zero info (it actually renders
> pte_quicklists obsolete since they're less optimal at utilizing zero
> resources).

Yeah, fair enough. the only issue is that I think cold allocators could
steal hot pages the way you're doing it (ie if we set up a big DMA).

> Plus the patch fixes other issues like trying all per-cpu
> lists in the classzone before falling back to the buddy allocator.

Yeah, that seems like a good thing to do ... except on machines with
big highmem, won't we exhaust ZONE_NORMAL much quicker?

> It
> removes the low watermark and other minor details. The idle zeroing is
> just a minor thing in the whole patch, the point of PG_zero is to create
> an infrastructure in the main page allocator that is capable of caching
> already zero data like ptes.
>
> So far this in practice has been an improvement or a noop, and it solves
> various theoretical issues too, but I still need to test on some bigger
> box and see if it makes any difference there or not... But since it's
> rock solid here, it was good enough for posting ;)

Heh. If it's the same as what you posted before, I'll do a sniff test on
one of teh big boxes here.

M
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/