Re: [RFC] [PATCH] Include LRU in page count

From: Andrew Morton (akpm@zip.com.au)
Date: Sun Sep 01 2002 - 18:08:03 EST


Daniel Phillips wrote:
>
> On Monday 02 September 2002 00:09, Andrew Morton wrote:
> > Daniel Phillips wrote:
> > >
> > > ...
> > > I'm looking at your spinlock_irq now and thinking the _irq part could
> > > possibly be avoided. Can you please remind me of the motivation for this -
> > > was it originally intended to address the same race we've been working on
> > > here?
> >
> > Scalability, mainly. If the CPU holding the lock takes an interrupt,
> > all the other CPUs get to spin until the handler completes. I measured
> > a 30% reduction in contention from this.
> >
> > Not a generally justifiable trick, but this is a heavily-used lock.
> > All the new games in refill_inactive() are there to minimise the
> > interrupt-off time.
>
> Ick. I hope you really chopped the lock hold time into itty-bitty pieces.

Max hold is around 7,000 cycles.
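
For reference, the shape of the thing is just this (a sketch, not the
literal source; the helper name is made up and the list/field names are
from memory):

#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/spinlock.h>

/*
 * Sketch: with a plain spin_lock(), an interrupt arriving on the CPU
 * that holds pagemap_lru_lock makes every other waiter spin for the
 * whole duration of the handler.  spin_lock_irq() trades a short
 * interrupts-off window for a bounded hold time.
 */
static void example_activate(struct page *page)         /* made-up helper */
{
        spin_lock_irq(&pagemap_lru_lock);
        if (PageLRU(page) && !PageActive(page)) {
                list_del(&page->lru);
                list_add(&page->lru, &active_list);     /* assumed list name */
                SetPageActive(page);
        }
        spin_unlock_irq(&pagemap_lru_lock);
}

The cost is that everything under the lock runs with local interrupts
off, which is why the refill_inactive() games to minimise the hold time
matter.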

> Note that I changed the spin_lock in page_cache_release to a trylock, maybe
> it's worth checking out the effect on contention. With a little head
> scratching we might be able to get rid of the spin_lock in lru_cache_add as
> well. That leaves (I think) just the two big scan loops. I've always felt
> it's silly to run more than one of either at the same time anyway.

No way. Take a look at http://samba.org/~anton/linux/2.5.30/

That's an 8-way POWER4; the workload is a dd from 7 disks:
"dd if=/dev/sd* of=/dev/null bs=1024k".

The CPU load in this situation was dominated by the VM: the LRU list and page
reclaim. Spending more CPU in lru_cache_add() than in copy_to_user() is
pretty gross.

I think it's under control now - I expect we're OK up to eight or sixteen
CPUs per node, but I'm still not happy with the level of testing. Most people
who have enough hardware to test this tend to have so much RAM that their
tests don't hit page reclaim.

slablru will probably change the picture too. There's a weird effect with
the traditional try_to_free_pages() code: the first thing it does is run
kmem_cache_shrink(), which takes a semaphore and fruitlessly fiddles with
the slab, and only then does it get on with page reclaim.

On the 4-way I measured 25% contention on the spinlock inside that semaphore.
What is happening is that the combination of the sleeping lock and the slab
operations effectively serialises entry into page reclaim.
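
Schematically, the old entry path looks like this (illustrative, not the
literal source; the names are roughly the slab/vmscan internals):

#include <linux/slab.h>
#include <asm/semaphore.h>

/*
 * Every task entering reclaim funnels through the slab semaphore before
 * it does any useful work, so concurrent reclaimers trickle into the
 * scanning loops one at a time.
 */
static int old_try_to_free_pages(unsigned int gfp_mask)
{
        down(&cache_chain_sem);         /* the sleeping-lock turnstile */
        /* walk the slab caches, usually finding nothing worth freeing */
        up(&cache_chain_sem);

        return shrink_caches(gfp_mask); /* the part that actually reclaims pages */
}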

slablru removes that little turnstile on entry to try_to_free_pages(), and we
will now see a lot more concurrency in shrink_foo().

My approach was to keep the existing design and warm it up, rather than to
redesign. Is it "good enough" now? Dunno - nobody has run the tests
with slablru. But it's probably too late for a redesign (such as per-cpu LRU,
per-mapping lru, etc).

It would be great to make presence on the LRU contribute to page->count, because
that would permit the removal of a ton of page_cache_get/release operations inside
the LRU lock, perhaps doubling throughput in there. Guess I should get off my
lazy butt and see what you've done (will you for heaven's sake go and buy an IDE
disk and compile up a 2.5 kernel? :))
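
Roughly, the shape would be (a sketch of the idea only; helper names
assumed, not checked against your patch):

#include <linux/mm.h>
#include <linux/swap.h>
#include <linux/pagemap.h>

/*
 * The LRU itself holds a reference: take it when the page is linked,
 * drop it after the page is unlinked and the lock released.  The scan
 * loops then don't need page_cache_get()/page_cache_release() pairs
 * while pagemap_lru_lock is held.
 */
void lru_cache_add(struct page *page)
{
        page_cache_get(page);                   /* reference owned by the LRU */
        spin_lock_irq(&pagemap_lru_lock);
        add_page_to_inactive_list(page);        /* assumed helper */
        spin_unlock_irq(&pagemap_lru_lock);
}

void lru_cache_del(struct page *page)
{
        spin_lock_irq(&pagemap_lru_lock);
        __lru_cache_del(page);                  /* unlink from whichever list */
        spin_unlock_irq(&pagemap_lru_lock);
        page_cache_release(page);               /* may free the page, outside the lock */
}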