Re: [pagevec] resize pagevec to O(lg(NR_CPUS))

From: William Lee Irwin III
Date: Sun Sep 12 2004 - 01:31:05 EST


William Lee Irwin III wrote:
>> No, it DTRT. Batching does not directly compensate for increases in
>> arrival rates; rather, it most directly compensates for increases in
>> lock transfer times, which do indeed increase on systems with large
>> numbers of cpus.

On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
> Generally, though, I think you could expect the lru lock to be most
> often taken by the scanner, on node-local CPUs. Even on the big
> systems. We'll see.

No, I'd expect zone->lru_lock to be taken most often for lru_cache_add()
and lru_cache_del().
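
For concreteness, the per-cpu batching behind lru_cache_add() looks
roughly like this (a trimmed sketch along the lines of 2.6 mm/swap.c,
not a verbatim quote of it):

	/* One pagevec per cpu; pages queue up locklessly here. */
	static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs) = { 0, };

	void lru_cache_add(struct page *page)
	{
		struct pagevec *pvec = &get_cpu_var(lru_add_pvecs);

		page_cache_get(page);
		/* Defer taking zone->lru_lock until the pagevec fills;
		 * __pagevec_lru_add() then takes it once per batch. */
		if (!pagevec_add(pvec, page))
			__pagevec_lru_add(pvec);
		put_cpu_var(lru_add_pvecs);
	}

With a pagevec of n entries, the lock transfer cost is paid once per
n insertions instead of once per insertion, which is the sense in
which batching compensates for lock transfer time rather than for
arrival rate.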


William Lee Irwin III wrote:
>> A 511 item pagevec is 4KB on 64-bit machines.

On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
> Sure. And when you fill it with pages, they'll use up 32KB of dcache
> by using a single 64B line per page. Now that you've blown the cache,
> when you go to move those pages to another list, you'll have to pull
> them out of L2 again one at a time.

There can be no adequate compile-time metric of L1 cache size. 64B
cachelines with a 16KB cache also sounds a bit small: that's only 256
lines, fewer than the number of TLB entries on various systems.

In general, a hard cap at the L1 cache size would be beneficial for
operations done in tight loops, but there is no adequate method of
detecting that size. Also recall that the page structures will be
touched regardless of whether they arrive in a sufficiently large
pagevec. Various pagevecs are meant to amortize locking in scenarios
where there is no relationship between calls; again, lru_cache_add()
and lru_cache_del() are the poster children. These operations are
often done one page at a time in some long codepath, e.g. a fault
handler, and the pagevec merely defers the work until enough has
accumulated. radix_tree_gang_lookup() and mpage_readpages() OTOH
execute the operations to be done under the locks in tight loops,
where the lock acquisitions are done immediately by the same caller.

This differentiation between the characteristics of pagevec users
happily matches the cases where they're used on-stack and per-cpu.
In the former case, larger pagevecs are desirable, as the cachelines
will not be L1-hot regardless; in the latter, L1 size limits apply.
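
The on-stack case then looks like the following (a sketch in the
style of 2.6 truncate_inode_pages(); the function name here is made
up for illustration, and error handling is omitted):

	void walk_mapping_pages(struct address_space *mapping)
	{
		struct pagevec pvec;
		pgoff_t next = 0;
		int i;

		pagevec_init(&pvec, 0);
		/* pagevec_lookup() batches radix_tree_gang_lookup()
		 * under mapping->tree_lock... */
		while (pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
			/* ...and the same caller consumes the batch
			 * immediately, while the lines are warm. */
			for (i = 0; i < pagevec_count(&pvec); i++) {
				struct page *page = pvec.pages[i];
				next = page->index + 1;
				/* operate on page here */
			}
			pagevec_release(&pvec);
		}
	}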


On Sun, Sep 12, 2004 at 02:28:46PM +1000, Nick Piggin wrote:
> OK, so a 511 item pagevec is pretty unlikely. How about a 64 item one
> with 128 byte cachelines, and you're touching two cachelines per
> page operation? That's 16K.

4*lg(NR_CPUS) is 64 for 16x-31x boxen. No single constant suffices;
adapting to the system and to the usage case would be an improvement.
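
For what it's worth, such a size can be folded at compile time; the
ILOG2() helper and the floor below are illustrative only, not
necessarily what the patch does:

	/* Constant-expression lg(n); extend the chain if NR_CPUS can
	 * exceed 256. Illustrative, not an existing kernel macro. */
	#define ILOG2(n)						\
		((n) >= 256 ? 8 : (n) >= 128 ? 7 : (n) >= 64 ? 6 :	\
		 (n) >=  32 ? 5 : (n) >=  16 ? 4 : (n) >=   8 ? 3 :	\
		 (n) >=   4 ? 2 : (n) >=   2 ? 1 : 0)

	/* Scale with cpu count, but never below a useful minimum. */
	#define PAGEVEC_MIN	4
	#define PAGEVEC_SIZE					\
		(4*ILOG2(NR_CPUS) > PAGEVEC_MIN ?		\
			4*ILOG2(NR_CPUS) : PAGEVEC_MIN)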


-- wli