Re: Lockless page cache test results

From: Nick Piggin
Date: Wed Apr 26 2006 - 15:42:09 EST


Jens Axboe wrote:
Hi,

Running a splice benchmark on a 4-way IPF box, I decided to give the
lockless page cache patches from Nick a spin. I've attached the results
as a png, it pretty much speaks for itself.

The test in question splices a 1GiB file to a pipe and then splices that
to some output. Normally that output would be something interesting, in
this case it's simply /dev/null. So it tests the input side of things
only, which is what I wanted to do here. To get adequate runtime, the
operation is repeated a number of times (120 in this example). The
benchmark does that number of loops with 1, 2, 3, and 4 clients each
pinned to a private CPU. The pinning is mainly done for more stable
results.

Thanks Jens!

It's interesting, single threaded performance is down a little. Is
this significant? In some other results you showed me with 3 splices
each running on their own file (ie. no tree_lock contention), lockless
looked slightly faster on the same machine.

In my microbenchmarks, single threaded lockless is quite a bit faster
than vanilla on both P4 and G5.

It could well be that the speculative get_page operation is naturally
a bit slower on Itanium CPUs -- there is a different mix of barriers,
reads, writes, etc. If only someone gave me an IPF system... ;)

As you said, it would be nice to see how this goes when the other end
are 4 gigabit pipes or so... And then things like specweb and file
serving workloads.

Nick

--
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/