Re: [PATCH v3 1/2] writeback: add dirty_background_centisecs per bdi variable

From: Dave Chinner
Date: Sun Oct 21 2012 - 21:27:12 EST

Next message: Stephen Rothwell: "linux-next: manual merge of the modules tree with the tree"
Previous message: Rusty Russell: "Re: RFC: sign the modules at install time"
In reply to: Namjae Jeon: "Re: [PATCH v3 1/2] writeback: add dirty_background_centisecs per bdi variable"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Oct 19, 2012 at 04:51:05PM +0900, Namjae Jeon wrote:
> Hi Dave.
>
> Test Procedure:
>
> 1) Local USB disk WRITE speed on NFS server is ~25 MB/s
>
> 2) Run WRITE test (create a 1 GB file) on the NFS client with default
> writeback settings on the NFS server. By default
> bdi->dirty_background_bytes = 0, which means no change in default
> writeback behaviour
>
> 3) Next, we set bdi->dirty_background_bytes = 25 MB (almost equal to the
> local USB disk write speed on the NFS server)
> *** only on NFS Server - not on NFS Client ***

Ok, so the results look good, but they don't really address what I
was asking. A typical desktop PC has a disk that can do 100MB/s and
GbE, so I was expecting a test that showed throughput close to GbE
maximums at least (i.e. around 100MB/s). I have 3-year-old, low-end,
low-power hardware (Atom) that handles twice the throughput you are
testing here, and most current consumer NAS devices are more powerful
than this. IOWs, I think the rates you are testing at are probably too
low even for the consumer NAS market to consider relevant...

> ----------------------------------------------------------------------------------
> Multiple NFS Client test:
> -----------------------------------------------------------------------------------
> Sorry, we could not arrange multiple PCs to verify this.
> So, we tried 1 NFS Server + 2 NFS Clients using 3 target boards:
> ARM Target + 512 MB RAM + Ethernet (100 Mbit/s), creating a 1 GB file

But this really doesn't tell us anything - it's still only 100Mb/s,
which we'd expect to be very close to line rate already, even with
low-powered client hardware.
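For reference, link rates are quoted in bits per second and disk rates
in bytes per second, so a 100 Mbit/s link tops out at 100 / 8 = 12.5
MB/s, about half of the ~25 MB/s the USB disk can sustain. A minimal C
sketch of the conversion; the helper names are illustrative only, and
it deliberately ignores Ethernet/IP/RPC framing overhead, so real NFS
payload rates will be somewhat lower than these ceilings:

```c
#include <assert.h>
#include <stdio.h>

/* Convert a link rate in megabits/s to megabytes/s (decimal units,
 * 8 bits per byte). Framing overhead is ignored, so this is a ceiling
 * on NFS payload throughput, not an expected rate. */
double line_rate_mbytes(double mbits_per_sec)
{
    assert(mbits_per_sec >= 0.0);
    return mbits_per_sec / 8.0;
}

/* Print the ceilings for the link speeds mentioned in this thread. */
void print_line_rates(void)
{
    const double links[] = { 100.0, 1000.0, 10000.0 }; /* 100Mb, GbE, 10GbE */
    for (size_t i = 0; i < sizeof(links) / sizeof(links[0]); i++)
        printf("%7.0f Mbit/s -> at most %6.1f MB/s\n",
               links[i], line_rate_mbytes(links[i]));
}
```

line_rate_mbytes(100.0) returns 12.5, GbE gives 125 and 10GbE 1250,
which is why a GbE test is expected to approach ~100MB/s and a 10GbE
server to approach a GB/s.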

What I'm concerned about is the NFS server "sweet spot" - a $10k server
that exports 20TB of storage and can sustain close to a GB/s of NFS
traffic over a single 10GbE link with tens to hundreds of clients.
100MB/s and 10 clients is about the minimum needed to be able to
extrapolate a little and make an informed guess of how it will scale
up....

> > 1. what's the comparison in performance to typical NFS
> > server writeback parameter tuning? i.e. dirty_background_ratio=5,
> > dirty_ratio=10, dirty_expire_centisecs=1000,
> > dirty_writeback_centisecs=1? i.e. does this change give any
> > benefit over the current common practice for configuring NFS
> > servers?
>
> Agreed, the above improvement in write speed can be achieved by
> tuning the above writeback parameters.
> But if we change these settings, it will change writeback behavior
> system-wide.
> On the other hand, if we change the proposed per-bdi setting,
> bdi->dirty_background_bytes, it will change writeback behavior only
> for the block device exported on the NFS server.

I already know what the difference between global vs per-bdi tuning
means. What I want to know is how your results compare
*numerically* to just having a tweaked global setting on a vanilla
kernel. i.e. is there really any performance benefit to per-bdi
configuration that cannot be gained by existing methods?
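For the vanilla-kernel baseline, the global tuning from question 1 is
applied by writing the vm sysctls under /proc/sys/vm (the same files
that sysctl -w vm.dirty_ratio=10 writes). A minimal C sketch, assuming
a Linux host and root privileges for the real path; the helper names
are hypothetical, and the directory is a parameter so the sketch can be
pointed at a scratch directory instead of /proc/sys/vm. The values are
those listed in question 1:

```c
#include <assert.h>
#include <stdio.h>

/* Write one integer to <dir>/<name>, e.g. /proc/sys/vm/dirty_ratio.
 * Returns 0 on success, -1 on failure. The real /proc/sys/vm files need
 * root; taking the directory as a parameter lets the sketch be tried
 * against a scratch directory instead. */
int write_sysctl(const char *dir, const char *name, long value)
{
    assert(dir != NULL && name != NULL);
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path would be truncated */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)                 /* buffered write errors surface here */
        ok = 0;
    return ok ? 0 : -1;
}

/* The global tuning listed in question 1. These knobs are system-wide:
 * they affect every bdi, not just the exported block device. */
static const struct {
    const char *name;
    long value;
} nfs_server_tuning[] = {
    { "dirty_background_ratio",    5    },
    { "dirty_ratio",               10   },
    { "dirty_expire_centisecs",    1000 },
    { "dirty_writeback_centisecs", 1    },
};

/* Apply every entry; returns how many were written successfully. */
int apply_nfs_server_tuning(const char *dir)
{
    int applied = 0;
    size_t count = sizeof(nfs_server_tuning) / sizeof(nfs_server_tuning[0]);
    for (size_t i = 0; i < count; i++)
        if (write_sysctl(dir, nfs_server_tuning[i].name,
                         nfs_server_tuning[i].value) == 0)
            applied++;
    return applied;
}
```

Run as root with apply_nfs_server_tuning("/proc/sys/vm") on a vanilla
kernel and then repeat the same write benchmark; that gives the
numerical baseline the per-bdi results would need to beat.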

> > 2. what happens when you have 10 clients all writing to the server
> > at once? Or a 100? NFS servers rarely have a single writer to a
> > single file at a time, so what impact does this change have on
> > multiple concurrent file write performance from multiple clients?
>
> Sorry, we could not arrange more than 2 PCs for verifying this.

Really? Well, perhaps there are some tools that might be useful for
you here:

http://oss.sgi.com/projects/nfs/testtools/

"Weber

Test load generator for NFS. Uses multiple threads, multiple
sockets and multiple IP addresses to simulate loads from many
machines, thus enabling testing of NFS server setups with larger
client counts than can be tested with physical infrastructure (or
Virtual Machine clients). Has been useful in automated NFS testing
and as a pinpoint NFS load generator tool for performance
development."
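Where a multi-machine setup isn't available, part of the concern (many
files written concurrently on the server) can at least be approximated
locally. The C sketch below is an assumption-laden stand-in, not a
substitute for Weber: it does not touch the NFS protocol path at all,
it simply starts one POSIX thread per simulated client, each streaming
its own file in fixed-size write(2) chunks into a directory on the
server's filesystem. All names are hypothetical; build with -pthread:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_WRITERS 128
#define BUF_BYTES   65536

/* One simulated client: a streaming writer to its own file. */
struct writer {
    char path[256];
    size_t total_bytes;  /* bytes this writer should produce */
    size_t chunk_bytes;  /* size of each write(2) call */
    size_t written;      /* result: bytes actually written */
};

static void *writer_thread(void *arg)
{
    struct writer *w = arg;
    char buf[BUF_BYTES];
    memset(buf, 'x', sizeof(buf));
    size_t chunk = w->chunk_bytes < sizeof(buf) ? w->chunk_bytes : sizeof(buf);

    int fd = open(w->path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return NULL;
    while (w->written < w->total_bytes) {
        size_t left = w->total_bytes - w->written;
        ssize_t n = write(fd, buf, left < chunk ? left : chunk);
        if (n <= 0)
            break;       /* write error, or a zero chunk size */
        w->written += (size_t)n;
    }
    fsync(fd);           /* push the data through writeback before returning */
    close(fd);
    return NULL;
}

/* Run nwriters concurrent streaming writers, one file each, in dir.
 * Returns the total bytes written across all writers. */
size_t run_concurrent_writers(const char *dir, int nwriters,
                              size_t bytes_each, size_t chunk)
{
    assert(nwriters > 0 && nwriters <= MAX_WRITERS);
    struct writer w[MAX_WRITERS];
    pthread_t tid[MAX_WRITERS];
    int started = 0;
    size_t total = 0;

    for (int i = 0; i < nwriters; i++) {
        snprintf(w[i].path, sizeof(w[i].path), "%s/client%03d.dat", dir, i);
        w[i].total_bytes = bytes_each;
        w[i].chunk_bytes = chunk;
        w[i].written = 0;
        if (pthread_create(&tid[i], NULL, writer_thread, &w[i]) != 0)
            break;       /* only join the threads that actually started */
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += w[i].written;
    }
    return total;
}
```

Timing runs of, say, 10 and 100 writers (e.g. under time(1)) on each
filesystem, with and without the writeback tuning under test, and then
checking file layout with filefrag(8), would give a first rough look
at the concurrency and fragmentation questions below.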

> > 3. Following on from the multiple client test, what difference does it
> > make to file fragmentation rates? Writing more frequently means
> > smaller allocations and writes, and that tends to lead to higher
> > fragmentation rates, especially when multiple files are being
> > written concurrently. Higher fragmentation also means lower
> > performance over time as fragmentation accelerates filesystem aging
> > effects on performance. IOWs, it may be faster when new, but it
> > will be slower 3 months down the track and that's a bad tradeoff to
> > make.
>
> We agree that there could be a bit more fragmentation. But as you know,
> we are not changing writeback settings on the NFS clients.
> So, writeback behavior on the NFS client will not change - IO requests
> will be buffered at the NFS client as per existing writeback behavior.

I think you misunderstand - writeback settings on the server greatly
impact the way the server writes data and therefore the way files
are fragmented. It has nothing to do with client side tuning.

Effectively, what you are presenting are best-case numbers - empty
filesystem, single client, streaming write, no fragmentation, no
allocation contention, no competing IO load causing write latency.
Testing with lots of clients introduces all of these things, and that
will greatly impact server behaviour. Aggregation in memory isolates
a lot of this variation from writeback and hence smooths out a lot of
the variability that leads to fragmentation, seeks, latency spikes and
premature filesystem aging.

That is, if you set a 100MB dirty_bytes limit on a bdi it will give
really good buffering for a single client doing a streaming write.
If you've got 10 clients, then assuming fair distribution of server
resources, that is 10MB per client per writeback trigger.
That's line ball as to whether it will cause fragmentation severe
enough to impact server throughput. If you've got 100 clients, then
that's only 1MB per client per writeback trigger, and that's
definitely too low to maintain decent writeback behaviour, i.e.
you're now writing 100 files 1MB at a time, and that tends towards
random IO patterns rather than sequential IO patterns. At that point
seek time, not IO bandwidth, determines throughput.
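The arithmetic above can be turned into a crude model. Assuming (for
illustration only; these are not measurements from this thread) that
the bdi limit is shared evenly between clients and that each client's
chunk costs one seek plus its sequential transfer time, effective
throughput is chunk / (seek + chunk / bandwidth). A minimal C sketch,
with hypothetical names and disk parameters:

```c
#include <assert.h>

/* Illustrative disk parameters; assumptions, not measurements. */
struct disk_model {
    double seek_ms;        /* average positioning time per discontiguous write */
    double bandwidth_mbs;  /* sequential transfer rate, MB/s */
};

/* Dirty data buffered per client per writeback trigger, assuming the
 * bdi limit is shared evenly between clients. */
double per_client_mb(double bdi_limit_mb, int clients)
{
    assert(clients > 0);
    return bdi_limit_mb / clients;
}

/* Effective throughput when each chunk costs one seek plus its
 * sequential transfer: chunk / (seek + chunk / bandwidth). */
double effective_mbs(const struct disk_model *d, double chunk_mb)
{
    assert(chunk_mb > 0.0 && d->bandwidth_mbs > 0.0);
    double seek_s = d->seek_ms / 1000.0;
    double transfer_s = chunk_mb / d->bandwidth_mbs;
    return chunk_mb / (seek_s + transfer_s);
}
```

With an assumed 8 ms average seek and a 100 MB/s disk, 10MB chunks come
out at about 93 MB/s, but 1MB chunks drop to about 56 MB/s, since each
10 ms transfer now pays an 8 ms seek. The model ignores fragmentation
and filesystem aging, which would tend to make small chunks look worse
still.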

IOWs, as the client count goes up, the writeback patterns will tend
more towards random IO than sequential IO unless the amount of
buffering allowed before writeback triggers also grows. That's
important, because random IO is much slower than sequential IO.
What I'd like to have is some insight into whether this patch
changes that inflection point, for better or for worse. The only way
to find that out is to run multi-client testing....
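Under the same kind of crude seek-plus-transfer model (an assumption,
not something measured here), the inflection point can be estimated.
Requiring chunk / (seek + chunk / bw) >= f * bw for some target
fraction f of sequential bandwidth gives chunk >= f * bw * seek /
(1 - f); dividing the dirty limit by that chunk gives the client count
at which a fixed limit stops being enough. A minimal C sketch with
hypothetical helper names:

```c
#include <assert.h>

/* Smallest per-client chunk (MB) that keeps a seek-plus-transfer model
 * at `fraction` of sequential bandwidth. From
 *   chunk / (seek_s + chunk / bw) >= fraction * bw
 * it follows that chunk >= fraction * bw * seek_s / (1 - fraction). */
double min_chunk_mb(double seek_ms, double bw_mbs, double fraction)
{
    assert(fraction > 0.0 && fraction < 1.0);
    return fraction * bw_mbs * (seek_ms / 1000.0) / (1.0 - fraction);
}

/* Largest client count for which an evenly shared dirty limit still
 * gives every client at least chunk_mb: the inflection point. */
int max_clients(double limit_mb, double chunk_mb)
{
    assert(chunk_mb > 0.0);
    return (int)(limit_mb / chunk_mb);
}

/* Dirty limit needed to keep `clients` writers at or above chunk_mb. */
double required_limit_mb(int clients, double chunk_mb)
{
    assert(clients > 0);
    return clients * chunk_mb;
}
```

With an assumed 8 ms seek, 100 MB/s disk and f = 0.9, the minimum chunk
is about 7.2MB, so a 100MB limit runs out at 13 clients and 100 clients
would need roughly 720MB of buffering. Real allocator and workload
behaviour will move these numbers, which is why multi-client
measurements are still needed.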

> > 5. Are the improvements consistent across different filesystem
> > types? We've had writeback changes in the past cause improvements
> > on one filesystem but significant regressions on others. I'd
> > suggest that you need to present results for ext4, XFS and btrfs so
> > that we have a decent idea of what we can expect from the change to
> > the generic code.
>
> As shown in Tables 1 & 2 above, the performance gain in WRITE speed
> differs between file systems, i.e. it is different for an NFS client
> over XFS and over EXT4.
> We also tried BTRFS over NFS, but we could not see any WRITE speed
> performance gain or degradation on BTRFS over NFS, so we are not
> posting BTRFS results here.

You should post btrfs numbers even if they show no change. It wasn't
until I got this far that I even realised that you'd tested BTRFS.
I don't know what to make of this, because I don't know how its
throughput rates compare to XFS and EXT4....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
