Re: high-speed disk I/O is CPU-bound?

From: Dave Chinner
Date: Thu May 16 2013 - 18:57:11 EST

On Thu, May 16, 2013 at 11:35:08AM -0400, David Oostdyk wrote:
> On 05/16/13 07:36, Stan Hoeppner wrote:
> >On 5/15/2013 7:59 PM, Dave Chinner wrote:
> >>[cc xfs list, seeing as that's where all the people who use XFS in
> >>these sorts of configurations hang out. ]
> >>
> >>On Fri, May 10, 2013 at 10:04:44AM -0400, David Oostdyk wrote:
> >>>As a basic benchmark, I have an application
> >>>that simply writes the same buffer (say, 128MB) to disk repeatedly.
> >>>Alternatively you could use the "dd" utility. (For these
> >>>benchmarks, I set /proc/sys/vm/dirty_bytes to 512M or lower, since
> >>>these systems have a lot of RAM.)
> >>>
> >>>The basic observations are:
> >>>
> >>>1. "single-threaded" writes, either a file on the mounted
> >>>filesystem or with a "dd" to the raw RAID device, seem to be limited
> >>>to 1200-1400MB/sec. These numbers vary slightly based on whether
> >>>TurboBoost is affecting the writing process or not. "top" will show
> >>>this process running at 100% CPU.
> >>Expected. You are using buffered IO. Write speed is limited by the
> >>rate at which your user process can memcpy data into the page cache.
> >>
> >>>2. With two benchmarks running on the same device, I see aggregate
> >>>write speeds of up to ~2.4GB/sec, which is closer to what I'd expect
> >>>the drives of being able to deliver. This can either be with two
> >>>applications writing to separate files on the same mounted file
> >>>system, or two separate "dd" applications writing to distinct
> >>>locations on the raw device.
> >2.4GB/s is the interface limit of quad lane 6G SAS. Coincidence? If
> >you've daisy chained the SAS expander backplanes within a server chassis
> >(9266-8i/72405), or between external enclosures (9285-8e/71685), and
> >have a single 4 lane cable (SFF-8087/8088/8643/8644) connected to your
> >RAID card, this would fully explain the 2.4GB/s wall, regardless of how
> >many parallel processes are writing, or any other software factor.
> >
> >But surely you already know this, and you're using more than one 4 lane
> >cable. Just covering all the bases here, due to seeing 2.4 GB/s as the
> >stated wall. This number is just too coincidental to ignore.
> We definitely have two 4-lane cables being used, but this is an
> interesting coincidence. I'd be surprised if anyone could really
> achieve the theoretical throughput on one cable, though. We have
> one JBOD that only takes a single 4-lane cable, and we seem to cap
> out at closer to 1450MB/sec on that unit. (This is just a single
> point of reference, and I don't have many tests where only one
> 4-lane cable was in use.)

You can get pretty close to the theoretical limit on the back end
SAS cables - just like you can with FC.
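
For reference, the 2.4GB/s figure quoted above can be reproduced with a bit of arithmetic. This is only a rough sketch: it assumes 8b/10b line encoding (10 line bits per payload byte, as used by 6G SAS) and ignores protocol framing overhead, so real throughput will be somewhat lower. The function name and its parameters are made up for illustration.

```c
#include <assert.h>

/*
 * Approximate payload bandwidth of a SAS wide port in MB/s
 * (1 MB = 10^6 bytes). Assumes 8b/10b encoding, i.e. 10 line bits
 * per payload byte, and ignores framing overhead.
 * line_rate_mbit is the per-lane line rate in Mbit/s (6000 for 6G SAS).
 */
unsigned sas_port_mb_per_sec(unsigned lanes, unsigned line_rate_mbit)
{
    assert(lanes > 0);                  /* a port has at least one lane */
    return lanes * line_rate_mbit / 10; /* 10 line bits per data byte */
}
```

With these assumptions, one 4-lane 6G cable gives sas_port_mb_per_sec(4, 6000) = 2400 MB/s, and two such cables used together would give twice that.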

What I'd suggest you do is look at the RAID card configuration -
often they default to active/passive failover configurations when
there are multiple channels to the same storage. Then they only use
one of the cables for all traffic. Some RAID cards offer
active/active or "load balanced" options where all back end paths are
used in redundant configurations rather than just one....

> You guys hit the nail on the head! With O_DIRECT I can use a single
> writer thread and easily see the same throughput that I _ever_ saw
> in the multiple-writer case (~2.4GB/sec), and "top" shows the writer
> at 10% CPU usage. I've modified my application to use O_DIRECT and
> it makes a world of difference.

Be aware that O_DIRECT is not a magic bullet. It can make your IO
go a lot slower on some workloads and storage configs....
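
One reason it is not a drop-in change: direct IO generally requires the buffer address, the IO size and the file offset to be aligned to the device's logical block size. Below is a minimal sketch of the "write the same buffer repeatedly" benchmark using O_DIRECT on Linux. The helper names and the 4096-byte alignment are assumptions for illustration, not taken from the application discussed in this thread; the real alignment requirement depends on the device and filesystem.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment; check the real logical block size of your device. */
#define DIO_ALIGN 4096

/* Round len up to the next multiple of DIO_ALIGN. */
size_t dio_round_up(size_t len)
{
    return (len + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
}

/* Allocate a zeroed buffer whose address and padded size are both
 * multiples of DIO_ALIGN. Returns NULL on failure; release with free(). */
void *dio_alloc(size_t len)
{
    void *buf = NULL;
    size_t padded = dio_round_up(len);

    if (posix_memalign(&buf, DIO_ALIGN, padded) != 0)
        return NULL;
    memset(buf, 0, padded);
    return buf;
}

/* Write the same buffer `count` times to `path`, bypassing the page cache.
 * `buf` and `len` must be DIO_ALIGN-aligned. Returns the total number of
 * bytes written, or -1 with errno set. */
long long dio_write_repeat(const char *path, const void *buf,
                           size_t len, int count)
{
    assert(buf != NULL && count >= 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0)
        return -1;

    long long total = 0;
    for (int i = 0; i < count; i++) {
        size_t done = 0;
        while (done < len) {
            ssize_t n = write(fd, (const char *)buf + done, len - done);
            if (n < 0 && errno == EINTR)
                continue;               /* interrupted, retry */
            if (n <= 0) {               /* error or no progress: give up */
                int saved = (n == 0) ? EIO : errno;
                close(fd);
                errno = saved;
                return -1;
            }
            done += (size_t)n;          /* an unaligned short write will
                                           make the next write() fail */
        }
        total += (long long)len;
    }
    return close(fd) == 0 ? total : -1;
}
```

A caller would allocate the buffer once with dio_alloc(), fill it, and then call dio_write_repeat() with a length that is a multiple of DIO_ALIGN. Note that some filesystems reject O_DIRECT at open() time, in which case the function returns -1 with errno set to EINVAL.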

> [It's interesting that you see performance benefits for O_DIRECT
> even with a single SATA drive. The reason it took me so long to
> test O_DIRECT in this case, is that I never saw any significant
> benefit from using it in the past. But that is when I didn't have
> such fast storage, so I probably wasn't hitting the bottleneck with
> buffered I/O?]

Right - for applications not designed to use direct IO from the
ground up, this is typically the case - buffered IO is faster right
up to the point where you run out of CPU....


Dave Chinner