Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower than Solaris 10 with the same HBA/Storage.

From: Douglas Gilbert
Date: Thu Jan 09 2014 - 14:54:51 EST


On 14-01-08 08:57 AM, Sergey Meirovich wrote:
> Hi James,
>
> On 7 January 2014 22:57, James Smart <james.smart@xxxxxxxxxx> wrote:
>> Sergey,
>>
>> The Thor chipset is a bit old - a 4Gig adapter. Most of our performance
>> improvements, including parallelization, have gone into the 8G and 16G
>> adapters. But you still should have seen significantly more than what
>> you reported.
>
> First of all - thanks a lot!
>
> I took Thor because we have exactly the same Thors in some of our
> Solaris servers. I've also tried 6 different qlogics (mostly 8G) and
> fnic (10G) as well. Surprisingly enough, Thor was the fastest one for
> seqwr 4k, though in most cases the machines were in different DCs and
> hence each was connected to different storage.
>
>> We did a sanity check on some hardware we already had set up with a
>> Thor adapter. We saw 23555 iop/s and 92.1 MB/s without needing to do
>> much, well beyond what you've reported, and still not up to what we
>> know the card can do. There are some inefficiencies in the linux
>> kernel and some locking deltas between our solaris and linux drivers
>> - but not enough to account for what you are seeing.
>>
>> I expect the Direct IO filesystem behavior is the root issue.
>
> The strangest thing to me is that the problem is with sequential
> writes. For example, one fnic machine is zoned to an EMC XtremIO and
> got these results for sequential 4k: 14.43 MB/sec, 3693.65
> requests/sec. The same fnic machine performed rather impressively for
> random 4k: 451.11 MB/sec, 115485.02 requests/sec.

You could bypass O_DIRECT and use ddpt together with
a bsg pass-through (bsg is a little faster than sg
for these purposes).

For example:

# lsscsi -g
[0:0:0:0] disk ATA INTEL SSDSC2CW12 400i /dev/sda /dev/sg0
[14:0:0:0] disk Linux scsi_debug 0004 - /dev/sg1

# ddpt if=/dev/bsg/14:0:0:0 bs=512 bpt=128 count=1m
Output file not specified so no copy, just reading input
1048576+0 records in
0+0 records out
time to read data: 0.283566 secs at 1893.28 MB/sec

bs= should match the logical block size of the storage device,
and the size of each SCSI READ is bs= times bpt= (so 512 * 128
= 64 KB in this case).
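
Since your problem is specifically sequential 4k writes, the same
pass-through can exercise the write path as well. A minimal sketch,
assuming the scsi_debug/bsg device from above (substitute your own
device node; note this overwrites whatever is on the target):

# ddpt if=/dev/zero of=/dev/bsg/14:0:0:0 bs=512 bpt=8 count=1m

With bs=512 and bpt=8 each SCSI WRITE carries 4 KB, mimicking the
sequential 4k write workload you benchmarked.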

Such a test should show you whether your performance problem lies
at or below the point where pass-through commands are injected
into the block layer, or above it.
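
For the number on the other side of that boundary, an O_DIRECT run
against the same device is the natural comparison. A sketch using
fio (the device node is a placeholder, and the run is destructive
when pointed at a block device):

# fio --name=seqwr4k --filename=/dev/sdX --rw=write --bs=4k \
      --direct=1 --ioengine=libaio --iodepth=1 \
      --runtime=30 --time_based

If that lands far below the ddpt pass-through figure, the time is
being lost in the direct I/O / block layer path rather than in the
HBA driver or the storage itself.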

Doug Gilbert