Re: Potential SCSI system bottleneck.

Gerard Roudier (groudier@club-internet.fr)
Fri, 8 Oct 1999 21:14:36 +0200 (MET DST)


Using very fast HDs and the sym53c8xx driver, the command overhead
(everything minus DATA transfer) can be lower than 70 micro-seconds for an
875 without disconnections, and probably not more than 100 micro-seconds
with a single disconnection per transfer.
(Btw, the ncr53c8xx driver should not have a significantly higher command
overhead with the 875, maybe less than 20% more.)

The benchmarks that have been used basically measure sequential access
speed, and given the read-ahead performed by the kernel, the actual I/O
sizes are on average larger than 64 KB.

Using your 2 drives on a single BUS, the bandwidth used is less than:

20 MB/s = half of the BUS bandwidth (50%)
20 MB/s / 64 KB = 312 tps.

Even if the total command overhead of your drives is as high as 250
micro-seconds (2.5 times the driver/HA pair capabilities), the total
command overhead per second will be about:
312 * 250 us = 78 ms,
which represents less than 8% of the total BUS bandwidth.

The result is that your benchmarks are not able to use more than 58% of a
single BUS's bandwidth. In this situation you cannot expect significantly
better performance from using 2 BUSes instead of 1 for your benchmarks.
(With the ncr53c8xx driver the load should be no more than 60% of the
bandwidth.)
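
For completeness, the estimate above can be replayed in a few lines of
Python. The numbers are the deliberately pessimistic figures used above
(a 40 MB/s bus, since 20 MB/s is called half of it; 64 KB I/Os; 250
micro-seconds per command), not measurements:

    # Replay of the BUS-load estimate, using the figures quoted above.
    bus_bandwidth_kb = 40 * 1000   # 40 MB/s bus (20 MB/s is 50% of it above)
    data_rate_kb     = 20 * 1000   # upper bound moved by the two drives
    io_size_kb       = 64          # average I/O size after kernel read-ahead
    overhead_s       = 250e-6      # assumed command overhead per I/O

    tps           = data_rate_kb / io_size_kb      # ~312 transactions per second
    overhead_frac = tps * overhead_s               # ~78 ms of bus time per second
    data_frac     = data_rate_kb / bus_bandwidth_kb

    print("%.0f tps, %.1f%% overhead, %.0f%% data, %.0f%% total bus load"
          % (tps, overhead_frac * 100, data_frac * 100,
             (overhead_frac + data_frac) * 100))
    # -> 312 tps, 7.8% overhead, 50% data, 58% total bus load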

On the other hand, performing RAID with disks and/or disk areas that do
not have exactly the same characteristics is very unlikely to give
additive performance. The faster transfer rate will end up synchronized
with the slower one, and the latency of the additional disk revolutions
needed for synchronization (one revolution takes 60/7200 s = 8.3
milli-seconds on a 7200 rpm drive) will also slow down the actual data
transfer rate.

In my opinion, you are wasting time trying to extract any relevant
information from these results. Results that are just slightly reversed
compared to expectations do not seem significant to me, given the load
and the different disks used for the RAID. IMO, the minimal configuration
that may give usable results should load a single BUS to at least 70% of
its bandwidth, and the RAID benchmarks should use identical disks and the
same areas of each disk.

(Since I intentionally used higher numbers than reality for the estimate
of the BUS load, the actual BUS load for your benchmarks was probably not
more than 50%.)

Gérard.

On Fri, 8 Oct 1999, ishikawa wrote:

> Here is a potential performance problem uncovered
> by a set of benchmark tests.
> Namely, two SCSI adaptors do not speed up RAID 0/1 perceptibly
> in comparison to the performance of a single-SCSI-adaptor system
> (and under a dual-CPU system, for that matter).
>
> Let me explain. This is lengthy, containing the benchmark results, etc.
>
> A certain Mr. Sagai did a comparison of disk system performance on
> Linux using various combinations of IDE and SCSI disks,
> making RAID 0 and RAID 1 arrays with the raid tools,
> and tested the configurations on single- and dual-CPU systems.
> He published the benchmark results in a Japanese monthly magazine,
> "Software Design", in two consecutive issues that came out in
> July and August this year.
> The testing was very exhaustive in terms of the various
> configurations set up, and as a Linux user at home I would like to
> thank him for his time.
>
> One conclusion in his magazine article, based on the benchmark results,
> was that there was NO discernible speedup when
> TWO (2) SCSI host adaptors with one SCSI disk each were used
> to set up the RAID, in comparison to the performance
> observed with only ONE (1) SCSI host adaptor and two SCSI disks on
> its bus.
>
> I thought that the conclusion was rather COUNTER-INTUITIVE.
> For the last 15 years or so, it has been a rule of thumb to
> add a SCSI host adaptor and balance the load in order to
> speed up disk system throughput. I have seen magazine articles where
> people added SCSI host adaptors for better performance.
>
> So I said to myself something was fishy here and
> set out to find out what might be the cause of this
> counter-intuitive result.
>
> Rereading the articles (the article ran in two issues; he did very
> thorough testing and it must have been a lot of work),
> I thought I detected a possible weakness in the performance
> measurement.
>
> He used Bonnie (and some other tests) to measure the system throughput,
> but in the case of Bonnie, he did not specify a
> large enough temporary file size. A small test file size may mean
> that the file buffers in memory cushion the I/O requests and
> the disk may not be exercised very much.
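
[A minimal illustration of the caching point above, not part of the
original test procedure: the file names are hypothetical, and the point
is only that a file smaller than RAM (128 MB on the machine below) is
largely served from the page cache on re-reads, so the benchmark stops
measuring the disk.]

    import time

    def read_rate_mb_s(path, block=64 * 1024):
        """Sequentially read `path` and return the observed rate in MB/s."""
        start = time.time()
        total = 0
        with open(path, "rb") as f:
            while True:
                buf = f.read(block)
                if not buf:
                    break
                total += len(buf)
        return total / (time.time() - start) / 1e6

    # Hypothetical test files: one smaller than RAM, one several times
    # larger.  Re-reading the small one mostly hits the page cache and
    # reports memory speed; only the large one keeps forcing disk I/O.
    for path in ("/tmp/bench-64MB", "/tmp/bench-512MB"):
        print(path, "%.1f MB/s" % read_rate_mb_s(path))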
>
> So I wrote him an e-mail and mentioned my
> surprise at the counter-intuitive result.
> I suggested that increasing the test file size
> might produce a very different Bonnie benchmark and that
> his tentative conclusion might have to be changed.
> Mr. Sagai promptly wrote back and told me that
> he himself was surprised and disappointed by the result, and
> promised he would run the test again using a larger file size
> and ask the magazine to carry an additional article reporting the
> new test results. This was in August.
>
> One month later, he wrote back and told me the new article
> would be forthcoming in mid-October.
> And more importantly, again there was NO discernible speedup (!).
> Bonnie and iozone were used.
> He was kind enough to let me use his preliminary results before
> publication
> to investigate, er.., rather to ask the veterans of linux-scsi (and
> linux-kernel)
> to investigate the cause of the problem.
>
> I am reporting his results for your information and analysis in this
> e-mail.
>
> - BTW, I don't know whether Bonnie is a good performance measurement
> tool today,
> although it certainly shows that striping makes the disk system faster
> on Linux,
> and at that level it is useful.
> (I have already run this discussion by a friend of mine who found out
> about the software striping speed gain, etc. He doesn't have much to
> offer about this counter-intuitive result, though.
> BTW, this friend of mine found that software RAID using a fast CPU
> beats hardware RAID under light load, and as a matter of
> fact,
> under all the test configurations he threw at it! Obviously a 400 MHz
> CPU can do the processing faster than the sub-100 MHz i960 and other
> CPUs on hardware RAID cards.)
>
> - Seriously, I vaguely recall that there was a discussion on
> linux-scsi (?)
> about there being less parallelism in the SCSI subsystem (or whatever)
> than there could be, due to some serialization bottleneck or something.
> If I recall correctly, Linus himself jumped into the discussion and
> mentioned
> that this was one reason why he didn't like the current SCSI system.
>
> This lack of parallelism (?) or something similar might be the cause
> of the not-so-optimal result Mr. Sagai observed.
>
> Since the benchmark results were obtained for RAID 0/1 in the cases of
> single and dual CPUs, I think that the chance that we are experiencing
> a REAL performance bottleneck in the kernel is high.
>
> Maybe we have found a new performance challenge like
> the ones that emerged after the Mindcraft benchmark incident.
> (Or we may find other explanations.)
>
> [BTW, Mr. Sagai himself seems to think that
> SMP ought to add a performance boost at least
> in certain configurations. Come to think of it, why not?
> However, Mr. Sagai mentioned that the monitoring tool xosview showed
> that the CPU (?) seemed to be
> accessing the disks one at a time during testing, even where there were
> two adaptors. I don't know whether this was to be expected or not.
> For me the surprise was that the addition of a SCSI adaptor to balance
> the load
> ought to increase the performance a little bit, but it didn't seem to
> do so by any large margin.]
>
>
> So for your analysis, here is the gist of the revised short benchmark
> article.
>
> ---------------------------------------------------
> Here is Mr. Sagai's system configuration.
>
> Kernel: 2.2.10 + raid0134-19990724-2.2.10.gz patch
> M/B: Abit BP6
> MEMORY: 128MB SDRAM CL=2
> SCSI: I/O Data SC-UPCI(Symbios 53C875) x 2
> HDD: IBM DHEA 34330 (Linux kernel, etc. is here)
>      IBM DCAS 34330UW (4.3 GB) (RAID disk 1)
>      IBM DDRS 34560UW (4.5 GB) (RAID disk 2)
> VIDEO: MGA G100 AGP 8MB
> NETWORK: ELECOM Laneed LD-10/100 AN(DEC DC21140)
>
> (Right. He had to use similar but
> different SCSI disks for the RAID. The older model was sold out
> when he tried to buy another one, and thus he had to use the newer
> available model for his testing. Remember, he did the test in his spare
> time.)
>
> His test method.
>
> He set up Software RAID using
> two SCSI disks.
>
> He benchmarked the following configurations:
>
> RAID 0 (striping) and RAID 1 (mirroring),
>
> for both single- and dual-CPU setups.
>
> So there are 4 combinations.
>
> He ran Bonnie (512 MB test file size) and iozone
> to measure the performance.
>
> The results are attached.
>
> From the results, he thinks that
> two SCSI adaptors (and SMP) do NOT seem to
> offer a perceptible performance gain. The same conclusion.
>
> One possibility he mentions is that the benchmark
> test programs themselves are not quite
> matched to today's hardware resources, and that
> multithreaded versions of the benchmark programs would probably
> fare better.
> (CI's comment: All I can think of is that the kernel ought to make
> better use of the added bandwidth offered by the two SCSI adaptors.)
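
[To make the "multithreaded benchmark" idea concrete: Bonnie and iozone
as used here issue a single sequential stream of requests. A minimal
sketch, with hypothetical file names, of the difference between one
stream reading two files in turn and two threads reading them
concurrently; the latter is the kind of load that could show a gain from
a second host adaptor:]

    import threading, time

    def sequential_read(path, block=64 * 1024):
        """Read `path` from start to finish, like one Bonnie-style stream."""
        with open(path, "rb") as f:
            while f.read(block):
                pass

    # Hypothetical files, one on each member disk of the array.
    paths = ("/mnt/disk1/testfile", "/mnt/disk2/testfile")

    start = time.time()
    for p in paths:                     # one stream: files read in turn
        sequential_read(p)
    print("serial:   %.1f s" % (time.time() - start))

    start = time.time()                 # two streams: concurrent requests
    threads = [threading.Thread(target=sequential_read, args=(p,))
               for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("threaded: %.1f s" % (time.time() - start))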
>
>
> (1) BONNIE results
>
> (1-a) Standalone disk (no RAID)
>
> (He must have used a Japanized version of Bonnie.
> The header lines are presented in Japanese characters;
> I translated them by hand below, so they may look slightly different
> from the original Bonnie output.)
>
>                ------Sequential Write------- --Sequential Read-- -Random-
>                -Char-    -Block-   -Rewrite-  -Char-    -Block-   -Seeks-
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
> DCAS      512  5474 70.4  7317  9.8  2876  7.2  5701 68.4  7734  7.1 67.9  1.0
> DDRS      512  6792 87.3  9825 12.5  4473 11.5  7413 89.4 12715 11.0 94.3  1.3
>
>
> (1-b) RAID 0 / single CPU case.
>
> The case of two (2) SCSI adaptors (SC UPCI x2) and
> the case of single SCSI adaptor (SC UPCI x 1) are presented.
>
>                ------Sequential Write------- --Sequential Read-- -Random-
>                -Char-    -Block-   -Rewrite-  -Char-    -Block-   -Seeks-
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
> SC UPCIx2 512  7586 95.5 16702 18.8  4871 10.5  6533 77.5 15604 14.0 104.4 0.9
> SC UPCIx1 512  7547 95.2 17350 20.0  4914 10.7  6683 79.5 15543 12.2 108.9 1.0
>
> (1-c) RAID 0 / dual CPU case.
>
> The case of two (2) SCSI adaptors (SC UPCI x2) and
> the case of single SCSI adaptor (SC UPCI x 1) are presented.
>
>
>                ------Sequential Write------- --Sequential Read-- -Random-
>                -Char-    -Block-   -Rewrite-  -Char-    -Block-   -Seeks-
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
> SC UPCIx2 512  7502 96.0 16650 20.6  4915 12.4  6455 77.5 15554 14.5  99.5 1.5
> SC UPCIx1 512  7399 95.6 15918 21.9  5018 13.2  6612 79.7 15491 15.3 101.9 1.6
>
> (1-d) RAID 1 / single CPU
>
> The case of two (2) SCSI adaptors (SC UPCI x2) and
> the case of single SCSI adaptor (SC UPCI x 1) are presented.
>
>
>                ------Sequential Write------- --Sequential Read-- -Random-
>                -Char-    -Block-   -Rewrite-  -Char-    -Block-   -Seeks-
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
> SC UPCIx2 512  4851 62.2  6301  7.1  2506  5.6  5675 68.4  6304  5.5  80.7 1.2
> SC UPCIx1 512  4948 63.7  6539  7.5  2485  5.6  5699 69.1  6299  5.6  80.6 1.1
>
> (1-e) RAID 1 / dual CPU
>
> The case of two (2) SCSI adaptors (SC UPCI x2) and
> the case of single SCSI adaptor (SC UPCI x 1) are presented.
>
>
>                ------Sequential Write------- --Sequential Read-- -Random-
>                -Char-    -Block-   -Rewrite-  -Char-    -Block-   -Seeks-
> Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
> SC UPCIx2 512  4753 62.8  6627  8.9  2513  6.9  5457 66.9  6310  6.5  81.0 0.9
> SC UPCIx1 512  4985 66.3  6639  9.5  2497  7.3  5447 66.5  6263  6.9  80.5 1.5
>
> (The above were the Bonnie results.)
> ----
> (Below are the iozone test results.)
>
> 2. IOZONE test results.
>
> (2-a) Standalone Disk.
>
> 512MB file write / read throughput
> DCAS 8.3 MB/s 7.2 MB/s
> DDRS 9.8 MB/s 11.4 MB/s
>
> (2-b) RAID 0 / single CPU
>
> (Sorry, I forgot to ask. The two rows probably have the same meaning;
> the upper one is the two-adaptor case.)
> 512MB file write / read throughput
> 17.3 MB/s 14.3 MB/s
> 17.8 MB/s 14.4 MB/s
>
> (2-c) RAID 0 / dual CPU
>
> (Sorry, I forgot to ask. The two rows probably have the same meaning;
> the upper one is the two-adaptor case.)
>
> 512MB file write / read throughput
> 17.3 MB/s 14.4 MB/s
> 16.5 MB/s 13.9 MB/s
>
> (2-d) RAID 1 / Single CPU
>
> (Sorry, I forgot to ask. The two rows probably have the same meaning;
> the upper one is the two-adaptor case.)
> 512MB file write / read throughput
> 6.7 MB/s 5.9 MB/s
> 6.9 MB/s 5.9 MB/s
>
> (2-e) RAID 1 / dual CPU
>
> (Sorry, I forgot to ask. The two rows probably have the same meaning;
> the upper one is the two-adaptor case.)
>
> 512MB file write / read throughput
> 6.3 MB/s 6.4 MB/s
> 6.9 MB/s 5.9 MB/s
>
> (The above are the iozone benchmark results.)
>
>
>
> That is all.
> I suspect there is an artificial serialization bottleneck somewhere
> that keeps the kernel from taking advantage of the added bandwidth of
> the two SCSI buses on separate controllers...
> (Of course, it is possible that some other bottleneck is already in
> effect. Maybe at this load level
> there is indeed no difference.)
>
> Since I thought there might be kernel issues, I have included
> linux-kernel in this post (but I am not on it).
> Probably the follow-up should be
> to linux-scsi only?
>
> Chiaki Ishikawa
>
> A happy linux user at home
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/