Performance issue with non-4K blocksize on 3ware [was: Re: Speed drop /dev/sda -> /dev/sda1 -> /dev/vg0/test (3ware/LVM)]

From: Miquel van Smoorenburg
Date: Tue Dec 30 2003 - 08:01:10 EST


In article <cistron.20031229125412.GA28262@xxxxxxxxxx>,
Miquel van Smoorenburg <miquels@xxxxxxxxxx> wrote:
>Hello,
>
> I'm running Linux 2.6.0 with a 3ware 8506 controller in
>hardware RAID5 mode. The RAID5 array is built of 7+1 200 GB SATA
>disks.
>
>Now it appears that more "mappings" on the array have a bad
>influence on speed. /dev/sda is the fastest, /dev/sda1 is quite
>a bit slower, LVM on /dev/sda is slower yet and LVM on /dev/sda1
>is the slowest.

Okay, I found a clue. It appears that by default, the blocksize
of /dev/sda is 4096 and the blocksize of /dev/sda1 is 1024 on my system:

# blockdev --getbsz /dev/sda
4096
# blockdev --getbsz /dev/sda1
1024

On 2.4, setting the blocksize is sticky, on 2.6 it isn't. So I have
to work around that to test with other blocksizes, for example:

# /usr/bin/time -p sh -c "blockdev --setbsz 4096 /dev/stdout; dd if=/dev/zero bs=4k count=100000; sync" > /dev/sda1
100000+0 records in
100000+0 records out
real 5.00
user 0.03
sys 0.89

409600000 bytes in 5 seconds =~ 80 MB/sec, which is quite good and
the same throughput as through /dev/sda. So, it appears that the
3ware driver really really wants a 4K blocksize, otherwise performance
drops at least 30-40%.

Now with LVM on top of it, performance drops again. Say I create a
logical volume on /dev/sda1:

# pvcreate /dev/sda1
# vgcreate vg0 /dev/sda1
# lvcreate -L 100G -n test vg0

# blockdev --getbsz /dev/sda1
1024
# blockdev --getbsz /dev/vg0/test
4096

Performance:
# /usr/bin/time -p sh -c "dd if=/dev/zero of=/dev/vg0/test bs=4k count=100000; sync"
100000+0 records in
100000+0 records out
409600000 bytes transferred in 8.512663 seconds (48116553 bytes/sec)
real 8.54
user 0.04
sys 1.02

So, I tried a small patch in dm-table.c to set the blocksize of the
underlying device to 4096:

# diff -u dm-table.c.b4 dm-table.c
--- dm-table.c.b4 2003-12-29 15:18:18.000000000 +0100
+++ dm-table.c 2003-12-30 13:08:06.000000000 +0100
@@ -359,8 +359,10 @@
r = bd_claim(bdev, _claim_ptr);
if (r)
blkdev_put(bdev, BDEV_RAW);
- else
+ else {
d->bdev = bdev;
+ set_blocksize(bdev, 4096);
+ }
return r;
}

While it appears to work, it doesn't seem to make any difference:

# blockdev --getbsz /dev/sda1
4096
# blockdev --getbsz /dev/vg0/test
4096

Performance:
# /usr/bin/time -p sh -c "dd if=/dev/zero of=/dev/vg0/test bs=4k count=100000; sync"
100000+0 records in
100000+0 records out
409600000 bytes transferred in 8.528826 seconds (48025367 bytes/sec)
real 8.54
user 0.05
sys 0.98

Not so weird, since in dm-table.c the device is opened in RAW mode
and raw devices do I/O at hardsect_size, right? So that would be
512 bytes, explaining the bad performance.

I'm hoping someone has read this far and can offer me a clue as
how to fix this. Are somehow requests not merged right, and are
small 512 byte requests submitted to the 3ware driver? Is the
3ware driver responsible for merging requests ? I really don't
know anything about the block layer or bio's so I'm kind of
lost here.

Mike.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/