RE: CCISS performance drop in buffered disk reads in newer kernels

From: Miller, Mike (OS Dev)
Date: Mon Dec 07 2009 - 11:33:19 EST




> -----Original Message-----
> From: Ozan Çağlayan [mailto:ozan@xxxxxxxxxxxxx]
> Sent: Monday, December 07, 2009 4:46 AM
> To: linux-kernel
> Cc: scameron@xxxxxxxxxxxxxxxxxx; Miller, Mike (OS Dev);
> jens.axboe@xxxxxxxxxx
> Subject: CCISS performance drop in buffered disk reads in
> newer kernels
>
> Hi,
>
> We have 2 HP Proliant DL380G5 servers running different kernels.
>
> I was inspecting a basic kernel-compile time. On the one with
> 2.6.25.20 kernel, the compilation took ~1.5 minutes. On the
> one with 2.6.30.9 kernel, it took ~6 minutes. Both systems
> are using ccache as a build helper.
>
> Then I ran hdparm on both systems, the results are below.
>
> I'd like to help debug this issue through bisect or
> another method but since there are more parameters that
> differ from one to the other server than only the kernel
> version, I'm a little bit stuck.
>
> Thanks,
> Ozan
>

Ozan,
I'm aware of the performance drop. Please see: http://bugzilla.kernel.org/show_bug.cgi?id=13127. I removed the huge read-ahead value of 1024 that we used because users were complaining about small writes being starved. That was back around the 2.6.25 timeframe. Since then there have been no changes in the main I/O path. I'll get back to this as time allows.

Meanwhile, you can tweak some of the block layer tunables, for example:

echo 64 > /sys/block/cciss\!c0d1/queue/read_ahead_kb
OR
blockdev --setra 128 /dev/cciss/c0d1

These are just example values. max_sectors_kb can also be adjusted, up to the limit reported by the read-only max_hw_sectors_kb.
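One thing to keep in mind when comparing the two commands above: read_ahead_kb is expressed in kilobytes, while blockdev --setra takes a count of 512-byte sectors, so 64 KB and 128 sectors are the same setting. The readahead = 256 that hdparm prints should likewise be in sectors, i.e. 128 KB. Below is a rough sketch of that arithmetic and of how a device node maps to its sysfs directory. The function names are made up for illustration and are not part of any tool, and the device name is just the one used above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of blockdev or sysfs.

# read_ahead_kb is in kilobytes; blockdev --setra takes 512-byte sectors.
ra_kb_to_sectors() {
    echo $(( $1 * 2 ))
}

ra_sectors_to_kb() {
    echo $(( $1 / 2 ))
}

# Map a device node such as /dev/cciss/c0d1 to its sysfs queue directory.
# Under /sys/block the '/' in the device name appears as '!'.
sysfs_queue_dir() {
    name=${1#/dev/}
    echo "/sys/block/$(echo "$name" | tr '/' '!')/queue"
}

# Print the current values of the tunables mentioned above, if the
# device exists on this machine; silently skips files that are absent.
show_tunables() {
    dir=$(sysfs_queue_dir "$1")
    for f in read_ahead_kb max_sectors_kb max_hw_sectors_kb; do
        if [ -r "$dir/$f" ]; then
            echo "$f = $(cat "$dir/$f")"
        fi
    done
}

# Example (assumes the device from the commands above):
# show_tunables /dev/cciss/c0d1
```

Checking the values with show_tunables before and after a change makes it easier to compare hdparm runs against a known setting.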

-- mikem

>
> ### 2.6.30.9 (Slow one, compiled with PAE support, FS is ext4) ###
>
> # sync; sleep 2; echo 3 > /proc/sys/vm/drop_caches; hdparm -tT -vvvv
> /dev/cciss/c0d0p5
>
> /dev/cciss/c0d0p5:
> HDIO_DRIVE_CMD(identify) failed: Invalid exchange
> readonly = 0 (off)
> readahead = 256 (on)
> geometry = 245410/255/32, sectors = 2002550382, start = 4225158
> Timing cached reads: 12038 MB in 2.00 seconds = 6027.00 MB/sec
> Timing buffered disk reads: 184 MB in 3.00 seconds = 61.31 MB/sec <------ Note the drop here!
>
> # dmesg | grep cciss
> [ 0.000000] Kernel command line: root=LABEL=PARDUS_ROOT vga=791
> splash=silent quiet resume=/dev/cciss/c0d0p1
> [ 6.023542] cciss 0000:18:08.0: PCI INT A -> GSI 19 (level, low) ->
> IRQ 19
> [ 6.023566] cciss: MSI init failed
> [ 6.053008] IRQ 19/cciss0: IRQF_DISABLED is not guaranteed
> on shared IRQs
> [ 6.053015] cciss0: <0x3238> at PCI 0000:18:08.0 IRQ 19 using DAC
> [ 6.053918] cciss/c0d0: p1 p2 < p5 >
> [ 6.320852] kjournald2 starting: pid 190, dev
> cciss!c0d0p5:8, commit
> interval 5 seconds
> [ 6.322344] EXT4-fs: mounted filesystem cciss!c0d0p5 with ordered
> data mode
> [ 10.994505] EXT4 FS on cciss!c0d0p5, internal journal on
> cciss!c0d0p5:8
> [ 11.783302] Adding 2112508k swap on /dev/cciss/c0d0p1. Priority:-1
> extents:1 across:2112508k
> [ 16.696090] JBD: barrier-based sync failed on cciss!c0d0p5:8 -
> disabling barriers
>
>
> ### 2.6.25.20 (Fast one, no PAE support, FS is ext3) ###
>
> # sync;sleep 2; echo 3 > /proc/sys/vm/drop_caches; hdparm -tT -vvv
> /dev/cciss/c0d0p5
>
> /dev/cciss/c0d0p5:
> readonly = 0 (off)
> readahead = 256 (on)
> geometry = 245426/255/32, sectors = 2002678902, start = 4096638
> Timing cached reads: 10650 MB in 2.00 seconds = 5334.38 MB/sec
> Timing buffered disk reads: 420 MB in 3.01 seconds = 139.72 MB/sec
>
> # dmesg | grep cciss
> Kernel command line: root=LABEL=PARDUS_ROOT vga=791
> splash=silent quiet
> resume=/dev/cciss/c0d0p1
> cciss0: <0x3238> at PCI 0000:18:08.0 IRQ 212 using DAC
> cciss/c0d0: p1 p2 < p5 >
> EXT3 FS on cciss/c0d0p5, internal journal
> Adding 2048248k swap on /dev/cciss/c0d0p1. Priority:-1 extents:1 across:2048248k
>
>