Re: Block Device Caching

From: Markus Schaber
Date: Fri Jul 02 2004 - 07:04:18 EST


Hi, Helge,

On Fri, 02 Jul 2004 13:15:35 +0200
Helge Hafting <helge.hafting@xxxxxxx> wrote:
> > [Caching problems]
> >Any Ideas?
> >
> Not much. "bear" may have decided to drop cache in one case but not
> in the other - that is valid although strange. Take a look at what
> block size the device uses. A mounted device uses the fs blocksize, a
> partition (or entire disk) may be set to some different blocksize. (I
> believe it might be the classical 512 bytes.) Caching blocks
> with a size different from the page size (usually 4k) has extra
> overhead and could lead to extra memory pressure and different caching
> decisions.
>
> So I recommend retesting bear with a good blocksize, such as 4k, on
> the device. That also gives somewhat better performance in general
> than 512-byte blocks. There is a syscall for changing the block size
> of a device.
>
> (Block size may also be changed by the following hack: mount a fs
> like ext2 on the device, then umount it. Of course you need to use
> mkfs so this can't be done on a device that contains data.
> "mount" sets the device block size to the size used by the fs.
> After umount, the device is free to be opened directly again, but it
> retains the new blocksize.)
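
For reference, the syscall mentioned above is presumably the BLKBSZSET
ioctl (an assumption; the quoted mail does not name it). Below is a
minimal Python sketch of querying and setting a device's soft block size.
The request numbers are derived from the _IOR/_IOW encoding used on x86
and most other architectures, and the helper names get_block_size and
set_block_size are hypothetical. Setting the size needs root and will
typically fail with EBUSY while the device is mounted.

```python
import fcntl
import os
import struct

# ioctl request encoding as used on x86 and most architectures
# (assumption: some architectures, e.g. powerpc or mips, encode differently).
_IOC_WRITE, _IOC_READ = 1, 2

def _ioc(direction, type_, nr, size):
    return (direction << 30) | (size << 16) | (type_ << 8) | nr

SIZE_T = struct.calcsize("N")                    # sizeof(size_t)
BLKBSZGET = _ioc(_IOC_READ, 0x12, 112, SIZE_T)   # _IOR(0x12, 112, size_t)
BLKBSZSET = _ioc(_IOC_WRITE, 0x12, 113, SIZE_T)  # _IOW(0x12, 113, size_t)

def get_block_size(path):
    """Return the soft block size of the block device at `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = fcntl.ioctl(fd, BLKBSZGET, b"\0" * SIZE_T)
    finally:
        os.close(fd)
    # The kernel stores an int at the start of the buffer.
    return struct.unpack("i", buf[:4])[0]

def set_block_size(path, size):
    """Set the soft block size (e.g. 4096); requires root."""
    arg = struct.pack("i", size).ljust(SIZE_T, b"\0")
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKBSZSET, arg)
    finally:
        os.close(fd)
```

Something like set_block_size("/dev/daten/testing", 4096) followed by
get_block_size("/dev/daten/testing") would then show whether the new size
was accepted. The blockdev utility offers --getbsz and --setbsz for the
same purpose.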

This sounds reasonable, so I ran some tests w.r.t. mounting and memory
pressure:


bear:~/readcachetest# top | head -n 8 | tail -n 2
Mem: 3953624k total, 775776k used, 3177848k free, 458060k buffers
Swap: 1048568k total, 0k used, 1048568k free, 31812k cached

bear:~/readcachetest# ./readcache /dev/daten/testing 1000
frist run needed 12.374819 seconds, this is 80.809263 MBytes/Sec
second run needed 12.004670 seconds, this is 83.300915 MBytes/Sec
third run needed 11.898035 seconds, this is 84.047492 MBytes/Sec

bear:~/readcachetest# top | head -n 8 | tail -n 2
Mem: 3953624k total, 775648k used, 3177976k free, 457904k buffers
Swap: 1048568k total, 0k used, 1048568k free, 31900k cached

bear:~/readcachetest# mount /dev/daten/testing /testinmount/ -t ext2

bear:~/readcachetest# top | head -n 8 | tail -n 2
Mem: 3953624k total, 775720k used, 3177904k free, 457944k buffers
Swap: 1048568k total, 0k used, 1048568k free, 31860k cached

bear:~/readcachetest# ./readcache /dev/daten/testing 1000
frist run needed 12.199237 seconds, this is 81.972340 MBytes/Sec
second run needed 12.028633 seconds, this is 83.134966 MBytes/Sec
third run needed 12.454502 seconds, this is 80.292251 MBytes/Sec

bear:~/readcachetest# top | head -n 8 | tail -n 2
Mem: 3953624k total, 1125432k used, 2828192k free, 808008k buffers
Swap: 1048568k total, 0k used, 1048568k free, 31792k cached

bear:~/readcachetest# ./readcache /testinmount/test.dump 1000
frist run needed 11.985626 seconds, this is 83.433272 MBytes/Sec
second run needed 3.682361 seconds, this is 271.564901 MBytes/Sec
third run needed 3.638563 seconds, this is 274.833774 MBytes/Sec

bear:~/readcachetest# top | head -n 8 | tail -n 2
Mem: 3953624k total, 2150128k used, 1803496k free, 808736k buffers
Swap: 1048568k total, 0k used, 1048568k free, 1055688k cached

bear:~/readcachetest# ./readcache /dev/daten/testing 1000
frist run needed 12.418232 seconds, this is 80.526761 MBytes/Sec
second run needed 12.381597 seconds, this is 80.765026 MBytes/Sec
third run needed 12.185159 seconds, this is 82.067046 MBytes/Sec

bear:~/readcachetest# top | head -n 8 | tail -n 2
Mem: 3953624k total, 2149232k used, 1804392k free, 807784k buffers
Swap: 1048568k total, 0k used, 1048568k free, 1055756k cached

bear:~/readcachetest# umount /testinmount/

bear:~/readcachetest# top | head -n 8 | tail -n 2
Mem: 3953624k total, 775728k used, 3177896k free, 457872k buffers
Swap: 1048568k total, 0k used, 1048568k free, 31728k cached

bear:~/readcachetest# ./readcache /dev/daten/testing 1000
frist run needed 12.171772 seconds, this is 82.157306 MBytes/Sec
second run needed 11.842943 seconds, this is 84.438471 MBytes/Sec
third run needed 12.167059 seconds, this is 82.189131 MBytes/Sec

bear:~/readcachetest# top | head -n 8 | tail -n 2
Mem: 3953624k total, 775656k used, 3177968k free, 457876k buffers
Swap: 1048568k total, 0k used, 1048568k free, 31860k cached

So files from the mounted fs get cached, but direct reads from the
block device do not, even while the LVM volume is mounted and enough
memory is available.
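
The top snapshots above are easier to compare as differences. The
following is a small Python sketch that extracts the buffers and cached
figures from two such snapshots and subtracts them. The function names
parse_top_mem and delta are hypothetical, and the regular expression
assumes the exact "Mem:/Swap:" layout shown in this mail.

```python
import re

def parse_top_mem(text):
    """Extract the 'buffers' and 'cached' values (in kB) from top output."""
    fields = {}
    for value, name in re.findall(r"(\d+)k (buffers|cached)", text):
        fields[name] = int(value)
    return fields

def delta(before, after):
    """Return the per-field change in kB between two snapshots."""
    return {name: after[name] - before[name] for name in before}

# Snapshots taken right after mounting and after the following
# block-device read (values copied from the transcript above).
before = parse_top_mem(
    "Mem: 3953624k total, 775720k used, 3177904k free, 457944k buffers\n"
    "Swap: 1048568k total, 0k used, 1048568k free, 31860k cached\n"
)
after = parse_top_mem(
    "Mem: 3953624k total, 1125432k used, 2828192k free, 808008k buffers\n"
    "Swap: 1048568k total, 0k used, 1048568k free, 31792k cached\n"
)
print(delta(before, after))  # {'buffers': 350064, 'cached': -68}
```

Applied to the snapshots around the read of /testinmount/test.dump, the
same calculation gives a growth of 1023896k in "cached", i.e. roughly the
1000 MB that were read.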

I currently suspect this to be a problem with the cciss controller, see
my other mail to the list...

> You may also want to test with a smaller count. Reading directly from
> pagecache should be noticeably faster than reading from the cache on
> the raid controller, although either obviously is way faster than
> reading from actual disks.

bear:~/readcachetest# ./readcache /dev/daten/testing 100
frist run needed 1.202330 seconds, this is 83.171841 MBytes/Sec
second run needed 0.349686 seconds, this is 285.970842 MBytes/Sec
third run needed 0.346644 seconds, this is 288.480401 MBytes/Sec

bear:~/readcachetest# mount /dev/daten/testing /testinmount/ -t ext2

bear:~/readcachetest# ./readcache /testinmount/test.dump 100
frist run needed 1.205797 seconds, this is 82.932699 MBytes/Sec
second run needed 0.367004 seconds, this is 272.476594 MBytes/Sec
third run needed 0.362783 seconds, this is 275.646874 MBytes/Sec

bear:~/readcachetest# ./readcache /testinmount/test.dump 100
frist run needed 0.363097 seconds, this is 275.408500 MBytes/Sec
second run needed 0.363002 seconds, this is 275.480576 MBytes/Sec
third run needed 0.362852 seconds, this is 275.594457 MBytes/Sec

bear:~/readcachetest# umount /testinmount/

bear:~/readcachetest# mount /dev/daten/testing /testinmount/ -t ext2

bear:~/readcachetest# ./readcache /testinmount/test.dump 100
frist run needed 1.189227 seconds, this is 84.088235 MBytes/Sec
second run needed 0.367456 seconds, this is 272.141426 MBytes/Sec
third run needed 0.363139 seconds, this is 275.376646 MBytes/Sec

Hmm, I'm somewhat puzzled now, as 275 MB/s seems rather low for reads
from the page cache on a 2.8 GHz Xeon machine (even my laptop gets about
400 MB/s with this, and Skate even reaches 3486.385664 MBytes/Sec).
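
To repeat this comparison on other machines, a test of this kind can be
approximated as follows. The source of readcache is not part of this mail,
so this is only a Python sketch under assumptions: it reads the requested
number of megabytes sequentially in 1 MiB chunks, three times, and reports
the elapsed time and throughput per run. The names timed_read and run, the
chunk size, and the use of 2^20 bytes per megabyte are all assumptions.

```python
import os
import time

CHUNK = 1024 * 1024  # 1 MiB per read() call (assumed, not from readcache)

def timed_read(path, megabytes):
    """Read up to `megabytes` MiB from `path`; return (seconds, bytes_read)."""
    remaining = megabytes * CHUNK
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        while remaining > 0:
            data = os.read(fd, min(CHUNK, remaining))
            if not data:  # end of file or device reached early
                break
            bytes_read += len(data)
            remaining -= len(data)
        seconds = time.perf_counter() - start
    finally:
        os.close(fd)
    return seconds, bytes_read

def run(path, megabytes, runs=3):
    """Repeat the timed read; return a list of (seconds, MiB per second)."""
    results = []
    for _ in range(runs):
        seconds, bytes_read = timed_read(path, megabytes)
        # Guard against a zero interval on very small reads.
        rate = bytes_read / CHUNK / max(seconds, 1e-9)
        results.append((seconds, rate))
    return results
```

Calling run("/dev/daten/testing", 100) and printing each tuple gives
figures comparable to the ones above. If later runs are served from the
page cache, they should be markedly faster than the first.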


No idea...
Markus.
--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:schabios@xxxxxxxxxxxxxx | www.logi-track.com