Re: [PATCH 00/16 v3] f2fs: introduce flash-friendly file system

From: Martin Steigerwald
Date: Mon Nov 12 2012 - 10:16:14 EST


On Saturday, 10 November 2012, Arnd Bergmann wrote:
> On Saturday 10 November 2012, Martin Steigerwald wrote:
> > Command (m for help): n
> > Partition type:
> > p primary (0 primary, 0 extended, 4 free)
> > e extended
> > Select (default p): p
> > Partition number (1-4, default 1): 1
> > First sector (2048-4095998, default 2048):
> > Using default value 2048
> > Last sector, +sectors or +size{K,M,G} (2048-4095998, default 4095998):
> > Using default value 4095998
>
> This is almost certainly not the right setting for f2fs, which only works
> at its design point if the segments are aligned to erase blocks. All modern
> flash devices have erase blocks larger than 1 MB, so starting the partition
> at a 1 MB offset will cause it to be misaligned. Also, some USB sticks
> have an area optimized for random writes in the beginning of the drive
> where both FAT32 and f2fs store their metadata. It may be worth testing
> again without a partition table, using just the raw device.
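
To make the alignment argument concrete, here is a minimal sketch of the
arithmetic. It assumes the 512-byte logical blocks the kernel reports and the
fdisk default first sector of 2048; the candidate erase block sizes are only
guesses, not measurements, and the function names are made up for
illustration.

```python
SECTOR_SIZE = 512      # bytes per logical block, as reported by the kernel
MIB = 1024 * 1024


def partition_offset_bytes(first_sector, sector_size=SECTOR_SIZE):
    """Byte offset of a partition that starts at the given sector."""
    return first_sector * sector_size


def is_aligned(offset_bytes, erase_block_bytes):
    """True if the offset falls exactly on an erase block boundary."""
    return offset_bytes % erase_block_bytes == 0


offset = partition_offset_bytes(2048)  # fdisk default: 2048 * 512 = 1 MiB
for candidate_mib in (1, 2, 3, 4):     # hypothetical erase block sizes
    state = "aligned" if is_aligned(offset, candidate_mib * MIB) else "misaligned"
    print(f"{candidate_mib} MiB erase block: {state}")
```

A 1 MiB offset only lines up with a 1 MiB erase block; with any larger erase
block the partition starts in the middle of one.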

Thank you for your hints, Arnd, much appreciated.

I already suspected as much after having read some of the fine documents on
the linaro website.

As I want to write an article to give Linux users some insight into
Linux on "cheap" flash, I am willing to learn more.

> I would also recommend using flashbench to find out the optimum parameters
> for your device. You can download it from
> git://git.linaro.org/people/arnd/flashbench.git
> In the long run, we should automate those tests and make them part of
> mkfs.f2fs, but for now, try to find out the erase block size and the number
> of concurrently used erase blocks on your device using a timing attack
> in flashbench. The README file in there explains how to interpret the
> results from "./flashbench -a /dev/sdb --blocksize=1024" to guess
> the erase block size, although that sometimes doesn't work.

Why do I use a blocksize of 1024 if the kernel reports 512-byte blocks?

[ 3112.144086] scsi9 : usb-storage 1-1.1:1.0
[ 3113.145968] scsi 9:0:0:0: Direct-Access TinyDisk 2007-05-12 0.00 PQ: 0 ANSI: 2
[ 3113.146476] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ 3113.147935] sd 9:0:0:0: [sdb] 4095999 512-byte logical blocks: (2.09 GB/1.95 GiB)
[ 3113.148935] sd 9:0:0:0: [sdb] Write Protect is off


And how do reads give information about erase block size? Wouldn't writes be
more conclusive for that? (Having to erase one versus two erase blocks?)


Hmmm, I get very varying results here with said USB stick:

merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912 pre 1.1ms on 1.1ms post 1.08ms diff 13µs
align 268435456 pre 1.2ms on 1.19ms post 1.16ms diff 11.6µs
align 134217728 pre 1.12ms on 1.14ms post 1.15ms diff 9.51µs
align 67108864 pre 1.12ms on 1.15ms post 1.12ms diff 29.9µs
align 33554432 pre 1.11ms on 1.17ms post 1.13ms diff 49µs
align 16777216 pre 1.14ms on 1.16ms post 1.15ms diff 22.4µs
align 8388608 pre 1.12ms on 1.09ms post 1.06ms diff -2053ns
align 4194304 pre 1.13ms on 1.16ms post 1.14ms diff 21.7µs
align 2097152 pre 1.11ms on 1.08ms post 1.1ms diff -18488n
align 1048576 pre 1.11ms on 1.11ms post 1.11ms diff -2461ns
align 524288 pre 1.15ms on 1.17ms post 1.1ms diff 45.4µs
align 262144 pre 1.11ms on 1.13ms post 1.13ms diff 12µs
align 131072 pre 1.1ms on 1.09ms post 1.16ms diff -38025n
align 65536 pre 1.09ms on 1.08ms post 1.11ms diff -21353n
align 32768 pre 1.1ms on 1.08ms post 1.11ms diff -23854n
merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912 pre 1.11ms on 1.13ms post 1.13ms diff 10.6µs
align 268435456 pre 1.12ms on 1.2ms post 1.17ms diff 61.4µs
align 134217728 pre 1.14ms on 1.19ms post 1.15ms diff 46.8µs
align 67108864 pre 1.08ms on 1.15ms post 1.08ms diff 63.8µs
align 33554432 pre 1.09ms on 1.08ms post 1.09ms diff -4761ns
align 16777216 pre 1.12ms on 1.14ms post 1.07ms diff 41.4µs
align 8388608 pre 1.1ms on 1.1ms post 1.09ms diff 7.48µs
align 4194304 pre 1.08ms on 1.1ms post 1.1ms diff 10.1µs
align 2097152 pre 1.1ms on 1.11ms post 1.1ms diff 16µs
align 1048576 pre 1.09ms on 1.1ms post 1.07ms diff 15.5µs
align 524288 pre 1.12ms on 1.12ms post 1.1ms diff 11µs
align 262144 pre 1.13ms on 1.13ms post 1.1ms diff 21.6µs
align 131072 pre 1.11ms on 1.13ms post 1.12ms diff 17.9µs
align 65536 pre 1.07ms on 1.1ms post 1.1ms diff 11.6µs
align 32768 pre 1.09ms on 1.11ms post 1.13ms diff -5131ns
merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912 pre 1.2ms on 1.18ms post 1.21ms diff -27496n
align 268435456 pre 1.22ms on 1.21ms post 1.24ms diff -18972n
align 134217728 pre 1.15ms on 1.19ms post 1.14ms diff 42.5µs
align 67108864 pre 1.08ms on 1.09ms post 1.08ms diff 5.29µs
align 33554432 pre 1.18ms on 1.19ms post 1.18ms diff 9.25µs
align 16777216 pre 1.18ms on 1.22ms post 1.17ms diff 48.6µs
align 8388608 pre 1.14ms on 1.17ms post 1.19ms diff 4.36µs
align 4194304 pre 1.16ms on 1.2ms post 1.11ms diff 65.8µs
align 2097152 pre 1.13ms on 1.09ms post 1.12ms diff -37718n
align 1048576 pre 1.15ms on 1.2ms post 1.18ms diff 34.9µs
align 524288 pre 1.14ms on 1.19ms post 1.16ms diff 41.5µs
align 262144 pre 1.19ms on 1.12ms post 1.15ms diff -52725n
align 131072 pre 1.21ms on 1.11ms post 1.14ms diff -68522n
align 65536 pre 1.21ms on 1.13ms post 1.18ms diff -64248n
align 32768 pre 1.14ms on 1.25ms post 1.12ms diff 116µs


Even when I apply the explanation from the README I do not seem to get a
clear picture of the stick's erase block size.

The values above seem to tell me: "I don't care about alignment at all."
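
Since single runs are this noisy, one way to compare them is to take the
median diff per alignment over several runs. The following is only a sketch:
it assumes the "align ... diff ..." line layout shown above, and it treats a
bare "n" suffix as an "ns" that was cut off by the column width, which is my
guess. The function names are made up, and the sample holds just two
alignments from the three runs on the first stick.

```python
import re
from collections import defaultdict
from statistics import median

# Scale factors to nanoseconds. The bare "n" suffix is an assumption: it
# looks like an "ns" that the output column width cut short.
UNIT_TO_NS = {"ms": 1_000_000, "µs": 1_000, "us": 1_000, "ns": 1, "n": 1}

LINE_RE = re.compile(
    r"align\s+(\d+)\s+pre\s+\S+\s+on\s+\S+\s+post\s+\S+\s+diff\s+(\S+)"
)
VALUE_RE = re.compile(r"(-?\d+(?:\.\d+)?)(ms|µs|us|ns|n)")


def duration_ns(text):
    """Convert a value such as '21.7µs' or '-2053ns' to whole nanoseconds."""
    match = VALUE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    number, unit = match.groups()
    return round(float(number) * UNIT_TO_NS[unit])


def median_diff_by_alignment(output):
    """Map each alignment to the median of its diff values, in nanoseconds."""
    diffs = defaultdict(list)
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if match:
            diffs[int(match.group(1))].append(duration_ns(match.group(2)))
    return {align: median(values) for align, values in diffs.items()}


SAMPLE = """\
align 8388608 pre 1.12ms on 1.09ms post 1.06ms diff -2053ns
align 4194304 pre 1.13ms on 1.16ms post 1.14ms diff 21.7µs
align 8388608 pre 1.1ms on 1.1ms post 1.09ms diff 7.48µs
align 4194304 pre 1.08ms on 1.1ms post 1.1ms diff 10.1µs
align 8388608 pre 1.14ms on 1.17ms post 1.19ms diff 4.36µs
align 4194304 pre 1.16ms on 1.2ms post 1.11ms diff 65.8µs
"""

medians = median_diff_by_alignment(SAMPLE)
for align, diff in sorted(medians.items(), reverse=True):
    print(f"align {align:>10}  median diff {diff / 1000:.2f} µs")
```

Feeding the concatenated output of several runs into
median_diff_by_alignment() should damp the outliers somewhat, although three
runs are still a very small sample.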


With another, likely slower, flash stick (an Intenso 4GB) I get:

[ 3672.512143] scsi 10:0:0:0: Direct-Access Ut165 USB2FlashStorage 0.00 PQ: 0 ANSI: 2
[ 3672.514469] sd 10:0:0:0: Attached scsi generic sg2 type 0
[ 3672.514991] sd 10:0:0:0: [sdb] 7897088 512-byte logical blocks: (4.04 GB/3.76 GiB)
[…]
merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824 pre 1.06ms on 1.03ms post 951µs diff 26.1µs
align 536870912 pre 1.06ms on 1ms post 941µs diff 1.17µs
align 268435456 pre 995µs on 957µs post 887µs diff 15.7µs
align 134217728 pre 994µs on 951µs post 883µs diff 12.4µs
align 67108864 pre 994µs on 989µs post 1.02ms diff -15104n
align 33554432 pre 934µs on 974µs post 1ms diff 4.16µs
align 16777216 pre 946µs on 916µs post 900µs diff -6588ns
align 8388608 pre 883µs on 881µs post 880µs diff -1176ns
align 4194304 pre 884µs on 884µs post 885µs diff -159ns

here?

align 2097152 pre 880µs on 879µs post 783µs diff 47.6µs
align 1048576 pre 877µs on 881µs post 878µs diff 3.92µs
align 524288 pre 869µs on 870µs post 875µs diff -2101ns
align 262144 pre 871µs on 875µs post 885µs diff -2539ns
align 131072 pre 878µs on 893µs post 900µs diff 3.6µs
align 65536 pre 851µs on 881µs post 884µs diff 13.7µs
align 32768 pre 836µs on 833µs post 880µs diff -25556n
merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824 pre 1.07ms on 1e+03µ post 962µs diff -14615n
align 536870912 pre 1.06ms on 1.01ms post 940µs diff 12.2µs
align 268435456 pre 1ms on 943µs post 885µs diff -1132ns
align 134217728 pre 995µs on 982µs post 909µs diff 30µs
align 67108864 pre 999µs on 995µs post 1.01ms diff -9707ns
align 33554432 pre 960µs on 1.01ms post 1.03ms diff 15.2µs
align 16777216 pre 954µs on 928µs post 878µs diff 12.1µs
align 8388608 pre 872µs on 900µs post 895µs diff 16.5µs
align 4194304 pre 895µs on 862µs post 890µs diff -30439n
align 2097152 pre 889µs on 901µs post 876µs diff 18.7µs
align 1048576 pre 900µs on 898µs post 897µs diff -708ns

here?

align 524288 pre 885µs on 874µs post 881µs diff -8470ns
align 262144 pre 817µs on 873µs post 878µs diff 25.6µs
align 131072 pre 882µs on 854µs post 881µs diff -27423n
align 65536 pre 866µs on 890µs post 885µs diff 14.3µs
align 32768 pre 900µs on 881µs post 893µs diff -15412n
merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824 pre 1.12ms on 1.02ms post 949µs diff -12574n
align 536870912 pre 1.07ms on 1.03ms post 948µs diff 16.5µs
align 268435456 pre 1.01ms on 958µs post 883µs diff 12.1µs
align 134217728 pre 994µs on 946µs post 879µs diff 9.2µs
align 67108864 pre 1ms on 1.05ms post 1.03ms diff 37.9µs
align 33554432 pre 942µs on 1.01ms post 1.03ms diff 20.6µs
align 16777216 pre 939µs on 903µs post 880µs diff -5972ns
align 8388608 pre 900µs on 914µs post 923µs diff 2.42µs
align 4194304 pre 894µs on 886µs post 882µs diff -1563ns

here?

align 2097152 pre 829µs on 890µs post 874µs diff 37.8µs
align 1048576 pre 899µs on 882µs post 843µs diff 11.1µs
align 524288 pre 890µs on 887µs post 902µs diff -9005ns
align 262144 pre 887µs on 887µs post 898µs diff -5474ns
align 131072 pre 928µs on 895µs post 914µs diff -26028n
align 65536 pre 898µs on 898µs post 894µs diff 2.59µs
align 32768 pre 884µs on 891µs post 901µs diff -1284ns


Similar picture. The diffs seem to be mostly quite small, only a few
microseconds. Or am I misreading something?


Then with a quite fast one, a 16 GB Transcend:

[ 4055.393399] sd 11:0:0:0: Attached scsi generic sg2 type 0
[ 4055.394729] sd 11:0:0:0: [sdb] 31375360 512-byte logical blocks: (16.0 GB/14.9 GiB)
[ 4055.395262] sd 11:0:0:0: [sdb] Write Protect is off


merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296 pre 1.28ms on 1.48ms post 1.33ms diff 179µs
align 2147483648 pre 1.32ms on 1.51ms post 1.33ms diff 181µs
align 1073741824 pre 1.31ms on 1.46ms post 1.35ms diff 132µs
align 536870912 pre 1.27ms on 1.52ms post 1.33ms diff 228µs
align 268435456 pre 1.28ms on 1.46ms post 1.31ms diff 161µs
align 134217728 pre 1.28ms on 1.44ms post 1.37ms diff 120µs
align 67108864 pre 1.27ms on 1.44ms post 1.34ms diff 133µs
align 33554432 pre 1.24ms on 1.42ms post 1.31ms diff 150µs
align 16777216 pre 1.23ms on 1.46ms post 1.26ms diff 218µs
align 8388608 pre 1.31ms on 1.5ms post 1.33ms diff 180µs
align 4194304 pre 1.27ms on 1.45ms post 1.36ms diff 135µs
align 2097152 pre 1.29ms on 1.37ms post 1.39ms diff 33.7µs

here?

align 1048576 pre 1.31ms on 1.44ms post 1.35ms diff 115µs
align 524288 pre 1.33ms on 1.39ms post 1.48ms diff -12297n
align 262144 pre 1.36ms on 1.42ms post 1.4ms diff 45.6µs
align 131072 pre 1.37ms on 1.44ms post 1.4ms diff 57.7µs
align 65536 pre 1.36ms on 1.35ms post 1.33ms diff 4.67µs
align 32768 pre 1.32ms on 1.38ms post 1.34ms diff 44.1µs
merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296 pre 1.36ms on 1.49ms post 1.34ms diff 139µs
align 2147483648 pre 1.26ms on 1.48ms post 1.27ms diff 213µs
align 1073741824 pre 1.26ms on 1.45ms post 1.33ms diff 164µs
align 536870912 pre 1.22ms on 1.46ms post 1.35ms diff 173µs
align 268435456 pre 1.34ms on 1.5ms post 1.31ms diff 172µs
align 134217728 pre 1.34ms on 1.48ms post 1.31ms diff 157µs
align 67108864 pre 1.29ms on 1.46ms post 1.34ms diff 142µs
align 33554432 pre 1.28ms on 1.47ms post 1.31ms diff 173µs
align 16777216 pre 1.26ms on 1.48ms post 1.37ms diff 168µs
align 8388608 pre 1.31ms on 1.47ms post 1.36ms diff 139µs
align 4194304 pre 1.26ms on 1.53ms post 1.33ms diff 237µs
align 2097152 pre 1.34ms on 1.4ms post 1.36ms diff 56.4µs
align 1048576 pre 1.32ms on 1.35ms post 1.37ms diff 638ns

here?

align 524288 pre 1.29ms on 1.47ms post 1.45ms diff 98.1µs
align 262144 pre 1.35ms on 1.38ms post 1.42ms diff -11916n
align 131072 pre 1.32ms on 1.46ms post 1.4ms diff 100µs
align 65536 pre 1.35ms on 1.42ms post 1.43ms diff 30.8µs
align 32768 pre 1.31ms on 1.37ms post 1.33ms diff 51µs
merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296 pre 1.26ms on 1.49ms post 1.27ms diff 222µs
align 2147483648 pre 1.25ms on 1.41ms post 1.37ms diff 97.3µs
align 1073741824 pre 1.26ms on 1.47ms post 1.31ms diff 186µs
align 536870912 pre 1.25ms on 1.42ms post 1.32ms diff 132µs
align 268435456 pre 1.2ms on 1.44ms post 1.29ms diff 195µs
align 134217728 pre 1.27ms on 1.43ms post 1.34ms diff 118µs
align 67108864 pre 1.25ms on 1.45ms post 1.31ms diff 165µs
align 33554432 pre 1.22ms on 1.36ms post 1.25ms diff 124µs
align 16777216 pre 1.24ms on 1.44ms post 1.26ms diff 191µs
align 8388608 pre 1.22ms on 1.39ms post 1.23ms diff 164µs
align 4194304 pre 1.23ms on 1.43ms post 1.3ms diff 171µs
align 2097152 pre 1.26ms on 1.3ms post 1.32ms diff 16.7µs
align 1048576 pre 1.26ms on 1.27ms post 1.26ms diff 7.91µs

here?

align 524288 pre 1.24ms on 1.3ms post 1.3ms diff 29.2µs
align 262144 pre 1.25ms on 1.3ms post 1.28ms diff 28.2µs
align 131072 pre 1.25ms on 1.29ms post 1.28ms diff 24.8µs
align 65536 pre 1.15ms on 1.24ms post 1.26ms diff 34.5µs
align 32768 pre 1.17ms on 1.3ms post 1.26ms diff 82.6µs


Thing is that my "here?" is not always at the same place :)
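
Instead of eyeballing where the step is, one could look for the steepest fall
in the median diff between neighbouring alignments. This is a sketch under
assumptions: the diff values (in microseconds) are copied by hand from the
three Transcend runs above for 8, 4, 2 and 1 MiB only, the function name is
made up, and reading the alignment just before the drop as the erase block
size is my understanding of the README, not something I have verified.

```python
from statistics import median

MIB = 1024 * 1024

# diff values in microseconds, copied from the three Transcend runs
DIFFS_US = {
    8 * MIB: [180, 139, 164],
    4 * MIB: [135, 237, 171],
    2 * MIB: [33.7, 56.4, 16.7],
    1 * MIB: [115, 0.638, 7.91],
}


def largest_drop(diffs_by_alignment):
    """Return (alignment_before_drop, drop) for the steepest fall in the
    median diff between neighbouring alignments, scanning large to small."""
    ordered = sorted(diffs_by_alignment, reverse=True)
    medians = {align: median(diffs_by_alignment[align]) for align in ordered}
    best_align, best_drop = None, float("-inf")
    for upper, lower in zip(ordered, ordered[1:]):
        drop = medians[upper] - medians[lower]
        if drop > best_drop:
            best_align, best_drop = upper, drop
    return best_align, best_drop


align, drop = largest_drop(DIFFS_US)
print(f"steepest drop ({drop:.1f} µs) is just below align {align // MIB} MiB")
```

On these numbers the steepest drop sits between 4 MiB and 2 MiB, which would
point at a 4 MiB erase block if that reading of the README holds. It is a
guess to be checked with the --open-au tests, not a result.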

> With the correct guess, compare the performance you get using
>
> $ ERASESIZE=$[2*1024*1024] # replace with guess from flashbench -a
> $ ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=3 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=5 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=7 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=13 --blocksize=4096 --erasesize=${ERASESIZE}

I omit this for now, because I am not yet sure about the correct guess.
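
For later, the five invocations can be generated rather than typed. The sketch
below only builds and prints the command lines, using exactly the options
quoted above; it does not run them. As far as I understand, the --open-au
tests write to the device, so they should only be run on a stick without
valuable data. The ERASESIZE value is a placeholder and the function name is
made up.

```python
ERASESIZE = 2 * 1024 * 1024  # placeholder: replace with the guess from flashbench -a


def open_au_commands(device, erasesize, open_au_counts=(1, 3, 5, 7, 13),
                     blocksize=4096):
    """Build one flashbench argument list per number of open erase blocks."""
    return [
        [
            "./flashbench",
            device,
            "--open-au",
            f"--open-au-nr={count}",
            f"--blocksize={blocksize}",
            f"--erasesize={erasesize}",
        ]
        for count in open_au_counts
    ]


commands = open_au_commands("/dev/sdb", ERASESIZE)
for argv in commands:
    print(" ".join(argv))
```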

> The first one of those should always be the fastest, hopefully followed by
> some that are equally fast and then some much slower ones (especially for the
> smaller block sizes). The "active_logs=N" mount option should be one less
> than the highest number above that is still "fast", and only "2", "4" and "6"
> are valid at the moment. If you are lucky, your device is still fast with
> "--open-au-nr=7" and slow only for higher numbers, then the default of "6"
> is ok.
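
If I read this rule correctly, it can be written down as follows. The sketch
assumes that when "one less than the highest fast number" is not itself a
valid value, one picks the largest valid value below it; that rounding down is
my interpretation, and the function name is made up.

```python
VALID_ACTIVE_LOGS = (2, 4, 6)  # the only values accepted at the moment


def pick_active_logs(highest_fast_open_au_nr):
    """Largest valid active_logs value that is at most one less than the
    highest --open-au-nr that was still fast; None if there is none."""
    target = highest_fast_open_au_nr - 1
    candidates = [n for n in VALID_ACTIVE_LOGS if n <= target]
    return max(candidates) if candidates else None


for nr in (1, 3, 5, 7, 13):
    print(f"still fast up to --open-au-nr={nr}: active_logs={pick_active_logs(nr)}")
```

A stick that is only fast with a single open erase block gets None here,
because none of the valid values fits.
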
>
> If the erase size is larger than 2 MB, then you have to use the "-s" option in
> mkfs.f2fs to configure how many 2 MB segments there are in one erase block.
> For a 2 GB USB stick, I would guess that the erase block size is 1, 2 or
> 4 MB. Newer (larger) sticks will have larger erase blocks that may also
> be a multiple of 3 MB (3, 6, 12, or 24).
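
The value for "-s" then follows from a simple division. This sketch assumes
2 MiB segments as stated above; the function name is made up. What to do with
an erase block that is not a whole number of segments (3 MiB, for example) is
not covered by the description, so the sketch just refuses such sizes.

```python
SEGMENT_BYTES = 2 * 1024 * 1024  # f2fs segment size as described above
MIB = 1024 * 1024


def segments_per_erase_block(erase_block_bytes):
    """Number of 2 MiB segments that make up one erase block."""
    if erase_block_bytes <= 0 or erase_block_bytes % SEGMENT_BYTES != 0:
        raise ValueError(
            f"{erase_block_bytes} bytes is not a whole number of 2 MiB segments"
        )
    return erase_block_bytes // SEGMENT_BYTES


for size_mib in (2, 4, 6, 12, 24):
    print(f"{size_mib} MiB erase block: -s {segments_per_erase_block(size_mib * MIB)}")
```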

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7