Re: [PATCH v6] fat: Batched discard support for fat

From: Kyungmin Park
Date: Wed Mar 30 2011 - 10:45:00 EST


On Wed, Mar 30, 2011 at 11:20 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Wednesday 30 March 2011, Lukas Czerner wrote:
>> On Wed, 30 Mar 2011, Arnd Bergmann wrote:
>> > Sorry for joining the discussion late, but shouldn't you also pass
>> > the alignment of the discards?
>> >
>> > FAT is typically used on cheap media that have very limited support
>> > for garbage-collection, such as eMMC or SD cards.
>> >
>> > On most SDHC cards, you only ever want to issue discard on full erase
>> > blocks (allocation units per spec), typically sized 4 MB.
>>
>> I was not aware that SD cards (etc.) have garbage collection of some
>> sort, or that they even support discard, since I thought that we only
>> have the TRIM, UNMAP/WRITE_SAME commands for SATA or SCSI drives.
>>
>> Or is there some sort of kernel mechanism doing garbage collection like
>> this for the cheap media?
>
> The garbage collection is what happens on the device internally. Each
> card has only a small number of erase blocks that it can write to
> at a time (typically between one and ten). When you "open" a new erase
> block by writing to it, the card will "close" another erase block
> by garbage-collecting the data, i.e. it copies the data that has been
> recently written, together with the data that was in the erase block
> previously and has not been touched since the last GC, into a single
> erase block, and then erases the ones that have been freed up.
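The open/close behaviour described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not any card's real firmware: the names (struct toy_card, MAX_OPEN, toy_card_write) are hypothetical, and closing the least-recently-used erase block is an assumption, since the text does not say which block the card picks.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (assumption, not real firmware): the card keeps at most
 * max_open erase blocks "open". Writing to an erase block that is not
 * open forces the card to "close" one, which costs one garbage
 * collection. The victim is assumed to be the least-recently-used block. */
#define MAX_OPEN 8

struct toy_card {
    size_t n_open;          /* number of currently open erase blocks */
    size_t max_open;        /* device limit, 1..MAX_OPEN */
    long open[MAX_OPEN];    /* open erase block numbers, LRU first */
    unsigned long gc_count; /* garbage collections performed so far */
};

void toy_card_init(struct toy_card *c, size_t max_open)
{
    assert(max_open >= 1 && max_open <= MAX_OPEN);
    c->n_open = 0;
    c->max_open = max_open;
    c->gc_count = 0;
}

/* Record a write that lands in erase block 'eb'. */
void toy_card_write(struct toy_card *c, long eb)
{
    size_t i;

    for (i = 0; i < c->n_open; i++) {
        if (c->open[i] == eb) {
            /* Already open: move it to the most-recently-used slot. */
            for (; i + 1 < c->n_open; i++)
                c->open[i] = c->open[i + 1];
            c->open[c->n_open - 1] = eb;
            return;
        }
    }

    if (c->n_open == c->max_open) {
        /* No free slot: close the LRU block, i.e. one garbage collection. */
        c->gc_count++;
        for (i = 0; i + 1 < c->n_open; i++)
            c->open[i] = c->open[i + 1];
        c->n_open--;
    }
    c->open[c->n_open++] = eb;
}
```

In this model, with max_open set to 1, writes that alternate between two erase blocks trigger a garbage collection on every switch, while the same writes done sequentially trigger only one.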
>
> I explained this in more detail a few weeks ago in an article, see
> https://lwn.net/Articles/428584/.
>
> Discarding full erase blocks makes the wear levelling more efficient,
> since there is a larger number of unused erase blocks to choose from.
>
> The SD card standard does not specify what happens when you erase
> less than an erase block, other than that the data will be in a
> known state (all-ones or all-zeros). On cheap cards that might
> require actually writing that data, while better ones use sector
> remapping inside an erase block that makes it possible to mark
> sectors or pages as unused, which can speed up future garbage
> collection on the erase block.
>
>> > If you just pass the minimum length, the file system could end up
>> > erasing a 4 MB section that spans two half erase blocks, or it
>> > could span a few clusters of the following erase block, both of
>> > which are undesirable from a performance point of view.
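A minimal sketch of the alignment being asked for, assuming the file system knows the erase block size (which, as noted further on, cards may misreport). align_to_erase_blocks is a hypothetical helper, not part of the FAT patch: it rounds the start of a free range up and its end down to erase block boundaries, so only whole erase blocks are discarded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shrink the byte range [start, start + len) so that
 * it covers only whole erase blocks of 'eb_size' bytes. Stores the aligned
 * start in *out_start and returns the number of bytes left to discard
 * (0 if no full erase block fits). Assumes start + len does not overflow. */
uint64_t align_to_erase_blocks(uint64_t start, uint64_t len,
                               uint64_t eb_size, uint64_t *out_start)
{
    uint64_t first, end;

    assert(eb_size > 0);
    /* Round the start up and the end down to erase block boundaries. */
    first = (start + eb_size - 1) / eb_size * eb_size;
    end = (start + len) / eb_size * eb_size;

    *out_start = first;
    return end > first ? end - first : 0;
}
```

For example, with 4 MiB erase blocks, a 4 MiB range starting at 2 MiB covers two half erase blocks and yields 0 bytes, while a 10 MiB range starting at 2 MiB shrinks to the 8 MiB between 4 MiB and 12 MiB.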
>>
>> Do those cards export such information correctly?
>
> Most of the time, they export the erase block size correctly, but
> sometimes they lie. In fact, all of the cards I've seen report
> 4 MB erase blocks, but some cards in fact use 1.5 MB, 2 MB or 8 MB
> instead. I'm sure we will see 3 MB, 6 MB, 12 MB and 16 MB soon.
>
> SD cards do not export the page size, but most of them today use
> 16 KB. See my list at
>
> https://wiki.linaro.org/WorkingGroups/KernelConsolidation/Projects/FlashCardSurvey

Thank you for your effort, but please think about the original purpose
of the batched discard feature.
Do you want to run batched discard on SD cards? In our environment the
SD card is usually optional, and my goal is to use it for eMMC. As you
know, the internal algorithms of eMMC and SD cards are different.
The goal of an SD card is performance, but eMMC is different: it must
also consider reliability, e.g., Sudden Power Off Recovery.

>
>> > On other media, you have the same problem inside an erase block:
>> > These might be able to discard parts of an erase block efficiently,
>> > but normally not less than a flash page (typically 8 to 32 KB).
>>
>> Well, I have tested several SSDs and thinly provisioned devices, but I
>> have not seen any strange behaviour, other than discard being terribly
>> inefficient. See my results here:
>>
>> http://people.redhat.com/lczerner/discard/test_discard.html
>>
>> The fact is that I have not tried discard sizes smaller than 4K, since
>> that is the most common block size for a file system.
>
> Good SSDs can deal with remapping 4k blocks, while cheap ones
> (SD cards, USB sticks, or low-end ATA SSDs) can only do operations
> on full blocks.
It really depends on the device. To increase performance we have to
increase the number of channels and ways internally.
Most NAND devices are similar: there is no way except increasing the
interleaving unit, and that affects the price.

That's the reason I like batched discard. It's not easy to guess the
erase block size at the system level, so it's helpful to scan the
unused blocks and *try* to trim them all at once.
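The batched approach can be sketched as one pass over the FAT that looks for runs of free clusters and discards each run of at least minlen clusters, similar in spirit to the start/len/minlen fields of the FITRIM ioctl's struct fstrim_range. This is a hypothetical in-memory sketch, not the patch's code: fat_trim_free_runs and discard_fn are illustrative names, and it ignores locking and the start/len window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback invoked for each free run; in a real file system this would
 * submit a discard request to the block device. */
typedef void (*discard_fn)(uint32_t first_cluster, uint32_t nr_clusters,
                           void *ctx);

/* Hypothetical sketch: scan an in-memory FAT where entry value 0 marks a
 * free cluster. Clusters 0 and 1 are reserved, so the scan starts at 2.
 * Runs shorter than 'minlen' clusters are skipped. 'discard' may be NULL
 * for a dry run that only counts. Returns the number of clusters trimmed. */
uint32_t fat_trim_free_runs(const uint32_t *fat, uint32_t nr_entries,
                            uint32_t minlen, discard_fn discard, void *ctx)
{
    uint32_t trimmed = 0;
    uint32_t i = 2;

    assert(minlen > 0);
    while (i < nr_entries) {
        uint32_t run_start, run_len;

        if (fat[i] != 0) { /* cluster in use */
            i++;
            continue;
        }
        /* Extend the run over consecutive free clusters. */
        run_start = i;
        while (i < nr_entries && fat[i] == 0)
            i++;
        run_len = i - run_start;

        if (run_len >= minlen) {
            if (discard)
                discard(run_start, run_len, ctx);
            trimmed += run_len;
        }
    }
    return trimmed;
}
```

Raising minlen trades coverage for larger requests: small free runs are left alone, which fits the view that discard is only a hint.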

Please remember that discard is just a hint. If we write to the flash
sequentially, then we may not need to trim it.

Thank you,
Kyungmin Park
>
>> > Again, you don't want to discard partial pages in this case, because
>> > that would result in an immediate copy-on-write operation; avoiding
>> > that is much more important than discarding a large number of pages.
>> >
>> > Further, when you erase some pages inside of an erase block, you
>> > probably should not span multiple erase blocks but instead issue
>> > separate requests for each set of pages in one erase block.
>>
>> Does it mean that we should not issue discards bigger than an erase
>> block? That does not sound good given my test results. Or maybe I
>> misunderstood your point?
>
> I think ideally you want to have multiples of full erase blocks, but I
> would not combine discards of a partial erase block with any other
> discard.
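The rule suggested here, whole erase blocks combined into one request and each partial erase block issued on its own and trimmed to whole pages, can be sketched as a small planner. plan_discards and struct discard_req are hypothetical names; the sketch assumes the page size divides the erase block size and that start + len does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct discard_req {
    uint64_t start;
    uint64_t len;
};

static uint64_t round_up(uint64_t x, uint64_t a)   { return (x + a - 1) / a * a; }
static uint64_t round_down(uint64_t x, uint64_t a) { return x / a * a; }

/* Append the partial piece [s, e) to 'out' after shrinking it to whole
 * pages; pieces smaller than one page are dropped. Returns the new count. */
static size_t add_partial(struct discard_req *out, size_t n,
                          uint64_t s, uint64_t e, uint64_t page_size)
{
    uint64_t ps = round_up(s, page_size);
    uint64_t pe = round_down(e, page_size);

    if (pe > ps) {
        out[n].start = ps;
        out[n].len = pe - ps;
        n++;
    }
    return n;
}

/* Hypothetical planner: whole erase blocks in [start, start + len) become
 * one request; the head and tail pieces are separate requests, each inside
 * a single erase block. Fills at most 3 entries and returns the count. */
size_t plan_discards(uint64_t start, uint64_t len, uint64_t eb_size,
                     uint64_t page_size, struct discard_req out[3])
{
    uint64_t end = start + len;
    uint64_t eb_first, eb_last;
    size_t n = 0;

    assert(eb_size > 0 && page_size > 0 && eb_size % page_size == 0);
    eb_first = round_up(start, eb_size);
    eb_last = round_down(end, eb_size);

    if (eb_first > eb_last) /* range lies inside one erase block */
        return add_partial(out, 0, start, end, page_size);

    n = add_partial(out, n, start, eb_first, page_size); /* head */
    if (eb_last > eb_first) {                            /* full blocks */
        out[n].start = eb_first;
        out[n].len = eb_last - eb_first;
        n++;
    }
    return add_partial(out, n, eb_last, end, page_size); /* tail */
}
```

For instance, with 4 MiB erase blocks and 16 KiB pages, a range from 1 MiB + 8 KiB to 13 MiB + 20 KiB becomes three requests: a page-aligned head inside the first erase block, 8 MiB of whole erase blocks, and a page-aligned tail inside the last one.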
>
>        Arnd
>