Re: Read support for fat_fallocate()? (was [v2] fat: editions tosupport fat_fallocate())

From: Namjae Jeon
Date: Mon Feb 18 2013 - 09:25:19 EST


2013/2/18 OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>:
> Andrew Bartlett <abartlet@xxxxxxxxx> writes:
>
>>> >> First, Thanks for your interest !
>>> >> A mismatch between inode size and reserved blocks can be either due to
>>> >> pre-allocation (after our changes) or due to corruption (sudden unplug
>>> >> of media etc).
>>> >> We donât think it is right to include only read only support (i.e.
>>> >> without fallocate support) for such files because if such files are
>>> >> encountered it only means that the file is corrupted, as there is no
>>> >> current method to check if the issue is due to pre-allocation.
>>> >> If it is to be included in the kernel, then the whole patch has to go
>>> >> in.
>>> >
>>> > I don't see why that is the case.
>>> If we consider that there is no FALLOCATE support, then the condition
>>> of file size and blocks not matching can be only possible in case of
>>> corruption, right?
>>
>> Sure. I was just suggesting we transparently recover from that, by
>> using the blocks. Think of it more as an online fsck not about
>> fallocate.
>>
>> Anyway, if you don't think it's reasonable to use those blocks, or to
>> 'just fix it', then we just have to continue to do as we currently do.
>> That is on first sign of FS corruption just stop doing writes, and await
>> an FSCK.
>
> I'm not sure what is suggesting actually though. We have to consider
> about synchronous runtime fsck makes normal path enough slower.
>
> E.g. probably, in this case, all first open(2) of the inode will have to
> walk cluster chain until end of cluster mark, to verify cluster chain.
>
>>> >> But then again, since the FAT specifications do not accommodate
>>> >> for pre-allocation, then it is up to OGAWA to decide if this is
>>> >> acceptable.
>>> >> In any case, the patch will definitely break backward compatibility
>>> >> (on an older fat driver without fallocate support) and also in case
>>> >> for the two variants for the same kernel versions and only one has
>>> >> FALLOCATE enabled, in such cases also, the behavior will assume
>>> >> corruption in one case.
>>> >
>>> > I agree that the sudden unplug is a concern, but why not make the
>>> > filesystem more robust against that inevitable occurrence? If the
>>> > blocks appear to be allocated to the file, why not use them?
>>> We also agree that there should be pre-allocation feature on FAT, and
>>> we had shared the scenarios where this could be required for a TV/
>>> recorder.
>>> But there are certain drawbacks which were raised by OGAWA with
>>> respect to compatibility and we also tend to agree on them.
>>> There could possibly be an issue where we are unable to distinguish
>>> between pre-allocation and corruption. Perhaps we could set a status
>>> bit on the file to indicate whether the file has pre-allocated blocks.
>>> This will make it clear whether the allocation is genuine through the
>>> FAT Fallocate request or is a result of corruption. Depending on the
>>> status of the flag - the decision can be made regard to reading
>>> blocks.
>>> But, the main issue in this will be storing this bit in on-disk
>>> directory entry for that file. From the feature and usability point of
>>> view, we should have fallocate on FAT too.
>>>
>>> But it needs initial ACK from OGAWA to continue to work on this so
>>> that we can figure out the proper solution to move forward.
>>
>> OK. Given the need to find other approaches, I wanted to suggest some
>> ideas - some of which you may have already considered:
>>
>> What about having a shadow FAT in a file, say called 'allocated space',
>> that would contain inode -> cluster list pairs, and where that file
>> would itself contain the free space the 'belongs' to other files?
>>
>> As new clusters become needed in a file, they would simply be removed
>> from the 'allocated space' file, and assigned to the file they really
>> belong to. That way, another OS just sees a large file, nothing more.
>>
>> Or, if we cannot make any changes to the on-disk format, what about
>> keeping such a database in memory, allocating some of the existing free
>> list to files that have had fallocate() called on them? (Naturally,
>> this makes it non-persistent, and instead more of a 'hint', but could at
>> least solve our mutual performance issues).
>
> [...]
>
> Hm. My concerns are compatibility and reliability. Although We can
> change on-disk format if need, but I don't think it can be compatible
> and reliable. If so, who wants to use it? I feel there is no reason to
> use FAT if there is no compatible.
>
> Well, anyway, possible solution would be, we can pre-allocate physical
> blocks via fallocate(2) or something, but discard pre-allocated blocks
> at ->release() (or before unmount at least). This way would have
> compatibility (no on-disk change over unmount) and possible breakage
> would be same with normal extend write patterns on kernel crash
> (i.e. Windows or fsck will truncate after i_size).
Hi OGAWA.
We don't need to consider device unplugging case ?
If yes, I can rework fat fallocate patch as your suggestion.
Thanks.
>
> Thanks.
> --
> OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/