Re: [PATCH v2 1/5] fat: allocate persistent inode numbers

From: OGAWA Hirofumi
Date: Thu Sep 13 2012 - 04:33:14 EST


Namjae Jeon <linkinjeon@xxxxxxxxx> writes:

>> I see. So, client can't solve the ESTALE if inode cache was evicted,
>> right? (without application changes)
>
> There can be situations where we get not only ESTALE but also EIO.
>
> For example,
> -------------------------------
> fd = open("foo.txt");
> while (1) {
> sleep(1);
> write(fd..);
> }
> --------------------------------
>
> Here "write" may fail when the inode number of "foo.txt" changes on the
> server due to cache eviction under memory pressure.
> When we tried a similar test, we found that "write" returns "EIO"
> instead of "ESTALE".
>
> ---------------------------------------------------------------------------------------------------------
> #> ./write_test_dbg bbb 1000 0
> FILE : bbb, SIZE : 1048576000 , FSYNC : OFF , RECORD_SIZE = 4096
> 106264 -rwxr-xr-x 1 root 0 0 Jan 1 00:14 bbb
> write failed after 60080128 bytes:, errno = 5: Input/output error
> ---------------------------------------------------------------------------------------------------------
>
> As we get EIO instead of ESTALE, it may be difficult to decide when to
> "restart from LOOKUP" in such a situation.
> Also, as per Bruce's opinion, we cannot avoid ESTALE from an inode number
> change in the rebooted-server case.
> The reboot case is worse, as the client may attempt to write to a different
> file if the NFS handle at the NFS client matches the inode number of some
> other file at the NFS server.

I see.
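
For reference, the client-side problem quoted above can be sketched in C
roughly as follows. This is only an illustration, not the actual
write_test_dbg program: the names write_action, classify_write_error() and
write_with_reopen() are made up here, and the single reopen-by-path on
ESTALE is an assumed recovery policy. It shows why EIO is harder to handle
than ESTALE: only ESTALE tells the application that a fresh LOOKUP might
help.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical recovery actions; the names are illustrative only. */
enum write_action {
    ACTION_RETRY,   /* interrupted: retry the same write */
    ACTION_REOPEN,  /* handle is stale: reopen by path (a new LOOKUP) */
    ACTION_FAIL     /* cause unknown: give up and report the error */
};

/* Map an errno value from write() to a recovery action. */
enum write_action classify_write_error(int err)
{
    switch (err) {
    case EINTR:
        return ACTION_RETRY;
    case ESTALE:
        return ACTION_REOPEN;
    default:
        /* EIO lands here: nothing says that a reopen would help. */
        return ACTION_FAIL;
    }
}

/*
 * Write len bytes of buf to path starting at offset 0, reopening the
 * file by name at most once if the server reports ESTALE.
 * Returns 0 on success, -1 on failure with errno set.
 */
int write_with_reopen(const char *path, const char *buf, size_t len)
{
    size_t done = 0;
    int reopened = 0;
    int fd;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    while (done < len) {
        ssize_t n = pwrite(fd, buf + done, len - done, (off_t)done);
        int err;

        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        /* Treat a zero-length write as an I/O error to avoid spinning. */
        err = (n == 0) ? EIO : errno;

        switch (classify_write_error(err)) {
        case ACTION_RETRY:
            continue;
        case ACTION_REOPEN:
            if (!reopened) {
                close(fd);
                fd = open(path, O_WRONLY);
                if (fd < 0)
                    return -1;
                reopened = 1;
                continue;
            }
            /* second ESTALE: fall through and fail */
        default:
            close(fd);
            errno = err;
            return -1;
        }
    }
    return close(fd);
}
```

With this policy, the errno = 5 result in the log above ends in
ACTION_FAIL, so the application has no hint that reopening might work.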

>> Grepping around... Documentation/sysctl/vm.txt mentions a
>> vfs_cache_pressure parameter.
>> Yeah. And a dirty hack to adjust sb->s_shrink.batch would be possible.
> I am worried that it could lead to an OOM condition on an embedded
> system (little DRAM, but supporting a large 3TB HDD).
>
> Please let me know if any issues or queries.

So, now I think stable inode numbers may be useful if there are users of
them. And I guess that functionality does not collide with -mm. And I
suppose we can add two modes for the "nfs" option (e.g. nfs=1 and nfs=2).

If nfs=1, it works like current -mm, with no limited operations.
If nfs=2, it tries to make a stable FH and limits some operations.

(option name doesn't matter here.)
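
As a rough user-space sketch of the two modes (not kernel code, and not a
proposal for the final interface): the enum fat_nfs_mode and the helpers
parse_nfs_option() and nfs_mode_limits_operations() are hypothetical
names, and which operations would be limited is left open.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical modes for the "nfs" mount option. */
enum fat_nfs_mode {
    FAT_NFS_INVALID = -1, /* unrecognized option string */
    FAT_NFS_COMPAT  = 1,  /* "nfs=1": like current -mm, nothing limited */
    FAT_NFS_STABLE  = 2   /* "nfs=2": try for a stable FH, limit some operations */
};

/* Parse a single option string such as "nfs=1" or "nfs=2". */
enum fat_nfs_mode parse_nfs_option(const char *opt)
{
    assert(opt != NULL);

    if (strcmp(opt, "nfs=1") == 0)
        return FAT_NFS_COMPAT;
    if (strcmp(opt, "nfs=2") == 0)
        return FAT_NFS_STABLE;
    return FAT_NFS_INVALID;
}

/* Only the stable-FH mode restricts operations in this sketch. */
int nfs_mode_limits_operations(enum fat_nfs_mode mode)
{
    return mode == FAT_NFS_STABLE;
}
```

The point of keeping the check in one helper is that code paths for the
limited operations would only need to ask whether the mode limits them.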

Does this work fine?
--
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>