Re: [PATCH v2 1/5] fat: allocate persistent inode numbers

From: J. Bruce Fields
Date: Thu Sep 13 2012 - 07:20:27 EST


On Thu, Sep 13, 2012 at 05:33:02PM +0900, OGAWA Hirofumi wrote:
> Namjae Jeon <linkinjeon@xxxxxxxxx> writes:
>
> >> I see. So, client can't solve the ESTALE if inode cache was evicted,
> >> right? (without application changes)
> >
> > There can be situation where we may get not only ESTALE but EIO also.
> >
> > For example,
> > -------------------------------
> > fd = open("foo.txt");
> > while (1) {
> > sleep(1);
> > write(fd..);
> > }
> > --------------------------------
> >
> > Here "write" may fail when the inode number of "foo.txt" is changed at
> > the server due to cache eviction under memory pressure.
> > When we tried a similar test, we found that "write" is returning "EIO"
> > instead of "ESTALE".
> >
> > ---------------------------------------------------------------------------------------------------------
> > #> ./write_test_dbg bbb 1000 0
> > FILE : bbb, SIZE : 1048576000 , FSYNC : OFF , RECORD_SIZE = 4096
> > 106264 -rwxr-xr-x 1 root 0 0 Jan 1 00:14 bbb
> > write failed after 60080128 bytes:, errno = 5: Input/output error
> > ---------------------------------------------------------------------------------------------------------
> >
> > As we get EIO instead of ESTALE, it may be difficult to decide when to
> > "restart from LOOKUP" in such a situation.
> > Also, as per Bruce's opinion, we cannot avoid ESTALE from an inode number
> > change in the rebooted-server case.
> > The reboot case is worse, as the client may attempt to write to a different
> > file if the NFS handle at the NFS client matches the inode number of some
> > other file at the NFS server.
>
> I see.
>
> >> Grepping around... Documentation/sysctl/vm.txt mentions a
> >> vfs_cache_pressure parameter.
> >> Yeah. And a dirty hack to adjust sb->s_shrink.batch would be possible.
> > I am worried that it could lead to an OOM condition on an embedded
> > system (little DRAM, but supporting a large 3TB HDD).
> >
> > Please let me know if any issues or queries.
>
> So, now I think a stable inode number may be useful if there are users of
> it. And I guess that functionality does not collide with -mm. And I
> suppose we can add two modes for the "nfs" option (e.g. nfs=1 and nfs=2).
>
> If nfs=1, works like current -mm without limiting any operations.

Apologies, I haven't been following the conversation carefully: remind
me what "works like current -mm" means?

--b.

> If nfs=2, try to make stable FH and limit some operations
>
> (option name doesn't matter here.)
>
> Does this work fine?
> --
> OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
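
The write loop quoted above could be sketched roughly as follows, as a
minimal illustration of why EIO is harder to act on than ESTALE. The names
write_records() and classify_write_errno() are hypothetical and are not
taken from write_test_dbg or from the patch series; the policy of
re-looking-up only on ESTALE is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical verdicts for a failed write() on an NFS client. */
enum write_verdict {
    WRITE_OK,         /* no error recorded */
    WRITE_RELOOKUP,   /* ESTALE: handle is stale, reopen by name and retry */
    WRITE_AMBIGUOUS,  /* EIO: stale handle or real I/O error, cannot tell */
    WRITE_OTHER       /* any other errno */
};

/* Map an errno value to the action an application might take (assumed policy). */
enum write_verdict classify_write_errno(int err)
{
    switch (err) {
    case 0:      return WRITE_OK;
    case ESTALE: return WRITE_RELOOKUP;
    case EIO:    return WRITE_AMBIGUOUS;
    default:     return WRITE_OTHER;
    }
}

/*
 * Write up to `count` records of `record_size` bytes to `fd`.
 * Returns the number of bytes written; *err_out receives the errno of
 * the first failing write(), or 0 if no write failed.
 */
long long write_records(int fd, size_t record_size, int count, int *err_out)
{
    char buf[4096];
    long long total = 0;

    assert(record_size > 0 && record_size <= sizeof(buf));
    memset(buf, 'x', sizeof(buf));
    *err_out = 0;

    for (int i = 0; i < count; i++) {
        ssize_t n = write(fd, buf, record_size);
        if (n < 0) {
            *err_out = errno;   /* remember why the write failed */
            break;
        }
        total += n;
        if ((size_t)n < record_size)
            break;              /* short write: stop and report progress */
    }
    return total;
}
```

A caller would pass the saved errno to classify_write_errno(): only the
WRITE_RELOOKUP case gives a clear signal to reopen the file by name, which
is the difficulty described above when EIO comes back instead.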