Re: [PATCH v2 1/5] fat: allocate persistent inode numbers

From: Namjae Jeon
Date: Tue Sep 11 2012 - 08:00:04 EST


2012/9/10, OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>:
> Namjae Jeon <linkinjeon@xxxxxxxxx> writes:
>
>> Yes, that is true (the current VFAT in the -mm tree is not stable).
>> Even if we mount with lookupcache=none, an ESTALE error can still
>> occur in the rename case, so the ESTALE issue from rename remains on
>> the current -mm tree.
>> Please see the following steps:
>> 1. On the client, write to a file.
>> 2. On the client, move/rename the file.
>> 3. On the server, drop caches (drop_caches, etc.) to somehow evict the
>> inode so that it gets a new inode number.
>> 4. On the client, resume the program writing to the file. The write
>> will fail (write: Stale NFS file handle).
>
Hi OGAWA.
> Since rename() will be disabled by the stable ino patches, this will be
> unfixable, so it may actually be worse.
Currently, with our patchset, only the rename issue remains (we could not
find a correct approach to avoid it; if we do not update the inode number
immediately when i_pos changes, we are just delaying the problem). With
the rename limitation, we can return EBUSY when rename is called while a
process has the file open. Even without our patchset, the rename issue
can occur during NFS file access when the inode is evicted from the
SERVER cache.
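The reproduction steps quoted above can be driven from the client with a
small program. This is a minimal sketch, assuming a Linux client with the
VFAT export mounted over NFS; estale_repro and the hook parameter are
hypothetical names, not part of any existing tool.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical hook run between the rename and the second write. On a
 * real setup this is where the server-side inode eviction (step 3)
 * happens. May be NULL. */
typedef void (*pause_hook)(void *ctx);

/* Runs steps 1-4 on the client. Returns 0 on success or -errno from the
 * first step that fails. */
int estale_repro(const char *path, const char *newpath,
		 pause_hook hook, void *ctx)
{
	static const char buf[] = "data\n";
	int ret = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -errno;

	/* Step 1: write to the file. */
	if (write(fd, buf, sizeof(buf) - 1) < 0) {
		ret = -errno;
		goto out;
	}
	/* Step 2: rename the file while it is still open. */
	if (rename(path, newpath) < 0) {
		ret = -errno;
		goto out;
	}
	/* Step 3: let the caller evict the inode on the server. */
	if (hook)
		hook(ctx);
	/* Step 4: write again. NFS clients may buffer writes, so fsync() is
	 * used to push the data to the server and surface any ESTALE. */
	if (write(fd, buf, sizeof(buf) - 1) < 0 || fsync(fd) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

On a local filesystem all four steps succeed and the function returns 0;
on the setup described above, step 4 is where -ESTALE is expected.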
>
> Did you check why it returns -ESTALE? Or is the rename() issue also
> unfixable on -mm?
It is reproducible regardless of whether lookupcache is enabled or
disabled. The inode is not found in the server's inode cache, so when
d_obtain_alias(inode) is called, it returns ESTALE.
The call path is:
fh_verify()
  --> nfsd_set_fh_dentry()
  --> exportfs_decode_fh()
  --> nop->fh_to_dentry()
  --> fat_fh_to_dentry()
  --> generic_fh_to_dentry()
  --> get_inode()
  --> fat_nfs_get_inode()

static struct inode *fat_nfs_get_inode(struct super_block *sb,
				       u64 ino, u32 generation)
{
	......
	/* Searches only the inode cache; here it returns NULL. */
	inode = ilookup(sb, ino);
	if (inode && generation && (inode->i_generation != generation)) {
		iput(inode);
		inode = NULL;
	}
	return inode;
}
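The cache-only lookup above can be illustrated with a toy userspace
model (not kernel code). It assumes, as in mainline FAT, that an inode
number is handed out fresh (via iunique()) each time the inode is read
into memory. All toy_* names are hypothetical, and the generation check
is omitted.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

/* Toy in-memory inode: ino is what ends up in the NFS file handle,
 * i_pos identifies the directory entry on disk. */
struct toy_inode {
	unsigned long ino;
	long long i_pos;
	int cached;
};

struct toy_sb {
	struct toy_inode cache[TOY_CACHE_SLOTS];
	unsigned long next_ino;	/* stands in for iunique(): fresh number per read */
};

/* Read the inode for a directory entry, assigning a new ino if not cached. */
struct toy_inode *toy_iget(struct toy_sb *sb, long long i_pos)
{
	for (int i = 0; i < TOY_CACHE_SLOTS; i++)
		if (sb->cache[i].cached && sb->cache[i].i_pos == i_pos)
			return &sb->cache[i];
	for (int i = 0; i < TOY_CACHE_SLOTS; i++) {
		if (!sb->cache[i].cached) {
			sb->cache[i].ino = sb->next_ino++;
			sb->cache[i].i_pos = i_pos;
			sb->cache[i].cached = 1;
			return &sb->cache[i];
		}
	}
	return NULL;	/* cache full */
}

/* Like ilookup(): searches the cache only, never the disk. */
struct toy_inode *toy_ilookup(struct toy_sb *sb, unsigned long ino)
{
	for (int i = 0; i < TOY_CACHE_SLOTS; i++)
		if (sb->cache[i].cached && sb->cache[i].ino == ino)
			return &sb->cache[i];
	return NULL;
}

/* Stands in for server-side eviction such as drop_caches. */
void toy_evict_all(struct toy_sb *sb)
{
	for (int i = 0; i < TOY_CACHE_SLOTS; i++)
		sb->cache[i].cached = 0;
}

/* Decode a handle: a cache miss becomes -ESTALE, as d_obtain_alias(NULL) does. */
int toy_fh_to_inode(struct toy_sb *sb, unsigned long fh_ino,
		    struct toy_inode **out)
{
	*out = toy_ilookup(sb, fh_ino);
	return *out ? 0 : -ESTALE;
}
```

After toy_evict_all(), the handle's ino no longer matches anything, and
re-reading the same directory entry yields a different ino, so the old
handle can never be resolved again.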

I think it is unfixable because we cannot know the i_pos of an inode
that was changed by rename. Even if we knew it, there is no routine to
rebuild the inode in -mm, and it cannot be fixed with our patches either.
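Why rename stays a problem, and what the proposed EBUSY limitation does,
can be shown with another toy model. It assumes, hypothetically, that
the persistent inode number is a function of i_pos; the identity mapping
in toy_stable_ino and all toy_* names are illustrative, not the
patchset's actual scheme, and the model only covers the check itself.

```c
#include <errno.h>

/* Toy file: i_pos is the position of its directory entry on disk. */
struct toy_file {
	long long i_pos;
	int open_count;	/* processes that currently have the file open */
};

/* Hypothetical "persistent" inode number computed from i_pos. */
unsigned long toy_stable_ino(const struct toy_file *f)
{
	return (unsigned long)f->i_pos;
}

/* Rename moves the directory entry, so i_pos, and with it the derived
 * inode number, changes. While the file is open the rename is refused
 * with -EBUSY, modelling the proposed rename limitation. */
int toy_rename(struct toy_file *f, long long new_i_pos)
{
	if (f->open_count > 0)
		return -EBUSY;
	f->i_pos = new_i_pos;
	return 0;
}
```

Once a rename succeeds, any handle that still carries the old number no
longer matches; refusing the rename while the file is open avoids that
case rather than resolving it.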

>
>> And ......
>> If we mount NFS with lookupcache=none, FAT file lookup performance
>> drops severely. LOOKUP performance is very poor on a slow network or
>> a slow device, so I do not recommend disabling the lookup cache on NFS.
>> That is why inode reconstruction is already implemented in other
>> filesystems (e.g. EXT4, XFS, etc.).
>> lookupcache is currently enabled by default in NFS, which means users
>> have already encountered ESTALE issues with NFS over VFAT.
>>
>> I agree with you on making NFS over VFAT a read-only filesystem to
>> avoid all of these issues.
>> Eventually we can make it writable, with the rename limitation, once
>> we decide that it is stable enough in mainline.
>> So I suggest adding an 'nfs_ro' mount option instead of the 'nfs'
>> option.
>
> -mm seems to be more stable than I thought. As he said, it sounds like
> rename() is the only known issue on -mm, true?
Yes. If the lookup cache is disabled, rename is the only stability
issue, but performance drops severely.
If the lookup cache is enabled, -mm has both the ESTALE and rename issues.
>
> And have you tried the https://lkml.org/lkml/2012/6/29/381 patches? They
> sound like they improve performance by enabling lookupcache.
We checked these patches when we faced the ESTALE issue in -mm, but they
do not help: they just retry the system call once more on an ESTALE
error.
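The retry behaviour as described here can be sketched in userspace. This
is an illustration of "retry once on ESTALE", not the code of the linked
patches; retry_once_on_estale and flaky_op are hypothetical names. It
shows why one retry cannot help when every attempt reuses the same stale
handle, as with a write through an already-open descriptor.

```c
#include <errno.h>

/* An operation that, like a system call, returns 0 on success or -1
 * with errno set on failure. */
typedef int (*fs_op)(void *ctx);

/* Call op; if it fails with ESTALE, retry exactly once. */
int retry_once_on_estale(fs_op op, void *ctx)
{
	int ret = op(ctx);

	if (ret < 0 && errno == ESTALE)
		ret = op(ctx);
	return ret;
}

/* Hypothetical test operation: fails with ESTALE for the first
 * stale_failures calls, then succeeds. A handle that stays stale is
 * modelled by a stale_failures larger than the number of retries. */
struct flaky_op {
	int stale_failures;
	int calls;
};

int flaky_op_run(void *ctx)
{
	struct flaky_op *f = ctx;

	f->calls++;
	if (f->stale_failures > 0) {
		f->stale_failures--;
		errno = ESTALE;
		return -1;
	}
	return 0;
}
```

A transient staleness (one failure) is absorbed by the retry, while a
handle that stays stale still fails with ESTALE after the second call.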

> I'd like to know the critical reason we have to replace it.
I have summarized the situation below to help your decision.

1. The lookup cache is enabled by default in NFS, so ESTALE errors can
easily occur in -mm.
2. If the lookup cache is disabled, -mm still has the rename issue, and
file lookup performance drops.
3. With our patches, the rename issue remains, but VFAT can be used over
NFS with the lookup cache enabled.
4. With our patches in read-only mode, there are no issues.

Thanks.
>
> Thanks.
> --
> OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
>