Re: Remove BKL from FAT/VFAT/MSDOS (v1) (was Re: Fw: Regressioncaused by bf726e "semaphore: fix,")

From: Linus Torvalds
Date: Fri May 16 2008 - 21:32:26 EST




On Sat, 17 May 2008, OGAWA Hirofumi wrote:
> > static struct inode *fat_alloc_inode(struct super_block *sb)
> > {
> > struct msdos_inode_info *ei;
> > - ei = kmem_cache_alloc(fat_inode_cachep, GFP_KERNEL);
> > + ei = kmem_cache_alloc(fat_inode_cachep, GFP_NOFS);
> > if (!ei)
> > return NULL;
> > return &ei->vfs_inode;
>
> Um... do we need this? I think this path is not called from memory
> allocation path...

The issue isn't that *this* is called by memory allocation paths, but
whether this can hold the lock and then some memory allocation path calls
back to the filesystem two write something out - and deadlock.

In other words, the chain is something like this:

msdos_lookup
lock_super() ****
fat_build_inode ->
new_inode() ->
s->s_ops->alloc_inode = fat_alloc_inode ->
kmem_cache_alloc() ->
.. pageout ..
sync_inodes ->
(or any other writeout)
lock_super() ****

and now it deadlocks.

In other words, the kmem_cache_alloc() must _not_ be allowed to actually
cause a filesystem writeout, and that's what GFP_NOFS is all about.

That said, just removing the lock-kernel entirely obviously would not have
that problem either, but I feel safer at least keeping the same locking it
had, rather than removing locking entirely.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/